TOPS motif database

TOPStructure
TOPS motif search in protein secondary structure

About the TOPS project

The TOPS project aims to represent and analyse protein secondary structure at the topological level. TOPS diagrams were introduced by D.R.Westhead, D.R.Gilbert, J.M.Thornton, D.C.Hatton and T.P.J.Flores as a formalisation of the "protein cartoons" that biologists have used informally for some time. TOPS diagrams currently preserve information about secondary structure elements (strands and helices), hydrogen bonds (at the strand level) and spatial orientation (as a set of chiralities). Although such a description may sometimes be too simplistic, it has the advantage that search and comparison at the TOPS level can be performed much faster than with other representations (e.g. atomic coordinates). For more detailed information, visit one of the TOPS pages referenced in the Links/Contacts section.

TOPS motifs

TOPS motifs are fragments of TOPS diagrams that are shared by several proteins (domains) believed to be biologically related, together with some biological annotation. The appearance of such a motif in an unfamiliar protein can suggest that it is similar to the other proteins that share the same motif. The trustworthiness of such a conclusion varies, but it may be quite high if the motif has few or no "false" matches among the familiar proteins. The current motif database is based on the CATH classification, i.e. proteins that share the same CATH number prefix are considered biologically similar. Hence the motifs can be used as a tool for fast prediction of the CATH number.
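The idea of judging a motif by its "false" matches can be illustrated with a minimal Python sketch. It assumes that a motif is annotated with a CATH number prefix and that the CATH numbers of all known domains matching the motif are available. The function names and the example CATH numbers are hypothetical and are not taken from the TOPS software.

```python
def cath_prefix(cath_number, depth):
    """Return the first `depth` levels of a CATH number, e.g. '3.40' for depth 2."""
    return ".".join(cath_number.split(".")[:depth])


def motif_reliability(motif_prefix, matched_cath_numbers):
    """Fraction of matched known domains that share the motif's CATH prefix.

    Matches with a different prefix are counted as "false" matches.
    """
    if not matched_cath_numbers:
        return 0.0
    depth = len(motif_prefix.split("."))
    true_matches = sum(
        1 for number in matched_cath_numbers
        if cath_prefix(number, depth) == motif_prefix
    )
    return true_matches / len(matched_cath_numbers)


# Hypothetical example: a motif annotated with prefix 3.40 matches four known domains,
# one of which belongs to a different CATH group.
matches = ["3.40.50.720", "3.40.50.300", "3.40.190.10", "2.60.40.10"]
reliability = motif_reliability("3.40", matches)
print(reliability)
```

Prefixes are compared level by level rather than as raw strings, so that a prefix such as 3.4 is not mistaken for the beginning of 3.40.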
The predictive power of this approach improves significantly if, instead of looking for the presence or absence of a single motif, the unfamiliar protein is characterized by two sets of motifs (a so-called profile): "positive" motifs, which are present in the protein, and "negative" motifs, which are absent. The profile is then compared to the profiles of known classes, and the prediction is based on the results of this comparison. Unfortunately, the profile method does not work as well if the submitted structure has been split into domains differently from the structures used for profile construction. One way to deal with this problem is to analyze all possible sub-domains of the given structure and report the ones that give the best matches.
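The profile idea can be sketched in a few lines of Python. The scoring rule below (the fraction of a class profile's motifs on which the two profiles agree) is only an assumption made for illustration, since the page does not say how profiles are actually compared. The motif names, class profiles and function names are likewise hypothetical.

```python
def build_profile(all_motifs, present_motifs):
    """Split the motif library into "positive" (present) and "negative" (absent) sets."""
    positive = set(present_motifs) & set(all_motifs)
    negative = set(all_motifs) - positive
    return positive, negative


def profile_agreement(profile, class_profile):
    """Illustrative score: share of the class profile's motifs on which both profiles agree."""
    positive, negative = profile
    class_positive, class_negative = class_profile
    agreed = len(positive & class_positive) + len(negative & class_negative)
    total = len(class_positive) + len(class_negative)
    return agreed / total if total else 0.0


def best_class(profile, class_profiles):
    """Return the name of the class whose profile agrees best with the given profile."""
    return max(class_profiles, key=lambda name: profile_agreement(profile, class_profiles[name]))


# Hypothetical motif library and class profiles keyed by CATH number prefix.
library = ["m1", "m2", "m3", "m4"]
class_profiles = {
    "3.40": ({"m1", "m3"}, {"m2", "m4"}),
    "2.60": ({"m2"}, {"m1", "m3", "m4"}),
}

query = build_profile(library, ["m1", "m3"])
predicted = best_class(query, class_profiles)
print(predicted)
```

A real comparison would presumably weight motifs by how reliable they are; the unweighted count is used here only to keep the sketch short.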

Available services

    Three different comparison services are available. Each accepts a protein structure submitted in PDB format and searches it for known topology motifs.
    The Best Motif search looks for a "good" motif, one that is likely to indicate that the protein belongs to a particular group. The result shows the topology motifs found, together with their CATH number prefix (sorry, no further biological annotation so far) and the estimated likelihood that the prediction is correct (based on the percentage of known "false" matches for the motif). Only motifs with a likelihood of at least 5% are searched for. As a rule, the more complicated the secondary structure of the submitted protein, the more likely it is that "good" motifs will be found for it. However, the probability of getting any result for a randomly chosen protein is only about 1/3.
    The Profile search constructs a profile of motifs for the submitted protein, which is then compared to the profiles of known groups. The result shows the prefixes of the predicted CATH groups and the likelihood estimates of these predictions. In principle, you can expect better predictions with this search method, and you are more likely to get any results at all.
    The Profile search with domain finding performs the profile search for all possible sub-domains of the submitted structure (i.e. for each pair of secondary structure elements (SSEs) it is assumed that there is a sub-domain with endpoints at these SSEs; sub-domains of length 1 are not considered). The result shows the sub-domains that give the best predictions, together with the corresponding predictions.
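The enumeration of candidate sub-domains described for the third service can be sketched as follows. The sketch assumes that the structure is given as an ordered list of SSE labels and that a sub-domain is the contiguous run of SSEs between two endpoints. The labels and the function name are hypothetical.

```python
def enumerate_subdomains(sses):
    """Yield every contiguous run of SSEs that contains at least two elements."""
    for start in range(len(sses)):
        # end > start, so sub-domains of length 1 are never produced
        for end in range(start + 1, len(sses)):
            yield sses[start:end + 1]


# Hypothetical structure with four SSEs (E = strand, H = helix).
sses = ["E1", "H1", "E2", "E3"]
subdomains = list(enumerate_subdomains(sses))
print(len(subdomains))  # n * (n - 1) / 2 candidates for n SSEs
```

Because the number of candidates grows quadratically with the number of SSEs, each candidate would then be passed through the profile search and only the best-scoring ones reported.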

Links/Contacts

For more information about the TOPS project, visit one of the following pages: Topology Of Protein Structure, Protein Topology Home Page and the Bioinformatics Research Center at Glasgow University. The TOPS motif search page is hosted at the Institute of Mathematics and Computer Science, University of Latvia, and is currently maintained by Juris Viksna.

Software

The comparison service uses the DSSP program (developed by W. Kabsch and C. Sander) to determine secondary structure from a PDB file, and the TOPS program (developed by T.Flores and D.Westhead) to construct a TOPS diagram from the secondary structure. The software for topology motif search in TOPS diagrams was developed by Juris Viksna.