Contact Metric Server


Fast protein similarity search and functional annotation across the PDB



Synopsis

The Contact Metric Server is a web-based tool for searching the RCSB Protein Data Bank (PDB) for proteins structurally similar to a query, using the contact metric method [1], and for using the search results to annotate proteins automatically with Gene Ontology (GO) terms.

Input

The input (query structure) is specified by its PDB identifier, which consists of four alphanumeric characters, followed by a polypeptide chain identifier, giving a five-character input string. For example, chain A of PDB entry 1F88 is entered as '1F88A'. The input is not case sensitive. For PDB entries without a chain identifier, an underscore '_' must be given in its place; e.g., the query structure PDB 1UBQ may be entered as '1UBQ_' or '1ubq_'.
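The input rules above can be checked on the client side before submitting a query. The following is a minimal sketch, assuming the chain identifier is a single alphanumeric character or '_'; the pattern and the helper names `normalize_query` and `split_query` are hypothetical and are not the server's own validation code.

```python
import re
from typing import Optional, Tuple

# Assumed format: four alphanumeric PDB characters followed by one chain
# character, which is alphanumeric or '_' for entries without a chain ID.
# This is an illustration, not the server's actual validation rule.
QUERY_PATTERN = re.compile(r"[0-9A-Za-z]{4}[0-9A-Za-z_]")


def normalize_query(raw: str) -> str:
    """Return the query in upper case (input is case insensitive), e.g. '1ubq_' -> '1UBQ_'.

    Raises ValueError if the string does not have the assumed five-character form.
    """
    query = raw.strip()
    if not QUERY_PATTERN.fullmatch(query):
        raise ValueError(f"expected a 4-character PDB ID plus a chain ID, got {raw!r}")
    return query.upper()


def split_query(query: str) -> Tuple[str, Optional[str]]:
    """Split a normalized query into (PDB ID, chain ID); '_' means no chain ID."""
    pdb_id, chain = query[:4], query[4]
    return pdb_id, (None if chain == "_" else chain)
```

For example, `normalize_query("1ubq_")` gives `'1UBQ_'`, and `split_query("1F88A")` gives `('1F88', 'A')`.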

Process

The similarity search starts once the PDB identifier has been submitted. If the query's PDB identifier is not found in the server's list of precompiled PDB structures, the server tries to download the requested structure from the PDB. A search takes about 30 seconds, independent of the complexity of the query (chain length, number of secondary structure elements, etc.). The procedure uses the so-called length-corrected contact metric (LCM) [1] to calculate the distance between the query and every PDB chain stored on the server. Progress information is displayed during the run.
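The search described above is an exhaustive comparison of the query against every stored chain, followed by ranking. The sketch below shows only that control flow under stated assumptions: the LCM itself (defined in [1]) is NOT implemented and is passed in as an arbitrary `distance` function, and `get_query_structure`, `rank_database` and the `fetch` downloader are hypothetical names.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Placeholder for the LCM distance of [1]; any function mapping two
# structures to a float can be plugged in for illustration.
Distance = Callable[[object, object], float]


def get_query_structure(query_id: str, precompiled: Dict[str, object],
                        fetch: Callable[[str], object]) -> object:
    """Use the precompiled structure if present, else call the (hypothetical) downloader."""
    if query_id in precompiled:
        return precompiled[query_id]
    return fetch(query_id)


def rank_database(query: object, database: Dict[str, object],
                  distance: Distance, top_n: int = 50) -> List[Tuple[float, str]]:
    """Compare the query against every stored chain and keep the top_n closest.

    Returns (distance, chain_id) pairs sorted by increasing distance. If the
    query is itself in the database, it comes first with distance 0 (rank 0).
    """
    scored = ((distance(query, structure), chain_id)
              for chain_id, structure in database.items())
    # nsmallest keeps at most top_n items in a heap, so memory stays small
    # even when the database holds every chain in the PDB.
    return heapq.nsmallest(top_n, scored)
```

With a toy distance such as `lambda q, s: abs(q - s)` on numbers, `rank_database(0.0, {"A": 0.3, "B": 0.1, "C": 0.2}, ..., top_n=2)` returns `[(0.1, 'B'), (0.2, 'C')]`.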

Output

The output is an HTML file (e.g., 2imjA.html) containing a table of the fifty PDB chains most similar to the query, ranked by their contact metric distances. The table header gives information about the run: 1) the number of PDB chains searched, 2) the version of the Gene Ontology annotation list for PDB structures (GOA), 3) the version of the CATH release, and 4) the output date and time. Starting with the input structure itself (rank 0), each row is divided into eight cells:
1) rank number,
2) LCM distance between the query and the listed structure (ranging from 0 to a maximum of 1),
3) sequence identity between the query and the listed amino acid sequence,
4) PDB identifier with a hyperlink to the PDB record,
5) GO 'Molecular Function' terms,
6) GO 'Biological Process' terms,
7) GO 'Cellular Component' terms,
8) CATH structural classification of the structure.
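When post-processing results, it can help to hold each table row in a typed record. This is a minimal sketch: the class name `ResultRow`, its field names and types, and the string rendering are illustrative assumptions that simply mirror the eight documented cells; they are not the server's own data format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResultRow:
    """One row of the output table; field order follows the eight documented cells.

    Field names and types are illustrative assumptions, not the server's schema.
    """
    rank: int                           # 0 for the query itself
    lcm_distance: float                 # LCM distance, between 0 and 1
    sequence_identity: float            # unit (fraction or percent) is assumed
    pdb_id: str                         # e.g. '1F88A'
    go_molecular_function: List[str]
    go_biological_process: List[str]
    go_cellular_component: List[str]
    cath: Optional[str]                 # CATH classification, if available

    def __post_init__(self) -> None:
        if not 0.0 <= self.lcm_distance <= 1.0:
            raise ValueError("LCM distance must lie between 0 and 1")

    def cells(self) -> List[str]:
        """Render the eight cells as plain strings, in table order."""
        return [
            str(self.rank),
            f"{self.lcm_distance:.3f}",
            f"{self.sequence_identity:g}",
            self.pdb_id,
            "; ".join(self.go_molecular_function),
            "; ".join(self.go_biological_process),
            "; ".join(self.go_cellular_component),
            self.cath or "",
        ]
```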

Matches with an LCM distance below 0.15 are considered statistically significant (99.7% confidence level) [1]. Rows for matches with distances larger than 0.15, i.e., non-significant matches, are shown with a grey background.
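The significance rule reduces to a single threshold test on the LCM distance. The sketch below uses the 0.15 cutoff from the text; the function names are hypothetical, and treating a distance of exactly 0.15 as non-significant is an assumption, since the text does not cover that boundary case.

```python
SIGNIFICANCE_CUTOFF = 0.15  # LCM distance cutoff from [1] (99.7% confidence level)


def is_significant(lcm_distance: float, cutoff: float = SIGNIFICANCE_CUTOFF) -> bool:
    """True when the match lies below the cutoff and is therefore significant.

    A distance exactly equal to the cutoff is treated as not significant;
    this boundary choice is an assumption.
    """
    return lcm_distance < cutoff


def row_background(lcm_distance: float) -> str:
    """Hypothetical styling: grey for non-significant rows, no colour otherwise."""
    return "" if is_significant(lcm_distance) else "grey"
```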

If no GO terms are available for the query, the Contact Metric Server attempts to infer them automatically from the GO terms of all significant matches. Using a distance-weighted majority rule, GO terms are assigned to the query structure, each with an associated probability p; these inferred terms are shown in red. If none of the significant matches has GO annotations, the annotation procedure fails.
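The inference step can be illustrated with a simple weighted vote. This is a sketch only: the weight w = 1 - d for a match at LCM distance d and the majority threshold p > 0.5 are assumptions chosen for illustration, and the exact rule used by the server is the one described in [1]. The function name `infer_go_terms` and the placeholder term IDs in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

CUTOFF = 0.15  # significance cutoff on the LCM distance


def infer_go_terms(matches: Iterable[Tuple[float, Set[str]]],
                   cutoff: float = CUTOFF,
                   min_p: float = 0.5) -> Optional[Dict[str, float]]:
    """Infer GO terms for an unannotated query from its significant matches.

    matches: (LCM distance, GO terms) for each hit, excluding the query itself.
    The weight w = 1 - d and the majority threshold min_p are assumptions.
    Returns None when no significant match carries GO terms (annotation fails).
    """
    term_weight: Dict[str, float] = defaultdict(float)
    total = 0.0
    for d, terms in matches:
        if d >= cutoff or not terms:
            continue  # skip non-significant or unannotated matches
        w = 1.0 - d  # closer matches get larger weights
        total += w
        for term in terms:
            term_weight[term] += w
    if total == 0.0:
        return None
    # p = weighted fraction of annotated significant matches carrying the term
    probs = {t: w / total for t, w in term_weight.items()}
    return {t: p for t, p in probs.items() if p > min_p}
```

For matches `[(0.05, {"GO:term1"}), (0.10, {"GO:term1", "GO:term2"}), (0.40, {"GO:term3"})]`, only the first two count; `GO:term1` receives p = 1.0, while `GO:term2` (p of about 0.49) falls just short of a majority.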

Citation

If you use the Contact Metric Server as part of a published work, please cite the following paper:
Lisewski, A.M. and O. Lichtarge. "Rapid detection of similarity in protein structure and function through contact metric distances." Nucl. Acids Res. 2006 34: e152; doi:10.1093/nar/gkl788.




[1] Lisewski, A.M. and O. Lichtarge. "Rapid detection of similarity in protein structure and function through contact metric distances." Nucl. Acids Res. 2006 34: e152; doi:10.1093/nar/gkl788.