A domain-peptide interaction analysis
We developed a methodology to represent and analyze the interactions between Protein Recognition Modules and short peptides. The interaction surface is characterized by the amino acids in physical contact and by a contact matrix that identifies the pairs of interacting residues (PAIRs) between the domain and the peptide.
The contact positions identified on the domain and peptide sequences, and the pairs of residues in physical contact, are the features by which we intend to discriminate interactions from non-interactions. On a dataset of yeast SH3 domain-peptide interactions, we evaluated the two distributions of PAIRs at each contact position, for positive (interacting domain-peptide) and negative (non-interacting domain-peptide) cases.
By comparing the two distributions, we can measure the significance of each contact position and of each amino acid pair in the formation of the complex. We defined a score that expresses this significance, so that each interaction can be represented as the set of scores corresponding to the set of PAIRs that characterize it.
Finally, we use the set of scores as inputs to a neural network that will classify domain-peptide pairs as interacting or not.
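The scoring step can be illustrated with a minimal sketch. The exact score used in the project is not given here, so the example assumes a smoothed log-odds ratio between the frequency of a PAIR in positive and in negative cases. The function names `pair_scores` and `encode`, the pseudocount, and the input layout (one dict per interaction, mapping a contact position to a domain/peptide residue pair) are all hypothetical.

```python
import math
from collections import Counter

def pair_scores(positives, negatives, pseudocount=1.0):
    """Assumed score: smoothed log-odds of each (position, PAIR) feature.

    positives / negatives: lists of interactions; each interaction is a dict
    mapping a contact position to a (domain_residue, peptide_residue) tuple.
    """
    pos = Counter(feature for inter in positives for feature in inter.items())
    neg = Counter(feature for inter in negatives for feature in inter.items())
    scores = {}
    for feature in set(pos) | set(neg):
        # Laplace-smoothed frequency of the feature in each class.
        f_pos = (pos[feature] + pseudocount) / (len(positives) + 2 * pseudocount)
        f_neg = (neg[feature] + pseudocount) / (len(negatives) + 2 * pseudocount)
        scores[feature] = math.log(f_pos / f_neg)
    return scores

def encode(interaction, scores):
    """Represent one interaction as its list of scores, ordered by position.

    Features never seen in the training data get a neutral score of 0.0.
    """
    return [scores.get(feature, 0.0) for feature in sorted(interaction.items())]

# Toy data: one contact position, two residue pairs.
positives = [{0: ("Y", "P")}, {0: ("Y", "P")}]
negatives = [{0: ("A", "P")}, {0: ("Y", "P")}]
scores = pair_scores(positives, negatives)
vector = encode({0: ("Y", "P")}, scores)
```

A positive score marks a PAIR that is more frequent among interacting cases, a negative score one that is more frequent among non-interacting cases. Vectors such as `vector` would then be the inputs of the classifier.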
People involved: Enrico Ferraro, Allegra Via, Gabriele Ausiello and Manuela Helmer-Citterich.
Identification and characterization of recurrent substructures in protein-protein interfaces
The aim of this project is to decompose protein-protein interaction surfaces into small 3D fragments in order to identify patches of residues that may be reused in different protein interfaces.
A local structural comparison algorithm is used to identify similar substructures in a non-redundant dataset of interfaces extracted from the PDB. The output of this first phase is a list of fragment pairs located in different protein-protein interfaces and sharing a similar three-dimensional structure.
The next step is to identify correspondences, if they exist, in their respective interaction partners.
The main issues we want to address are:
1) Do similar substructures interact with similar partners?
2) Do certain 3D patterns consistently interact with the same set of residues?
3) Given a certain fragment, can we identify its preferred interaction partner in a list of possible candidates?
Once this dataset has been constructed, various applications, especially related to protein docking, can be envisaged. For example, the identification of recurring substructures could be used as a quality indicator when ranking predicted protein complexes.
People involved: Federico Gherardini, Gabriele Ausiello and Manuela Helmer-Citterich.
Improving 1D and 3D patterns via alignments of protein structures
We built a web server for the semi-automated construction of sequence and structural patterns of protein residues. It recursively analyzes multiple alignments of protein structures, looking for structural and biochemical similarities between pairs of amino acids. Conserved residues are identified and used to derive refined sequence or structural patterns. The server, with related documentation, is available at http://surface.bio.uniroma2.it/3dProfile/
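As a rough illustration of how conserved residues can be turned into a sequence pattern, the sketch below derives a PROSITE-like pattern from an alignment. It is a simplification and not the server's algorithm: it assumes the structural alignment has already been reduced to equal-length aligned sequences and considers only residue identity, not structural or biochemical similarity. The function name `consensus_pattern` and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter

def consensus_pattern(alignment, min_fraction=1.0):
    """Derive a PROSITE-like pattern from equal-length aligned sequences.

    A column contributes its most common residue if that residue occurs in
    at least `min_fraction` of the rows; otherwise it becomes the wildcard 'x'.
    Gaps ('-') never count as conserved residues.
    """
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and equal in length")
    n_rows = len(alignment)
    elements = []
    for column in zip(*alignment):
        counts = Counter(aa for aa in column if aa != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / n_rows >= min_fraction:
                elements.append(residue)
                continue
        elements.append("x")
    return "-".join(elements)

toy_alignment = ["ACDE", "ACDF", "AGDE"]
strict = consensus_pattern(toy_alignment)        # only fully conserved columns
relaxed = consensus_pattern(toy_alignment, 0.6)  # majority residue is enough
```

With the toy alignment, the strict setting keeps only the two fully conserved columns, while the relaxed threshold also accepts the majority residue in the other two.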
People involved: Allegra Via, Daniele Peluso and Manuela Helmer-Citterich.
Mapping OMIM mutations onto PDB structures
The OMIM database is a collection of hereditary point mutations associated with diseases in Homo sapiens. In order to provide new insights into the molecular mechanisms that cause hereditary diseases, we have performed an accurate mapping of OMIM mutations onto 3D protein structures using Seq2Struct, a database of highly reliable sequence-structure links.
The resulting information will be added to the Pdbfun web server as a new annotation at the residue level, and made publicly available to the scientific community.
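The core of such a mapping is the transfer of a mutated position from the sequence to the structure. The sketch below assumes that a pairwise alignment between the sequence and the structure chain is already available as two gapped strings, and that structure residues are numbered consecutively from `struct_start`. Real PDB numbering can contain gaps and insertion codes, so this is an illustration only; `map_position` is a hypothetical helper, not part of Seq2Struct.

```python
def map_position(aligned_seq, aligned_struct, seq_pos, struct_start=1):
    """Map a 1-based sequence position onto a structure residue number.

    aligned_seq / aligned_struct: equal-length aligned strings, '-' is a gap.
    Returns None when the position is aligned to a gap in the structure
    (for example, a residue missing from the crystal) or is out of range.
    """
    if len(aligned_seq) != len(aligned_struct):
        raise ValueError("aligned strings must have the same length")
    seq_index = 0                      # residues of the sequence seen so far
    struct_index = struct_start - 1    # number of the last structure residue seen
    for seq_char, struct_char in zip(aligned_seq, aligned_struct):
        if seq_char != "-":
            seq_index += 1
        if struct_char != "-":
            struct_index += 1
        if seq_char != "-" and seq_index == seq_pos:
            return struct_index if struct_char != "-" else None
    return None

# Toy alignment: the structure lacks the first residue and has one insertion.
aligned_sequence = "MKT-AYL"
aligned_structure = "-KTGAYL"
mapped = map_position(aligned_sequence, aligned_structure, 4)
```

A mutation whose position maps to `None` cannot be placed on the structure and would have to be skipped or reported separately.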
People involved: Daniele Peluso, Allegra Via and Manuela Helmer-Citterich.
Seq2Struct: a resource for establishing sequence-structure links
Several methods for establishing cross-links between Protein Data Bank (PDB) structures or Structural Classification of Proteins (SCOP) domains and Swiss-Prot + TrEMBL sequences (or vice versa) rely on database annotations.
Alternatively, sequence alignment procedures can be used. In this study, we describe Seq2Struct, a web resource for the identification of sequence-structure links. The resource consists of an exhaustive collection of annotated links between Swiss-Prot + TrEMBL and PDB + SCOP database entries. Links are based on pre-established, highly reliable thresholds and stored in a relational database, which has been enhanced using annotations derived from the Swiss-Prot, PDB, SCOP, GOA and DSSP databases. The Seq2Struct resource is available at http://surface.bio.uniroma2.it/seq2struct/.
People involved: Allegra Via, Andreas Zanzoni and Manuela Helmer-Citterich.
Statistical study of false occurrences of functional motifs on protein sequences
The detection of a functional motif in as yet uncharacterized protein sequences is a well-established method for assigning function to proteins. A critical problem, however, concerns the evaluation of the false prediction rate of a motif in sequence databases, i.e. the significance of finding a motif in several proteins.
The number of false positive (FP) matches of a pattern has often been assessed from the number of its occurrences expected by chance (E) from the mere aggregation of letters in a database search, as calculated from the residue frequencies in the database. The relationship between E (expected) and FP (observed), however, has not been thoroughly investigated so far. It is reasonable to expect that the function fitting the set of data (E,FP) is linear, but it is not clear a priori whether there are exceptions (i.e. a number of false predictions on a biological database appreciably greater or lower than the expected number of hits on the corresponding random database), how frequent they are, and why they occur.
In this work, we carried out a statistical study of this relationship and an analysis of the unexpected behaviours, thus providing insights into the random nature of protein sequences. Our findings suggest diverse fascinating mechanisms and constraints occurring during evolution, which might “regulate” the random appearance of functional motifs in protein sequences.
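For a fixed-length pattern, the expected number of chance occurrences E can be estimated directly from the residue frequencies. The sketch below assumes independent positions: the probability of a match at one window is the product, over pattern positions, of the summed frequencies of the allowed residues, and E is that probability times the number of windows in the database. The pattern encoding (a list of sets of allowed residues, with `None` as a wildcard) and the function names are hypothetical, and variable-length patterns are not handled.

```python
from collections import Counter

def residue_frequencies(sequences):
    """Relative frequency of each residue over the whole database."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def expected_hits(pattern, sequences):
    """Expected number of chance matches (E) of a fixed-length pattern.

    pattern: list with one entry per position, either a set of allowed
    residues or None for a wildcard that matches any residue.
    """
    freqs = residue_frequencies(sequences)
    p_match = 1.0
    for allowed in pattern:
        if allowed is not None:
            p_match *= sum(freqs.get(aa, 0.0) for aa in allowed)
    # Number of start positions at which the pattern fits in each sequence.
    windows = sum(max(len(s) - len(pattern) + 1, 0) for s in sequences)
    return windows * p_match

# Toy database: frequencies are A = 0.75 and C = 0.25.
toy_database = ["AAAA", "AACC"]
e_value = expected_hits([{"A"}, {"C"}], toy_database)
```

Comparing such E values with the observed FP counts, pattern by pattern, gives the (E,FP) data set whose fit is discussed above.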
People involved: Allegra Via, Federico Gherardini, Enrico Ferraro and Manuela Helmer-Citterich.
Structural analysis and comparison of protein phosphorylation sites
The phosphorylation of specific protein residues is a crucial event in the regulation of several cellular processes. Recent improvements in the experimental identification of phosphoproteins and phosphoresidues have dramatically increased the amount of phosphorylation site data, and the need for computational tools to collect and analyse these data has grown accordingly.
We have developed a procedure for the annotation and analysis of the three-dimensional structure of experimentally verified protein phosphorylation sites (or instances) retrieved from the phospho.ELM database. For each instance, a structural neighbourhood, which we call a zone, was defined using a distance criterion. A procedure was implemented to annotate each residue belonging to the defined zones with diverse functional information.
Furthermore, we are performing different structural comparisons in order to identify biologically significant local similarities. The objective is to infer which kinase could be responsible for the phosphorylation of a given residue and to identify interesting candidate 3D motifs shared by substrates of the same kinase or kinase family.
All this information will be made publicly available in a dedicated database and also integrated into the Pdbfun web server.
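A zone defined by a distance criterion can be sketched as follows. The actual cutoff and the atoms used to measure distances are not stated here, so the example assumes one representative coordinate per residue (for instance the C-alpha atom) and an arbitrary cutoff in ångström; the function name `zone` and the residue identifiers are hypothetical.

```python
import math

def zone(residues, site_id, cutoff=12.0):
    """Residues whose representative atom lies within `cutoff` of the site.

    residues: dict mapping a residue identifier to an (x, y, z) coordinate.
    The phosphorylated residue itself is not included in the returned zone.
    """
    center = residues[site_id]
    return sorted(
        res_id
        for res_id, xyz in residues.items()
        if res_id != site_id and math.dist(center, xyz) <= cutoff
    )

# Toy coordinates: one close neighbour and one distant residue.
toy_residues = {
    "S10": (0.0, 0.0, 0.0),
    "K11": (3.0, 0.0, 0.0),
    "D50": (0.0, 20.0, 0.0),
}
neighbours = zone(toy_residues, "S10", cutoff=12.0)
```

Each residue returned for a zone is then a candidate for functional annotation.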
People involved: Andreas Zanzoni and Manuela Helmer-Citterich.
SURFACE: a database of protein surface regions for functional annotation
The SURFACE (SUrface Residues and Functions Annotated, Compared and Evaluated, URL: http://cbm.bio.uniroma2.it/surface/) database is a repository of annotated and compared protein surface regions.
SURFACE contains the results of a large-scale protein annotation and local structural comparison project. A non-redundant set of protein chains is used to build a database of protein surface patches, defined as putative surface functional sites. Each patch is annotated with sequence and structure-derived information about function or interaction abilities.
A new procedure for structure comparison is used to perform an all-versus-all comparison of the patches. Selecting the results obtained with stringent parameters yields a similarity score that can be used to associate different patches and allows reliable annotation by similarity. Annotation through the comparison of protein surface regions makes it possible to highlight similarities that cannot be recognized by other methods of sequence or structure comparison. A graphic representation of the surface patches, of the functional annotations and of the structural superpositions is available through the web interface.
People involved: Fabrizio Ferrè, Gabriele Ausiello, Andreas Zanzoni and Manuela Helmer-Citterich.
The ELM Structural Filter
ELM [http://elm.eu.org] is a computational biology resource for investigating candidate functional sites in eukaryotic proteins. Functional sites that fit the description "linear motif" are currently specified as patterns using Regular Expression rules. To improve the predictive power, context-based rules and logical filters (cellular context, globular domain, organism, structure) are being developed and applied to reduce the number of false positives.
The ELM structural filter makes use of the known three-dimensional information, whenever this is available, to discriminate true from false positive hits. For all ELMs of known structure, data about secondary structure and accessibility are collected in order to build a scoring scheme for the predicted ELMs. When the motifs on the user sequence are analyzed, the secondary structure and accessibility of each motif are predicted using homology modelling techniques. Based on the comparison between predicted values and data collected from true positive instances, the filter produces a score. The score is correlated with the degree of conservation of the ELM accessibility and secondary structure features across diverse true positive protein structures.
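The two steps described above, pattern matching followed by structure-based scoring, can be sketched in a few lines. Everything in the example is an assumption made for illustration: the pattern `P..P` is a toy regular expression and not an ELM entry, and `accessibility_score` is a deliberately simple stand-in (one minus the mean absolute difference between predicted and reference relative accessibilities), not the scoring scheme of the filter.

```python
import re

def find_motifs(sequence, pattern):
    """Return (start, match) for every occurrence, including overlapping ones.

    The pattern is wrapped in a lookahead so that matches sharing residues
    are all reported; `start` is a 0-based index into the sequence.
    """
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]

def accessibility_score(predicted, reference):
    """Toy score in [0, 1]: 1 minus the mean absolute difference.

    predicted / reference: per-position relative accessibilities in [0, 1],
    for the candidate motif and for the known true positive instances.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("profiles must be non-empty and equal in length")
    mean_diff = sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
    return 1.0 - mean_diff

# Toy sequence scanned with a hypothetical proline-rich pattern.
hits = find_motifs("APPLPPRPP", "P..P")
perfect = accessibility_score([0.8, 0.6], [0.8, 0.6])
```

A hit whose predicted profile is close to that of the true positive instances receives a score near 1, while a buried or differently structured hit scores lower and can be flagged as a likely false positive.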
People involved: Allegra Via and Manuela Helmer-Citterich.