Database design for biomolecular data

We are involved in database design for a number of collaborative European projects: MolPAGE, ENGAGE and CAGEKID. The focus is on developing a data warehouse for integrated storage and analysis of data obtained by different experimental techniques from the same samples. The work also involves developing a system for annotating and storing the data needed to manage biomedical samples, as well as the assay data produced for these samples. The system has two main components, SIMS (patient and Sample Information Management System) and AIMS (Assay Information Management System), which are integrated within the SIMBIOMS open source project.

Motif search in protein structures

Motifs are fragments of protein structure, together with some biological annotation, that are shared by several proteins (domains) believed to be biologically related. The appearance of such a motif in an unfamiliar protein can suggest its similarity to the other proteins that share the motif. The trustworthiness of such a conclusion varies, but it can be quite high if the motif has few or no "false" matches among the familiar proteins.
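One simple way to quantify the trustworthiness described above is the fraction of a motif's matches among familiar proteins that fall inside the group the motif is meant to characterise. The following is a minimal sketch under that assumption; the function name and the protein identifiers are hypothetical and do not come from our motif database.

```python
def motif_precision(matches, related):
    """Fraction of matched familiar proteins that belong to the related group.

    matches: identifiers of familiar proteins in which the motif was found
    related: identifiers of proteins believed to be biologically related
    Returns None when the motif matches nothing, since the ratio is undefined.
    """
    matches = set(matches)
    if not matches:
        return None
    true_hits = matches & set(related)
    return len(true_hits) / len(matches)


# Hypothetical identifiers, for illustration only.
related = {"protA", "protB", "protC"}
matches = {"protA", "protB", "protC", "protX"}  # protX is a "false" match

precision = motif_precision(matches, related)
print(precision)
```

A value close to 1 means the motif has few "false" matches, so finding it in an unfamiliar protein is stronger evidence of a relationship.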

We are studying motifs that are based on protein "topology" (so-called TOPS diagrams) and have developed a motif database on the basis of the CATH classification, as well as tools for searching for these motifs in protein structures (given as PDB files).

Evolution of protein structures

Whilst there are well-developed models describing the evolution of protein sequences, and protein sequences are as a rule compared according to these models, the situation is quite different for protein structures. Traditional structure comparison methods rely only on measuring the distances between the elements of two structures, without taking into account that different types of structural changes may occur with different likelihoods. Still, there are some studies on the evolution of protein structures, and several types of structural change (such as the transformation of a loop into a helix or vice versa) have been proposed and motivated by particular examples.

We are studying the types of structural changes that occur at the "topological level" and are trying to estimate the probabilities with which different structural changes in proteins might occur.
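The idea that different structural changes should carry different likelihoods can be illustrated with a weighted edit distance over strings of structural elements. This is only a sketch and not our actual comparison method: the one-letter alphabet (H for helix, E for strand, L for loop), the function name and all cost values are assumptions chosen for illustration.

```python
def weighted_edit_distance(a, b, sub_cost, indel_cost=1.0):
    """Edit distance between element strings a and b with per-pair substitution costs.

    sub_cost maps an unordered pair of symbols, given as a tuple, to a cost;
    pairs that are not listed fall back to a cost of 1.0.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            if x == y:
                s = 0.0
            else:
                s = sub_cost.get((x, y), sub_cost.get((y, x), 1.0))
            d[i][j] = min(
                d[i - 1][j] + indel_cost,  # delete an element of a
                d[i][j - 1] + indel_cost,  # insert an element of b
                d[i - 1][j - 1] + s,       # substitute (or match)
            )
    return d[n][m]


# Assumed costs: a loop turning into a helix is treated as a likely, cheap change.
costs = {("L", "H"): 0.3}

print(weighted_edit_distance("ELE", "EHE", costs))  # loop -> helix
print(weighted_edit_distance("ELE", "EEE", costs))  # loop -> strand, default cost
```

With estimated probabilities available, the costs could be set to negative log-probabilities, so that more likely changes contribute less to the distance.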

Modelling of gene regulatory networks

Various mathematical models describing gene regulatory networks, as well as algorithms for network reconstruction from experimental data, have been the subject of intense study, largely motivated by the current availability of high-throughput experimental data. Our research is focused on the finite state linear model (FSLM) proposed by A. Brazma. The model incorporates a biologically intuitive gene regulatory mechanism similar to that in Boolean networks, but can also describe continuous changes in protein concentrations.

We have studied the properties of FSLM network dynamics and have shown that the problem of whether a given gene will reach an active state is, in general, algorithmically unsolvable. Thus we can conclude that the model includes networks with acyclic dynamics, and that problems such as the equivalence of two networks are, in general, algorithmically unsolvable as well. However, concrete networks show a certain regularity, and slight changes in initial conditions usually cannot shift network behaviour radically. One of the most interesting questions when considering a network is the identification of its stable regions and attractors.
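For intuition about what an attractor is, it helps to look at a plain Boolean network, where the state space is finite and can be enumerated exhaustively. The sketch below is not FSLM, which also tracks continuous protein concentrations and therefore cannot be analysed this way in general. The two-gene mutual repression rule and the function names are hypothetical.

```python
from itertools import product


def step(state):
    """Synchronous update of a toy two-gene switch with mutual repression.

    Each gene is active in the next step only if the other gene is inactive now.
    """
    a, b = state
    return (1 - b, 1 - a)


def find_attractors(step, n_genes):
    """Return the set of attractors (fixed points and cycles) of a Boolean network.

    Every start state is followed until a state repeats; the repeating part of
    the trajectory is an attractor, stored as a tuple rotated so that its
    smallest state comes first.
    """
    attractors = set()
    for start in product((0, 1), repeat=n_genes):
        seen = {}
        path = []
        state = start
        while state not in seen:
            seen[state] = len(path)
            path.append(state)
            state = step(state)
        cycle = path[seen[state]:]
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


for attractor in sorted(find_attractors(step, 2)):
    print(attractor)
```

In this toy network the two fixed points, each with exactly one gene active, play the role of two alternative stable behaviours, while the remaining attractor is an oscillation produced by the synchronous update rule.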

We have developed symbolic methods for analysing the dynamics of FSLM networks that provide a safe approximation of all network behaviours. These methods make it possible to find potential attractors in a gene regulatory network and to draw conclusions about its stable parts. Experiments with the lambda phage network showed that our methods can identify biologically meaningful attractors that correspond to the characteristic lambda phage behaviours, lysis and lysogeny.