Genome-wide structural and functional protein characterization by ab initio protein structure prediction

Research output: Thesis › Doctoral Thesis (compilation)


Very little is known about a considerable fraction of all proteins, and it is time-consuming and expensive to study each protein individually to determine its function, structure and cellular role. Proteins retain structural, functional and sequence characteristics from their ancestral proteins. Hence two proteins that share a common ancestor, i.e. homologs, will to some extent have similar sequence, structure and function. One way to learn something about a protein is therefore to identify its homologs and use information from them to annotate the protein of interest. Close homologs can be detected from sequence alone, but more distant homologs cannot. Structure is more conserved than sequence. It therefore enables detection of a common ancestor between more distantly related proteins, which allows information to be transferred to a larger fraction of the uncharacterized proteins. This thesis covers my efforts to develop a method that uses ab initio protein structure prediction to detect distant homologs, and to use those homologs to annotate proteins from the genome of Saccharomyces cerevisiae.

The ab initio protein structure prediction software used in this thesis, Rosetta, can predict a protein's tertiary structure from the amino acid sequence alone. Rosetta reduces the search space by approximating local conformations with short fragments of known structures from the Protein Data Bank. It then judges the overall fitness of the simulated protein structure with a statistically derived energy function. The program has been successful in the last three Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments, and the results from the most recent CASP are reported in Paper I.

Distant homologs can be detected by comparing the structures generated by Rosetta with structures from the Protein Data Bank (PDB). In general, however, such a comparison is noisy: it gives many answers, of which only a few are correct. The noise can be filtered out by exploiting the strong relationship between protein function and protein structure. The functional information can either be taken from a database or be inferred from one or more experimental high-throughput technologies.
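
Rosetta's search strategy replaces short stretches of the chain with fragments and keeps or rejects each move according to an energy function. The toy sketch below illustrates that loop, and everything in it is an assumption made for illustration. The chain is a 2D bead model described by turning angles rather than 3D backbone torsions. The hypothetical `make_fragment_library` draws fragments from two made-up motifs instead of from PDB structures. The hypothetical `energy` is a hand-made compactness-plus-clash score standing in for Rosetta's statistically derived energy function. Moves are accepted with the Metropolis criterion, which is a common choice for Monte Carlo searches of this kind.

```python
import math
import random

# Toy model (assumption): a 2D chain with one turning angle per bond.
# Rosetta itself works with backbone torsions (phi, psi, omega) in 3D.

def build_coords(angles):
    """Place beads at unit spacing, turning by each angle (radians) in turn."""
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for a in angles:
        heading += a
        x += math.cos(heading)
        y += math.sin(heading)
        coords.append((x, y))
    return coords

def energy(angles):
    """Hypothetical score: radius of gyration (rewards compactness) plus a steep
    penalty for non-adjacent beads closer than 1.0. Not Rosetta's energy function."""
    coords = build_coords(angles)
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    rg = math.sqrt(sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in coords) / n)
    clash = 0.0
    for i in range(n):
        for j in range(i + 2, n):
            d = math.dist(coords[i], coords[j])
            if d < 1.0:
                clash += (1.0 - d) * 10.0
    return rg + clash

def make_fragment_library(n_angles, frag_len, n_per_pos, rng):
    """Hypothetical fragment library: for every window start, n_per_pos candidate
    angle runs drawn from two made-up motifs (straight run, tight turn) plus noise.
    Rosetta instead excerpts fragments from PDB structures with similar local sequence."""
    motifs = [[0.0] * frag_len, [math.pi / 3] * frag_len]
    library = []
    for _ in range(n_angles - frag_len + 1):
        frags = []
        for _ in range(n_per_pos):
            base = rng.choice(motifs)
            frags.append([a + rng.gauss(0.0, 0.1) for a in base])
        library.append(frags)
    return library

def fragment_assembly(n_angles=20, frag_len=3, steps=5000, temperature=0.5, seed=0):
    """Monte Carlo fragment insertion; returns the lowest-energy angles seen and their score."""
    rng = random.Random(seed)
    library = make_fragment_library(n_angles, frag_len, 25, rng)
    current = [0.0] * n_angles  # start from a fully extended chain
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for _ in range(steps):
        pos = rng.randrange(len(library))
        trial = list(current)
        trial[pos:pos + frag_len] = rng.choice(library[pos])  # insert one fragment
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves, sometimes accept uphill ones
        if e_trial <= e_cur or rng.random() < math.exp((e_cur - e_trial) / temperature):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best
```

For example, `best, e = fragment_assembly(steps=2000, seed=1)` returns the lowest-energy set of angles visited and its score. The best conformation is tracked separately from the current one, so the returned score can never exceed that of the extended starting chain, even though uphill moves are sometimes accepted during the search.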

This idea was tested in Paper II, where 100 proteins were investigated using protein structure prediction, yeast two-hybrid screening, fluorescence microscopy and mass spectrometry. The data from all four technologies were integrated, and 77% of the proteins were assigned a function.
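
The filtering idea can be made concrete by treating each structure-comparison hit as a hypothesis and combining independent pieces of evidence into a posterior probability, naive-Bayes style. The sketch below assumes this formulation. The `Hit` class, the linear z-score calibration in `structure_llr`, the likelihood ratios and the prior are all hypothetical placeholders, not values or methods from Paper II or Paper V. Real values would have to be estimated from benchmark data.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Hit:
    superfamily: str   # candidate superfamily suggested by a PDB structure match
    z_score: float     # structure-comparison score of the predicted model vs. the PDB entry
    functions: set = field(default_factory=set)  # functional terms known for that superfamily

# Hypothetical likelihood ratios P(evidence | true hit) / P(evidence | false hit).
LR_FUNCTION_MATCH = {"localization": 3.0, "interaction": 4.0, "database": 5.0}
LR_FUNCTION_MISMATCH = 0.5

def structure_llr(z):
    """Hypothetical calibration: log-likelihood ratio grows linearly with the z-score."""
    return 0.8 * (z - 4.0)

def score_hit(hit, evidence):
    """Sum log-likelihood ratios; evidence maps a source to the functional terms it suggests."""
    llr = structure_llr(hit.z_score)
    for source, terms in evidence.items():
        if hit.functions & terms:
            llr += math.log(LR_FUNCTION_MATCH[source])
        else:
            llr += math.log(LR_FUNCTION_MISMATCH)
    return llr

def rank_hits(hits, evidence, prior_log_odds=-3.0):
    """Return (superfamily, posterior probability) pairs, most probable first."""
    ranked = []
    for h in hits:
        log_odds = prior_log_odds + score_hit(h, evidence)
        ranked.append((h.superfamily, 1.0 / (1.0 + math.exp(-log_odds))))
    return sorted(ranked, key=lambda t: t[1], reverse=True)
```

As a usage example, take three hits: `Hit("A", 6.0, {"transport"})`, `Hit("B", 6.5, {"DNA repair"})` and `Hit("C", 4.5, {"transport"})`. Suppose the localization and interaction data both point to `{"transport"}`. Then `rank_hits` ranks A first, then C, then B. B has the best structural score, but its function is inconsistent with the experimental evidence, so it drops to the bottom. This is the sense in which functional information removes noise from the structure comparison.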

Data integration is very labor-intensive when done by hand, and the amount of information generated for each investigated protein is substantial. To apply this technology on a genome-wide scale, everything needs to be automated, and all data have to be stored and managed efficiently. Papers III and IV cover information management, that is, how the data used and produced in the project are organized and stored. Paper V reports both how we automated the integration process using the software described in Papers I and II, and the application of the technology to the genome of Saccharomyces cerevisiae.


Research areas and keywords

Subject classification (UKÄ)

  • Medical Engineering


  • Biomedical Sciences, Saccharomyces cerevisiae, Ab initio protein structure prediction, Protein annotation
Original language: English
Award date: 2005 Dec 16
  • Department of Electrical Measurements, Lund University
Print ISBN: 91-628-6689-3
Publication status: Published (2005)
Publication category: Research

Bibliographic note

Defence details
Date: 2005-12-16
Time: 10:15
Place: Room E:1406, E-building, Ole Römers väg 3, Lund Institute of Technology
External reviewer(s): Dr David Fenyö, The Rockefeller University, New York, New York, USA