Incidence of "quasi-ditags" in catalogs generated by Serial Analysis of Gene Expression (SAGE)

Sergey Anisimov, AA Sharov

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Serial Analysis of Gene Expression (SAGE) is a functional genomics technique that quantitatively analyzes the cellular transcriptome. The analysis of SAGE libraries relies on identifying ditags in sequencing files; however, the software used to examine SAGE libraries cannot distinguish authentic ditags from false ones ("quasi-ditags"). Results: We provide examples of quasi-ditags that originate from cloning and sequencing artifacts (i.e. genomic contamination or random combinations of nucleotides) included in SAGE libraries. We used a mathematical model to predict the frequency of quasi-ditags in random nucleotide sequences, and our data show that clones containing two or fewer ditags (which include chromosomal cloning artifacts) should be excluded from the analysis of SAGE catalogs. Conclusions: Cloning and sequencing artifacts contaminating SAGE libraries could be eliminated using a simple pre-screening procedure, increasing the reliability of the data.
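The pre-screening idea in the abstract (drop clones that yield two or fewer ditags) can be illustrated with a short Python sketch. It is not the authors' procedure or model. It assumes NlaIII (recognition site CATG) as the anchoring enzyme, treats any stretch between consecutive CATG sites whose length falls in a hypothetical 20 to 24 bp window as a "ditag", and uses a simple independence approximation (p = 1/256 per position for CATG in random sequence with equal base frequencies) to estimate how often such stretches occur by chance. All function names and the length window are illustrative assumptions.

```python
import random
import re

ANCHOR = "CATG"  # NlaIII site; assumed anchoring enzyme
MIN_DITAG, MAX_DITAG = 20, 24  # hypothetical ditag length window (bases between CATG sites)


def extract_ditags(seq: str, min_len: int = MIN_DITAG, max_len: int = MAX_DITAG) -> list[str]:
    """Return stretches between consecutive CATG sites whose length is in [min_len, max_len]."""
    seq = seq.upper()
    # CATG cannot overlap itself, so non-overlapping matches find every site.
    sites = [m.start() for m in re.finditer(ANCHOR, seq)]
    ditags = []
    for left, right in zip(sites, sites[1:]):
        inner = seq[left + len(ANCHOR):right]
        if min_len <= len(inner) <= max_len:
            ditags.append(inner)
    return ditags


def prescreen(clones: dict[str, str], min_ditags: int = 3) -> dict[str, list[str]]:
    """Keep only clones with at least min_ditags ditags, i.e. drop clones with <= 2."""
    kept = {}
    for clone_id, seq in clones.items():
        ditags = extract_ditags(seq)
        if len(ditags) >= min_ditags:
            kept[clone_id] = ditags
    return kept


def expected_quasi_ditags(seq_len: int, min_len: int = MIN_DITAG,
                          max_len: int = MAX_DITAG, p: float = 1 / 256) -> float:
    """Rough expected count of chance 'ditags' in a random sequence.

    Independence approximation: (number of expected CATG sites) times the
    probability that the next CATG starts exactly L bases after the previous
    one ends, summed over the allowed lengths L. Edge effects are ignored.
    """
    n_sites = (seq_len - len(ANCHOR) + 1) * p
    gap_prob = sum(p * (1 - p) ** length for length in range(min_len, max_len + 1))
    return n_sites * gap_prob


def random_quasi_ditag_rate(n_seqs: int = 2000, seq_len: int = 500, seed: int = 0) -> float:
    """Simulated mean number of 'ditags' per random sequence (equal base frequencies)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_seqs):
        seq = "".join(rng.choices("ACGT", k=seq_len))
        total += len(extract_ditags(seq))
    return total / n_seqs
```

For example, a synthetic clone built as `"CATG" + "A" * 22 + "CATG" + "T" * 22 + "CATG"` yields two ditags and would be removed by `prescreen`, while a clone with three such stretches would be kept. The analytic estimate and the simulation give a rough sense of how often random sequence alone produces ditag-like fragments; the threshold of three ditags mirrors the abstract's conclusion rather than being derived here.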
Original language: English
Journal: BMC Bioinformatics
Volume: 5
Publication status: Published, 2004

Bibliographical note

The information about affiliations in this record was updated in December 2015.
The record was previously connected to the following departments: Neuronal Survival (013212041)

Subject classification (UKÄ)

  • Bioinformatics and Systems Biology
