GlycoWord / Glycotechnology-C10

Application of Bioinformatics to Glycoresearch: Glycoinformatics

Bioinformatics has advanced markedly in order to analyze the huge amounts of genome sequence data that serve as the blueprints of life. To understand systematically the functions of glycans as informational molecules not coded in DNA, it is necessary to construct databases that collect basic data on glycans, on the genes for proteins that participate in the synthesis or degradation of glycans, and on molecules that interact with glycans, so that useful information can be obtained from them. Database construction is proceeding internationally in the glycoscience field. Researchers can browse and retrieve data from these databases and, using the statistical approaches of bioinformatics, extract information on the classification of glycans, molecular evolution (see The Evolutionary History of Glycosyltransferase Genes), relationships with hereditary diseases, etc., and make the best use of it for a particular research project. Moreover, such an approach may lead to the prediction and discovery of novel substances, properties, or dynamic changes in the system, resulting in great advances in the glycoscience field. Some of the public databases focusing on glycans are introduced here with their characteristics and applications.

Glycan Structure Database
The Kyoto Encyclopedia of Genes and Genomes (KEGG)/GLYCAN uses a graph-theoretic approach to represent glycan structures, which enables scoring of structural similarity between glycans and searching for similar glycan structures. Its direct link with the KEGG Pathway database gives users access to information on the biosynthetic and metabolic pathways of glycans and on the participating enzymes. Using a bioinformatics approach, the repertoire of glycan structures in an organism has been predicted from the expression profiles of glycosyltransferases in the transcriptome and from glycan-related pathways (1). Moreover, CarbBank (CCSD, now discontinued), the first worldwide carbohydrate database, remains available via several databases including KEGG. When a glycan structure is entered, LIPIDBANK for Web returns information on related glycolipids, biological activities, genes, etc.
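
As a rough illustration of what a graph-theoretic treatment of glycans can look like, the following minimal Python sketch stores a glycan as a rooted tree of residues and scores two glycans by the overlap of their linkage edges. This is not the algorithm used by KEGG/GLYCAN; the Residue class, the edge_set and jaccard_similarity functions, and the choice of a Jaccard score are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Residue:
    """One monosaccharide in a glycan tree (hypothetical toy model)."""
    name: str                   # e.g. "Man", "GlcNAc"
    linkage: str = ""           # linkage to the parent, e.g. "b1-4"; empty for the root
    children: list = field(default_factory=list)


def edge_set(root):
    """Collect (parent, linkage, child) triples by walking the tree.

    A set is used, so repeated identical edges are counted once;
    this is a deliberate simplification of the toy model.
    """
    edges = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            edges.add((node.name, child.linkage, child.name))
            stack.append(child)
    return edges


def jaccard_similarity(a, b):
    """Shared edges divided by all distinct edges of the two glycans."""
    ea, eb = edge_set(a), edge_set(b)
    if not ea and not eb:
        return 1.0  # two single-residue glycans: treated as identical here
    return len(ea & eb) / len(ea | eb)


def n_glycan_core(extra_on_a13_arm=None):
    """Build the Man3GlcNAc2 core, optionally extending the a1-3 arm."""
    arm_a13 = Residue("Man", "a1-3", list(extra_on_a13_arm or []))
    arm_a16 = Residue("Man", "a1-6")
    branch_man = Residue("Man", "b1-4", [arm_a13, arm_a16])
    return Residue("GlcNAc", "", [Residue("GlcNAc", "b1-4", [branch_man])])


core = n_glycan_core()
extended = n_glycan_core([Residue("GlcNAc", "b1-2")])

print(jaccard_similarity(core, core))      # identical structures
print(jaccard_similarity(core, extended))  # 4 shared edges out of 5 distinct edges
```

A score based only on edge overlap ignores where in the tree an edge occurs; practical glycan search tools use more elaborate tree-matching schemes, so this sketch should be read as a conceptual aid only.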

Glycan-related enzymes and carbohydrate-binding proteins
The CAZy database (CAZy) describes the families of catalytic and carbohydrate-binding modules, or functional domains, of enzymes that degrade, modify, or create glycosidic bonds. The Glycogene Database (GGDB) provides comprehensive information on human glycogenes, focusing on glycosyltransferases and sugar transporters. Based on these databases, novel genes have been predicted (in silico cloning), and the common characteristics among carbohydrate-binding modules have been drawn out from bioinformatic analyses of glycosyltransferase families (2). The 3D lectin database contains information on lectins from various origins. Each of these databases is linked to external resources such as the Protein Data Bank (PDB) to offer a wide range of related information on proteins and genes.

The Consortium for Functional Glycomics is a research initiative that aims to understand the role of carbohydrate-protein interactions at the cell surface in cell-cell communication. It is divided into four categories, i.e., Central (glycan mass data, lectin-ligand interactions, mouse phenotypes, glycan profiling of tissues, glycogenes, etc.), CBP (carbohydrate-binding proteins), GT (glycosyltransferases), and Glycan (structures and biological activities), and it also provides resources such as glycan arrays, glycogene chips, glycosyltransferases, and mutant animals (3).

Besides those described above, there are many open databases useful for glycoscientists, and both their number and the mutual links among them are increasing. Research will be greatly promoted by additions to these databases and by the use of computational predictions, both of which call for cooperation between researchers in the experimental and bioinformatics fields. The incorporation of glycoinformatics into systems glycobiology, in which organisms are viewed as biomolecular networks, will open a new paradigm for understanding complex glycan functions in the cell, the organ, and the individual organism.

Figure: The understanding of the complex interaction of all levels of biological glycan information

Hiroko Takekawa and Haruko Ogawa
(Ochanomizu University, Graduate School, and the Glycoscience Institute)

References

(1) Kawano S, Hashimoto K, Miyama T, Goto S, Kanehisa M: Prediction of glycan structures from gene expression data based on glycosyltransferase reactions. Bioinformatics, 21, 3976-3982, 2005

(2) Narimatsu H: Construction of a human glycogene library and comprehensive functional analysis. Glycoconj J, 21, 17-24, 2004

(3) Comelli EM, Head SR, Gilmartin T, Whisenant T, Haslam SM, North SJ, Wong NK, Kudo T, Narimatsu H, Esko JD, Drickamer K, Dell A, Paulson JC: A Focused Microarray Approach to Functional Glycomics: Transcriptional Regulation of the Glycome. Glycobiology, 16, 117-131, 2006

Jan. 31, 2006