Feb 03, 2020

Databases for glycoconjugates
(GlyCosmos Glycoproteins and Glycolipids, GlycoProtDB, GlycoNAVI:TCarp, GlycoPOST)
(Glycoforum. 2020 Vol.23 (1), A2)
DOI: https://doi.org/10.32285/glycoforum.23A2

Issaku Yamada / Kiyohiko Angata / Yu Watanabe / Tamiko Ono

Issaku Yamada
Dr. Yamada completed his doctoral studies at the Tokyo Metropolitan University Graduate School of Engineering in 1997. After working at Nagoya University as a Japan Society for the Promotion of Science special fellow (COE) and elsewhere, he was appointed as a researcher at the Noguchi Institute in 2002 and began to study glycoscience there from 2006. Since then, he has been engaged in glycan informatics research, including glycan structure notation, ontology development, and databases.

Kiyohiko Angata
After completing his doctoral studies at the University of Tsukuba Graduate School of Life and Environmental Sciences, Dr. Angata studied glycobiology at the La Jolla Cancer Research Foundation (presently the Sanford Burnham Prebys Medical Discovery Institute) under the tutelage of Professor Minoru Fukuda. In his current position at the National Institute of Advanced Industrial Science and Technology of Japan, he is conducting research to analyze glycan gene expression and new glycan functions, to find commercial applications of glycans, and to develop a glycan-related database (ACGG-DB).

Yu Watanabe
Yu Watanabe was appointed as a technical assistant at the Niigata University Graduate School of Medical and Dental Science in 2013 and has since served as a researcher. She has been in her current position as a specially appointed junior lecturer since 2018. She is now working to develop web-based databases and analytical tools for the life sciences.

Tamiko Ono
Tamiko Ono joined the JST Integration Promotion Program as a research assistant for the Kinoshita Lab in the Soka University Faculty of Science and Engineering in September 2017 and has been engaged in developing the GlyCosmos portal and collecting relevant data.

1. Preface

In this sixth installment of the series, we describe databases and repositories for glycolipids and glycoproteins, which are collectively known as glycoconjugates. Many proteins are glycosylated and correspondingly exhibit a wide variety of functions, and the glycans that modify them span a wide variety of structures. Proteins are also known to carry different glycan structures depending on disease state and other factors. This installment introduces databases for glycoconjugates and mass spectrometry data repositories for glycans and glycoproteins.

2. Databases for glycoconjugates

2-1. GlyCosmos Glycoproteins and Glycolipids

GlyCosmos Glycoproteins is a list of glycoproteins, selected as those proteins that carry glycan modification annotations in UniProt [1] (Fig. 1). The list shows protein names, UniProt IDs, gene symbols, organisms, and the number of glycosylation sites. MCAW IDs linked to the UniProt IDs in MCAW-DB [2] are also displayed. Information on MCAW-DB is available in the “Present Status of Lectin Databases” installment of this series. A text search on Protein Name and Gene Symbol is available by clicking Search at the upper right of the list. Clicking a Protein Name in the list opens the Glycoprotein entry page (Fig. 2). The entry page offers a wide variety of content, including Glycosylation Sites, Sequence, Feature, PDB Images, Pathway, MCAW-DB (Glycan Recognition Profile) Image, and Human Protein Atlas (Table 1).

GlyCosmos Glycolipids is a list of glycolipids selected from the LIPID MAPS Structure Database (LMSD) [8] by keyword search for “glyco,” “glycan,” “sugar,” and “saccharolipids.” The desired category of glycolipids can be selected via the Lipid Classification on the GlyCosmos Glycolipid top page (Fig. 3). The categories are hierarchical; clicking a lowest-level category displays a list of glycolipids. The list shows the category, LIPID MAPS ID (LM ID), common name, systematic name, exact mass, and chemical formula of each glycolipid. Clicking on an LM ID opens the corresponding LIPID MAPS page.
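The keyword-based selection of glycolipids described above can be sketched roughly as follows. This is only an illustration and not the GlyCosmos implementation: the record fields (`lm_id`, `name`, `category`) and the sample entries are hypothetical placeholders, and case-insensitive substring matching is an assumption.

```python
# Keywords are taken from the text; the matching rule
# (case-insensitive substring over all fields) is an assumption.
KEYWORDS = ("glyco", "glycan", "sugar", "saccharolipids")


def is_glycolipid(record):
    """Return True if any keyword occurs in any text field of the record."""
    text = " ".join(str(value) for value in record.values()).lower()
    return any(keyword in text for keyword in KEYWORDS)


# Hypothetical records for illustration only (not real LMSD entries).
records = [
    {"lm_id": "EX0001", "name": "Example glycosylceramide", "category": "Sphingolipids"},
    {"lm_id": "EX0002", "name": "Example fatty acid", "category": "Fatty Acyls"},
    {"lm_id": "EX0003", "name": "Example lipid A analog", "category": "Saccharolipids"},
]

selected_ids = [r["lm_id"] for r in records if is_glycolipid(r)]
print(selected_ids)
```

The first and third placeholder records are kept because their name or category contains one of the keywords; the second is dropped.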

Figure 1. GlyCosmos Glycoproteins top page, where keyword search can be performed via the text field on the upper right.
Figure 2. GlyCosmos Glycoproteins entry page (UniProtID: O00602) displaying a wide variety of information on the selected glycoprotein.
Table 1. Glycoproteins entry page contents
Content | Description
Glycosylation Sites | Glycan modification sites and, if available, PubMed IDs for literature information are displayed.
Sequence | N-glycosylation Site and Potential Sequon information are displayed along with the sequence.
Feature | Sequence annotations, including protein domains and amino acid modifications, can be visualized. This display is powered by the ProtVista tool [3] (Fig. 4).
PDB Images | Images of three-dimensional structures from the Protein Data Bank (PDB) [4]. A link to the LiteMol Viewer for 3D visualization [5] allows molecular structures to be examined in more detail (Fig. 5). The LiteMol Viewer shows glycan structures in SNFG format.
Pathway | A list of pathways whose reactions are mediated by glycoproteins is displayed. The pathways shown here have been extracted from the Reactome database [6]. Clicking a pathway opens the GlyCosmos Pathways page, where detailed information on the pathway can be found.
MCAW-DB (Glycan Recognition Profile) Image | MCAW-DB alignment results are displayed.
Human Protein Atlas | When the target species is Homo sapiens, organ-specific localization can be viewed. This screen shows a list of organs with “High” expression levels in the Human Protein Atlas [7], together with a link to the Human Protein Atlas.
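The Sequence row of Table 1 refers to potential sequons. As a rough illustration of what such a scan involves, the sketch below finds the canonical N-glycosylation sequon Asn-X-Ser/Thr (X being any residue except Pro) in a protein sequence. How GlyCosmos actually derives its sequon annotations is not stated here, so the function name `find_sequons` and the example sequence are assumptions made for illustration.

```python
import re

# Canonical N-glycosylation sequon: Asn-X-Ser/Thr, where X is any residue except Pro.
# The lookahead lets overlapping sequons (e.g. "NNSS") be reported individually.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues that start a potential sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Made-up example sequence, not taken from any database entry.
example = "MNGTANPSLNKSQ"
positions = find_sequons(example)
print(positions)  # [2, 10]
```

The Asn at position 6 is skipped because it is followed by Pro. A sequon match only marks a potential site; whether the site is actually glycosylated has to come from experimental annotation.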
Figure 3. GlyCosmos Glycolipid top page where each column can be searched, or a cross-column search can be made using the text field on the upper right.
Figure 4. Sequence visualization using ProtVista (UniProtID: Q2N0S6)
Figure 5. Molecular structure visualization using PDB Images and LiteMol (UniProtID: Q2N0S6)

2-2. GlycoProtDB : Glycoprotein Database

In recent years, advances in proteome analysis and the compilation of large data sets have led to many reports on protein databases, including databases that cover protein modifications. However, only a few databases include glycan attachment sites or glycan structures. Examples include Unipep, for peptides with identified N-glycans; UniProtKB, which includes glycan modification site data; and GlyConnect, which includes glycan compositions. The Glycoprotein Database (GlycoProtDB; Kaji et al. 2012 [9]) has been open to the public within the framework of the Japan Consortium for Glycobiology and Glycotechnology Database (JCGGDB), providing N-linked glycosylation sites identified by mass spectrometry with a focus on glycan structures. At present, the latest version, with modified interfaces (modes of display, etc.), is available in ACGG; its features are described below.

GlycoProtDB lists data acquired from tissues, cells, sera, and other biomaterials prepared from nematode, human, and mouse. The desired tissues and cells can be selected from the column on the left side, where they are displayed in alphabetical order (Fig. 6). Data can also be searched by entering the name of the glycoprotein of interest.

Figure 6. GlycoProtDB search page
I) Data for selected nematode, human, and mouse tissues and cells can be searched. II) Data search can be achieved by entering a glycoprotein name. III) Gene symbol, IV) UniProt accession number

When a glycoprotein is selected from the search results, its N-glycosylation sites and amino acid sequence can be viewed on the detail page (Figs. 7 and 8). A major feature of GlycoProtDB is that it can reveal tissue-specific and lectin-binding-specific differences.

Figure 7. Detail page for glycoprotein data (upper panel)
Glycosylation sites of the glycoprotein are displayed separately for I) the tissues from which the samples were derived and II) the lectins used for sample preparation. Blue pins represent sites identified using the isotope-coded glycosylation site-specific tagging (IGOT) method, and red pins (encircled examples) represent sites identified using the GlycoRidge method.
Figure 8. Detail page for glycoprotein data (lower panel)
Table of correspondence of glycoprotein amino acid sequences and peptide sequences, and lectins identified by mass spectrometry.

When multiple sites carrying glycan structures recognized by the same lectin are compared among different samples, tissue-specific differences can be seen. The results of analyzing glycans without cleaving them from the glycoprotein (the GlycoRidge method) are displayed with red pins (Fig. 7). When a red pin is selected, the glycan structure (glycan composition) is shown in the viewer (Fig. 9). When the cursor is moved onto a glycan identified in the viewer, candidate glycan structures are presented. Hence, GlycoProtDB not only makes it possible to identify actual N-glycosylation sites, but also allows users to easily visualize tissue-, cell-, and serum-specific differences, including glycan structures.
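The comparison described above, in which sites captured with the same lectin are compared among samples, amounts to set operations on site positions. The sketch below is a minimal illustration under stated assumptions: the tissue names and site positions are invented, and `tissue_specific_sites` is a hypothetical helper, not part of GlycoProtDB.

```python
def tissue_specific_sites(sites_by_tissue):
    """Map each tissue to the sites found in that tissue and in no other."""
    specific = {}
    for tissue, sites in sites_by_tissue.items():
        # Union of the sites seen in every other tissue.
        others = set().union(
            *(s for t, s in sites_by_tissue.items() if t != tissue)
        )
        specific[tissue] = sites - others
    return specific


# Invented glycosylation site positions for one glycoprotein and one lectin.
sites_by_tissue = {
    "liver": {45, 102, 230},
    "kidney": {45, 230, 311},
    "brain": {45, 150},
}

specific = tissue_specific_sites(sites_by_tissue)
shared = set.intersection(*sites_by_tissue.values())
print(specific)
print(shared)
```

Here site 45 is common to all three invented samples, while 102, 311, and 150 each appear in a single tissue.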

Figure 9. A viewer page for glycan structures identified using the GlycoRidge method
Multiple glycan structures (compositions) present at the same site of the glycoprotein are identified using the GlycoRidge method and displayed. When the cursor is moved onto the chart, the glycan structure estimated from the composition can be viewed.

2-3. GlycoNAVI : TCarp

GlycoNAVI is a website constructed to support glycoscience research. This section describes TCarp, a secondary database on the site that organizes glycan-related data obtained by analyzing entries in the Protein Data Bank (PDB) [11]. The PDB holds three-dimensional structure data for glycoproteins, glycolipids, free glycans, and other entities. The list of glycan structures shown in Figure 10 can be accessed via the Glycans site on the GlycoNAVI top page. Glycan structures can be found by browsing the list page by page, or by entering an accession number from the glycan structure repository GlyTouCan [12] or a WURCS glycan structure representation [13]. When a GlyTouCan accession number is clicked, the GlyTouCan entry page appears.
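Because the search accepts either a GlyTouCan accession number or a WURCS string, a client preparing a query may want to tell the two apart first. The sketch below is one possible way to do that and is not GlycoNAVI code. It assumes that GlyTouCan accessions have the form "G", five digits, and two uppercase letters, and that WURCS 2.0 strings begin with `WURCS=2.0/`. The function `classify_query` is hypothetical, and the sample accession is a placeholder that has not been checked against GlyTouCan.

```python
import re

# Assumed accession shape: "G" + 5 digits + 2 uppercase letters.
GLYTOUCAN_ACCESSION = re.compile(r"G\d{5}[A-Z]{2}")
# Assumed prefix of WURCS 2.0 strings; only the prefix is checked, not the syntax.
WURCS_PREFIX = "WURCS=2.0/"


def classify_query(query):
    """Classify a search string as 'glytoucan', 'wurcs', or 'unknown'."""
    query = query.strip()
    if GLYTOUCAN_ACCESSION.fullmatch(query):
        return "glytoucan"
    if query.startswith(WURCS_PREFIX):
        return "wurcs"
    return "unknown"


# Illustrative inputs only.
for text in ("G12345AB", "WURCS=2.0/1,1,0/[a2122h-1b_1-5]/1/", "mannose"):
    print(text, "->", classify_query(text))
```

The prefix check does not validate the WURCS string itself; full validation would require a WURCS parser.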

Figure 10. Glycan structure repository GlyTouCan accession numbers, WURCS glycan structure notations, and SNFG-format list of glycan structures
The number of entries on one page can be changed via the upper left pulldown menu, and search can be initiated by clicking on the Search button in the upper right.

When the WURCS string for the desired glycan structure in Figure 10 is clicked, a list of entries that include the glycan structure appears (Fig. 11). This list displays three-dimensional structures, GlyTouCan accession numbers, and glycan structure SNFG representations (https://www.ncbi.nlm.nih.gov/glycans/snfg.html) [14]. As above, the number of entries per page can be changed via the upper left pulldown menu, and a search can be initiated by clicking the Search button in the upper right. When an ID in this list is clicked, the corresponding detail page is displayed.

Figure 11. List of glycan structures represented by WURCS strings

The list of entries in Figure 12 can be accessed via the Proteins site on the GlycoNAVI top page. This list displays the number of glycans contained, the PDB title, and other information. As above, the number of entries per page can be changed via the upper left pulldown menu, and a search can be initiated by clicking the Search button in the upper right. When an ID in this list is clicked, the corresponding detail page is displayed.

Figure 12. List of detail pages
PDB title, explanation, PDB ID, and conformation are displayed on each page.

The detail page shows three-dimensional representations of the glycan molecular structures drawn with 3D-SNFG [15], PDB links, PDB entry titles and explanations, experimental procedures, and analysis dates and times (Fig. 13); GlyTouCan accession numbers and links, glycan structure SNFG representations, and glycan conformations (Fig. 14); and literature references, PubMed links, and digital object identifiers (DOIs) with their links (Fig. 15). Figure 16 shows the results of a glycan-related verification of the analyzed chemical structure data. When this result is displayed, it means that the glycan structure contains a site that warrants attention.

Figure 13. Glycan and molecule conformations and PDB title and explanation on the detail page
Figure 14. Glycan structure, GlyTouCan accession number, and link to GlyTouCan on the detail page
Figure 15. Literature information on the detail page
Figure 16. Verification results for a glycan molecule on the detail page

2-4. GlycoPOST

GlycoPOST is a repository for depositing glycoprotein mass spectrometry data (Fig. 17). Users can register their own experimental data and can access and download data registered by other users. It is often used to make available the experimental data underlying a published article.

The data posted to the repository consist of meta-data, including experimental conditions, and electronic files. Meta-data can be entered in GlycoPOST by selecting the appropriate entry in the pulldown menu or providing a statement in the text box (Fig. 18). These entries comply with the guidelines for reporting glycan-related experiments proposed by MIRAGE [16]. This database is compatible with other databases and repository sites that comply with the same guidelines; data can be imported and exported via Microsoft Excel files.

The files posted include raw data and peak lists from the mass spectrometer, as well as identification results. Built on PRESTO, an independently developed JavaScript library, the GlycoPOST file upload system extends the standard functionality of the web browser so that users can upload files at higher-than-conventional speeds (Fig. 19). This allows the entire data-posting process to be carried out in the web browser alone.

Information on specific posting procedures and terminology is available at https://glycopost.glycosmos.org/help.

Figure 17. GlycoPOST top page
Figure 18. Example meta-data entry screen
Figure 19. File selection and uploading

References

  1. Bateman A, Martin MJ, O’Donovan C, et al. (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158–D169. doi: 10.1093/nar/gkw1099
  2. Hosoda, M. et al. (2018) MCAW-DB: a glycan profile database capturing the ambiguity of glycan recognition patterns. Carbohydrate research 464, 44-56. doi: 10.1016/j.carres.2018.05.003
  3. Watkins X, et al. (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics 33(13):2040-2041. doi: 10.1093/bioinformatics/btx120
  4. Kinjo AR, Bekker G-J, Wako H, et al. (2018) New tools and functions in data-out activities at Protein Data Bank Japan (PDBj). Protein Sci 27:95–102. doi: 10.1002/pro.3273
  5. Sehnal D, et al (2019) Rapidly Display Glycan Symbols in 3D Structures: 3D-SNFG in LiteMol. J Proteome Res 18(2):770-774. doi: 10.1021/acs.jproteome.8b00473
  6. Fabregat A, Jupe S, Matthews L, et al. (2018) The Reactome Pathway Knowledgebase. Nucleic Acids Res 46:D649–D655. doi: 10.1093/nar/gkx1132
  7. Uhlén M, et al. (2015) Proteomics. Tissue-based map of the human proteome. Science 347(6220):1260419. doi: 10.1126/science.1260419
  8. Sud M, et al. (2007) LMSD: LIPID MAPS structure database. Nucleic Acids Res 35(Database issue):D527-32
  9. Kaji H, Shikanai T, Sasaki-Sawa A, Wen H, Fujita M, Suzuki Y, Sugahara D, Sawaki H, Yamauchi Y, Shinkawa T, Taoka M, Takahashi N, Isobe T, Narimatsu H. (2012). Large-scale identification of N-glycosylated proteins of mouse tissues and construction of a glycoprotein database, GlycoProtDB. J Proteome Res. 11, 4553-4566.
  10. Noro E, Togayachi A, Sato T, Tomioka A, Fujita M, Sukegawa M, Suzuki N, Kaji H, Narimatsu H. Large-Scale Identification of N-Glycan Glycoproteins Carrying Lewis x and Site-Specific N-Glycan Alterations in Fut9 Knockout Mice. 2015 Sep 4;14(9):3823-34.
  11. wwPDB consortium (2019) Nucleic Acids Res. Jan 8;47(D1):D520-D528. doi: 10.1093/nar/gky949.
  12. Tiemeyer, M. et al. GlyTouCan: an accessible glycan structure repository. Glycobiology 27, 915–919 (2017). doi: 10.1093/glycob/cwx066.
  13. Matsubara, M. et al. WURCS 2.0 update to encapsulate ambiguous carbohydrate structures. Journal of chemical information and modeling 57.4 632-637 (2017). DOI: 10.1021/acs.jcim.6b00650.
  14. Neelamegham, S. et al. Updates to the Symbol Nomenclature for Glycans guidelines. (2019) Glycobiology Aug 20;29(9):620-624. doi: 10.1093/glycob/cwz045.
  15. Thieker DF et al. 3D implementation of the symbol nomenclature for graphical representation of glycans. (2016) Glycobiology. Aug;26(8):786-7. doi: 10.1093/glycob/cww076.
  16. Kolarich D, Rapp E, Struwe WB, et al. (2013) The minimum information required for a glycomics experiment (MIRAGE) project: improving the standards for reporting mass-spectrometry-based glycoanalytic data. Mol Cell Proteomics 12:991–5. doi: 10.1074/mcp.O112.026492