Open Access Highly Accessed Research

Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles

Warren A Cheung1, BF Francis Ouellette2 and Wyeth W Wasserman3*

Author Affiliations

1 Bioinformatics Graduate Program, Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, University of British Columbia, 980 W. 28th Ave, Vancouver, V5Z 4H4, Canada

2 Department of Cells and Systems Biology, Ontario Institute for Cancer Research, University of Toronto, 101 College Street, Toronto, M5G 0A3, Canada

3 Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, University of British Columbia, 980 W. 28th Ave, Vancouver, V5Z 4H4, Canada

Genome Medicine 2012, 4:75  doi:10.1186/gm376

Published: 28 September 2012

Abstract

Background

MEDLINE®/PubMed® currently indexes over 18 million biomedical articles, providing unprecedented opportunities and challenges for text analysis. Medical Subject Heading Over-representation Profiles (MeSHOPs) robustly summarize an entity of interest by quantitatively identifying its associated biomedical terms, and they can be used to predict novel indirect associations.

Methods

A procedure is introduced for quantitative comparison of MeSHOPs derived from a group of MEDLINE® articles for a biomedical topic (for example, articles for a specific gene or disease). Similarity scores are computed to compare MeSHOPs of genes and diseases.
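To make the idea of comparing profiles concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the over-representation statistic or the similarity measure, so the hypergeometric tail test and the cosine similarity used here are assumptions, as are all function names and the toy counts.

```python
from math import comb, log10, sqrt


def overrep_pvalue(k, n, K, N):
    """P(X >= k) when drawing n articles from N, of which K carry the term.

    Assumption: a hypergeometric tail stands in for the over-representation
    statistic, which the abstract does not specify.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def build_profile(term_counts, n_articles, background_counts, n_background):
    """Map each MeSH term to -log10(p), so larger means more over-represented."""
    profile = {}
    for term, k in term_counts.items():
        p = overrep_pvalue(k, n_articles, background_counts[term], n_background)
        profile[term] = -log10(max(p, 1e-300))  # guard against log10(0)
    return profile


def cosine_similarity(a, b):
    """Hypothetical similarity score between two profiles (dicts of term -> score)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data, invented for illustration only.
background = {"Neoplasms": 200, "Apoptosis": 100, "Insulin": 50}
gene_counts = {"Neoplasms": 12, "Apoptosis": 8}
disease_counts = {"Neoplasms": 30, "Insulin": 1}

gene_profile = build_profile(gene_counts, 20, background, 1000)
disease_profile = build_profile(disease_counts, 40, background, 1000)
score = cosine_similarity(gene_profile, disease_profile)
```

A higher `score` would suggest that the gene and the disease share over-represented terms, which is the intuition behind ranking candidate gene-disease pairs by profile similarity.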

Results

Similarity scores successfully infer novel associations between diseases and genes. The number of papers addressing a gene or disease has a strong influence on predicted associations, revealing an important bias for gene-disease relationship prediction. Predictions derived from comparisons of MeSHOPs achieve a mean 8% AUC improvement in the identification of gene-disease relationships compared to gene-independent baseline properties.

Conclusions

MeSHOP comparisons are demonstrated to provide predictive capacity for novel relationships between genes and human diseases. We demonstrate the impact of literature bias on the performance of gene-disease prediction methods. MeSHOPs provide a rich source of annotation to facilitate relationship discovery in biomedical informatics.