From Escherichia coli Microarray Raw Data to Pathways and Published Abstracts Using Taverna and Web Services

Many microarray studies have analysed data in a user-intensive manner to identify regulons, pathways and relevant literature. A two-colour cDNA microarray dataset from a time-course experiment on Escherichia coli cells during the transition from an aerobic to an anaerobic environment is used to demonstrate a data-driven methodology that identifies known pathways from a set of differentially expressed genes. These pathways are subsequently used to obtain a corpus of published abstracts (from the PubMed database) relating to each biological pathway identified. The workflow consists of three parts: Microarray Data Analysis; Pathway extraction; and PubMed abstract retrieval, each of which is implemented systematically through the use of web services and workflows. To implement this systematic pathway-driven approach, we have chosen to use the Taverna workbench http://taverna.sourceforge.net. The “Microarray Data Analysis” part provides data loading, normalisation and significance testing of microarray data, and produces a range of diagnostic plots, including histograms, box plots and principal components analysis plots, using R and Bioconductor. The “Pathway extraction” part searches for differentially expressed genes and cross-references them to the KEGG database to obtain gene and pathway descriptions http://www.myexperiment.org/workflows/10. The “PubMed abstract retrieval” part takes the pathway descriptions and searches the PubMed database to identify up to 500 abstracts related to the chosen biological pathway http://www.myexperiment.org/workflows/172. The result of this research is a re-usable methodology that directly processes raw microarray data files and can ultimately identify published abstracts (from PubMed) that are relevant to the genes and pathways found during analysis of the microarray data.
This provides biomedical researchers with an integrated system for the analysis of Affymetrix/cDNA microarray experiments, makes analyses more accessible to biologists/medics rather than solely bioinformaticians, reduces training requirements and time, improves the productivity of bioinformatics array support staff, reduces the number of errors associated with manual data analysis and improves the reproducibility of methods http://www.myexperiment.org/workflows/187.
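
The abstract retrieval step can be illustrated outside Taverna with a minimal Python sketch. It is not the published workflow: it assumes the NCBI E-utilities esearch endpoint as the search service, and the function names, the organism filter and the [Title/Abstract] field restriction are illustrative choices that do not come from the paper. The sketch only builds the request URL for one pathway description, capped at the 500 abstracts mentioned above, and performs no network call.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here; the Taverna workflow's
# own service bindings are not described in this page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Upper limit on abstracts per pathway, as stated in the abstract above.
MAX_ABSTRACTS = 500


def build_pubmed_query(pathway_description, organism="Escherichia coli"):
    """Combine a pathway description with an organism name.

    Restricting both terms to title/abstract is an illustrative choice,
    not the query strategy used by the authors.
    """
    return (
        f'"{pathway_description}"[Title/Abstract] '
        f'AND "{organism}"[Title/Abstract]'
    )


def build_esearch_url(pathway_description, retmax=MAX_ABSTRACTS):
    """Return an esearch URL requesting at most `retmax` PubMed IDs."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(pathway_description),
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical pathway description of the kind returned by KEGG.
    print(build_esearch_url("Citrate cycle (TCA cycle)"))
```

Fetching the resulting URL would return a list of PubMed identifiers, which a further request could resolve to abstract text. Keeping URL construction separate from the network call makes the query logic easy to inspect and test.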

Maleki-Dizaji, S., Holcombe, M., Rolfe, M.D., Fisher, P., Green, J., Poole, R.K., Graham, A.I. and SysMO-SUMO consortium. (2009) A systematic approach to understanding Escherichia coli responses to oxygen: from microarray raw data to pathways and published abstracts. Online Journal of Bioinformatics 10, 51-59. http://users.comcen.com.au/~journals/ecoliprintabs2009.htm

Last modified on May 5, 2009, 12:06:00 PM