Global gene expression profile mining in stem cells and their progeny

Ruau, David; Zenke, Martin (Thesis advisor)

Aachen : Publication server of RWTH Aachen University (2009)
Dissertation / PhD Thesis


Today’s biology relies heavily on technological advances made during the last 30 years. At the same time, the way biological questions are analyzed has changed: today we aim to understand the system under study as a whole. DNA microarray technology allows events to be measured on a genome-wide scale, reducing the need for an educated-guess approach. Global gene expression profiling is the most frequent application of DNA microarrays and is used to study different cell types in diverse experimental setups. In this work, I have (1) subjected DNA microarray data to different mining approaches to identify gene signatures in stem cell differentiation and reprogramming, and (2) developed a workflow for the semantic annotation of microarray data from public repositories.

Dendritic cells (DC) were treated with TGF-beta and subjected to global gene expression profiling. DC are derived from hematopoietic stem cells. They initiate immunity and induce antigen-specific tolerance, which makes them major candidates for cell-based therapies. Our study revealed key regulatory factors in the response of DC to TGF-beta.

Chromatin structure determines gene expression and thereby cell identity and cell fate. Accordingly, drugs that alter DNA methylation and histone acetylation modify chromatin structure, and such drugs were found to broaden the developmental potential of neural stem cells. Microarray analysis revealed that drug treatment induced pluripotency and pluripotency-associated genes, which is suggested to account for the altered potential of these cells.

Pluripotent stem cells, including embryonic stem cells (ESC), can generate all cell types present in the adult body; however, the isolation and cultivation of such cells raise ethical and technical concerns. For this reason, alternative methods have been developed to generate ES-like cells from somatic cells or adult stem cells.
The medical and research applications of such cells are highly promising. Gene expression profiling of induced pluripotent stem (iPS) cells, generated with either two (Oct4 and Klf4) or four (Oct4, Klf4, Sox2 and c-Myc) reprogramming transcription factors, revealed a genomic signature similar to that of ESC.

The gene array technology underlying such studies generates enormous amounts of data, which are usually stored in databases made available to the community. However, the experiments are described in free text, so retrieving and linking microarray experiments depends on the submitter having labeled them comprehensibly. Linking different data sources requires descriptions in a commonly agreed vocabulary, such as a biomedical ontology. An ontology organizes the knowledge of a particular domain into a defined network of relationships. In this work, I developed a workflow for the semantic annotation of public microarray repositories. Gene Expression Omnibus (GEO), the largest public repository of microarray data, was processed with this workflow and annotated using several ontologies. The method combines text mining, outlier detection and a label propagation algorithm that transfers labels from labeled to unlabeled objects in order to increase labeling coverage. The algorithm adapts label propagation to the specific biological sample type measured. Integrative bioinformatics studies that merge different data types to discover new relationships between diseases, phenotypes and gene expression profiles will benefit from standardized annotation of the experiments.
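The abstract does not specify how its label propagation works, so the following is only a minimal sketch of the general idea, not the method used in this work. It assumes a generic iterative label propagation scheme: samples are linked in a k-nearest-neighbour graph by the Pearson correlation of their expression profiles, samples already annotated (for example by text mining) are clamped as seeds, and each unlabeled sample repeatedly takes the weighted mean of its neighbours' label distributions. All sample IDs, expression values, ontology labels and the parameters `k` and `min_conf` are hypothetical. Text mining and outlier detection are not shown, and the adaptation to sample type is not modelled (one assumed option would be to build a separate graph per sample type).

```python
import math
from typing import Dict, List, Optional

Profiles = Dict[str, List[float]]
Graph = Dict[str, Dict[str, float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def build_knn_graph(profiles: Profiles, k: int) -> Graph:
    """Link each sample to its k most correlated samples, keeping only positive correlations."""
    graph: Graph = {s: {} for s in profiles}
    for s in profiles:
        sims = sorted(
            ((pearson(profiles[s], profiles[t]), t) for t in profiles if t != s),
            reverse=True,
        )
        for sim, t in sims[:k]:
            if sim > 0:
                graph[s][t] = sim
                graph[t][s] = sim  # keep the graph symmetric
    return graph


def propagate(graph: Graph, seeds: Dict[str, str], n_iter: int = 100, tol: float = 1e-6):
    """Generic label propagation: unlabeled samples take the weighted mean of their
    neighbours' label distributions; seed (already annotated) samples stay clamped."""
    labels = sorted(set(seeds.values()))
    dist = {s: {l: float(seeds.get(s) == l) for l in labels} for s in graph}
    for _ in range(n_iter):
        delta = 0.0
        new = {}
        for s, nbrs in graph.items():
            total = sum(nbrs.values())
            if s in seeds or total == 0:
                new[s] = dist[s]  # clamp seeds; isolated samples receive nothing
                continue
            new[s] = {l: sum(w * dist[t][l] for t, w in nbrs.items()) / total for l in labels}
            delta = max(delta, max(abs(new[s][l] - dist[s][l]) for l in labels))
        dist = new
        if delta < tol:  # stop once no distribution changes noticeably
            break
    return dist


def assign_labels(dist, seeds: Dict[str, str], min_conf: float = 0.6) -> Dict[str, Optional[str]]:
    """Keep seed labels; give other samples their top label only if it is confident enough."""
    result: Dict[str, Optional[str]] = {}
    for s, d in dist.items():
        mass = sum(d.values())
        if s in seeds:
            result[s] = seeds[s]
        elif mass == 0:
            result[s] = None
        else:
            label, score = max(d.items(), key=lambda kv: kv[1])
            result[s] = label if score / mass >= min_conf else None
    return result


# Hypothetical toy data: four-gene expression profiles for seven samples.
profiles = {
    "s1": [9.0, 8.5, 1.0, 1.2], "s2": [8.8, 8.9, 1.1, 0.9], "s3": [9.1, 8.2, 1.3, 1.0],
    "s4": [1.0, 1.1, 9.0, 8.7], "s5": [1.2, 0.8, 8.6, 9.2], "s6": [0.9, 1.3, 8.9, 8.8],
    "s7": [5.0, 5.0, 5.0, 5.0],
}
# Hypothetical labels, e.g. as text mining might extract them from free-text descriptions.
seeds = {"s1": "liver", "s2": "liver", "s4": "blood", "s5": "blood"}

graph = build_knn_graph(profiles, k=2)
labels = assign_labels(propagate(graph, seeds), seeds)
print(labels)
# {'s1': 'liver', 's2': 'liver', 's3': 'liver', 's4': 'blood', 's5': 'blood', 's6': 'blood', 's7': None}
```

Clamping the seed samples keeps the text-mined annotations fixed, while the `min_conf` cut-off leaves weakly supported or isolated samples (here the constant-profile `s7`) unlabeled rather than forcing a guess.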