Abstract
DNA microarray technology makes it possible to measure the expression levels of tens of thousands of genes simultaneously. Statistical analysis of these expression levels provides insight into the function of genes and their biological pathways, as well as information about the genomic underpinnings of many common diseases. Cluster analysis, a form of unsupervised learning, is commonly used to analyze microarray data, and several different types of cluster analysis are available. It is widely acknowledged that these different types can produce vastly inconsistent results, yet there is no known way to reconcile the inconsistencies. In this thesis, I present a novel approach to the cluster analysis of microarray data. The proposed methodology combines and distills the information generated by different types of cluster analysis and produces a representative clustering structure. Several new statistics are developed to identify dominant clusters and to assess consistency across clustering algorithms. Using real data from leukemia patients, the proposed methodology is shown to outperform the naïve choice of a single algorithm.
Download
Full Document
- Format: .htm (1.8MB) / .pdf (2.6MB)
By Section (.pdf)
- Section 1: Introduction (187KB)
- Section 2: Background on Microarray Technology (116KB)
- Section 3: Cluster Analysis of Microarray Experiments (235KB)
- Section 4: Methods (273KB)
- Section 5: Results (1.4MB)
- Section 6: Conclusions and References (141KB)
- Appendix A: Sources of Variability and Indeterminacy in the Clustering Process (793KB)
- Appendix B: Implementation (243KB)