© josh blumenstock
Mining Clusters for Knowledge: Finding Algorithm-Independent Groups in Microarray Data

Using DNA microarray technology, it is now possible to measure the expression levels of tens of thousands of genes. Statistical analysis of these expression levels provides insight into the function of genes and their biological pathways, as well as information about the genomic underpinnings of many common diseases. Cluster analysis is a form of unsupervised learning commonly used to analyze microarray data, and there are several different types of cluster analysis to choose from. It is widely acknowledged that the different types of cluster analysis can produce vastly inconsistent results, yet there is no known way to deal with these inconsistencies. In this thesis, I present a novel approach to the cluster analysis of microarray data. The proposed methodology combines and distills the information generated by different types of cluster analysis, and produces a representative clustering structure. Several new statistics are developed to identify dominant clusters and assess consistency across clustering algorithms. Using real data from leukemia patients, the proposed methodology is shown to outperform the naïve choice of a single algorithm.
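
The abstract does not spell out how the clusterings are combined, so the following is only an illustrative sketch of the general idea of algorithm-independent groups, not the thesis's actual method. It assumes a simple co-association approach: for every pair of genes, compute the fraction of clusterings that place them in the same cluster, then keep groups whose members are linked at or above a chosen agreement threshold. The function names (`co_association`, `stable_groups`), the threshold parameter, and the example labels are all hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Map each item pair (i, j) to the fraction of partitions that co-cluster it."""
    n = len(partitions[0])
    fractions = {}
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in partitions if labels[i] == labels[j])
        fractions[(i, j)] = together / len(partitions)
    return fractions


def stable_groups(partitions, threshold=1.0):
    """Return connected components of pairs whose agreement is >= threshold."""
    n = len(partitions[0])
    parent = list(range(n))  # union-find forest over item indices

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), fraction in co_association(partitions).items():
        if fraction >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for item in range(n):
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Made-up cluster labels for six genes from three hypothetical algorithms.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 1, 1, 1, 1],
]

print(stable_groups(partitions))                 # unanimous agreement only
print(stable_groups(partitions, threshold=0.6))  # majority agreement
```

With the default threshold of 1.0, only genes that every algorithm groups together remain in the same group; lowering the threshold merges groups that most, but not all, algorithms agree on. Cluster label values themselves are never compared across algorithms, only whether two genes share a label within one algorithm.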


    Full Document
  • Format: .htm (1.8MB) / .pdf (2.6MB)

    By Section (.pdf)
  • Section 1: Introduction (187KB)
  • Section 2: Background on Microarray Technology (116KB)
  • Section 3: Cluster Analysis of Microarray Experiments (235KB)
  • Section 4: Methods (273KB)
  • Section 5: Results (1.4MB)
  • Section 6: Conclusions and References (141KB)
  • Appendix A: Sources of Variability and Indeterminacy in the Clustering Process (793KB)
  • Appendix B: Implementation (243KB)

Creative Commons License