CLUSTERING GENE EXPRESSION PROFILES USING A MIXTURE MODEL ENSEMBLE AVERAGING APPROACH
Clustering is an important tool for extracting underlying gene expression patterns from massive microarray data. However, most existing clustering methods cannot automatically separate noise genes, including scattered, singleton, and mini-cluster genes, from other genes. Including noise genes in a regular clustering process can impede the identification of gene expression patterns. Model-based clustering has the potential to separate noise genes from other genes automatically, so that major gene expression patterns can be better identified. In this paper, we propose using ensemble averaging to improve the performance of single model-based clustering. We also propose a new density estimator for noise genes in Gaussian mixture modeling. Our numerical results indicate that the ensemble averaging method outperforms other clustering methods, such as the quality-based method and the single model-based clustering method, on datasets with noise genes.
Keywords: ensemble average, Gaussian mixture models, scatter genes, silhouette width.
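To make the two ideas in the abstract concrete, the following is a minimal one-dimensional sketch of a Gaussian mixture with an extra noise component, together with averaging of the noise posterior over several fitted models. It is an illustration under stated assumptions, not the estimator or procedure proposed in this paper: the noise density is assumed to be uniform on a fixed interval, the three "fitted" models are hypothetical hand-written parameter sets, and the function names and the 0.5 threshold are chosen only for this example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def noise_posterior(x, model, lo, hi):
    """Posterior probability that x came from the noise component.

    model = (noise_weight, [(weight, mean, std), ...]).
    ASSUMPTION: the noise density is uniform on [lo, hi]; this is an
    illustrative stand-in, not the density estimator proposed in the paper.
    """
    noise_weight, components = model
    noise = noise_weight / (hi - lo)
    clusters = sum(w * gaussian_pdf(x, m, s) for w, m, s in components)
    return noise / (noise + clusters)


def ensemble_noise_posterior(x, models, lo, hi):
    """Average the noise posterior over all models in the ensemble."""
    return sum(noise_posterior(x, mdl, lo, hi) for mdl in models) / len(models)


# Hypothetical ensemble: three mixture models, e.g. from different
# initializations, each with a noise weight and two Gaussian clusters.
models = [
    (0.10, [(0.45, -2.0, 0.5), (0.45, 2.0, 0.5)]),
    (0.12, [(0.44, -2.1, 0.6), (0.44, 1.9, 0.5)]),
    (0.08, [(0.46, -1.9, 0.5), (0.46, 2.1, 0.6)]),
]
LO, HI = -10.0, 10.0  # assumed support of the uniform noise density


def is_noise(x, threshold=0.5):
    """Flag x as noise when the ensemble-averaged posterior exceeds threshold."""
    return ensemble_noise_posterior(x, models, LO, HI) > threshold
```

In this sketch a value far from both cluster centers, such as `is_noise(7.0)`, is flagged as noise, while a value near a cluster center, such as `is_noise(2.0)`, is not. Averaging the noise posterior rather than the cluster posteriors is a deliberate simplification: the noise component is the same in every model, so no matching of cluster labels across models is needed.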