Open Access
A Gaussian Mixture Model to Detect Clusters Embedded in Feature Subspace
Yuanhong Li, Ming Dong, Jing Hua
Commun. Inf. Syst. 7(4): 337-352 (2007).

Abstract

The goal of unsupervised learning, i.e., clustering, is to determine the intrinsic structure of unlabeled data. Feature selection for clustering improves grouping performance by removing irrelevant features. Typical feature selection algorithms select a common feature subset for all clusters. Consequently, clusters embedded in different feature subspaces cannot be identified. In this paper, we introduce a probabilistic model based on a Gaussian mixture to solve this problem. In particular, the relevance of a feature to an individual cluster is treated as a probability, which is represented by a localized feature saliency and estimated through the Expectation Maximization (EM) algorithm during the clustering process. In addition, the number of clusters is determined simultaneously by integrating a Minimum Message Length (MML) criterion. Experiments carried out on both synthetic and real-world datasets illustrate the performance of the proposed approach in finding clusters embedded in feature subspaces.
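
The abstract does not state the model's equations, so the following is only a minimal sketch of how a per-cluster ("localized") feature saliency could be estimated with EM; it is not the authors' algorithm. It assumes diagonal Gaussians, a fixed background density per feature (the global mean and variance, used when a feature is irrelevant to a cluster), and a fixed number of clusters, and it omits the MML criterion entirely. All names (`em_step`, `fit`, `make_data`, `rho`) are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(X, pi, mu, var, rho, bg_mean, bg_var, floor=1e-6):
    """One EM iteration. rho[j][l] is the saliency of feature l for cluster j.
    Returns updated (pi, mu, var, rho) and the log-likelihood of the inputs."""
    n, d, k = len(X), len(X[0]), len(pi)
    sum_w = [0.0] * k
    sum_u = [[0.0] * d for _ in range(k)]
    sum_ux = [[0.0] * d for _ in range(k)]
    sum_uxx = [[0.0] * d for _ in range(k)]
    loglik = 0.0
    for x in X:
        # a: feature is relevant to the cluster; b: feature follows the background.
        a = [[rho[j][l] * normal_pdf(x[l], mu[j][l], var[j][l])
              for l in range(d)] for j in range(k)]
        b = [[(1.0 - rho[j][l]) * normal_pdf(x[l], bg_mean[l], bg_var[l])
              for l in range(d)] for j in range(k)]
        joint = [pi[j] * math.prod(a[j][l] + b[j][l] for l in range(d))
                 for j in range(k)]
        total = sum(joint)
        loglik += math.log(total)
        for j in range(k):
            w = joint[j] / total  # responsibility of cluster j for x
            sum_w[j] += w
            for l in range(d):
                u = w * a[j][l] / (a[j][l] + b[j][l])  # "relevant" share of w
                sum_u[j][l] += u
                sum_ux[j][l] += u * x[l]
                sum_uxx[j][l] += u * x[l] ** 2
    new_pi = [sum_w[j] / n for j in range(k)]
    new_mu = [row[:] for row in mu]
    new_var = [row[:] for row in var]
    new_rho = [row[:] for row in rho]
    for j in range(k):
        if sum_w[j] <= 1e-12:  # empty cluster: leave its parameters unchanged
            continue
        for l in range(d):
            if sum_u[j][l] > 1e-12:
                m = sum_ux[j][l] / sum_u[j][l]
                new_mu[j][l] = m
                new_var[j][l] = max(sum_uxx[j][l] / sum_u[j][l] - m * m, floor)
            # Saliency update, kept strictly inside (0, 1) for numerical safety.
            new_rho[j][l] = min(max(sum_u[j][l] / sum_w[j], floor), 1.0 - floor)
    return new_pi, new_mu, new_var, new_rho, loglik

def fit(X, k, iters=30, seed=0):
    """Run EM from a random initialization; returns weights, saliencies, log-likelihoods."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    bg_mean = [sum(x[l] for x in X) / n for l in range(d)]
    bg_var = [sum((x[l] - bg_mean[l]) ** 2 for x in X) / n for l in range(d)]
    pi = [1.0 / k] * k
    mu = [list(x) for x in rng.sample(X, k)]
    var = [bg_var[:] for _ in range(k)]
    rho = [[0.5] * d for _ in range(k)]
    history = []
    for _ in range(iters):
        pi, mu, var, rho, ll = em_step(X, pi, mu, var, rho, bg_mean, bg_var)
        history.append(ll)
    return pi, rho, history

def make_data(seed=1, n_per=60):
    """Toy data: each cluster is tight in one feature and noisy in the other."""
    rng = random.Random(seed)
    A = [[rng.gauss(4.0, 0.3), rng.gauss(0.0, 2.0)] for _ in range(n_per)]
    B = [[rng.gauss(0.0, 2.0), rng.gauss(-4.0, 0.3)] for _ in range(n_per)]
    return A + B
```

Calling `fit(make_data(), k=2)` returns the mixing weights, the per-cluster saliency table `rho`, and the log-likelihood trace, which should be non-decreasing across iterations. Keeping the background density fixed simplifies the M-step; a fuller treatment would presumably estimate it as well and prune clusters with a model-selection criterion such as MML.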

Citation

Yuanhong Li, Ming Dong, Jing Hua. "A Gaussian Mixture Model to Detect Clusters Embedded in Feature Subspace." Commun. Inf. Syst. 7 (4) 337-352, 2007.

Information

Published: 2007
First available in Project Euclid: 23 May 2008

zbMATH: 1182.62140
MathSciNet: MR2403572

Rights: Copyright © 2007 International Press of Boston
