An algorithm for data-driven bandwidth selection (paper)

2003 · IEEE Transactions on Pattern Analysis and Machine Intelligence · Cited by 338

Details

Journal/Conference
IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication date
2003-02-01
Publication year
2003

Keywords

Bayesian Methods and Mixture Models; Gene expression and cancer classification; Image and Signal Denoising Methods

Abstract

The analysis of a feature space that exhibits multiscale patterns often requires kernel estimation techniques with locally adaptive bandwidths, such as the variable-bandwidth mean shift. Proper selection of the kernel bandwidth is, however, a critical step for superior space analysis and partitioning. This paper presents a mean shift-based approach for local bandwidth selection in the multimodal, multivariate case. The method is based on a fundamental property of normal distributions regarding the bias of the normalized density gradient. This paper demonstrates that, within the large sample approximation, the local covariance is estimated by the matrix that maximizes the magnitude of the normalized mean shift vector. Using this property, the paper develops a reliable algorithm which takes into account the stability of local bandwidth estimates across scales. The validity of the theoretical results is proven in various space partitioning experiments involving the variable-bandwidth mean shift.
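The property stated in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the paper's algorithm: it assumes a single Gaussian cluster, a Gaussian kernel, and a plain grid search over scalar bandwidths, and it omits the multivariate case and the cross-scale stability check. The names `normalized_mean_shift`, `bandwidths`, and `h_best` are hypothetical. Under these assumptions, the large-sample mean shift at x divided by the bandwidth h is h(mu - x) / (sigma^2 + h^2), whose magnitude peaks at h = sigma, so the bandwidth that maximizes the normalized mean shift should land near the true standard deviation.

```python
import numpy as np


def normalized_mean_shift(samples, x, h):
    """Signed 1-D mean shift at x (Gaussian kernel, bandwidth h), divided by h."""
    # Gaussian kernel weights of every sample relative to the query point x.
    w = np.exp(-0.5 * ((samples - x) / h) ** 2)
    # Mean shift: kernel-weighted sample mean minus the query point.
    mean_shift = np.sum(w * samples) / np.sum(w) - x
    return mean_shift / h


# Synthetic data (an assumption for illustration): one Gaussian cluster.
rng = np.random.default_rng(0)
mu, sigma = 0.0, 2.0
samples = rng.normal(mu, sigma, size=200_000)

# Query point away from the mode, so the mean shift is non-zero.
x = mu + sigma

# Grid search over candidate bandwidths (hypothetical range and step).
bandwidths = np.arange(0.5, 6.01, 0.1)
magnitudes = np.array(
    [abs(normalized_mean_shift(samples, x, h)) for h in bandwidths]
)
h_best = float(bandwidths[np.argmax(magnitudes)])

print(f"true sigma = {sigma}, maximizing bandwidth = {h_best:.1f}")
```

The maximum is fairly flat around h = sigma, so with finite samples the selected bandwidth only approximates the true scale; this flatness is one plausible reason a practical method would also check how stable the estimates are across scales, as the abstract mentions.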