A novel classifier ensemble method with sparsity and diversity
We consider the classifier ensemble problem in this paper. Due to its superior performance to individual classifiers, class ensemble has been intensively studied in the literature. Generally speaking, there are two prevalent research directions on this, i.e., to diversely generate classifier components, and to sparsely combine multiple classifiers. While most current approaches are emphasized on either sparsity or diversity only, we investigate the classifier ensemble by learning both sparsity and diversity simultaneously.We manage to formulate the classifier ensemble problem with the sparsity or/and diversity
learning in a general framework. In particular, the classifier ensemble with sparsity and diversity can be represented as a mathematical optimization problem. We then propose a heuristic algorithm, capable of obtaining ensemble classifiers with consideration of both sparsity and diversity. We exploit the genetic algorithm, and optimize sparsity and diversity for classifier selection and combination heuristically and iteratively. As one major contribution, we introduce the concept of the diversity contribution ability so as to select proper classifier components and evolve classifier weights eventually. Finally, we compare our proposed novel method with other conventional classifier ensemble methods such as Bagging, least squares combination, sparsity learning, and AdaBoost, extensively on UCI benchmark data sets and the Pascal Large Scale Learning Challenge 2008 webspam data. The experimental results confirm that our approach leads to better performance in many aspects.
要点:提出了一个启发式算法,从最优化的角度同时考虑稀疏性和多样性来生成分类器。
关注点:如何度量稀疏性和多样性
Relationships between Diversity of Classification Ensembles and Single-Class Performance Measures
In class imbalance learning problems, how to better recognize examples from the minority class is the key focus, since it is usually more important and expensive than the majority class. Quite a few ensemble solutions have been proposed in the literature with varying degrees of success. It is generally believed that diversity in an ensemble could help to improve the performance of class imbalance learning. However, no study has actually investigated diversity in depth in terms of its definitions and effects in the context of class imbalance learning. It is unclear whether diversity will have a similar or different impact on the performance of minority and majority classes. In this paper, we aim to gain a deeper understanding of if and when ensemble diversity has a positive impact on the classification of imbalanced data sets. First, we explain when and why diversity measured by Q-statistic can bring improved overall accuracy based on two classification patterns proposed by Kuncheva et al. We define and give insights into good and bad patterns in imbalanced scenarios. Then, the pattern analysis is extended to single-class performance measures, including recall, precision, and Fmeasure, which are widely used in class imbalance learning. Six different situations of diversity’s impact on these measures are obtained through theoretical analysis. Finally, to further understand how diversity affects the single class performance and overall performance in class imbalance problems, we carry out extensive experimental studies on both artificial data sets and real-world benchmarks with highly skewed class distributions. We find strong correlations between diversity and discussed performance measures. Diversity shows a positive impact on the minority class in general. It is also beneficial to the overall performance in terms of AUC and G-mean.
要点:讨论了类别不平衡问题中的集成多样性应当如何度量及其影响(对多数类和少数类是否有影响有怎样的影响)
关注点:依然是度量,并且学习一下他如何证明多样性的影响
Diversity in classifier ensembles: fertile concept or dead end?
Diversity is deemed a crucial concept in the field of multiple classifier systems, although no exact definition has been found so far.Existing diversity measures exhibit some issues, both from the theoretical viewpoint, and from the practical viewpoint of ensemble construction.We propose to address some of these issues through the derivation of decompositions of classification error, analogue to the well-known biasvariance-covariance and ambiguity decompositions of regression error.We then discuss whether the resulting decompositions can provide a more clear definition of diversity, and whether they can be exploited more effectively for the practical purpose of ensemble construction.
要点:讨论了方差分解能否提供更清晰实用的多样性定义
关注点:看他阐述的目前的多样性度量所存在的问题
Diversity Regularized Ensemble Pruning
Diversity among individual classifiers is recognized to play a key role in ensemble, however, few theoretical properties are known for classification. In this paper, by focusing on the popular ensemble pruning setting (i.e., combining classifier by voting and measuring diversity in pairwise manner), we present a theoretical study on the effect of diversity on the generalization performance of voting in the PAC-learning framework. It is disclosed that the diversity is closely-related to the hypothesis space complexity, and encouraging diversity can be regarded to apply regularization on ensemble methods. Guided by this analysis, we apply explicit diversity regularization to ensemble pruning, and propose the Diversity Regularized Ensemble Pruning (DREP) method. Experimental results show the effectiveness of DREP.
要点:研究了在PAC学习框架下多样性对投票的泛化性能的影响,并在此基础上提出了多样性正则集成剪枝
关注点:还是度量