Stability of Feature Ranking Algorithms on Binary Data

Aqsa Shabbir, Kashif Javed, Haroon A Babri, Yasmin Ansari

Abstract


Stability, or robustness, is a crucial yardstick for analyzing and evaluating feature selection algorithms, which have become indispensable due to unprecedented advances in knowledge discovery and data management. The stability of a feature selection algorithm is its insensitivity to perturbations in the training data, measured against its behavior on the full training set. In this work, we propose an algorithm for evaluating and quantifying the robustness of feature ranking algorithms, and we test three such algorithms, Relief, the diff-criterion, and mutual information, on four real-life binary data sets drawn from text mining, handwriting recognition, medical diagnosis, and medicinal sciences. We then analyze the stability profiles of the feature selectors and discuss why stability is a desirable characteristic of a feature ranking algorithm. We find that the diff-criterion and mutual information outperform Relief in stability.
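
The evaluation procedure outlined above can be made concrete with a short sketch. The Python code below is illustrative rather than the paper's exact algorithm: it perturbs the training data by repeated subsampling, ranks the features on each subsample (here with a simple mutual-information scorer for binary features), and quantifies stability as the average pairwise Kendall tau between the resulting rankings. Function names such as stability_score and parameters such as sample_frac are assumptions made for illustration.

    # Illustrative sketch of a stability-evaluation procedure; names and
    # defaults (stability_score, sample_frac, n_runs) are hypothetical.
    import numpy as np
    from scipy.stats import kendalltau

    def mutual_info_ranking(X, y):
        """Rank binary features by mutual information with a binary label."""
        scores = np.empty(X.shape[1])
        for j in range(X.shape[1]):
            mi = 0.0
            for xv in (0, 1):
                for yv in (0, 1):
                    p_xy = np.mean((X[:, j] == xv) & (y == yv))
                    p_x = np.mean(X[:, j] == xv)
                    p_y = np.mean(y == yv)
                    if p_xy > 0:  # guard also ensures p_x > 0 and p_y > 0
                        mi += p_xy * np.log2(p_xy / (p_x * p_y))
            scores[j] = mi
        # Feature indices ordered from most to least informative.
        return np.argsort(-scores)

    def stability_score(X, y, ranker, n_runs=10, sample_frac=0.8, seed=0):
        """Average pairwise Kendall tau between rankings from subsamples."""
        rng = np.random.default_rng(seed)
        rankings = []
        for _ in range(n_runs):
            # Perturb the training data: sample a fraction without replacement.
            idx = rng.choice(len(y), size=int(sample_frac * len(y)),
                             replace=False)
            order = ranker(X[idx], y[idx])
            # Convert the feature ordering to a per-feature rank vector,
            # which is what the rank-correlation measure compares.
            ranks = np.empty_like(order)
            ranks[order] = np.arange(len(order))
            rankings.append(ranks)
        taus = [kendalltau(rankings[i], rankings[j])[0]
                for i in range(n_runs) for j in range(i + 1, n_runs)]
        return float(np.mean(taus))

Calling stability_score(X, y, mutual_info_ranking) on a binary data matrix X and label vector y yields a value near 1 for a highly stable ranker and a value near 0 (or below) for an unstable one; a different feature ranker such as Relief can be substituted for the ranker argument to compare stability profiles.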







