A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling
Datasets, or collections, are becoming important assets in their own right and can now be accepted as a primary intellectual output of research. The quality and usefulness of a dataset depend mainly on the context in which it was collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from the self-study reports prepared by 32 ABET-accredited engineering programs. Manually mapping (classifying) these data is a notoriously tedious and time-consuming process, and it requires domain experts, who are often unavailable. The paper describes the operational settings under which the collection was produced. The collection was cleansed and preprocessed, features were selected, and a preliminary exploratory data analysis was performed to illustrate the properties and usefulness of the collection. Finally, the collection was benchmarked using nine widely used supervised multi-label classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors, and Back-Propagation Multi-Label Learning). The techniques were compared using five well-known measures (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). Ensemble of Classifier Chains and Ensemble of Pruned Sets achieved encouraging performance compared with the other multi-label classification methods tested, while Classifier Chains performed worst. In summary, the benchmark, informed by the preliminary exploratory data analysis of the collection, achieved promising results, suggests new directions for research, and provides a baseline for future studies.
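The methods above are compared with multi-label evaluation measures. As a rough illustration of how such measures are commonly computed, the following minimal Python sketch scores hypothetical predicted outcome labels against hypothetical true labels. The label codes, the toy data, and all function names are assumptions made for illustration only; the definitions follow common conventions (for example, example-based Jaccard accuracy) and may differ in detail from those used in the paper's experiments. Two common Macro-F variants, averaged by label and averaged by example, are both shown.

```python
from typing import List, Set

# Hypothetical toy data: each objective is tagged with a set of outcome codes.
# The codes "a".."e" and the predictions are invented for illustration only.
LABELS = ["a", "b", "c", "d", "e"]
y_true: List[Set[str]] = [{"a", "c"}, {"b"}, {"c", "d", "e"}, {"a"}]
y_pred: List[Set[str]] = [{"a"}, {"b", "e"}, {"c", "d"}, {"a", "b"}]


def hamming_loss(truth, pred, labels):
    """Fraction of (example, label) slots that are predicted wrongly."""
    wrong = sum(len(t ^ p) for t, p in zip(truth, pred))  # symmetric difference
    return wrong / (len(truth) * len(labels))


def accuracy(truth, pred):
    """Example-based (Jaccard) accuracy: mean of |T & P| / |T | P|."""
    scores = []
    for t, p in zip(truth, pred):
        union = t | p
        # Assumed convention: an example with no true and no predicted labels scores 1.
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there are no positives at all (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_counts(truth, pred, label):
    """True positives, false positives, and false negatives for one label."""
    tp = sum(1 for t, p in zip(truth, pred) if label in t and label in p)
    fp = sum(1 for t, p in zip(truth, pred) if label not in t and label in p)
    fn = sum(1 for t, p in zip(truth, pred) if label in t and label not in p)
    return tp, fp, fn


def micro_f1(truth, pred, labels):
    """Pool TP/FP/FN over all labels, then compute a single F1."""
    tp = fp = fn = 0
    for lab in labels:
        a, b, c = label_counts(truth, pred, lab)
        tp, fp, fn = tp + a, fp + b, fn + c
    return f1(tp, fp, fn)


def macro_f1_by_label(truth, pred, labels):
    """Compute F1 per label, then average, so rare labels weigh as much as common ones."""
    return sum(f1(*label_counts(truth, pred, lab)) for lab in labels) / len(labels)


def macro_f1_by_example(truth, pred):
    """Compute F1 per example from its own label sets, then average over examples."""
    scores = [f1(len(t & p), len(p - t), len(t - p)) for t, p in zip(truth, pred)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(f"Accuracy            {accuracy(y_true, y_pred):.4f}")                      # 0.5417
    print(f"Hamming loss        {hamming_loss(y_true, y_pred, LABELS):.4f}")          # 0.2000
    print(f"Micro-F             {micro_f1(y_true, y_pred, LABELS):.4f}")              # 0.7143
    print(f"Macro-F (by label)  {macro_f1_by_label(y_true, y_pred, LABELS):.4f}")     # 0.6667
    print(f"Macro-F (by example) {macro_f1_by_example(y_true, y_pred):.4f}")          # 0.7000
```

Micro-F pools counts over all labels, so frequent labels dominate it, whereas Macro-F averaged by label gives every label equal weight; on imbalanced label sets the two can diverge noticeably, which is one reason benchmarks usually report both. Hamming Loss is the only measure here where lower is better.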