Open Science Research Excellence
%0 Journal Article
%A Bingchun Liu and  Pei-Chann Chang and  Natasha Huang and  Dun Li
%D 2018 
%J  International Journal of Computer, Electrical, Automation, Control and Information Engineering
%B World Academy of Science, Engineering and Technology
%I International Science Index 144, 2018
%T Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine
%U http://waset.org/publications/10009891
%V 144
%X Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

%P 1092 - 1101