A Fuzzy-Rough Feature Selection Based on Binary Shuffled Frog Leaping Algorithm
Feature selection and attribute reduction are crucial
problems and widely used techniques in the fields of machine
learning, data mining and pattern recognition for overcoming the
well-known phenomenon of the Curse of Dimensionality. This paper
presents a feature selection method that efficiently carries out attribute
reduction, thereby selecting the most informative features of a dataset.
It consists of two components: 1) a measure for feature subset
evaluation, and 2) a search strategy. For the evaluation measure,
we have employed the fuzzy-rough dependency degree (FRDD)
of the lower approximation-based fuzzy-rough feature selection
(L-FRFS) due to its effectiveness in feature selection. As for the
search strategy, a modified binary shuffled frog leaping
algorithm (B-SFLA) is proposed. The proposed feature selection
method is obtained by hybridizing the B-SFLA with the FRDD. Nine
classifiers have been employed to compare the proposed approach
with several existing methods over twenty-two datasets, including
nine high dimensional and large ones, from the UCI repository.
The experimental results demonstrate that the B-SFLA approach
significantly outperforms other metaheuristic methods in terms of the
number of selected features and the classification accuracy.
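The search loop described above can be sketched in miniature. The following is an illustration only, not the authors' implementation: the fitness function is a toy stand-in for the fuzzy-rough dependency degree, and the memeplex count, leap rule and iteration budget are illustrative assumptions.

```python
import random

def fitness(subset, weights=(0.9, 0.7, 0.4, 0.2, 0.1)):
    # Toy stand-in for the fuzzy-rough dependency degree (FRDD):
    # reward hypothetically informative features, penalise subset size.
    gain = sum(w for bit, w in zip(subset, weights) if bit)
    return gain - 0.05 * sum(subset)

def leap(worst, best):
    # Binary leap rule: the worst frog copies each differing bit
    # from the best frog with probability 0.5.
    return tuple(b if (w != b and random.random() < 0.5) else w
                 for w, b in zip(worst, best))

def b_sfla(n_bits=5, n_frogs=12, n_memeplexes=3, iters=40, seed=1):
    random.seed(seed)
    frogs = [tuple(random.randint(0, 1) for _ in range(n_bits))
             for _ in range(n_frogs)]
    for _ in range(iters):
        # Rank frogs by fitness and deal them into memeplexes.
        frogs.sort(key=fitness, reverse=True)
        for m in range(n_memeplexes):
            plex = frogs[m::n_memeplexes]
            new = leap(plex[-1], plex[0])          # improve the worst frog
            if fitness(new) > fitness(plex[-1]):
                frogs[frogs.index(plex[-1])] = new
    return max(frogs, key=fitness)
```

Each "frog" is a binary feature-subset mask; in the actual method the fitness would be the FRDD computed from the dataset rather than this fixed toy function.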
Application of Granular Computing Paradigm in Knowledge Induction
This paper illustrates an application of the granular computing approach, namely rough set theory, in data mining. The paper outlines the formalism of granular computing and elucidates the mathematical underpinnings of rough set theory, which has been widely used by the data mining and machine learning communities. A real-world application is illustrated, and the classification performance is compared with that of other contending machine learning algorithms. The predictive performance of the rough set rule induction model shows comparative success with respect to the other contending algorithms.
Generalized Rough Sets Applied to Graphs Related to Urban Problems
A branch of modern mathematics, graph theory provides instruments for optimization and for solving practical applications in various fields such as economic networks, engineering, network optimization, the geometry of social action and, more generally, complex systems, including contemporary urban problems (path or transport efficiency, biourbanism, etc.). This paper studies the interconnection of urban networks, which leads to the problem of simulating one digraph by another. The simulation may be univocal or, more generally, multivocal. The concepts of fragment and atom are very useful in the study of connectivity in the simulating digraph, including an alternative evaluation of k-connectivity. The rough set approach to (bi)digraphs, proposed here for the first time, significantly improves the evaluation of k-connectivity. This approach is based on generalized rough sets, whose basic facts are presented in this paper.
Properties and Approximation Distribution Reductions in Multigranulation Rough Set Model
Some properties of approximation sets in the optimistic multi-granulation rough set model are studied using maximal compatible classes. The relationships among lower and upper approximations under single and multiple granulations are compared and discussed. By designing Boolean functions and discernibility matrices in incomplete information systems, the lower and upper approximation sets and the reduction in multi-granulation environments can be found. The correctness of the computation approach is confirmed by examples. The conclusions obtained are suitable for further investigation of the multiple granulation rough set model (RSM).
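The discernibility matrix mentioned above can be illustrated on a toy, complete decision table. This is a sketch of the classical single-granulation construction, not the paper's incomplete multi-granulation version; the table and attribute names are hypothetical.

```python
from itertools import combinations

# Toy decision table: each row is an object; a1, a2 are condition
# attributes and the last column is the decision.
table = [
    ("low",  "yes", "N"),
    ("low",  "no",  "N"),
    ("high", "yes", "P"),
    ("high", "no",  "P"),
]
attrs = ["a1", "a2"]

def discernibility_matrix(table, attrs):
    # Entry (i, j): the attributes that discern two objects with
    # different decisions.
    m = {}
    for i, j in combinations(range(len(table)), 2):
        if table[i][-1] != table[j][-1]:
            m[(i, j)] = {a for k, a in enumerate(attrs)
                         if table[i][k] != table[j][k]}
    return m

def is_sufficient(subset, matrix):
    # A subset preserves discernibility iff it hits every matrix entry
    # (this is the Boolean-function condition in conjunctive form).
    return all(subset & entry for entry in matrix.values())
```

Here a1 alone hits every entry while a2 alone does not, so {a1} is the sole reduct of this toy table.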
Invariant Characters of Tolerance Class and Reduction under Homomorphism in IIS
Some invariant properties of homomorphisms of incomplete information systems are studied in this paper. Conditions under which tolerance classes, attribute reductions, indispensable attributes and dispensable attributes remain invariant under a homomorphism of an incomplete information system are revealed and discussed. The condition for the existence of an endohomomorphism on an incomplete information system is also explored. This establishes theoretical foundations for further investigations of incomplete information systems in rough set theory, as has been done for complete information systems.
Studies on Properties of Knowledge Dependency and Reduction Algorithm in Tolerance Rough Set Model
The relations among tolerance classes, indispensable attributes and knowledge dependency in the rough set model with a tolerance relation are explored. After giving definitions of knowledge dependency and the knowledge dependency degree for incomplete information systems in the tolerance rough set model, distinguishing whether or not the decision attribute contains missing values, it is proved that complete knowledge dependency maintains reflexivity, transitivity, the augmentation law, the decomposition law and the merge law. Knowledge dependency degrees (other than complete knowledge dependency degrees) satisfy only some of these laws under transitivity, augmentation and decomposition operations. An algorithm for attribute reduction in an incomplete decision table is designed, and its correctness is checked by an example.
An Improved Limited Tolerance Rough Set Model
Some extended rough set models for incomplete information systems cannot distinguish two objects that have few known attributes and many unknown ones; others cannot make a flexible and accurate discrimination. To solve this problem, this paper proposes an improved limited tolerance rough set model that uses two thresholds to control whether two objects are related under the limited tolerance relation and to classify objects. Our practical case study shows that the model yields fine-grained and reasonable decision results.
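A two-threshold relation of the kind described can be sketched as follows. The concrete rule and the threshold values `alpha` and `beta` are illustrative assumptions, not the paper's exact definition.

```python
# '*' marks a missing value, as usual in incomplete information systems.
def related(x, y, alpha=0.5, beta=0.3):
    # Hypothetical two-threshold rule: among attributes known for BOTH
    # objects, at least a fraction alpha must agree, and at least a
    # fraction beta of all attributes must be known for both objects.
    both = [(a, b) for a, b in zip(x, y) if a != "*" and b != "*"]
    if not both or len(both) < beta * len(x):
        return False
    agree = sum(a == b for a, b in both)
    return agree / len(both) >= alpha

def tolerance_class(obj, universe, **kw):
    # All objects of the universe related to obj under the thresholds.
    return [y for y in universe if related(obj, y, **kw)]
```

The second threshold is what lets such a model separate an object with many missing values from one that genuinely agrees on many known attributes.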
Cost Sensitive Feature Selection in Decision-Theoretic Rough Set Models for Customer Churn Prediction: The Case of Telecommunication Sector Customers
In recent years, the telecommunications sector has been changing and developing continuously in the global market. In this sector, churn analysis techniques are commonly used to analyse why some customers terminate their service subscriptions prematurely. Customer churn is of utmost significance in this sector, since it causes important business losses. Many companies conduct research aimed at preventing such losses while increasing customer loyalty. Although a large quantity of accumulated data is available in this sector, its usefulness is limited by data quality and relevance. In this paper, a cost-sensitive feature selection framework is developed to obtain feature reducts for predicting customer churn. The framework is a cost-based, optional pre-processing stage that removes redundant features for churn management. This cost-based feature selection algorithm has been applied in a telecommunications company in Turkey, and the results obtained with it are reported.
Fuzzy Population-Based Meta-Heuristic Approaches for Attribute Reduction in Rough Set Theory
One of the global combinatorial optimization
problems in machine learning is feature selection. It is concerned
with removing irrelevant, noisy, and redundant data while
keeping the original meaning of the data. Attribute reduction
in rough set theory is an important feature selection method. Since
attribute reduction is an NP-hard problem, it is necessary to
investigate fast and effective approximate algorithms. In this paper,
we propose two feature selection mechanisms based on memetic
algorithms (MAs) which combine the genetic algorithm with a fuzzy
record to record travel algorithm and a fuzzy controlled great deluge
algorithm, to identify a good balance between local search and
genetic search. In order to verify the proposed approaches, numerical
experiments are carried out on thirteen datasets. The results show that
the MAs approaches are efficient in solving attribute reduction
problems when compared with other meta-heuristic approaches.
An Improved Variable Tolerance RSM with a Proportion Threshold
In rough set models, the tolerance relation, similarity relation and limited tolerance relation address different situations in incomplete information systems, in which missing values occur. If two objects have the same few known attributes and many unknown attributes, these relations cannot distinguish them well. To solve this problem, we previously presented two improved limited and variable precision rough set models, one symmetric and the other non-symmetric. Both use a more stringent condition to separate two low-probability equivalent objects into different classes, but they require further detailed study. In the present paper, we form object classes from a different perspective than in the first suggested model, and we overcome the non-symmetry disadvantages of the second suggested model. We discuss the relationships among several models and also perform rule generation. The results obtained by applying the second model are more accurate and reasonable.
Pruning Algorithm for the Minimum Rule Reduct Generation
In this paper we consider the rule reduct generation problem. The Rule Reduct Generation (RG) and Modified Rule Generation (MRG) algorithms, which are used to solve this problem, are well known. As an alternative to these algorithms, we develop the Pruning Rule Generation (PRG) algorithm and compare it with RG and MRG.
Data Mining to Capture User-Experience: A Case Study in Notebook Product Appearance Design
In the era of a rapidly growing notebook market, consumer electronics manufacturers face a highly dynamic and competitive environment. In particular, product appearance is the first element by which users distinguish a product from those of other brands. A notebook product should differ in its appearance to engage users and contribute to the user experience (UX). UX evaluation compares various product concepts to find the design that meets user needs and, in addition, helps the designer further understand the product appearance preferences of different market segments. However, few studies have explored the relationship between consumer background and reactions to product appearance. This study proposes a data mining framework to capture user information and the important relations among product appearance factors. The proposed framework consists of problem definition and structuring, data preparation, rule generation, and results evaluation and interpretation. An empirical study was conducted in Taiwan, recruiting 168 subjects from different backgrounds to experience the appearance of 11 different portable computers. The results assist designers in developing product strategies based on the characteristics of consumers and the product concepts related to the UX, which helps launch products to the right customers and increase market share. The results demonstrate the practical feasibility of the proposed framework.
Some Properties of IF Rough Relational Algebraic Operators in Medical Databases
Some properties of Intuitionistic Fuzzy (IF) rough relational algebraic operators under an IF rough relational data model are investigated and illustrated using diabetes and heart disease databases. These properties are important and desirable for processing queries in an effective and efficient manner.
Rough Neural Networks in Adapting Cellular Automata Rule for Reducing Image Noise
The reduction or removal of noise in a color image is an essential part of image processing, whether the final information is used for human perception or for automatic inspection and analysis. This paper describes a modeling system based on a rough neural network model for adapting cellular automata to various image processing tasks, including noise removal. We consider the problem of object processing in color images, using rough neural networks to help derive the rules that are then used by cellular automata on noisy images. The proposed method is compared with some classical and recent methods. The results demonstrate that the new model can be trained to perform many different tasks, and that the quality of its results is comparable to or better than that of established specialized algorithms.
Dynamic Features Selection for Heart Disease Classification
The healthcare environment is generally perceived as being information rich yet knowledge poor, as there is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered by applying data mining techniques to healthcare systems. In this study, a proficient methodology is presented for extracting significant patterns from coronary heart disease warehouses for heart attack prediction, which unfortunately continues to be a leading cause of mortality worldwide. For this purpose, we propose to enumerate dynamically the optimal subsets of reduced features of high interest by using a rough set technique associated with dynamic programming. We then validate the classification using a Random Forest (RF) decision tree to identify risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions according to patients' medical profiles. Moreover, experts' knowledge in this field has been taken into consideration in order to define the disease and its risk factors and to establish significant knowledge relationships among the medical factors. A computer-aided system has been developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated against a set of benchmark techniques applied to this classification problem.
A New Hybrid K-Mean-Quick Reduct Algorithm for Gene Selection
Feature selection is a process for selecting the most informative features and is one of the important steps in knowledge discovery. The problem is that not all genes are important in gene expression data: some genes may be redundant, and others may be irrelevant and noisy. Here, a novel Hybrid K-Means-Quick Reduct (KMQR) algorithm is proposed for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying the K-Means algorithm, so that each cluster contains similar genes. Highly class-discriminative genes are then selected from each cluster, based on their degree of dependency, by applying the Quick Reduct algorithm. The Average Correlation Value (ACV) is calculated for these genes, and clusters with an ACV of 1 are determined to be significant clusters, whose classification accuracy is equal to or higher than that of the entire dataset. The proposed algorithm is evaluated and compared using WEKA classifiers, and achieves high classification accuracy.
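The dependency-based selection at the core of Quick Reduct can be sketched on a toy discrete table. This is a minimal sketch of the classical QuickReduct greedy loop; the K-Means and ACV stages of the paper's pipeline are omitted, and the data is hypothetical.

```python
from collections import defaultdict

def partition(rows, cols):
    # Group object indices by their values on the given attribute columns.
    blocks = defaultdict(list)
    for i, r in enumerate(rows):
        blocks[tuple(r[c] for c in cols)].append(i)
    return list(blocks.values())

def dependency(rows, decisions, cols):
    # gamma(B): fraction of objects whose indiscernibility class under B
    # is consistent with the decision (the positive region).
    pos = sum(len(b) for b in partition(rows, cols)
              if len({decisions[i] for i in b}) == 1)
    return pos / len(rows)

def quick_reduct(rows, decisions, attrs):
    # Greedy forward selection: repeatedly add the attribute that raises
    # the dependency degree most, until it matches the full attribute set.
    reduct, full = [], dependency(rows, decisions, attrs)
    while dependency(rows, decisions, reduct) < full:
        best = max((a for a in attrs if a not in reduct),
                   key=lambda a: dependency(rows, decisions, reduct + [a]))
        reduct.append(best)
    return reduct
```

On a table where column 0 alone already determines the decision, the loop stops after one step.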
Some Solid Transportation Models with Crisp and Rough Costs
In this paper, some practical solid transportation models are formulated with crisp and rough unit transportation costs, considering the per-trip capacity of each type of conveyance. This is applicable to systems in which full vehicles, e.g. trucks or rail coaches, are booked for the transportation of products, so that the transportation cost is determined per full conveyance. The models with unit transportation costs as rough variables are transformed into deterministic forms using rough chance-constrained programming with the help of the trust measure. Numerical examples illustrate the proposed models in the crisp environment as well as with unit transportation costs as rough variables.
Analyzing Periurban Fringe with Rough Set
The distinction among urban, periurban and rural areas is a classical example of uncertainty in land classification. Satellite images, geostatistical analysis and all kinds of spatial data are very useful in urban sprawl studies, but it is important to define precise rules for combining large amounts of data to build complex knowledge about a territory. Rough set theory may be a useful method in this field: it represents a different mathematical approach to uncertainty, capturing indiscernibility, whereby two different phenomena can be indiscernible in some contexts and classified in the same way when the available information about them is combined. This approach has been applied in a case study comparing the results achieved with both the Map Algebra technique and spatial rough sets. The case study area, Potenza Province, is particularly suitable for the application of this theory because it includes 100 municipalities with differing numbers of inhabitants and morphologic features.
The Lower and Upper Approximations in a Group
In this paper, we generalize some propositions in [C.Z. Wang, D.G. Chen, A short note on some properties of rough groups, Comput. Math. Appl. 59 (2010) 431-436] and give some equivalent conditions for rough subgroups. The notion of minimal upper rough subgroups is introduced and an equivalent characterization is given, which implies a rough version of Lagrange's Theorem.
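The lower and upper approximations of a subset of a group can be made concrete in the cyclic group Z6, where the cosets of a subgroup partition the group and play the role of equivalence classes. This is a small illustration of the general setting only, not of the paper's propositions.

```python
def cosets(n, H):
    # Cosets of a subgroup H in the cyclic group Z_n; they partition Z_n.
    seen, parts = set(), []
    for g in range(n):
        c = frozenset((g + h) % n for h in H)
        if c not in seen:
            seen.add(c)
            parts.append(c)
    return parts

def lower(parts, X):
    # Union of blocks entirely contained in X.
    inside = [p for p in parts if p <= X]
    return set().union(*inside) if inside else set()

def upper(parts, X):
    # Union of blocks meeting X.
    touching = [p for p in parts if p & X]
    return set().union(*touching) if touching else set()
```

For H = {0, 3} in Z6 the cosets are {0,3}, {1,4}, {2,5}; taking X = {0, 1, 3} gives lower approximation {0, 3} and upper approximation {0, 1, 3, 4}.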
Reasoning With Non-Binary Logics
Students in higher education are presented with new terms and concepts in nearly every lecture they attend. Many of them prefer Web-based self-tests for evaluating their understanding of these concepts, since such tests can be used independently of tutors' working hours, avoiding the necessity of being in a particular place at a particular time. There are a large number of multiple-choice tests in almost every subject designed to contribute to higher-level learning or to discover misconceptions. Every single test provides immediate feedback to a student about its outcome, and in some cases a supporting system displays an overall score when a test is taken several times. What is still missing is a way to deliver personalized feedback to users while taking their progress into consideration. The present work is motivated by the wish to throw some light on that question.
Rough Set Based Intelligent Welding Quality Classification
The knowledge base of welding defect recognition is essentially incomplete. This characteristic means that the recognition results do not reflect the actual situation, which further influences the classification of welding quality. This paper studies a rough set based method to reduce this influence and improve classification accuracy. First, a rough set model of intelligent welding quality classification is built, and both condition and decision attributes are specified. Then, groups of representative multiple compound defects are chosen from the defect library and classified correctly to form the decision table. Finally, the redundant information of the decision table is reduced and the optimal decision rules are obtained. With this method, we are able to reclassify misclassified defects to the right quality level. Compared with ordinary methods, this method has higher accuracy and better robustness.
Agent Decision using Granular Computing in Traffic System
In recent years, multi-agent systems have emerged as one of the interesting architectures facilitating distributed collaboration and distributed problem solving. Each node (agent) of the network might pursue its own agenda, exploit its environment, develop its own problem-solving strategy and establish the required communication strategies. Within each node of the network, one can encounter a diversity of problem-solving approaches. Quite commonly, the agents realize their processing at the level of information granules that is most suitable from their local points of view. Information granules can come at various levels of granularity, and each agent may exploit a certain formalism of information granulation, engaging a machinery of fuzzy sets, interval analysis or rough sets, to name a few dominant technologies of granular computing. With this in mind, a fundamental issue arises of forming effective interaction linkages between the agents so that they fully broadcast their findings and benefit from interacting with others.
Covering-based Rough sets Based on the Refinement of Covering-element
Covering-based rough sets are an extension of rough sets based on a covering, instead of a partition, of the universe, and are therefore more powerful than rough sets in describing some practical problems. However, this extension can increase the roughness of each model in recognizing objects, so obtaining better approximations from covering-based rough set models is an important issue. In this paper, two concepts, determinate elements and indeterminate elements of a universe, are proposed and given precise definitions. This research makes a reasonable refinement of the covering elements from a new viewpoint, and the refinement may generate better approximations in covering-based rough set models. To validate the theory, it is applied to eight major covering-based rough set models adapted from the literature. In all these models the lower approximation increases effectively, and correspondingly the upper approximation decreases, with the exception of two models in some special situations. The roughness in recognizing objects is therefore reduced. This research provides a new approach to the study and application of covering-based rough sets.
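One common pair of covering approximation operators makes the roughness issue concrete: with overlapping covering elements, the lower and upper approximations of a set can be computed as below. This is a minimal sketch of one pair of operators only; the eight models mentioned in the paper differ precisely in such definitions.

```python
def cover_lower(covering, X):
    # Union of covering elements fully contained in X.
    inside = [C for C in covering if C <= X]
    return set().union(*inside) if inside else set()

def cover_upper(covering, X):
    # Union of covering elements that intersect X.
    touching = [C for C in covering if C & X]
    return set().union(*touching) if touching else set()
```

Because the elements overlap, the gap between the two approximations (the boundary) can be wide; refining the covering elements, as the paper proposes, narrows it.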
A Rough-set Based Approach to Design an Expert System for Personnel Selection
Effective employee selection is a critical component
of a successful organization. Many important criteria for personnel
selection such as decision-making ability, adaptability, ambition, and
self-organization are naturally vague and imprecise to evaluate. Rough set theory (RST), as a mathematical approach to vagueness and uncertainty, is a well-suited tool for dealing with qualitative data and various decision problems. This paper provides
conceptual, descriptive, and simulation results, concentrating chiefly
on human resources and personnel selection factors. The current
research derives certain decision rules which are able to facilitate
personnel selection and identifies several significant features based
on an empirical study conducted in an IT company in Iran.
Heterogeneous Attribute Reduction in Noisy System based on a Generalized Neighborhood Rough Sets Model
Neighborhood rough sets (NRS) have proven to be an efficient tool for heterogeneous attribute reduction. However, most research has focused on complete and noiseless data, whereas in fact most information systems are noisy, i.e., filled with incomplete and inconsistent data. In this paper, we introduce a generalized neighborhood rough set model, called VPTNRS, to deal with heterogeneous attribute reduction in noisy systems. We generalize the classical NRS model with a tolerance neighborhood relation and probabilistic theory. Furthermore, we use the neighborhood dependency to evaluate the significance of a subset of heterogeneous attributes and construct a forward greedy algorithm for attribute reduction based on it. Experimental results show that the model deals efficiently with noisy systems.
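The neighborhood dependency used as the evaluation measure can be sketched on a tiny numeric dataset. This is an illustration of the classical NRS dependency only, not of VPTNRS; the Chebyshev distance and the delta value are illustrative assumptions.

```python
def neighborhood(i, data, delta):
    # Indices of objects within Chebyshev distance delta of object i.
    return [j for j, x in enumerate(data)
            if max(abs(a - b) for a, b in zip(data[i], x)) <= delta]

def nrs_dependency(data, labels, delta):
    # Fraction of objects whose delta-neighbourhood is pure in label
    # (the neighbourhood positive region).
    pure = sum(1 for i in range(len(data))
               if len({labels[j] for j in neighborhood(i, data, delta)}) == 1)
    return pure / len(data)
```

A small delta yields pure neighborhoods and full dependency on this data, while a large delta mixes the classes and lowers it; the probabilistic generalization in the paper relaxes the purity requirement to tolerate noise.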
Relation between Significance of Attribute Set and Single Attribute
In the research field of rough sets, few papers concern the significance of attribute sets. However, there is an important relation between the significance of a single attribute and that of an attribute set, which should not be ignored. In this paper, we draw the following conclusions through case analysis: (1) an attribute set that includes single attributes with high significance is certainly significant, while (2) an attribute set consisting of single attributes with low significance may nevertheless have high significance. We validate the conclusions on the discernibility matrix, and the results demonstrate the contribution of our conclusions.
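Conclusion (2) has a classic concrete witness: with an XOR-style decision, each single attribute has zero dependency degree (one common significance measure), yet the pair determines the decision completely. A small sketch, assuming the dependency degree as the significance measure:

```python
from collections import defaultdict

def dependency(rows, decisions, cols):
    # gamma(B): fraction of objects whose value tuple on the columns B
    # determines the decision uniquely.
    blocks = defaultdict(list)
    for i, r in enumerate(rows):
        blocks[tuple(r[c] for c in cols)].append(i)
    pos = sum(len(b) for b in blocks.values()
              if len({decisions[i] for i in b}) == 1)
    return pos / len(rows)

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
dec = [0, 1, 1, 0]  # decision = XOR of the two attributes
```

Each attribute alone yields dependency 0, while the set {0, 1} yields dependency 1: two individually insignificant attributes form a maximally significant set.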
A Medical Images Based Retrieval System using Soft Computing Techniques
Content-Based Image Retrieval (CBIR) has been one of the most vivid research areas in the field of computer vision over the last 10 years. Many programs and tools have been developed to formulate and execute queries based on visual or audio content and to help browse large multimedia repositories. Still, no general breakthrough has been achieved with respect to large, varied databases containing documents of differing sorts and with varying characteristics, and many questions regarding speed, semantic descriptors and objective image interpretation remain unanswered. In the medical field, images, and especially digital images, are produced in ever increasing quantities and used for diagnostics and therapy. Several articles have proposed content-based access to medical images for supporting clinical decision making, which would ease the management of clinical data, and scenarios have been created for integrating content-based access methods into Picture Archiving and Communication Systems (PACS). This paper gives an overview of soft computing techniques and defines new research directions that may prove useful. Still, very few such systems seem to be used in clinical practice. It should also be stated that the goal is not, in general, to replace existing text-based retrieval methods.
Soft Computing based Retrieval System for Medical Applications
With increasing data in medical databases, medical data retrieval is growing in popularity. Some of this analysis includes inducing propositional rules from databases using various soft computing techniques and then using these rules in an expert system. Diagnostic rules and information on features are extracted from clinical databases on diseases of congenital anomaly. This paper explains the latest soft computing techniques and some of the adaptive techniques, which encompass an extensive group of methods that have been applied in the medical domain for the discovery of data dependencies, the importance of features, patterns in sample data, and feature space dimensionality reduction. These approaches pave the way for new and interesting avenues of research in medical imaging and represent an important challenge for researchers.
Some Separations in Covering Approximation Spaces
Adopting Zakowski's upper and lower approximation operators, this paper investigates granularity-wise separations in covering approximation spaces. Some characterizations of granularity-wise separations are obtained by means of Pawlak rough sets, and some relations among granularity-wise separations are established, which makes it possible to study covering approximation spaces by logical and mathematical methods in computer science. The results of this paper give further applications of Pawlak rough set theory in pattern recognition.
An Efficient and Generic Hybrid Framework for High Dimensional Data Clustering
Clustering in high dimensional space is a difficult problem recurrent in many fields of science and engineering, e.g., bioinformatics, image processing, pattern recognition and data mining. In high dimensional space some of the dimensions are likely to be irrelevant, thus hiding the possible clustering, and in very high dimensions it is common for all the objects in a dataset to be nearly equidistant from each other, completely masking the clusters. Hence, the performance of the clustering algorithm degrades. In this paper, we propose an algorithmic framework which combines the reduct concept of rough set theory with the k-means algorithm to remove the irrelevant dimensions of a high dimensional space and obtain appropriate clusters. Our experiments on test data show that this framework increases the efficiency of the clustering process and the accuracy of the results.
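The second stage of such a pipeline can be sketched as follows, assuming the reduct stage has already flagged dimension 0 as relevant. The data, the projection and the tiny k-means below are illustrative, not the paper's implementation.

```python
import random

def project(points, dims):
    # Keep only the dimensions retained by the reduct.
    return [tuple(p[d] for d in dims) for p in points]

def kmeans(points, k, iters=20, seed=0):
    # A tiny Lloyd-style k-means: assign to nearest centre, recompute means.
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        centers = [tuple(sum(v) / len(v) for v in zip(*cl)) if cl else centers[ci]
                   for ci, cl in enumerate(clusters)]
    return centers

# Dimension 0 separates two groups; dimension 1 is pure noise that would
# mask them under a full-dimensional distance.
data = [(0.0, 5.0), (0.1, -5.0), (5.0, 5.1), (5.1, -4.9)]
reduced = project(data, [0])   # assume the reduct kept dimension 0
centers = kmeans(reduced, 2)
```

After projecting out the noisy dimension, k-means recovers the two groups near 0 and near 5, which full-dimensional distances would have obscured.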