Excellence in Research and Innovation for Humanity

Ping Ji

Publications

4

4
2401
A Text Clustering System based on k-means Type Subspace Clustering and Ontology
Abstract:

This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high-dimensional, and sparse text data. The algorithm adds a new step to the k-means clustering process that automatically calculates the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To support understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to the weights calculated by the new algorithm. The candidates are then fed to WordNet to identify the nouns and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words that represent the topics of the clusters.

Keywords:
Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology
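The abstract above does not state how the per-cluster keyword weights are computed. As a rough illustration only, the sketch below assumes an exponential rule in which a feature that varies little inside a cluster receives a large weight. The function name `cluster_feature_weights`, the parameter `gamma`, and the update rule itself are assumptions made for this example, not the authors' algorithm.

```python
import math


def cluster_feature_weights(points, labels, centers, gamma=1.0):
    """Return one weight vector per cluster; each vector sums to 1.

    Assumed rule (not taken from the abstract): w[k][j] is proportional to
    exp(-D[k][j] / gamma), where D[k][j] is the summed squared deviation of
    cluster k from its center along feature j.
    """
    n_features = len(centers[0])
    weights = []
    for k, center in enumerate(centers):
        # Sum of squared deviations from the cluster center, per feature.
        dispersion = [0.0] * n_features
        for point, label in zip(points, labels):
            if label == k:
                for j in range(n_features):
                    dispersion[j] += (point[j] - center[j]) ** 2
        # Shift by the minimum before exponentiating for numerical stability;
        # the shift cancels out in the normalization below.
        d_min = min(dispersion)
        scores = [math.exp(-(d - d_min) / gamma) for d in dispersion]
        total = sum(scores)
        weights.append([s / total for s in scores])
    return weights


# Toy data: feature 0 is tight within each cluster, feature 1 is noisy.
points = [(1.0, 0.0), (1.1, 5.0), (0.9, 9.0), (5.0, 1.0), (5.1, 8.0), (4.9, 4.0)]
labels = [0, 0, 0, 1, 1, 1]
centers = [(1.0, 14.0 / 3), (5.0, 13.0 / 3)]
weights = cluster_feature_weights(points, labels, centers)
```

In a full k-means type loop, a step like this would run after the centers are updated, and the resulting weights would scale each feature's contribution to the distance in the next assignment step. Sorting a cluster's weights in descending order then gives a candidate list of representative words.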
3
2565
Microstructure Changes of Machined Surfaces on Austenitic 304 Stainless Steel
Abstract:
This paper presents an experiment to estimate the influence of cutting conditions on microstructure changes when machining austenitic 304 stainless steel, especially with wear inserts. The wear inserts were prefabricated with a width of 0.5 mm. The forces, temperature distribution, RS, and microstructure changes were measured by a force dynamometer, an infrared thermal camera, X-ray diffraction (XRD), and SEM, respectively. The results showed that different combinations of machining conditions have a significant influence on the microstructure changes of the machined surface. In addition, ANOVA and AOM were used to distinguish the influences of cutting speed, feed rate, and wear insert.
Keywords:
Microstructure Changes, Wear width, Stainless steel
2
8342
Solving the Quadratic Assignment Problems by a Genetic Algorithm with a New Replacement Strategy
Abstract:
This paper proposes a genetic algorithm based on a new replacement strategy to solve quadratic assignment problems, which are NP-hard. The new replacement strategy aims to improve the performance of the genetic algorithm by balancing the convergence of the search process against the diversity of the population. To test the performance of the algorithm, instances from QAPLIB, a quadratic assignment problem library, are solved and the results are compared with those reported in the literature. The performance of the genetic algorithm is promising. Its significance is that it is generic: it does not rely on problem-specific genetic operators and may be easily applied to various types of combinatorial problems.
Keywords:
Quadratic assignment problem, Genetic algorithm, Replacement strategy, QAPLIB.
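For orientation, the sketch below shows the standard quadratic assignment objective together with one possible replacement rule. The abstract does not describe the actual replacement strategy, so `replace_worst_if_better` is a hypothetical stand-in: it rejects duplicate offspring (to preserve diversity) and overwrites the worst individual only when the offspring is strictly better (to keep convergence pressure). The toy `flow` and `dist` matrices are made up for illustration.

```python
def qap_cost(perm, flow, dist):
    """Cost of assigning facility i to location perm[i]."""
    n = len(perm)
    return sum(
        flow[i][j] * dist[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
    )


def replace_worst_if_better(population, child, flow, dist):
    """Hypothetical replacement rule, not the strategy from the paper.

    Returns True if the child entered the population, False otherwise.
    """
    # Reject duplicates so the population does not collapse to one solution.
    if child in population:
        return False
    costs = [qap_cost(p, flow, dist) for p in population]
    worst = max(range(len(population)), key=costs.__getitem__)
    # Accept the child only if it improves on the worst individual.
    if qap_cost(child, flow, dist) < costs[worst]:
        population[worst] = child
        return True
    return False


# Toy symmetric instance with three facilities and three locations.
flow = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
population = [[2, 0, 1], [1, 2, 0]]
accepted = replace_worst_if_better(population, [0, 1, 2], flow, dist)
```

Because the rule only touches permutations through `qap_cost`, swapping in another objective function would adapt it to a different combinatorial problem, which matches the generic character the abstract claims for the approach.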
1
8670
Novel Hybrid Method for Gene Selection and Cancer Prediction
Abstract:
Microarray data profile gene expression on a whole-genome scale and therefore provide a good way to study associations between gene expression and the occurrence or progression of cancer. More and more researchers have realized that microarray data are helpful for predicting cancer samples. However, the number of gene expression features is much larger than the sample size, which makes this task very difficult. How to identify the significant genes associated with cancer has therefore become an urgent, popular, and difficult research topic. Many feature selection algorithms proposed in the past focus on improving cancer prediction accuracy at the expense of ignoring the correlations between features. In this work, a novel framework (named SGS) is presented for stable gene selection and efficient cancer prediction. The proposed framework first applies a clustering algorithm to find gene groups in which the genes have high correlation coefficients, then selects the significant genes in each group with Bayesian Lasso and the important gene groups with group Lasso, and finally builds a prediction model on the reduced gene space with an efficient classification algorithm (such as SVM, 1NN, or regression). Experimental results on real-world data show that the proposed framework often outperforms existing feature selection and prediction methods, such as SAM, IG, and Lasso-type prediction models.
Keywords:
Gene Selection, Cancer Prediction, Lasso, Clustering, Classification.
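The first stage of the framework described above groups genes whose expression profiles are highly correlated. The abstract does not name the clustering algorithm, so the sketch below uses a simple greedy threshold grouping on the Pearson correlation as a stand-in; the Bayesian Lasso and group Lasso stages are not shown. The function names, the `threshold` parameter, and the toy expression values are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def group_correlated_genes(expression, threshold=0.9):
    """Greedy grouping (a stand-in, not the paper's clustering algorithm).

    A gene joins the first group whose seed gene it correlates with at
    |r| >= threshold; otherwise it starts a new group.
    """
    groups = []
    for gene, profile in expression.items():
        for group in groups:
            seed_profile = expression[group[0]]
            if abs(pearson(profile, seed_profile)) >= threshold:
                group.append(gene)
                break
        else:
            # No existing group matched, so this gene seeds a new one.
            groups.append([gene])
    return groups


# Toy expression matrix: gene name -> expression level in four samples.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 3.9, 6.2, 8.0],
    "g3": [4.0, 1.0, 3.5, 1.2],
}
groups = group_correlated_genes(expression)
```

Here `g1` and `g2` rise together across the samples and land in one group, while `g3` forms its own. A later selection stage would then pick genes within each group and decide which groups to keep.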