Open Science Research Excellence

Mona Soliman Habib

Publications

1

Publications

1
12114
Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines
Abstract:
This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. A simple machine learning approach is used that eliminates prior language knowledge such as part-of-speech or noun phrase tagging thereby allowing for its applicability across languages. No domain-specific knowledge is included. The accuracy measures achieved are comparable to those obtained using more complex approaches, which constitutes a motivation to investigate ways to improve the scalability of multiclass SVM in order to make the solution more practical and useable. Improving training time of multi-class SVM would make support vector machines a more viable and practical machine learning solution for real-world problems with large datasets. An initial prototype results in great improvement of the training time at the expense of memory requirements.
Keywords:
Named entity recognition, support vector machines,language independence, bioinformatics.