Open Science Research Excellence

Open Science Index

Commenced in January 2007 Frequency: Monthly Edition: International Paper Count: 22

22
10010298
Analyzing Environmental Emotive Triggers in Terrorist Propaganda
Authors:
Abstract:

The purpose of this study is to measure the intersection of environmental security entities in terrorist propaganda. To the best of author’s knowledge, this is the first study of its kind to examine this intersection within terrorist propaganda. Rosoka, natural language processing software and frame analysis are used to advance our understanding of how environmental frames function as emotive triggers. Violent jihadi demagogues use frames to suggest violent and non-violent solutions to their grievances. Emotive triggers are framed in a way to leverage individual and collective attitudes in psychological warfare. A comparative research design is used because of the differences and similarities that exist between two variants of violent jihadi propaganda that target western audiences. Analysis is based on salience and network text analysis, which generates violent jihadi semantic networks. Findings indicate that environmental frames are used as emotive triggers across both data sets, but also as tactical and information data points. A significant finding is that certain core environmental emotive triggers like “water,” “soil,” and “trees” are significantly salient at the aggregate level across both data sets. All environmental entities can be classified into two categories, symbolic and literal. Importantly, this research illustrates how demagogues use environmental emotive triggers in cyber space from a subcultural perspective to mobilize target audiences to their ideology and praxis. Understanding the anatomy of propaganda construction is necessary in order to generate effective counter narratives in information operations. This research advances an additional method to inform practitioners and policy makers of how environmental security and propaganda intersect.

21
10009602
Composite Kernels for Public Emotion Recognition from Twitter
Abstract:

The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.

20
10009666
Emotional Analysis for Text Search Queries on Internet
Abstract:
The goal of this study is to analyze if search queries carried out in search engines such as Google, can offer emotional information about the user that performs them. Knowing the emotional state in which the Internet user is located can be a key to achieve the maximum personalization of content and the detection of worrying behaviors. For this, two studies were carried out using tools with advanced natural language processing techniques. The first study determines if a query can be classified as positive, negative or neutral, while the second study extracts emotional content from words and applies the categorical and dimensional models for the representation of emotions. In addition, we use search queries in Spanish and English to establish similarities and differences between two languages. The results revealed that text search queries performed by users on the Internet can be classified emotionally. This allows us to better understand the emotional state of the user at the time of the search, which could involve adapting the technology and personalizing the responses to different emotional states.
19
10009490
Q-Map: Clinical Concept Mining from Clinical Documents
Abstract:
Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.
18
10009580
Methodology for Developing an Intelligent Tutoring System Based on Marzano’s Taxonomy
Abstract:

The Mexican educational system faces diverse challenges related with the quality and coverage of education. The development of Intelligent Tutoring Systems (ITS) may help to solve some of them by helping teachers to customize their classes according to the performance of the students in online courses. In this work, we propose the adaptation of a functional ITS based on Bloom’s taxonomy called Sistema de Apoyo Generalizado para la Enseñanza Individualizada (SAGE), to measure student’s metacognition and their emotional response based on Marzano’s taxonomy. The students and the system will share the control over the advance in the course, so they can improve their metacognitive skills. The system will not allow students to get access to subjects not mastered yet. The interaction between the system and the student will be implemented through Natural Language Processing techniques, thus avoiding the use of sensors to evaluate student’s response. The teacher will evaluate student’s knowledge utilization, which is equivalent to the last cognitive level in Marzano’s taxonomy.

17
10009084
Adaption Model for Building Agile Pronunciation Dictionaries Using Phonemic Distance Measurements
Abstract:
Where human beings can easily learn and adopt pronunciation variations, machines need training before put into use. Also humans keep minimum vocabulary and their pronunciation variations are stored in front-end of their memory for ready reference, while machines keep the entire pronunciation dictionary for ready reference. Supervised methods are used for preparation of pronunciation dictionaries which take large amounts of manual effort, cost, time and are not suitable for real time use. This paper presents an unsupervised adaptation model for building agile and dynamic pronunciation dictionaries online. These methods mimic human approach in learning the new pronunciations in real time. A new algorithm for measuring sound distances called Dynamic Phone Warping is presented and tested. Performance of the system is measured using an adaptation model and the precision metrics is found to be better than 86 percent.
16
10007217
Causal Relation Identification Using Convolutional Neural Networks and Knowledge Based Features
Abstract:

Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep learning based approach training a model using convolutional neural networks to classify causal relations. We experiment with several different convolutional neural networks (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations as well as the direction of the causal relation. The results of our experiments show a higher accuracy than previously achieved for causal relation identification tasks.

15
10006246
Part of Speech Tagging Using Statistical Approach for Nepali Text
Authors:
Abstract:
Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.
14
10005939
Context Detection in Spreadsheets Based on Automatically Inferred Table Schema
Abstract:
Programming requires years of training. With natural language and end user development methods, programming could become available to everyone. It enables end users to program their own devices and extend the functionality of the existing system without any knowledge of programming languages. In this paper, we describe an Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, the end users can query the data using natural language. During the evaluation 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system and also Excel without ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM, by using natural language for end user software engineering, to overcome the present bottleneck of professional developers.
13
2501
Sounds Alike Name Matching for Myanmar Language
Abstract:
Personal name matching system is the core of essential task in national citizen database, text and web mining, information retrieval, online library system, e-commerce and record linkage system. It has necessitated to the all embracing research in the vicinity of name matching. Traditional name matching methods are suitable for English and other Latin based language. Asian languages which have no word boundary such as Myanmar language still requires sounds alike matching system in Unicode based application. Hence we proposed matching algorithm to get analogous sounds alike (phonetic) pattern that is convenient for Myanmar character spelling. According to the nature of Myanmar character, we consider for word boundary fragmentation, collation of character. Thus we use pattern conversion algorithm which fabricates words in pattern with fragmented and collated. We create the Myanmar sounds alike phonetic group to help in the phonetic matching. The experimental results show that fragmentation accuracy in 99.32% and processing time in 1.72 ms.
12
4467
Performance Analysis of MT Evaluation Measures and Test Suites
Abstract:
Many measures have been proposed for machine translation evaluation (MTE) while little research has been done on the performance of MTE methods. This paper is an effort for MTE performance analysis. A general frame is proposed for the description of the MTE measure and the test suite, including whether the automatic measure is consistent with human evaluation, whether different results from various measures or test suites are consistent, whether the content of the test suite is suitable for performance evaluation, the degree of difficulty of the test suite and its influence on the MTE, the relationship of MTE result significance and the size of the test suite, etc. For a better clarification of the frame, several experiment results are analyzed relating human evaluation, BLEU evaluation, and typological MTE. A visualization method is introduced for better presentation of the results. The study aims for aid in construction of test suite and method selection in MTE practice.
11
10674
Thematic Role Extraction Using Shallow Parsing
Abstract:
Extracting thematic (semantic) roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a rule-based approach to extract semantic roles from Persian sentences. The system exploits a twophase architecture to (1) identify the arguments and (2) label them for each predicate. For the first phase we developed a rule based shallow parser to chunk Persian sentences and for the second phase we developed a knowledge-based system to assign 16 selected thematic roles to the chunks. The experimental results of testing each phase are shown at the end of the paper.
10
12429
Distributional Semantics Approach to Thai Word Sense Disambiguation
Abstract:

Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many approach strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledgebased, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy that employs an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an information retrieval technique and unsupervised learning, to the task of Thai noun and verbal word sense disambiguation. The Latent Semantic Indexing has been shown to be efficient and effective for Information Retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, namely  /hua4/ and /kep1/ that are used as a representative of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.

9
374
Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model
Abstract:
Parsing is important in Linguistics and Natural Language Processing to understand the syntax and semantics of a natural language grammar. Parsing natural language text is challenging because of the problems like ambiguity and inefficiency. Also the interpretation of natural language text depends on context based techniques. A probabilistic component is essential to resolve ambiguity in both syntax and semantics thereby increasing accuracy and efficiency of the parser. Tamil language has some inherent features which are more challenging. In order to obtain the solutions, lexicalized and statistical approach is to be applied in the parsing with the aid of a language model. Statistical models mainly focus on semantics of the language which are suitable for large vocabulary tasks where as structural methods focus on syntax which models small vocabulary tasks. A statistical language model based on Trigram for Tamil language with medium vocabulary of 5000 words has been built. Though statistical parsing gives better performance through tri-gram probabilities and large vocabulary size, it has some disadvantages like focus on semantics rather than syntax, lack of support in free ordering of words and long term relationship. To overcome the disadvantages a structural component is to be incorporated in statistical language models which leads to the implementation of hybrid language models. This paper has attempted to build phrase structured hybrid language model which resolves above mentioned disadvantages. In the development of hybrid language model, new part of speech tag set for Tamil language has been developed with more than 500 tags which have the wider coverage. A phrase structured Treebank has been developed with 326 Tamil sentences which covers more than 5000 words. A hybrid language model has been trained with the phrase structured Treebank using immediate head parsing technique. Lexicalized and statistical parser which employs this hybrid language model and immediate head parsing technique gives better results than pure grammar and trigram based model.
8
1014
Semi-Automatic Analyzer to Detect Authorial Intentions in Scientific Documents
Abstract:
Information Retrieval has the objective of studying models and the realization of systems allowing a user to find the relevant documents adapted to his need of information. The information search is a problem which remains difficult because the difficulty in the representing and to treat the natural languages such as polysemia. Intentional Structures promise to be a new paradigm to extend the existing documents structures and to enhance the different phases of documents process such as creation, editing, search and retrieval. The intention recognition of the author-s of texts can reduce the largeness of this problem. In this article, we present intentions recognition system is based on a semi-automatic method of extraction the intentional information starting from a corpus of text. This system is also able to update the ontology of intentions for the enrichment of the knowledge base containing all possible intentions of a domain. This approach uses the construction of a semi-formal ontology which considered as the conceptualization of the intentional information contained in a text. An experiments on scientific publications in the field of computer science was considered to validate this approach.
7
12430
SMaTTS: Standard Malay Text to Speech System
Abstract:
This paper presents a rule-based text- to- speech (TTS) Synthesis System for Standard Malay, namely SMaTTS. The proposed system using sinusoidal method and some pre- recorded wave files in generating speech for the system. The use of phone database significantly decreases the amount of computer memory space used, thus making the system very light and embeddable. The overall system was comprised of two phases the Natural Language Processing (NLP) that consisted of the high-level processing of text analysis, phonetic analysis, text normalization and morphophonemic module. The module was designed specially for SM to overcome few problems in defining the rules for SM orthography system before it can be passed to the DSP module. The second phase is the Digital Signal Processing (DSP) which operated on the low-level process of the speech waveform generation. A developed an intelligible and adequately natural sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. A Standard Malay Language (SM) phoneme set and an inclusive set of phone database have been constructed carefully for this phone-based speech synthesizer. By applying the generative phonology, a comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been invented for SMaTTS. As for the evaluation tests, a set of Diagnostic Rhyme Test (DRT) word list was compiled and several experiments have been performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system as well as the room for improvements was thoroughly discussed.
6
10259
Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems
Abstract:
Documents retrieval in Information Retrieval Systems (IRS) is generally about understanding of information in the documents concern. The more the system able to understand the contents of documents the more effective will be the retrieval outcomes. But understanding of the contents is a very complex task. Conventional IRS apply algorithms that can only approximate the meaning of document contents through keywords approach using vector space model. Keywords may be unstemmed or stemmed. When keywords are stemmed and conflated in retrieving process, we are a step forwards in applying semantic technology in IRS. Word stemming is a process in morphological analysis under natural language processing, before syntactic and semantic analysis. We have developed algorithms for Malay and Arabic and incorporated stemming in our experimental systems in order to measure retrieval effectiveness. The results have shown that the retrieval effectiveness has increased when stemming is used in the systems.
5
5867
Determining the Gender of Korean Names for Pronoun Generation
Abstract:
It is an important task in Korean-English machine translation to classify the gender of names correctly. When a sentence is composed of two or more clauses and only one subject is given as a proper noun, it is important to find the gender of the proper noun for correct translation of the sentence. This is because a singular pronoun has a gender in English while it does not in Korean. Thus, in Korean-English machine translation, the gender of a proper noun should be determined. More generally, this task can be expanded into the classification of the general Korean names. This paper proposes a statistical method for this problem. By considering a name as just a sequence of syllables, it is possible to get a statistics for each name from a collection of names. An evaluation of the proposed method yields the improvement in accuracy over the simple looking-up of the collection. While the accuracy of the looking-up method is 64.11%, that of the proposed method is 81.49%. This implies that the proposed method is more plausible for the gender classification of the Korean names.
4
7143
Morpho-Phonological Modelling in Natural Language Processing
Abstract:

In this paper we propose a computational model for the representation and processing of morpho-phonological phenomena in a natural language, like Modern Greek. We aim at a unified treatment of inflection, compounding, and word-internal phonological changes, in a model that is used for both analysis and generation. After discussing certain difficulties cuase by well-known finitestate approaches, such as Koskenniemi-s two-level model [7] when applied to a computational treatment of compounding, we argue that a morphology-based model provides a more adequate account of word-internal phenomena. Contrary to the finite state approaches that cannot handle hierarchical word constituency in a satisfactory way, we propose a unification-based word grammar, as the nucleus of our strategy, which takes into consideration word representations that are based on affixation and [stem stem] or [stem word] compounds. In our formalism, feature-passing operations are formulated with the use of the unification device, and phonological rules modeling the correspondence between lexical and surface forms apply at morpheme boundaries. In the paper, examples from Modern Greek illustrate our approach. Morpheme structures, stress, and morphologically conditioned phoneme changes are analyzed and generated in a principled way.

3
6726
An Enhanced Tool for Implementing Dialogue Forms in Conversational Applications
Abstract:

Natural Language Understanding Systems (NLU) will not be widely deployed unless they are technically mature and cost effective to develop. Cost effective development hinges on the availability of tools and techniques enabling the rapid production of NLU applications through minimal human resources. Further, these tools and techniques should allow quick development of applications in a user friendly way and should be easy to upgrade in order to continuously follow the evolving technologies and standards. This paper presents a visual tool for the structuring and editing of dialog forms, the key element of driving conversation in NLU applications based on IBM technology. The main focus is given on the basic component used to describe Human – Machine interactions of that kind, the Dialogue Manager. In essence, the description of a tool that enables the visual representation of the Dialogue Manager mainly during the implementation phase is illustrated.

2
10979
Greek Compounds: A Challenging Case for the Parsing Techniques of PC-KIMMO v.2
Abstract:

In this paper we describe the recognition process of Greek compound words using the PC-KIMMO software. We try to show certain limitations of the system with respect to the principles of compound formation in Greek. Moreover, we discuss the computational processing of phenomena such as stress and syllabification which are indispensable for the analysis of such constructions and we try to propose linguistically-acceptable solutions within the particular system.

1
4889
PIELG: A Protein Interaction Extraction Systemusing a Link Grammar Parser from Biomedical Abstracts
Abstract:
Due to the ever growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses linkage given by the Link Grammar Parser to start a case based analysis of contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verbs patterns to overcome preposition combinations problems. The recall and precision are 74.4% and 62.65%, respectively. Experimental evaluations with two other state-of-the-art extraction systems indicate that PIELG system achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The result shows that the performance is remarkably promising.
Vol:13 No:06 2019Vol:13 No:05 2019Vol:13 No:04 2019Vol:13 No:03 2019Vol:13 No:02 2019Vol:13 No:01 2019
Vol:12 No:12 2018Vol:12 No:11 2018Vol:12 No:10 2018Vol:12 No:09 2018Vol:12 No:08 2018Vol:12 No:07 2018Vol:12 No:06 2018Vol:12 No:05 2018Vol:12 No:04 2018Vol:12 No:03 2018Vol:12 No:02 2018Vol:12 No:01 2018
Vol:11 No:12 2017Vol:11 No:11 2017Vol:11 No:10 2017Vol:11 No:09 2017Vol:11 No:08 2017Vol:11 No:07 2017Vol:11 No:06 2017Vol:11 No:05 2017Vol:11 No:04 2017Vol:11 No:03 2017Vol:11 No:02 2017Vol:11 No:01 2017
Vol:10 No:12 2016Vol:10 No:11 2016Vol:10 No:10 2016Vol:10 No:09 2016Vol:10 No:08 2016Vol:10 No:07 2016Vol:10 No:06 2016Vol:10 No:05 2016Vol:10 No:04 2016Vol:10 No:03 2016Vol:10 No:02 2016Vol:10 No:01 2016
Vol:9 No:12 2015Vol:9 No:11 2015Vol:9 No:10 2015Vol:9 No:09 2015Vol:9 No:08 2015Vol:9 No:07 2015Vol:9 No:06 2015Vol:9 No:05 2015Vol:9 No:04 2015Vol:9 No:03 2015Vol:9 No:02 2015Vol:9 No:01 2015
Vol:8 No:12 2014Vol:8 No:11 2014Vol:8 No:10 2014Vol:8 No:09 2014Vol:8 No:08 2014Vol:8 No:07 2014Vol:8 No:06 2014Vol:8 No:05 2014Vol:8 No:04 2014Vol:8 No:03 2014Vol:8 No:02 2014Vol:8 No:01 2014
Vol:7 No:12 2013Vol:7 No:11 2013Vol:7 No:10 2013Vol:7 No:09 2013Vol:7 No:08 2013Vol:7 No:07 2013Vol:7 No:06 2013Vol:7 No:05 2013Vol:7 No:04 2013Vol:7 No:03 2013Vol:7 No:02 2013Vol:7 No:01 2013
Vol:6 No:12 2012Vol:6 No:11 2012Vol:6 No:10 2012Vol:6 No:09 2012Vol:6 No:08 2012Vol:6 No:07 2012Vol:6 No:06 2012Vol:6 No:05 2012Vol:6 No:04 2012Vol:6 No:03 2012Vol:6 No:02 2012Vol:6 No:01 2012
Vol:5 No:12 2011Vol:5 No:11 2011Vol:5 No:10 2011Vol:5 No:09 2011Vol:5 No:08 2011Vol:5 No:07 2011Vol:5 No:06 2011Vol:5 No:05 2011Vol:5 No:04 2011Vol:5 No:03 2011Vol:5 No:02 2011Vol:5 No:01 2011
Vol:4 No:12 2010Vol:4 No:11 2010Vol:4 No:10 2010Vol:4 No:09 2010Vol:4 No:08 2010Vol:4 No:07 2010Vol:4 No:06 2010Vol:4 No:05 2010Vol:4 No:04 2010Vol:4 No:03 2010Vol:4 No:02 2010Vol:4 No:01 2010
Vol:3 No:12 2009Vol:3 No:11 2009Vol:3 No:10 2009Vol:3 No:09 2009Vol:3 No:08 2009Vol:3 No:07 2009Vol:3 No:06 2009Vol:3 No:05 2009Vol:3 No:04 2009Vol:3 No:03 2009Vol:3 No:02 2009Vol:3 No:01 2009
Vol:2 No:12 2008Vol:2 No:11 2008Vol:2 No:10 2008Vol:2 No:09 2008Vol:2 No:08 2008Vol:2 No:07 2008Vol:2 No:06 2008Vol:2 No:05 2008Vol:2 No:04 2008Vol:2 No:03 2008Vol:2 No:02 2008Vol:2 No:01 2008
Vol:1 No:12 2007Vol:1 No:11 2007Vol:1 No:10 2007Vol:1 No:09 2007Vol:1 No:08 2007Vol:1 No:07 2007Vol:1 No:06 2007Vol:1 No:05 2007Vol:1 No:04 2007Vol:1 No:03 2007Vol:1 No:02 2007Vol:1 No:01 2007