Localization of Geospatial Events and Hoax Prediction in the UFO Database
Unidentified Flying Objects (UFOs) have long interested enthusiasts, and people all over the United States report sightings online at the National UFO Reporting Center (NUFORC). Some of these reports are hoaxes. For those that seem legitimate, our task is not to establish that they involve flying objects from aliens in outer space; rather, we intend to identify whether a report is a hoax, as determined by the UFO database team with their existing curation criteria. The database also provides a wealth of information that can be exploited for various analyses and insights, such as social reporting and identifying real-time spatial events. We perform analysis to localize these time-series geospatial events and correlate them with known real-time events. This paper does not confirm any legitimacy of alien activity, but rather attempts to gather information from likely legitimate reports of UFOs by studying the online reports. These events occur in geospatial clusters and are also time-based. We use cluster density and data visualization to search the space of cluster realizations and decide on the most probable clusters, which provide information about the proximity of such activity. We also present a random forest classifier that distinguishes true events from hoax events using the best available features, such as region, week, time period, and duration. Lastly, we show the performance of the scheme on various days and correlate it with real-time events, where one of the UFO reports strongly correlates with a missile test conducted in the United States.
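As a rough illustration of how the proximity of geospatial reports might be measured, the sketch below computes great-circle distances with the haversine formula and lists the reports within a given radius of one report. It assumes each report carries a latitude and longitude in degrees; the function names, the Earth radius constant, and the radius threshold are illustrative choices, not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def neighbors_within(reports, index, radius_km):
    """Indices of reports within radius_km of reports[index] (hypothetical helper)."""
    lat, lon = reports[index]
    return [
        i for i, (la, lo) in enumerate(reports)
        if i != index and haversine_km(lat, lon, la, lo) <= radius_km
    ]
```

Counting neighbors in this way gives a simple local density estimate of the kind a density-based clustering step could build on.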
Analysis of Cooperative Learning Behavior Based on the Data of Students' Movement
The purpose of this paper is to analyze cooperative learning behavior patterns based on data on students' movement. The study first reviewed cooperative learning theory and its research status and briefly introduced the k-means clustering algorithm. Then, using the clustering algorithm and mathematical statistics, it analyzed the activity rhythm of individual students and groups in different functional areas, based on movement data provided by 10 first-year graduate students. It also focused on students' behavior in the learning area and explored the regularities of cooperative learning behavior. The results showed that the proposed movement-data-based method for analyzing cooperative learning behavior is feasible. From the results of the data analysis, the characteristics of students' behavior and their cooperative learning behavior patterns could be identified.
An Improved K-Means Algorithm for Gene Expression Data Clustering
Data mining techniques for clustering are a subject of active research and assist in biological pattern recognition and the extraction of new knowledge from raw data. Clustering is the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters, leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers, and a poor choice may lead to a local optimum that is far inferior to the global optimum, we propose a strategy for initializing the K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results show that its efficiency is significantly improved.
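The abstract does not say which initialization strategy the authors propose, so the sketch below swaps in a deterministic farthest-first rule purely to illustrate how the choice of initial centers feeds into the iterative relocation loop of K-Means. All function names are hypothetical, and points are assumed to be tuples of numbers.

```python
import math

def farthest_first_centers(points, k):
    """Assumed initialization: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

def kmeans(points, centers, iters=100):
    """Standard K-Means relocation: assign to nearest center, recompute means."""
    centers = list(centers)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return centers, clusters
```

Running `kmeans(points, farthest_first_centers(points, k))` and comparing the result with a run started from arbitrary centers is one simple way to observe the sensitivity described above.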
Optical Flow Based System for Cross Traffic Alert
This document describes an advanced system and methodology for Cross Traffic Alert (CTA), able to detect vehicles that move into the vehicle’s driving path from the left or right side. The camera is assumed to be mounted on a vehicle that is either stationary, e.g. at a traffic light or at an intersection, or moving slowly, e.g. in a car park. In all of these conditions, a driver’s brief loss of concentration or distraction can easily lead to a serious accident. The proposed system offers valid support for avoiding these kinds of car crashes. It is an extension of our previous work on a clustering system, which only works with fixed cameras. Only a vanishing point calculation and simple optical flow filtering, to eliminate motion vectors due to the car’s own relative movement, are performed, letting the system achieve high performance across different scenarios, cameras and resolutions. The proposed system uses only the optical flow as input, which is implemented in hardware on the proposed platform; since the processing of the whole system is efficient in terms of speed and power consumption, it is inserted directly into the camera framework, allowing all processing to be executed in real time.
Web Proxy Detection via Bipartite Graphs and One-Mode Projections
With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, setting up phishing pages to steal sensitive data, or redirecting victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, it has become increasingly challenging to detect proxy services due to their dynamic updates and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of the bipartite graphs to discover the social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent behavior clusters of web proxy users. The web proxy URL may vary from time to time, but the users' inherent interests do not. Based on this intuition, and using our private tools implemented with WebDriver, we examine whether the top URLs visited by web proxy users are web proxies. Our experimental results on real datasets show that the behavior clusters not only reduce the number of URLs to analyze but also provide an effective way to detect web proxies, especially unknown ones.
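The one-mode projection step can be sketched as follows: hosts and destinations form the two sides of a bipartite graph, and two hosts are linked in the projection when they share at least one destination. The edge weight used here (the number of shared destinations) is an assumed similarity measure, and the function name and example host names are hypothetical; the spectral clustering step is not shown.

```python
from itertools import combinations

def one_mode_projection(edges):
    """Project a host-destination bipartite graph onto the host side.

    edges: iterable of (host, destination) pairs.
    Returns {(h1, h2): shared_count} with h1 < h2, omitting unconnected pairs.
    """
    neighbors = {}
    for host, dest in edges:
        neighbors.setdefault(host, set()).add(dest)
    weights = {}
    for a, b in combinations(sorted(neighbors), 2):
        shared = len(neighbors[a] & neighbors[b])
        if shared:
            weights[(a, b)] = shared
    return weights
```

The resulting weights can be arranged into a host-by-host similarity matrix, which is the kind of input a spectral clustering routine expects.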
CoP-Networks: Virtual Spaces for New Faculty’s Professional Development in the 21st Higher Education
Twenty-first-century higher education and globalization challenge new faculty members to build effective professional networks and partnerships with industry in order to accelerate their growth and success. This creates the need for community of practice (CoP)-oriented development approaches that focus on cognitive apprenticeship while considering individual predisposition and future career needs. This work adopts data mining, clustering analysis, and social networking technologies to present the CoP-Network as a virtual space that connects individuals with similar career aspirations who are socially influenced to join and engage in a process of acquiring domain-related knowledge and practice. The CoP-Network model can be integrated into higher education to extend traditional graduate and professional development programs.
Automatic Landmark Selection Based on Feature Clustering for Visual Autonomous Unmanned Aerial Vehicle Navigation
The selection of specific landmarks for an Unmanned
Aerial Vehicle’s Visual Navigation system based on Automatic
Landmark Recognition has significant influence on the precision of
the system’s estimated position. At the same time, manual selection
of the landmarks does not guarantee a high recognition rate, which
would also result in poor precision. This work aims to develop an
automatic landmark selection method that takes the image of the flight
area and identifies the best landmarks to be recognized by the Visual
Navigation Landmark Recognition System. The criterion for selecting
a landmark is based on features detected by ORB or AKAZE and
edge information on each possible landmark. Results have shown
that disposition of possible landmarks is quite different from the
Method of Cluster Based Cross-Domain Knowledge Acquisition for Biologically Inspired Design
Biologically inspired design inspires inventions and new technologies in the field of engineering by mimicking functions, principles, and structures in the biological domain. To deal with the obstacles of cross-domain knowledge acquisition in the existing biologically inspired design process, functional semantic clustering based on functional feature semantic correlation and environmental constraint clustering composition based on environmental characteristic constraining adaptability are proposed. A knowledge cell clustering algorithm and the corresponding prototype system are developed. Finally, the effectiveness of the method is verified by the design of a visual prosthetic device.
Implementation of an IoT Sensor Data Collection and Analysis Library
Due to the development of information technology and wireless Internet technology, a wide variety of data is being generated in many fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a greater variety of information can be extracted. In addition, the development and spread of boards such as Arduino and Raspberry Pi have made it easy to test various sensors, and sensor data can be collected directly using database application tools such as MySQL. These directly collected data can be used for various research purposes and can be useful as data for data mining. However, using such a board to collect data presents many difficulties, especially when the user is not a computer programmer or is using it for the first time. Even if data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.
Scattering Operator and Spectral Clustering for Ultrasound Images: Application on Deep Venous Thrombi
Deep Venous Thrombosis (DVT) occurs when a
thrombus is formed within a deep vein (most often in the legs). This
disease can be deadly if a part or the whole thrombus reaches the
lung and causes a Pulmonary Embolism (PE). This disorder, often
asymptomatic, has multifactorial causes: immobilization, surgery,
pregnancy, age, cancers, and genetic variations. Our project aims to
relate the thrombus epidemiology (origins, patient predispositions,
PE) to its structure using ultrasound images. Ultrasonography and
elastography were collected using Toshiba Aplio 500 at Brest
Hospital. This manuscript compares two classification approaches:
spectral clustering and scattering operator. The former is based on
the graph and matrix theories while the latter cascades wavelet
convolutions with nonlinear modulus and averaging operators.
A Parallel Implementation of k-Means in MATLAB
The aim of this work is the parallel implementation
of k-means in MATLAB, in order to reduce the execution time.
Specifically, a new function in MATLAB for serial k-means algorithm
is developed, which meets all the requirements for the conversion to a
function in MATLAB with parallel computations. Additionally, two
different variants for the definition of initial values are presented.
In the sequel, the parallel approach is presented. Finally,
performance tests of the computation times with respect to the
numbers of features and classes are illustrated.
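The paper's implementation is in MATLAB and is not reproduced here. As a language-neutral illustration of the data-parallel structure that k-means admits, the Python sketch below splits the points into chunks, computes per-cluster partial sums for each chunk concurrently, and then reduces them into new centers. The function names and the use of a thread pool are assumptions for illustration; threads in CPython show the structure rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def partial_sums(chunk, centers):
    """For one chunk, accumulate per-cluster coordinate sums and counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in chunk:
        j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def parallel_update(points, centers, workers=2):
    """One k-means iteration: chunks are processed concurrently, then reduced."""
    size = max(1, math.ceil(len(points) / workers))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: partial_sums(c, centers), chunks))
    new_centers = []
    for j, old in enumerate(centers):
        n = sum(counts[j] for _, counts in results)
        if n == 0:  # keep an empty cluster's center unchanged
            new_centers.append(old)
            continue
        new_centers.append(tuple(
            sum(sums[j][d] for sums, _ in results) / n for d in range(len(old))
        ))
    return new_centers
```

Only the small per-cluster sums and counts are exchanged between workers, which is why the assignment step scales well with the number of points.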
Human Behavior Modeling in Video Surveillance of Conference Halls
In this paper, we present a human behavior modeling approach for video scenes. This approach is used to model normal behaviors in conference halls. We exploit the Probabilistic Latent Semantic Analysis (PLSA) technique, using the 'Bag-of-Terms' paradigm, as a tool for exploring video data and learning the model by grouping similar activities. Our term vocabulary consists of 3D spatio-temporal patch groups assigned by the direction of motion. Our video representation preserves the spatial information, the object trajectory, and the motion. The main strength of this approach is that it can be adapted to detect abnormal behaviors in order to ensure and enhance human security.
Ensuring Uniform Energy Consumption in Non-Deterministic Wireless Sensor Network to Protract Networks Lifetime
Wireless sensor networks have attracted much attention from researchers all around the world, owing to their extensive applicability in agricultural, industrial and military fields. Energy-conserving node deployment strategies play a notable role in the active implementation of wireless sensor networks. Clustering is an approach that improves energy efficiency in the network. The clustering algorithm needs to have an optimum size and number of clusters because clustering, if not implemented properly, cannot effectively increase the life of the network. In this paper, an algorithm is proposed to address connectivity issues with the aim of ensuring uniform energy consumption of nodes in every part of the network. The simulation results show that the proposed algorithm has an edge over existing algorithms in terms of throughput and network lifetime.
K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors
Matching high-dimensional features between images is computationally expensive for exhaustive search approaches in computer vision. Although the dimension of the feature can be reduced by simplifying the prior knowledge of homography, matching accuracy may degrade as a tradeoff. In this paper, we present a feature matching method based on the k-means algorithm that reduces the matching cost and matches the features between images without relying on a simplified geometric assumption. Experimental results show that the proposed method outperforms previous linear exhaustive search approaches in terms of the inlier ratio of matched pairs.
Generalization of Clustering Coefficient on Lattice Networks Applied to Criminal Networks
A lattice network is a special type of network in
which all nodes have the same number of links, and its boundary
conditions are periodic. The most basic lattice network is the ring, a
one-dimensional network with periodic border conditions. In contrast,
the Cartesian product of d rings forms a d-dimensional lattice
network. An analytical expression currently exists for the clustering
coefficient in this type of network, but the theoretical value is valid
only up to a certain connectivity value; in other words, the analytical
expression is incomplete. Here we obtain analytically the clustering
coefficient expression in d-dimensional lattice networks for any link
density. Our analytical results show that, as the link density of a
lattice network tends to 1, its clustering coefficient approaches the
value of the clustering coefficient of a fully connected network. We
developed a model in criminology in which the generalized clustering
coefficient expression is applied. The model states that delinquents
learn the know-how of crime business by sharing knowledge, directly
or indirectly, with their friends in the gang. This generalization sheds
light on the network properties, which is important for developing new
models in different fields where network structure plays an important
role in the system dynamics, such as criminology, evolutionary game
theory, and econophysics, among others.
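The limitation of a low-connectivity formula can be checked numerically on the simplest case, the ring. The sketch below builds a ring lattice, computes its average clustering coefficient by direct counting, and compares it with the textbook expression C = 3(k - 2) / (4(k - 1)), which is the standard result for rings with k < 2n/3. This is not the paper's generalized expression, which the abstract does not give; the function names are hypothetical.

```python
from itertools import combinations

def ring_lattice(n, k):
    """Adjacency sets for a ring of n nodes, each linked to k/2 neighbors per side (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def clustering_coefficient(adj):
    """Average over nodes of (links among neighbors) / (possible links among neighbors)."""
    total = 0.0
    for nbrs in adj.values():
        deg = len(nbrs)
        if deg < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (deg * (deg - 1))
    return total / len(adj)

def low_connectivity_formula(k):
    """Textbook ring-lattice value, valid only for k < 2n/3."""
    return 3 * (k - 2) / (4 * (k - 1))
```

For n = 10 and k = 4 the counted value and the formula agree at 0.5. For n = 7 and k = 6 the ring is fully connected, the counted value is 1, and the formula gives 0.6, which illustrates why an expression valid for any link density is needed.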
Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering
Coastal regions are among the most heavily used places, subject to both the natural balance and a growing population. In coastal engineering, the most valuable data is wave behavior. The amount of this data becomes very large because observations take place over periods of hours, days and months. In this study, statistical methods such as wave spectrum analysis methods and standard statistical methods are used. The goal of this study is to discover profiles of different coastal areas using these statistical methods and thus to obtain an instance-based data set from the big data for analysis with data mining algorithms. In the experimental studies, six sample data sets of wave behavior, obtained from 20-minute observations in Mersin Bay, Turkey, were converted to an instance-based form, and different clustering techniques from data mining were used to discover similar coastal places. Moreover, this study discusses how this summarization approach can be used in other fields that collect big data, such as medicine.
A Computational Cost-Effective Clustering Algorithm in Multidimensional Space Using the Manhattan Metric: Application to the Global Terrorism Database
The increasing amount of collected data has limited the performance of current analysis algorithms. Thus, developing new algorithms that are cost-effective in terms of complexity, scalability, and accuracy has attracted significant interest. In this paper, a modified, effective k-means based algorithm is developed and tested. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. The algorithm uses the City Block distance and a new stop criterion to guarantee convergence. Experiments conducted on a real data set show its high performance compared with the original k-means version.
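A minimal sketch of a k-means style loop under the City Block (Manhattan) distance is given below. The abstract does not describe the authors' stop criterion or center update, so two assumptions are made here: the loop stops when the assignments no longer change, and centers are updated with the coordinate-wise median, which is the minimizer of the summed Manhattan distance. Function names are hypothetical.

```python
from statistics import median

def manhattan(a, b):
    """City Block distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center (Manhattan distance) for every point."""
    return [
        min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
        for p in points
    ]

def city_block_kmeans(points, centers, max_iter=100):
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:  # assumed stop criterion: stable assignments
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # assumed update: coordinate-wise median
                centers[j] = tuple(median(c) for c in zip(*members))
    return centers, labels
```

The Manhattan distance avoids squaring and square roots, which is one plausible source of the reduced computational load the abstract mentions.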
Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency
Clustering is a well-known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can contain either categorical or numeric data, and each type of data has its own specific clustering algorithm. In this context, two algorithms have been proposed: k-means for clustering numeric datasets and k-modes for categorical datasets. A main problem encountered in data mining applications is clustering categorical datasets, which are very common. One way to cluster categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead of k-modes. In this paper, we test an approach based on this idea, transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previous method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are evaluated. The results show that our proposed method outperforms the binary method in all cases.
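The transformation described above can be sketched in a few lines: each categorical value is replaced by the relative frequency of that value within its attribute, after which any numeric algorithm such as k-means can be applied. The function name is hypothetical, and the sketch assumes the dataset is a list of equal-length rows.

```python
from collections import Counter

def relative_frequency_encode(rows):
    """Replace each categorical value by its relative frequency in its column."""
    columns = list(zip(*rows))
    freqs = [Counter(col) for col in columns]
    n = len(rows)
    return [[freqs[j][v] / n for j, v in enumerate(row)] for row in rows]
```

For example, in a column holding "red" three times and "blue" once, every "red" becomes 0.75 and "blue" becomes 0.25. Note that two different values with the same frequency map to the same number, which is a known limitation of this kind of encoding.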
Optimal Maintenance Clustering for Rail Track Components Subject to Possession Capacity Constraints
This paper studies the optimal maintenance planning of preventive maintenance and renewal activities for components in a single railway track when the available time for maintenance is limited. The rail-track system consists of several types of components, such as rail, ballast, and switches with different preventive maintenance and renewal intervals. To perform maintenance or renewal on the track, a train free period for maintenance, called a possession, is required. Since a major possession directly affects the regular train schedule, maintenance and renewal activities are clustered as much as possible. In a highly dense and utilized railway network, the possession time on the track is critical since the demand for train operations is very high and a long possession has a severe impact on the regular train schedule. We present an optimization model and investigate the maintenance schedules with and without the possession capacity constraint. In addition, we also integrate the social-economic cost related to the effects of the maintenance time to the variable possession cost into the optimization model. A numerical example is provided to illustrate the model.
The Survey Research and Evaluation of Green Residential Building Based on the Improved Group Analytical Hierarchy Process Method in Yinchuan
Due to the economic downturn and the deterioration of the living environment, the development of residential buildings, which are high energy consuming, is gradually changing from “extensive” building to green building in China. The evaluation system for green building is therefore continuously being improved, but current evaluation work has the following problems: (1) There are differences between the actual investment cost and the purchasing power of residents, and the construction target of green residential building is single and lacks multi-objective performance development. (2) Green building evaluation lacks regional characteristics and cannot reflect the demands of residents in different regions. (3) In the process of determining the criteria weights, the experts’ judgment matrix has difficulty meeting the consistency requirement. To solve these problems, first, questionnaires about green residential building in the Ningxia area are distributed, and the questionnaire results reflect the purchasing power of residents and their acceptance of the cost of green building. Secondly, combined with the geographical features of the Ningxia minority areas, an evaluation criteria system for green residential building is constructed. Finally, using the improved group AHP method and the grey clustering method, the criteria weights are determined and a real case, located in the Xing Qing district, Ningxia, is evaluated. The conclusion is that the professional evaluation of this project and its social recognition are basically consistent.
Hierarchical Checkpoint Protocol in Data Grids
Grids of computing nodes have emerged as a
representative means of connecting distributed computers or
resources scattered all over the world for the purpose of computing
and distributed storage. Since fault tolerance becomes complex due
to the availability of resources in a decentralized grid environment,
it can be used in connection with replication in data grids. The
objective of our work is to present fault tolerance in data grids
with a data replication-driven model based on clustering. The
performance of the protocol is evaluated with the Omnet++ simulator.
The computational results show the efficiency of our protocol in
terms of recovery time and the number of processes in rollbacks.
Energy-Efficient Clustering Protocol in Wireless Sensor Networks for Healthcare Monitoring
Wireless sensor networks (WSNs) can facilitate continuous monitoring of patients and increase early detection of emergency conditions and diseases. High-density WSNs help us to accurately monitor a remote environment by intelligently combining the data from the individual nodes. Due to the limited energy capacity of sensors, enhancing the lifetime and reliability of WSNs is an important factor in the design of these networks. Clustering strategies have been verified as effective and practical algorithms for reducing energy consumption in WSNs and can tackle the limitations of WSNs. In this paper, an Energy-efficient Weight-based Clustering Protocol (EWCP) is presented. An artificial retina is selected as a case study of WSNs applied in body sensors. Cluster head (CH) selection is based on energy-efficient parameters. Moreover, cluster members are selected based on their distance to the selected CHs. Compared with other benchmark protocols, EWCP significantly improves network lifetime.
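The two-stage idea (pick cluster heads using energy-related parameters, then attach members by distance) can be sketched as follows. The abstract does not give EWCP's actual weight function, so this sketch assumes the simplest possible weight, residual energy alone; the node layout, field order, and function names are hypothetical.

```python
import math

def select_cluster_heads(nodes, k):
    """nodes: {id: (x, y, residual_energy)}.
    Assumed weight: the k nodes with the highest residual energy become CHs."""
    return sorted(nodes, key=lambda n: nodes[n][2], reverse=True)[:k]

def assign_members(nodes, heads):
    """Attach every non-CH node to its nearest cluster head (Euclidean distance)."""
    membership = {h: [] for h in heads}
    for n, (x, y, _) in nodes.items():
        if n in membership:
            continue
        nearest = min(heads, key=lambda h: math.dist((x, y), nodes[h][:2]))
        membership[nearest].append(n)
    return membership
```

A real protocol would typically combine several terms in the weight (for example energy, node degree, and distance to the sink) and re-run the selection periodically so that the CH role rotates.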
Chemical Reaction Algorithm for Expectation Maximization Clustering
Clustering has been an intensive research topic for some years
because of its multifaceted applications in fields such as biology,
information retrieval, medicine, and business. Expectation maximization
(EM) is an algorithmic framework for clustering and is counted among
the top ten algorithms of machine learning. Traditionally, optimization
of an objective function has been the standard approach in EM. Hence,
research has investigated the utility of evolutionary computing and
related techniques in this regard. Chemical Reaction Optimization
(CRO) is a recently established method whose properties can be
used to solve optimization problems. This paper presents
an algorithm framework (EM-CRO) with modified CRO operators
for EM clustering problems. The hybrid algorithm mainly
addresses the sensitivity to initial values of clustering algorithms
based on objective function optimization. Our experiments take two
classic EM-type algorithms, k-means and fuzzy k-means, as
examples and use the CRO algorithm to optimize their initial values,
yielding the K-means-CRO and FKM-CRO algorithms. The experimental
results show improved efficiency in solving clustering problems based
on objective function optimization.
A Comparative Study on Fuzzy and Neuro-Fuzzy Enabled Cluster Based Routing Protocols for Wireless Sensor Networks
Dynamic routing in Wireless Sensor Networks (WSNs) has played a significant role in research in recent years. Energy consumption and timely data delivery are the major criteria for these networks. The location of sensor nodes need not be prearranged. Clustering in WSNs is a key methodology used to extend the lifetime of a sensor network, and it has numerous real-time applications. A central design goal of WSNs is to minimize energy consumption. Soft computing techniques can be included to achieve improved performance. This paper surveys modern trends in routing that employ fuzzy logic and neuro-fuzzy logic based on clustering techniques and presents a comparative study of the numerous related methodologies.
3D Mesh Coarsening via Uniform Clustering
In this paper, we present a fast and efficient mesh coarsening algorithm for 3D triangular meshes. This approach can be applied to very complex 3D meshes of arbitrary topology and with millions of vertices. The algorithm is based on clustering the input mesh elements: it divides the faces of an input mesh into a given number of clusters by approximating the Centroidal Voronoi Tessellation of the input mesh. Once a clustering is achieved, it provides an efficient way to construct uniform tessellations and therefore leads to good coarsening of polygonal meshes. With the proliferation of 3D scanners, this coarsening algorithm is particularly useful for reverse engineering applications of 3D models, which in many cases are dense, non-uniform, irregular and of arbitrary topology. Examples demonstrating the effectiveness of the new algorithm are also included in the paper.
LiDAR Based Real Time Multiple Vehicle Detection and Tracking
Self-driving vehicles require a high level of situational
awareness in order to maneuver safely when driving in real-world
conditions. This paper presents a LiDAR based real time perception
system that is able to process raw sensor data for multiple target
detection and tracking in dynamic environments. The proposed
algorithm is nonparametric and deterministic; that is, no assumptions
or prior knowledge about the input data are needed and no
initialization is required. Additionally, the proposed method
works on the three-dimensional data directly generated by LiDAR
without sacrificing the rich information contained in the domain of
3D. Moreover, a fast and efficient real time clustering algorithm
based on a radially bounded nearest neighbor (RBNN) strategy is applied.
The Hungarian algorithm and adaptive Kalman filtering are
used for data association and tracking. The proposed
algorithm is able to run in real time with average run time of 70ms
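The core idea of radially bounded nearest neighbor clustering is that two points belong to the same cluster when they are connected by a chain of neighbors, each within a fixed radius of the next. The brute-force sketch below illustrates that idea on a small 3D point list; it is not the paper's real-time implementation, which would rely on a spatial index, and the function name and radius value are assumptions.

```python
import math
from collections import deque

def rbnn_clusters(points, radius):
    """Label points so that points chained by neighbors within `radius` share a label."""
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:  # grow the cluster outward from each newly added point
            i = queue.popleft()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) <= radius:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels
```

Because the result depends only on the radius and the point positions, the procedure needs no initialization and no preset number of clusters, which matches the nonparametric, deterministic behavior described above.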
Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques
Renewable energy (RE) is referred to as "clean energy", and popular support for its use rests on providing electricity with zero carbon dioxide emissions. This study provides useful insight into RE in the European Union (EU), especially into electricity generation from renewables and the associated targets. The objective of this study is to identify groups of European countries using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The hierarchical cluster analysis is based on Ward's clustering method and squared Euclidean distances, and it identified eight distinct clusters of European countries. Then, the non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results, with data normalized by Z score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.
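Three of the building blocks named above can be written down compactly: Z score normalization of each indicator, the squared Euclidean distance, and the standard Ward merge cost, which is the increase in within-cluster sum of squares when two clusters are joined. The sketch assumes the population standard deviation for the Z score and uses hypothetical function names; it does not reproduce the study's indicators or results.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [
        [(v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_merge_cost(centroid_a, size_a, centroid_b, size_b):
    """Increase in within-cluster sum of squares if clusters A and B are merged."""
    return size_a * size_b / (size_a + size_b) * squared_euclidean(centroid_a, centroid_b)
```

Ward's method repeatedly merges the pair of clusters with the smallest merge cost; normalizing first keeps indicators measured on large scales from dominating the distances.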
Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting
In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) is used for classification of a diabetes disease dataset. The aims of this study are to reduce the variance within the attributes of the diabetes dataset and to improve the classification accuracy of the classifier algorithms by transforming non-linearly separable datasets into linearly separable ones. The Pima Indians Diabetes dataset has two classes: normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of the K-means clustering method and is one of the most used clustering methods in data mining and machine learning applications. In this study, in the first stage, fuzzy C-means clustering is used to find the centers of the attributes in the Pima Indians diabetes dataset, and the dataset is then weighted according to the ratios of the means of the attributes to their centers. In the second stage, after weighting, classifier algorithms including support vector machine (SVM) and k-nearest neighbor (k-NN) classifiers are used to classify the weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) obtains very promising results in the classification of the Pima Indians diabetes dataset.
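The fuzzy C-means step can be illustrated on a single attribute with the standard update rules: each value receives a membership in every cluster that decreases with distance to the cluster center, and each center is the membership-weighted mean of the values. The sketch below uses the common fuzzifier m = 2 as an assumption and hypothetical function names; the attribute weighting itself (the ratio of attribute mean to center) is not shown because the abstract does not fully specify it.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard FCM membership of scalar x in each cluster (memberships sum to 1)."""
    dists = [abs(x - c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:  # x coincides with a center: full membership there
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exp for dk in dists) for di in dists]

def fcm_update_centers(xs, centers, m=2.0):
    """One FCM center update: mean of xs weighted by membership ** m."""
    u = [fcm_memberships(x, centers, m) for x in xs]
    new_centers = []
    for j in range(len(centers)):
        w = [row[j] ** m for row in u]
        new_centers.append(sum(wi * x for wi, x in zip(w, xs)) / sum(w))
    return new_centers
```

Alternating these two steps until the centers stop moving yields the per-attribute centers that the weighting stage would then use.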
Applying Hybrid Graph Drawing and Clustering Methods on Stock Investment Analysis
Stock investment decisions are often made based on current events in the global economy and the analysis of historical data. Visual representation could help investors gain a deeper understanding of and better insight into stock market trends more efficiently. The trend analysis is based on long-term data collection. The study adopts a hybrid method that combines a clustering algorithm and a force-directed algorithm to overcome the scalability problem when visualizing large data. This method reveals the potential relationships between stocks and determines the degree of strength and connectivity, which gives investors another view of stock relationships for reference. Information derived from the visualization will also help them make informed decisions. The experimental results show that the proposed method is able to produce aesthetically pleasing visualizations with clearer views of connectivity and edge weights.
A Cuckoo Search with Differential Evolution for Clustering Microarray Gene Expression Data
A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. Clustering addresses this by revealing the natural structures and identifying the interesting patterns in the underlying data. In this paper, gene-based clustering of gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The experimental results are analyzed with gene expression benchmark datasets. The results show that CS-DE outperforms CS on the benchmark datasets. To validate the clustering results, this work is tested with one internal and one external cluster validation index.