|Commenced in January 1999||Frequency: Monthly||Edition: International||Paper Count: 918|
Distributed database is a collection of logically related databases that cooperate in a transparent manner. Query processing uses a communication network for transmitting data between sites. It refers to one of the challenges in the database world. The development of sophisticated query optimization technology is the reason for the commercial success of database systems, which complexity and cost increase with increasing number of relations in the query. Mariposa, query trading and query trading with processing task-trading strategies developed for autonomous distributed database systems, but they cause high optimization cost because of involvement of all nodes in generating an optimal plan. In this paper, we proposed a modification on the autonomous strategy K-QTPT that make the seller’s nodes with the lowest cost have gradually high priorities to reduce the optimization time. We implement our proposed strategy and present the results and analysis based on those results.
Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.
The aim of this paper is the comparison of three different methods, in order to produce fuzzy tolerance relations for rainfall data classification. More specifically, the three methods are correlation coefficient, cosine amplitude and max-min method. The data were obtained from seven rainfall stations in the region of central Greece and refers to 20-year time series of monthly rainfall height average. Three methods were used to express these data as a fuzzy relation. This specific fuzzy tolerance relation is reformed into an equivalence relation with max-min composition for all three methods. From the equivalence relation, the rainfall stations were categorized and classified according to the degree of confidence. The classification shows the similarities among the rainfall stations. Stations with high similarity can be utilized in water resource management scenarios interchangeably or to augment data from one to another. Due to the complexity of calculations, it is important to find out which of the methods is computationally simpler and needs fewer compositions in order to give reliable results.
‘Water-related energy’ is energy use which is directly or indirectly influenced by changes to water use. Informatics applying a range of mathematical, statistical and rule-based approaches can be used to reveal important information on demand from the available data provided at second, minute or hourly intervals. This study aims to combine these two concepts to improve the current water end use disaggregation problem through applying a wide range of most advanced pattern recognition techniques to analyse the concurrent high-resolution water-energy consumption data. The obtained results have shown that recognition accuracies of all end-uses have significantly increased, especially for mechanised categories, including clothes washer, dishwasher and evaporative air cooler where over 95% of events were correctly classified.
Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.
In today's popularity and progress of information technology, the big data set and its analysis are no longer a major conundrum. Now, we could not only use the relevant big data to analysis and emulate the possible status of urban development in the near future, but also provide more comprehensive and reasonable policy implementation basis for government units or decision-makers via the analysis and emulation results as mentioned above. In this research, we set Taipei City as the research scope, and use the relevant big data variables (e.g., population, facility utilization and related social policy ratings) and Analytic Network Process (ANP) approach to implement in-depth research and discussion for the possible reduction of land use in primary and secondary schools of Taipei City. In addition to enhance the prosperous urban activities for the urban public facility utilization, the final results of this research could help improve the efficiency of urban land use in the future. Furthermore, the assessment model and research framework established in this research also provide a good reference for schools or other public facilities land use and adaptive reuse strategies in the future.
Most Virtual Learning Environments do not provide support mechanisms for the integrated planning, construction and follow-up of Instructional Design supported by Learning Analytic results. The present work aims to present an authoring tool that will be responsible for constructing the structure of an Instructional Design (ID), without the data being altered during the execution of the course. The visual interface aims to present the critical situations present in this ID, serving as a support tool for the course follow-up and possible improvements, which can be made during its execution or in the planning of a new edition of this course. The model for the ID is based on High-Level Petri Nets and the visualization forms are determined by the specific kind of the data generated by an e-course, a population of students generating sequentially dependent data.
The Intelligent transportation system is essential to build smarter cities. Machine learning based transportation prediction could be highly promising approach by delivering invisible aspect visible. In this context, this research aims to make a prototype model that predicts transportation network by using big data and machine learning technology. In detail, among urban transportation systems this research chooses bus system. The research problem that existing headway model cannot response dynamic transportation conditions. Thus, bus delay problem is often occurred. To overcome this problem, a prediction model is presented to fine patterns of bus delay by using a machine learning implementing the following data sets; traffics, weathers, and bus statues. This research presents a flexible headway model to predict bus delay and analyze the result. The prototyping model is composed by real-time data of buses. The data are gathered through public data portals and real time Application Program Interface (API) by the government. These data are fundamental resources to organize interval pattern models of bus operations as traffic environment factors (road speeds, station conditions, weathers, and bus information of operating in real-time). The prototyping model is designed by the machine learning tool (RapidMiner Studio) and conducted tests for bus delays prediction. This research presents experiments to increase prediction accuracy for bus headway by analyzing the urban big data. The big data analysis is important to predict the future and to find correlations by processing huge amount of data. Therefore, based on the analysis method, this research represents an effective use of the machine learning and urban big data to understand urban dynamics.
Incorporating Home eNodeB (HeNB) in cellular networks, e.g. Long Term Evolution Advanced (LTE-A), is beneficial for extending coverage and enhancing capacity at low price especially within the non-line-of sight (NLOS) environments such as homes. HeNB or femtocell is a small low powered base station which provides radio coverage to the mobile users in an indoor environment. This deployment results in a heterogeneous network where the available spectrum becomes shared between two layers. Therefore, a problem of Inter Cell Interference (ICI) appears. This issue is the main challenge in LTE-A. To deal with this challenge, various techniques based on frequency, time and power control are proposed. This paper deals with the impact of carrier aggregation and higher order MIMO (Multiple Input Multiple Output) schemes on the LTE-Advanced performance. Simulation results show the advantages of these schemes on the system capacity (4.109 b/s/Hz when bandwidth B=100 MHz and when applying MIMO 8x8 for SINR=30 dB), maximum theoretical peak data rate (more than 4 Gbps for B=100 MHz and when MIMO 8x8 is used) and spectral efficiency (15 b/s/Hz and 30b/s/Hz when MIMO 4x4 and MIMO 8x8 are applying respectively for SINR=30 dB).
Earthquake is an inevitable catastrophic natural disaster. The damages of buildings and man-made structures, where most of the human activities occur are the major cause of casualties from earthquakes. A comparison of optical and SAR data is presented in the case of Kathmandu valley which was hardly shaken by 2015-Nepal Earthquake. Though many existing researchers have conducted optical data based estimated or suggested combined use of optical and SAR data for improved accuracy, however finding cloud-free optical images when urgently needed are not assured. Therefore, this research is specializd in developing SAR based technique with the target of rapid and accurate geospatial reporting. Should considers that limited time available in post-disaster situation offering quick computation exclusively based on two pairs of pre-seismic and co-seismic single look complex (SLC) images. The InSAR coherence pre-seismic, co-seismic and post-seismic was used to detect the change in damaged area. In addition, the ground truth data from field applied to optical data by random forest classification for detection of damaged area. The ground truth data collected in the field were used to assess the accuracy of supervised classification approach. Though a higher accuracy obtained from the optical data then integration by optical-SAR data. Limitation of cloud-free images when urgently needed for earthquak evevent are and is not assured, thus further research on improving the SAR based damage detection is suggested. Availability of very accurate damage information is expected for channelling the rescue and emergency operations. It is expected that the quick reporting of the post-disaster damage situation quantified by the rapid earthquake assessment should assist in channeling the rescue and emergency operations, and in informing the public about the scale of damage.
Finding relevant information on the World Wide Web is becoming highly challenging day by day. Web usage mining is used for the extraction of relevant and useful knowledge, such as user behaviour patterns, from web access log records. Web access log records all the requests for individual files that the users have requested from the website. Web usage mining is important for Customer Relationship Management (CRM), as it can ensure customer satisfaction as far as the interaction between the customer and the organization is concerned. Web usage mining is helpful in improving website structure or design as per the user’s requirement by analyzing the access log file of a website through a log analyzer tool. The focus of this paper is to enhance the accessibility and usability of a guitar selling web site by analyzing their access log through Deep Log Analyzer tool. The results show that the maximum number of users is from the United States and that they use Opera 9.8 web browser and the Windows XP operating system.
Health for all is considered as a sign of well-being and inclusive growth. New healthcare technologies are contributing to the quality of human lives by promoting health education and awareness, leading to the prevention, early diagnosis and treatment of the symptoms of diseases. Healthcare technologies have now migrated from the medical and institutionalized settings to the home and everyday life. This paper explores these new technologies and investigates how they contribute to health education and awareness, promoting the objective of high-value health system for all. The methodology used for the research is literature review. The paper also discusses the opportunities and challenges with futuristic healthcare technologies. The combined advances in genomics medicine, wearables and the IoT with enhanced data collection in electronic health record (EHR) systems, environmental sensors, and mobile device applications can contribute in a big way to high-value health system for all. The promise by these technologies includes reduced total cost of healthcare, reduced incidence of medical diagnosis errors, and reduced treatment variability. The major barriers to adoption include concerns with security, privacy, and integrity of healthcare data, regulation and compliance issues, service reliability, interoperability and portability of data, and user friendliness and convenience of these technologies.
DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
The environmental impact related to ornamental stones (such as marbles and granites) is largely debated. Starting from the industrial revolution, continuous improvements of machineries led to a higher exploitation of this natural resource and to a more international interaction between markets. As a consequence, the environmental impact of the extraction and processing of stones has increased. Nevertheless, if compared with other building materials, ornamental stones are generally more durable, natural, and recyclable. From the scientific point of view, studies on stone life cycle sustainability have been carried out, but these are often partial or not very significant because of the high percentage of approximations and assumptions in calculations. This is due to the lack, in life cycle databases (e.g. Ecoinvent, Thinkstep, and ELCD), of datasets about the specific technologies employed in the stone production chain. For example, databases do not contain information about diamond wires, chains or explosives, materials commonly used in quarries and transformation plants. The project presented in this paper aims to populate the life cycle databases with specific data of specific stone processes. To this goal, the methodology follows the standardized approach of Life Cycle Assessment (LCA), according to the requirements of UNI 14040-14044 and to the International Reference Life Cycle Data System (ILCD) Handbook guidelines of the European Commission. The study analyses the processes of the entire production chain (from-cradle-to-gate system boundaries), including the extraction of benches, the cutting of blocks into slabs/tiles and the surface finishing. Primary data have been collected in Italian quarries and transformation plants which use technologies representative of the current state-of-the-art. Since the technologies vary according to the hardness of the stone, the case studies comprehend both soft stones (marbles) and hard stones (gneiss). In particular, data about energy, materials and emissions were collected in marble basins of Carrara and in Beola and Serizzo basins located in the province of Verbano Cusio Ossola. Data were then elaborated through an appropriate software to build a life cycle model. The model was realized setting free parameters that allow an easy adaptation to specific productions. Through this model, the study aims to boost the direct participation of stone companies and encourage the use of LCA tool to assess and improve the stone sector environmental sustainability. At the same time, the realization of accurate Life Cycle Inventory data aims at making available, to researchers and stone experts, ILCD compliant datasets of the most significant processes and technologies related to the ornamental stone sector.
The dynamic and highly competitive nature of the consumer electronics retail industry means that businesses in this industry are experiencing different decision making challenges in relation to pricing, inventory control, consumer satisfaction and product offerings. To overcome the challenges facing retailers and create opportunities, we propose a generic data warehousing solution which can be applied to a wide range of consumer electronics retailers with a minimum configuration. The solution includes a dimensional data model, a template SQL script, a high level architectural descriptions, ETL tool developed using C#, a set of APIs, and data access tools. It has been successfully applied by ASK Outlets Ltd UK resulting in improved productivity and enhanced sales growth.
This study was designed enable application of multivariate technique in the interpretation of categorical data for measuring health care services satisfaction in Turkey. The data was collected from a total of 17726 respondents. The establishment of the sample group and collection of the data were carried out by a joint team from The Ministry of Health and Turkish Statistical Institute (Turk Stat) of Turkey. The multiple correspondence analysis (MCA) was used on the data of 2882 respondents who answered the questionnaire in full. The multiple correspondence analysis indicated that, in the evaluation of health services females, public employees, younger and more highly educated individuals were more concerned and complainant than males, private sector employees, older and less educated individuals. Overall 53 % of the respondents were pleased with the improvements in health care services in the past three years. This study demonstrates the public consciousness in health services and health care satisfaction in Turkey. It was found that most the respondents were pleased with the improvements in health care services over the past three years. Awareness of health service quality increases with education levels. Older individuals and males would appear to have lower expectancies in health services.
Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.
The increasing amount of collected data has limited the performance of the current analyzing algorithms. Thus, developing new cost-effective algorithms in terms of complexity, scalability, and accuracy raised significant interests. In this paper, a modified effective k-means based algorithm is developed and experimented. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. The algorithm uses the City Block distance and a new stop criterion to guarantee the convergence. Conducted experiments on a real data set show its high performance when compared with the original k-means version.