Open Science Research Excellence

Open Science Index

Commenced in January 2007 Frequency: Monthly Edition: International Paper Count: 14

A Consideration of the Achievement of Productive Level Parallel Programming Skills

This paper gives a consideration of the achievement of productive level parallel programming skills, based on the data of the graduation studies in the Polytechnic University of Japan. The data show that most students can achieve only parallel programming skills during the graduation study (about 600 to 700 hours), if the programming environment is limited to GPGPUs. However, the data also show that it is a very high level task that a student achieves productive level parallel programming skills during only the graduation study. In addition, it shows that the parallel programming environments for GPGPU, such as CUDA and OpenCL, may be more suitable for parallel computing education than other environments such as MPI on a cluster system and Cell.B.E. These results must be useful for the areas of not only software developments, but also hardware product developments using computer technologies.

Parallelization of Ensemble Kalman Filter (EnKF) for Oil Reservoirs with Time-lapse Seismic Data

In this paper we describe the design and implementation of a parallel algorithm for data assimilation with ensemble Kalman filter (EnKF) for oil reservoir history matching problem. The use of large number of observations from time-lapse seismic leads to a large turnaround time for the analysis step, in addition to the time consuming simulations of the realizations. For efficient parallelization it is important to consider parallel computation at the analysis step. Our experiments show that parallelization of the analysis step in addition to the forecast step has good scalability, exploiting the same set of resources with some additional efforts.

Performance Comparison of Parallel Sorting Algorithms on the Cluster of Workstations

Sorting appears the most attention among all computational tasks over the past years because sorted data is at the heart of many computations. Sorting is of additional importance to parallel computing because of its close relation to the task of routing data among processes, which is an essential part of many parallel algorithms. Many parallel sorting algorithms have been investigated for a variety of parallel computer architectures. In this paper, three parallel sorting algorithms have been implemented and compared in terms of their overall execution time. The algorithms implemented are the odd-even transposition sort, parallel merge sort and parallel rank sort. Cluster of Workstations or Windows Compute Cluster has been used to compare the algorithms implemented. The C# programming language is used to develop the sorting algorithms. The MPI (Message Passing Interface) library has been selected to establish the communication and synchronization between processors. The time complexity for each parallel sorting algorithm will also be mentioned and analyzed.

Bridged Quantum Cellular Automata based on Si/SiO2 Superlattices
The new architecture for quantum cellular automata is offered. A QCA cell includes two layers nc-Si, divided by a dielectric. Among themselves cells are connected by the bridge from a conductive material. The comparison is made between this and QCA, offered earlier by C. Lent's group.
High Performance in Parallel Data Integration: An Empirical Evaluation of the Ratio Between Processing Time and Number of Physical Nodes

Many studies have shown that parallelization decreases efficiency [1], [2]. There are many reasons for these decrements. This paper investigates those which appear in the context of parallel data integration. Integration processes generally cannot be allocated to packages of identical size (i. e. tasks of identical complexity). The reason for this is unknown heterogeneous input data which result in variable task lengths. Process delay is defined by the slowest processing node. It leads to a detrimental effect on the total processing time. With a real world example, this study will show that while process delay does initially increase with the introduction of more nodes it ultimately decreases again after a certain point. The example will make use of the cloud computing platform Hadoop and be run inside Amazon-s EC2 compute cloud. A stochastic model will be set up which can explain this effect.

An Enhanced Distributed System to improve theTime Complexity of Binary Indexed Trees

Distributed Computing Systems are usually considered the most suitable model for practical solutions of many parallel algorithms. In this paper an enhanced distributed system is presented to improve the time complexity of Binary Indexed Trees (BIT). The proposed system uses multi-uniform processors with identical architectures and a specially designed distributed memory system. The analysis of this system has shown that it has reduced the time complexity of the read query to O(Log(Log(N))), and the update query to constant complexity, while the naive solution has a time complexity of O(Log(N)) for both queries. The system was implemented and simulated using VHDL and Verilog Hardware Description Languages, with xilinx ISE 10.1, as the development environment and ModelSim 6.1c, similarly as the simulation tool. The simulation has shown that the overhead resulting by the wiring and communication between the system fragments could be fairly neglected, which makes it applicable to practically reach the maximum speed up offered by the proposed model.

Fast Database Indexing for Large Protein Sequence Collections Using Parallel N-Gram Transformation Algorithm
With the rapid development in the field of life sciences and the flooding of genomic information, the need for faster and scalable searching methods has become urgent. One of the approaches that were investigated is indexing. The indexing methods have been categorized into three categories which are the lengthbased index algorithms, transformation-based algorithms and mixed techniques-based algorithms. In this research, we focused on the transformation based methods. We embedded the N-gram method into the transformation-based method to build an inverted index table. We then applied the parallel methods to speed up the index building time and to reduce the overall retrieval time when querying the genomic database. Our experiments show that the use of N-Gram transformation algorithm is an economical solution; it saves time and space too. The result shows that the size of the index is smaller than the size of the dataset when the size of N-Gram is 5 and 6. The parallel N-Gram transformation algorithm-s results indicate that the uses of parallel programming with large dataset are promising which can be improved further.
An Innovational Intermittent Algorithm in Networks-On-Chip (NOC)
Every day human life experiences new equipments more automatic and with more abilities. So the need for faster processors doesn-t seem to finish. Despite new architectures and higher frequencies, a single processor is not adequate for many applications. Parallel processing and networks are previous solutions for this problem. The new solution to put a network of resources on a chip is called NOC (network on a chip). The more usual topology for NOC is mesh topology. There are several routing algorithms suitable for this topology such as XY, fully adaptive, etc. In this paper we have suggested a new algorithm named Intermittent X, Y (IX/Y). We have developed the new algorithm in simulation environment to compare delay and power consumption with elders' algorithms.
A Parallel Quadtree Approach for Image Compression using Wavelets
Wavelet transforms are multiresolution decompositions that can be used to analyze signals and images. Image compression is one of major applications of wavelet transforms in image processing. It is considered as one of the most powerful methods that provides a high compression ratio. However, its implementation is very time-consuming. At the other hand, parallel computing technologies are an efficient method for image compression using wavelets. In this paper, we propose a parallel wavelet compression algorithm based on quadtrees. We implement the algorithm using MatlabMPI (a parallel, message passing version of Matlab), and compute its isoefficiency function, and show that it is scalable. Our experimental results confirm the efficiency of the algorithm also.
Performance Analysis of Parallel Client-Server Model Versus Parallel Mobile Agent Model
Mobile agent has motivated the creation of a new methodology for parallel computing. We introduce a methodology for the creation of parallel applications on the network. The proposed Mobile-Agent parallel processing framework uses multiple Javamobile Agents. Each mobile agent can travel to the specified machine in the network to perform its tasks. We also introduce the concept of master agent, which is Java object capable of implementing a particular task of the target application. Master agent is dynamically assigns the task to mobile agents. We have developed and tested a prototype application: Mobile Agent Based Parallel Computing. Boosted by the inherited benefits of using Java and Mobile Agents, our proposed methodology breaks the barriers between the environments, and could potentially exploit in a parallel manner all the available computational resources on the network. This paper elaborates performance issues of a mobile agent for parallel computing.
On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel

This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared memory parallel mode we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.

Optimal All-to-All Personalized Communication in All-Port Tori

All-to-all personalized communication, also known as complete exchange, is one of the most dense communication patterns in parallel computing. In this paper, we propose new indirect algorithms for complete exchange on all-port ring and torus. The new algorithms fully utilize all communication links and transmit messages along shortest paths to completely achieve the theoretical lower bounds on message transmission, which have not be achieved among other existing indirect algorithms. For 2D r × c ( r % c ) all-port torus, the algorithm has time complexities of optimal transmission cost and O(c) message startup cost. In addition, the proposed algorithms accommodate non-power-of-two tori where the number of nodes in each dimension needs not be power-of-two or square. Finally, the algorithms are conceptually simple and symmetrical for every message and every node so that they can be easily implemented and achieve the optimum in practice.

A Systematic Construction of Instability Bounds in LIS Networks

In this work, we study the impact of dynamically changing link slowdowns on the stability properties of packetswitched networks under the Adversarial Queueing Theory framework. Especially, we consider the Adversarial, Quasi-Static Slowdown Queueing Theory model, where each link slowdown may take on values in the two-valued set of integers {1, D} with D > 1 which remain fixed for a long time, under a (w, p)-adversary. In this framework, we present an innovative systematic construction for the estimation of adversarial injection rate lower bounds, which, if exceeded, cause instability in networks that use the LIS (Longest-in- System) protocol for contention-resolution. In addition, we show that a network that uses the LIS protocol for contention-resolution may result in dropping its instability bound at injection rates p > 0 when the network size and the high slowdown D take large values. This is the best ever known instability lower bound for LIS networks.

Architecture Based on Dynamic Graphs for the Dynamic Reconfiguration of Farms of Computers

In the last years, the computers have increased their capacity of calculus and networks, for the interconnection of these machines. The networks have been improved until obtaining the actual high rates of data transferring. The programs that nowadays try to take advantage of these new technologies cannot be written using the traditional techniques of programming, since most of the algorithms were designed for being executed in an only processor,in a nonconcurrent form instead of being executed concurrently ina set of processors working and communicating through a network.This paper aims to present the ongoing development of a new system for the reconfiguration of grouping of computers, taking into account these new technologies.

Vol:13 No:05 2019Vol:13 No:04 2019Vol:13 No:03 2019Vol:13 No:02 2019Vol:13 No:01 2019
Vol:12 No:12 2018Vol:12 No:11 2018Vol:12 No:10 2018Vol:12 No:09 2018Vol:12 No:08 2018Vol:12 No:07 2018Vol:12 No:06 2018Vol:12 No:05 2018Vol:12 No:04 2018Vol:12 No:03 2018Vol:12 No:02 2018Vol:12 No:01 2018
Vol:11 No:12 2017Vol:11 No:11 2017Vol:11 No:10 2017Vol:11 No:09 2017Vol:11 No:08 2017Vol:11 No:07 2017Vol:11 No:06 2017Vol:11 No:05 2017Vol:11 No:04 2017Vol:11 No:03 2017Vol:11 No:02 2017Vol:11 No:01 2017
Vol:10 No:12 2016Vol:10 No:11 2016Vol:10 No:10 2016Vol:10 No:09 2016Vol:10 No:08 2016Vol:10 No:07 2016Vol:10 No:06 2016Vol:10 No:05 2016Vol:10 No:04 2016Vol:10 No:03 2016Vol:10 No:02 2016Vol:10 No:01 2016
Vol:9 No:12 2015Vol:9 No:11 2015Vol:9 No:10 2015Vol:9 No:09 2015Vol:9 No:08 2015Vol:9 No:07 2015Vol:9 No:06 2015Vol:9 No:05 2015Vol:9 No:04 2015Vol:9 No:03 2015Vol:9 No:02 2015Vol:9 No:01 2015
Vol:8 No:12 2014Vol:8 No:11 2014Vol:8 No:10 2014Vol:8 No:09 2014Vol:8 No:08 2014Vol:8 No:07 2014Vol:8 No:06 2014Vol:8 No:05 2014Vol:8 No:04 2014Vol:8 No:03 2014Vol:8 No:02 2014Vol:8 No:01 2014
Vol:7 No:12 2013Vol:7 No:11 2013Vol:7 No:10 2013Vol:7 No:09 2013Vol:7 No:08 2013Vol:7 No:07 2013Vol:7 No:06 2013Vol:7 No:05 2013Vol:7 No:04 2013Vol:7 No:03 2013Vol:7 No:02 2013Vol:7 No:01 2013
Vol:6 No:12 2012Vol:6 No:11 2012Vol:6 No:10 2012Vol:6 No:09 2012Vol:6 No:08 2012Vol:6 No:07 2012Vol:6 No:06 2012Vol:6 No:05 2012Vol:6 No:04 2012Vol:6 No:03 2012Vol:6 No:02 2012Vol:6 No:01 2012
Vol:5 No:12 2011Vol:5 No:11 2011Vol:5 No:10 2011Vol:5 No:09 2011Vol:5 No:08 2011Vol:5 No:07 2011Vol:5 No:06 2011Vol:5 No:05 2011Vol:5 No:04 2011Vol:5 No:03 2011Vol:5 No:02 2011Vol:5 No:01 2011
Vol:4 No:12 2010Vol:4 No:11 2010Vol:4 No:10 2010Vol:4 No:09 2010Vol:4 No:08 2010Vol:4 No:07 2010Vol:4 No:06 2010Vol:4 No:05 2010Vol:4 No:04 2010Vol:4 No:03 2010Vol:4 No:02 2010Vol:4 No:01 2010
Vol:3 No:12 2009Vol:3 No:11 2009Vol:3 No:10 2009Vol:3 No:09 2009Vol:3 No:08 2009Vol:3 No:07 2009Vol:3 No:06 2009Vol:3 No:05 2009Vol:3 No:04 2009Vol:3 No:03 2009Vol:3 No:02 2009Vol:3 No:01 2009
Vol:2 No:12 2008Vol:2 No:11 2008Vol:2 No:10 2008Vol:2 No:09 2008Vol:2 No:08 2008Vol:2 No:07 2008Vol:2 No:06 2008Vol:2 No:05 2008Vol:2 No:04 2008Vol:2 No:03 2008Vol:2 No:02 2008Vol:2 No:01 2008
Vol:1 No:12 2007Vol:1 No:11 2007Vol:1 No:10 2007Vol:1 No:09 2007Vol:1 No:08 2007Vol:1 No:07 2007Vol:1 No:06 2007Vol:1 No:05 2007Vol:1 No:04 2007Vol:1 No:03 2007Vol:1 No:02 2007Vol:1 No:01 2007