Fast Database Indexing for Large Protein Sequence Collections Using Parallel N-Gram Transformation Algorithm
 R. Bader, "OpenMP - Parallel programming on shared memory systems
": Leibniz-rechenzentrum. Retrieved September 14,2008 from:
 B. Barney, "Introduction to Parallel Computing." vol. 2008: Livermore
Computing,National Laboratory. Retrieved July 4, 2008 from:
 O. Beng Chin, P. Hwee Hwa, W. Hao, W. Limsoon, and Y. Cui, "Fast
filter-and-refine algorithms for subsequence selection," in Database
Engineering and Applications Symposium, 2002. Proceedings.
International, 2002, pp. 243-254.
 A. Califano and I. Rigoutsos, "FLASH: a fast look-up algorithm for
string homology," in Computer Vision and Pattern Recognition, 1993.
Proceedings CVPR '93., 1993 IEEE Computer Society Conference on,
New York, NY, USA, 1993, pp. 353-359.
 G. Cooper, M. Raymer, T. Doom, D. Krane, and N. Futamura, "Indexing
genomic databases," in Bioinformatics and Bioengineering, 2004. BIBE
2004. Proceedings. Fourth IEEE Symposium on, 2004, pp. 587-591.
 Q. Cory, "Introduction to programming shared-memory and distributedmemory
parallel computers," Crossroads, vol. 8, pp. 16-22, 2002.
 H. Ela, P. A. Malcolm, and W. I. Robert, "A Database Index to Large
Biological Sequences," in Proceedings of the 27th International
Conference on Very Large Data Bases: Morgan Kaufmann Publishers
 H. Ela, P. A. Malcolm, and W. I. Robert, "Database indexing for large
DNA and protein sequence collections," The VLDB Journal, vol. 11, pp.
 Z. Elberrichi and B. Aljohar, "N-grams in Texts Categorization,"
Scientific Journal of King Faisal University (Basic and Applied
Sciences), vol. Vol.8 No.2, pp. 25-38, 2007.
 C. Fondrat and P. Dessen, "A rapid access motif database (RAMdb) with
a search algorithm for the retrieval patterns in nucleic acids or protein
databanks," Comput. Appl. Biosci., vol. 11, pp. 273-279, June 1, 1995
 I. Foster, Designing and Building Parallel Programs. Boston, MA, USA:
Addison-Wesley Longman Publishing Co., Inc, 1995.
 A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel
Computing, Second Edition ed.: Addison Wesley, 2003.
 U. Hobohm and C. Sander, "A Sequence Property Approach to
Searching Protein Databases," Journal of Molecular Biology, vol. 251,
pp. 390-399, 1995.
 L. Hsiao Ping, T. Yin Te, S. Ching Hua, S. Tzu Fang, and T. Chuan Yi,
"An IDC-based algorithm for efficient homology filtration with
guaranteed seriate coverage," in Bioinformatics and Bioengineering,
2004. BIBE 2004. Proceedings. Fourth IEEE Symposium on, 2004, pp.
 M. Huerta, F. Haseltine, and Y. Liu. vol. 2008: National Institute of
Mental Health (NIH) , Biomedical information science and technology
initiative (BISTI), Retrieved February 6, 2008, from:
 M. N. Hwang and J. Kim, "Protein Sequence Search based on N-gram
Indexing," Bioinformatics and Biosystems, vol. Vol. 1, pp. 53-57, 2006.
 X. Jiang, P. Zhang, X. Liu, and S. S. T. Yau, "Survey on index based
homology search algorithms," Springer Science and Business /media, pp.
40:185-212, 23 March 2007 2007.
 T. Kahveci and A. K. Singh, "An Efficient Index Structure for String
Databases," in Proceedings of the37th VLDB conference, Roma, Italy,
2001, pp. 351--360.
 T. Kahveci and A. K. Singh, "MAP: searching large genome databases,"
in Pacific Symposium on Biocomputing, Hawaii, 2003, pp. 303-314.
 W. J. Kent, "BLAT---The BLAST-Like Alignment Tool," Genome Res.,
pp. GR-2292R, March 20, 2002.
 M. S. Kim, K. Y. Whang, J. G. Lee, and M. J. Lee, "n-Gram/2L: A
Space and Time Efficient Two-Level n-Gram Inverted Index Structure,"
in VLDB, Trondheim, Norway, 2005, pp. 325-336.
 C. D. Manning, P. Raghavan, and H. Schutze, Introduction to
Information Retrieval: Cambridge University Press, 2008.
 S. Microsystems, "Multithreaded programming guide," Sun
Microsystems, Inc Business. Retrieved September 14,2008 from:
 Z. B. Miled, N. Li, M. Mahoui, and O. Bukhres, "Information Retrieval
in Biomedical Research," Wiley encyclopedia of biomedical
 Z. Ning, A. J. Cox, and J. C. Mullikin, "SSAHA: A Fast Search Method
for Large DNA Databases," Genome Res., vol. 11, pp. 1725-1729,
October 1, 2001 2001.
 T. H. Ong, K. L. Tan, and H. Wang, "Indexing Genomic Databases for
Fast Homology Searching," in Proceedings of the 13th International
Conference on Database and Expert Systems Applications, 2002.
 B. Parhami, Introduction to Parallel Processing Algorithms and
Architectures New York: Kluwer Academic Publishers, 2002.
 C. Weimin and K. Aberer, "Efficient querying on genomic databases by
using metric space indexing techniques," in Proceedings of the 8th
International Workshop on Database and Expert Systems Applications:
IEEE Computer Society, 1997.
 H. Williams and J. Zobel, "Indexing nucleotide databases for fast query
evaluation," in Advances in Database Technology ÔÇö EDBT '96, 1996,
 H. E. WILLIAMS, "Effective query filtering for fast homology
searching," Pacific Symposium on Biocomputing, pp. 214 - 225 1999.
 H. E. Williams and J. Zobel, "Indexing and retrieval for genomic
databases," Knowledge and Data Engineering, IEEE Transactions on,
vol. 14, pp. 63-78, 2002.
 C. Xia, L. Shuai Cheng, O. Beng Chin, and K. H. T. Anthony, "Piers: an
efficient model for similarity search in DNA sequence databases,"
SIGMOD Rec., vol. 33, pp. 39-44, 2004.
 L. Yip Chi and B. Kao, "A study on n-gram indexing of musical
features," in International Conference on Multimedia and Expo, 2000
IEEE New York, NY, USA, 2000, pp. 869-872 vol.2.