Open Science Research Excellence

Open Science Index

Commenced in January 2007 Frequency: Monthly Edition: International Publications Count: 29710


10010204
Data Gathering and Analysis for Arabic Historical Documents
Authors:
Abstract:
This paper introduces a new dataset, and the methodology used to generate it, based on a wide range of historical Arabic documents containing clean data and simple, homogeneous page layouts. The experiments are carried out on printed and handwritten documents obtained from major libraries such as the Qatar Digital Library, the British Library and the Library of Congress. We have gathered and annotated 150 archival document images from different locations and time periods, dating from the 17th to the 19th century. The dataset comprises differing page layouts and degradations that challenge text line segmentation methods. Ground truth is produced using the Aletheia tool by PRImA and stored in an XML representation, in the PAGE (Page Analysis and Ground truth Elements) format. The dataset will be readily available to researchers worldwide for research into the problems posed by historical Arabic documents, such as geometric correction.
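As an illustration of how ground truth stored in the PAGE format might be consumed for text line segmentation work, the following is a minimal sketch in Python. It is not the authors' tooling. The embedded sample file, its identifiers (`r1`, `l1`, `l2`), the image file name and the coordinates are hypothetical, and the namespace shown is one published PAGE schema version that may differ from the one used in the dataset. The sketch assumes line outlines are stored in a `points` attribute on `Coords`, as in recent PAGE schema versions, and matches elements by local name so that it does not depend on the schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature PAGE file; real files would be exported from Aletheia.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="sample_page.tif" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="l1">
        <Coords points="120,110 1880,110 1880,240 120,240"/>
      </TextLine>
      <TextLine id="l2">
        <Coords points="120,260 1880,260 1880,390 120,390"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_text_lines(xml_text):
    """Map each TextLine id to its outline polygon."""
    root = ET.fromstring(xml_text)
    lines = {}
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Only the direct Coords child describes the line outline.
        for child in elem:
            if local_name(child.tag) == "Coords" and child.get("points"):
                lines[elem.get("id")] = parse_points(child.get("points"))
                break
    return lines


def bounding_box(polygon):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    for line_id, polygon in read_text_lines(SAMPLE).items():
        print(line_id, bounding_box(polygon))
```

Polygons read this way could be compared against the output of a line segmentation method, for example by rasterising both and measuring their overlap.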
Digital Object Identifier (DOI):
