Mathematical modeling of the frequencies of words of different lengths in written Hindi language corpora and examination of the role of texts’ stylistic factor in model’s parameters

Authors

  • Hemlata Pande, Dept. of Mathematics, G.P.G.C. Bageshwar, Uttarakhand
  • Hoshiyar S. Dhami, Uttarakhand Residential University, Almora Uttarakhand

DOI:

https://doi.org/10.1558/jrds.33107

Keywords:

Word length, model, 2-Poisson, classification, Hindi

Abstract

In quantitative research on language and linguistics, linguistic features are first specified and counted, and statistical models are then constructed to explain the observed facts. In this paper, we attempt to represent the pattern of occurrence of words of different lengths in various corpora of the Hindi language as a mathematical model. We then examine whether the parameters of the model fitted to a particular text depend on the type of text, using texts selected from two categories: media/essay and creative writing. In other words, we test whether the model's parameters can be applied to text classification.

Author Biographies

  • Hemlata Pande, Dept. of Mathematics, G.P.G.C. Bageshwar, Uttarakhand

    Dr. Hemlata Pande is currently affiliated with the Department of Mathematics, Govt. P. G. College Bageshwar, Uttarakhand, India.

  • Hoshiyar S. Dhami, Uttarakhand Residential University, Almora Uttarakhand

    Prof. H. S. Dhami, M.Sc., Ph.D., is Vice Chancellor of Uttarakhand Residential University, Almora, Uttarakhand, India.

Published

2018-02-28

Issue

Section

Articles

How to Cite

Pande, H., & Dhami, H. S. (2018). Mathematical modeling of the frequencies of words of different lengths in written Hindi language corpora and examination of the role of texts’ stylistic factor in model’s parameters. Journal of Research Design and Statistics in Linguistics and Communication Science, 4(1), 73-87. https://doi.org/10.1558/jrds.33107
