PART OF SPEECH TAGGING IN ROMANIAN TEXTS

Authors

  • Claudia Circioroaba, Lucian Blaga University of Sibiu
  • Mihai Stancu, Lucian Blaga University of Sibiu
  • Daniel Morariu, Lucian Blaga University of Sibiu
  • Daniel Volovici, Lucian Blaga University of Sibiu

Abstract

Part-of-speech (PoS) tagging is the process of automatically attaching to every word in a sentence a grammatical tag that names its part of speech. Because a word's PoS is not fixed but depends on the context in which the word is used, identifying parts of speech is not a trivial matter. In this paper we consider two tagging methods, both based on Naïve Bayes classifier probabilities and on the context of the word whose PoS must be identified. We call these methods Backward Naïve Bayes and Forward Naïve Bayes. For the Romanian language, we consider seven PoS categories: noun, verb, adjective, adverb, article, preposition, and an "and others" category. Our experiments show that identifying the PoS of a word based on the PoS of the previous word produces better results in all respects. We also studied each PoS separately and concluded that in Romanian, too, some PoS are easier to identify (article, preposition, noun, verb), while the adjective and the adverb are more problematic.
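As an illustration of tagging a word from the PoS of the previous word, the following minimal Python sketch scores each candidate tag as P(tag) · P(word | tag) · P(previous tag | tag), which is the Naïve Bayes independence assumption applied to two features. It is not the authors' implementation: the abstract does not give the exact formulation or say which direction each of the two method names refers to. The toy English corpus, the `START` pseudo-tag, the function names, and the smoothing constant `alpha` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus (hypothetical data, English for readability).
CORPUS = [
    [("the", "article"), ("dog", "noun"), ("runs", "verb"), ("fast", "adverb")],
    [("the", "article"), ("fast", "adjective"), ("dog", "noun"), ("sleeps", "verb")],
    [("a", "article"), ("cat", "noun"), ("sleeps", "verb")],
]
START = "<s>"  # assumed pseudo-tag for the position before the first word


def train(corpus):
    """Count tags, words per tag, and previous tags per tag."""
    tag_count = Counter()
    word_given_tag = defaultdict(Counter)
    prev_given_tag = defaultdict(Counter)
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            tag_count[tag] += 1
            word_given_tag[tag][word] += 1
            prev_given_tag[tag][prev] += 1
            prev = tag
    return tag_count, word_given_tag, prev_given_tag


def predict(word, prev_tag, model, alpha=0.1):
    """Return the tag maximizing P(tag) * P(word|tag) * P(prev_tag|tag)."""
    tag_count, word_given_tag, prev_given_tag = model
    total = sum(tag_count.values())
    vocab = {w for counts in word_given_tag.values() for w in counts}
    contexts = set(tag_count) | {START}
    best_tag, best_score = None, -1.0
    for tag, n in tag_count.items():
        prior = n / total
        # Additive smoothing; the extra +1 slot stands for unseen words.
        p_word = (word_given_tag[tag][word] + alpha) / (n + alpha * (len(vocab) + 1))
        p_prev = (prev_given_tag[tag][prev_tag] + alpha) / (n + alpha * len(contexts))
        score = prior * p_word * p_prev
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


def tag_sentence(words, model):
    """Tag left to right, feeding each predicted tag to the next word."""
    tags, prev = [], START
    for word in words:
        prev = predict(word, prev, model)
        tags.append(prev)
    return tags


model = train(CORPUS)
print(predict("fast", "article", model))  # context: previous word is an article
print(predict("fast", "verb", model))     # context: previous word is a verb
print(tag_sentence(["the", "dog", "runs", "fast"], model))
```

In this toy corpus the word "fast" occurs once as an adjective and once as an adverb, so the word alone cannot decide the tag; the previous tag breaks the tie. A variant that conditions on the tag of the following word would count next tags instead of previous ones and process the sentence right to left.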

Published

2018-03-07

Section

Articles