Morphological/Inflection/Lemmatization Engine for Croatian language

"text-hr" is a Morphological/Inflectional/Lemmatization Engine for the Croatian
language, written in the Python programming language. It includes a
Part-Of-Speech tagging engine (POS tagging) based on inverse inflection.
Since the API is not frozen, this project is still in alpha.

TAGS

Croatian language, lemmatization, stemming, inflection, python, natural
language processing (NLP), Part-of-speech (POS) tagging, stopwords, inverse
inflection

Hrvatski jezik, lematizacija, Python biblioteka, morfologija, infleksija,
obrnuta infleksija, prepoznavanje vrsta riječi, računalna obrada govornog
jezika, zaustavne riječi, morfološki leksikon

AUTHOR

Robert Lujo, Zagreb, Croatia, find mail address in LICENCE

FEATURES

To name the most important:

- Inflection system - for producing all forms of one word
- Detection of word types (POS tagging) - from an existing list of word forms
- The system is based on unicode strings, with a default codepage to convert from and to

Installation instructions - if you have the pip package installed:

    pip install text-hr

If not, then do it the old-fashioned way.

GETTING STARTED

There are three important parts that this project provides: the inflection
system, detection of word types (POS tagging), and the stopword list.

Usage example - start python shell:

    >>> from text_hr import Verb

Detection of word types (POS tagging)

TODO: to be done - check test_detect.txt for samples, and detect.py for the logic.

First example in test_detect.txt:

    >>> from text_hr.detect import WordTypeRecognizerExample
    >>> def test_it(word_list, wt_filter=None, level=2):
    ...     wdh = WordTypeRecognizerExample(word_list, silent=True)
    ...     wdh.detect(wt_filter=wt_filter, level=level)
    ...     # lines_file is the output collector set up in test_detect.txt
    ...     wdh.dump_result(lines_file)
    ...     return wdh
    >>> word_list = [
    ...     # earlier entries of the sample list are not shown here
    ...     "brijestovi 1",  # the only one checked with endswith; all others are checked with get_freq
    ... ]
    >>> wdh = test_it(word_list, level=4) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS

The list of word forms is located in std_words.txt, and you can read it
directly from there.

The list can be updated like this:

    >>> import text_hr
    >>> # the call that dumps the list is not shown here; it reports:
    Totaly 2904 word forms dumped to r:\hg-clones\python\text-hr\text_hr\std_words.txt in codepage utf8

Iteration over all words goes like this:

    from text_hr import get_all_std_words
    for word_base, l_key, cnt, _suff_id, wform_key, wform in get_all_std_words():
        # each item describes one word form; print the raw fields
        print(word_base, l_key, cnt, _suff_id, wform_key, wform)
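
Since the word-form list is dumped to std_words.txt in UTF-8, it can also be used without importing the library. The following is a minimal sketch of one such use, filtering the listed word forms out of a token sequence. It assumes the file holds one word form per line, which is an assumption and not a documented format. The helper names `load_word_forms` and `remove_listed_words` and the sample entries are hypothetical and are not part of text_hr.

```python
import os
import tempfile


def load_word_forms(path, encoding="utf-8"):
    """Return the set of non-empty, stripped lines of a text file.

    Assumes one word form per line (an assumption about the file layout).
    """
    with open(path, encoding=encoding) as handle:
        return {line.strip() for line in handle if line.strip()}


def remove_listed_words(tokens, word_forms):
    """Keep only the tokens whose lowercase form is not in word_forms."""
    return [token for token in tokens if token.lower() not in word_forms]


# Hypothetical sample data standing in for the real std_words.txt.
sample_forms = ["i", "je", "u", "na"]
fd, sample_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(sample_forms) + "\n")

word_forms = load_word_forms(sample_path)
filtered = remove_listed_words(["Brijest", "je", "u", "šumi"], word_forms)
os.remove(sample_path)
print(filtered)
```

A set is used for the lookup so that each membership test is constant time, which matters once the list holds a few thousand word forms. Lowercasing the token before the lookup is a design choice of this sketch; whether the real list is case-normalized is not stated.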