Update to the Forum Resources, Polish

Discussion in 'Polski (Polish)' started by noychoh, Dec 29, 2013.

  1. noychoh New Member

    Hello everybody, especially the Moderators,

    I have seen that the thread "Forum rules and resources" for Polish (now closed) is quite old (2005), yet it was edited quite recently (2013), by one of the Forum Moderators, I suppose. Therefore, please consider adding a very important link to its first section:

    http://korpus.pwn.pl/ - Korpus języka polskiego - Corpus of the Polish language

    Each search returns up to 300 examples of real usage of the searched word, shown in context. The word is given in all the inflected forms found. The full database holds ca. 40 million words taken from books, the press, websites, e-mails, real-life audio recordings etc., so the sentences reflect present-day usage. The free-access database is limited to 7.5 million words, which is still quite a lot. The interface is available in Polish and in English. Unfortunately, errors in the original texts are reproduced in the database.

    Some advice from my experience. When searching, set the following:

    * Search results: 300 hits (maximum), with 100 per page (maximum).
    * Citation length: 20 (even better: 30) words to the left and 20 words to the right (if you accept the default value of 10, you will often miss the beginning and the end of the sentence).
    * Corpus source texts: choose "entire material", not "diversified" (the default value). This choice gives you the best results. The default value and the third value, "Rzeczpospolita newspaper items", make sense only if you are told that the search has given more than 300 results, because then you can see different sets of answers. Otherwise, what you obtain with those two options are merely subsets of the "entire material" results.
    * Sort: choose "word to right".
      - Sorting by the "search word" (the default value) is mostly useless, because it is usually one and the same word, unless you are looking for different usages related to various inflected forms (e.g. a search for "Polska" will give you the sentences sorted by "Polsce, polska/Polska, Polską, Polskę, polski/Polski, polskich, polskie, polskiego, polskiej, polskim").
      - Sorting by the "word to left" is completely useless, as it sorts by the first word of the citation, i.e. the word 20 words to the left of the search word (most often a word in the middle of a sentence, of no relevance to your search).

    All the best,
     
  2. Thomas1 Senior Member

    polszczyzna warszawska
    I'd also recommend adding the National Corpus of the Polish Language found at: pl, en.
    You will find all the information about it on the website. I'll only add that it is completely free.
     
