Learning tips: Learning Chinese characters

twenty6

Senior Member
English - U.S., Chinese - Mandarin
Knowledge of 3,000 words would be nowhere near enough to be able to read a newspaper.
Then according to you, one would need around 4-5000 words to read a newspaper, no? If one character=one word then my estimate is correct by your standards.

Perhaps my interpretation of "characters" is wrong, but my point is that 3000 characters is not enough to understand a newspaper article.

I myself, when I knew about some 3000 words (some 4 years ago), could roughly understand some newspaper articles. As said, it depends on word choice.
 
  • radagasty

    Senior Member
    Australia, Cantonese
    twenty6 said:
    Then according to you, one would need around 4-5000 words to read a newspaper, no? If one character=one word then my estimate is correct by your standards. Perhaps my interpretation of "characters" is wrong, but my point is that 3000 characters is not enough to understand a newspaper article.

    I think it is rather the definition of ‘word’ that might be problematic. (And indeed, the notion of a word is not well-defined in linguistics.)

    Character = 字, and is a unit of writing, usually rather well defined, since each character typically gets its own square space in print. Thus, 檳, 榔, 樂, 樹 and 懶 are all characters. 樹 and 懶 are themselves also words, meaning ‘tree’ and ‘lazy’ respectively, as is the combination 樹懶 ‘sloth’. Then, there are characters that can represent more than one word, e.g., 樂, which is used to write three different words: ‘happy’, ‘music’ and ‘to love’. Finally, there are characters that do not by themselves represent words, e.g., 檳 and 榔, because each one has no meaning by itself; together, though, they constitute a disyllabic word 檳榔 meaning ‘betel’, and 檳榔樹 is a trisyllabic word meaning ‘betel palm’.

    Thus, it is not really the case that ‘one character=one word’. Most characters do constitute words by themselves, but there are many more words that are composed of two or more characters. My point was that 3,000 characters may be sufficient to read a newspaper (at a basic level), but one would need to know many more words (formed from these characters), certainly no fewer than 10,000.
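As a rough illustration of the distinction above, the following Python sketch counts the distinct characters and the words in the handful of examples from this post. The word list is only those examples, not real vocabulary data, and the variable names are made up for the illustration.

```python
# Words mentioned in the post above (illustrative only, not a real lexicon).
words = ["樹", "懶", "樹懶", "檳榔", "檳榔樹"]

# Every distinct character used to write those words.
characters = sorted({ch for word in words for ch in word})

# Characters that also stand alone as a word in this list.
standalone = [ch for ch in characters if ch in words]

print(len(characters), "characters are enough to write", len(words), "words")
print("characters that are words by themselves:", standalone)
```

Even in this tiny sample there are more words than characters, and 檳 and 榔 never appear as words on their own.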

    twenty6 said:
    I myself, when I knew about some 3000 words (some 4 years ago), could roughly understand some newspaper articles.

    May I ask how you measured the number of words you knew? I mean, I have no idea how many words I know, either in English or in Chinese. And if, as your tag indicates, you are a native speaker of Chinese, you should certainly have known/know a lot more than 3,000 words. I suppose (and this is just a stab in the dark) that a child starting primary school would already have learnt that many words, even if he doesn't know how to write them.
     

    twenty6

    Senior Member
    English - U.S., Chinese - Mandarin
    radagasty said:
    Character = 字, and is a unit of writing, usually rather well defined, since each character typically gets its own square space in print.
    I see. Never paid much attention to specific technicalities :(

    Now: let's say one needs 10,000 characters to read a newspaper.

    If (this is a guess) 1/2 of the words were one character, 1/3 were two character, and 1/6 were three character, that would be 166 characters for 100 words, which means for a 10,000 character newspaper one would need to know 6024 words. That's too much. I'm not even sure if my Chinese dictionary has 7000 words. I'd put my estimate at 5000 words.
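The arithmetic in this guess can be checked with a short Python sketch. The proportions are the guess stated above, not measured data, and the variable names are made up for the illustration. Dividing a text's length by the average word length gives the number of running words in that text, which is a different quantity from the number of distinct words a reader has to know.

```python
from fractions import Fraction

# Guessed share of words by length in characters (from the post, not measured).
share_by_length = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# Average number of characters per word under that guess.
avg_chars_per_word = sum(length * share for length, share in share_by_length.items())

# Running words in a 10,000-character text at that average word length.
text_length = 10_000
running_words = text_length / avg_chars_per_word

print("characters per 100 words:", float(avg_chars_per_word * 100))
print("running words in the text:", float(running_words))
```

With exact fractions the average is 5/3 characters per word, so the text holds 6,000 running words; the 6024 figure comes from rounding the average down to 1.66.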
    radagasty said:
    May I ask how you measured the number of words you knew?
    There are estimates that one needs 2000-3000 words for daily life in China; I can make conversation in Chinese fairly easily, so a 3000 word estimate is probably not that inaccurate.

    Of course, it is hard to estimate such things.
     

    dojibear

    Senior Member
    AE (US English)
    radagasty said:
    My point was that 3,000 characters may be sufficient to read a newspaper (at a basic level), but one would need to know many more words (formed from these characters), certainly no fewer than 10,000.
    That sounds realistic to me.

    And that points out the major difference between native speakers and foreign learners. A native speaker is fluent in the spoken language by age 5 or 6. They may already know 10,000 words. What they learn in school is how to read and write those words, using fewer than 3,000 characters.

    A foreigner doesn't know those words. For them, 2,000 characters may be 5,000 words, but it isn't 10,000 words. And it isn't the most common words in newspapers.

    For example, I learned 天 long ago. Later, I learned 10 or so other 天 words. But my Chinese program (Wenlin 4) shows hundreds of words using 天. I don't know those words. I might guess that 天气 means "weather", and it's clear that 今天 means "today", but if I see 天井 in a newspaper, I won't know it means "courtyard".

    radagasty said:
    I think it is rather the definition of ‘word’ that might be problematic.
    The definitions seem similar in English and Mandarin. In Mandarin writing, each character is a syllable. Most of the characters are also 1-syllable words, but 80% of Mandarin words are 2-syllable words.
     

    Yichen

    Senior Member
    Chinese
    I agree with dojibear.
    Chinese characters have a strong ability to coin new words, and this is at least one place where Chinese and English differ. In many cases, a character may be either just a character or a word in its own right.
    We can use a character to construct a long list of words in most cases, but it does not necessarily mean we know them exactly. To be frank, I feel it easier to guess or figure out the meaning of a Chinese word than an English one.

    This is a link for words containing 天: 包含天的字词 ("Characters and words containing 天")
    (I believe some words in the link are nonsensical, though)

    Here is "天井". It is not often seen nowadays.
     

    Attachments

    • 天井.jpg

    radagasty

    Senior Member
    Australia, Cantonese
    twenty6 said:
    Now: let's say one needs 10,000 characters to read a newspaper. If (this is a guess) 1/2 of the words were one character, 1/3 were two character, and 1/6 were three character, that would be 166 characters for 100 words, which means for a 10,000 character newspaper one would need to know 6024 words. That's too much. I'm not even sure if my Chinese dictionary has 7000 words. I'd put my estimate at 5000 words.

    This makes zero sense to me. Can anyone else make heads or tails of these calculations?

    In any case, unless it is for learners or primary-school students, I find it hard to believe that your Chinese dictionary has so few entries. When it comes to Chinese dictionaries, one must distinguish between character dictionaries (字典) and word dictionaries (辭典). The former are focussed on characters alone, and usually only list examples of words using the character. The latter list words, usually grouped under their first character, and do try to be exhaustive (commensurate with the overall size of the dictionary).

    The hand-sized bilingual dictionary that I keep on my desk (遠東漢英大辭典–簡明本) lists, according to the front matter, some 120,000 entries under 7,331 characters. Now, granted, not all of these entries are ‘words’ as such, as the dictionary also includes expressions and 成語, but the vast majority of them are, and I would say that there are at least 100,000 words in the dictionary. Note that this is an abridged version of the full dictionary, which would have many more entries (and which I also own, but don't have to hand at the moment). And because this dictionary doesn't have Cantonese readings, I also use a pocket-sized character dictionary (中文字典) published by the 香港華僑語文出版社, ostensibly compiled for the children of Chinese emigrants, which contains about 9,000 characters, with their Cantonese pronunciations.

    twenty6 said:
    There are estimates that one needs 2000-3000 words for daily life in China; I can make conversation in Chinese fairly easily, so a 3000 word estimate is probably not that inaccurate.

    This estimate beggars belief, I have to say. Are we referring to daily life out in a farming village in rural China? Does this include reading a newspaper, or official notices from the government? I really don't see how one can get by knowing so few words, especially in modern society, and certainly, if you are a native (or background) speaker, I would say you know many more words than this.
     

    dojibear

    Senior Member
    AE (US English)
    Another problem with newspapers is the "drop-off problem". Many major languages (English, Mandarin, others) have this problem. It was explained by polyglot Steve Kaufmann, and was discovered through computer research at a university.

    The problem is this: there is a core of very frequent words (500-800 of them). After that, word frequency drops off rapidly. A language does not have a core of 3000 words, with most sentences using only those words. Instead, there is a core of 800 words, and most sentences use those 800 words plus a few less-common words.

    Steve's suggestion for learning a new language is "topics". There are hundreds of topics (ballet, basketball, space travel, trains, riots, music, cooking, hobbies...). But each topic has a set of 20-60 common words. If you read about the same topic, you see the same words over and over.

    The problem with newspapers is that each article is about a new topic. So each article uses several uncommon words (in addition to the 800 core words). Newspapers cannot restrict themselves to 3000 words.
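One common way to model this kind of drop-off is Zipf's law, in which the n-th most frequent word occurs with a frequency proportional to 1/n. The Python sketch below is a toy model under that assumption, with a hypothetical vocabulary of 50,000 words. Neither the model, the function name `zipf_coverage`, nor the vocabulary size comes from Steve Kaufmann or from measured Chinese data.

```python
def zipf_coverage(known: int, vocabulary: int = 50_000) -> float:
    """Share of running words covered by the `known` most frequent words,
    assuming word frequency is proportional to 1/rank (Zipf's law)."""
    total = sum(1 / rank for rank in range(1, vocabulary + 1))
    covered = sum(1 / rank for rank in range(1, min(known, vocabulary) + 1))
    return covered / total


# Coverage for a small core, a 3000-word vocabulary, and a larger one.
for known in (800, 3_000, 10_000):
    print(known, "words ->", round(zipf_coverage(known), 2), "of running text")
```

Under this toy model, each additional block of words adds less coverage than the one before it, so a fixed list of a few thousand words still leaves a steady trickle of unknown words in any article on a new topic.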

    I see this a lot. For reading practice I use [thechairmanbao.com], which is like a simple newspaper - daily short articles graded HSK1 to HSK6. I've completed an HSK2 course, so I could read every word in today's HSK2 article about lunches, except the word 艺术品. Another HSK2 article (less than 100 words) had 庙会, 摊, 租, 馍 in it.
     

    twenty6

    Senior Member
    English - U.S., Chinese - Mandarin
    radagasty said:
    This makes zero sense to me. Can anyone else make heads or tails of these calculations?
    As I said, these are all rough estimates. It's extremely hard to estimate anything in any language. They are merely drawn from the common consensus on various websites (e.g. Baidu). While estimates vary from person to person, they all fall within some 4000-5000 words to read a newspaper (there was a question on the Baidu Zhidao forums, which asked how many words one needs to know to read a newspaper). From that, at the very very most one knows 8000 words.

    And as you said, it is estimated that there are around 7000 characters in use. As for the dictionary, I'm using the all-Chinese version of the 新华字典, twelfth edition. There are some 600 pages of entries, and at the very most one page has 19 words. That makes at most 11,400 words in total, and many of them are special-use (for describing a very specific thing, or usable only in a specific situation), e.g. the entry for 那, which is a surname; 嫪, which is also a surname; and 橄 and 榄, which can only be used with each other (as far as I know) and have separate entries. When would you ever use those?

    滈 is the name of a river, 鄗 was the name of a county (historically), 镐 was the name of the western Zhou dynasty's capital, 皞 means “月亮” ("moon"; no other explanation provided), 颢 means "white" (as an adjective). These are all on the same page (page 181, to be exact), and only under very specific circumstances can I think of any use for them.

    100,000 words for a pocket dictionary is way too many. Considering it's a bilingual dictionary (unless it's a translation guide), it should be much longer than the all-Chinese version.

    radagasty said:
    This estimate beggars belief, I have to say.
    For daily life, small conversation, etc. With 3000 words one can certainly live in China; this is a rather small estimate. And, as I said, that was from 4 years ago, and now I estimate I know somewhere around 4500.


    I feel that our information comes from very contradictory sources...

    HOWEVER: considering your dictionary is in Cantonese, there might be differences, since what I am referencing is in Mandarin.
     