Number of Chinese Characters at Different z Levels of Normal Distribution to Understand a Chinese Language Context  
Created on 14Jan09 Modified on 30May09 Version 3.0
Derived & established from Chih-Hao Tsai, Frequency and Stroke Counts of Chinese Characters. http://technology.chtsai.org/charfreq/
The data use the Chinese language in traditional script, encoded in Big5.
In Chinese character simplification, several traditional characters are often merged into a single simplified one.
For simplified script, the number of characters at each z level should therefore be slightly lower.
Character Total (CT) = 13,060
Frequency Total (FT) = 171,894,734
Average Frequency / Character (AFC) = 13,162
Accumulated Frequency (AFreq.) = 171,894,734
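As a quick check of the summary figures above, the short Python sketch below recomputes AFC from CT and FT. The variable names are chosen here for illustration and do not come from the source spreadsheet.

```python
# Summary figures quoted in the text above.
character_total = 13_060        # CT
frequency_total = 171_894_734   # FT

# Average frequency per character, rounded to the nearest whole occurrence.
average_frequency = round(frequency_total / character_total)

print(f"AFC = {average_frequency:,}")
```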
An advanced learner's dictionary of English uses about 3,000 lexemes (words) in its explanations.
Entry-level Chinese requires about 2,500 lexemes (characters) to be able to write an essay.
The OED2 (Oxford English Dictionary, 2nd Edition), which was the largest English-language dictionary as of February 2001,
has about 290,000 entries with some 616,500 word forms.
Unicode 5.1 (April 2008) contains 70,237 CJK Unified Ideographs out of a total of 100,713 Unicode graphic symbols.
Nevertheless, fewer lexemes (basic meaningful linguistic units, i.e. characters in the Chinese language) are needed for a
basic understanding of a Chinese newspaper article than of an English one.
Table 1 below shows the accumulated frequency percentage and the z level of a normal distribution, N(μ, δ) where z = (x - μ)/δ,
for different numbers of Chinese characters sorted in descending frequency.
z = standardized value, x = non-standardized value, μ = mean, δ = S.D. (standard deviation)
For an IQ scale with a 15-point S.D., such as the Wechsler IQ Scale, μ = 100 and δ = 15.
Table 2 gives memorizable sentences for remembering the most frequently used Chinese characters.
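The standardization described above can be sketched in Python as follows. This is a minimal illustration, assuming the AFreq. % thresholds are the cumulative percentages of the standard normal distribution at each z level and that the IQ scale has mean 100 and S.D. 15. The function names are hypothetical and are not taken from the source spreadsheet.

```python
from statistics import NormalDist

MEAN_IQ = 100   # mean of the IQ scale
SD_IQ = 15      # standard deviation of the IQ scale

standard_normal = NormalDist()  # mean 0, S.D. 1


def afreq_percent(z):
    """Cumulative percentage of the standard normal distribution below z."""
    return 100 * standard_normal.cdf(z)


def iq_from_z(z):
    """Non-standardized IQ value x for a z level, from z = (x - mean) / S.D."""
    return MEAN_IQ + SD_IQ * z


for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.3f}  AFreq. % = {afreq_percent(z):.2f}  IQ = {iq_from_z(z):.0f}")
```

For the five z levels printed, the percentages agree with the corresponding AFreq. % thresholds in Table 1 to two decimal places.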
Table 1. Minimum number of Chinese characters at different z levels to understand a Chinese newspaper article
AFreq. %    z Level   # Characters   x of IQ (15 S.D.)
2.28        -2.000    1              70
4.78        -1.667    2              75
9.13        -1.333    5              80
15.87       -1.000    11             85
25.23       -0.667    28             90
34.46       -0.400    55             94
42.07       -0.200    85             97
47.33       -0.067    113            99
50.00       0.000     129            100
52.67       0.067     147            101
57.93       0.200     188            103
65.54       0.400     265            106
74.77       0.667     404            110
84.13       1.000     648            115
90.87       1.333     986            120
95.22       1.667     1,421          125
97.72       2.000     1,958          130
99.018      2.333     2,573          135
99.616      2.667     3,354          140
99.865      3.000     4,368          145
99.9571     3.333     6,573          150
99.9970     4.000     12,189         160
99.9998     4.667     12,990         170
99.99997    5.000     13,049         175
99.999995   5.333     13,058         180
To interpret the x of IQ (15 S.D.) column, please refer to the chart matching IQ percentiles, IQ societies, and human population
distribution from the World Intelligence Network, a consortium of high IQ societies, at <http://www.iqsociety.org/general/IQchart.pdf>.
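The way a "# Characters" value can be derived from a frequency list may be sketched in Python as below. This is an illustration only: it assumes the count is the smallest number of top-ranked characters whose accumulated frequency reaches the target percentage. The function name and the toy frequencies are hypothetical and are not the actual character data.

```python
from itertools import accumulate


def min_characters(frequencies, target_percent):
    """Smallest number of most-frequent characters whose accumulated
    frequency reaches target_percent of the frequency total."""
    ordered = sorted(frequencies, reverse=True)   # descending frequency
    total = sum(ordered)
    for count, running in enumerate(accumulate(ordered), start=1):
        if running * 100 >= target_percent * total:
            return count
    return len(ordered)


# Hypothetical toy frequencies, not taken from the source data.
toy_frequencies = [50, 30, 10, 5, 3, 2]

for target in (50, 80, 95):
    print(target, min_characters(toy_frequencies, target))
```

With the real frequency list, calling the function once per AFreq. % threshold would give a column comparable to "# Characters", provided the same counting rule was used in the original analysis.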
Table 2. Memorizable sentences built from the most frequently used Chinese characters
AFreq. %   # Characters   Chinese language, followed by English language
2.28       1              的。
                          Okay; belonging to.
4.78       2              是的!
                          Yes!
9.13       5              一不是我的。
                          One is not mine.
15.87      11             我有的人中不是在大一了。
                          The people, that I have, are not at university first year already.
25.23      11 + 17        也可以好到就会要来为你交上这个学资。
                          Can also be likely good then until wanting to submit this studying fee for you.
                          词:天文、提问、那么多、工生们
                          Phrase: sky art (astronomy), raised question, so many, job trainees
34.46      28 + 27        他没看天文时,所想得出说过之提问:如能请,下用那么多小工生们?
                          When not looking at the sky art, the raised question that he has thought out to say before: If can employ, recruit down so many young job trainees?
Discussions
1) How can the Chinese and English languages be learned most efficiently: one alone, one after the other in series, or jointly in parallel?
2) Between the Chinese and English languages, which can be used most efficiently in which situations and contexts?
3) For a daily newspaper in the Chinese and/or English language, what is the minimum number of lexemes needed for basic understanding?
Is it the number at an accumulated frequency percentage (AFreq. %) of 75%, 85%, 90%, or 95%?
4) At AFreq. % = 50%, the minimum number of Chinese characters is 129, which is about the total number of basic Japanese (JP)
language script characters, i.e. kana in JP language: cursive hiragana (平假名,平仮名,ひらがな) with 106 current
and 2 obsolete kana, and angular katakana (片假名,片仮名,カタカナ) with 164 current and 11 obsolete kana.
Does this hint that the Chinese language is easier to learn than the JP language because it demands less lexeme memory?
In any case, for someone who already knows the JP language, especially kanji, learning the most popular Chinese characters should be quite easy.
5) In the current Chinese pronunciation system, i.e. Hanyu Pinyin for Mandarin, there are 415 basic syllables and (4+1) tones,
potentially making up to (1,660 + 415) = 2,075 toned syllables. So, be confident enough to learn the Chinese language!
6) Could the Data Table sheet of this file help a novice learner of the Chinese language to learn more efficiently?
7) Language is the key to unlocking the secrets of a culture and of the library of knowledge written in that language.
So, which languages, and how many, should one learn at a minimum, and up to which minimum proficiency level?
8) For the 12 main sentences of the Chinese lexeme prose covering the 265 most frequently used Chinese characters,
please refer to the worksheet "Chinese Seed".
9) Others.
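For question 3) above, a rough estimate of the number of characters at 75%, 85%, 90%, and 95% can be read off Table 1 by linear interpolation between neighbouring rows. The Python sketch below is only an approximation under that assumption. The function name is hypothetical, and exact counts would require the full frequency list.

```python
# (AFreq. %, # Characters) rows copied from Table 1.
TABLE_1_ROWS = [
    (74.77, 404),
    (84.13, 648),
    (90.87, 986),
    (95.22, 1421),
    (97.72, 1958),
]


def estimate_characters(target_percent):
    """Linearly interpolate the character count between two Table 1 rows."""
    for (p_low, n_low), (p_high, n_high) in zip(TABLE_1_ROWS, TABLE_1_ROWS[1:]):
        if p_low <= target_percent <= p_high:
            fraction = (target_percent - p_low) / (p_high - p_low)
            return round(n_low + fraction * (n_high - n_low))
    raise ValueError("target_percent is outside the rows copied from Table 1")


for target in (75, 85, 90, 95):
    print(f"{target}% -> about {estimate_characters(target)} characters")
```

Because accumulated frequency rises more slowly as rarer characters are added, a straight-line estimate between two rows tends to overstate the count slightly.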
At this point, one may download the analysis seed file in .XLS file format and consider starting to learn the Chinese language
with the help of the Chinese characters sorted in descending order of usage frequency.
If learning the most frequently used Chinese characters first gives positive results, please inform the author via email.
Come on! Let the teachers start teaching and/or the students start learning the Chinese language face-to-face, through books, and over the Internet.
Copyright 16 January 2009, 20 February 2009, 30 May 2009.
Prepared by Kok-Wah LEE @ Xpree Li Email: xpreee@gmail.com; contact@xpreeli.com
All rights reserved.
E. & O. E., + E. (Errors & Omissions Exempted, plus Estimations)