Number of Chinese Characters at Different Sigmas (S.D.) of Normal Distribution to Understand a Chinese Language Context
Created on 14Jan09 Modified on 16Jan09
Derived from Chih-Hao Tsai, Frequency and Stroke Counts of Chinese Characters.
The data here use traditional Chinese script in Big5 encoding.
In Chinese character simplification, several traditional characters are often merged into a single simplified one.
For simplified script, the number of characters at each sigma level should therefore be slightly lower because of this merging.
Character Total (CT) = 13,060
Frequency Total (FT) = 171,894,734
Average Frequency / Character (AFC) = 13,162
Accumulated Frequency (AFreq.) = 171,894,734
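The average frequency per character follows from the two totals above. A minimal sketch of that check, assuming AFC is simply FT divided by CT and rounded to a whole number (the variable names are illustrative only):

```python
# Totals quoted in the text above.
character_total = 13_060        # CT
frequency_total = 171_894_734   # FT

# Assumption: AFC is the mean frequency per character, rounded to an integer.
average_frequency = round(frequency_total / character_total)
print(average_frequency)  # 13162
```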
An advanced learner's dictionary of English uses about 3,000 lexemes (words) in its explanations.
Entry-level Chinese requires about 2,500 lexemes (characters) to be able to write an essay in Chinese.
The OED2 (Oxford English Dictionary, 2nd Edition), which was then the largest English-language dictionary in February 2001, 
has about 290,000 entries with some 616,500 word forms.
Unicode 5.1 (April 2008) has 70,237 CJK Unified Ideographs out of a total of 100,713 Unicode graphic symbols.
Nevertheless, fewer lexemes (basic meaningful linguistic units; characters, in the case of Chinese) are needed for a
basic understanding of a Chinese newspaper article than of an English one.
Table 1 below shows the accumulated frequency percentage and the corresponding sigma (S.D. = Standard Deviation) of a normal distribution
for different numbers of Chinese characters, sorted by descending frequency.
Table 2 shows memorizable sentences for remembering the most frequently used Chinese characters.
Table 1. Minimum number of Chinese characters at different sigma levels to understand a Chinese newspaper article
AFreq. % Sigma (S.D.) # Characters IQ (15 S.D.)
2.000000 -2.00 1 70
5.000000 -1.66 2 75
10.000000 -1.33 5 80
16.000000 -1.00 11 85
25.000000 -0.66 28 90
50.000000 0.00 129 100
75.000000 0.66 408 110
85.000000 1.00 679 115
90.000000 1.33 928 120
95.000000 1.66 1,389 125
97.700000 2.00 1,952 130
99.000000 2.33 2,559 135
99.870000 3.00 4,411 145
99.960000 3.33 6,809 150
99.997000 4.00 12,189 160
99.999800 4.66 12,990 170
99.999970 5.00 13,049 175
99.999995 5.33 13,058 180
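The "# Characters" column can be read as the smallest number of top-ranked characters whose accumulated frequency reaches a given percentage. A minimal sketch of that calculation follows. The function name and the toy counts are hypothetical, and whether the original spreadsheet uses "reaches" or "first exceeds" at each threshold is an assumption.

```python
from itertools import accumulate


def min_characters_for_coverage(frequencies, target_percent):
    """Return the smallest number of top-frequency characters whose
    accumulated frequency reaches target_percent of the total."""
    ordered = sorted(frequencies, reverse=True)
    total = sum(ordered)
    for count, running in enumerate(accumulate(ordered), start=1):
        # Compare in integers to avoid floating-point rounding at the threshold.
        if running * 100 >= target_percent * total:
            return count
    return len(ordered)


# Hypothetical toy counts, NOT the real character frequency list.
toy_counts = [50, 20, 10, 10, 5, 3, 2]
print(min_characters_for_coverage(toy_counts, 50))  # 1
print(min_characters_for_coverage(toy_counts, 75))  # 3
print(min_characters_for_coverage(toy_counts, 95))  # 5
```

Run against the real frequency list, the same loop should give counts close to those in Table 1, depending on how ties at a threshold are handled.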
To understand the sigma and IQ (15 S.D.) columns, please refer to the chart matching IQ percentiles, IQ societies, and human population
distribution at the World Intelligence Network, a consortium of high IQ societies.
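The relation between the AFreq. %, sigma, and IQ columns of Table 1 can be sketched with the inverse of the standard normal cumulative distribution. This is only an illustration: it assumes IQ = 100 + 15 × sigma, and the helper names are hypothetical. Table 1 appears to round sigma to steps of about one third, so exact values differ slightly from the table.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def sigma_for_percent(afreq_percent):
    """Sigma (z-score) at which the standard normal CDF equals afreq_percent."""
    return NormalDist().inv_cdf(afreq_percent / 100)


def iq_for_sigma(sigma, sd=15):
    """IQ on a scale with mean 100 and the given standard deviation."""
    return 100 + sd * sigma


for percent in (16, 50, 85, 95, 99):
    sigma = sigma_for_percent(percent)
    print(f"{percent:>3}%  sigma = {sigma:+.2f}  IQ = {iq_for_sigma(sigma):.0f}")
```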
Table 2. Memorizable sentences built from the most frequently used Chinese characters
AFreq. % # Characters Chinese language English language
2.000000 1 的。 Okay; belonging to.
5.000000 2 是的。 Yes.
10.000000 5 一不是我的。 One is not mine.
16.000000 11 我有的人中不是在大一了。 The people that I have are not in the first year of university already.
1) How can one efficiently learn Chinese and English: one language alone, both one after the other in series, or both jointly in parallel?
2) Of Chinese and English, which can be used most efficiently in which situations and contexts?
3) For the context of a daily newspaper in Chinese and/or English, what is the minimum number of lexemes for basic understanding?
Is it the number at an accumulated frequency percentage (AFreq. %) of 75%, 85%, 90%, or 95%?
4) At AFreq. % = 50%, the minimum number of Chinese characters is 129, which is about the total number of basic Japanese (JP)
script characters, i.e. kana: cursive hiragana (平仮名, ひらがな or ヒラガナ) with 106 current
and 2 obsolete kana, and angular katakana (片仮名, かたかな or カタカナ) with 164 current and 11 obsolete kana.
Does this hint that Chinese is easier to learn than Japanese because it demands less lexeme memorization?
In any case, if one already knows Japanese, especially kanji, then learning the most common Chinese characters should be quite easy.
5) In the current Chinese pronunciation system, i.e. Hanyu Pinyin for Mandarin, there are 415 basic phonemes and (4+1) tones,
potentially making at least (1660+415) tonemes, or tonic phonemes. So be confident about learning Chinese!
6) Could the Data Table sheet of this file help a novice learner of Chinese to learn more efficiently?
7) Language is the key that unlocks the secrets of a culture and the library of knowledge written in that language.
So which languages, and how many, should one learn at the least, and to what minimum proficiency level?
8) Others.
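The toneme count in question 5 is plain multiplication. A small sketch, taking the figures from the text (415 basic phonemes, 4 full tones plus 1 neutral tone) and assuming, as the text does, that every phoneme can combine with every tone:

```python
basic_phonemes = 415   # figure quoted in question 5
full_tones = 4
neutral_tones = 1

# Assumption: every basic phoneme combines with every tone.
tonal_count = basic_phonemes * full_tones        # 1660
neutral_count = basic_phonemes * neutral_tones   # 415
print(tonal_count, neutral_count, tonal_count + neutral_count)  # 1660 415 2075
```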
At this point, one may download the analysis seed file in .XLS format and consider starting to learn Chinese
with the help of the Chinese characters sorted in descending order of usage frequency.
If learning the most frequently used Chinese characters first gives positive results, please inform the author via email.
Come on! Let teachers start teaching and students start learning Chinese face-to-face, through books, and via the Internet.
Copyright 16 January 2009 Prepared by Kok-Wah LEE @ Xpree Li Email:
E. & O. E., + E. (Errors & Omissions Exempted, plus Estimations)