Analysis Seed for Chinese Lexeme Amount to Understand Chinese Newspaper
Articles
Purpose of this File:
This file is a seed for analyzing the number of Chinese lexemes needed for a
basic understanding of Chinese newspaper articles. Further steps can extend to
other contexts such as journalists' columns, letter composition, and essay writing.
Products for
[1] None.
Introduction:
The Chinese language [中文] has kept its phonetic pronunciation
outside China and has been in general use for more than 2,000 years.
This is perhaps because some translation techniques preserve the phonetic
sounds and/or the morphologic/symbolic structures of the original vocabulary.
The lexeme “china” is first recorded in ancient Indian scripts, in remembrance
of the porcelain products of the then Qin Dynasty [秦朝] (221 BC – 206 BC) in
China.
Hence, the Chinese language
may once have been widely and popularly called the Qin language [秦文]. “Qin” and “Chin”
sound alike. Besides Hanyu Pinyin, which standardized the pronunciation symbols
of the Chinese language in the 20th century, many other pronunciation symbol
systems still exist today.
Then, the following dynasty in
Now, since the Chinese-language
name of China is Zhongguo [中国], the Chinese language is
also called Zhongwen [中文] in Chinese.
Among the big memorizable secret creation methods and systems proposed by
Kok-Wah Lee to realize MePKC
(Memorizable Public-Key Cryptography), Chinese
characters are mainly used in two methods, called CLPP (Chinese Language Passphrase) and multilingual key. Optionally, Chinese
characters can also be used in the 2D key method.
Nowadays, in the first
decade of the 21st century, recognizing and recalling Chinese (Han) characters
is not a problem for users of the Chinese and Japanese languages.
Young users of the Korean language may have some proficiency problems. In the
CJK languages (Chinese, Japanese, and Korean), the Han/Chinese characters
are called hanzi, kanji, and hanja,
respectively.
How good it is if a user knows
Chinese characters and is multilingual in at least the English and Chinese
languages!
Seed to Learn Chinese Language:
Here, a literary
work has been prepared as an analysis seed to study how easy it is to learn
simple Chinese from the most frequently used Chinese characters. The
simplest proficiency level is set, at minimum, as a user’s capability to
basically understand Chinese newspaper articles.
The Chinese lexeme
amounts are matched with different z levels, i.e. standardized values of a normal
distribution N(µ, δ) where z = (x - µ)/δ, as in IQ level
classification. The data are collected, further derived, and analyzed
from a literary work by Chih-Hao Tsai, “Frequency and
Stroke Counts of Chinese Characters” [URL:
http://technology.chtsai.org/charfreq/], which covers the Chinese language in
traditional script with Big5 character encoding.
In Chinese character
simplification, several traditional characters are often merged into one
simplified character. For simplified script, the number of characters at each z
level should therefore be slightly smaller.
Chih-Hao Tsai’s frequency counts of Chinese characters
give the following original and derived data:
Character Total (CT) = 13,060
Frequency Total (FT) = 171,894,734
Average Frequency / Character (AFC) = 13,162
Accumulated Frequency (AFreq.) = 171,894,734
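The AFC figure follows directly from the two totals. A minimal check in Python, using only the values quoted above (the variable names are illustrative, not from the original spreadsheet):

```python
# Totals quoted above from Chih-Hao Tsai's character frequency counts.
character_total = 13_060        # CT
frequency_total = 171_894_734   # FT

# Average frequency per character (AFC), rounded to the nearest integer.
afc = round(frequency_total / character_total)
print(afc)  # 13162
```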
An advanced learner's
dictionary of English has about 3,000 lexemes or words used for explanations.
An entry-level command of the Chinese
language requires about 2,500 lexemes or characters to enable essay
writing in Chinese.
The OED2 (Oxford English
Dictionary, 2nd Edition), which was then the largest English-language
dictionary in February 2001, has about 290,000 entries with some 616,500 word
forms.
The number of CJK Unified
Ideographs in Unicode 5.1 (April 2008) is 70,237 out of a total of 100,713
Unicode graphic symbols.
Nevertheless, fewer
lexemes (basic meaningful linguistic units; characters, in the Chinese
language) are needed to basically understand the context of a Chinese newspaper
article than of an English one.
Table 1 below shows
the accumulated frequency percentage and the z level of a normal distribution,
N(µ, δ) where z = (x - µ)/δ, for different numbers of Chinese characters
sorted by descending frequency.
z = standardized value, x = non-standardized value, µ = mean, δ = S.D.
(standard deviation)
For an IQ scale with 15 S.D., like the Wechsler IQ Scale, µ = 100 and δ = 15.
Meanwhile, Table 2 shows
memorizable sentences for remembering the most
frequently used Chinese characters.
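The link between an IQ value x, its z level, and the accumulated percentage can be sketched in Python as follows. This is a minimal sketch, assuming the percentage is the cumulative normal distribution evaluated at z; the function names are illustrative and the standard library's math.erf is used in place of a statistics package.

```python
import math

MU = 100.0     # mean of the IQ scale
DELTA = 15.0   # standard deviation (written as δ in this file)

def z_from_iq(x, mu=MU, delta=DELTA):
    """Standardize an IQ value: z = (x - mu) / delta."""
    return (x - mu) / delta

def cumulative_percent(z):
    """Percentage of a standard normal distribution lying below z."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Print the z level and cumulative percentage for a few IQ values.
for iq in (70, 85, 100, 115, 130):
    z = z_from_iq(iq)
    print(f"IQ {iq}: z = {z:+.3f}, AFreq. % = {cumulative_percent(z):.2f}")
```

For example, x = 130 gives z = 2.000 and a cumulative percentage of about 97.72.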
Table 1. Number of Chinese
characters at different z levels to understand a Chinese newspaper article
| AFreq. % | z Level | # Characters | x of IQ (15 S.D.) |
| 2.28 | -2.000 | 1 | 70 |
| 4.78 | -1.667 | 2 | 75 |
| 9.13 | -1.333 | 5 | 80 |
| 15.87 | -1.000 | 11 | 85 |
| 25.23 | -0.667 | 28 | 90 |
| 34.46 | -0.400 | 55 | 94 |
| 42.07 | -0.200 | 85 | 97 |
| 47.33 | -0.067 | 113 | 99 |
| 50.00 | 0.000 | 129 | 100 |
| 52.67 | 0.067 | 147 | 101 |
| 57.93 | 0.200 | 188 | 103 |
| 65.54 | 0.400 | 265 | 106 |
| 74.77 | 0.667 | 404 | 110 |
| 84.13 | 1.000 | 648 | 115 |
| 90.87 | 1.333 | 986 | 120 |
| 95.22 | 1.667 | 1,421 | 125 |
| 97.72 | 2.000 | 1,958 | 130 |
| 99.018 | 2.333 | 2,573 | 135 |
| 99.616 | 2.667 | 3,354 | 140 |
| 99.865 | 3.000 | 4,368 | 145 |
| 99.9571 | 3.333 | 6,573 | 150 |
| 99.9970 | 4.000 | 12,189 | 160 |
| 99.9998 | 4.667 | 12,990 | 170 |
| 99.99997 | 5.000 | 13,049 | 175 |
| 99.999995 | 5.333 | 13,058 | 180 |
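The # Characters column of Table 1 can be derived from a frequency list by accumulating the counts in descending order until the target percentage is reached. The sketch below shows the idea on a small made-up list. The function name and the toy counts are hypothetical, and whether the original spreadsheet counts the character that reaches the threshold or the one just before it is an assumption here.

```python
from itertools import accumulate

def characters_needed(frequencies, target_percent):
    """Return the smallest number of top-ranked characters whose
    accumulated frequency reaches target_percent of the total."""
    ranked = sorted(frequencies, reverse=True)   # most frequent first
    total = sum(ranked)
    for count, running in enumerate(accumulate(ranked), start=1):
        if 100.0 * running / total >= target_percent:
            return count
    return len(ranked)

# Hypothetical toy counts, NOT taken from the Tsai data set.
toy_counts = [50, 20, 10, 8, 5, 4, 2, 1]
print(characters_needed(toy_counts, 50.0))
print(characters_needed(toy_counts, 84.13))
```

With the real frequency column loaded into a list, the same call with target_percent set to each AFreq. % value would give one candidate for the corresponding row.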
For the x of IQ (15 S.D.) values, please refer to the chart matching
IQ percentiles, IQ societies, and human population distribution published by
the World Intelligence Network, a consortium of high IQ societies, at
[URL: http://www.iqsociety.org/general/IQchart.pdf].
Table 2. Memorizable sentences built from the most frequently used Chinese
characters
| AFreq. % | # Characters | Chinese language | English language |
| 2.28 | 1 | 的。 | Okay; belonging to. |
| 4.78 | 2 | 是的。 | Yes. |
| 9.13 | 5 | 一不是我的。 | One is not mine. |
| 15.87 | 11 | 我有的人中不是在大一了。 | The people, that I have, are not at university first year already. |
| 25.23 | 11 + 17 | 也可以好到就会要来为你交上这个学资。 | Can also be likely good then until wanting to pay this studying fee for you. |
| 34.46 | 28 + 27 | 词:天文、提问、那么多、工生们 | Phrase: sky art (astronomy), raised question, so many, job trainees |
| | | 他没看天文时所想得出说过之提问:如能请,下用那么多小工生们? | When not looking at the sky art, the raised question that he has thought out to say before: If can employ, recruit down so many young job trainees? |
Discussions:
1) How can one learn the
Chinese and English languages efficiently: each alone, independently one after
another in series, or jointly in parallel?
2) Between the Chinese and
English languages, which one can be used most efficiently in which
situations and contexts?
3) For the context of
daily newspapers in the Chinese and/or English language, what is the minimum
number of lexemes for basic understanding? Is it at an accumulated frequency
percentage (AFreq. %) of 75%, 85%, 90%, or
95%?
4) At AFreq. % = 50%, the minimum number of Chinese characters
is 129, which is about the total number of basic Japanese (JP)
script characters, i.e. kana: cursive hiragana (平假名,平仮名,ひらがな) has 106 current and 2
obsolete kana, and angular katakana (片假名,片仮名,カタカナ) has 164 current and 11
obsolete kana. Does this hint that the Chinese language is easier to learn
than the JP language because it demands less lexeme memorization? In any case,
if one already knows the JP language, especially kanji, then learning the most
popular Chinese characters should be quite easy.
5) In the current Chinese
pronunciation system, i.e. Hanyu Pinyin for the
Mandarin dialect, there are 415 basic phonemes and (4+1) tones, potentially
making at least (1660+415) tonemes or tonic phonemes.
So, be confident enough to learn the Chinese language!?!
6) Could the Sheet of
Data Table of this file help a novice learner of Chinese language to learn more
efficiently?
7) Language is the key to
unlock the secrets of a culture and of the knowledge library written in that
language. So, which languages, and how many, should one learn at the least, and
to which minimum proficiency level?
8) Others.
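The count in item 5 can be checked with simple arithmetic. This is a minimal sketch that treats every combination of basic sound unit and tone as possible, which is an assumption; the constant names are illustrative.

```python
BASIC_UNITS = 415    # basic Hanyu Pinyin sound units, as stated in item 5
FULL_TONES = 4       # the four full tones
NEUTRAL_TONES = 1    # the additional neutral tone

toned = BASIC_UNITS * FULL_TONES        # 1660
neutral = BASIC_UNITS * NEUTRAL_TONES   # 415
total_tonemes = toned + neutral
print(toned, neutral, total_tonemes)    # 1660 415 2075
```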
At this point, one may
download the analysis seed file in .XLS format and consider starting to
learn the Chinese language with the help of the Chinese
characters sorted in descending order of usage frequency.
If one gets positive
learning results by learning the most frequently used Chinese
characters first, then please inform the author via email.
Come on! Let the teachers
start teaching and/or the students start learning the Chinese language
face-to-face, through books, and over the Internet.
Copyright 16 January 2009. Prepared by Kok-Wah LEE @ Xpree Li. Email: xpreee@gmail.com
E. & O. E., + E.
(Errors & Omissions Exempted, plus Estimations)
Nil.
Intellectual Property:
[1] Kok-Wah
Lee. (2009, January 15). Analysis seed for Chinese lexeme amount to understand
Chinese newspaper articles. Bukit Beruang,
Download the Literary Works:
| Copyrighted Literary Work | Fee (SGD$) | Payment |
| [0] Analysis Seed for Chinese Lexemes (version 1.0) | Free | D000n: Donate as one likes. |
| [1] Analysis Seed for Chinese Lexemes (version 3.0) | ? | ? |
Created on 15 January
2009
Updated on 08 May 2012