Analysis Seed for Chinese Lexeme Amount to Understand Chinese Newspaper Articles

 



 


Purpose of this File:

This file is a seed for analyzing the number of Chinese lexemes needed for a basic understanding of Chinese newspaper articles. Further steps can extend to other contexts such as journalist columns, letter composition, and essay writing.


 

Products for Sale in this File:

[1] None.

 

Introduction:

Outside China, the name of the Chinese language [中文] has kept its phonetic pronunciation and has been in general use for more than 2,000 years. This is perhaps because some translation techniques preserve the phonetic sounds and/or the morphological or symbolic structure of the source vocabulary. The lexeme “china” is first recorded in ancient Indian scripts, in remembrance of the porcelain products of the then Qin Dynasty [秦朝] (221 BC – 206 BC) in China.

 

Hence, the Chinese language was probably once widely and popularly called the Qin language [秦文]. “Qin” and “Chin” sound similar. Before the pronunciation symbols of Chinese were standardized as Hanyu Pinyin in the 20th century, many other pronunciation symbol systems were in use, and some still exist today. The following dynasty, the Han Dynasty [汉朝] (206 BC – 220 AD), replaced the Qin Dynasty, and the Chinese language was then renamed Hanwen [汉文] for the written script and Hanyu [汉语] for the spoken language. Hanyu is still occasionally used today, in Chinese character scripts, as a synonym for the Chinese language.

 

Now, since the name of China in Chinese is Zhongguo [中国], the Chinese language is also called Zhongwen [中文] in Chinese.

 

Among the methods and systems proposed by Kok-Wah Lee for creating big memorizable secrets to realize MePKC (Memorizable Public-Key Cryptography), Chinese characters are mainly used in two methods: CLPP (Chinese Language Passphrase) and the multilingual key. Optionally, Chinese characters can also be used in the 2D key method.

 

Nowadays, in the first decade of the 21st century, recognizing and recalling Chinese (Han) characters is not a problem for users of the Chinese and Japanese languages. Young users of the Korean language may have some proficiency problems. In the CJK languages (Chinese, Japanese, and Korean), the Han/Chinese characters are called hanzi, kanji, and hanja, respectively.

 

How good it is if a user knows Chinese characters and is multilingual in at least the English and Chinese languages!

 

Seed to Learn Chinese Language:

Here, a literary work is prepared as an analysis seed to study how easy it is to learn simple Chinese based on the most frequently used Chinese characters. The minimum proficiency level is set at a user’s capability to basically understand Chinese newspaper articles.

 

The Chinese lexeme amounts are matched with different z levels, i.e. the standardized values of a normal distribution N(µ, δ), where z = (x - µ)/δ, as in IQ level classification. The data are collected, further derived, and analyzed from a literary work by Chih-Hao Tsai, “Frequency and Stroke Counts of Chinese Characters,” [URL: http://technology.chtsai.org/charfreq/], which covers the Chinese language in traditional script with Big5 character encoding.

 

In Chinese character simplification, several traditional characters are often merged into a single simplified character. For simplified script, the number of characters at each z level should therefore be slightly lower.

 

From Chih-Hao Tsai’s frequency counts of Chinese characters, the original and derived data are as follows:

Character Total (CT) =  13,060

Frequency Total (FT) =  171,894,734

Average Frequency / Character (AFC) = 13,162

Accumulated Frequency (AFreq.) = 171,894,734
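The derived figure AFC can be re-checked from CT and FT with a short sketch. The variable names are chosen here for illustration only and do not come from the original data file.

```python
# Totals quoted above from Chih-Hao Tsai's frequency counts.
CHARACTER_TOTAL = 13_060        # CT
FREQUENCY_TOTAL = 171_894_734   # FT

# Average Frequency / Character (AFC) = FT / CT, rounded to a whole count.
average_frequency = FREQUENCY_TOTAL / CHARACTER_TOTAL
afc = round(average_frequency)

print(afc)  # 13162
```

The accumulated frequency over all 13,060 characters equals FT by definition, which is why AFreq. and FT carry the same value.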

 

An advanced learner's dictionary of English uses about 3,000 lexemes (words) in its explanations.

An entry level of the Chinese language needs about 2,500 lexemes (characters) to enable essay writing in Chinese.

 

The OED2 (Oxford English Dictionary, 2nd Edition), the largest English-language dictionary as of February 2001, has about 290,000 entries with some 616,500 word forms.

Unicode 5.1 (April 2008) has 70,237 CJK Unified Ideographs out of a total of 100,713 Unicode graphic symbols.

Nevertheless, fewer lexemes (basic meaningful linguistic units, i.e. characters in the Chinese language) are needed to basically understand a Chinese newspaper article than an English one.

 

Table 1 below shows the accumulated frequency percentage and the z level of a normal distribution N(µ, δ), where z = (x - µ)/δ, for different numbers of Chinese characters sorted in descending order of frequency.

z = standardized value, x = non-standardized value, µ = mean, δ = S.D. (standard deviation)

For an IQ scale with 15 S.D., like the Wechsler IQ Scale, µ = 100 and δ = 15.

Meanwhile, Table 2 shows memorizable sentences for remembering the most frequently used Chinese characters.
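The relation among x, z, and the accumulated frequency percentage can be sketched as follows. This is an illustrative sketch only: the function names are assumptions made here, and the percentage is taken as the cumulative probability of the standard normal distribution at z, which reproduces values such as 2.28 at z = -2 and 84.13 at z = 1.

```python
from statistics import NormalDist

MU = 100.0     # mean of the IQ scale (15 S.D.)
DELTA = 15.0   # standard deviation, written as δ in this file


def z_from_iq(x, mu=MU, delta=DELTA):
    """Standardize an IQ value x: z = (x - µ)/δ."""
    return (x - mu) / delta


def iq_from_z(z, mu=MU, delta=DELTA):
    """Invert the standardization: x = µ + z·δ."""
    return mu + z * delta


def afreq_percent(z):
    """Cumulative probability of the standard normal at z, as a percentage."""
    return 100.0 * NormalDist().cdf(z)


for iq in (70, 85, 100, 115, 130):
    z = z_from_iq(iq)
    print(f"x = {iq:>3}  z = {z:+.3f}  AFreq. % = {afreq_percent(z):.2f}")
```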

 

Table 1. Number of Chinese characters at different z levels to understand a Chinese newspaper article

| AFreq. % | z Level | # Characters | x of IQ (15 S.D.) |
| 2.28 | -2.000 | 1 | 70 |
| 4.78 | -1.667 | 2 | 75 |
| 9.13 | -1.333 | 5 | 80 |
| 15.87 | -1.000 | 11 | 85 |
| 25.23 | -0.667 | 28 | 90 |
| 34.46 | -0.400 | 55 | 94 |
| 42.07 | -0.200 | 85 | 97 |
| 47.33 | -0.067 | 113 | 99 |
| 50.00 | 0.000 | 129 | 100 |
| 52.67 | 0.067 | 147 | 101 |
| 57.93 | 0.200 | 188 | 103 |
| 65.54 | 0.400 | 265 | 106 |
| 74.77 | 0.667 | 404 | 110 |
| 84.13 | 1.000 | 648 | 115 |
| 90.87 | 1.333 | 986 | 120 |
| 95.22 | 1.667 | 1,421 | 125 |
| 97.72 | 2.000 | 1,958 | 130 |
| 99.018 | 2.333 | 2,573 | 135 |
| 99.616 | 2.666 | 3,354 | 140 |
| 99.865 | 3.000 | 4,368 | 145 |
| 99.9571 | 3.333 | 6,573 | 150 |
| 99.9970 | 4.000 | 12,189 | 160 |
| 99.9998 | 4.660 | 12,990 | 170 |
| 99.99997 | 5.000 | 13,049 | 175 |
| 99.999995 | 5.330 | 13,058 | 180 |

 

To understand the x of IQ (15 S.D.), please refer to the chart matching IQ percentiles, IQ societies, and human population distribution published by the World Intelligence Network, a consortium of high IQ societies, at [URL: http://www.iqsociety.org/general/IQchart.pdf].
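One plausible way to derive the "# Characters" column of Table 1 is to sort the characters by descending frequency and count how many are needed before the accumulated frequency reaches a target percentage. The sketch below assumes that "reaches or exceeds" rule, which this file does not state explicitly, and it runs on a small made-up frequency list, not on Chih-Hao Tsai's data. The function name chars_needed is hypothetical.

```python
from itertools import accumulate


def chars_needed(freqs, target_percent):
    """Smallest number of top-ranked characters whose accumulated
    frequency reaches target_percent of the frequency total."""
    ranked = sorted(freqs, reverse=True)   # descending frequency order
    total = sum(ranked)
    for count, running in enumerate(accumulate(ranked), start=1):
        if running * 100 >= target_percent * total:
            return count
    return len(ranked)


# Hypothetical toy counts for eight characters (total = 100).
toy_freqs = [40, 20, 15, 10, 8, 4, 2, 1]

for target in (2.28, 50.0, 84.13, 100.0):
    print(target, chars_needed(toy_freqs, target))
```

With the real frequency list in place of toy_freqs, the same loop would give the character count for any AFreq. % of interest, such as the 75%, 85%, 90%, or 95% levels.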

 

 

Table 2. Memorizable sentences built from the most frequently used Chinese characters

| AFreq. % | # Characters | Chinese language | English language |
| 2.28 | 1 | 的。 | Okay; belonging to. |
| 4.78 | 2 | 是的。 | Yes. |
| 9.13 | 5 | 一不是我的。 | One is not mine. |
| 15.87 | 11 | 我有的人中不是在大一了。 | The people, that I have, are not at university first year already. |
| 25.23 | 11 + 17 | 也可以好到就会要来为你交上这个学资。 | Can also be likely good then until wanting to pay this studying fee for you. |
| 34.46 | 28 + 27 | 词:天文、提问、那么多、工生们 | Phrase: sky art (astronomy), raised question, so many, job trainees |
| | | 他没看天文时所想得出说过之提问:如能请,下用那么多小工生们? | When not looking at the sky art, the raised question that he has thought out to say before: If can employ, recruit down so many young job trainees? |
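The "# Characters" column of Table 2 can be cross-checked by counting the distinct Chinese characters that the sentences introduce, row by row. The sketch below is illustrative: the helper name distinct_hanzi is hypothetical, and it only recognizes characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which is enough for these sentences. It checks the counts only, not whether the characters are the most frequent ones.

```python
def distinct_hanzi(text):
    """Set of distinct characters of text that lie in the basic
    CJK Unified Ideographs block; punctuation is ignored."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}


# The Chinese sentences of Table 2, in row order.
sentences = [
    "的。",
    "是的。",
    "一不是我的。",
    "我有的人中不是在大一了。",
    "也可以好到就会要来为你交上这个学资。",
    "他没看天文时所想得出说过之提问:如能请,下用那么多小工生们?",
]

seen = set()
running_counts = []
for sentence in sentences:
    seen |= distinct_hanzi(sentence)     # add newly introduced characters
    running_counts.append(len(seen))     # distinct characters so far

print(running_counts)  # [1, 2, 5, 11, 28, 55]
```

The running totals match the "# Characters" values of Table 1 at the same AFreq. % levels (1, 2, 5, 11, 28, and 55).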

 

 

Discussions:

1) How can one efficiently learn the Chinese and English languages: one alone, one after the other in series, or jointly in parallel?

2) Of the Chinese and English languages, which can be used most efficiently in which situations and contexts?

3) For daily newspapers in the Chinese and/or English language, what is the minimum number of lexemes for basic understanding? Is it the number at an accumulated frequency percentage (AFreq. %) of 75%, 85%, 90%, or 95%?

4) At AFreq. % = 50%, the minimum number of Chinese characters is 129, which is about the total number of basic Japanese (JP) script characters, i.e. kana: cursive hiragana (平假名,平仮名,ひらがな) with 106 current and 2 obsolete kana, and angular katakana (片假名,片仮名,カタカナ) with 164 current and 11 obsolete kana. Does this hint that the Chinese language is easier to learn than the JP language because it demands less lexeme memory? In any case, if one already knows the JP language, especially kanji, then learning the most popular Chinese characters should be quite easy.

5) For the current Chinese pronunciation system, i.e. Hanyu Pinyin for the Mandarin dialect, there are 415 basic phonemes and (4+1) tones, potentially making up to (1660+415) = 2,075 tonemes or tonic phonemes. So, be confident enough to learn the Chinese language!

6) Could the data table sheet of this file help a novice learner of the Chinese language to learn more efficiently?

7) Language is the key to unlocking the secrets of a culture and of the knowledge library written in that language. So, which languages, and how many, should one learn at the least, and up to which minimum proficiency level?

8) Others.
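The toneme arithmetic in point 5 can be written out as a short sketch. The figures 415 and (4+1) are taken from the discussion above; the variable names are illustrative, and the result is only the count of all phoneme and tone pairings, not a count of the tonemes actually in use.

```python
BASIC_PHONEMES = 415   # basic Hanyu Pinyin units quoted in point 5
FULL_TONES = 4         # the four full tones
NEUTRAL_TONES = 1      # the additional neutral tone

with_full_tones = BASIC_PHONEMES * FULL_TONES        # 1660
with_neutral_tone = BASIC_PHONEMES * NEUTRAL_TONES   # 415
all_pairings = with_full_tones + with_neutral_tone   # 2075

print(with_full_tones, with_neutral_tone, all_pairings)
```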

 

At this point, one may download the analysis seed file in .XLS format and consider starting to learn the Chinese language with the help of the Chinese characters sorted in descending order of usage frequency.

If learning the most frequently used Chinese characters first gives positive results, please inform the author via email.

 

Come on! Let the teachers start teaching and/or the students start learning the Chinese language face-to-face, through books, and over the Internet.


Copyright 16 January 2009. Prepared by Kok-Wah LEE @ Xpree Li. Email: xpreee@gmail.com

 

E. & O. E., + E. (Errors & Omissions Exempted, plus Estimations)

 

Sale Commissions:

Nil.

 

Intellectual Property:

[1] Kok-Wah Lee. (2009, January 15). Analysis seed for Chinese lexeme amount to understand Chinese newspaper articles. Bukit Beruang, Melaka, Malaysia. Copyrighted literary work opened for Internet open reviews.

 

Download the Literary Works:

| Copyrighted Literary Work | Fee (SGD$) | Payment |
| [0]  Analysis Seed for Chinese Lexemes (version 1.0) <HTM> <ZIP> | Free | D000n: Donate as one likes. |
| [1]  Analysis Seed for Chinese Lexemes (version 3.0) <HTM> <ZIP> | ? | ? |

 

 

Created on 15 January 2009

Updated on 08 May 2012