Monday, September 23, 2013

Understanding Chinese characters

Introduction

Chinese characters are a very complex system for recording the Chinese language in writing. Most of what seems to be a mix of illegible symbols is part of a logical but complex writing system that has developed gradually over more than 3000 years, with the oldest confirmed characters dating back to around 1200-1050 BC. In this article I will try to briefly explain what you need to know in order to understand Chinese characters and what you should know before you start studying them.

Some basic facts:
  • The earliest confirmed evidence of the Chinese script yet discovered is the body of inscriptions on oracle bones from the late Shang dynasty (1200-1050 BC) - Wikipedia
  • According to some studies (including my own), you need to know only about 2500 characters to read the newspaper.
  • About 80% of the characters are made up of two elements - one responsible for the sound and one for the meaning of the character. This means that there is something in the character that will tell you how to read it and something else that will tell you what it means. 80% is a huge number, and if you learn how to read this type of character and understand the system behind it, your learning progress will be much faster.
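
The 2500-character figure is a statement about coverage: the share of the characters in a running text that a reader already knows. A minimal Python sketch of how such a figure could be measured is shown here. The function names, the toy sentence and the tiny "known" set are all made up for illustration; a real study would use a large newspaper corpus and a frequency-ranked character list.

```python
def is_han(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def coverage(text, known):
    """Fraction of Chinese character tokens in `text` that appear in `known`.

    Punctuation, digits and Latin letters are ignored.
    """
    tokens = [ch for ch in text if is_han(ch)]
    if not tokens:
        return 0.0
    hits = sum(1 for ch in tokens if ch in known)
    return hits / len(tokens)


# Toy example (hypothetical data): five character tokens, four of them known.
known_chars = set("人日月門")
sample = "人日月門龜。"
print(coverage(sample, known_chars))  # 0.8
```

Because the count is over running text rather than over distinct characters, very frequent characters weigh heavily, which is why a fairly small set can plausibly cover most of a newspaper.
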
Benchmarks in character evolution

Oracle bone script

The earliest preserved characters that can be reasonably proven and dated are the inscriptions on oracle bones. These were bones (usually scapulae) of large animals (usually oxen) or turtle shells that were used in ancient Chinese fortune telling. A small hollow was drilled into the bone (probably after the animal had been sacrificed) and a glowing piece of coal was placed in it. The person responsible for the ritual then blew on the coal, which cracked the bone, and depending on the direction of the crack, the answer to the fortune teller's question was 'yes' or 'no'. The question and the result of the fortune telling, along with other details, were then inscribed in characters onto the bone itself. The character system that was used is called 甲骨文 - the Oracle bone script.

The discovery of these bones is relatively recent. Fragments were being sold in Chinese medicine shops until, around 1899, someone noticed the inscriptions on them. According to Wikipedia, they were traced to a village near Anyang in Henan province, where systematic excavations began in 1928.

Seal script

Before the unification of China in 221 BC, there were no universal rules for writing characters. There were several ways of writing the same character, with varying shapes, stroke orders and stroke types. Several local writing systems had developed as well. After China was united by the Qin dynasty in 221 BC and the Warring States period ended, the First Emperor, Qin Shi Huang, decided to abolish all existing forms of writing and ruled that the only form to be used was the one used in the state of Qin (developed gradually during the Warring States period) - a script known today as the Seal script.

Regular script

The Seal script preserved its official status for a relatively short period of time. Other scripts started to emerge, with some of them rising to dominance.

Regular script (the way characters are written today - both traditional and simplified) has been attributed to Zhong Yao, of the Eastern Han to Cao Wei period (ca. 151–230 AD), who has been called the “father of regular script”. However, some scholars postulate that one person alone could not have developed a new script which was universally adopted, but could only have been a contributor to its gradual formation. It was not until the Southern and Northern Dynasties that regular script rose to dominant status. During that period, regular script continued evolving stylistically, reaching full maturity in the early Tang Dynasty. - Wikipedia

Transitions

These three scripts are the benchmarks in the evolution of Chinese characters for three reasons: for the most part, the later ones derive directly from the earlier ones; each of them was prominent for an extended period of time; and respected dictionaries often refer to at least the Seal script versions for a better understanding of character etymology.

 

The above picture shows the character 人 ren2 'person' in all three scripts. In this particular case, the character is simple and has not undergone a lot of change. The changes are more formal than structural.



The next picture shows the character 化 hua4 'change' in all three versions. As it is a simple character, you still can't see any big structural changes between the Oracle bone and the Seal scripts. Formal changes, however, were made in the transition from the Seal to the Regular script.

As you can see, both sides of the Regular script character have been changed. The left 人 has been contracted to 亻, which is a rule in the Regular script. Many standalone characters are contracted in some way when they become parts of other characters, mostly when they stand on the left side, and this is one example of it.

The original character was a picture of a person 人 and another person turned upside down, hence the meaning 'change'. Since the character did not significantly change in form in its transition into the Seal script, its etymology can be easily understood there. In the Regular script however, this is not the case. The Seal script is therefore a very important step in understanding character etymology, since in many cases it preserves the shapes of the Oracle bone script better than the Regular script.

Another example is the character 伐 fa2 'attack, to send an expedition' (formed by 人 ren2 'person' and 戈 ge1 'weapon') in all three scripts. Notice how 人 preserved its shape in the Oracle bone and Seal scripts but again has been arbitrarily changed in the Regular script.


The transitions in the characters above are all simple examples, but they are very frequent. There are more complicated ones, however. The following example is one of them, and it also shows how much the Seal script in particular helps us understand character etymology:


The picture shows the character 乏 fa2 'to lack', which is the mirror image of the character 正 zheng4 'correct, precise'. The inversion was done on purpose to point to the meaning of the character. It can be clearly seen in the Seal script but is completely lost in the Regular script. In the Regular script, the character consists of 丿 pie3 (a contracted left-falling stroke) at the top and 之 zhi1 'to go' (which has many other meanings as well), neither of which has anything to do with the meaning or the sound of the character as a whole.

These two elements were chosen arbitrarily by the scribes in the transition from the Seal script to the Regular script, and this sort of abbreviation is a frequent feature of the whole process. The Seal script is a simplification of the Oracle bone script and the Regular script is a simplification of the Seal script (and modern Simplified characters further simplify the traditional characters of the Regular script). Since the scribes only had a limited set (a few hundred) of elements to choose from, and they had to choose the elements that most resembled the shape of the Seal script character, in the case of 乏 they ended up choosing 丿 and 之.

The seal script is also very helpful in understanding phono-semantic compound characters as the following example shows:


The top row shows the character 父 fu4 'father' as written in the Seal and Regular scripts. The second row shows the character 布 bu4 'cloth'. The character is composed of 巾 jin1 'towel' (the semantic element) and 父 fu4 (the phonetic element). In the Seal script, you can clearly see that 父 is part of the 布 character and acts as the phonetic element in it. In the Regular script, it has been simplified into two strokes and is no longer recognizable.

80 %

As mentioned before, about 80% of the characters in use today are characters in which one part tells you how to read the character and another part tells you what it means (as is the case with the above example of 父 and 布). These are called phono-semantic compounds (PSCs). 80% is a huge number, and it is safe to say that Chinese characters today can be divided into these compound characters and the rest.

When the first characters emerged, they were simple pictures of objects, some of which (very few compared to the total number) are still in use today. Some of these characters are 人 (person), 龜 (turtle), 日 (sun), 月 (moon) and 門 (door). Whoever was inventing these characters must have realized very soon that this way of recording a language was very impractical because:
  • there was no relation to the sound in the character and unless told, no one was exactly sure how to read it
  • it might have been easy to create small pictures of concrete objects, but abstract terms, verbs, adverbs, prepositions etc. must have been very difficult if not impossible to depict.
  • apart from the fact that there is no relation to the sound or the way a picture should be read, there is also no clear relation to the meaning. A picture of a standing man can represent 'a person, a man, to stand, to be patient...' and probably lots of other things.
  • characters did not have a standard form, stroke order or stroke number. Quite possibly every time someone tried to write something and did not have an existing text at hand to compare it to, the shape, stroke number and order of some characters must have changed by accident. Some characters had almost 20 versions.
  • those who were inventing characters started to realize that it would be impossible to create as many characters as there are words, objects, actions, situations etc. and that some sort of combination would be necessary.
To partly overcome the problem of expressing abstract ideas, the scribes started to combine the meanings of existing characters into new ones (for instance, 女 nv3 'woman' and 子 zi3 'child' were combined into 好 hao3 'good') or to employ character loans (我 wo3 was originally a character meaning 'axe, weapon', composed of 扌 shou3 'hand' and 戈 ge1 'axe'; it came to be used for the first-person pronoun 'I, me' because the Ancient Chinese words for 'axe' and 'I, me' had the same or a similar pronunciation). To overcome the ambiguity of multiple meanings, they started adding indicators to existing characters to point to a specific meaning (木 mu4 'wood' gave 本 ben3 'roots'; 刀 dao1 'knife' gave 刃 ren4 'edge of a blade'; 日 ri4 'sun' gave 旦 dan4 'dawn').

This, however, still did not solve the problem of pronunciation to any large extent, and the problem of comprehension remained as well. Probably after sound loans had been introduced, instead of purely combining the meanings of two characters, the scribes started to combine characters so that, of the two or more chosen for the combination, one pointed to the meaning and another pointed to the sound of the new character as a whole. Historically, this method proved to be the most effective and prevalent one: today, more than 80% of the characters in use are of this type. In the 康熙字典, a huge and respected dictionary commissioned by the Kangxi Emperor in 1710 and published in 1716, more than 90% of all characters are phono-semantic compounds.

Phono-semantic compounds explained



The above table shows the 才 character entry from the Etymological phonetic dictionary that I'm working on. 才 cai2 is the leading phonetic character for this group; only the semantic elements change. 才 is a very good character for explaining PSCs because it appears in both regular and irregular compounds.

The most prevalent form of PSC today is one where the semantic element is on the left side and the phonetic element is on the right side, as is the case with the first two characters, 財 and 材, which can be called regular PSCs. I call them regular simply because they are the most frequent ones. In this case, 才 is actually a perfect phonetic, as it matches the syllable (both initial and final) and the tone as well.

In 在 zai4, however, the 才 phonetic element is on the left side and it has been corrupted (though it is still clearly visible in the Seal script version of the character). I call this an irregular PSC. 才 is also not a perfect phonetic element in this case (cai2 vs. zai4), but it still works very well compared to some other PSCs.
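
The difference between a perfect and an imperfect phonetic can be made concrete with a small Python sketch. It splits a numbered pinyin syllable into initial, final and tone and reports which of the three a compound shares with its phonetic. The function names and the list of initials are my own illustration, not part of the dictionary, and the comparison is on modern Mandarin spelling only, so it says nothing about how close the sounds were when the characters were created.

```python
import re

# Pinyin initials; two-letter ones come first so "zh", "ch", "sh"
# are tried before "z", "c", "s".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split numbered pinyin such as 'cai2' into (initial, final, tone)."""
    match = re.fullmatch(r"([a-z]+)([1-5])", syllable)
    if match is None:
        raise ValueError(f"not a numbered pinyin syllable: {syllable!r}")
    letters, tone = match.group(1), int(match.group(2))
    for initial in INITIALS:
        if letters.startswith(initial):
            return initial, letters[len(initial):], tone
    return "", letters, tone  # syllable without an initial, e.g. 'an1'


def phonetic_match(phonetic, compound):
    """Report which parts of the reading the compound shares with its phonetic."""
    p_initial, p_final, p_tone = split_syllable(phonetic)
    c_initial, c_final, c_tone = split_syllable(compound)
    return {
        "initial": p_initial == c_initial,
        "final": p_final == c_final,
        "tone": p_tone == c_tone,
    }


# Readings from the 才 group discussed above.
for char, reading in [("財", "cai2"), ("材", "cai2"), ("在", "zai4")]:
    print(char, reading, phonetic_match("cai2", reading))
```

For 財 and 材 all three parts match, which is what I mean by a perfect phonetic. For 在 only the final matches, yet the rhyme alone is still a useful hint for the reader.
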

The 存 cun2 character is not a PSC, but a meaning-meaning compound. 才 is clearly a part of it as a co-semantic element on the left (see explanation). 

才 is the phonetic again in the following character zai2. It has been corrupted into a 十 at the top left. This character is not used as a standalone character today, but has been chosen as a new phonetic element in the following three characters and as a co-signific in the last one.

Conclusion

  • For understanding character etymology, it is very helpful to understand the earlier versions of modern characters, especially the Seal script versions. Many phonetic or semantic elements have been simplified or corrupted and are no longer recognizable in the modern versions.
  • You do not need to know 50 000 characters to read the newspaper or books. According to studies, 2500 characters are enough to read the newspaper. According to Wikipedia, the dictionary of the Kangxi Emperor contains 47 000 characters, but 40% of these are graphic variants. I would assume that most of the remaining characters are place names, people's names or names of local dishes, animals, plants or rarely used objects.
  • Most Chinese characters (about 80%) are phono-semantic compounds, where one element in the character points to the sound and another element points to the meaning of the character as a whole. Learning the system behind this type of character will speed up your learning significantly.
  • One of the main problems when studying characters is therefore to learn the so-called leading phonetic characters for each phonetic group (such as 才 in this article), as they are usually meaning-meaning compounds or simplified pictures with no indication of how they should be pronounced.
