Why Study Chinese Etymology
When I was a young man of 22 in Taiwan in 1972 trying to become fluent and literate in Chinese, I was faced with the prospect of learning to write about 5000 characters and 60,000 character combinations. The characters were complex with many strokes and almost no apparent logic. I found on the rare occasions when I could get a step by step evolution of the character from its original form, with an explanation of its original meaning and an interpretation of its original form, suddenly it would become apparent how all the strokes had come to be. The problem is that there is no book in English that adequately explains this etymology and even if you read Chinese there is no single book in Chinese that explains it all. In short it is a research project to understand each character. To have this information at my fingertips in English would have been a great help.
The first advantage of a computerized etymology is that you can do all kinds of analysis that would be limited by the linear nature of books. The second advantage is that etymology is an ongoing research project. We do not know all the answers when it comes to character etymology. If errors or discrepancies are discovered in a computerized system, they can be easily corrected. They can not be corrected in a book that has already been published.
There are literally thousands of references on this subject, most of them in Chinese. Most of them have something new, unique or interesting to say but I only list here what I have found to be the top references.
Table of Contents
- Pictographs and Ideographs
- Primitives and Remnants
- Meaning and Interpretation
- Signific Abstraction
- Phonetics and Phonetic Shifts
- Chinese Derived Characters
- Modern Common Chinese Characters
- Modern Traditional Characters
- Cursive and Super Cursive Chinese
- Modern Simplified Characters
- Seal Characters - ZhuanTiZi 篆體字
- Bronze Characters - JinWen 金文
- Oracle Characters - JiaGuWen 甲骨文
Pictographs and Ideographs
In ancient China, when characters were first invented, they were formed from one or more pictographs that indicated either meaning or pronunciation. Pictograph means picture graphic. So we have some characters made of one or more pictographs that, alone or in combination, indicate a meaning. Sometimes a character has one part that indicates meaning and another that indicates pronunciation. In some cases it is hard to make up an ideograph (ideograph means idea graphic) because the meaning is not easy to represent in pictographs, so the inventors sometimes just borrowed another character that had the same pronunciation.
Primitives and Remnants
Primitives are the original form of a graphic. They should ideally be recognizable, although they may require explanation. Over the years the character forms changed, so the original pictographs are no longer recognizable. The pronunciation too changed over the years, and finally the meaning was also modified. What we have left are modern characters or parts of characters that I call remnants. Remnants are the modern form of a graphic. All characters and character parts are remnants. A good example is the character quan 犬 dog. We have characters that indicate that the modern remnants 犬 and 犭 were originally the same and were clearly once a primitive picture of a dog. Even Confucius, in 500 BC, is quoted as saying “The ancients must have had very strange looking dogs”. This example is worse than most, but most Chinese characters today are just a bunch of complex strokes with no obvious connection to the meaning. So modern Chinese characters are neither pictographs nor ideographs.
Meaning and Interpretation
The purpose of etymology is to trace back and find what those remnants came from. A character has a meaning: Dian 電 means electricity. Its modern meaning is electricity. Its original meaning was lightning. Its interpretation is a cloud with rain drops, with lightning coming down and hitting a field.
I count about 400 primitives. If these primitives contribute meaning to the character, their modern remnants are usually called Significs. It is often not clear that a part is a signific, because either the meaning has changed so much or we cannot get into the frame of mind of the person who invented the character. This is called Abstraction of the Signific. A simple example is the string primitive Mi 糸 “string”. Sun 孫 “grand child” combines a string and a Zi 子 “child”, suggesting a string of children, or by abstraction, “grand child”. This abstraction is easy; some are not.
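The distinction drawn above between a character's modern meaning, its original meaning, its interpretation and its signific parts maps naturally onto a small record type. The following Python sketch is only an illustration: the class and field names (`EtymologyEntry`, `significs` and so on) are hypothetical and do not describe the actual database behind this site. The sample data is taken from the two examples in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EtymologyEntry:
    """One character's etymology record (hypothetical layout)."""
    character: str
    romanization: str
    modern_meaning: str
    original_meaning: Optional[str] = None  # None when no original meaning is recorded
    interpretation: Optional[str] = None
    # Signific part -> the meaning it contributes to the whole character.
    significs: Dict[str, str] = field(default_factory=dict)


def meaning_shifted(entry: EtymologyEntry) -> bool:
    """True when a recorded original meaning differs from the modern one."""
    return (entry.original_meaning is not None
            and entry.original_meaning != entry.modern_meaning)


dian = EtymologyEntry(
    character="電",
    romanization="Dian",
    modern_meaning="electricity",
    original_meaning="lightning",
    interpretation="a cloud with rain drops, with lightning hitting a field",
)

sun = EtymologyEntry(
    character="孫",
    romanization="Sun",
    modern_meaning="grand child",
    interpretation="a string of children",
    significs={"糸": "string", "子": "child"},
)

for entry in (dian, sun):
    print(entry.character, entry.modern_meaning, meaning_shifted(entry))
```

Keeping the original meaning optional reflects the fact that for many characters the original meaning is still an open research question.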
Phonetics and Phonetic Shifts
There are about 800 characters that are used as phonetics in modern Chinese. About a third of them can be readily recognized. Another third can be recognized by literate Chinese and yet another third are problematic and can only be analyzed. It is very productive to study the phonetic shifts since ancient times, some being natural and some being influences from other dialects.
- Analytic Dictionary of Chinese and Sino-Japanese by Bernhard Karlgren
- The classic English analysis of Chinese phonetics.
Chinese Derived Characters
Script refers to the symbols in which a language is written. The Chinese writing system has been borrowed by, or has influenced, many languages and Chinese dialects other than the current standard, which is Mandarin. For Chinese and all other characters derived from or influenced by Chinese characters I use the term Chinese Derived Characters. These languages and dialects include Cantonese, Taiwanese, Shanghainese, Japanese, Korean, Vietnamese, Jurchen, and others. This web site is dedicated to the etymology of Modern Chinese characters, which will include information from the Chinese dialects of Mandarin, Cantonese, Taiwanese and Shanghainese.
Modern Common Chinese Characters
This refers to the script used to write modern Mandarin. In English we have an alphabet, and we spell things with a fixed set of 62 letters and numbers from which we make the roughly 60,000 modern English words known by the average native speaker. In modern Chinese the literate adult uses a fuzzy number of about 5000 characters that correspond to single syllable Mandarin words. These characters can be used to form about 60,000 multi syllable Mandarin words used by modern native speakers. The problem is the fuzzy number nature of Chinese characters.
On an English typewriter or computer we can use or make up any word we want with little trouble using exactly 62 letter-number symbols. In Chinese we can hand write, or sometimes make up, any character we want. The problem with a Chinese typewriter or computer is that we have to limit the characters that can be used ahead of time. It is like making an English typewriter that can only print a fixed number of words, with no provision for new or special words. The old manual Chinese typewriters had 7000 characters. The GB2312-80 computer standard for Simplified Chinese has 6763 characters. The Big5 computer standard for traditional Chinese has 13051 characters, more than twice as many as most people use. The Unicode “basic multilingual plane” tries to combine all Han characters from Simplified and Traditional Chinese, Japanese, Korean and Cantonese and comes up with a total of 27,484. The question of what is a simplified or traditional character is very complex and will be discussed separately.
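The fixed repertoire problem is easy to see with the codecs that ship with Python, which include `gb2312` and `big5`. The sketch below reports which of those legacy sets can encode a given character. The helper name `encodable_in` is made up for this illustration, and Python's codec tables may differ in small details from the character counts quoted above.

```python
def encodable_in(ch, codecs=("gb2312", "big5")):
    """Return the names of the codecs that can encode the single character ch."""
    supported = []
    for name in codecs:
        try:
            ch.encode(name)
        except UnicodeEncodeError:
            continue  # the character is outside this codec's fixed repertoire
        supported.append(name)
    return supported


# A character shared by both sets, a traditional form, and its simplified form.
for ch in "中電电":
    print(ch, encodable_in(ch))
```

A character that falls outside a set simply cannot be typed or stored in that encoding, which is the typewriter limitation described above carried over to the computer.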
Chinese, Japanese, Korean and Vietnamese Computing - CJKV Information Processing by Ken Lunde
This is the best book on the computerization of CJKV languages.
The Unicode Standard Version 4.0 The Unicode standard.
Published by the Ministry of Education of Taiwan listing the 4808 characters necessary for adult literacy.
Modern Traditional Characters
Modern characters are written as a composition of simple strokes, as if they were written with a brush, which has been the main writing instrument for the past 1800 years. Before this, people used a totally different style of characters that were written with reed pens on bamboo slats. There was a transition around 1 AD to a simplified, stroke based character rendition, still using reeds to write with. This style was called LiZi 隷字 or LiShu 隷書. The word Li means “crude” because at the time this simplified form was considered to be non standard. I use the word LiZi to indicate historically accurate renditions of characters that actually existed in the period 1 AD to 200 AD, as opposed to the word LiShu, which is a modern calligraphic style. As far as the current analysis system is concerned, LiZi is considered to be an intermediate step in the evolution between seal characters and modern characters. After the invention of the brush for writing in about 200 AD, the style came to be called KaiZi 楷字 or KaiShu 楷書. The brush brought some more, rather minor, changes in form and these characters were taken as standard. The word Kai means “standard”. By 200 AD, they had become the standard characters. Many common characters used in 200 AD have died out, and new ones have been invented. There have been some, mostly minor, changes in how some characters are written and some changes in meaning. The HanYuDaZiDian 漢語大字典 is the largest dictionary of Kai type characters. It includes over 56,000 modern printed Chinese characters, both simplified and traditional, used over the past 2000 years. I call them modern because they are in the modern style. Most of them are rare characters or rare alternates and not part of useful modern Chinese. About 25% of modern characters did not exist in 200 AD. Most of the characters in use then would be recognized today, although the meanings may have changed.
HanYuDaZiDian 漢語大字典 8 volumes
The largest Chinese-Chinese dictionary of single characters
HanYuDaCiDian 漢語大詞典 13 volumes
The largest Chinese-Chinese dictionary of compound characters
English-Chinese Word-Ocean Dictionary YingHanCiHai 英漢辭海 2 volumes
The largest English-Chinese dictionary
Chinese-English Dictionary HanYingDaCiDian 汉英大辞典 2 volumes
The largest Chinese-English dictionary
GuWenZiGuLin 古文字詁林 12 volumes 李圃 主编
The most extensive Chinese discussion of Chinese etymology
Far East Chinese English Dictionary - 遠東漢英大辭典 by LiangShiQiu
One of the most popular dictionaries, Traditional Chinese to English
The PinYin Chinese English Dictionary - HanYingCiDian - 漢英詞典
A popular dictionary, Simplified Chinese to English, also discusses simplification standards.
Cursive and Super Cursive Chinese
When Chinese write characters, they may write quickly so that the strokes run together. This is called cursive Chinese, XingShu 行書 “running script”. Over the years the Chinese have devised a number of very cursive forms called “super cursive”, or CaoShu 草書 “grass script”. The word grass refers to the fact that it resembles flowing grass. The earliest form dates back to 200 BC and is called ZhangCao 章草, documentary grass script. This is a modification of LiShu. The most prevalent form of super cursive is JinCao 今草, modern grass script. It was pioneered by WangXiZhi 王羲之 321-379 AD. It is still used today. The third style, used in the Tang dynasty 618-907 AD, is called KuangCao 狂草, erratic grass script. There are rules for super cursive and if you do not understand them you cannot understand the writing. Most modern Chinese are limited in the amount of super cursive Chinese they can read. Still, a fair percentage can read it. Super cursive is used to allow for fast writing and it is also simplified. Super cursive does not fit the simple stroke concept of printed Chinese. At some times in the past people have taken the super cursive form of a character and re-strokified it, resulting in a simplified printed form. This process is called CaoShuKaiHua 草書楷化, super cursive print formation. This is where many of the modern simplified characters come from. So to understand the etymology of Simplified Chinese it is necessary to understand something about CaoShu.
草字基本符號硏究 (上,中,下) by 趙緟華 and 任漢平
One of the best Chinese discussions of super cursive Chinese
行草讀本 Chinese Cursive Script An introduction to Handwriting in Chinese by FangYuWang
One of the best English discussions of super cursive Chinese
中國草書大字典 李志賢 蔡錦寳 張景春 編主
Large Chinese dictionary of super cursive samples
Modern Simplified Characters
No one can control the set of characters that people actually write with. So when the Communist Chinese in 1956 decided to tell people how to simplify their language, at first they could only offer some general rules. By the 1980s we have the advent of two computerized character sets that by default are supposed to represent Simplified and Traditional characters.
Reduction in character number
Part of the attempt to make a Simplified Chinese is to reduce the number of characters in common use.
The GB2312-80 character set, adopted on December 23, 1980, has 6,763 characters. GB means GuoJiaBiaoZhun 国家标准 “National Standard”. It is quite adequate for most people. One problem is that Chinese people like to use rare characters in their names, and those people usually have to find another character that has the same pronunciation or the same meaning. Some place names used old characters, and the places had to change their names. If you wanted to use old or rare characters from ancient literature, you just had to figure out some way around the issue: rewrite the poem, spell it out, use modified characters, or something. In any case 6,763 characters is quite enough for most people to function with. Newspapers, from time to time, have been strongly encouraged to limit the number of characters even further, to 3500, since even 3500 is adequate for good literacy if you make a few adaptations.
The Big5 standard for traditional Chinese, put together by the then top 5 computer companies in Taiwan, has 13053 characters. Of them, 5401 common Chinese characters are arranged in hexadecimal pages A4-C6 and 7652 less common Chinese characters are arranged in hexadecimal pages C9-F9. If you are a literature major even this number is inadequate. We really need the 56,000 characters from the HanYuDaZiDian. If you are an ordinary literate adult, this is more than you are likely to ever use. This means that more than half the traditional characters have no standard simplified form.
Are all simplified characters actually rare traditional characters?
By a very long stretch of the imagination, this is true. Some simplifications are actually reverting back to older forms. Some simplifications are rare and very non standard monstrosities that have been seen somewhere in history. Some are actually re-strokifications of known super cursive forms to make new Kai type characters. It is true that all have some kind of historical justification.
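The two Big5 ranges mentioned above can be told apart by the first (lead) byte of a character's two-byte code. The following sketch assumes Python's built-in `big5` codec and takes the page boundaries A4-C6 and C9-F9 exactly as quoted in the text. The function name `big5_level` is hypothetical.

```python
def big5_level(ch):
    """Classify one character by the lead byte of its Big5 code."""
    try:
        code = ch.encode("big5")
    except UnicodeEncodeError:
        return "not in Big5"
    if len(code) != 2:
        return "not a Han character"  # ASCII encodes as a single byte
    lead = code[0]
    if 0xA4 <= lead <= 0xC6:
        return "common"
    if 0xC9 <= lead <= 0xF9:
        return "less common"
    return "not a Han character"  # symbols and punctuation sit on other pages


# The two counts quoted in the text add up to the stated Big5 total.
assert 5401 + 7652 == 13053

for ch in "一電A":
    print(ch, big5_level(ch))
```

Because the split is by code page, a program can decide whether a character is in the common set without any lookup table.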
350 Unique Simplifications
There is a set of 350 stand alone unique simplifications. That is, the characters are simplified, but the simplification is independent of seeing that character as part of another character. In a few cases more than one character gets simplified to the same character: 366 characters get simplified to 350 new characters.
132 Radical and Stand alone Simplifications
There are 132 simplifications in which the stand alone character and any contextual occurrences of the character are simplified.
Simplest form of common alternates.
144 simplified characters differ from the traditional ones only in that they are the simplest of several common forms. Most Chinese are unaware of which are simplified and which are traditional, and these are not specifically defined by the Chinese government; they just happen to be different in the Big5 vs. GB character sets.
Many of the characters have no different simplified form. They were considered simplified enough already. So 6,766 simplified GB characters correspond to 6,883 traditional Big5 characters. 4,411 of the traditional Chinese characters have the same 1-1 simplified equivalent, excluding trivial style differences. That leaves 2,355 simplified characters corresponding to 2,522 traditional characters which are different. The rest are unsimplified.
1 to N simplification
Sometimes multiple traditional characters were simplified to one character. This accounts for the disappearance of 188 characters that are in the Big5 classical set and have converged simplified forms in GB.
In the large character sets you are talking about characters which most people do not know. The Ministry of Education defines 4808 traditional characters which a student should know to get out of high school. If you know all of these characters you can look a Chinese in the eye and say “I am adult literate”. You will still occasionally run into characters outside of this set.
To completely understand these characters you must realize that many characters have multiple pronunciations; these are called PoYinZi 破音字 “multiple pronunciation characters”. Most of the time these differences in pronunciation are trivial differences based on where the character is used. Sometimes the differences are not so trivial. Sometimes the differences in pronunciation are an indication that the modern character may have been derived from two different ancient characters. This results in the list of 4808 basic characters becoming 5300 character-pronunciation combinations.
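The step from characters to character-pronunciation combinations is just a count over (character, reading) pairs. A minimal sketch follows, using three well-known multi-reading characters as stand-in data. They are illustrative only and are not drawn from the Ministry list; the tone-less spellings and the function names are choices made for this example.

```python
# Stand-in data: each character maps to its list of readings.
readings = {
    "行": ["xing", "hang"],
    "長": ["chang", "zhang"],
    "樂": ["le", "yue"],
    "子": ["zi"],
}


def count_combinations(table):
    """Count (character, pronunciation) pairs across the whole table."""
    return sum(len(prons) for prons in table.values())


def multi_reading(table):
    """Return the PoYinZi: characters with more than one reading."""
    return [ch for ch, prons in table.items() if len(prons) > 1]


print(len(readings), "characters ->", count_combinations(readings), "combinations")
print("PoYinZi:", multi_reading(readings))
```

Applied to the full list, the same count is what turns 4808 characters into about 5300 combinations, that is, roughly 500 extra readings.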
Modern Chinese Characters 现代汉字 by Yin Binyong and John S Rohsenow
A good English discussion of Chinese characters and simplification.
简化字源 by LiLeYi 李乐毅 The Origins of Simplified Chinese Characters
A good Chinese discussion of the simplification story.
Seal Characters - ZhuanTiZi 篆體字
In 221 BC Chin Shi Huang 秦始皇 came to power and declared that the proliferation of Chinese characters had become too complicated. He assigned his Prime Minister LiSi 李斯 to make a standard set of official characters. He also declared that all the old documents should be destroyed. This unification and 2200 years of history mean that very few written artifacts survive from before 221 BC. The characters of this time are well known and understood thanks to the dictionary by XuShen 許慎 called the ShuoWenJieZi 說文解字, written in about 147 AD. Our earliest copy dates to the Song dynasty, but we think the existing copies are fairly accurate accounts of the original and of the time. This style of characters lasted until about 200 AD, but it has been used continuously for some official documents and for official seals, thus the name seal characters. The proper name should be Chin-Han characters.
In my research I use several sources for Chin-Han characters. The ShuoWen is like the Rosetta stone of Chinese. Without it, it would have been almost impossible to decipher the texts of the Zhou and Shang Dynasties. It is also apparent that XuShen had little or no access to texts before 221 BC. When we compare XuShen’s descriptions to earlier archaeological artifacts we find that many, perhaps 30%, of the descriptions have some degree of error, ranging from minor to just wrong. XuShen is still a great man, the Galileo of Chinese etymology.
ShuoWenJieZi 說文解字 The earliest complete 987 copy by XuXuan 徐鉉
My main seal character database comes from the 11109 clearly printed characters found in this version of the ShuoWen
ShuoWenJieZi 說文解字 The standard 1815 copy by 段玉裁
This version discusses slightly fewer characters but is probably the standard version of the ShuoWen
LiuShuTong 六書通 A Ming Dynasty collection of non standard seal type characters
My extended database of seal characters takes 38,596 characters from this source
Chinese Characters Their Origin, Etymology, History, Classification and Signification
by Dr. L. Wieger, S.J
The most comprehensive English discussion of seal characters, mainly from the ShuoWen point of view.
Actually the Zhou Dynasty ended in 255 BC but the seal characters were not standardized until about 221 BC. From the beginning of the Zhou Dynasty 周朝 to the ChinShiHuang 秦始皇 unification people would have written on bamboo strips, but because of the ChinShiHuang destruction of books and 3000 years of time we have few samples from bamboo strips. What have survived are several thousand cast bronze articles with inscriptions of major events. We have excavated many of these objects and this is what we know about Zhou Chinese. We call these bronze characters, but we could just as well call them Zhou characters because they cover most of the Zhou Dynasty.
Bronze Characters - JinWen 金文
The peculiarities of bronze characters are:
- The comparatively primitive bronze casting technology of that time means that we cannot depend on the characters to be as accurate as they would be if they were written on bamboo. They have casting flaws.
- They have undergone 2000 to 3000 years of corrosion, which further deteriorates their condition.
- Some of these objects were excavated recently, and thus we can depend on their authenticity. Others have been around for hundreds of years and may be forgeries. The making of forgeries was particularly prominent during the Tang Dynasty, 600 A.D. to 900 A.D.
- The inscriptions range from single characters on coins to several hundred characters on some large bronze objects. One of the main references, the JinWenBian, covers about 4000 objects, with 24,223 different sample characters in all, representing about 4000 different characters.
- Since these inscriptions mainly commemorate important events, we may not find some of the every day characters that were in use.
- These few artifacts range over the entirety of China and over a thousand year period. This is good in that it gives us a large range of samples, but not good in that we cannot get an extensive sample of any one place or time.
XuShen describes a type of characters called greater seal characters. These were the type of characters that were supposed to be used during the Zhou Dynasty. They are often quite different than the real samples we find in the bronze characters.
JinWenBian 金文编 by RongGeng 容庚
Used for my database of 24,223 bronze characters.
This is the most accurate book of character samples from the bronze artifacts.
JinWenGuLin 金文詁林補 8 volumes by ZhouFaGao 周法高
Most extensive Chinese discussion and interpretation of bronze characters
JinWenDaZiDian 金文大字典 3 volumes
JinWenZongJi 金文總集 10 volumes
This is the most extensive collection of photographs and sentence collections of bronze artifacts.
Oracle Characters - JiaGuWen 甲骨文
Oracle bones were only discovered in 1895. When we say oracle bones, we mean either the front plates (plastrons) of turtle shells, or the shoulder bones (scapula) of oxen. The people of the Shang Dynasty would cut inscriptions in the bone or shell with a sharp object, and then see how the bone broke when exposed to fire. In this way, they would attempt to cast fortunes. The uneducated Chinese of the 19th century who first found these bones thought they were dragon bones and ground them up for traditional medicine. The writing was obviously not readable to them. We have been studying them and digging them up and trying to put them together now for a hundred years. We can understand somewhat over half of the character samples, which means we can understand around 95 percent of the text.
Peculiarities of oracle characters are:
- The oracle bones and turtle plastrons all come from one excavation site. If it were not for this one site, we would have no direct proof that the Shang Chinese were really literate. The shells cover a period of about 200 years from about 1300 B.C. to about 1100 B.C. The advantage of this is that we have a small number of writers, all from one place, and extending over a relatively short period of time. This gives us a kind of average and we can at least talk about how the people of that time and place wrote.
- The pieces are a real mess. By some estimates, a total of 400,000 pieces were found. Several thousand plastrons and bones have been reconstructed, and several tens of thousands of sentences have been studied. I have compiled a database of 31,876 sample characters that represent about 4000 different characters, of which we think we understand between 1500 and 2000.
- From the analysis of characters like dian 典, we believe that the usual writing medium for the time was the bamboo strip. The first actual examples of bamboo strips we have date back to about 400 BC. So by that time we already have almost a thousand years of Chinese for which we have proof that writing existed, but for which there is not one single bamboo strip.
- The characters of 1300 BC have already undergone a high degree of abstraction. When we are told what they represent and how they are supposed to be interpreted, it seems in most cases fairly obvious. Unlike Egyptian hieroglyphs, however, it is not obvious to a casual observer what most of the characters represent. This is an indication that the writing system had already been around for a long time.
  It is believed that spoken language developed a little at a time. A language with 10 words is more useful than a language with no words. 100 words are better, and so on. With written language, on the other hand, a written system that cannot represent at least the majority of the spoken language is virtually useless. Imagine a written language that can only represent half of the concepts that you can talk about. Why bother to learn it?
- The purpose of the oracle bones was to cast fortunes. There was a lot of writing done here, but it is like the vocabulary you might find in a horoscope. We can assume that they probably had many characters for more every day common things that never appeared on the oracle bones. We might be able to extract 5000 characters from the oracle bones, but there were probably twice that many in use at the time.
The traditional story says that a man named CangJie 倉頡 invented the writing system around 3000 BC. You can only say so much with paintings and tokens. I think that when an innovative artist found that he might represent words with basic symbols and phonetic parts, he and probably a group of people were commissioned to invent and learn a writing system for practical purposes.
We need to be careful about copying these characters so that we do not influence the form by our own interpretation of the character which may be wrong. The following two are the most accurate books of character samples from the oracle artifacts.
JiaGuWenBian 甲骨文编 by SunHaiBo 孙海波
XuJiaGuWenBian 續甲骨文编 by JinXiangHeng 金祥恒
My database of 31,876 oracle characters is taken from this reference.
JiaGuWenZiJiShi 甲骨文字集釋 13 volumes by LiXiaoDing 李孝定
An extensive Chinese discussion of the interpretation of Oracle characters
殷墟甲骨刻辭纂 3 volumes, Thousands of real oracle sentences from archeological sources
YinXuJiaGuWenHeJi 殷墟甲骨文合集 13 volumes
There may still be questions or discrepancies since this is still an area of research. One will want to see the original objects and sentences. This is the largest resource for the original pictures.