Hu, Jiajia, Beihang University, China, hjj81@126.com
Wang, Ning, Beijing Normal University, China, niwangning@263.net

According to Saussure, there are two systems of writing. (1) The system commonly known as ‘phonetic’ tries to reproduce the succession of sounds that make up a word. Phonetic systems are sometimes syllabic and sometimes alphabetic, i.e. based on the irreducible elements used in speaking. (2) In an ideographic system, each word is represented by a single sign that is unrelated to the sounds of the word itself. Each written sign stands for a whole word and, consequently, for the idea expressed by the word. The classic example of an ideographic system of writing is Chinese. This has led to a common misunderstanding, especially in the field of Chinese information processing, that each Chinese character is a separate sign, which in turn has caused a major problem: the massive encoding set. Hanzi, however, as a common information carrier created and shared by the whole of Chinese society, cannot be disorderly and unsystematic. There must be some intrinsic laws that make Hanzi the most mature ideographic system in the world. This paper uses analysis methods for complex networks to identify these intrinsic laws of the Hanzi system.

Hanzi is the representative ideographic system of writing, and its most noticeable feature is that each character’s graphic form was coined according to its original meaning. The elements of Hanzi graphic forms are called components, and each component has a certain function when it is used to form a character. 梅, for example, is formed by 2 components, 木 and 每, with 木 indicating that 梅 is the name of a tree and 每 indicating the pronunciation of 梅. A Chinese character may be formed by only one component, namely the character itself, in which case it is called a single-element character; or it may be formed by no fewer than 2 components, in which case it is called a compound-element character. The six classical principles for analyzing Hanzi graphic forms are known as ‘Liushu’. Modern study of Hanzi graphic forms, on the basis of ‘Liushu’, puts forward a new theory that uses ‘components and component functions’ to analyze Hanzi graphic forms. A component can play one of four roles in forming a Chinese character: (1) as a pictographic symbol; (2) as a semantic symbol; (3) as a phonetic symbol; (4) as a deictic symbol. These roles are called component functions. All Chinese characters, formed by different components with different functions, can be divided into eleven categories, called the formation modes, as listed in Table 1.

Table 1: Eleven Kinds of Hanzi Formation Modes
MODE: DEFINITION
0: single-element character
Deictic-pictographic: composed of a pictographic component and a deictic component
Deictic-semantic: composed of a semantic component and a deictic component
Deictic-phonetic: composed of a deictic component and a phonetic component
Pictographic-phonetic: composed of a pictographic component and a phonetic component
Semantic-phonetic: composed of a semantic component and a phonetic component
Comprehensive composite with phonetic component: composed of a phonetic component and other components
Pictographic composite: composed of 2 pictographic components
Semantic-pictographic: composed of a pictographic component and a semantic component
Semantic composite: composed of 2 semantic components
Comprehensive composite without phonetic component: composed of different components, none of them phonetic
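
One way to read Table 1 is as a lookup from the functions of a character's components to its formation mode. The following Python sketch illustrates that reading; it is not the authors' procedure. The name `formation_mode`, the string labels, and the treatment of combinations the table does not spell out (any remaining combination is counted as a comprehensive composite) are assumptions made for illustration.

```python
# Hypothetical helper: maps a character's component functions to a Table 1 mode.
FUNCTIONS = {"pictographic", "semantic", "phonetic", "deictic"}

# Two-component modes whose components have two different functions.
MIXED_PAIRS = {
    frozenset({"pictographic", "deictic"}): "deictic-pictographic",
    frozenset({"semantic", "deictic"}): "deictic-semantic",
    frozenset({"phonetic", "deictic"}): "deictic-phonetic",
    frozenset({"pictographic", "phonetic"}): "pictographic-phonetic",
    frozenset({"semantic", "phonetic"}): "semantic-phonetic",
    frozenset({"pictographic", "semantic"}): "semantic-pictographic",
}

# Two-component modes whose components share the same function.
SAME_PAIRS = {
    "pictographic": "pictographic composite",
    "semantic": "semantic composite",
}


def formation_mode(functions):
    """Return the Table 1 mode name for a list of component functions."""
    functions = list(functions)
    if not functions:
        raise ValueError("a character needs at least one component")
    unknown = set(functions) - FUNCTIONS
    if unknown:
        raise ValueError(f"unknown component function(s): {sorted(unknown)}")

    if len(functions) == 1:
        return "single-element character"

    if len(functions) == 2:
        kinds = frozenset(functions)
        if kinds in MIXED_PAIRS:
            return MIXED_PAIRS[kinds]
        if len(kinds) == 1 and functions[0] in SAME_PAIRS:
            return SAME_PAIRS[functions[0]]

    # Assumption: every remaining combination is a comprehensive composite,
    # split only by whether a phonetic component is present.
    if "phonetic" in functions:
        return "comprehensive composite with phonetic component"
    return "comprehensive composite without phonetic component"
```

For 梅, whose components 木 and 每 act as a semantic and a phonetic symbol respectively, `formation_mode(["semantic", "phonetic"])` returns `"semantic-phonetic"`.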

Because the same component can combine with other components to form different Chinese characters, characters that share a component (with the same function) can be linked to each other, which makes the whole set of Hanzi a huge complex network, as Fig. 1 shows. Modern Chinese graphology, based on comprehensive and detailed descriptions of Chinese characters’ components and the components’ functions, has produced some statistical analyses of the systematic features of Hanzi graphic forms. For example, the number of primitive components of Hanzi in different periods varied from 250 to 400, and more than 87% of Chinese characters (since seal script) are composed of a semantic component and a phonetic component or, in other words, are of the semantic-phonetic mode. These conclusions, however, cannot explain the structural profile of the Hanzi system any further. How are hundreds of thousands of Chinese characters composed of no more than 400 primitive components? What is the average distance between 2 characters? Are there any clusters, and what are their densities? Do they show a strict hierarchy? Are there some centers?
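
The linking rule described above (two characters are connected when they share a component with the same function) can be sketched as follows. This is a minimal illustration under stated assumptions: the four-character dictionary `TOY_DECOMPOSITION` is toy data chosen for the example (only the analysis of 梅 comes from the text, and the water component is written as 水), and the function name `shared_component_links` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy data (assumption): character -> list of (component, function) pairs.
TOY_DECOMPOSITION = {
    "梅": [("木", "semantic"), ("每", "phonetic")],
    "松": [("木", "semantic"), ("公", "phonetic")],
    "海": [("水", "semantic"), ("每", "phonetic")],
    "河": [("水", "semantic"), ("可", "phonetic")],
}


def shared_component_links(decomposition):
    """Link every pair of characters sharing a (component, function) pair.

    Returns a dict mapping frozenset({char_a, char_b}) to the set of
    (component, function) pairs the two characters have in common.
    """
    # Index characters by each (component, function) pair they contain.
    characters_by_key = defaultdict(set)
    for character, components in decomposition.items():
        for component, function in components:
            characters_by_key[(component, function)].add(character)

    # Every pair of characters under the same key gets a link.
    links = defaultdict(set)
    for key, characters in characters_by_key.items():
        for a, b in combinations(sorted(characters), 2):
            links[frozenset((a, b))].add(key)
    return dict(links)


links = shared_component_links(TOY_DECOMPOSITION)
for pair, shared in links.items():
    print(sorted(pair), sorted(shared))
```

In this toy set, 梅 and 松 are linked through the semantic component 木, while 梅 and 海 are linked through the phonetic component 每. 梅 and 河 share nothing and stay unlinked.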

Figure 1: The Network of Hanzi Graphic Forms (Local and Global)

This paper views the system of Hanzi graphic forms as a complex network in which each Chinese character is linked to its components by an edge indicating the component’s function in the character (see Fig. 1). It uses a series of network metrics, such as the degree, the path length, the clustering coefficient, the centrality, the coreness and the betweenness, to analyze the network’s topological features. This allows a more in-depth discussion of the structure and formation mechanism of the Hanzi graphic form system and helps us gain a more thorough understanding of the nature of Chinese. The innovation of this paper lies in integrating the techniques of complex network theory with the scientific analysis of Hanzi graphic forms, which should bring new insights to the study of Chinese graphology and be of practical use in the teaching of Hanzi and in Chinese information processing.
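
To make the network view concrete, the sketch below builds a small character-component graph and computes two of the metrics named above, the degree and the path length, using only the Python standard library. It is an illustration under assumptions, not the paper's implementation: the edge list `EDGES` is toy data (only the analysis of 梅 comes from the text, and the water component is written as 水), all function names are hypothetical, and the edge labels are stored but not used by these two metrics.

```python
from collections import defaultdict, deque

# Toy data (assumption): (character, component, component function) edges.
EDGES = [
    ("梅", "木", "semantic"), ("梅", "每", "phonetic"),
    ("松", "木", "semantic"), ("松", "公", "phonetic"),
    ("海", "水", "semantic"), ("海", "每", "phonetic"),
    ("河", "水", "semantic"), ("河", "可", "phonetic"),
]


def build_graph(edges):
    """Build an undirected adjacency map; characters and components are nodes."""
    adjacency = defaultdict(set)
    for character, component, _function in edges:
        adjacency[character].add(component)
        adjacency[component].add(character)
    return dict(adjacency)


def degree(graph, node):
    """Number of edges incident to a node."""
    return len(graph[node])


def distances_from(graph, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def average_path_length(graph):
    """Mean shortest path length over all connected ordered pairs of distinct nodes."""
    total = 0
    pairs = 0
    for source in graph:
        distance = distances_from(graph, source)
        total += sum(distance.values())
        pairs += len(distance) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


graph = build_graph(EDGES)
print(degree(graph, "木"))
print(distances_from(graph, "梅")["河"])
print(average_path_length(graph))
```

On a graph of realistic size, a dedicated library would be the more practical choice for metrics such as the clustering coefficient, coreness and betweenness; the breadth-first search here is meant only to show what "distance between 2 characters" means in this network.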