Journal of Zhejiang University (Humanities and Social Sciences)
 
Is the Distribution of Nouns an Invariant in Human Languages? An Investigation Based on Written German Corpora
Li Yuan, Duan Tinghui, Liu Haitao

Abstract (Chinese version, translated)

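Previous studies of the distribution of word classes in human natural languages have shown that the proportion of nouns is relatively stable across languages. Does the proportion of nouns in German also follow this general pattern? A study of three large German corpora yields the following findings. First, nouns account for about 38% of the words in written German. Although German has a high proportion of compound nouns and many nominalized structures, its noun proportion is roughly the same as that of English and other languages, which further supports the conclusion that the proportion of nouns follows a universal pattern across human natural languages. Second, the proportions of nouns and of their subclasses differ somewhat across genres; these differences are determined by genre characteristics and are similar across languages. Finally, both time and genre have a significant effect on the proportions of the individual noun subclasses, but neither affects the overall proportion of nouns. Taken together, these results further support the conclusion that the distribution of nouns is an invariant of human language.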
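The noun proportion discussed here is a simple ratio: tokens counted as nouns in Hudson's broad sense (common nouns, proper nouns and pronouns) divided by all word tokens. The Python sketch below computes that ratio from a part-of-speech tag sequence. It assumes STTS tags (the Stuttgart-Tübingen tagset used in German corpora such as TüBa-D/Z); the tag-to-subclass mapping, the choice of pronoun tags and the exclusion of punctuation are illustrative assumptions, not the authors' documented procedure.

```python
from collections import Counter

# Hypothetical mapping from STTS tags to Hudson-style noun subclasses.
# Which pronoun tags the study counted is not stated; this sketch keeps
# only substituting (stand-alone) pronouns as an assumption.
NOUN_SUBCLASS = {
    "NN": "common",
    "NE": "proper",
    "PPER": "pronoun", "PRF": "pronoun", "PPOSS": "pronoun",
    "PDS": "pronoun", "PIS": "pronoun", "PRELS": "pronoun", "PWS": "pronoun",
}

# STTS punctuation tags; excluded from the word count (assumption).
PUNCT_TAGS = {"$.", "$,", "$("}


def noun_proportions(tags):
    """Return each noun subclass's share of word tokens, plus the total noun share."""
    words = [t for t in tags if t not in PUNCT_TAGS]
    if not words:
        raise ValueError("no word tokens")
    counts = Counter(NOUN_SUBCLASS[t] for t in words if t in NOUN_SUBCLASS)
    total = len(words)
    shares = {sub: counts[sub] / total for sub in ("common", "proper", "pronoun")}
    shares["all_nouns"] = sum(counts.values()) / total
    return shares


if __name__ == "__main__":
    # Invented toy example: "Die Zeitung berichtet, dass sie Berlin besucht."
    tags = ["ART", "NN", "VVFIN", "$,", "KOUS", "PPER", "NE", "VVFIN", "$."]
    print(noun_proportions(tags))
```

On the toy sentence, 3 of the 7 non-punctuation tokens (one NN, one PPER, one NE) count as nouns. Applied to a whole corpus, the same ratio is what the roughly 38% figure refers to, subject to the mapping assumptions above.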

Abstract

Hudson reported that nouns make up about 37% of the words in written English. Many other languages have since been studied in this respect, and the findings suggest that the proportion of nouns is an invariant across human languages. Although German and English both belong to the West Germanic subfamily, they differ in word formation, and for nouns this difference could push in either direction. On the one hand, German has a larger proportion of compound nouns, which pack more information into a single word, so German might need fewer nouns overall than other languages. On the other hand, nominalized structures are common in German, which may lead to a larger proportion of nouns than in other languages. Does German conform to this universal law of language?

We address this question using three large-scale corpora of written German. The DWDS-Kernkorpus contains more than 100 million words of 20th-century texts from different genres. The Deutsches Textarchiv (DTA) is a diachronic corpus of written German containing about 150 million words from texts of the same genres as the DWDS-Kernkorpus. The TüBa-D/Z treebank is a German newspaper corpus of more than 1.5 million words, comprising 3,644 articles published in the mainstream newspaper "Die Tageszeitung" between 1989 and 1999. To make the results comparable, we adopted the classification criteria for nouns and the part-of-speech tagsets suggested by Hudson. In all three corpora, nouns account for about 38% of the words in written German, which corroborates the hypothesis above.

We further compared the proportions of nouns across genres. The genres differ in the proportions of the noun subclasses, namely common nouns, proper nouns and pronouns: common nouns make up a larger proportion of informational texts, whereas imaginative texts contain a larger proportion of pronouns. This result is also consistent with Hudson's findings.

Little previous work has examined the diachronic development of these proportions. In this study, we additionally explored how the proportion of nouns (and of its subclasses) relates to time by analyzing texts from 1500 to 1950. Although the total proportion of nouns changed little over the last five hundred years, the balance between common nouns and pronouns shifted: the proportion of common nouns rose continuously from 14.02% at the beginning of the 16th century to about 24% in the 20th century, while the proportion of pronouns fell from 16.66% to 10%. To the best of our knowledge, this diachronic tendency has not been addressed before. We argue that it is caused by social and technical development as well as by the evolution of the language itself.

In conclusion, this study corroborates the hypothesis that the distribution of nouns is an invariant across human languages. In written German, the proportions of the noun subclasses vary among genres and have changed considerably over time, although the overall proportion of nouns has remained the same. In particular, we observed a continuous increase in the proportion of common nouns, and a corresponding decrease in the proportion of pronouns, over the last five hundred years. This finding offers a new perspective on language evolution and quantitative linguistic research and deserves further study.
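The diachronic pattern described above, a rising common-noun share and a falling pronoun share while the total stays roughly flat, can be quantified per period. The Python sketch below is a hypothetical illustration, not the study's method: it computes one subclass's share for each period and fits an ordinary least-squares line against the period's start year, reporting the fitted change per century. The STTS tag mapping is an assumption, and the example periods are invented toy data, not corpus counts.

```python
# Hypothetical STTS-to-subclass mapping (assumption; the study's exact
# tag mapping is not given in the abstract).
SUBCLASS = {
    "NN": "common",
    "NE": "proper",
    "PPER": "pronoun", "PRF": "pronoun", "PPOSS": "pronoun",
    "PDS": "pronoun", "PIS": "pronoun", "PRELS": "pronoun", "PWS": "pronoun",
}
PUNCT_TAGS = {"$.", "$,", "$("}  # STTS punctuation tags, excluded from word counts


def subclass_share(tags, subclass):
    """Share of non-punctuation tokens whose tag maps to the given subclass."""
    words = [t for t in tags if t not in PUNCT_TAGS]
    if not words:
        raise ValueError("period contains no word tokens")
    return sum(1 for t in words if SUBCLASS.get(t) == subclass) / len(words)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct periods")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_per_century(periods, subclass):
    """periods: list of (start_year, tags). Returns the fitted share change per 100 years."""
    years = [year for year, _ in periods]
    shares = [subclass_share(tags, subclass) for _, tags in periods]
    return ols_slope(years, shares) * 100


if __name__ == "__main__":
    # Invented toy periods, NOT corpus data: 10 tokens each.
    periods = [
        (1500, ["NN"] * 2 + ["PPER"] * 3 + ["VVFIN"] * 5),
        (1700, ["NN"] * 3 + ["PPER"] * 2 + ["VVFIN"] * 5),
        (1900, ["NN"] * 4 + ["PPER"] * 1 + ["VVFIN"] * 5),
    ]
    for sub in ("common", "pronoun"):
        print(sub, round(trend_per_century(periods, sub), 4))
```

In the toy data the common-noun share rises by about 0.05 per century and the pronoun share falls by the same amount, so their combined share stays at 0.5 in every period. A straight-line fit is only one possible choice; a real analysis would also need to control for genre, which the abstract identifies as a second factor.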

Key words: German; distribution of nouns; corpora; quantitative linguistic characteristic; genres; diachronic tendency
Publication date: 2019-11-10
Cite this article:
Li Yuan, Duan Tinghui, Liu Haitao. Is the Distribution of Nouns an Invariant in Human Languages? An Investigation Based on Written German Corpora[J]. Journal of Zhejiang University (Humanities and Social Sciences), 2019, 5(6): 39-.
Link to this article:
https://www.zjujournals.com/soc/CN/ or https://www.zjujournals.com/soc/CN/Y2019/V5/I6/39