3Ts of CL

A chronology of corpus research since the pre-electronic age (collated by Jiajin Xu of Beijing Foreign Studies University)

1583/1828

The Latin word ‘corpus’ was used to refer to a collection of documents, in addition to its etymological sense of ‘the human body’. (Source: Gothofredi, D. (1583/1828). Corpus Juris Civilis Romani (Tomus Primus). Neapoli: Apud Januarium Mirelli Bibliopolam.)

1820 or earlier

John Freeman compiled a list of frequent words to teach adults to read. (Source: ‘A method of teaching adult persons to read; which is designed to obviate their objections and accelerate their progress.’ Reprinted as ‘On grammalogues: To the Editor of the Phonotypic Journal.’ The Phonotypic Journal 2(24): 170-171.)

1838/1843

Sir Isaac Pitman compiled alphabetic and numerical arrangements of frequent words based on a sample of 10,000 words taken from 20 books, 500 words from each. (Source: Pitman, I. (1843). List of words from which grammalogues may be selected. The Phonotypic Journal 2(23): 161-163.)

1897/1898

Kaeding, F. (1897/1898). Häufigkeitswörterbuch der deutschen Sprache. Berlin: Self-published.

1922

According to Malinowski (1922: 18-19), ‘[he] was thus acquiring…an abundant linguistic material, and a series of ethnographic documents…. This corpus inscriptionum Kiriwiniensium…[a] collection of…characteristic narratives, typical utterances,…, as documents of native mentality.’ (Source: Malinowski, B. (1922). Argonauts of the Western Pacific. London: Routledge & Kegan Paul Ltd.)

1935

Zipf, George Kingsley. (1935). The Psycho-biology of Language: An introduction to dynamic philology. Boston: Houghton Mifflin Company. A 1936 edition was published by George Routledge and Sons, Ltd, and a 1968 edition by the MIT Press.

1956

‘The analysis here presented is based on the speech of a single informant…and in particular upon a corpus of material, of which a large proportion was narrative, derived from approximately 100 hours of listening.’ (Source: Allen, W. (1956). Structure and system in the Abaza verbal complex. Transactions of the Philological Society 55(1): 127-176, at p. 128.)

Whatmough, Joshua. (1956). Poetic, Scientific and Other Forms of Discourse: A new approach to Greek and Latin literature. Berkeley: University of California Press.

1964

The completion of the Brown Corpus (A Standard Corpus of Present-Day Edited American English) project. (Source: Francis, N. & H. Kučera. (1967). Computational Analysis of Present-day American English. Providence: Brown University Press.)

1966

Herdan, Gustav. (1966). The Advanced Theory of Language as Choice and Chance. Berlin: Springer.

1982

Aarts, J. & T. van den Heuvel. (1982). Grammars and intuitions in Corpus Linguistics. In S. Johansson (ed.). Computer Corpora in English Language Research. Bergen: Norwegian Computing Centre for the Humanities. 66-84.

(to be updated)

Corpora compiled by CLSC members

JDEST (Jiao Da English for Science and Technology):

New JDEST:

CLEC:

SWECCL:

Crown: A Brown-family corpus of one million words of American English, consisting largely of texts published in 2009, developed under the leadership of Jiajin Xu and Maocheng Liang. An article describing the corpus was published in the 2013 issue of ICAME Journal. Download Crown (18.2MB). Publications based on the Crown and CLOB corpora can be found here. A detailed description of the Crown corpus is available in CoRD, the Corpus Resource Database of the University of Helsinki.

CLOB: A Brown-family corpus of one million words of British English, consisting largely of texts published in 2009, developed under the leadership of Jiajin Xu and Maocheng Liang. An article describing the corpus was published in the 2013 issue of ICAME Journal. Download CLOB (18.2MB). Publications based on the Crown and CLOB corpora can be found here. A detailed description of the CLOB corpus is available in CoRD, the Corpus Resource Database of the University of Helsinki.

The TECCL corpus: Ten-thousand English Compositions of Chinese Learners

 

Corpus tools and systems developed by CLSC members

BFSU PowerConc

  1. China Association for Comparative Studies of English and Chinese 中国英汉语比较研究会
  2. Asia Pacific Corpus Linguistics Association 亚太语料库语言学协会 APCLA
  3. Learner Corpus Association 学习者语料库协会
  4. Chinese National Corpus 国家现代汉语语料库
  5. corpus4u.org 语料库语言学在线 (Corpus Linguistics Online)
  6. 语料天涯(Corpora A-Z)
  7. David Lee’s Bookmarks for Corpus-based Linguists
  8. Beiwai Corpus Research Group 北外语料库语言学沙龙
  9. A (brief) History of Computerised Corpus Tools at TimeMapper