German-English contrasts in cohesion – Towards an empirically-based comparison (GECCo)

German title: Kohäsion im Deutschen und Englischen – ein empirischer Ansatz zum kontrastiven Vergleich (Cohesion in German and English: an empirical approach to contrastive comparison)

Funded by the DFG (German Research Foundation)

Project start: 1 April 2011

Phase 1: April 2011 to June 2013
Phase 2: July 2013 to January 2017


Background and Motivation

With the GECCo corpus, we provide a resource for the contrastive investigation of cohesion in English and German. The findings of the GECCo project complement insights from English-German contrastive linguistics at the level of lexicogrammar (e.g. by Hawkins, Rohdenburg, Mair, House, Doherty, Steiner & Teich, König & Gast, Hansen-Schirra, Neumann and Steiner, Fischer) by focusing on textual contrasts in cohesion. The findings gained from the GECCo corpus are intended to enrich contrastive grammars with an empirically based account of cohesion in English and German.

GECCo also contributes to our understanding of how contrasts in cohesion affect text production and reception in different situations of language use. The corpus and the results obtained from it are a resource both for language teaching and for translator and interpreter training. Our corpus findings provide new input for modeling translation errors and developing translation strategies, and may thus complement work on translation quality assessment.

The GECCo corpus design, its annotation techniques and its extraction pipelines offer innovative tools for exploring linguistic phenomena including, but not limited to, cohesion. For example, the semi-automatic annotation of the corpora under analysis allows automatic extraction from parallel corpora. The semi-automatic exploitation of parallel corpora in turn enables the automatic generation of those features that distinguish translated texts from originals. Such features are needed to improve the quality of machine translation output and for further statistical applications. Additionally, automatic extraction from both parallel and comparable corpora can contribute to multilingual textual entailment systems.
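
To make the idea of automatically derived features more concrete, here is a minimal sketch; it is not the GECCo extraction pipeline. It assumes that texts have already been POS-tagged with Universal Dependencies tags, and it uses a hypothetical, deliberately coarse mapping of tags to cohesive categories (pronouns as reference, coordinating and subordinating conjunctions as conjunction). The names `cohesion_profile`, `mean_profile` and `distinguishing_features` are illustrative only. For each text, the sketch computes category frequencies per 1,000 tokens, then reports how a group of translations differs from a group of originals in the same language.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A text is a list of (token, POS tag) pairs. Assumption: tags follow the
# Universal Dependencies POS tag set; this is not necessarily the annotation
# scheme used in GECCo.
Text = List[Tuple[str, str]]

# Hypothetical, deliberately coarse mapping from POS tags to cohesive categories.
COHESIVE_CATEGORIES: Dict[str, str] = {
    "PRON": "reference",
    "CCONJ": "conjunction",
    "SCONJ": "conjunction",
}


def cohesion_profile(text: Text, per: int = 1000) -> Dict[str, float]:
    """Frequency of each cohesive category, normalized per `per` tokens."""
    if not text:
        return {}
    counts = Counter(
        COHESIVE_CATEGORIES[tag] for _, tag in text if tag in COHESIVE_CATEGORIES
    )
    return {cat: n * per / len(text) for cat, n in counts.items()}


def mean_profile(texts: List[Text]) -> Dict[str, float]:
    """Average the per-text profiles of a group of texts (missing category = 0)."""
    if not texts:
        return {}
    profiles = [cohesion_profile(t) for t in texts]
    categories = sorted({c for p in profiles for c in p})
    return {c: sum(p.get(c, 0.0) for p in profiles) / len(profiles) for c in categories}


def distinguishing_features(
    originals: List[Text], translations: List[Text]
) -> Dict[str, float]:
    """Mean frequency in translations minus mean frequency in originals."""
    p_orig = mean_profile(originals)
    p_trans = mean_profile(translations)
    categories = sorted(set(p_orig) | set(p_trans))
    return {c: p_trans.get(c, 0.0) - p_orig.get(c, 0.0) for c in categories}


if __name__ == "__main__":
    # Toy German data, invented for illustration only.
    originals = [[("Er", "PRON"), ("kam", "VERB"), ("und", "CCONJ"), ("blieb", "VERB")]]
    translations = [[
        ("Sie", "PRON"), ("blieb", "VERB"), (",", "PUNCT"), ("weil", "SCONJ"),
        ("es", "PRON"), ("regnete", "VERB"), ("und", "CCONJ"), ("kalt", "ADJ"),
        ("war", "AUX"), (".", "PUNCT"),
    ]]
    print(distinguishing_features(originals, translations))
    # {'conjunction': -50.0, 'reference': -50.0}
```

A negative value means that a category is less frequent in the translations than in the originals. The coarse tag mapping is the main simplification. Distinguishing personal from demonstrative reference, for instance, would require richer annotation than POS tags alone.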

In summary, work in GECCo addresses the following questions:

  1. Which cohesive resources do the language systems of English and German provide, and to what extent are they instantiated in different registers (text types) and modes (spoken vs. written)?

  2. Which types of information must be annotated in a corpus for the semi-automatic analysis of cohesive phenomena and for further applications in statistical NLP?

  3. Which corpus structures and architectures allow efficient and easy querying with the available processing tools?