Registers in contact

Registers in contact: linguistic evolution of specialized scientific registers
(supported by DFG 2010 - 2013)

The topic of the present project is the linguistic evolution of functional variation (registers) in highly specialized scientific domains. The project aims to analyze domain-specific variation that has emerged through register contact, using a corpus of English scientific texts. Our focus is on scientific domains or disciplines at the boundaries of computer science (e.g. computational linguistics, bioinformatics). The central question of the project is: which linguistic means do disciplines that have emerged through register contact use to create a distinctive identity? From a linguistic point of view, the object of study is a phenomenon of recent language change that concerns language use rather than the language system. For this purpose, we take an existing synchronic corpus (DaSciTex) of English scientific texts from the early 2000s and extend it diachronically (1970s/1980s) into the SciTex corpus (English Scientific Text Corpus). We use this corpus to analyze empirically the linguistic evolution of selected scientific registers. Methodologically, the study is grounded in register theory as developed in English linguistics. We employ established corpus-linguistic methods as well as newer probabilistic methods for identifying similarities between texts (and text classes).
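The description above does not say which probabilistic similarity measures the project uses. Purely as an illustration of the general idea, the following Python sketch scores two texts by the Jensen-Shannon divergence between their add-one smoothed unigram (word) distributions: the lower the score, the more similar the texts. The function names, the smoothing scheme and the toy sentences are assumptions made for this example, not the project's actual method.

```python
import math
from collections import Counter

# Illustrative sketch only: one generic probabilistic similarity measure,
# not the method used in the project. All names here are hypothetical.

def unigram_dist(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be non-zero wherever p is."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

def jensen_shannon(tokens_a, tokens_b, alpha=1.0):
    """Symmetric divergence in bits: 0 for identical texts, never above 1."""
    vocab = set(tokens_a) | set(tokens_b)
    p = unigram_dist(tokens_a, vocab, alpha)
    q = unigram_dist(tokens_b, vocab, alpha)
    m = {w: 0.5 * (p[w] + q[w]) for w in vocab}  # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

if __name__ == "__main__":
    # Toy sentences standing in for texts from different subcorpora.
    cs = "we propose a new parsing algorithm".split()
    cl = "we propose a new alignment algorithm".split()
    bio = "the protein sequence was aligned to the genome".split()
    print(jensen_shannon(cs, cl))   # small: most words are shared
    print(jensen_shannon(cs, bio))  # larger: no words are shared
```

Jensen-Shannon divergence is chosen here because, unlike plain Kullback-Leibler divergence, it is symmetric and bounded, which makes scores for different text pairs easier to compare. A realistic setup would use much larger samples and richer features than single words.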

 

Corpus structure and corpus size

SciTex contains the following subcorpora:

  • computer science (A subcorpus)
  • four contact disciplines (B subcorpus)
    (computational linguistics, bioinformatics, digital construction and microelectronics)
  • four disciplines of origin (C subcorpus)
    (linguistics, biology, mechanical engineering and electrical engineering)

SciTex is divided into DaSciTex (early 2000s) and SaSciTex (1970s/1980s). The corpus as a whole amounts to approx. 34 million tokens.

The corpus consists of:

  • two small, cleaned-up corpora of approx. 1 million tokens each for grammatical analyses (one for the 1970s/80s and one for the early 2000s), and
  • two large corpora (1970s/80s and early 2000s) of 17 million tokens each for lexical analyses (the small corpora are included in the large ones)
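The nested design above, with the small cleaned-up corpora contained in the large ones, can be captured in a small Python data structure. The sketch uses only the subcorpus labels and the rounded token counts given in the text; the names `Part`, `SUBCORPORA` and `total_tokens` are hypothetical. It shows why the overall size is approx. 34 million tokens: only the two 17-million-token corpora are summed, because adding the 1-million-token samples as well would count those texts twice.

```python
from dataclasses import dataclass

# Hypothetical representation of the SciTex layout described above.
# Subcorpus labels and (rounded) token counts come from the text;
# the class, field and function names are illustrative only.

SUBCORPORA = {
    "A": ["computer science"],
    "B": ["computational linguistics", "bioinformatics",
          "digital construction", "microelectronics"],
    "C": ["linguistics", "biology",
          "mechanical engineering", "electrical engineering"],
}

@dataclass(frozen=True)
class Part:
    name: str         # "SaSciTex" or "DaSciTex"
    period: str       # time slice covered
    full_tokens: int  # large corpus used for lexical analyses
    core_tokens: int  # cleaned-up sample (contained in the large corpus) for grammatical analyses

PARTS = [
    Part("SaSciTex", "1970s/1980s", 17_000_000, 1_000_000),
    Part("DaSciTex", "early 2000s", 17_000_000, 1_000_000),
]

def total_tokens(parts):
    # The small corpora are subsets of the large ones, so only the
    # large corpora are summed; adding core_tokens would double-count.
    return sum(p.full_tokens for p in parts)

if __name__ == "__main__":
    print(total_tokens(PARTS))                        # 34000000
    print(sum(len(v) for v in SUBCORPORA.values()))  # 9 disciplines in total
```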


Project-related publications:

Teich, E., Degaetano-Ortlieb, S., Fankhauser, P., Kermes, H., and Lapshinova-Koltunski, E. (2014). The Linguistic Construal of Disciplinarity: A Data Mining Approach Using Register Features. Journal of the Association for Information Science and Technology (JASIST). To appear.

Degaetano-Ortlieb, S., Fankhauser, P., Kermes, H., Lapshinova-Koltunski, E., Ordan, N. and Teich, E. (2014). Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers. Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014). Reykjavik, Iceland. URL: www.lrec-conf.org/proceedings/lrec2014/pdf/291_Paper.pdf

Degaetano-Ortlieb, S. and Teich, E. (2014). Register diversification in evaluative language: The case of scientific writing. In Geoff Thompson and Laura Alba-Juez (eds). Evaluation in Context. John Benjamins Publishing Company, pp. 241-258. URL: https://benjamins.com/#catalog/books/pbns.242.12deg/details

Fankhauser, P., Knappen, J. and Teich, E. (2014). Exploring and Visualizing Variation in Language Resources. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland. URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/185_Paper.pdf

Fankhauser, P., Kermes, H. and Teich, E. (2014). Combining Macro- and Microanalysis for Exploring the Construal of Scientific Disciplinarity. Digital Humanities Conference. Lausanne, Switzerland. URL: dharchive.org/paper/DH2014/Poster-126.xml

Degaetano-Ortlieb, S., Kermes, H., Lapshinova-Koltunski, E. and Teich, E. (2013). SciTex - A Diachronic Corpus for Analyzing the Development of Scientific Registers. In Bennett, P., Durrell, M., Scheible, S. and Whitt, R. J., eds. New Methods in Historical Corpus Linguistics. Corpus Linguistics and Interdisciplinary Perspectives on Language - CLIP, Volume 3. Tübingen, Narr.

Degaetano-Ortlieb, S., Kermes, H. and Teich, E. (2013). The notion of importance in academic writing: detection, linguistic properties and targets. Proceedings of the 2nd Workshop on Practice and Theory of Opinion Mining and Sentiment Analysis (PATHOS). Darmstadt, Germany.

Degaetano-Ortlieb, S., Kermes, H., Lapshinova-Koltunski, E. and Teich, E. (2013). Automatic text classification for diachronic register analysis. 24th European Systemic Functional Linguistics Conference and Workshop (ESFLCW2013). Coventry, UK.

Degaetano-Ortlieb, S., Lapshinova-Koltunski, E., Kermes, H. and Teich, E. (2013). Procedures for Automatic Corpus Enrichment. Corpus Linguistics. Lancaster, UK.

Degaetano-Ortlieb, S. and Teich, E. (2013). A methodology to analyze evaluation across scientific disciplines - feature detection, extraction and annotation. Corpus Linguistics - Workshop Evaluative Language and Corpus Linguistics. Lancaster, UK.

Degaetano-Ortlieb, S. and Teich, E. (2013). Detection, extraction and annotation of evaluative expressions in a corpus of academic writing. Deutsche Gesellschaft für Sprachwissenschaft (DGfS), Computational Linguistics Section, poster session. Potsdam, Germany.

Lapshinova-Koltunski, E., Degaetano-Ortlieb, S., Kermes, H. and Teich, E. (2013). Linguistic evolution of conjunctive relations in emerging scientific registers. In: F. Poppi and W. Cheng (eds). The three waves of globalization: winds of change in Professional, Institutional and Academic Genres. Cambridge Scholars Publishing, Cambridge, UK.

Lapshinova-Koltunski, E., Degaetano-Ortlieb, S., Kermes, H. and Teich, E. (2013). Usefulness of Corpora Enriched with Annotations on Abstract Linguistic Levels. Genre- and Register-related Text and Discourse Features in Multilingual Corpora. Brussels, Belgium.

Teich, E., Degaetano-Ortlieb, S., Kermes, H. and Lapshinova-Koltunski, E. (2013). Scientific registers and disciplinary diversification: a comparable corpus approach. Proceedings of the 6th Workshop on Building and Using Comparable Corpora (BUCC). Sofia, Bulgaria.

Degaetano-Ortlieb, S., Lapshinova-Koltunski, E. and Teich, E. (2012). Domain-specific variation of sentiment expressions: exploring a model of analysis for academic writing. 1st Workshop on Practice and Theory of Opinion Mining and Sentiment Analysis (PATHOS) at Konvens2012. Vienna.

Degaetano-Ortlieb, S., Lapshinova-Koltunski, E. and Teich, E. (2012). Feature Discovery for Diachronic Register Analysis: a Semi-Automatic Approach. Proceedings of LREC 2012. Istanbul, 21-27 May.

Kermes, H. (2012). Formulaic expressions: in this paper but where? Proceedings of ICAME 33. Leuven.

Kermes, H. (2012). A methodology for the extraction of information about the usage of formulaic expressions in scientific texts. Proceedings of LREC 2012. Istanbul.

Kermes, H. and Teich, E. (2012). Formulaic expressions in scientific texts: Corpus design, extraction and exploration. Lexicographica, Volume 28(1). De Gruyter, pages 99-120. URL: http://www.degruyter.com/view/j/lexi.2012.28.issue-1/lexi.2012-0007/lexi.2012-0007.xml?format=INT

Lapshinova-Koltunski, E., Teich, E. and Degaetano-Ortlieb, S. (2012). Tracing 'hybridity' in academic discourse: a corpus-based approach. Proceedings of the European Systemic Functional Linguistics Conference and Workshops (ESFLCW 2012). Bertinoro.

Lapshinova-Koltunski, E., Teich, E. and Degaetano-Ortlieb, S. (2012). Terminology now and then: changes across periods in academic writing. Proceedings of ICAME 33. Leuven.

Lyding, V., Lapshinova-Koltunski, E., Degaetano-Ortlieb, S., Dittmann, H. and Culy, C. (2012). Visualising Linguistic Evolution in Academic Discourse. Proceedings of EACL 2012. Avignon, France, 23-27 April.

Teich, E., Degaetano-Ortlieb, S., Lapshinova-Koltunski, E. and Kermes, H. (2012). Register contact: an exploration of recent linguistic trends in the scientific domain. Proceedings of Historical Corpora 2012. Frankfurt.

Bartsch, S. and Teich, E. (2011). Register profiling of scientific texts: Experiences in linguistic description and corpus-based methods. ICAME 32: Trends and Traditions in English Corpus Linguistics, In Honour of Stig Johansson, pp. 127-128. Oslo, Norway, 1-5 June.

Bartsch, S. and Teich, E. (2011). Register Profiling for Highly Specialized Domains: Methods and Techniques. Anglistentag 2011, Section 'Approaches to Linguistic Variation'. Freiburg im Breisgau, Germany, September.

Bartsch, S., Teich, E. and Tragl, C. (2011). Patterns of cohesion in informationally dense texts. Corpus Linguistics (CL2011). Birmingham, UK, 20-22 July.

Degaetano-Ortlieb, S., Kermes, H., Lapshinova-Koltunski, E. and Teich, E. (2012). SciTex - A Diachronic Corpus for Analyzing the Development of Scientific Registers. In: P. Bennett, M. Durrell, S. Scheible and R. J. Whitt (eds). New Methods in Historical Corpus Linguistics. Corpus Linguistics and Interdisciplinary Perspectives on Language - CLIP, Vol. 2. Tübingen, Germany: Narr.

Degaetano, S. and Teich, E. (2011). The lexico-grammar of stance: an exploratory analysis of scientific texts. In: St. Dipper and H. Zinsmeister (eds). Bochumer Linguistische Arbeitsberichte 3 - Beyond Semantics: Corpus-based Investigations of Pragmatic and Discourse Phenomena. 23-25 February.

Degaetano, S. and Teich, E. (2011). Exploring a semi-automatic approach for the analysis of interpersonal meaning in large corpora. International Systemic Functional Linguistics Conference (ISFC38). Lisbon, Portugal, 25-29 July.

Degaetano, S. (2011). Evaluative options and their choice - modal adjuncts vs. evaluative patterns in academic writing. International Evaluation Conference 2011 (IntEval). Madrid, Spain, October.

Degaetano, S. (2011). Evaluation across scientific disciplines - a corpus-based analysis. Corpus Linguistics (CL2011). Birmingham, UK, 20-22 July.

Kermes, H. (2011). Usage and function of formulaic expressions in scientific texts. Corpus Linguistics 2011, pp. 86-87. Birmingham, UK, July.

Kermes, H. and Teich, E. Formulaic expressions in scientific texts: Corpus design and extraction pipeline. Lexicographica. To appear.

Lapshinova-Koltunski, E., Degaetano-Ortlieb, S., Teich, E. and Kermes, H. (2012). Usefulness of Corpora Enriched with Annotations on Abstract Linguistic Levels. Proceedings of Genre- and Register-related Text and Discourse Features in Multilingual Corpora, 11-12 January.

Lapshinova, E., Degaetano, S. and Teich, E. (2011). Interdisciplinarity in academic discourse - a corpus-based analysis. Interdisciplinary Linguistics Conference (ILinC2011). Belfast, UK, 14-15 October.

Teich, E., Lapshinova, E. and Degaetano, S. (2012). Terminology now and then: changes across periods in academic writing. Proceedings of ICAME 33. Leuven, June.

Teich, E., Lapshinova, E., Kermes, H. and Degaetano, S. (2011). Linguistic evolution of emerging scientific registers. Clavier 2011. Modena, Italy, November.

Information about the prior project “Linguistic profiles of interdisciplinary registers” is available on a separate page.

Registers in contact: On the genesis of specialized scientific discourses
(supported by DFG 2010 - 2013)

The topic of the present project is the genesis of functional variation (registers) in highly specialized scientific domains. On the basis of a corpus of English-language scientific texts, the project investigates specialized-language variation that arises through register contact. The focus is on scientific fields or disciplines at the interface with computer science (e.g. computational linguistics, bioinformatics). The guiding question is: by what linguistic means do such fields, which have emerged by crossing disciplinary boundaries, construct their distinctive identity? Linguistically, the object of study is a phenomenon of recent language change, relating in the first instance not to the language system but to language use. To this end, an existing synchronic corpus (DaSciTex) of English-language scientific texts (2000s) is extended diachronically (1960s/70s). The aim is to trace the genesis of selected scientific registers empirically on the basis of this corpus. Methodologically, the project is grounded in register linguistics as practiced in English studies. It applies established corpus-linguistic methods together with newer probabilistic procedures for determining the similarity of texts (and text classes).

Information on the prior project “Linguistic profiles of interdisciplinary registers”