The new London–Lund Corpus (LLC–2): Design, compilation, access

Research output: Contribution to conference (Abstract)


abstract = "This talk reports on the compilation of the new London–Lund Corpus (LLC–2), a corpus of contemporary spoken British English collected 2014–2019. The size and design of LLC–2 are the same as those of the world{\textquoteright}s first corpus of spoken language, namely the London–Lund Corpus (LLC–1), with spoken data mainly from the 1960s. In addition to providing a corpus of contemporary speech, LLC–2 gives researchers the opportunity to make principled diachronic comparisons of speech over the past 50 years and to detect change in communicative behaviour among speakers. The compilation of LLC–2 has included a number of stages, such as data collection, transcription of the recordings, markup and annotation, and finally making the corpus accessible to the research community. The talk describes and critically examines the methodological decisions made at each stage. For example, it was important to strike a balance between LLC–2 as a representative collection of contemporary spoken English and its comparability to LLC–1. Therefore, both corpora contain the same speech situations (dialogue, mainly everyday face-to-face conversation, as well as monologue), but the specific recordings added to LLC–2 also reflect the technological advances of the last few decades, particularly with respect to speech situations such as telephone calls (e.g., Skype) and broadcast discussions and interviews (e.g., podcasts). Moreover, the transcriptions in LLC–2 are orthographic and time-aligned with the corresponding sound files, a novel feature of the corpus that makes it possible, among other things, to investigate prosody and dialogue management among speakers with great precision. The corpus, as well as metadata about the transcriptions and the speakers, will be released to the public in late 2019 from the Lund University Humanities Lab{\textquoteright}s corpus server.
The release will fill an unfortunate gap in the availability of spoken corpora for linguistic analysis. The benefits of spoken corpora in general, and of LLC–2 in particular, will be demonstrated in the talk through case studies based on the corpus (e.g., P{\~o}ldvere & Paradis, 2019a, 2019b). The case studies illustrate how LLC–2 can contribute to our understanding of meaning-making and discursive practices in real communication and provide a window into the cognitive and social processes of dialogic interaction, from both a contemporary and a back-in-time perspective.",
note = "International symposium on spoken language across time; Conference date: 20-09-2019 through 20-09-2019",
