Corpus linguistics research
In corpus linguistics, we look at the way language is used in different regions, genres and situations.
Our research is based on huge datasets of natural language, often many billions of words. We ask questions about words and language choices, such as why journalists in tabloid newspapers refer to men and women differently and what the implications are, and what the word 'we' tells us about how online groups come together.
Through our work, we're investigating pressing issues and helping to solve problems. For example, our work on online bullying analyses the root causes and impact of the abuse.
We are also looking at the language that's used when making business transactions. By researching this language, we can develop teaching materials to help organisations conduct these transactions more effectively.
The outputs from our research are frequently published in leading academic journals, such as Corpora and the International Journal of Management.
Our research covers the following topics
- Corpus linguistics – Studying language in collections of real-world text, and the sets of rules that govern them, and how they relate to other languages
- Corpus-assisted discourse studies – Using collections of text to analyse written or vocal use of language, including writing, conversation and communication
- Corpus pragmatics – Working at the interface between corpus linguistics and pragmatics, we're covering topics such as turn-taking and politeness
- Corpus stylistics – Investigating the methods, techniques and tools of corpus linguistics, used in the study of literary style
- Lexical priming – A theory of language, based on how certain words are used in different combinations and patterns in the real world, and how this differs from traditional approaches
- Lexical selection – Understanding how chunks of language seem to get selected and how this is similar to biological evolution
- Metaphor analysis – Understanding how people use metaphor to conceptualise society, such as relationships, crises, organisational change and grief
Methods and facilities
Different methods can be combined with corpus linguistics. Once the data set – or corpus – is built, we read concordance lines, run collocation and keyword analyses, and move between quantitative and qualitative techniques. We also have site licences for Sketch Engine and LexisNexis, which can be used to build corpora. Staff also have experience of developing bespoke tools for tasks such as scraping online texts and converting files. We have also developed the world's largest corpus of online discussions about citizen science, with over 10 million words.
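As a rough illustration of what reading concordance lines and running a collocation analysis involve, here is a minimal Python sketch. It is not the group's own tooling and does not use Sketch Engine: the function names (`tokenize`, `concordance`, `collocates`), the simple regex tokeniser, the window sizes and the sample sentence are all assumptions made for the example. Real collocation analysis usually also applies an association measure rather than raw counts.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple scheme)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, width=3):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def collocates(tokens, node, span=2):
    """Count the words that occur within `span` tokens of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical sample text, invented for this example only.
sample = ("We built the corpus. We read the corpus closely, "
          "and we compared the corpus with another corpus.")
tokens = tokenize(sample)

for line in concordance(tokens, "corpus"):
    print(line)
print(collocates(tokens, "corpus").most_common(3))
```

The concordance view supports the qualitative side (reading each hit in context), while the collocate counts give a quantitative starting point for deciding which patterns to read more closely.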
Resources
Students and staff at the University of Portsmouth are offered free access to the following resources.
- Sketch Engine (free access through the university server). Through Sketch Engine you can access corpora in approximately 35 different languages, including some parallel corpora and corpora of academic English. Staff and postgraduate researchers may request an individual user account from John Williams, which will allow them to upload their own corpora.
- We have produced a guide to commonly needed Sketch Engine tasks.
- Mark Davies's corpora (open access):
- COCA (Corpus of Contemporary American English)
- COHA (Corpus of Historical American English)
- TIME (TIME Magazine Corpus of American English)
- BNC (BYU interface to the British National Corpus)
- Corpus do Português
- Corpus del Español
- Michigan Corpus of Academic Spoken English (MICASE) – Another very useful resource for those interested in EAP.
- Webcorp – An interface that lets you analyse the web using corpus linguistic tools
- AntConc – Free concordance program for Windows, Macintosh OS X, and Linux. Runs on plain-text files and is quite user-friendly.
- XAIRA – Open source software package which supports indexing and analysis of large XML textual resources. This is a more powerful tool for concordancing and collocate analysis but only runs on XML texts.
- BootCaT – Free software for creating web corpora. Very easy to use.
- UAM CorpusTool – A free environment for annotation (and interrogation) of text corpora. Runs under Windows and Mac OS X.
- International Journal of Corpus Linguistics
- Corpora
- Corpus Linguistics and Linguistic Theory
- ICAME
- Corpus Linguistics Conference – This is the archive for the six conferences. Full papers are available for many of the presentations.
- Proceedings of The International Symposium on Using Corpora in Contrastive and Translation Studies 2010. Edited by R. Xiao. Full papers are available for several of the presentations.
- David Lee's bookmarks for corpus-based linguistics – A great resource with over 1,000 links
Life Solved Podcast - The Language of Violence with Dr Alessia Tranchese
Discover our areas of expertise
Corpus linguistics is one of our six areas of expertise within our Linguistics research area. Explore the others below.
Translation
We're exploring how texts are translated and the practices around the translation of texts, including professional training, the use of technologies, and non-professional translation communities.
Discourse analysis
We're researching how ideas, concepts and people are represented through language, and exploring how language is used in real-life contexts.
Professional communication
Our research in professional communication explores how spoken and written language is used in workplaces to develop relationships and achieve institutional objectives.
Sociolinguistics
Through our work in sociolinguistics, we're studying the ways in which language can affect, and is affected by, social phenomena.
Teaching English to speakers of other languages (TESOL)
We're focusing on the learning and teaching of English as a second or foreign language, in primary, secondary and adult learning contexts.
Interested in a PhD in Languages and Linguistics?
Browse our postgraduate research degrees – including PhDs and MPhils – at our Languages and Linguistics postgraduate research degrees page.