Corpus linguistics for literary studies

Many methods from corpus linguistics can make life considerably easier for scholars who analyse works of literature, and can potentially enrich those analyses. A general prerequisite for these techniques is that the texts are available in a digital format, as a txt or xml file, downloaded from Project Gutenberg or Literature Online, for instance. They also require a basic understanding of AntConc, a concordancer (a program that loads several files as one searchable corpus and allows all of them to be searched at once). There’s plenty of documentation available for it (for example here).


The following are a few ideas for how AntConc can be used to aid literary analysis.

  • Counting instances of words or phrases is done automatically by the program instead of painstakingly by hand. AntConc’s concordance tool reports the total number of occurrences and shows each one in its extended context, too.
  • Results can be sorted by frequency or alphabetically. Furthermore, the search term can be shown in the entire file (i.e. book/short story/poem) by switching to the file view tool.
  • Searches can include wildcards so that different forms, endings, or beginnings of a word can be found at the same time.
  • Find commonly co-occurring words in the text, either as N-grams (sequences of two or more adjacent words) or as clusters (words directly next to a search term). Beyond that, so-called collocates are words that appear in the vicinity of, but not necessarily right next to, a search term. They can reveal a lot about changes in meaning, a writer’s or their characters’ attitudes, and writing style.
  • List all words in a piece of text by frequency (ascending or descending) using AntConc’s word list. An alphabetical list, sorted by the first or the last letter, is also possible.
  • Compare two (or more) works of fiction, works by two authors, or works by the same author from different periods using AntConc’s keyword list, which identifies words that are unusually frequent in one text compared with another.
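
For readers who want to see what these tools do under the hood, the ideas above can be sketched in a few lines of Python. This is a minimal illustration, not AntConc's actual implementation: the function names, the simple tokenisation rule, and the window sizes are all assumptions made for the example, and the sample text is the opening of A Tale of Two Cities.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

def tokenize(text):
    """Lower-case the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_list(tokens):
    """Word list: (word, count) pairs, most frequent first, ties in alphabetical order."""
    return sorted(Counter(tokens).items(), key=lambda item: (-item[1], item[0]))

def concordance(tokens, pattern, width=4):
    """Keyword-in-context lines for every token matching a wildcard pattern.

    '*' matches any run of characters, so 'w*' finds 'was', 'worst', 'wisdom'.
    """
    lines = []
    for i, token in enumerate(tokens):
        if fnmatchcase(token, pattern):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

def ngrams(tokens, n=2):
    """Count every sequence of n adjacent words."""
    return Counter(" ".join(gram) for gram in zip(*(tokens[i:] for i in range(n))))

def collocates(tokens, node, span=3):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts

SAMPLE = ("It was the best of times, it was the worst of times, "
          "it was the age of wisdom, it was the age of foolishness")

if __name__ == "__main__":
    tokens = tokenize(SAMPLE)
    print(word_list(tokens)[:5])
    for left, hit, right in concordance(tokens, "w*"):
        print(f"{left:>25} [{hit}] {right}")
    print(ngrams(tokens, 2).most_common(3))
    print(collocates(tokens, "age", span=2).most_common())
```

Applied to a whole novel rather than one sentence, the same functions give the kind of counts and context lines that AntConc displays in its word list, concordance, N-gram, and collocate tools.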

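A keyword comparison between two texts can be sketched in a similar way. The example below scores each word with the log-likelihood statistic, which is commonly used for keyness; whether it matches AntConc's own settings exactly is an assumption, and the two short "texts" are invented placeholders rather than real literary data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def log_likelihood(a, b, c, d):
    """Log-likelihood for a word seen `a` times in `c` tokens and `b` times in `d` tokens."""
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    score = 0.0
    if a:
        score += a * math.log(a / expected_a)
    if b:
        score += b * math.log(b / expected_b)
    return 2 * score

def keywords(target_tokens, reference_tokens, top=10):
    """Words that are relatively more frequent in the target, highest score first."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference[word]
        if a / c > b / d:  # keep only words overused in the target text
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top]

# Invented placeholder texts, used only to show the mechanics.
TARGET = "the sea the sea the ship sailed the sea"
REFERENCE = "the town the road the house stood by the road"

if __name__ == "__main__":
    for word, score in keywords(tokenize(TARGET), tokenize(REFERENCE)):
        print(f"{word:10} {score:.2f}")
```

Words that are equally common in both texts, such as "the" here, receive scores close to zero, while words characteristic of the target text rise to the top of the list.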

More complex queries are also possible. For instance, texts can be automatically tagged for parts of speech and this information can then be incorporated into the analysis.
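
As a rough illustration of how part-of-speech information can enter an analysis, the sketch below reads text in which each word carries its tag after an underscore and then lists all adjectives. The word_TAG format, the Penn Treebank style tags, and the hand-tagged sample line are assumptions made for this example; the output of an actual tagger may look different.

```python
from collections import Counter

def parse_tagged(text, sep="_"):
    """Split 'word_TAG' items into (word, tag) pairs, lower-casing the words."""
    pairs = []
    for item in text.split():
        word, _, tag = item.rpartition(sep)
        if word and tag:
            pairs.append((word.lower(), tag))
    return pairs

def words_with_tag(pairs, prefix):
    """Count the words whose tag starts with `prefix`, e.g. 'JJ' for adjectives."""
    return Counter(word for word, tag in pairs if tag.startswith(prefix))

# Hand-tagged sample line (an assumption, not real tagger output).
TAGGED = ("It_PRP was_VBD the_DT best_JJS of_IN times_NNS ,_, "
          "it_PRP was_VBD the_DT worst_JJS of_IN times_NNS")

if __name__ == "__main__":
    pairs = parse_tagged(TAGGED)
    print(words_with_tag(pairs, "JJ").most_common())
    print(Counter(tag for _, tag in pairs).most_common())
```

The same filter could be used, for example, to compare which adjectives two authors favour.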

For more applications of computational methods to literary studies, refer to the Stanford Literary Lab.