In 2020 and again in 2023, we collected job ads from three main types of sources:

Websites of technology companies (e.g. Amazon, Facebook)

Specialized websites for linguists (e.g. LINGUIST List)

General-purpose employment platforms (e.g. LinkedIn)

A corpus is a collection of texts that linguists analyse to find out about typical uses of words and phrases (by the way, we have a whole course on this topic).

The job ads included in the corpora concern language- or linguistics-related tasks that require digital and/or research skills. We excluded ads that required a degree (MS or PhD) in a STEM field, as well as jobs consisting almost exclusively of a) content creation (e.g. writing or editorial jobs) or b) translation and revision.

The corpora are freely available through a dedicated interface called NoSketchEngine.

We intend to build more corpora along the same lines in the future, to monitor how the market is changing.