Who said it? – Quotations as a research object

Quotations are a valuable and unadulterated source of information, not only for the press but also for research. With "Quotebank", computer scientist Andreas Spitz and his Swiss colleagues are making a data set of 235 million quotations freely available.

Quoting people directly is common practice in journalism. Quotations state exactly what was said and thus lend authenticity to a report. This unadulterated nature also makes direct quotes an interesting research object: in linguistics, for describing how language develops, and in the political and social sciences, for inferring social or political trends from public statements.

In a freely accessible conference paper, Andreas Spitz from the Department of Computer and Information Science at the University of Konstanz and his colleagues from the École polytechnique fédérale de Lausanne (EPFL; Switzerland) and the Eidgenössische Technische Hochschule Zürich (ETH Zurich; Switzerland) describe their research approach. Using over 6.7 million political quotes together with their surrounding context, they examined potential changes in the partisanship of US media between 2013 and 2020. In a related open access study, the team and further colleagues investigated the increasingly negative tone of quotes cited by news media.

Both projects used quotes from Quotebank, a freely accessible database with 235 million quotes from English-language news articles, which Spitz also helped to create.

Quotebank (doi: 10.5281/zenodo.4277310), a data set with 235 million quotations from English-language news articles published between 2008 and 2020, is available for free download from the Zenodo data repository.

A free web interface is also available for searching the quote database.

The conference paper (International AAAI Conference on Web and Social Media 2023) "Quotatives Indicate Decline in Objectivity in U.S. Political News" is freely available as an open access article.

The replication data (doi: 10.5281/zenodo.7983077) for the study is also available for free download on Zenodo.

The scripts used in the study are freely available as Jupyter notebooks in a GitHub repository of the EPFL Data Science Lab.

The article "United States politicians’ tone became more negative with 2016 primary campaigns" (doi: 10.1038/s41598-023-36839-1) has also been published as an open access article including replication data and code.

By Daniel Schmidtke - 05.01.2024