15 Final Thoughts

I hope this book has shown you that it is possible to work with newspaper data to get meaningful and interesting insights, even for beginners. Working with ‘real-world’ data, with its OCR errors and the many gaps in the collection, can be a difficult but ultimately fruitful way of applying digital methods to historical datasets. Techniques such as these can be applied to other types of historical data, for example books or letters.

As a next step, I would recommend finding a specific question of interest which you think could be tackled with the freely available collections, and downloading your own corpus related to that question. Then, see how the different methods might help to answer it.

Large language models used in the historical domain will ingest texts such as these at a huge scale, and understanding what these texts look like is valuable for getting to know how these models work and where their limitations lie. It’s worth bearing in mind the specific material history of the collection: where it has come from, and how it has been digitised. As the volume of digitised material increases and becomes more accessible, this will become increasingly important.

Further Reading