This week, you are tasked with constructing a network graph from a small collection of historical letters.
You’ve just come back from a research trip to the National Archives in the UK, as part of your project on spy networks operating between Britain and the Dutch Republic in the seventeenth century. While there, you took down details of some potentially interesting letters sent in the summer of 1666, during the Second Anglo-Dutch War. Many of these were written by the female English playwright and spy Aphra Behn. You’re hoping the content can tell you something about the key players involved in secret intelligence during the war.
The information seems ideally suited to constructing a network graph of some kind.
A network graph (as you have learned in this week’s lecture) is a representation of things and their relations to other things. These things (we call them entities, or nodes in network-science speak) can be of many types: people, places, institutions, or even words or concepts. The relations (called edges) can also be of different types—for example ‘brother of’ or ‘employer’ in a social network.
To construct and analyse a network from this historical data you need to carry out a number of steps:
Once you have done this, you’ll write a short report on your findings, critically reflecting on the method, the tool, and its application to your own area of academic interest.
On your research trip you made detailed notes of 22 letters, noting down people, dates, people mentioned, and locations, as well as copying a brief description of each letter, written by the archive cataloguers.
All of this letter information is available in a tab called ‘data’ at the top of the screen. You can also download a spreadsheet containing the same information, which can be opened in Excel, OpenOffice, or Google Sheets, by right-clicking this link and selecting ‘Save Link As..’ (in Google Chrome) or ‘Download Linked File As..’ (Safari).
The first decision you need to make is which pieces of information will make up the nodes and edges of your network. Some options include:
There is no right or wrong answer: almost anything can be represented as a network. However, some pieces of information and combinations of information will be easier to interpret than others.
A crucial step in any data analysis using humanities data is data cleaning. This is the process of correcting and standardising your data so that it can be properly ‘understood’ by computer programs (which are not as forgiving as a human reader).
As you were in a rush during your visit to the archives, your notes contain a number of mistakes and inconsistencies. These might include:
These need to be fixed before constructing your network, as otherwise they will affect the results. The network application will treat spelling variations as separate nodes, for example.
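As a rough illustration of why cleaning matters, the sketch below standardises a few name variants so that they collapse into a single node. It is only one possible approach, not the method the exercise requires: the function name `normalise_name`, the `corrections` table, and the example variants (including the misspelling) are all invented for illustration and are not taken from the letter data.

```python
import re

# Hypothetical table of known misspellings mapped to a preferred form.
# The entry below is invented for illustration only.
corrections = {
    "Aphra Behne": "Aphra Behn",
}


def normalise_name(raw: str) -> str:
    """Collapse stray whitespace, standardise capitalisation, then
    apply any manual spelling correction."""
    collapsed = re.sub(r"\s+", " ", raw).strip()
    titled = collapsed.title()
    return corrections.get(titled, titled)


# Illustrative variants of one name, as they might appear in rushed notes.
variants = ["Aphra Behn", "aphra behn", "  Aphra   Behn ", "Aphra Behne"]
cleaned = {normalise_name(v) for v in variants}
print(cleaned)
```

Note that `str.title()` is a blunt instrument: it would also capitalise particles such as "van" or "de", so names of that kind are better handled through the manual corrections table or fixed by hand in the spreadsheet.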
Once you have decided which data to turn into a network, you need to extract it in a format suitable for input to a network analysis application.
The simplest format for network analysis is what is known as an edge list. This is a list containing two columns: first, the Source node and second, the Target node. An edge (the line between points in a network diagram) will be drawn between each pair. The edge list can be directed, meaning that the flow of information goes from the Source to the Target node, and not necessarily the other way around.
The application we’re going to use takes as its input one connection per line, separated by a comma.
So for example, if you decide to draw a network of connected people:
From, To
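If you would rather not copy the pairs out by hand, an edge list of this kind can be generated from the cleaned rows with a few lines of code. The following is a minimal sketch under several assumptions: the field names `from` and `to` and the placeholder names are hypothetical, the Source (sender) is written first and the Target (recipient) second, and no header row is written, since it is not stated here whether the application expects one.

```python
# Placeholder rows standing in for the cleaned letter data.
# Field names and values are hypothetical.
letters = [
    {"from": "Person A", "to": "Person B"},
    {"from": "Person B", "to": "Person C"},
]


def to_edge_list(rows):
    """Return one 'Source,Target' connection per line."""
    return "\n".join(f"{row['from']},{row['to']}" for row in rows)


edge_list_text = to_edge_list(letters)
print(edge_list_text)
```

Because the comma is the separator, a name that itself contains a comma (for example "Surname, Forename") would be split into two values, so it is worth standardising names to a comma-free form first.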
Once you have finished, you can move on to the application below.
The tool you’re going to use also allows for the input of an edge weight: for example, if you choose to construct a network of connected places (using the origin and destination information), the weight might be the number of times the two places are connected in the data.
To do this, simply count up the repeated values, and add the weight as a number, again separated by a comma:
If you choose to use a weight, you’ll need to enter a value of 1 in any rows where you don’t have multiple instances.
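The counting step can also be done programmatically. The sketch below is one way to do it, using Python's standard `collections.Counter`; the place names are placeholders rather than values from the letter data, and the function name `weighted_edge_list` is hypothetical. Pairs that occur only once automatically receive a weight of 1.

```python
from collections import Counter

# Placeholder (origin, destination) pairs; not taken from the letter data.
pairs = [
    ("Place A", "Place B"),
    ("Place A", "Place B"),
    ("Place B", "Place C"),
]


def weighted_edge_list(edges):
    """Count repeated (source, target) pairs and return
    'Source,Target,Weight' lines in order of first appearance."""
    counts = Counter(edges)
    return [f"{src},{tgt},{weight}" for (src, tgt), weight in counts.items()]


weighted_lines = weighted_edge_list(pairs)
print("\n".join(weighted_lines))
```

This sketch treats the network as directed, so a connection from Place A to Place B is counted separately from one going from Place B to Place A. If direction does not matter for your question, you could sort each pair before counting.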
For the assignment alongside this task, you are asked to put together a short report (c. 500 words) reflecting on the task carried out above. This can (and ideally should) contain multimedia objects in the form of screenshots, tables of data, and so forth. In it you should:
All the letter data is in the table below; you can display all of it by using the drop-down menu at the top-left, or click through individual pages at the bottom of the screen.
Alternatively you can download a spreadsheet containing the same information by right-clicking this link and selecting ‘Save Link As..’ (in Google Chrome) or ‘Download Linked File As..’ (Safari).
If you’ve followed the steps in the exercise correctly, you should now have an open text document containing an edge list.
Enter it into the text box in the web application below, decide whether your network should be weighted or unweighted, and click ‘Calculate’.
This tool was written by researchers at Carnegie Mellon and is embedded in this webpage for convenience, but you can also go to the original site at https://network-navigator.library.cmu.edu/