Exercise

Constructing a Network from Historical Sources

This week, you are tasked with constructing a network graph from a small collection of historical letters.

You’ve just come back from a research trip to the National Archives in the UK as part of your project on spy networks operating between Britain and the Dutch Republic in the seventeenth century. While there, you took down details of some potentially interesting letters sent in the summer of 1666, during the Second Anglo-Dutch War. Many of these were written by the female English playwright and spy Aphra Behn. You hope their content will reveal something about the key players involved in secret intelligence during the war.

The information seems ideally suited to constructing a network graph of some kind.

A network graph (as you have learned in this week’s lecture) is a representation of things and their relations to other things. These things (called entities, or nodes in network-science terms) can be of many types: people, places, institutions, or even words or concepts. The relations (called edges) can also be of different types, for example ‘brother of’ or ‘employer of’ in a social network.

To construct and analyse a network from this historical data, you need to carry out the following steps:

Decide what in the data will become the nodes and edges of your network
Undertake data cleaning to standardise and correct the data
Construct an edge list from your chosen data
Input this into a simple online network analysis tool
Decide on the most appropriate and useful analysis from the results

Once you have done this, you’ll write a short report on your findings, critically reflecting on the method, the tool, and its application to your own area of academic interest.

Step 1: Understanding the data

On your research trip you made detailed notes on 22 letters, recording authors, recipients, dates, people mentioned and locations, and copying the brief description of each letter written by the archive cataloguers.

All of this letter information is available in a tab called ‘data’ at the top of the screen. You can also download a spreadsheet containing the same information, which can be opened in Excel, OpenOffice or Google Sheets, by right-clicking this link and selecting ‘Save Link As…’ (in Google Chrome) or ‘Download Linked File As…’ (in Safari).

The first decision you need to make is which pieces of information will make up the nodes and edges of your network. Some options include:

Authors connected to recipients
Origins connected to destinations
Authors connected to the people they mention in their letters
More abstract ideas such as keywords connected to authors or recipients
Anything else you can find in the data!

There is no right or wrong answer: almost anything can be represented as a network. However, some pieces of information, and some combinations of information, will be easier to interpret than others.
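To see how one set of notes can produce several different networks, the Python sketch below turns the same letter records into three kinds of edge list. The field names (author, recipient, origin, destination, mentioned) and all of the values are hypothetical placeholders, not the actual columns or contents of the letter data.

```python
# Hypothetical letter records: field names and values are illustrative
# placeholders, not the real columns or contents of the archive data.
letters = [
    {"author": "Person A", "recipient": "Person B",
     "origin": "Place X", "destination": "Place Y",
     "mentioned": ["Person C", "Person D"]},
    {"author": "Person B", "recipient": "Person A",
     "origin": "Place Y", "destination": "Place X",
     "mentioned": ["Person C"]},
]

def author_recipient_edges(records):
    """One edge per letter: author -> recipient."""
    return [(r["author"], r["recipient"]) for r in records]

def origin_destination_edges(records):
    """One edge per letter: place sent from -> place sent to."""
    return [(r["origin"], r["destination"]) for r in records]

def author_mention_edges(records):
    """One edge per person mentioned: author -> mentioned person."""
    return [(r["author"], m) for r in records for m in r["mentioned"]]

print(author_recipient_edges(letters))
print(origin_destination_edges(letters))
print(author_mention_edges(letters))
```

Each function answers a different question about the same letters, which is why the choice of nodes and edges shapes what the finished network can tell you.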

Step 2: Data Cleaning

A crucial step in any data analysis using humanities data is data cleaning. This is the process of correcting and standardising your data so that it can be properly ‘understood’ by computer programs (which are not as forgiving as a human reader).

As you were in a rush during your visit to the archives, you made a number of mistakes or inconsistencies. These might include:

Using a shortened name (Mr. Halsall instead of James Halsall; Scott instead of William Scott)
Spelling mistakes or inconsistencies (Aphara Behn instead of Aphra Behn)
Missing values (an author marked as ‘unknown’)

These need to be fixed before constructing your network, as otherwise they will distort the results. For example, the network application will treat spelling variants such as Aphara Behn and Aphra Behn as two separate nodes.
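Fixes like these can be applied consistently with a lookup table that maps each variant to one standard form. This Python sketch is a minimal illustration: the three entries in the hypothetical NAME_FIXES table come from the examples listed above, and the raw_authors list is made up. Missing values such as ‘unknown’ cannot be repaired by a lookup, so the sketch only flags them for you to research or decide how to handle.

```python
# Hypothetical lookup table mapping each variant found in the notes to one
# standard form. The entries are the examples given in the text; a real
# table would be built by reading through the full letter data.
NAME_FIXES = {
    "Mr. Halsall": "James Halsall",
    "Scott": "William Scott",
    "Aphara Behn": "Aphra Behn",
}

def clean_name(raw):
    """Trim stray whitespace, then replace a known variant with its standard form."""
    name = raw.strip()
    return NAME_FIXES.get(name, name)

def find_unknowns(names):
    """Return the positions of missing values that need checking by hand."""
    return [i for i, n in enumerate(names) if n.strip().lower() in {"", "unknown"}]

# Made-up raw values, for illustration only.
raw_authors = ["Aphara Behn", " Aphra Behn", "Scott", "unknown"]
cleaned = [clean_name(n) for n in raw_authors]
print(cleaned)                 # ['Aphra Behn', 'Aphra Behn', 'William Scott', 'unknown']
print(find_unknowns(cleaned))  # [3]
```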

Step 3: Constructing an ‘edge list’

Once you have decided which data to turn into a network, it needs to be extracted in a format suitable for input to a network analysis application.

The simplest format for network analysis is what is known as an edge list. This is a list containing two columns: first, the Source node and second, the Target node. An edge (the line between points in a network diagram) will be drawn between each pair. The edge list can be directed, meaning that the flow of information goes from the Source to the Target node, and not necessarily the other way around.
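The difference between a directed and an undirected edge list can be shown in a few lines of Python. In this sketch two made-up letters, one from Person A to Person B and a reply from Person B to Person A, give two distinct edges when direction is kept, but collapse into a single edge when it is ignored.

```python
# Two made-up letters: A writes to B, and B replies to A.
edges = [("Person A", "Person B"), ("Person B", "Person A")]

# Directed: order matters, so these are two different edges.
directed = set(edges)

# Undirected: sort each pair so (A, B) and (B, A) become the same edge.
undirected = {tuple(sorted(pair)) for pair in edges}

print(len(directed))    # 2
print(len(undirected))  # 1
```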

The application we’re going to use takes one connection per line as its input, with the two nodes separated by a comma.

So for example, if you decide to draw a network of connected people:

Open a blank text document in Notepad or TextEdit.
Copy the relevant ‘from’ and ‘to’ values into the text document, one letter per line, in the format From, To, so that the author is the Source and the recipient the Target.
If you have multiple instances of the same ‘from’ and ‘to’ pair, you can either ignore the duplication or add a third column, weight: see below for instructions on doing this.
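If your cleaned data is already in a spreadsheet, the text file can also be generated rather than typed. This Python sketch writes each pair in Source, Target order, one connection per line, as in the edge-list format described above; the pairs are hypothetical placeholders. It also assumes the tool splits each line on every comma, so a name such as ‘Henry Bennet, Earl of Arlington’ would be read as two columns; the sketch refuses such names so you can rewrite them (for example, without the comma) first.

```python
# Hypothetical (author, recipient) pairs, already cleaned; these are
# placeholders, not the real letter data.
pairs = [
    ("Person A", "Person B"),
    ("Person C", "Person B"),
]

def to_edge_list(pairs):
    """Format pairs as 'Source, Target' lines, one connection per line."""
    lines = []
    for source, target in pairs:
        # Assumption: the tool splits each line on commas, so a comma inside
        # a name would be read as an extra column.
        for name in (source, target):
            if "," in name:
                raise ValueError(f"Name contains a comma: {name!r}")
        lines.append(f"{source}, {target}")
    return "\n".join(lines)

text = to_edge_list(pairs)
print(text)
# Person A, Person B
# Person C, Person B
```

The printed output can be pasted straight into your text document.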

Once you have finished, you can move on to the application below.

Weighted networks

The tool you’re going to use also allows for the input of an edge weight: for example, if you chose to construct a network of connected places (using the origin and destination information), the weight might be the number of times the two places are connected in the data.

To do this, count how many times each pair appears and add that count as a number at the end of the line, again separated by a comma, in the format Source, Target, Weight.

If you choose to use a weight, you’ll need to enter a value of 1 in any rows where you don’t have multiple instances.
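Counting repeated pairs by hand is error-prone with more than a handful of letters. This Python sketch uses collections.Counter to tally each (Source, Target) pair and write one Source, Target, Weight line per distinct pair, giving single occurrences a weight of 1. The place names are hypothetical placeholders rather than origins and destinations from the letter data.

```python
from collections import Counter

# Hypothetical (origin, destination) pairs, one per letter; placeholders only.
pairs = [
    ("Place X", "Place Y"),
    ("Place X", "Place Y"),
    ("Place Y", "Place Z"),
]

def weighted_edge_list(pairs):
    """Count repeated pairs and emit 'Source, Target, Weight' lines.

    Pairs that occur once get a weight of 1, so every row has three columns.
    """
    counts = Counter(pairs)  # keeps the order in which pairs first appear
    return "\n".join(f"{s}, {t}, {w}" for (s, t), w in counts.items())

print(weighted_edge_list(pairs))
# Place X, Place Y, 2
# Place Y, Place Z, 1
```

Because the pairs are counted in the order given, a letter from Place X to Place Y and one from Place Y to Place X count as separate edges; sorting each pair first would give undirected weights instead.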

Step 4: Upload to the ‘Network Navigator’ tool and interpret the results

Once you have finished constructing your network, open the ‘Network Navigator’ application, either by using the third tab above or by opening https://network-navigator.library.cmu.edu/ in a browser window. Copy and paste your constructed network into the text box.

Then press the ‘calculate’ button, and you’ll be presented with a range of basic network metrics for each node in your data. Each of these measures, in a different way, the node’s importance, or centrality, within the network.

This centrality can be calculated in different ways. Two of the most relevant are degree and betweenness centrality:

Degree simply counts the total number of connections for each node. In the following network, Aphra Behn has a degree of 4, as she is directly connected to four other nodes.
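Degree is easy to check by hand on a small example. This Python sketch counts each node’s distinct neighbours in a made-up edge list (the single-letter node names are placeholders, not people from the letters), treating every edge as undirected. The tool may count differently, for instance by counting repeated edges or by separating incoming and outgoing connections in a directed network.

```python
from collections import defaultdict

# Made-up undirected edge list for illustration (not the letter data).
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("E", "F"),
]

def degree(edges):
    """Count the distinct neighbours of each node, ignoring edge direction."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {node: len(nbrs) for node, nbrs in neighbours.items()}

print(degree(edges))  # {'A': 4, 'B': 1, 'C': 1, 'D': 1, 'E': 2, 'F': 1}
```

Here A plays the same role as a node directly connected to four others, so it has a degree of 4.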

Betweenness centrality measures how important a particular node is to the flow of information between different parts of the network. It is calculated by finding the ‘shortest paths’ through the network, that is, the smallest number of ‘hops’ between each pair of nodes. Nodes that appear on many of these paths score highly for betweenness centrality.
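For readers who want to see the shortest-path idea worked through, here is a Python sketch of Brandes’ algorithm, a standard way of computing betweenness centrality, for an unweighted, undirected network. It is not necessarily how Network Navigator computes its scores (the tool may normalise them or treat edges as directed), so its numbers need not match this sketch. The example graph is made up: a well-connected ‘Hub’ with three leaves, linked through a ‘Bridge’ node to one more node, ‘R1’.

```python
from collections import defaultdict, deque

def betweenness(edges):
    """Unnormalised betweenness centrality for an unweighted, undirected graph
    (Brandes' algorithm). Each unordered pair of nodes is counted once."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    score = {node: 0.0 for node in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma) and record
        # each node's predecessors on those paths.
        stack = []
        preds = {node: [] for node in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing each node's share of the
        # shortest paths on to its predecessors.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each path was counted once from each end, so halve for undirected graphs.
    return {node: val / 2 for node, val in score.items()}

# Made-up example graph, not the letter data.
edges = [("L1", "Hub"), ("L2", "Hub"), ("L3", "Hub"),
         ("Hub", "Bridge"), ("Bridge", "R1")]
print(betweenness(edges))
# {'L1': 0.0, 'Hub': 9.0, 'L2': 0.0, 'L3': 0.0, 'Bridge': 4.0, 'R1': 0.0}
```

Hub lies on the paths among its three leaves and on every path from them to the right-hand side; Bridge lies only on the paths to R1, so it scores lower, but still well above the leaves, which lie on no other nodes’ paths.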

In the network diagram above, Henry Bennet, Earl of Arlington, lies on the path from any node on the right to any node on the left, such as the path from John Tippets to William Scott (highlighted in red). After Aphra Behn, the most connected node, he has the highest betweenness centrality.

Written Assignment

For the assignment alongside this task, you are asked to put together a short report (c. 500 words) reflecting on the task carried out above. This can (and ideally should) contain multimedia objects in the form of screenshots, tables of data, and so forth. In it you should:

Outline the steps taken to produce the network, including your rationale for choosing the information to represent, the data cleaning you had to undertake, and its implications for the resulting network.
Present some findings based on the results produced by the application. Non-results are OK too!
Think critically about the benefits and limitations of network analysis applied to historical sources.
Reflect on how network analysis might be applied to your own research.
