Slides | Yann Ryan

The ‘Networking Archives’ Team

This work is based on a larger project, thanks to (clockwise from left): Howard Hotson (Principal Investigator), Miranda Lewis, Matthew Wilcoxson, Arno Bosse, Philip Beeley, Ruth Ahnert (Co-Investigator), Sebastian Ahnert (Co-Investigator), Esther van Raamsdonk.

The Archival and Network Turns

The ‘archival turn’:
- A phrase historians use to describe the recent practice of focusing critical attention on archives themselves
- Archives are ‘texts’: they contain layers of interpretation added at each step (collection, cataloguing, digitisation and so forth)
- Challenges the notion of archives as simple repositories of historical material and of archivists as neutral custodians (Ketelaar 2001)
Ahnert et al., The Network Turn (2020), argues that we live in a networked world; this conference is evidence that this is true of historical scholarship.
What do we get at the intersection of the two?

::: {.notes} In recent years the idea of the ‘archival turn’ has been used very frequently in historical scholarship. It’s a phrase used by historians to describe the practice of focusing critical attention on archives themselves. The basic idea is that archives are texts in their own right: they contain layers of interpretation at each step in their production (collection, cataloguing, digitisation and so forth). The archival turn challenges the notion of archives as simple repositories of historical material and archivists as neutral custodians: see for example the work of Eric Ketelaar. Another ‘turn’ in the humanities is a networked one: see for example the recent book ‘The Network Turn’ by Ahnert and others. What happens at the intersection of the two? :::

Networks and Archives

Historical Network Research: letter archives (correspondence data) used to uncover communication practices or make claims about social relations.
However, the networks often reveal as much or more about archival biases and practices as they do about communication or social relations
I suggest we can use networks to talk about archives in their own right
What can network analysis tell us about the ‘text’ of archival processes?

::: {.notes} Usually when we use networks on historical correspondence archives, we’re trying to understand communication practices, or make some claims about social relations and so forth. For example, a project might look at a set of centrality measures to claim that a particular individual was the most connected in a given historical network, or to find people who function as bridges between separate clusters. But often what we find in fact says more about archival practices than it does about historical realities: perhaps the individual who looks the most central just seems that way because they were the most diligent about holding on to their letters, or because they happened to keep a copy of their outgoing correspondence.

I suggest that this facet of network analysis, which can be frustrating, can be turned around. In this paper I’m going to talk about some of the ways in which network analysis can be a useful tool for understanding archives and their collection practices themselves. :::
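
A minimal sketch of the bias described in the notes above, using invented data rather than anything from the project. The people `A` to `G`, the letters between them, and the assumption that only A's personal archive survives are all hypothetical. It compares degree in the full set of letters with degree in the surviving subset.

```r
# Hypothetical toy data: every letter that was actually exchanged.
letters_sent <- data.frame(
  from = c("A", "A", "A", "B", "B", "B", "B", "B"),
  to   = c("B", "C", "D", "C", "D", "E", "F", "G"),
  stringsAsFactors = FALSE
)

# Assumption: only A's personal archive survives, so we only see
# letters that A sent or received.
surviving <- subset(letters_sent, from == "A" | to == "A")

# Total degree = number of letters a person appears in,
# as sender or as recipient, sorted from highest to lowest.
total_degree <- function(edges) {
  sort(table(c(edges$from, edges$to)), decreasing = TRUE)
}

true_degree     <- total_degree(letters_sent)
observed_degree <- total_degree(surviving)

print(true_degree)      # B is the best-connected person overall
print(observed_degree)  # but A looks like the hub in the surviving archive
```

In this toy case the apparent hub is simply the person whose papers were kept, which is the effect the rest of the talk tries to turn into a tool.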

Outline of the paper

Introduction to the ‘Networking Archives’ project and the archives used
Show some of the ways network analysis can be used to understand the shape and process of archives
- Start with ways of looking for the overall shape
- Move to more specific: what individual metrics tell us about archives
Finish with a more specific example of networks helping to find ‘new’ information in archives

::: {.notes} :::

The Archives Used

EMLO:
A union of c. 100 catalogues brought together to study the Republic of Letters (1500-1800, with a focus on the 17th century)
Based on a variety of sources, mostly printed editions but some from manuscript
In general catalogues are ‘ego networks:’ built around a static individual at the centre of a network.
SPO:
Digitised Calendars of scans of the State Papers from Britain and Ireland, Tudor and Stuart Periods (1509 - 1714)
Individual secretaries often viewed their official documents as ‘private’ and kept them as their possessions on leaving office: e.g. the Conway Papers were only returned to the office in the 19th century

The Archives Used

::: {.columns-2}

When merged they make a large network of approximately 70,000 nodes and 120,000 edges, over 300 years.
The result is a ‘hairbrush’ force-directed diagram:
- Tudor and Stuart connected to each other by a short edge: the ‘handle’ with early 16th century on one end and late 17th on the other.
- EMLO and Stuart more focused on the 17th century and share more connections.

library(tidyverse)
library(data.table)
library(tidytable)
library(tidygraph)
library(igraph)
library(ggraph)
library(ForceAtlas2)
library(snakecase)


universal_network = fread('/Users/Yann/Documents/non-Github/universal_network/universal_network.csv')
# layout = universal_network %>% distinct.(Source, Target, data) %>% graph_from_data_frame(directed = T) %>% as_tbl_graph()   mutate(degree = centrality_degree( mode = 'total')) %>%   arrange(desc(degree)) %>% filter(degree>100) %>% mutate(component = group_components()) %>% filter(component==1) %>%
#   layout.forceatlas2(directed=FALSE, iterations = 1000, plotstep = 0)

# Plot the merged network, keeping only high-degree nodes
universal_network %>% 
  distinct.(Source, Target, data) %>% 
  graph_from_data_frame(directed = T) %>% 
  as_tbl_graph() %>% 
  mutate(degree = centrality_degree(mode = 'total')) %>% 
  arrange(desc(degree)) %>% 
  filter(degree > 30) %>%
  # after filtering, keep only the largest connected component
  mutate(component = group_components()) %>%
  filter(component == 1) %>% 
  ggraph('nicely') + 
  geom_edge_link(aes(color = data), alpha = .2) +
  geom_node_point(aes(size = degree), pch = 21, fill = 'white', color = 'black', stroke = .5) + 
  theme_void() + 
  scale_size_area() + 
  theme(legend.position = 'none') + 
  labs(color = NULL) + 
  scale_x_continuous(expand = c(.2, .2))

:::

Using network analysis (NA) to understand the overall shape of the archives

::: {.columns-2}

The most basic analysis of these archives is to plot the degree distribution
Strikingly similar in all cases, despite the very different origins of EMLO and SPO.
Implications for how we think about the archives: in all cases they are centred around a few ‘elite’ hubs:
Secretaries of State for SPO, and the figures at the centre of catalogues for EMLO

# Count how many nodes have each degree within one source dataset
degree_counts = function(src) {
  universal_network %>% 
    filter(data == src) %>% 
    distinct.(Source, Target) %>% 
    graph_from_data_frame() %>% 
    as_tbl_graph() %>% 
    mutate(degree = centrality_degree(mode = 'total')) %>% 
    as_tibble() %>% 
    count.(degree, name = 'n') %>% 
    mutate(source = src)
}

emlo = degree_counts('emlo')
stuart = degree_counts('stuart')
tudor = degree_counts('tudor')

# Log-log plot of the degree distribution, one panel per source
ggplot(data = rbind(emlo, stuart, tudor)) + 
  geom_point(aes(x = degree, y = n)) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") + 
  facet_wrap(~source, ncol = 2) + 
  theme_bw()

:::

Catalogue Analysis

Created a network where each catalogue is a node, with an edge between two catalogues weighted by how many individuals appear in both
The result is a densely connected network which shows how the catalogues relate to one another
Three ‘core’ catalogues: State Papers Stuart, State Papers Tudor, and the Bodleian card catalogue
The visualisation shows some surprising connections, such as the strong overlap between Robert Boyle, Constantijn Huygens and the English State Papers.
Others are on the ‘outside’: Johan de Witt and Athanasius Kircher, despite their large collections, are ‘separate’ from the ‘core’ of EMLO and the State Papers. They are different networks.
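
The catalogue network described above can be sketched on a toy scale as follows. This is an illustration under stated assumptions, not the project's pipeline: the three catalogue names and the person IDs are invented, and dropping pairs with no shared individuals is a choice made here to keep the output small.

```r
# Hypothetical toy data: the people who appear in each catalogue.
people_by_catalogue <- list(
  "Catalogue A" = c("p1", "p2", "p3", "p4"),
  "Catalogue B" = c("p3", "p4", "p5"),
  "Catalogue C" = c("p6", "p7")
)

# Every unordered pair of catalogues.
pairs <- combn(names(people_by_catalogue), 2, simplify = FALSE)

# One row per pair; the edge weight is the number of shared individuals.
overlap_edges <- do.call(rbind, lapply(pairs, function(p) {
  shared <- intersect(people_by_catalogue[[p[1]]], people_by_catalogue[[p[2]]])
  data.frame(from = p[1], to = p[2], weight = length(shared),
             stringsAsFactors = FALSE)
}))

# Keep only pairs that actually share someone, so each row is a real edge.
overlap_edges <- overlap_edges[overlap_edges$weight > 0, ]
print(overlap_edges)
```

The resulting data frame is an ordinary weighted edge list, so it could be passed to `graph_from_data_frame()` in the same way as the letter data.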

Catalogue Analysis

library(reshape2)

library(lubridate)

location <- read_csv("/Users/Yann/Documents/GitHub/Book-Chapters/Communities in Space/data/location.csv", col_types = cols(.default = "c"))
person <- read_csv("/Users/Yann/Documents/GitHub/Book-Chapters/Communities in Space/data/person.csv", col_types = cols(.default = "c"))
work <- read_csv("/Users/Yann/Documents/GitHub/Book-Chapters/Communities in Space/data/work.csv", col_types = cols(.default = "c"))

colnames(location) = to_snake_case(colnames(location))
colnames(person) = to_snake_case(colnames(person))
colnames(work) = to_snake_case(colnames(work))

emlo_network = read_delim('/Users/Yann/Documents/GitHub/Book-Chapters/Communities in Space/data/emlo_full_network.dat', delim = '\t', col_names = F, col_types = cols(.default = "c"))

unknowns = read_csv('https://raw.githubusercontent.com/networkingarchives/de-duplications/master/to_remove_list_with_unknown.csv')

# Drop rows whose X5 value is on the removal list, attach each letter's
# catalogue name, and give the State Papers readable labels
df = universal_network %>% 
  filter(! X5 %in% unknowns$value) %>% 
  left_join(work %>% 
              dplyr::select(emlo_letter_id_number, original_catalogue_name), 
            by = c('X5' = 'emlo_letter_id_number'), na_matches = 'never') %>% 
  mutate(original_catalogue_name = coalesce(original_catalogue_name, data)) %>% 
  mutate(original_catalogue_name = ifelse(original_catalogue_name == 'tudor', 'Tudor State Papers',
                                   ifelse(original_catalogue_name == 'stuart', 'Stuart State Papers', original_catalogue_name))) %>% 
  filter(!str_detect(original_catalogue_name, "(?i)TEST"))

list_of_cats = unique(df$original_catalogue_name)
# For each catalogue, collect the IDs of every person appearing in it
l = list()
for(cat in list_of_cats){
  
  allnodes = df %>% 
    filter(original_catalogue_name == cat) %>% 
    distinct(Source, Target) %>% 
    graph_from_data_frame() %>% V()
  
  l[[cat]] = names(allnodes)
  
}


nms <- combn( names(l) , 2 , FUN = paste0 , collapse = "|" , simplify = FALSE )

# Make the combinations of list elements
ll <- combn( l , 2 , simplify = FALSE )

# Intersect the list elements
out <- lapply( ll , function(x) length( intersect( x[[1]] , x[[2]] ) ) )

# Output with names
overlap = setNames( out , nms ) %>% as_tibble() %>% pivot_longer(names_to = 'names', values_to = 'value', cols = everything()) %>% separate(names, into = c('name1', 'name2'), sep = '\\|')

library(ForceAtlas2)

totals = df %>% distinct(X5, .keep_all = T) %>% group_by(original_catalogue_name) %>% tally()
  
 g =  overlap %>% 
  rename(weight = value) %>% 
  graph_from_data_frame(directed = T) %>% 
  as_tbl_graph() %>% 
  mutate(degree = centrality_degree(weights = weight, mode = 'total')) %>% 
  activate(edges) %>% 
  #filter(weight >1) %>% 
  activate(nodes) %>% 
  mutate(comp = group_components()) %>% 
  filter(comp ==1) %>% left_join(totals, by=c('name' = 'original_catalogue_name')) 
  
# Edge list of the same graph, used to compute the ForceAtlas2 layout
df_g = g %>% activate(edges) %>% as_tibble()

layout = layout.forceatlas2(df_g, plotstep = 0, gravity = 1, nohubs = F, linlog = F, k =10000)


 p = ggraph(graph = g, layout = layout %>% select(-name) %>%  rename(x = V1, y = V2))  + 
  geom_edge_arc(aes(width = weight), alpha = .5, strength = .1) + 
  geom_node_point(aes(size = n), pch = 21, fill = 'white', color = 'black') + 
  geom_node_text(aes(label = name),size = 1.8, repel = T, segment.alpha = .2) + 
   scale_size_area(max_size = 15) + 
   scale_edge_width_continuous(range = c(.00001, 3)) + 
   theme_void() + theme(legend.position = 'bottom') + labs(size = 'Number of Letters')
 
p

Catalogue Analysis using in and out-degree

::: {.columns-2} High in-degree:

library(kableExtra)
catalogues = fread('catalogues') %>% mutate(catalogue = trimws(catalogue, which = 'both'))%>% mutate(emlo_id = trimws(emlo_id, which = 'both'))


options("kableExtra.html.bsTable" = T)

emlo_network %>% 
  count.(X1, X2, name = 'weight') %>% 
  graph_from_data_frame() %>% 
  as_tbl_graph() %>%
  mutate(in_degree = centrality_degree(mode = 'in', weights = weight)) %>% 
    mutate(out_degree = centrality_degree(mode = 'out', weights = weight)) %>% 
  as_tibble() %>% 
  inner_join.(catalogues, by = c('name' = 'emlo_id')) %>% 
  arrange(desc(in_degree)) %>% head(10) %>% 
  select.(-name, Catalogue = catalogue, `In-Degree`= in_degree, `Out-degree` = out_degree) %>%
  kbl("html", table.attr = "style = \"color: black;\"") %>%
  kable_styling(font_size = 10, bootstrap_options = c("striped", 'hover'), full_width = F)

High out-degree:

library(kableExtra)

options("kableExtra.html.bsTable" = T)

emlo_network %>% 
  count.(X1, X2, name = 'weight') %>% 
  graph_from_data_frame() %>% 
  as_tbl_graph() %>%
  mutate(in_degree = centrality_degree(mode = 'in', weights = weight))%>%
  mutate(out_degree = centrality_degree(mode = 'out', weights = weight)) %>% 
  as_tibble() %>% 
   inner_join.(catalogues, by = c('name' = 'emlo_id')) %>% 
  arrange(desc(out_degree)) %>% head(10) %>% select.(-name, Catalogue = catalogue, `Out-Degree`= out_degree, `In-degree` = in_degree) %>%
  kbl("html", table.attr = "style = \"color: black;\"") %>%
  kable_styling(font_size = 10, bootstrap_options = c("striped", 'hover'), full_width = F)

:::

Catalogue Analysis using in and out-degree

Simple metrics such as in- and out-degree allow us to quickly understand the shape of various archives:
- High in-degree, low out-degree: figures such as de Witt, a personal archive of collected letters: a ‘true’ archive as we might imagine it.
- High out-degree, low in-degree: for example Françoise de Graffigny (1695–1758), an archive of her collected and reassembled correspondence
- Balance of both: these tend to be reassembled collections, often based on printed editions or projects, such as Constantijn Huygens (1596-1687) and Henry Oldenburg (1619-1677)
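
As a rough illustration of the three shapes listed above, the sketch below counts letters sent and received in an invented edge list and attaches a crude label to each person. The names `collector` and `writer`, the letters themselves and the 2:1 threshold are all assumptions made for the example; none of them comes from the EMLO data.

```r
# Hypothetical toy edge list: one row per letter, sender -> recipient.
letters_sent <- data.frame(
  from = c("x1", "x2", "x3", "collector", "writer", "writer", "writer", "writer"),
  to   = c("collector", "collector", "collector", "x1", "y1", "y2", "y3", "y4"),
  stringsAsFactors = FALSE
)

people <- union(letters_sent$from, letters_sent$to)

# Weighted out-degree = letters sent; weighted in-degree = letters received.
out_degree <- table(factor(letters_sent$from, levels = people))
in_degree  <- table(factor(letters_sent$to,   levels = people))

degrees <- data.frame(
  person     = people,
  in_degree  = as.integer(in_degree),
  out_degree = as.integer(out_degree),
  stringsAsFactors = FALSE
)

# Crude, illustrative labels; the 2:1 ratio is an arbitrary threshold.
degrees$shape <- with(degrees, ifelse(
  in_degree >= 2 * out_degree & in_degree > 1, "mostly incoming",
  ifelse(out_degree >= 2 * in_degree & out_degree > 1,
         "mostly outgoing", "balanced or sparse")
))

print(degrees)
```

A real analysis would look at the ranked tables rather than a fixed cut-off, but the same two counts drive the distinction.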

Archives and ‘closeness centrality’

Nodes with a high closeness score have a short distance to all other nodes (closeness is the inverse of the average distance to all other nodes)
In a series of connected archives, it can tell us which are more ‘embedded’ at the centre, and which sit at the periphery
Closeness is strongly correlated with degree: to find outliers, each archive in EMLO was ranked for closeness centrality and plotted against degree rank
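
To make the rank comparison concrete, here is a small sketch on an invented six-node network, using plain igraph rather than the tidygraph wrappers. The node names are hypothetical. With `normalized = TRUE`, igraph's `closeness()` returns (n - 1) divided by the sum of distances, which matches the definition above.

```r
library(igraph)

# Hypothetical toy network: a hub with three leaves,
# plus a short chain hanging off one of the leaves.
edges <- data.frame(
  from = c("hub", "hub", "hub", "leaf3", "far1"),
  to   = c("leaf1", "leaf2", "leaf3", "far1", "far2"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = FALSE)

# normalized = TRUE gives (n - 1) / sum of distances,
# i.e. the inverse of the average distance to all other nodes.
scores <- data.frame(
  name      = V(g)$name,
  degree    = degree(g),
  closeness = closeness(g, normalized = TRUE),
  stringsAsFactors = FALSE
)

# Rank 1 = highest value; ties share an averaged rank.
scores$rank_degree    <- rank(-scores$degree)
scores$rank_closeness <- rank(-scores$closeness)

# Positive gap: the node is closer to everyone than its degree suggests.
scores$rank_gap <- scores$rank_degree - scores$rank_closeness

print(scores[order(scores$rank_closeness), ])
```

In this toy graph `leaf3` has fewer connections than `hub` but sits between the two halves of the network, so it scores better on closeness than its degree alone would suggest. That is the kind of outlier the plot is meant to surface.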

Archives and ‘closeness centrality’

library(ggrepel)
close = emlo_network %>% 
  count.(X1, X2, name = 'weight') %>% 
  graph_from_data_frame(directed = F) %>% 
  as_tbl_graph() %>% 
  mutate(degree = centrality_degree(mode = 'total'))  %>%
  mutate(closeness = centrality_closeness(mode = 'total')) %>% 
  as_tibble() %>% 
   inner_join.(catalogues, by = c('name' = 'emlo_id')) %>%  
  mutate(rank_closeness = rank(-closeness)) 

  ggplot(data = close) + 
  geom_point(aes(x = rank(-degree), y = rank_closeness)) + 
  geom_text_repel(aes(x = rank(-degree), y = rank_closeness, label = catalogue), size = 2, segment.alpha = .5, segment.size= .2) + theme_bw()

Archives and ‘closeness centrality’

The results help to describe the catalogues even with very minimal knowledge of their content
- The ‘centre’ of EMLO is a group of mostly Dutch scholars
- The archive of Athanasius Kircher is an anomaly: despite high degree, he is ranked lower than expected for closeness.
- This tallies with our own knowledge: Kircher had a network separate from (though sometimes connected to) the Republic of Letters. His inclusion in EMLO is an outlier, separate from the core of its agenda.
- On the other hand we have John Aubrey, whose closeness is surprisingly high for his degree: Aubrey was famously well-connected, a member of the Royal Society, and moved in both Royalist and Republican circles.
- As new catalogues are added, this can help to understand where they are situated, at a glance

Looking for intercepted letters in disconnected components

The State Papers have a complicated history:
- Mostly the personal papers of individual secretaries of state
- But also includes seized documents, intercepted papers, whole bunches of documents captured from ships (see also the Prize Papers)
- Some simple tools from network analysis can also help to discover some of these

Disconnected Components

Most of the State Papers network consists of one ‘giant component’, within which every node can reach every other
However there are some completely ‘disconnected’ components
These make interesting starting-points for investigations
Many are intercepted or seized documents
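
A minimal sketch of how the nodes outside the giant component might be listed, again on invented data and with plain igraph. The node names are hypothetical, and treating the largest component as the ‘giant’ one is an assumption that happens to hold trivially in this toy case.

```r
library(igraph)

# Hypothetical toy edge list: one larger connected group and one isolated pair.
edges <- data.frame(
  from = c("a", "b", "c", "d", "x"),
  to   = c("b", "c", "d", "a", "y"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = FALSE)

# components() labels every node with the component it belongs to.
comp <- components(g)

# Treat the component with the most members as the giant component.
giant <- which.max(comp$csize)

# Everyone outside it belongs to a disconnected component.
outside <- names(comp$membership)[comp$membership != giant]
print(outside)
```

Each name returned this way is a candidate for closer reading, since the network alone cannot say why those letters sit apart from the rest.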

Disconnected Components

universal_network %>% 
  filter(data %in% c('stuart', 'tudor')) %>%
  distinct.(Source, Target) %>% 
  graph_from_data_frame(directed = F) %>% 
  as_tbl_graph() %>% 
  mutate(component = group_components())%>% 
  mutate(degree = centrality_degree(mode = 'total'))%>% 
  filter(component %in% 2:10|degree >30)%>%
  ggraph('fr') + 
  geom_edge_link(alpha = .1) + 
  geom_node_point(aes(size = degree, fill = as.character(component)), pch = 21, stroke = .1) + 
  theme_void() + theme(legend.position = 'none')

Disconnected Components

universal_people = fread('/Users/Yann/Documents/non-Github/universal_network/universal_people.csv')
universal_network %>% 
  filter(data %in% c('stuart', 'tudor')) %>%
  distinct.(Source, Target) %>% 
  graph_from_data_frame(directed = F) %>% 
  as_tbl_graph() %>% 
  mutate(component = group_components())%>% 
  mutate(degree = centrality_degree(mode = 'total'))%>% 
  filter(component %in% 2:20) %>% 
  left_join(universal_people) %>%
  ggraph('fr') + 
  geom_edge_link() + 
  geom_node_point(aes(size = degree), pch = 21, color = 'black', fill = 'white')+ 
  geom_node_text(aes(label = ifelse(degree>3, main_name, NA)), repel = T, size =3) + 
  theme_void() + theme(legend.position = 'none')

Disconnected Components

::: {.columns-2}

Seized letters from Dr. Richard Smith, Roman Catholic Bishop
Warrant was issued for Smith’s arrest in 1628
He fled England for France, presumably when his letters were found and added to the State Papers
A secretary or clerk has added ‘Papist’ when filing, indicating these were suspect communications

universal_network %>% 
  filter(data %in% c('stuart', 'tudor')) %>%
  distinct.(Source, Target) %>% 
  graph_from_data_frame(directed = F) %>% 
  as_tbl_graph() %>% 
  mutate(component = group_components())%>% 
  mutate(degree = centrality_degree(mode = 'total'))%>% 
  filter(component ==8) %>% 
  left_join(universal_people) %>%
  ggraph('fr') + 
  geom_edge_link() + 
  geom_node_point(aes(size = degree), pch = 21, color = 'black', fill = 'white')+ 
  geom_node_text(aes(label = main_name), repel = T, size =3) + 
  theme_void() + theme(legend.position = 'none')

:::

Conclusions

Network analysis (NA) based on historical archives often tells us more about collection practices than about communication practices (e.g. a particular centrality score might reflect not a node’s historical position but the way that archive has been collected and digitised)
This can be a positive thing if the focus is shifted to the archives themselves
If archives are ‘texts’, it follows that we can apply ‘distant reading’ to them
Network analysis can help us understand what Eric Ketelaar calls the ‘tacit narratives of power and knowledge’ found in archives
It helps us to understand their formation as active objects rather than passive silos of information
As more archives are merged, this can help to understand the differences between them