The ‘archival turn’:
Ahnert et al., *The Network Turn* (2020), argue that we live in a networked world; this conference is evidence that the same is true of historical scholarship.
What do we get at the intersection of the two?
::: {.notes}
In recent years the idea of the ‘archival turn’ has been invoked frequently in historical scholarship. Historians use the phrase to describe the practice of focusing critical attention on archives themselves. The basic idea is that archives are texts in their own right: they contain layers of interpretation added at each step in their production (collection, cataloguing, digitisation and so forth). The archival turn challenges the notion of archives as simple repositories of historical material and of archivists as neutral custodians: see, for example, the work of Eric Ketelaar. Another ‘turn’ in the humanities is a networked one: see, for example, the recent book *The Network Turn* by Ahnert and others. What happens at the intersection of the two?
:::
::: {.notes}
Usually when we apply network analysis to historical correspondence archives, we’re trying to understand communication practices or to make claims about social relations. For example, a project might use a set of centrality measures to claim that a particular individual was the most connected in a given historical network, or to find people who function as bridges between separate clusters. But often what we find says more about archival practices than about historical realities: perhaps the individual who looks the most central only seems that way because they were the most diligent about holding on to their letters, or because they happened to keep copies of their outgoing correspondence.

I suggest that this facet of network analysis, which can be frustrating, can be turned around. In this paper I’m going to talk about some of the ways in which network analysis can be a useful tool for understanding archives and their collection practices.
:::
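The archival effect described in the notes can be illustrated with a toy example. This is only a sketch under stated assumptions: the letters, the people `A` to `E`, and the helper `most_connected()` are all hypothetical and are not taken from the project data. It compares who has the highest degree in an invented "true" correspondence with who has the highest degree once only one person's papers survive.

```r
library(igraph)

# Hypothetical "true" correspondence: B is the busiest letter-writer.
true_letters <- data.frame(
  Source = c("B", "B", "B", "B", "C", "D", "E", "A", "A"),
  Target = c("C", "D", "E", "A", "B", "B", "B", "B", "C")
)

# Assumption: only A's papers survive, i.e. the letters A received
# plus copies of the letters A sent.
surviving_letters <- subset(true_letters, Source == "A" | Target == "A")

# Hypothetical helper: name of the node with the highest total degree
most_connected <- function(letters_df) {
  g <- graph_from_data_frame(letters_df, directed = TRUE)
  deg <- degree(g, mode = "total")
  names(deg)[which.max(deg)]
}

top_true <- most_connected(true_letters)
top_surviving <- most_connected(surviving_letters)
```

In this invented data `B` is the most connected person in the full correspondence, but `A` takes that place in the surviving subset simply because `A`'s archive is the one that was kept.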
Introduction to the ‘Networking Archives’ project and the archives used
Show some of the ways network analysis can be used to understand the shape and process of archives
Finish with a more specific example of networks helping to find ‘new’ information in archives
::: {.notes} :::
::: {.columns-2}
library(tidyverse)
library(data.table)
library(tidytable)
library(tidygraph)
library(igraph)
library(ggraph)
library(ForceAtlas2)
library(snakecase)
universal_network = fread('/Users/Yann/Documents/non-Github/universal_network/universal_network.csv')
# Optional precomputed ForceAtlas2 layout (disabled):
# layout = universal_network %>% distinct.(Source, Target, data) %>% graph_from_data_frame(directed = T) %>% as_tbl_graph() %>% mutate(degree = centrality_degree( mode = 'total')) %>% arrange(desc(degree)) %>% filter(degree>100) %>% mutate(component = group_components()) %>% filter(component==1) %>%
# layout.forceatlas2(directed=FALSE, iterations = 1000, plotstep = 0)

# Plot the largest component among nodes with degree above 30, colouring edges by source dataset
universal_network %>%
  distinct.(Source, Target, data) %>%
  graph_from_data_frame(directed = T) %>%
  as_tbl_graph() %>%
  mutate(degree = centrality_degree(mode = 'total')) %>%
  arrange(desc(degree)) %>%
  filter(degree > 30) %>%
  mutate(component = group_components()) %>%
  filter(component == 1) %>%
  ggraph('nicely') +
  geom_edge_link(aes(color = data), alpha = .2) +
  geom_node_point(aes(size = degree), pch = 21, fill = 'white', color = 'black', stroke = .5) +
  theme_void() +
  scale_size_area() +
  theme(legend.position = 'none') +
  labs(color = NULL) +
  scale_x_continuous(expand = c(.2, .2))
:::
::: {.columns-2}
emlo = universal_network %>% filter(data == 'emlo') %>% distinct.(Source, Target) %>% graph_from_data_frame() %>% as_tbl_graph() %>% mutate(degree = centrality_degree(mode = 'total')) %>% as_tibble() %>% count.(degree, name = 'n') %>% mutate(source = 'emlo')
stuart = universal_network %>% filter(data == 'stuart') %>% distinct.(Source, Target) %>% graph_from_data_frame() %>% as_tbl_graph() %>% mutate(degree = centrality_degree(mode = 'total')) %>% as_tibble() %>% count.(degree, name = 'n')%>% mutate(source = 'stuart')
tudor = universal_network %>% filter(data == 'tudor') %>% distinct.(Source, Target) %>% graph_from_data_frame() %>% as_tbl_graph() %>% mutate(degree = centrality_degree(mode = 'total')) %>% as_tibble() %>% count.(degree, name = 'n')%>% mutate(source = 'tudor')
# Degree distributions on log-log axes, one panel per dataset
ggplot(data = rbind(emlo, stuart, tudor)) +
  geom_point(aes(x = degree, y = n)) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  facet_wrap(~source, ncol = 2) +
  theme_bw()
:::
library(reshape2)
library(lubridate)
location <- read_csv("/Users/Yann/Documents/GitHub/Book-Chapters/Communities in Space/data/location.csv", col_types = cols(.default = "c"))
person <- read_csv("/Users/Yann/Documents/GitHub/Book-Chapters/Communities in Space/data/person.csv", col_types = cols(.default = "c"))
work <- read_csv("/Users/Yann/Documents/GitHub/Book-Chapters/Communities in Space/data/work.csv", col_types = cols(.default = "c"))
colnames(location) = to_snake_case(colnames(location))
colnames(person) = to_snake_case(colnames(person))
colnames(work) = to_snake_case(colnames(work))
emlo_network = read_delim('/Users/Yann/Documents/GitHub/Book-Chapters/Communities in Space/data/emlo_full_network.dat', delim = '\t', col_names = F, col_types = cols(.default = "c"))
unknowns = read_csv('https://raw.githubusercontent.com/networkingarchives/de-duplications/master/to_remove_list_with_unknown.csv')
# Drop the IDs on the removal list, attach catalogue names, and label the State Papers
df = universal_network %>%
  filter(! X5 %in% unknowns$value) %>%
  left_join(work %>%
              dplyr::select(emlo_letter_id_number, original_catalogue_name),
            by = c('X5' = 'emlo_letter_id_number'), na_matches = 'never') %>%
  mutate(original_catalogue_name = coalesce(original_catalogue_name, data)) %>%
  mutate(original_catalogue_name = ifelse(original_catalogue_name == 'tudor', 'Tudor State Papers',
                                          ifelse(original_catalogue_name == 'stuart', 'Stuart State Papers', original_catalogue_name))) %>%
  filter(!str_detect(original_catalogue_name, "(?i)TEST"))
list_of_cats = unique(df$original_catalogue_name)
# Collect the names of all nodes (people) that appear in each catalogue
l = list()
for(cat_name in list_of_cats){
  allnodes = df %>%
    filter(original_catalogue_name == cat_name) %>%
    distinct(Source, Target) %>%
    graph_from_data_frame() %>% V()
  l[[cat_name]] = names(allnodes)
}
nms <- combn( names(l) , 2 , FUN = paste0 , collapse = "|" , simplify = FALSE )
# Make the combinations of list elements
ll <- combn( l , 2 , simplify = FALSE )
# Intersect the list elements
out <- lapply( ll , function(x) length( intersect( x[[1]] , x[[2]] ) ) )
# Output with names
overlap = setNames( out , nms ) %>% as_tibble() %>% pivot_longer(names_to = 'names', values_to = 'value', cols = everything()) %>% separate(names, into = c('name1', 'name2'), sep = '\\|')
library(ForceAtlas2)
totals = df %>% distinct(X5, .keep_all = T) %>% group_by(original_catalogue_name) %>% tally()
g = overlap %>%
rename(weight = value) %>%
graph_from_data_frame(directed = T) %>%
as_tbl_graph() %>%
mutate(degree = centrality_degree(weights = weight, mode = 'total')) %>%
activate(edges) %>%
#filter(weight >1) %>%
activate(nodes) %>%
mutate(comp = group_components()) %>%
filter(comp ==1) %>% left_join(totals, by=c('name' = 'original_catalogue_name'))
# Edge list of the same graph, used as input for the ForceAtlas2 layout
df_g = g %>% activate(edges) %>% as_tibble()
layout = layout.forceatlas2(df_g, plotstep = 0, gravity = 1, nohubs = F, linlog = F, k =10000)
p = ggraph(graph = g, layout = layout %>% select(-name) %>% rename(x = V1, y = V2)) +
geom_edge_arc(aes(width = weight), alpha = .5, strength = .1) +
geom_node_point(aes(size = n), pch = 21, fill = 'white', color = 'black') +
geom_node_text(aes(label = name),size = 1.8, repel = T, segment.alpha = .2) +
scale_size_area(max_size = 15) +
scale_edge_width_continuous(range = c(.00001, 3)) +
theme_void() + theme(legend.position = 'bottom') + labs(size = 'Number of Letters')
p
::: {.columns-2}
High in-degree:
library(kableExtra)
catalogues = fread('catalogues') %>% mutate(catalogue = trimws(catalogue, which = 'both'))%>% mutate(emlo_id = trimws(emlo_id, which = 'both'))
options("kableExtra.html.bsTable" = T)
emlo_network %>%
count.(X1, X2, name = 'weight') %>%
graph_from_data_frame() %>%
as_tbl_graph() %>%
mutate(in_degree = centrality_degree(mode = 'in', weights = weight)) %>%
mutate(out_degree = centrality_degree(mode = 'out', weights = weight)) %>%
as_tibble() %>%
inner_join.(catalogues, by = c('name' = 'emlo_id')) %>%
arrange(desc(in_degree)) %>% head(10) %>%
select.(-name, Catalogue = catalogue, `In-Degree`= in_degree, `Out-degree` = out_degree) %>%
kbl("html", table.attr = "style = \"color: black;\"") %>%
kable_styling(font_size = 10, bootstrap_options = c("striped", 'hover'), full_width = F)
High out-degree:
library(kableExtra)
options("kableExtra.html.bsTable" = T)
emlo_network %>%
count.(X1, X2, name = 'weight') %>%
graph_from_data_frame() %>%
as_tbl_graph() %>%
mutate(in_degree = centrality_degree(mode = 'in', weights = weight))%>%
mutate(out_degree = centrality_degree(mode = 'out', weights = weight)) %>%
as_tibble() %>%
inner_join.(catalogues, by = c('name' = 'emlo_id')) %>%
arrange(desc(out_degree)) %>% head(10) %>%
select.(-name, Catalogue = catalogue, `Out-Degree`= out_degree, `In-degree` = in_degree) %>%
kbl("html", table.attr = "style = \"color: black;\"") %>%
kable_styling(font_size = 10, bootstrap_options = c("striped", 'hover'), full_width = F)
:::
Simple metrics such as in- and out-degree allow us to quickly understand the shape of various archives:
library(ggrepel)
close = emlo_network %>%
count.(X1, X2, name = 'weight') %>%
graph_from_data_frame(directed = F) %>%
as_tbl_graph() %>%
mutate(degree = centrality_degree(mode = 'total')) %>%
mutate(closeness = centrality_closeness(mode = 'total')) %>%
as_tibble() %>%
inner_join.(catalogues, by = c('name' = 'emlo_id')) %>%
mutate(rank_closeness = rank(-closeness))
ggplot(data = close) +
geom_point(aes(x = rank(-degree), y = rank_closeness)) +
geom_text_repel(aes(x = rank(-degree), y = rank_closeness, label = catalogue), size = 2, segment.alpha = .5, segment.size= .2) + theme_bw()
The results help to describe the catalogues even with minimal knowledge of their content.
The State Papers have a complicated history:
universal_network %>%
filter(data %in% c('stuart', 'tudor')) %>%
distinct.(Source, Target) %>%
graph_from_data_frame(directed = F) %>%
as_tbl_graph() %>%
mutate(component = group_components())%>%
mutate(degree = centrality_degree(mode = 'total'))%>%
filter(component %in% 2:10|degree >30)%>%
ggraph('fr') +
geom_edge_link(alpha = .1) +
geom_node_point(aes(size = degree, fill = as.character(component)), pch = 21, stroke = .1) +
theme_void() + theme(legend.position = 'none')
universal_people = fread('/Users/Yann/Documents/non-Github/universal_network/universal_people.csv')
universal_network %>%
filter(data %in% c('stuart', 'tudor')) %>%
distinct.(Source, Target) %>%
graph_from_data_frame(directed = F) %>%
as_tbl_graph() %>%
mutate(component = group_components())%>%
mutate(degree = centrality_degree(mode = 'total'))%>%
filter(component %in% 2:20) %>%
left_join(universal_people) %>%
ggraph('fr') +
geom_edge_link() +
geom_node_point(aes(size = degree), pch = 21, color = 'black', fill = 'white')+
geom_node_text(aes(label = ifelse(degree>3, main_name, NA)), repel = T, size =3) +
theme_void() + theme(legend.position = 'none')
::: {.columns-2}
universal_network %>%
filter(data %in% c('stuart', 'tudor')) %>%
distinct.(Source, Target) %>%
graph_from_data_frame(directed = F) %>%
as_tbl_graph() %>%
mutate(component = group_components())%>%
mutate(degree = centrality_degree(mode = 'total'))%>%
filter(component ==8) %>%
left_join(universal_people) %>%
ggraph('fr') +
geom_edge_link() +
geom_node_point(aes(size = degree), pch = 21, color = 'black', fill = 'white')+
geom_node_text(aes(label = main_name), repel = T, size =3) +
theme_void() + theme(legend.position = 'none')
:::