Early modern routing with Viabundus and sfnetworks

Last updated on May 3, 2021 25 min read

The last couple of months have been particularly good ones for anyone working with spatial and historical data: An major new ‘online streetmap’ of early modern Northern Europe, Viabundus, was released at the end of April, about a month after the first CRAN release of an R package called sfnetworks. This package and dataset seem made for each other: with some pretty basic code I’ve been able to do some really fun things.

sfnetworks package

The stnetworks package has been in development for some time, but the first I heard of it was in March when they announced the first CRAN release (which means it’s in R’s ‘official’ list of packages that can be installed through the programming language itself). Essentially, two of my favourite packages, sf for spatial analysis and tidygraph for networks, have got together and had a baby: sfnetworks combines both packages, providing a really easy interface for performing spatial network analysis tasks such as calculating shortest paths.

As I wrote in an earlier post, tidygraph allows you to perform network analysis within a tidy data workflow, by linking together a nodes table and an edges table, allows you to switch between the two and easily do typical data analysis tasks such as grouping and filtering, directly to the network itself.

sfnetworks extends this idea by making the nodes and edges tables into simple features spatial objects: the nodes table becomes a table of spatial points, and the edges table one of spatial lines. You can also just supply one or the other and it’ll automatically make the second - so for example if you supply a table of spatial points, it’ll draw implicit edges between them, and, conversely, if you supply a dataset of spatial lines, it’ll give you a table of nodes, consisting of all the start and end points of each of the lines. Simple features objects are particularly easy to work with thanks to the R package sf: the sfnetworks format lets you leverage all the great GIS analysis from sf along with network-specific tasks.

The examples below are all based on the package vignettes and the result of a couple of days playing around. I highly recommend taking a look at these to understand the full data structure - which I have simplified and, I’m sure in parts, misunderstood.

Viabundus

As luck would have it, the data type needed for sfnetworks is pretty similar to the second resource I mentioned, Viabundus. Viabundus is the result of a four-year project by a group of researchers in Universities across Europe, which the creators describe as a ‘online street map of late medieval and early modern northern Europe (1350-1650).’ It’s an atlas of early modern roads, which have been digitised and converted into a nifty online map which allows you to calculate routes across Northern Europe, including travel times, tolls passed, fairs visited, to a pretty mind-boggling level of detail. Basically Google Maps for the seventeenth-century.

Alongside the map, the team have released the dataset as Open Access (CC-BY-SA) data. There’s extensive documentation, but in essence, to reconstruct the basic map and perform spatial analysis with it, the data contains two key tables: one edges table, the roads, and one nodes table, the towns, markets, bridges and other points through which the routes pass. As I said - conveniently this is more or less the data structure required as an input for sfnetworks. The rest of this post is a short demonstration on how the two can be used together.

Create a `sfnetworks` object from Viabundus data

The data is freely available here, with a CC-BY-SA licence. I recommend having a good read of the documentation to understand all the additional tables, but the ones we’ll use today are the two main ones: the nodes and edges. First, import them into R:

library(tidyverse)
nodes = read_csv('Viabundus-1.0-CSV/Nodes.csv')
edges = read_csv('Viabundus-1.0-CSV/Edges.csv')

Next, you need to convert them into sf objects. The geographic data in the Viabundus edges table is supplied in a format called ‘Well Known Text’ (WKT). The sf package supports this as a data input - just supply the column containing the spatial data as the argument wkt =. In this case, the WKT column in the edges table is also called WKT. Use st_set_crs to set the Coordinate Reference System, which needs to be the same for both tables.

library(sf)
edges_sf = edges %>% st_as_sf(wkt = "WKT")
edges_sf = edges_sf %>% st_set_crs(4326)

The nodes table doesn’t use WKT, but contains a regular latitude and longitude column. Use st_as_sf again, and supply the column names to the coords = argument.

nodes_sf = nodes %>% st_as_sf(coords = c('Longitude', 'Latitude'))
nodes_sf = nodes_sf %>% st_set_crs(4326)

To create the sfnetworks object, you supply data in one of a number of forms. The most complete is to supply a both a table of edges and a table of nodes. The edges table should contain ‘from’ and ‘to’ columns, corresponding to the node IDs in the node table, which will link them together.

As far as I can tell, the Viabundus data doesn’t explicitly connect the nodes and edges data like this. I’ve used a sort of work-around to get around this.

First, use as_sfnetwork to create an sfnetwork from the edges table alone.

library(sfnetworks)
sf_net = as_sfnetwork(edges_sf, directed = F)

sf_net

## # A sfnetwork with 12567 nodes and 14819 edges
## #
## # CRS:  EPSG:4326 
## #
## # An undirected multigraph with 10 components with spatially explicit edges
## #
## # Node Data:     12,567 x 1 (active)
## # Geometry type: POINT
## # Dimension:     XY
## # Bounding box:  xmin: 3.224534 ymin: 48.72051 xmax: 37.6206 ymax: 60.71302
##                   WKT
##           <POINT [°]>
## 1 (22.47944 58.24702)
## 2 (22.80928 58.30961)
## 3 (22.86145 58.36151)
## 4 (23.00004 58.40795)
## 5  (23.04775 58.5087)
## 6 (23.03553 58.57549)
## # … with 12,561 more rows
## #
## # Edge Data:     14,819 x 12
## # Geometry type: LINESTRING
## # Dimension:     XY
## # Bounding box:  xmin: 3.222333 ymin: 48.72051 xmax: 37.6206 ymax: 60.71302
##    from    to     ID Section Type  Certainty Zoomlevel From  To    Comment_ID
##   <int> <int>  <dbl> <chr>   <chr>     <dbl>     <dbl> <chr> <chr> <chr>     
## 1     1     2 227856 EST     land          2         1 NULL  NULL  NULL      
## 2     2     3 227858 EST     land          2         1 NULL  NULL  NULL      
## 3     3     4 227860 EST     land          3         1 NULL  NULL  NULL      
## # … with 14,816 more rows, and 2 more variables: Length <dbl>, WKT <LINESTRING
## #   [°]>

Looking at the data, you can see it has created a simple nodes table of all the start and end points of the spatial lines in the edges table. We need to know what towns and other points correspond to these nodes.

To do this, I used the sf function st_join. Use the join function ‘nearest feature,’ which will link the sfnetwork nodes table to the closest point in the original Viabundus nodes table. This is the best method I could think of for now, though it’s not perfect—it would be better if there was an explicit link between the nodes and edges table, which I haven’t been able to figure out.

sf_net = sf_net %>% 
  st_join(nodes_sf, join = st_nearest_feature)

One of the key things which makes a spatial network object different to a regular network object is that we treat the length of each edge as a weight. This can be done automatically with sfnetworks using edge_length, though the Viabundus edges table has a column with this information already.

sf_net = sf_net %>%
  activate("edges") %>%
  mutate(weight = edge_length())

Network Calculations

This new object can be used to calculate regular network analysis metrics, such as betweenness centrality, using the same set of functions as tidygraph:

library(tidygraph)
library(kableExtra)
betweenness_scores = sf_net %>% 
  activate(nodes) %>% 
  mutate(betweenness = centrality_betweenness(weights = NULL)) %>% 
  as_tibble() %>% 
  arrange(desc(betweenness)) %>% 
  filter(Is_Town == 'y') %>% head(10)

betweenness_scores %>% st_drop_geometry() %>% select(Name, betweenness) %>% kbl(caption = "Nodes with highest betweenness scores:")

Table 1: Nodes with highest betweenness scores:
Name	betweenness
Wittenberge	21883786
Cölln (Berlin)	21044547
Havelberg	20997097
Hitzacker	20414717
Berlin	19693706
Berlin	19316790
Bernau bei Berlin	19291164
Potsdam	19272557
Bernau bei Berlin	19257977
Malchow	19240900

Or color all nodes on their betweenness score, and display on a map:

betweenness_sf = sf_net %>% 
  activate(nodes) %>% 
  mutate(betweenness = centrality_betweenness(weights = NULL)) %>%
  as_tibble()

ggplot() + geom_sf(data = betweenness_sf, aes(color = betweenness), alpha =  .8) + scale_color_viridis_c() +theme_void() + labs(title  = "Viabundus Network, Betweenness Centrality Scores", caption = 'Made with data from Viabundus.eu') + theme(title = element_text(size = 14, face = 'bold'))

Interestingly, the highest betweenness nodes all seem to fall along a single path. Is this a particularly important trunk road? We can also plot the whole network. To use ggplot, create separate nodes and edges tables using as_tibble, and add them as separate geoms to the same plot:

ggplot() + 
  geom_sf(data = sf_net %>% activate(nodes) %>% as_tibble(), alpha =  .1, size  = .01)  + 
  geom_sf(data = sf_net %>% activate(edges) %>% as_tibble(), alpha = .5) +theme_void() + 
  labs(title  = "Viabundus Network, All Routes", caption = 'Made with data from Viabundus.eu') + 
  theme(title = element_text(size = 14, face = 'bold'))

Shortest-path calculation

The nicest application of these two resources I’ve found so far is to use sfnetworks to calculate the shortest path between any set of points in the network. This uses Dijkstra’s algorithm, along with the lengths of the spatial lines as weights (or impedance), to calculate the shortest route between any two points (or from one start point to a range of end points). This is done with the function st_shortest_paths.

st_network_paths takes from and to arguments, using the node ID from the nodes table. You’ll need to figure out some way to find the relevant node IDs to enter here—the most convenient way I’ve found is first to make a copy of just the nodes table:

nodes_lookup_table = sf_net %>% activate(nodes) %>% as_tibble()

The ID you need is the row index - create a column called node_id containing this info:

nodes_lookup_table  = nodes_lookup_table %>% mutate(node_id = 1:nrow(.))

Looking at the table, you’ll see that sometimes the same place has been assigned to multiple closest coordinates. For now, I’m just going to pick the first one.

utrecht_node_id = nodes_lookup_table %>% filter(Name == 'Utrecht') %>% head(1) %>% pull(node_id)

hamburg_node_id = nodes_lookup_table %>% filter(Name == 'Hamburg') %>% head(1) %>% pull(node_id)

Run st_network_paths supplying the points you want to calculate the route between as from and to arguments:

paths = st_network_paths(sf_net, from =utrecht_node_id, to = hamburg_node_id)

paths

## # A tibble: 1 x 2
##   node_paths edge_paths
##   <list>     <list>    
## 1 <int [99]> <int [98]>

The result of st_network_paths is a dataframe with two columns, each containing a vector of edge or node IDs. First convert each into a dataframe with a single columns of IDs:

 node_list = paths %>%
  slice(1) %>%
  pull(node_paths) %>%
  unlist() %>% as_tibble()

edge_list = paths %>%
  slice(1) %>%
  pull(edge_paths) %>%
  unlist() %>% as_tibble()

Use inner join() to attach the edge and node IDs in the route to the edge and nodes tables from the sfnetwork object. You’ll need to recreate them as sf objects:

line_to_draw = edge_list %>% inner_join(sf_net %>% 
  activate(edges) %>% 
  as_tibble() %>% 
  mutate(edge_id = 1:nrow(.)) , by = c('value' = 'edge_id'))

line_to_draw = line_to_draw %>% st_as_sf()
line_to_draw = line_to_draw %>% st_set_crs(4326)

nodes_to_draw = node_list %>% inner_join(sf_net %>% 
  activate(nodes) %>% 
  as_tibble() %>% 
  mutate(node_id = 1:nrow(.)) , by = c('value' = 'node_id'))

nodes_to_draw = nodes_to_draw %>% st_as_sf()
nodes_to_draw = nodes_to_draw %>% st_set_crs(4326)

Plot this using Leaflet:

library(leaflet)

leaflet() %>% 
  addTiles() %>% 
  addCircles(data = nodes_to_draw, label = ~Name) %>% 
  addPolylines(data  = line_to_draw)

Constructing an itinerary using the node information

The paths object gives both the roads (edges/lines) and towns or other points of interest passed (nodes). We can use the additional data from the list of nodes to get more information on the route along the way.

First, it’s worth taking a closer look at the nodes data.

nodes_lookup_table %>% glimpse()

## Rows: 12,567
## Columns: 45
## $ WKT                     <POINT [°]> POINT (22.47944 58.24702), POINT (22.80…
## $ ID                      <dbl> 287, 7743, 6695, 3479, 5911, 7263, 10815, 386…
## $ Parent_ID               <dbl> NA, NA, NA, NA, NA, NA, 3865, NA, NA, NA, NA,…
## $ Name                    <chr> "Kuressaare", "Tõlluste", "Sakla", "Kahtla", …
## $ Alternative_Name        <chr> "Arensburg; Arnsborch; Arnburg; Arensburgk", …
## $ Geonames_ID             <dbl> 590939, 588249, 588839, 591760, 589398, 59043…
## $ Ready                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Is_Settlement           <chr> "y", "y", "y", NA, "y", "y", NA, "y", "y", "y…
## $ Settlement_From         <dbl> 1384, 1528, 1645, NA, 1250, 1345, NA, 1532, 1…
## $ Settlement_To           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Settlement_Description  <chr> "In the second half of the 14th century the c…
## $ Is_Town                 <chr> "y", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ Town_From               <dbl> 1563, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Town_To                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Town_Description        <chr> "The settlement near the castle was granted R…
## $ Is_Toll                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Toll_From               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Toll_To                 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Toll_Description        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Is_Staple               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Staple_From             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Staple_To               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Staple_Duration_Of_Stay <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Staple_Description      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Is_Fair                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Fair_From               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Fair_To                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Fair_Description        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Is_Ferry                <chr> NA, NA, NA, NA, NA, "y", "y", NA, NA, NA, NA,…
## $ Ferry_From              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Ferry_To                <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Ferry_Description       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Is_Bridge               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Bridge_From             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Bridge_To               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Bridge_Description      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Is_Harbour              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Harbour_From            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Harbour_To              <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Harbour_Description     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Is_Lock                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Lock_From               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Lock_To                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Lock_Description        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ node_id                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…

There are a large number of types of node, including town, settlement, bridge, harbour, ferry, and lock. Using the types will help to plan out a proper itinerary for the route.

There’s also dates from/to fields for each type, which means we can draw up an itinerary for a specific time period. From what I can tell, the official Viabundus map takes into account the infrastructure types in the node list (locks, bridges and ferries etc) and draws up a route which would have used the correct network at a specified date, but I haven’t figured out how to do that yet.

The Viabundus data format is quite ‘wide,’ which makes it difficult to easily filter. Each place can be multiple types, with multiple start and end points. To make it easier to work with, I used pivot_longer to make the data long, in a couple of steps.

First, I turned all the From/To columns into a pair: one column for the type of node, and a second for the value:

nodes_to_draw = nodes_to_draw %>% map_df(rev) # The route is in reverse order - switch this around

long_data = nodes_to_draw %>% mutate(order = 1:nrow(.)) %>% 
  pivot_longer(names_to = 'data_type' , values_to = 'value_of', cols =matches("From$|To$", ignore.case = F))

long_data %>% select(Name, data_type, value_of) %>% head(10) %>% kbl()%>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

Name	data\_type	value\_of
Hamburg	Settlement\_From	831
Hamburg	Settlement\_To	NA
Hamburg	Town\_From	1150
Hamburg	Town\_To	NA
Hamburg	Toll\_From	1188
Hamburg	Toll\_To	NA
Hamburg	Staple\_From	1458
Hamburg	Staple\_To	1748
Hamburg	Fair\_From	1000
Hamburg	Fair\_To	NA

Next I used ifelse to filter the value column in a different way depending on whether it was a to date or a from date:

long_data  = long_data %>% 
  mutate(date_type = ifelse(str_detect(data_type, "From$"), 'from', 'to')) %>% 
  filter(date_type == 'from' & value_of < 1600 | data_type == 'to' & value_of >1600| is.na(value_of))

Using, this, we can easily list each place visited:

library(htmltools)
long_data %>% distinct(order, .keep_all = T) %>% select(order, Name) %>% filter(!is.na(Name)) %>% distinct(Name) %>% kbl() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))%>%
  scroll_box(height = "400px")

Name
Hamburg
Ottensen
Blankenese
Buxtehude
Kammerbusch
Ahrenswohlde
Wangersen
Steddorf
Boitzen
Heeslingen
Zeven
Oldendorf (Zeven)
Steinfeld
Otterstedt
Ottersberg
Bassen
Borstel (Achim)
Achim (Weser)
Uphusen (Bremen)
Mahndorf (Bremen)
Arbergen
Hemelingen
Hastedt
Bremen
Neustadt
Warturm
Kirchhuchting
Mittelshuchting
Varrel
Horstedt
Wildeshausen
Ahlhorn
Lethe
Bethen
Cloppenburg
Krapendorf
Lastrup
Löningen
Haselünne
Bawinkel
Lingen
Schepsdorf
Südlohne (Bentheim)
Nordhorn
Ootmarsum
Almelo
Wierden
Rijssen
Heeten
Deventer
Twello
Apeldoorn
Hoog-Soeren
Garderen
Voorthuizen
Terschuur
Hoevelaken
Amersfoort
De Bilt
Utrecht

Another useful task is to make a nice summary of the places visited and their types. To do this easily, the data needs to be made ‘longer’ again. This time, we want a pair of columns where one is the node type, and the other is whether it is a ‘y’ or simply empty.

longer_data = long_data %>% pivot_longer(names_to = 'node_type', values_to = 'type', cols = matches('^Is_'))

longer_data %>% select(Name, node_type, type) %>% head(10)

## # A tibble: 10 x 3
##    Name    node_type     type 
##    <chr>   <chr>         <chr>
##  1 Hamburg Is_Settlement y    
##  2 Hamburg Is_Town       y    
##  3 Hamburg Is_Toll       y    
##  4 Hamburg Is_Staple     y    
##  5 Hamburg Is_Fair       y    
##  6 Hamburg Is_Ferry      <NA> 
##  7 Hamburg Is_Bridge     <NA> 
##  8 Hamburg Is_Harbour    y    
##  9 Hamburg Is_Lock       <NA> 
## 10 Hamburg Is_Settlement y

You’ll notice that Hamburg now has multiple repeated entries for each of the types, because we had already duplicated the data in the previous step, when making the dates long. This will need to be fixed for counts.

First, filter out any entries where the type is NA (this means that particular place was not that type):

longer_data = longer_data %>% filter(!is.na(type))

Now we can add all the node types associated with each place, using summarise. We need to use unique too, and then it’ll just print one instance of each type for each place:

longer_data %>% 
  group_by(order) %>% 
  summarise(Name = max(Name),types = paste0(unique(node_type), collapse = '; ')) %>% filter(!is.na(Name)) %>% kbl()%>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))%>%
  scroll_box(height = "400px")

order	Name	types
1	Hamburg	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Staple; Is\_Fair; Is\_Harbour
3	Ottensen	Is\_Settlement
4	Blankenese	Is\_Settlement; Is\_Ferry
6	Buxtehude	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
7	Buxtehude	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
8	Buxtehude	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
9	Kammerbusch	Is\_Settlement
10	Ahrenswohlde	Is\_Settlement
11	Wangersen	Is\_Settlement
12	Steddorf	Is\_Settlement
13	Boitzen	Is\_Settlement
14	Heeslingen	Is\_Settlement
15	Zeven	Is\_Settlement; Is\_Fair
16	Zeven	Is\_Settlement; Is\_Fair
17	Oldendorf (Zeven)	Is\_Settlement
18	Steinfeld	Is\_Settlement
19	Otterstedt	Is\_Settlement
20	Ottersberg	Is\_Settlement; Is\_Toll; Is\_Fair
21	Bassen	Is\_Settlement
22	Borstel (Achim)	Is\_Settlement
23	Achim (Weser)	Is\_Settlement
24	Uphusen (Bremen)	Is\_Settlement
25	Mahndorf (Bremen)	Is\_Settlement
26	Arbergen	Is\_Settlement
27	Hemelingen	Is\_Settlement
28	Hastedt	Is\_Settlement
29	Bremen	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Staple; Is\_Fair
30	Bremen	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Staple; Is\_Fair
33	Warturm	Is\_Settlement; Is\_Toll; Is\_Bridge
34	Kirchhuchting	Is\_Settlement
35	Mittelshuchting	Is\_Settlement
36	Varrel	Is\_Settlement; Is\_Toll
37	Horstedt	Is\_Settlement
38	Wildeshausen	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
39	Wildeshausen	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
40	Wildeshausen	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
41	Ahlhorn	Is\_Settlement
42	Ahlhorn	Is\_Settlement
43	Lethe	Is\_Settlement
44	Bethen	Is\_Settlement
45	Cloppenburg	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
46	Cloppenburg	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
47	Cloppenburg	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
48	Krapendorf	Is\_Settlement
49	Krapendorf	Is\_Settlement
50	Lastrup	Is\_Settlement
51	Löningen	Is\_Settlement
52	Haselünne	Is\_Settlement; Is\_Town; Is\_Toll; Is\_Fair
55	Bawinkel	Is\_Settlement
56	Lingen	Is\_Settlement; Is\_Town; Is\_Fair
57	Lingen	Is\_Settlement; Is\_Town; Is\_Fair
60	Schepsdorf	Is\_Settlement
61	Schepsdorf	Is\_Settlement
62	Südlohne (Bentheim)	Is\_Settlement
63	Nordhorn	Is\_Settlement; Is\_Town; Is\_Toll
64	Nordhorn	Is\_Settlement; Is\_Town; Is\_Toll
69	Ootmarsum	Is\_Settlement; Is\_Town
70	Ootmarsum	Is\_Settlement; Is\_Town
71	Ootmarsum	Is\_Settlement; Is\_Town
72	Almelo	Is\_Settlement; Is\_Town
73	Almelo	Is\_Settlement; Is\_Town
74	Wierden	Is\_Settlement
75	Rijssen	Is\_Settlement; Is\_Town
76	Rijssen	Is\_Settlement; Is\_Town
77	Rijssen	Is\_Settlement; Is\_Town
78	Heeten	Is\_Settlement
79	Heeten	Is\_Settlement
80	Deventer	Is\_Settlement; Is\_Town
81	Deventer	Is\_Settlement; Is\_Town
83	Twello	Is\_Settlement
84	Twello	Is\_Settlement
85	Apeldoorn	Is\_Settlement; Is\_Toll
86	Hoog-Soeren	Is\_Settlement
87	Garderen	Is\_Settlement
88	Voorthuizen	Is\_Settlement
89	Voorthuizen	Is\_Settlement
90	Terschuur	Is\_Settlement
91	Hoevelaken	Is\_Settlement
92	Hoevelaken	Is\_Settlement
93	Amersfoort	Is\_Settlement; Is\_Town
94	Amersfoort	Is\_Settlement; Is\_Town
95	Amersfoort	Is\_Settlement; Is\_Town
97	De Bilt	Is\_Settlement
98	De Bilt	Is\_Settlement
99	Utrecht	Is\_Settlement; Is\_Town; Is\_Harbour

Tidy data also allows us to count the totals for each type of node passed. Again, we want to remove the duplicated data which is easy to do with distinct

longer_data %>% distinct(order, node_type) %>% group_by(node_type) %>% tally() %>% kbl()%>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

node\_type	n
Is\_Bridge	4
Is\_Fair	18
Is\_Ferry	3
Is\_Harbour	10
Is\_Settlement	85
Is\_Staple	3
Is\_Toll	20
Is\_Town	31

Estimate Travel Times

We can use this info on the nodes passed plus the road lengths to estimate the total travel time.

First, get the total distance (note that everything is in metres for now):

distance = line_to_draw %>% tally(Length) %>% pull(n)

Viabundus suggests 5km per hour for wagons, up to a maximum of 35km per day.

time_taken_in_hours = distance / 5000

time_taken_in_days = time_taken_in_hours/7

Passing through certain nodes slow you down, according to the Viabundus documentation. Staples forced merchants to stop for three days, and ferries add an hour for loading/unloading. We can use the lists generated above to add the correct time penalties:

staples = longer_data %>% 
  distinct(order, node_type) %>% 
  filter(node_type == 'Is_Staple') %>% 
  tally() %>% pull(n)

ferries = longer_data %>% 
  distinct(order, node_type) %>% 
  filter(node_type == 'Is_Ferry') %>% 
  tally() %>% pull(n)

ferries = ferries /24

total_time_taken = time_taken_in_days + staples+ ferries

paste0("Total time taken (in days): ", total_time_taken)

## [1] "Total time taken (in days): 15.3426"

Calculating all daily travel rates

sfnetworks also has some functions for calculating the ‘neighbourhood’ of a node - the number of nodes reachable within a specified distance.

Using the code from the package vignette, I visualised the maximum one could travel from Utrecht in steps of a day for up to 10 days, ignoring staple stops:

Download a map of the world to use as a backdrop:

library(rnaturalearth)
library(rnaturalearthdata)

map = rnaturalearth::ne_coastline(returnclass = 'sf', scale = 'medium')
map = map %>% st_set_crs(4326)

Get a set of coordinates for Utrecht, in sf form:

utrecht_point = sf_net %>% activate(nodes) %>% filter(Name == 'Utrecht') %>%
  st_geometry() %>%
  st_combine() %>% 
  st_centroid()

Make a list of the ‘thresholds’: a sequence of distances which will be fed to a for loop to calculate ten separate ‘neighborhood’ objects. The thresholds are each 35,000 metres long, corresponding to the maximum speed per day:

thresholds = rev(seq(0, 350000, 35000))
palette = sf.colors(n = 10)

Run the for loop, which will generate 10 separate network objects calculating the neighborhood of nodes from our starting point, and put them in a single list, with an additional ‘days’ column.

nbh = list()


for (i in c(1:10)) {
  nbh[[i]] = convert(sf_net, to_spatial_neighborhood, utrecht_point, thresholds[i])
   nbh[[i]] =  nbh[[i]] %>% st_set_crs(4326)
   nbh[[i]] = nbh[[i]] %>% activate(edges) %>% as_tibble() %>% mutate(days = 10-(i-1))
   
}

Use rbindlist() from data.table to merge them into one large spatial object:

library(data.table)

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## The following object is masked from 'package:purrr':
## 
##     transpose

all_neighborhoods = rbindlist(nbh)
all_neighborhoods = st_as_sf(all_neighborhoods)
all_neighborhoods = all_neighborhoods %>% st_set_crs(4326)

Plot the results:

p = ggplot() + 
  geom_sf(data = map) + 
  geom_sf(data = all_neighborhoods, aes(color = days)) + 
  coord_sf(xlim = c(0, 11), ylim = c(50, 54))+ theme_void() + 
  labs(title  = "Maximum Distance Reached From Utrecht,\nby Cart or Wagon", color = 'Days:', caption = 'Made with data from Viabundus.eu') + 
  theme(title = element_text(size = 14, face = 'bold')) + 
  scale_color_viridis_c(direction = -1, option = 'A', breaks = 1:10) + 
  guides(color = guide_colorsteps(barwidth = '20', barheight = .5)) + 
  theme(legend.position = 'bottom') 

p

Calculating elevation data

A few more R packages allow us to estimate the elevation travelled during the trip, which can also come in handy for understanding likely routes and travel times. Load the elevatr and slopes packages.

library(elevatr)
library(slopes)

Take the spatial lines data created above, and reverse it, because the path is reversed by default:

line_to_draw = line_to_draw %>% map_df(rev)

line_to_draw = line_to_draw %>% st_as_sf()
line_to_draw = line_to_draw %>% st_set_crs(4326)

The get_elev_raster function takes a different spatial object, called a SpatialLinesDataFrame. Create this using as('Spatial'):

sp_edges = line_to_draw  %>% as('Spatial')

Now, download elevation data using get_elev_raster. This function takes the spatial object and downloads the relevant elevation data as a raster file from an online service. We can specify the zoom level (higher numbers mean a larger download) and the src, which is the data source, in this case Mapzen tiles hosted on Amazon Web Services.

elevation = get_elev_raster(sp_edges, z=9, src = "aws")

## Mosaicing & Projecting

## Note: Elevation units are in meters.
## Note: The coordinate reference system is:
##  GEOGCRS["WGS 84 (with axis order normalized for visualization)",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]

With this info, create a new ‘slope’ column using slope_raster and the raster we’ve just downloaded.

line_to_draw$slope = slope_raster(line_to_draw, e = elevation)

## Loading required namespace: geodist

Use the function slope_3d to estimate elevation data using the slopes, and add it as a Z axis to the lines sf object.

line_to_draw_3d = slope_3d(line_to_draw, elevation)

Plot the result using plot_slope:

plot_slope(line_to_draw_3d, title = "Elevation Data, Utrecht to Hamburg")

Using Slope Data in the Shortest-Path Routing

We can also supply elevation data as part of the weight used by the shortest-path algorithm - letting us help an early modern trader avoid pesky hills. To do this, first download elevation data for the entire Viabundus dataset. I’ve reduced the zoom level a bit because it takes a long time to download and process.

all_elevation = get_elev_raster(edges_sf, z=7, src = "aws")

## Mosaicing & Projecting

## Note: Elevation units are in meters.
## Note: The coordinate reference system is:
##  GEOGCRS["WGS 84 (with axis order normalized for visualization)",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]

Use the full raster to get slope data for all the roads:

edges_sf$slope = slope_raster(edges_sf, e = all_elevation)

Create a new weight column in the roads dataset, taking the slope of each road into account using some kind of appropriate formula. I’ve done a very exaggerated one here, so we can easily see changes in the routing:

edges_sf  = edges_sf %>% mutate(weight =  Length + Length * (slope*1000))

Now, create an sfnetworks object and run the pathing algorithm as above:

sf_net_elev = as_sfnetwork(edges_sf, directed = F)

sf_net_elev = sf_net_elev %>% 
  st_join(nodes_sf, join = st_nearest_feature)

## although coordinates are longitude/latitude, st_nearest_points assumes that they are planar

paths = st_network_paths(sf_net_elev, from =utrecht_node_id, to = hamburg_node_id)

 node_list = paths %>%
  slice(1) %>%
  pull(node_paths) %>%
  unlist() %>% as_tibble()

edge_list = paths %>%
  slice(1) %>%
  pull(edge_paths) %>%
  unlist() %>% as_tibble()

line_to_draw = edge_list %>% inner_join(sf_net %>% 
  activate(edges) %>% 
  as_tibble() %>% 
  mutate(edge_id = 1:nrow(.)) , by = c('value' = 'edge_id'))

line_to_draw = line_to_draw %>% st_as_sf()
line_to_draw = line_to_draw %>% st_set_crs(4326)

nodes_to_draw = node_list %>% inner_join(sf_net %>% 
  activate(nodes) %>% 
  as_tibble() %>% 
  mutate(node_id = 1:nrow(.)) , by = c('value' = 'node_id'))

nodes_to_draw = nodes_to_draw %>% st_as_sf()
nodes_to_draw = nodes_to_draw %>% st_set_crs(4326)

And plot it - it now takes a totally different route, avoiding the hills around Amerongen in the Netherlands, for example.

library(leaflet)

leaflet() %>% 
  addTiles() %>% 
  addCircles(data = nodes_to_draw, label = ~Name) %>% 
  addPolylines(data  = line_to_draw)

Plot this in 3d to see the difference in elevation:

line_to_draw$slope = slope_raster(line_to_draw, e = all_elevation)


line_to_draw_3d = slope_3d(line_to_draw, all_elevation)

plot_slope(line_to_draw_3d, title = "Elevation Data, Utrecht to Hamburg")

Perhaps more useful would be to incorporate the slope data in figuring out the total journey lengths, by devising a formula and using it in the calculations above, but this is still an interesting application!

Conclusions

Viabundus is an amazing resource, and sfnetworks makes it super easy to import and work with the data. The project team should be immensely proud of the result, and I’m very grateful to them for releasing it as open access data.

Converting the files into a ‘tidy’ data structure where necessary helped me to more easily sort and filter results, which might be worth keeping in mind.

I can imagine this having lots of really interesting applications to understanding postal routes, itineraries, and so forth. Another way to use the data would be to link to geo-temporal events on Wikidata (such as battles and sieges), and estimate the routes travelled by individuals in EMLO by their letter dates and locations. This could be a good way of figuring out intermediate stops travelled, and through that the likelihood of them being caught up in certain events.

I’m also hoping to figure out how to correctly exclude certain nodes (or node types) when running the pathing algorithm (rather than just removing those nodes from the itinerary afterwards), perhaps by setting the weight of certain edges to Inf. This would make it possible to estimate a route which avoided tolls and staples, for example.

Hopefully, some day, we’ll have similar data for most of Europe. I look forward to further updates from the project!

Yann Ryan

Research Fellow, Networking Archives Project

I’ve been at Queen Mary, University of London since 2014 and recently completed a PhD in the history of early modern news.