The package gganimate can be used in conjunction with ggplot to create some great visualisations of spatio-temporal data. Here we will visualise the 2014 Ebola outbreak in Sierra Leone using a polygon shapefile of administrative areas in Sierra Leone and data from the World Health Organisation on the weekly number cases in each area.
To start you will need to download all of the data, which can be found in the gganimate_ggmap_tutorial folder. Once you’re there, click on the green Clone or download button.You can either clone the repository on your GitHub account (if you have one) or download as a zip file.
You will also need to install ImageMagick. You can download it from here.
Next open RStudio and load all of the required packages. If you don’t
have these packages already they can be installed using the install.packages()
function.
We will then include a line of code that will help gganimate find imagemagick using ani.options()
from the animation package.
You will need to include the link to your “ImageMagick” folder in Program Files followed by \\magick.exe
devtools::install_github("dgrtwo/gganimate")
library(tidyr)
library(rgdal)
library(ggplot2)
library(gganimate)
library(animation)
ani.options(convert=shortPathName("Link_to_ImageMagick_folder\\magick.exe"))
Once you have loaded all of the packages you need to set you working directory.
Using the setwd()
function set your working directory to the folder where you saved
the tutorial data.
## Set working directory ##
setwd("Your_working_directory")
Now you can import the data for the polygons showing the administrative areas
of Sierra Leone. We will do this using the readOGR
function. We will use two
arguments in this case “dsn=”
& layer=
. The dsn=
argument is the path to
the folder that contains your Sierra Lone shape file (the folder named “Sierra_Leone”),
this should also include the file name of your shapefile with the “.shp” extension.
The layer=
argument should just include the filename of your shapefile without
the any extension. We also specify the projection for this data using proj4string
(see Section 1.2 in the Open Spatial Analysis 1 Tutorial ) in this case we will use
the “+proj=longlat” as our coordinates are in decimal latitude and longitude.
## Load polygon shapefile ##
SL <- readOGR(dsn = "Your_working_directory\\Sierra_leone\\sle_admbnda_adm2_1m_gov_ocha.shp",
layer="sle_admbnda_adm2_1m_gov_ocha")
## Set appropriate projection system for coordinates ##
proj4string(SL) <- CRS("+proj=longlat")
Now you can plot your shapefile in R
## Plot polygon shapefile ##
plot(SL)
Now that we have our shapefile imported we will transform it into a data frame using fortify
and
join it with a data frame of metadata associated with each polygon using the merge
function.
This function will merge two data frames based on a common attribute, in this case
we are merging them based on the column “id” and by.y=0 merges these data frames by row.name
##Fortify shapefile and create dataframe with metadata of each polygon##
SL.df <- merge(fortify(SL), as.data.frame(SL), by.x="id", by.y=0)
Now we will import the data for the weekly numbers of ebola cases. After importing the
data we will reformat it using the gather
function. The dataframe is in a wide format,
each week is a column that contains the case number for each area. The gather
function
transforms the data into a long format, one column “date” will contain the weeks and another
column which contains the number of cases “cases”. The -admin2Name
means that gather will
keep this column intact and duplicate the area names as needed. After this we create a new data frame
with both the spatial data and the weekly case data frame using the merge
function to merge
the polygon datframe with the cases data frame by area name.
## Load data for ebola cases in each adminastative district in Sierra Leone
ebola <- read.csv("ebola.csv")
str(ebola) ## This data is in wide format
ebola <- gather(ebola,key="date",value="cases",-admin2Name) ##convert to long format
str(ebola) ## Data is now in long format
ebola_data <- merge(SL.df,ebola,by="admin2Name")
We can now use the new ebola_data data frame to plot the number of cases for each
area in Sierra Leone. We start by plotting the cases for a single week using ggplot
(see ggplot tutorial for an introduction on how to use ggplot2) it’s important to include
the group=group
argument to make sure that polygons are plotted in the correct order. By
including fill=cases
geom_polygon will change the fill colour for each polygon depending on the
number of cases in each polygon. The scale_fill_gradient
function allows us to change the low and
high fill colours. http://colorbrewer2.org is a good place to source colour schemes for maps.
## Subset the ebola data to plot just one week
ebola_data_sub <- ebola_data[ebola_data$date.y=="X.2014.W51.",]
## create ggplot object ##
ebola_plot_sub <- ggplot(data = ebola_data_sub, aes(x = long, y = lat, group=group)) +
geom_polygon(aes(fill = cases),color="black") +
scale_fill_gradient(low = "#fcfbfd", high = "#2d004b") +
coord_map() + theme_minimal()
## Plot ggplot object
ebola_plot_sub
We can now use the gganimate package to plot the data for all weeks. To do this
we use similar code for the ggplot
function but simply add in the frame=date.y
argument.
In the ebola_data data frame date.y is the variable that contains information about the week case numbers
were collected for. The gganimate
function will then create a plot for each week and combine these to create a gif.
We use the first argument to tell gganimate the ggplot object we want to gganimate
and the filename=
argument to designate a filename for a file to be written out into the working directory.
It might take a little while to do this.
## Plot data for the whole outbreak using gganimate() ##
ebola_plot_animate <- ggplot(data = ebola_data, aes(x = long, y = lat, group=group, frame=date.y)) +
geom_polygon(aes(fill = cases),color="black") +
scale_fill_gradient(low = "#fcfbfd", high = "#2d004b") +
coord_map() +
theme_minimal()
## animate plot with gganimate
gganimate(ebola_plot_animate, filename="ebola_gif.gif")
The next part of the tutorial will focus on using the ggmap and ggplot packages to visualise crime data in Greater Manchester. First of all we will load all of the required packages. If you don’t already have these packages then you will need to install them. We will also set our working directory.
library(ggplot2)
library(ggmap)
setwd(Your_working_directory)
We will start my importing the crime.csv file into R using the read.csv
function.
We will then subset this data based on the crime that we want to plot a heatmap for.
Using the unique
function allows us to view all of the crime types in our dataframe.
In this case we will focus on bicycle theft.
c_data <- read.csv("Crime_data.csv", header=T) ## Read csv file with crime data
unique(c_data$Crime.type) ## list different types of crimes in dataset
c_data_b <- c_data[c_data$Crime.type=="Bicycle theft",]
Then we will get the base map of the area we want to focus on by using the
get_googlemap
function. This function allows you to download a map from the
google maps API. The lat=
and long=
arguments will be the centre of the map
that you will download and the extent of this map can be controlled using the
zoom=
argument. By using the mean of our latitude and longitude we are able to
identify the centre of the area we want to download. You can then plot the map
to check it and adjust zoom accordingly. The type of map that you want to
download is selected using the maptype=
argument. In this case we will
download a “terrain” map although there other options (“satellite” and
“roadmap”). You can further customise the google maps you download by using the
style=
argument, this argument passes HTML code to the google API to
customise the map you download. For example, if we include
style='feature:all|element:labels|visibility:off
it will remove all of the text
from the map. https://developers.google.com/maps/documentation/static-maps/ has some good examples of ways to customise maps from google. We can use the function ggmap to
plot the map.
map <- get_googlemap(c(lon=mean(c_data_b$Longitude), lat=mean(c_data_b$Latitude)), zoom=10,maptype="terrain")
ggmap(map)
map_no_text <- get_googlemap(c(lon=mean(c_data_b$Longitude), lat=mean(c_data_b$Latitude)), zoom=10,maptype="terrain",style="feature:all|element:labels|visibility:off")
ggmap(map_no_text)
For this visualisation we are going to plot the density of points using the
stat_density2d
function. This function utilises the kde2d
function to calculate
contours and create a heatmap. The kde2d
function determines the
density of points within a given bandwidth or search radius. Kde2d will
calculate a bandwidth automatically but this bandwidth can also be defined
manually if you are interested in including a specific bandwidth, perhaps to use
a finer resolution to calculate the desnity of points. Before defining
this bandwidth we need calculate interval of latitude and longitude that covers
200m in our area. The function below allows us to calculate which interval of
latitude x_d
and longitude y_d
is the equivalent of 200 meters. We can use
this later to define our bandwidth.
# Calculates the geodesic distance between two points specified by radian latitude/longitude using the
# Haversine formula (hf)
gcd.hf <- function(long1, lat1, long2, lat2) {
R <- 6371 # Earth mean radius [km]
delta.long <- (long2 - long1)
delta.lat <- (lat2 - lat1)
a <- sin(delta.lat/2)^2 + cos(lat1) * cos(lat2) * sin(delta.long/2)^2
c <- 2 * asin(min(1,sqrt(a)))
d = R * c
return(d) # Distance in km
}
metric <- 200 ## bandwidth in meters
#X
x_d <- 0.1/(gcd.hf(-2.6,53.4,-2.5,53.4)/metric) ## calculate distance in lon
x_d
#y
y_d <- 0.1/(gcd.hf(-2.6,53.4,-2.6,53.5)/metric) ## calculate distance in lat
y_d
Now we can plot our crime data on the base map using ggplot and ggmap.
The ggmap()
function will plot your base map and we can then use other ggplot
functions to plot in layers on top of this base map. First we will use geom_point
to plot the locations where bikes were stolen in Manchester. At this point we
can also refine the extent of our base map further by using scale_x_continuous
and scale_y_continuous
. By using the min
and max
of latitude and longitude we can
change the extent and zoom in on the range of our coordinates. We can also use
the expand=
argument to create buffer around the edge of the spatial range of
our points so they aren’t too close to the edge of the plot.
raw_points <- ggmap(map) + geom_point(data=c_data_b,aes(x=Longitude,y=Latitude),size=1)+
scale_x_continuous( limits = c(min(c_data_b$Longitude),max(c_data_b$Longitude)) , expand = c(0.01 , 0.01) )+
scale_y_continuous( limits = c(min(c_data_b$Latitude),max(c_data_b$Latitude)) , expand = c(0.01, 0.01))
raw_points
Finally we will add stat_density2d to create a heatmap of the density of these
points. The stat_density2d
function will accept the same aesthetic arguments as most ggplot
functions and you can also include arguments that are specific to the kde2d
function.
In this case we include an argument to pass to the kde2d
function, which is h=
.
This is the argument to define a bandwidth, if this argument is left out stat_density2d
will calculate the bandwidth automatically. In this case we will tell the kde2d
function that we want it to use a bandwidth of x_d
and y_d
, the equivalent of 200m.
The arguments with ..level..
will vary the fill and transparency based on the
density of points derived using the kde2d
function. The higher the density estimate
the less transparent the contours will be. As before we use the scale_fill_gradient
function to define our low and high colours. The scale_alpha(guide=FALSE)
will
remove the legend showing changes in alpha. For a more in depth desription of density maps see Section 5 in the
Quantifying Population and Visualising Species Occurrence Change Tutorial
Bike_dens_200m<- ggmap(map) + stat_density2d(data = c_data_b,aes(x = Longitude, y = Latitude,fill = ..level.., alpha = ..level..), geom = "polygon", h=c(x_d,y_d))+
scale_fill_gradient(low = "blue", high = "red") + scale_alpha(guide = FALSE)+
scale_x_continuous( limits = c(min(c_data_b$Longitude),max(c_data_b$Longitude)) , expand = c(0.01 , 0.01) )+
scale_y_continuous( limits = c(min(c_data_b$Latitude),max(c_data_b$Latitude)) , expand = c(0.01, 0.01))
Bike_dens_200m