There are many reasons you may want to draw a custom map with ggplot in R — overlaying data with ggplot is a lot more flexible than in Power BI1 or other readily made BI tools. There are many blog posts on how to do this for the US, but resources regarding Canada are scarce. Fear not! With the help from StatCan, some great pointers out there2, and the spirit of exploration, we are ready to share how to draw some Canada maps.
Update March 18th, 2021
I’ve pushed the code and data onto a repository at Github. Link here: github.com/fifty8/canada-maps-r. My thanks to readers who found out about the dead link I had.
I’ve also had the chance to test how long it takes my MacBook (late-2017, 13” quad-core i7) to process the raw geo files. The whole process took about 2 hours, which is a lot longer than what I remembered when running on a Windows machine. YMMV wildly. Have patience.
By the end of the post, you will be able to —
- Use R and ggplot to draw maps for Canada: all the 10 provinces and the 3 territories
- Zoom in to draw only select provinces or select areas within a province
- Apply coordinate transformation so that the maps of the Great White North appear closer to reality
- Overlay custom data points with coordinates — we are pinning Vancouver, Calgary, Edmonton, Toronto, Ottawa, and Montréal.
You will be using —
- Shape files from StatCan (.shp, shx, .dbf)
- R with multiple shape file-related packages listed below
- A hardcoded copy of latitudes and longitudes for some major Canadian cities
city | lat | long |
---|---|---|
Vancouver | 49.2827 | -123.1207 |
Calgary | 51.0447 | -114.0719 |
Edmonton | 53.5461 | -113.4938 |
Toronto | 43.6532 | -79.3832 |
Ottawa | 45.4215 | -75.6972 |
Montreal | 45.5017 | -73.5673 |
The R packages are listed here and omitted in code blocks below. The full script and data can be downloaded here found on Github.
library(sf) # the base package manipulating shapes
library(rgdal) # geo data abstraction library
library(geojsonio) # geo json input and output
library(spdplyr) # the `dplyr` counterpart for shapes
library(rmapshaper) # the package that allows geo shape transformation
library(magrittr) # data wrangling
library(dplyr)
library(tidyr)
library(ggplot2)
Getting the Data
StatCan website has a comprehensive list of shape file data available. For this exercise, we are using ArcGIS format, Census divisions, Cartographic Boundary File. Download the zip file — it should be around 60 MB and slightly larger unzipped.
After unzipping, you will see four files:
.shp
file is where the data live in.shx
is the index for the main data file.dbf
describes the shapes in the main data file.prj
is the project file for when the shape file was created
Note: You MUST name the .shp
, .shx
, and .dbf
files with the same name (different extensions) and place them in the same folder. Existence of the .prj
file is optional.
Transforming and Simplifying the Data
Shape files must be transformed to Geo JSON format to be consumed and utilized by R. And since the shape file is really high-definition (on a map 2000 pixels wide, a shape defined by 2 million points has virtually no difference than a shape defined by 1,000 points), we need to simplify the file so that plotting the data isn’t a painful 20-minute wait.
This code block only needs to be run once — once you have the simplified Geo JSON file stored in disk, you can read data directly from there.
For the R code below, we are assuming the data have been placed in the /data
folder in the working directory.
canada_raw = readOGR(dsn = "data", layer = "gcd_000b11a_e", encoding = 'latin1') # 1
canada_raw_json <- geojson_json(canada_raw) # 2
canada_raw_sim <- ms_simplify(canada_raw_json) # 3
geojson_write(canada_raw_sim, file = "data/canada_cd_sim.geojson") # 4
Notes:
-
We are reading the shape file (files) — note the lack of extension and the
latin1
encoding. You shouldn’t be using UTF-8 here for this file downloaded from StatCan. -
Converts shape file into Geo JSON format — should take a few minutes. (Update March 18, 2021: It took my quad-core i7 MacBook Pro almost two hours to finish this step. It took less than 10 minutes on a Windows machine with my old employer — unless I remembered it wrong. Your mile may wildly vary at this step.)
-
Simplifying the Geo JSON data, keeping 5% of the most prominent vertices of the geo polygons by default. This should also take a few minutes. (Update March 18, 2021: It took about 10 minutes on my MacBook.)
-
Writing the simplified Geo JSON file back to disk. From here on out, we no longer use the large original shape files.
Plotting the Great White North
Assuming you’ve done the transformation part above, and have a Geo JSON data placed in the /data
folder in the working directory, you are ready to draw the maps of Canada.
canada_cd <- st_read("data/canada_cd_sim.geojson", quiet = TRUE) # 1
crs_string = "+proj=lcc +lat_1=49 +lat_2=77 +lon_0=-91.52 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs" # 2
# Define the maps' theme -- remove axes, ticks, borders, legends, etc.
theme_map <- function(base_size=9, base_family="") { # 3
require(grid)
theme_bw(base_size=base_size, base_family=base_family) %+replace%
theme(axis.line=element_blank(),
axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid=element_blank(),
panel.spacing=unit(0, "lines"),
plot.background=element_blank(),
legend.justification = c(0,0),
legend.position = c(0,0)
)
}
# Define the filling colors for each province; max allowed is 9 but good enough for the 13 provinces + territories
map_colors <- RColorBrewer::brewer.pal(9, "Pastel1") %>% rep(37) # 4
# Plot the maps
ggplot() +
geom_sf(aes(fill = PRUID), color = "gray60", size = 0.1, data = canada_cd) + # 5
coord_sf(crs = crs_string) + # 6
scale_fill_manual(values = map_colors) +
guides(fill = FALSE) +
theme_map() +
theme(panel.grid.major = element_line(color = "white"),
legend.key = element_rect(color = "gray40", size = 0.1))
Notes:
-
We are reading the Geo JSON file from disk.
-
We are defining the descriptions for Coordinate Reference System (CRS). For Canada, you can take this string and run without asking why, but if you’ve gotta ask, here is a footnote.3
-
We are creating a ggplot theme that basically removes axes, borders, legends, and use white backgrounds.
-
We are gonna colour each province and territory with their own colour. But since there are 13 provinces and territories and Color Brewer only supports 9, we are not killing ourselves coming up with distinct colours — 9 colours for 13 areas are good enough.
-
We pass in the shape object as data source, and fill the polygons based on
PRUID
, or province unique identifier. -
We apply CRS transformation. The transformation is applied after all the shape layers are drawn (in this case, there’s but one layer). Refer to Note 2 to learn more about the CRS transformation and the why behind it.
You should now be able to plot some pretty Canada Maps like this:
Now let’s add more fun to it by also plotting a few major Canadian cities across the country!
Overlaying Custom Data
What fun is it with maps if we can’t plot our data points on them? Let’s do just that.
Getting the coordinates of the cities
We are using hardcoded city coordinates, read in as a data frame with read.delim
:
conn = textConnection("city lat long
Vancouver 49.2827 -123.1207
Calgary 51.0447 -114.0719
Edmonton 53.5461 -113.4938
Toronto 43.6532 -79.3832
Ottawa 45.4215 -75.6972
Montreal 45.5017 -73.5673")
city_coords = read.delim(conn, stringsAsFactors = F)
By now you should have city_coords
as a data frame of city names, latitudes and longitudes.
Conforming the data frame to shape objects
To actually plot the coordinates, the last step is transformation. Remember what we’ve done with the Canada maps? We applied CRS transformation to better represent the geometries. We need to do the same to the cities’ coordinates.
sf_cities = city_coords %>%
select(long, lat) %>% # 1
as.matrix() %>% # 2
st_multipoint(dim = 'XY') %>% # 3
st_sfc() %>% # 4
st_set_crs(4269) # 5
A few things to unpack here:
-
We use
dplyr
to filter for only longitude and latitude columns of the data frame. We are also using%>%
to pipe the operations — to learn more, check outmagrittr
package. -
We convert it to a 6-by-2 matrix, as this is the only data type the next step’s
st_multipoint
accepts. -
We convert the matrix to a multipoint geometry, consisting of 6 points on the X-Y plane.
-
We convert the geometry to a shape object.
-
We decorate the geometry with CRS code “4269”, which is standard for North America, and is used in StatCan’s data files.
And now we are finally ready for the fun.
Plotting for real
ggplot() +
geom_sf(aes(fill = PRUID), color = "gray60", size = 0.1, data = canada_cd) +
geom_sf(data = sf_cities, color = '#001e73', alpha = 0.5, size = 3) + # 17
coord_sf(crs = crs_string) +
scale_fill_manual(values = map_colors) +
guides(fill = FALSE) +
theme_map() +
theme(panel.grid.major = element_line(color = "white"),
legend.key = element_rect(color = "gray40", size = 0.1))
Voilà! Canada maps right there, all created in R.
What’s Next?
Consider creating your own package to draw the Canada maps that allow you to generate part of Canada easily. For my daytime work, we focus heavily on the major metropolitan areas such as Greater Vancouver, Greater Toronto, and Ottawa-Gatineau. A package that allows you to conjure shapes with one line of code will prove very handy.
-
We use Microsoft Power BI across the company. ↩
-
Canada Maps - R Bloggers was my starting point. The post neatly points which packages to look into, and where to get the data. However, it falls short of explaining how to draw your own data that are not part of a shape file. A lot of code in this post is taken with only minor modifications. ↩
-
it’s because the official Canada maps are drawn with Lambert Conformal Conic projection (hence the “lcc” in the string), with the central meridian at \(91^\circ\,52'\,W\), and standard parallels at \(49^\circ\,N\) and \(77^\circ\,N\). The full discussion is on StatCan as well.
If you do not apply transformation, the maps will look more like this:
The northern territories will have overstated areas, and look how distorted Nunavut appears! ↩