There are many reasons you may want to draw a custom map with ggplot in R — overlaying data with ggplot is a lot more flexible than in Power BI1 or other readily made BI tools. There are many blog posts on how to do this for the US, but resources regarding Canada are scarce. Fear not! With the help from StatCan, some great pointers out there2, and the spirit of exploration, we are ready to share how to draw some Canada maps.

Update March 18th, 2021

I’ve pushed the code and data onto a repository at Github. Link here: github.com/fifty8/canada-maps-r. My thanks to readers who found out about the dead link I had.

I’ve also had the chance to test how long it takes my MacBook (late-2017, 13” quad-core i7) to process the raw geo files. The whole process took about 2 hours, which is a lot longer than what I remembered when running on a Windows machine. YMMV wildly. Have patience.

By the end of the post, you will be able to —

Canada Maps drawn in R

You will be using —

city lat long
Vancouver 49.2827 -123.1207
Calgary 51.0447 -114.0719
Edmonton 53.5461 -113.4938
Toronto 43.6532 -79.3832
Ottawa 45.4215 -75.6972
Montreal 45.5017 -73.5673

The R packages are listed here and omitted in code blocks below. The full script and data can be downloaded here found on Github.

library(sf) # the base package manipulating shapes
library(rgdal) # geo data abstraction library
library(geojsonio) # geo json input and output
library(spdplyr) # the `dplyr` counterpart for shapes
library(rmapshaper) # the package that allows geo shape transformation
library(magrittr) # data wrangling
library(dplyr)
library(tidyr)
library(ggplot2)

Getting the Data

StatCan website has a comprehensive list of shape file data available. For this exercise, we are using ArcGIS format, Census divisions, Cartographic Boundary File. Download the zip file — it should be around 60 MB and slightly larger unzipped.

After unzipping, you will see four files:

Note: You MUST name the .shp, .shx, and .dbf files with the same name (different extensions) and place them in the same folder. Existence of the .prj file is optional.

Transforming and Simplifying the Data

Shape files must be transformed to Geo JSON format to be consumed and utilized by R. And since the shape file is really high-definition (on a map 2000 pixels wide, a shape defined by 2 million points has virtually no difference than a shape defined by 1,000 points), we need to simplify the file so that plotting the data isn’t a painful 20-minute wait.

This code block only needs to be run once — once you have the simplified Geo JSON file stored in disk, you can read data directly from there.

For the R code below, we are assuming the data have been placed in the /data folder in the working directory.

canada_raw = readOGR(dsn = "data", layer = "gcd_000b11a_e", encoding = 'latin1') # 1
canada_raw_json <- geojson_json(canada_raw) # 2
canada_raw_sim <- ms_simplify(canada_raw_json) # 3
geojson_write(canada_raw_sim, file = "data/canada_cd_sim.geojson") # 4

Notes:

  1. We are reading the shape file (files) — note the lack of extension and the latin1 encoding. You shouldn’t be using UTF-8 here for this file downloaded from StatCan.

  2. Converts shape file into Geo JSON format — should take a few minutes. (Update March 18, 2021: It took my quad-core i7 MacBook Pro almost two hours to finish this step. It took less than 10 minutes on a Windows machine with my old employer — unless I remembered it wrong. Your mile may wildly vary at this step.)

  3. Simplifying the Geo JSON data, keeping 5% of the most prominent vertices of the geo polygons by default. This should also take a few minutes. (Update March 18, 2021: It took about 10 minutes on my MacBook.)

  4. Writing the simplified Geo JSON file back to disk. From here on out, we no longer use the large original shape files.

Plotting the Great White North

Assuming you’ve done the transformation part above, and have a Geo JSON data placed in the /data folder in the working directory, you are ready to draw the maps of Canada.

canada_cd <- st_read("data/canada_cd_sim.geojson", quiet = TRUE) # 1
crs_string = "+proj=lcc +lat_1=49 +lat_2=77 +lon_0=-91.52 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs" # 2

# Define the maps' theme -- remove axes, ticks, borders, legends, etc.
theme_map <- function(base_size=9, base_family="") { # 3
	require(grid)
	theme_bw(base_size=base_size, base_family=base_family) %+replace%
		theme(axis.line=element_blank(),
			  axis.text=element_blank(),
			  axis.ticks=element_blank(),
			  axis.title=element_blank(),
			  panel.background=element_blank(),
			  panel.border=element_blank(),
			  panel.grid=element_blank(),
			  panel.spacing=unit(0, "lines"),
			  plot.background=element_blank(),
			  legend.justification = c(0,0),
			  legend.position = c(0,0)
		)
}
# Define the filling colors for each province; max allowed is 9 but good enough for the 13 provinces + territories
map_colors <- RColorBrewer::brewer.pal(9, "Pastel1") %>% rep(37) # 4

# Plot the maps
ggplot() +
	geom_sf(aes(fill = PRUID), color = "gray60", size = 0.1, data = canada_cd) + # 5
	coord_sf(crs = crs_string) + # 6
	scale_fill_manual(values = map_colors) +
	guides(fill = FALSE) +
	theme_map() +
	theme(panel.grid.major = element_line(color = "white"),
		  legend.key = element_rect(color = "gray40", size = 0.1))

Notes:

  1. We are reading the Geo JSON file from disk.

  2. We are defining the descriptions for Coordinate Reference System (CRS). For Canada, you can take this string and run without asking why, but if you’ve gotta ask, here is a footnote.3

  3. We are creating a ggplot theme that basically removes axes, borders, legends, and use white backgrounds.

  4. We are gonna colour each province and territory with their own colour. But since there are 13 provinces and territories and Color Brewer only supports 9, we are not killing ourselves coming up with distinct colours — 9 colours for 13 areas are good enough.

  5. We pass in the shape object as data source, and fill the polygons based on PRUID, or province unique identifier.

  6. We apply CRS transformation. The transformation is applied after all the shape layers are drawn (in this case, there’s but one layer). Refer to Note 2 to learn more about the CRS transformation and the why behind it.

You should now be able to plot some pretty Canada Maps like this:

Canada Maps

Now let’s add more fun to it by also plotting a few major Canadian cities across the country!

Overlaying Custom Data

What fun is it with maps if we can’t plot our data points on them? Let’s do just that.

Getting the coordinates of the cities

We are using hardcoded city coordinates, read in as a data frame with read.delim:

conn = textConnection("city	lat	long
Vancouver	49.2827	-123.1207
Calgary	51.0447	-114.0719
Edmonton	53.5461	-113.4938
Toronto	43.6532	-79.3832
Ottawa	45.4215	-75.6972
Montreal	45.5017	-73.5673")
city_coords = read.delim(conn, stringsAsFactors = F)

By now you should have city_coords as a data frame of city names, latitudes and longitudes.

Conforming the data frame to shape objects

To actually plot the coordinates, the last step is transformation. Remember what we’ve done with the Canada maps? We applied CRS transformation to better represent the geometries. We need to do the same to the cities’ coordinates.

sf_cities = city_coords %>%
	select(long, lat) %>% # 1
	as.matrix() %>% # 2
	st_multipoint(dim = 'XY') %>% # 3
	st_sfc() %>% # 4
	st_set_crs(4269) # 5

A few things to unpack here:

  1. We use dplyr to filter for only longitude and latitude columns of the data frame. We are also using %>% to pipe the operations — to learn more, check out magrittr package.

  2. We convert it to a 6-by-2 matrix, as this is the only data type the next step’s st_multipoint accepts.

  3. We convert the matrix to a multipoint geometry, consisting of 6 points on the X-Y plane.

  4. We convert the geometry to a shape object.

  5. We decorate the geometry with CRS code “4269”, which is standard for North America, and is used in StatCan’s data files.

And now we are finally ready for the fun.

Plotting for real

ggplot() +
	geom_sf(aes(fill = PRUID), color = "gray60", size = 0.1, data = canada_cd) +
	geom_sf(data = sf_cities, color = '#001e73', alpha = 0.5, size = 3) + # 17
	coord_sf(crs = crs_string) +
	scale_fill_manual(values = map_colors) +
	guides(fill = FALSE) +
	theme_map() +
	theme(panel.grid.major = element_line(color = "white"),
		  legend.key = element_rect(color = "gray40", size = 0.1))

Canada Maps drawn in R

Voilà! Canada maps right there, all created in R.

What’s Next?

Consider creating your own package to draw the Canada maps that allow you to generate part of Canada easily. For my daytime work, we focus heavily on the major metropolitan areas such as Greater Vancouver, Greater Toronto, and Ottawa-Gatineau. A package that allows you to conjure shapes with one line of code will prove very handy.

  1. We use Microsoft Power BI across the company. 

  2. Canada Maps - R Bloggers was my starting point. The post neatly points which packages to look into, and where to get the data. However, it falls short of explaining how to draw your own data that are not part of a shape file. A lot of code in this post is taken with only minor modifications. 

  3. it’s because the official Canada maps are drawn with Lambert Conformal Conic projection (hence the “lcc” in the string), with the central meridian at \(91^\circ\,52'\,W\), and standard parallels at \(49^\circ\,N\) and \(77^\circ\,N\). The full discussion is on StatCan as well.

    If you do not apply transformation, the maps will look more like this:

    Canada Maps without CRS transformation

    The northern territories will have overstated areas, and look how distorted Nunavut appears!