pacman::p_load(treemap, treemapify, tidyverse)
Hands-on Exercise 5 - Part 5
Treemap Visualisation with R
Note: Last modified to include author’s details.
1. Getting Started
Plot a static treemap using the treemap package.
Design an interactive treemap using the d3treeR package.
1.1 Install and launch R packages
This exercise uses the treemap, treemapify, tidyverse and d3treeR packages.
- First-time users have to install d3treeR from GitHub, which requires the devtools package. Install devtools with the code chunk below. Otherwise, proceed to step 3 to launch the d3treeR package.
install.packages("devtools")
- Load the devtools library and install d3treeR from GitHub with the code chunk below.
library(devtools)
install_github("timelyportfolio/d3treeR", force = TRUE)
- Launch the d3treeR package with the code chunk below.
library(d3treeR)
1.2 Import the data
This exercise uses the REALIS2018.csv data from the Urban Redevelopment Authority (URA) REALIS portal. The dataset provides information on private property transaction records in 2018.
realis2018 <- read_csv("data/realis2018.csv")
1.3 Overview of the data
summary(realis2018)
Project Name Address No. of Units Area (sqm)
Length:23205 Length:23205 Min. :1 Min. : 24.0
Class :character Class :character 1st Qu.:1 1st Qu.: 67.0
Mode :character Mode :character Median :1 Median : 98.0
Mean :1 Mean : 118.2
3rd Qu.:1 3rd Qu.: 127.0
Max. :1 Max. :4836.0
Type of Area Transacted Price ($) Nett Price($) Unit Price ($ psm)
Length:23205 Min. : 40000 Length:23205 Min. : 355
Class :character 1st Qu.: 950000 Class :character 1st Qu.:11231
Mode :character Median : 1280000 Mode :character Median :14621
Mean : 1734099 Mean :15246
3rd Qu.: 1858000 3rd Qu.:18075
Max. :100000000 Max. :54363
Unit Price ($ psf) Sale Date Property Type Tenure
Min. : 33 Length:23205 Length:23205 Length:23205
1st Qu.:1043 Class :character Class :character Class :character
Median :1358 Mode :character Mode :character Mode :character
Mean :1416
3rd Qu.:1679
Max. :5050
Completion Date Type of Sale Purchaser Address Indicator
Length:23205 Length:23205 Length:23205
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Postal District Postal Sector Postal Code Planning Region
Min. : 1.00 Min. : 1.00 Min. : 18965 Length:23205
1st Qu.:10.00 1st Qu.:26.00 1st Qu.:267952 Class :character
Median :15.00 Median :45.00 Median :456068 Mode :character
Mean :14.96 Mean :42.66 Mean :434269
3rd Qu.:19.00 3rd Qu.:54.00 3rd Qu.:548461
Max. :28.00 Max. :82.00 Max. :829750
Planning Area
Length:23205
Class :character
Mode :character
1.4 Data Wrangling & Manipulation
Two key verbs of the dplyr package, group_by() and summarise(), are used to perform these steps.
group_by() breaks down a data.frame into specified groups of rows. When you then apply dplyr verbs to the resulting object, they are automatically applied “by group”.
Grouping affects the verbs as follows:
- grouped select() is the same as ungrouped select(), except that grouping variables are always retained.
- grouped arrange() is the same as ungrouped arrange(), unless you set .by_group = TRUE, in which case it orders first by the grouping variables.
- mutate() and filter() are most useful in conjunction with window functions (like rank(), or min(x) == x). They are described in detail in vignette("window-functions").
- sample_n() and sample_frac() sample the specified number/fraction of rows in each group.
- summarise() computes the summary for each group.
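The grouping rules listed above can be tried out on a small example. The sketch below uses a hypothetical toy tibble (the names sales, region and price are made up for illustration and are not part of the REALIS data) and assumes dplyr is installed.

```r
library(dplyr)

# Toy data: hypothetical values, not taken from REALIS
sales <- tibble(
  region = c("East", "East", "West", "West", "West"),
  price  = c(5, 3, 9, 2, 7)
)

by_region <- group_by(sales, region)

# arrange() ignores the groups unless .by_group = TRUE is set
arranged <- arrange(by_region, price, .by_group = TRUE)

# filter() with a window-style condition keeps the cheapest row of each group
cheapest <- filter(by_region, price == min(price))

# summarise() returns one row per group
totals <- summarise(by_region, total = sum(price))
```

Printing arranged, cheapest and totals shows how the same verbs behave once the data are grouped.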
realis2018_grouped <- group_by(realis2018, `Project Name`,
                               `Planning Region`, `Planning Area`,
                               `Property Type`, `Type of Sale`)
realis2018_summarised <- summarise(realis2018_grouped,
  `Total Unit Sold` = sum(`No. of Units`, na.rm = TRUE),
  `Total Area` = sum(`Area (sqm)`, na.rm = TRUE),
  `Median Unit Price ($ psm)` = median(`Unit Price ($ psm)`, na.rm = TRUE),
  `Median Transacted Price` = median(`Transacted Price ($)`, na.rm = TRUE))
- group_by() is used together with summarise() to derive the summarised data.frame.
- Aggregation functions such as sum() and median() obey the usual rule of missing values: if there is any missing value in the input, the output will be a missing value. The argument na.rm = TRUE removes the missing values prior to computation.
- The code chunk above is not very efficient because each intermediate data.frame, e.g. realis2018_grouped and realis2018_summarised, has to be named even though the intermediate objects are not needed afterwards.
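The missing-value rule noted above can be checked in base R. This is a minimal sketch with a made-up vector (units is a hypothetical name, not a column of the dataset).

```r
# Hypothetical vector with one missing value
units <- c(1, 2, NA, 4)

# Without na.rm, a single NA makes the whole result NA
total_with_na <- sum(units)

# With na.rm = TRUE, the NA is dropped before the computation
total_clean  <- sum(units, na.rm = TRUE)      # 1 + 2 + 4
median_clean <- median(units, na.rm = TRUE)   # median of 1, 2, 4
```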
Group summaries with pipe
The code chunk below shows a more efficient way to tackle the same processes by using the pipe, %>%:
realis2018_summarised <- realis2018 %>%
  group_by(`Project Name`, `Planning Region`,
           `Planning Area`, `Property Type`,
           `Type of Sale`) %>%
  summarise(`Total Unit Sold` = sum(`No. of Units`, na.rm = TRUE),
            `Total Area` = sum(`Area (sqm)`, na.rm = TRUE),
            `Median Unit Price ($ psm)` = median(`Unit Price ($ psm)`, na.rm = TRUE),
            `Median Transacted Price` = median(`Transacted Price ($)`, na.rm = TRUE))
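To see that the piped version only changes how the steps are written, the sketch below runs both styles on a hypothetical toy tibble (toy, region, type and units are illustrative names) and compares the results. It assumes dplyr is installed; .groups = "drop" is added in both versions so that the result is an ungrouped tibble.

```r
library(dplyr)

# Toy data: hypothetical values for illustration only
toy <- tibble(
  region = c("A", "A", "B"),
  type   = c("x", "x", "y"),
  units  = c(1, 2, 3)
)

# Step-by-step version with a named intermediate object
toy_grouped  <- group_by(toy, region, type)
toy_stepwise <- summarise(toy_grouped, total = sum(units), .groups = "drop")

# Piped version: x %>% f(y) is evaluated as f(x, y)
toy_piped <- toy %>%
  group_by(region, type) %>%
  summarise(total = sum(units), .groups = "drop")

same_result <- identical(toy_stepwise, toy_piped)
```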
2. Designing Treemap with treemap package
2.1 Designing Static Treemap
The treemap package is an R package specially designed to offer great flexibility in drawing treemaps. Its core function, treemap(), offers at least 43 arguments.
2.1.1 Using basic arguments
treemap() of the treemap package is used to plot a treemap showing the distribution of median unit prices and total units sold of resale condominiums by geographic hierarchy in 2018.
Select the records that belong to the resale condominium property type.
realis2018_selected <- realis2018_summarised %>%
  filter(`Property Type` == "Condominium", `Type of Sale` == "Resale")
The code chunk below designs a treemap by using three core arguments of treemap(), namely: index, vSize and vColor.
treemap(realis2018_selected,
index=c("Planning Region", "Planning Area", "Project Name"),
vSize="Total Unit Sold",
vColor="Median Unit Price ($ psm)",
title="Resale Condominium by Planning Region and Area, 2018",
title.legend = "Median Unit Price (S$ per sq. m)"
)
2.1.2 Working with vColor and type arguments
Two arguments determine the mapping to color palettes: mapping and palette. The only difference between the “value” and “manual” types is the default value for mapping.
- The “value” treemap considers palette to be a diverging color palette (e.g. ColorBrewer’s “RdYlBu”), and maps it in such a way that 0 corresponds to the middle color (typically white or yellow), -max(abs(values)) to the left-end color, and max(abs(values)) to the right-end color.
- The “manual” treemap simply maps min(values) to the left-end color, max(values) to the right-end color, and mean(range(values)) to the middle color.
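The two default mappings described above can be written out as simple arithmetic. The sketch below is not the treemap package's internal code; value_position() and manual_position() are hypothetical helper names that return where each value falls on the palette, from 0 (left-end color) to 1 (right-end color), following the description above.

```r
# "value" type: 0 sits in the middle of the palette,
# -max(abs(v)) at the left end and max(abs(v)) at the right end
value_position <- function(v) {
  m <- max(abs(v))
  (v + m) / (2 * m)
}

# "manual" type: min(v) at the left end, max(v) at the right end
manual_position <- function(v) {
  (v - min(v)) / (max(v) - min(v))
}

# Hypothetical all-positive unit prices
prices <- c(5000, 10000, 20000)

pos_value  <- value_position(prices)   # 0.625 0.750 1.000
pos_manual <- manual_position(prices)  # 0, 1/3, 1
```

Because unit prices are all positive, the “value” positions never fall below 0.5, so only one half of a diverging palette is used, whereas the “manual” positions span the whole palette.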
In the code chunk below, the type argument is defined as "value".
treemap(realis2018_selected,
index=c("Planning Region", "Planning Area", "Project Name"),
vSize="Total Unit Sold",
vColor="Median Unit Price ($ psm)",
type = "value",
title="Resale Condominium by Planning Region and Area, 2018",
title.legend = "Median Unit Price (S$ per sq. m)"
)
In the code chunk below:
- the type argument is defined as "value"
- palette = "RdYlBu"
treemap(realis2018_selected,
index=c("Planning Region", "Planning Area", "Project Name"),
vSize="Total Unit Sold",
vColor="Median Unit Price ($ psm)",
type="value",
palette="RdYlBu",
title="Resale Condominium by Planning Region and Area, 2018",
title.legend = "Median Unit Price (S$ per sq. m)"
)
The “manual” type does not interpret the values as the “value” type does. Instead, the value range is mapped linearly to the colour palette.
treemap(realis2018_selected,
index=c("Planning Region", "Planning Area", "Project Name"),
vSize="Total Unit Sold",
vColor="Median Unit Price ($ psm)",
type="manual",
palette="RdYlBu",
title="Resale Condominium by Planning Region and Area, 2018",
title.legend = "Median Unit Price (S$ per sq. m)"
)
The code chunk below repeats the “manual” treemap with the single-hue “Blues” palette.
treemap(realis2018_selected,
index=c("Planning Region", "Planning Area", "Project Name"),
vSize="Total Unit Sold",
vColor="Median Unit Price ($ psm)",
type="manual",
palette="Blues",
title="Resale Condominium by Planning Region and Area, 2018",
title.legend = "Median Unit Price (S$ per sq. m)"
)
2.1.3 Working with treemap layout algorithm argument
Treemap Layout
treemap() supports two popular treemap layouts, namely: “squarified” and “pivotSize”. The default is “pivotSize”.
- The squarified treemap algorithm (Bruls et al., 2000) produces good aspect ratios, but ignores the sorting order of the rectangles (sortID).
- The ordered treemap, pivot-by-size, algorithm (Bederson et al., 2002) takes the sorting order (sortID) into account while aspect ratios are still acceptable.
Set algorithm = "squarified" for a squarified treemap.
treemap(realis2018_selected,
        index=c("Planning Region", "Planning Area", "Project Name"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        type="manual",
        palette="Blues",
        algorithm = "squarified",
        title="Resale Condominium by Planning Region and Area, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )
Set algorithm = "pivotSize" for an ordered treemap. Use the sortID argument to determine the order in which the rectangles are placed from top left to bottom right.
treemap(realis2018_selected,
        index=c("Planning Region", "Planning Area", "Project Name"),
        vSize="Total Unit Sold",
        vColor="Median Unit Price ($ psm)",
        type="manual",
        palette="Blues",
        algorithm = "pivotSize",
        sortID = "Median Transacted Price",
        title="Resale Condominium by Planning Region and Area, 2018",
        title.legend = "Median Unit Price (S$ per sq. m)"
        )
2.2 Designing Treemap using treemapify package
treemapify is an R package specially developed to draw treemaps in ggplot2.
2.2.1 Designing basic treemap
ggplot(data = realis2018_selected,
       aes(area = `Total Unit Sold`,
           fill = `Median Unit Price ($ psm)`)) +
  # layout and start are arguments of geom_treemap(); ggplot() does not use them
  geom_treemap(layout = "scol",
               start = "bottomleft") +
  scale_fill_gradient(low = "light blue", high = "blue")
2.2.2 Defining hierarchy
Use subgroup = `Planning Region` to group the tiles by planning region.
ggplot(data = realis2018_selected,
       aes(area = `Total Unit Sold`,
           fill = `Median Unit Price ($ psm)`,
           subgroup = `Planning Region`)) +
  # start is an argument of geom_treemap(), not ggplot()
  geom_treemap(start = "topleft")
Use subgroup = `Planning Region` and subgroup2 = `Planning Area` to add a second level to the hierarchy.
ggplot(data=realis2018_selected,
aes(area = `Total Unit Sold`,
fill = `Median Unit Price ($ psm)`,
subgroup = `Planning Region`,
subgroup2 = `Planning Area`)) +
geom_treemap()
Use geom_treemap_subgroup_border() and geom_treemap_subgroup2_border() to draw boundary lines around the subgroups.
ggplot(data=realis2018_selected,
aes(area = `Total Unit Sold`,
fill = `Median Unit Price ($ psm)`,
subgroup = `Planning Region`,
subgroup2 = `Planning Area`)) +
geom_treemap() +
geom_treemap_subgroup2_border(colour = "gray40",
size = 2) +
geom_treemap_subgroup_border(colour = "gray20")
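The treemaps in this section are drawn without text labels. As a hedged sketch of how labels could be layered on, the example below uses treemapify's geom_treemap_text() and geom_treemap_subgroup_text() on a hypothetical toy data frame (toy, region, project, units and price are made-up names and values). It assumes ggplot2 and treemapify are installed.

```r
library(ggplot2)
library(treemapify)

# Toy data: hypothetical projects, not taken from REALIS
toy <- data.frame(
  region  = c("Central", "Central", "East", "East"),
  project = c("P1", "P2", "P3", "P4"),
  units   = c(40, 25, 30, 10),
  price   = c(18000, 15000, 12000, 11000)
)

p <- ggplot(toy, aes(area = units, fill = price,
                     label = project, subgroup = region)) +
  geom_treemap() +
  geom_treemap_subgroup_border(colour = "gray20") +
  # Large, semi-transparent subgroup name behind the tiles' own labels
  geom_treemap_subgroup_text(place = "centre", grow = TRUE,
                             alpha = 0.5, colour = "black") +
  # One label per tile, taken from the label aesthetic
  geom_treemap_text(colour = "white", place = "topleft", reflow = TRUE)
```

Printing p draws the labelled treemap; the same layers could be added to the realis2018_selected plots by mapping label to `Project Name`.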
2.3 Designing Interactive Treemap using d3treeR
Use treemap() to build a treemap from selected variables in the realis2018_summarised data.frame.
tm <- treemap(realis2018_summarised,
              index=c("Planning Region", "Planning Area"),
              vSize="Total Unit Sold",
              vColor="Median Unit Price ($ psm)",
              type="value",
              title="Private Residential Property Sold, 2018",
              title.legend = "Median Unit Price (S$ per sq. m)"
              )
Use d3tree() to build an interactive treemap.
d3tree(tm, rootname = "Singapore")
3. Self-exploration: Singapore Consumer Price Index
Import CPI data
The dataset (monthly CPI for the period from January 2015 to December 2023) was retrieved from the Department of Statistics (DOS) Singapore.
pacman::p_load(tidyverse, treemap, treemapify, d3treeR, matrixStats)
Check on data structure of dataset
str(CPI)
colSums(is.na(CPI))
which(is.na(CPI))
Drop NA values
CPI2 <- na.omit(CPI)
str(CPI2)
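The checks above can be followed on a small example. This base R sketch uses a hypothetical data frame (cpi_toy and its columns are made up) to show what each call reports. Note that which(is.na(...)) returns positions counted down the columns, while adding arr.ind = TRUE returns row and column numbers instead.

```r
# Toy data: hypothetical index values with two missing cells
cpi_toy <- data.frame(
  series = c("Food", "Transport", "Education"),
  jan    = c(101.2, NA, 99.8),
  feb    = c(101.5, 103.1, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(cpi_toy))

# Row and column position of every missing cell
na_cells <- which(is.na(cpi_toy), arr.ind = TRUE)

# na.omit() drops every row that contains at least one NA
cleaned <- na.omit(cpi_toy)
```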
Rename category variable
CPI3 <- CPI2 %>%
  rename(
    category = `Data Series`
  )
Group sub-categories into main categories
CPI4 <- CPI3 %>%
  mutate(
    main_category = case_when(
      # general
      row_number() %in% 1 ~ "All Items",
      row_number() %in% 2:55 ~ "Food",
      row_number() %in% 56:65 ~ "Clothing & Footwear",
      row_number() %in% 66:72 ~ "Housing & Utilities",
      row_number() %in% 73:82 ~ "Household Durables & Services",
      row_number() %in% 83:94 ~ "Healthcare",
      row_number() %in% 95:107 ~ "Transport",
      row_number() %in% 108:111 ~ "Communication",
      row_number() %in% 112:129 ~ "Recreation & Culture",
      row_number() %in% 130:136 ~ "Education",
      row_number() %in% 137:150 ~ "Miscellaneous Goods & Services",
      row_number() %in% 151 ~ "All Items Less Imputed Rentals On Owner-Occupied Accommodation",
      row_number() %in% 152 ~ "All Items Less Accommodation"
    )
  )
# shift main category column to 1st column
CPI4 <- CPI4[, c(ncol(CPI4), 1:(ncol(CPI4)-1))]
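The column shift above works by index. As an alternative sketch, dplyr's relocate() moves a column to the front by name. The example below compares both approaches on a hypothetical toy tibble (toy and its values are made up) and assumes dplyr is installed.

```r
library(dplyr)

# Toy data: hypothetical rows with the new column in last position
toy <- tibble(
  category      = c("Bread", "Rice"),
  `2023 Jan`    = c(110, 108),
  main_category = c("Food", "Food")
)

# Index-based shift, as in the chunk above
moved_by_index <- toy[, c(ncol(toy), 1:(ncol(toy) - 1))]

# Name-based shift: relocate() puts the named column first by default
moved_by_name <- relocate(toy, main_category)

same_order <- identical(names(moved_by_index), names(moved_by_name))
```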
Prepare for CPI Treemap (Static)
# Month columns for 2023: "2023 Jan", ..., "2023 Dec"
months_2023 <- paste("2023", month.abb)
CPI4_summarised <- CPI4 %>%
  group_by(main_category, category) %>%
  summarise(
    # across() uses only the rows of the current group; indexing CPI3
    # directly would return every row of the table for each group
    `2023 Average` = rowMeans(across(all_of(months_2023))),
    `2023 Max` = rowMaxs(as.matrix(across(all_of(months_2023)))),
    .groups = "drop"
  )
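The row-wise average and maximum used above can be checked on a small example. The sketch below uses a hypothetical two-row data frame (months and its values are made up). rowMeans() is base R; apply(..., 1, max) stands in for matrixStats::rowMaxs() so that the example runs without extra packages.

```r
# Toy data: hypothetical index values for two series over two months
months <- data.frame(
  `2023 Jan` = c(100, 110),
  `2023 Feb` = c(104, 108),
  check.names = FALSE   # keep the space in the column names
)

# Average across the month columns, one value per row
row_avg <- rowMeans(months)

# Maximum across the month columns, one value per row
row_max <- apply(as.matrix(months), 1, max)
```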
tm23 <- treemap(CPI4_summarised,
                index=c("main_category", "category"),
                vSize="2023 Average",
                vColor="2023 Max",
                title="Singapore CPI (2015-2023)",
                title.legend = "CPI"
                )
Prepare for CPI Treemap (Interactive)
d3tree(tm23, rootname = "Singapore CPI (2015-2023)",
value = paste("Max CPI : ", CPI4_summarised$`2023 Max`,
"\nAverage CPI : ", CPI4_summarised$`2023 Average`)
)