Hands-on Exercise 5 - Part 4

Visual Multivariate Analysis with Parallel Coordinates Plot

Author

Teo Suan Ern

Published

January 28, 2024

Modified

February 24, 2024

Note: Last modified to include author’s details.

1. Getting Started

This exercise covers the following:

  • plotting static parallel coordinates plots by using ggparcoord() of GGally package,

  • plotting interactive parallel coordinates plots by using parcoords package, and

  • plotting interactive parallel coordinates plots by using parallelPlot package.

1.1 Install and launch R packages

For the purpose of this exercise, the following R packages will be used.

pacman::p_load(GGally, parallelPlot, tidyverse, RColorBrewer)

1.2 Import the data

This exercise uses the World Happiness 2018 report dataset.

wh <- read_csv("data/WHData-2018.csv")

1.3 Overview of the data

summary(wh)
   Country             Region          Happiness score  Whisker-high  
 Length:156         Length:156         Min.   :2.905   Min.   :3.074  
 Class :character   Class :character   1st Qu.:4.454   1st Qu.:4.590  
 Mode  :character   Mode  :character   Median :5.378   Median :5.478  
                                       Mean   :5.376   Mean   :5.479  
                                       3rd Qu.:6.168   3rd Qu.:6.260  
                                       Max.   :7.632   Max.   :7.695  
  Whisker-low       Dystopia     GDP per capita   Social support 
 Min.   :2.735   Min.   :0.292   Min.   :0.0000   Min.   :0.000  
 1st Qu.:4.345   1st Qu.:1.654   1st Qu.:0.6162   1st Qu.:1.077  
 Median :5.285   Median :1.909   Median :0.9495   Median :1.262  
 Mean   :5.273   Mean   :1.923   Mean   :0.8874   Mean   :1.217  
 3rd Qu.:6.051   3rd Qu.:2.270   3rd Qu.:1.1978   3rd Qu.:1.463  
 Max.   :7.569   Max.   :2.961   Max.   :1.6490   Max.   :1.644  
 Healthy life expectancy Freedom to make life choices   Generosity    
 Min.   :0.0000          Min.   :0.0000               Min.   :0.0000  
 1st Qu.:0.4223          1st Qu.:0.3583               1st Qu.:0.1095  
 Median :0.6440          Median :0.4940               Median :0.1740  
 Mean   :0.5980          Mean   :0.4570               Mean   :0.1816  
 3rd Qu.:0.7772          3rd Qu.:0.5800               3rd Qu.:0.2422  
 Max.   :1.0300          Max.   :0.7240               Max.   :0.5980  
 Perceptions of corruption
 Min.   :0.0000           
 1st Qu.:0.0510           
 Median :0.0820           
 Mean   :0.1125           
 3rd Qu.:0.1390           
 Max.   :0.4570           

2. Plotting Static Parallel Coordinates Plot

Use ggparcoord() to plot a basic static parallel coordinates plot.

ggparcoord(data = wh, 
           columns = c(7:12))

Only two arguments, data and columns, are used. The data argument maps the data object (i.e. wh), and the columns argument selects the columns used to prepare the parallel coordinates plot.
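As a small illustration of the columns argument: the GGally documentation describes columns as a vector of variable names or indices, so a positional selection can also be written with names. The sketch below uses the built-in iris data as a stand-in (an assumption, so that it runs without the CSV file); the object names p_by_index and p_by_name are hypothetical.

```r
library(GGally)

# iris is a stand-in dataset here (not part of the original exercise).
# Select the four measurement columns by position ...
p_by_index <- ggparcoord(data = iris, columns = 1:4)

# ... or by name, which keeps working if the column order changes.
p_by_name <- ggparcoord(
  data = iris,
  columns = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
)
```

For wh, the same idea would be to pass the six names from "GDP per capita" to "Perceptions of corruption", which the summary output suggests are columns 7 to 12.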

Set splineFactor = TRUE to smooth lines.

ggparcoord(data = wh, 
           columns = c(7:12),
           splineFactor = TRUE) +
           scale_color_brewer(palette = "Set2")

Use additional ggparcoord() arguments to give the basic plot a makeover.

ggparcoord(data = wh, 
           columns = c(7:12), 
           groupColumn = 2,
           scale = "uniminmax",
           alphaLines = 0.2,
           boxplot = TRUE, 
           title = "Parallel Coordinates Plot of World Happiness Variables")

Learning Points
  • The groupColumn argument groups the observations (i.e. parallel lines) by a single variable (i.e. Region) and colours the parallel coordinates lines by region name.

  • The scale argument scales the variables in the parallel coordinates plot; here the uniminmax method is used. It scales each variable univariately so that its minimum is zero and its maximum is one.

  • The alphaLines argument reduces the intensity of the line colour to 0.2. The permissible value range is between 0 and 1.

  • The boxplot argument turns on the boxplots by using logical TRUE. The default is FALSE.

  • The title argument gives the parallel coordinates plot a title.
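
To make the uniminmax scaling concrete, here is a minimal base R sketch of the same idea: each variable is rescaled on its own so that its minimum becomes 0 and its maximum becomes 1. The helper uniminmax() and the toy data frame are hypothetical names and made-up values for illustration; this is not GGally's internal code.

```r
# Univariate min-max scaling: (x - min) / (max - min), applied per column.
uniminmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy data standing in for two columns of wh (illustrative values only).
toy <- data.frame(
  gdp        = c(0.2, 0.9, 1.6),
  generosity = c(0.05, 0.30, 0.55)
)

# Scale every column independently of the others.
scaled <- as.data.frame(lapply(toy, uniminmax))
sapply(scaled, range)
```

Because every axis ends up on the same 0 to 1 range, variables measured on different scales can be compared side by side in the plot.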

Use facet_wrap() of ggplot2 to plot 10 small-multiple parallel coordinates plots. Each plot represents one geographical region, such as East Asia.

ggparcoord(data = wh, 
           columns = c(7:12), 
           groupColumn = 2,
           scale = "uniminmax",
           alphaLines = 0.2,
           boxplot = TRUE, 
           title = "Multiple Parallel Coordinates Plots of World Happiness Variables by Region") +
  facet_wrap(~ Region)

Rotating x-axis text label

Use the theme() function of ggplot2 to rotate the x-axis text labels by 30 degrees.

ggparcoord(data = wh, 
           columns = c(7:12), 
           groupColumn = 2,
           scale = "uniminmax",
           alphaLines = 0.2,
           boxplot = TRUE, 
           title = "Multiple Parallel Coordinates Plots of World Happiness Variables by Region") +
  facet_wrap(~ Region) + 
  theme(axis.text.x = element_text(angle = 30))

Learning Points
  • To rotate the x-axis text labels, pass axis.text.x as an argument to the theme() function. Specify element_text(angle = 30) to rotate the x-axis text by 30 degrees.

Adjusting the rotated x-axis text label

Rotating the x-axis text labels by 30 degrees makes the labels overlap with the plot. To avoid this, adjust the text position by adding the hjust argument to element_text() for axis.text.x in theme().

ggparcoord(data = wh, 
           columns = c(7:12), 
           groupColumn = 2,
           scale = "uniminmax",
           alphaLines = 0.2,
           boxplot = TRUE, 
           title = "Multiple Parallel Coordinates Plots of World Happiness Variables by Region") +
  facet_wrap(~ Region) + 
  theme(axis.text.x = element_text(angle = 30, hjust=1))

3. Plotting Interactive Parallel Coordinates Plot

parallelPlot is an R package specially designed to plot parallel coordinates plots by using the htmlwidgets package and d3.js.

Use parallelPlot() to plot an interactive parallel coordinates plot.

wh <- wh %>%
  select("Happiness score", c(7:12))
parallelPlot(wh,
             width = 320,
             height = 250)

[Interactive parallel coordinates plot. Axes: Happiness score, GDP per capita, Social support, Healthy life expectancy, Freedom to make life choices, Generosity, Perceptions of corruption]

Use the rotateTitle argument to avoid overlapping axis labels.

parallelPlot(wh,
             rotateTitle = TRUE)

[Interactive parallel coordinates plot with rotated axis titles. Axes: Happiness score, GDP per capita, Social support, Healthy life expectancy, Freedom to make life choices, Generosity, Perceptions of corruption]

Did you know?

One interactive feature of parallelPlot is that clicking on a variable of interest, for example Happiness score, replaces the default single blue colour with a blue colour scheme of varying intensity.

Use the continuousCS argument to change the default colour (blue) to another colour scheme.

parallelPlot(wh,
             continuousCS = "YlOrRd",
             rotateTitle = TRUE)

[Interactive parallel coordinates plot with the YlOrRd colour scale. Axes: Happiness score, GDP per capita, Social support, Healthy life expectancy, Freedom to make life choices, Generosity, Perceptions of corruption]

Use the histoVisibility argument to plot a histogram along the axis of each variable.

histoVisibility <- rep(TRUE, ncol(wh))
parallelPlot(wh,
             rotateTitle = TRUE,
             continuousCS = "BuPu",
             histoVisibility = histoVisibility)

[Interactive parallel coordinates plot with the BuPu colour scale and a histogram on each axis. Axes: Happiness score, GDP per capita, Social support, Healthy life expectancy, Freedom to make life choices, Generosity, Perceptions of corruption]

4. Parallel Coordinates (Ordering Methods)

This is a self-exploratory segment on parallel coordinates plots drawn with different axis-ordering methods. Given that groupColumn has to be categorical, the Happiness score variable is first binned into 5 groups.

binning <- wh %>%
  mutate(
    # bin Happiness score into 5 groups of (near-)equal size
    happinessGroup = ntile(`Happiness score`, 5),

    # relabel the bins, from the lowest (1) to the highest (5)
    happinessGroup = factor(happinessGroup,
                            labels = c("Lowest", "Low", "Average", "High", "Highest"))
  )
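
To see what the binning step produces, the sketch below applies dplyr's ntile() to ten made-up scores (illustrative values, not taken from the dataset). The names scores, bins and groups are hypothetical.

```r
library(dplyr)

# Ten made-up happiness scores, used only to illustrate the binning.
scores <- c(2.9, 3.5, 4.1, 4.4, 5.0, 5.3, 5.9, 6.2, 7.0, 7.6)

# ntile() ranks the values and splits them into 5 buckets of (near-)equal size.
bins <- ntile(scores, 5)

# Attach readable labels to bucket numbers 1 to 5.
groups <- factor(bins, labels = c("Lowest", "Low", "Average", "High", "Highest"))
table(groups)
```

Since ntile() splits by rank rather than by value, each group holds roughly the same number of countries, whatever the spread of the scores.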

Set order = "anyClass" in ggparcoord() to order the axes by the maximum of the k one-class-versus-rest F-statistics, where k is the number of classes.

ggparcoord(data = binning,
          columns = c(1:7),
          groupColumn = "happinessGroup",
          order = "anyClass") +
          scale_color_brewer(palette = "RdYlGn") +
          theme(axis.text.x = element_text(angle = 30))
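
The idea behind the anyClass ordering can be sketched in base R: for every variable, compute an F-statistic for each class against the rest, keep the largest one, and sort the variables by that maximum. The sketch uses the built-in iris data with Species as the grouping variable, as a stand-in for binning and happinessGroup (an assumption, so that it runs without the CSV file). The helper f_one_vs_rest() is hypothetical and this is not GGally's internal code.

```r
# F-statistic of a one-way ANOVA comparing one class against all others.
f_one_vs_rest <- function(x, is_class) {
  anova(lm(x ~ is_class))[1, "F value"]
}

vars    <- names(iris)[1:4]       # numeric variables (stand-in data)
classes <- levels(iris$Species)   # k = 3 classes

# For each variable, take the maximum of the k one-vs-rest F-statistics.
max_f <- sapply(vars, function(v) {
  max(sapply(classes, function(k) f_one_vs_rest(iris[[v]], iris$Species == k)))
})

# Variables that best separate some single class from the rest come first.
sort(max_f, decreasing = TRUE)
```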

Set order = "allClass" in ggparcoord() to order the axes by their overall F-statistic from an ANOVA with groupColumn as the explanatory variable.

ggparcoord(data = binning,
          columns = c(1:7), 
          groupColumn = "happinessGroup",
          order = "allClass") +
          scale_color_brewer(palette = "RdYlGn") +
          theme(axis.text.x = element_text(angle = 30))
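
The allClass ordering can be approximated in base R by running one ANOVA per variable, with the grouping variable as the explanatory variable, and sorting by the resulting F-statistic. As before, iris and Species are stand-ins for binning and happinessGroup (an assumption), and the names vars and overall_f are hypothetical.

```r
vars <- names(iris)[1:4]   # numeric variables (stand-in data)

# Overall F-statistic of a one-way ANOVA of each variable on the class factor.
overall_f <- sapply(vars, function(v) {
  anova(lm(iris[[v]] ~ iris$Species))[1, "F value"]
})

# Variables with the largest between-class variation come first.
sort(overall_f, decreasing = TRUE)
```

Unlike anyClass, this ranks variables by how well they separate all classes at once rather than any single class from the rest.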

Set order = "skewness" in ggparcoord() to order the axes by sample skewness.

ggparcoord(data = binning,
          columns = c(1:7), 
          groupColumn = "happinessGroup",
          order = "skewness") +
          scale_color_brewer(palette = "RdYlGn") +
          theme(axis.text.x = element_text(angle = 30))
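
For reference, sample skewness can be computed by hand as the third central moment divided by the second central moment raised to the power 3/2. The sketch below ranks the iris measurement columns (a stand-in dataset) from most to least skewed. Ranking by absolute value is an assumption made here, and sample_skewness() is a hypothetical helper, not GGally's internal function.

```r
# Moment-based sample skewness: m3 / m2^(3/2).
sample_skewness <- function(x) {
  x  <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m3 <- mean((x - mean(x))^3)
  m3 / m2^(3 / 2)
}

# Skewness of each numeric column in the stand-in data.
skew <- sapply(iris[1:4], sample_skewness)

# Assumption: rank by absolute skewness, most skewed first.
sort(abs(skew), decreasing = TRUE)
```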

