class: center, middle, inverse, title-slide

.title[
# Other Layers in the Grammar of Graphics
]
.author[
### Nicholas Sim
]
.date[
### 26 March 2024
]

---
class: center, middle, inverse

# Introduction

---
### Topics

* Scales, e.g. `scale_x_log10()`
* Labels and guides (for titles, axis titles, etc.)
* Statistics and trendlines
* Facets (for multiple plots)
* Themes (overall appearance)
* Interactive plots using the `plotly` package

---
### Required Libraries

```r
library(tidyverse)
library(plotly)
library(ggthemes)
```

---
### Introduction

Ref: Chapter 3 KH

We will explore the other layers in the Grammar of Graphics, i.e. coordinates, statistics, facets and themes.

---
### Mandatory Layers

<img src="Seminar3_fig1.png" width="30%" style="display: block; margin: auto;" />

---
### Mandatory Layers

<img src="Seminar3_fig2.png" width="30%" style="display: block; margin: auto;" />

---
### Coordinates and Scales

<img src="Seminar3_fig3.png" width="30%" style="display: block; margin: auto;" />

---
### Adjusting the Scales

We may improve the appearance of a plot by adjusting the scales of its variables.

If a variable takes very large values, we may consider transforming it using a log function.

For instance, we may log-transform the `x` variable using `scale_x_log10()`, and likewise the `y` variable using `scale_y_log10()`.

---
### Declaring the Units

We may also declare the units shown on the axes through the `labels` argument of `scale_x_log10()` or `scale_y_log10()`.

For example, we may declare `x` in dollars and show this on the x-axis tick labels using the `scales` package, e.g. `scale_x_log10(labels = scales::dollar)`.

If `x` is population, which is a very large number, and we wish to rescale it to per 100,000, we may pass `labels = scales::unit_format(scale = 1/100000)` into `scale_x_log10()`.

To suppress the unit suffix on the tick labels (which defaults to `"m"`), we may pass in `labels = scales::unit_format(scale = 1/100000, unit = "")` instead.
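
The two labelling options above can be sketched as follows. This is only an illustration: the data frame `toy` and its columns (`gdp_per_cap`, `population`, `gini`) are made-up values, not taken from the seminar data.

```r
library(ggplot2)

# Hypothetical data for illustration only (not from WDI_Data.csv)
toy <- data.frame(
  gdp_per_cap = c(800, 3500, 12000, 45000, 90000),
  population  = c(2.5e8, 6.0e7, 3.0e7, 8.0e6, 5.0e5),
  gini        = c(48, 44, 39, 33, 30)
)

# Log-transform the x-axis and show the tick labels in dollars
p_dollar <- ggplot(data = toy, mapping = aes(x = gdp_per_cap, y = gini)) +
  geom_point() +
  scale_x_log10(labels = scales::dollar)

# Log-transform the x-axis and rescale the tick labels to "per 100,000",
# with the unit suffix suppressed via unit = ""
p_per100k <- ggplot(data = toy, mapping = aes(x = population, y = gini)) +
  geom_point() +
  scale_x_log10(labels = scales::unit_format(scale = 1/100000, unit = ""))

p_dollar
p_per100k
```

Only the tick labels change; the points are still positioned on the log scale of the original variable.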
---
### Labels and Guides

<img src="Seminar3_fig4.png" width="30%" style="display: block; margin: auto;" />

---
### Declaring the Labels

To declare the title, axis titles, legend title, etc., we may use the `labs()` function. The basic syntax is

```r
labs(x = "X AXIS TITLE",
     y = "Y AXIS TITLE",
     title = "FIGURE TITLE",
     subtitle = "FIGURE SUBTITLE")
```

---
### Declaring the Labels

Plot titles, axis titles, legend titles, etc. can all be specified through the `labs()` function.

For instance, if we map a variable to a `color`, `fill`, `shape` or `size` aesthetic, a legend is shown with the name of the variable as its default title.

To change the legend's title, we pass the new title into the argument named after the aesthetic itself (i.e. `color`, `shape`, `size`, etc.).

```r
labs(x = "X AXIS TITLE",
     y = "Y AXIS TITLE",
     title = "FIGURE TITLE",
     subtitle = "FIGURE SUBTITLE",
     color = "COLOR TITLE")
```

To suppress a title, we may set the argument to `NULL` (e.g. `x = NULL`).

---
### Guides

Guides are essentially legends. One way to change the appearance of legends is through the `guides()` function.

A common use of `guides()` is to suppress the legend. To do so, we add (i.e. `+`) `guides(color = "none")` to our `ggplot2` command.

---
class: center, middle, inverse

# Statistics

---
### Statistics

We may add a trend line by adding `geom_smooth()` to our plot.

For datasets with fewer than 1,000 observations, the default method in `geom_smooth()` is LOESS, which overlays a nonlinear trend line on the plot.

To change the nonlinear trend line into a linear regression line, we set the method to `lm`, i.e. `geom_smooth(method = "lm")`.

---
class: center, middle, inverse

# Facet

---
### Facet

With `facet_wrap()`, we can create multiple plots for different groups specified by a categorical variable.

For instance, let's consider a variable named `country`, containing the names of countries in our dataset.
If we aim to generate a plot for each country and show these country-specific plots in a single visualisation, we can do so with `facet_wrap()`.

To do so, we write the faceting variable as a formula, placing `country` to the right of the tilde `~`, i.e. `facet_wrap(~country)`.

---
### Faceting with Multiple Variables

We can create a facet plot with multiple faceting variables by arranging the facet subplots in a matrix format using the `facet_grid()` function.

For instance, to arrange our plots by years in rows and countries in columns, we can specify `Year` and `Country` like this: `facet_grid(Year ~ Country)`.

By default, the same scales are applied to each plot in the facet. To allow each plot to have its own scales, we can "free up" the scales using the `scales = "free"` option within the `facet_wrap()` or `facet_grid()` function. For example, `facet_wrap(~Country, scales = "free")`.

---
class: center, middle, inverse

# Theme

---
### Theme

The `theme()` function allows us to control the overall appearance of our plot.

We can easily apply various pre-set themes such as `theme_bw()`, `theme_minimal()`, `theme_gray()`, etc., by adding them to the plot with `+`.

Additionally, the `ggthemes` package offers appealing themes that replicate styles from publications like the Wall Street Journal, the Economist, and even Excel.

However, for more customized visualisations, we need to adjust specific parameters using the `theme()` function.

For instance, a common adjustment involves relocating the legend from its default position on the right to another location, such as the top. This can be done by adding `theme(legend.position = "top")` to the plot.

Moreover, we can control the font size of the title, axis titles, etc., by adding `theme(title = element_text(size = 14), axis.title = element_text(size = 12))` to the plot.

---
### Saving the Plot

After plotting, we may save the figure using `ggsave()`.
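
A minimal sketch of a `ggsave()` call with every choice spelled out explicitly. It assumes the built-in `mtcars` data, and the file name, dimensions and resolution are illustrative choices rather than part of the seminar material.

```r
library(ggplot2)

# Build and store a plot (mtcars ships with R)
p <- ggplot(data = mtcars, mapping = aes(x = disp, y = mpg)) +
  geom_point()

# Hypothetical output path; a temporary directory keeps the sketch self-contained
out_file <- file.path(tempdir(), "mpg_vs_disp.png")

# Name the plot and its size explicitly rather than relying on the defaults
ggsave(filename = out_file, plot = p, width = 6, height = 4, dpi = 300)
```
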
By default, `ggsave()` saves the last plot that you displayed, using the size of the current graphics device. It also guesses the type of graphics device from the file extension.

---
class: center, middle, inverse

# Example: Examining the Association between GDP and Income Inequality

---
### Background

Let's put the concepts together by building a visualisation from scratch.

In this example, let's attempt to address whether wealthier countries have lower income inequality.

In this exercise, we explore the association between GDP per capita and the Gini coefficient using `WDI_Data.csv`. As before, save the data into a data frame named `df`.

* Using what you have learned about aesthetics and scales, plot the Gini against GDP per capita.
* Use colors to explore if income group matters for the Gini-GDP relationship.
* Adjust the scales if necessary.
* Overlay an OLS regression trendline.
* Explore this relationship for countries in each income group using facets.

In the dataset, income inequality is captured by `Gini` and GDP per capita by `GDP.PerCap`.

Do remove all the `NA`s by passing your data frame `df` into `na.omit()` and saving the output.

```r
df <- read_csv(file = "WDI_Data.csv")
df <- na.omit(df)
```

---
### Basic Plot

Here is a basic plot showing a negative relationship between the Gini coefficient and GDP per capita. It is not necessary to adjust the x-axis scale.

.pull-left[
```r
ggplot(data = df,
       mapping = aes(x = GDP.PerCap, y = Gini)) +
  geom_point()
```
]

.pull-right[

]

---
### Overlaying a Trendline and Color by Income Group

We overlay a regression line. The figure looks somewhat puzzling, as some countries classified as high income appear with relatively low GDP per capita. Could it be that very old data are included here?

.pull-left[
```r
ggplot(data = df,
       mapping = aes(x = GDP.PerCap, y = Gini)) +
  geom_point(aes(color = Income.Group)) +
  geom_smooth(method = "lm", se = FALSE)
```
]

.pull-right[

]

---
### Subsetting the Data

Let's include only observations after 2010.
Even with the focus on later data, the plot is still rather difficult to read.

.pull-left[
```r
ggplot(data = subset(df, Year > 2010),
       mapping = aes(x = GDP.PerCap, y = Gini)) +
  geom_point(aes(color = Income.Group)) +
  geom_smooth(method = "lm", se = FALSE)
```
]

.pull-right[

]

---
### Faceting by Income Group

Let's facet the plot by income group. Note that the legend becomes redundant and should be suppressed (use `guides(color = "none")`).

.pull-left[
```r
# Let's try to facet.
ggplot(data = subset(df, Year > 2010),
       mapping = aes(x = GDP.PerCap, y = Gini)) +
  geom_point(aes(color = Income.Group)) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~Income.Group, scales = "free")
```
]

.pull-right[

]

---
### Labels and Titles

Let's clean up the plot and axis titles.

.pull-left[
```r
ggplot(data = subset(df, Year > 2010),
       mapping = aes(x = GDP.PerCap, y = Gini)) +
  geom_point(aes(color = Income.Group)) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~Income.Group, scales = "free") +
  labs(x = "GDP Per Capita",
       y = "Gini Coefficient",
       title = "GDP Per Capita and Income Inequality") +
  guides(color = "none")
```
]

.pull-right[

]

---
### Using a Different Theme

Let's try out a different theme using the `ggthemes` package.

.pull-left[
```r
# Let's use a theme from the Wall Street Journal contained in the ggthemes package
ggplot(data = subset(df, Year > 2010),
       mapping = aes(x = GDP.PerCap, y = Gini)) +
  geom_point(aes(color = Income.Group)) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~Income.Group, scales = "free") +
  labs(x = "GDP Per Capita",
       y = "Gini Coefficient",
       title = "GDP Per Capita and Income Inequality") +
  guides(color = "none") +
  theme_wsj()
```
]

.pull-right[

]

---
class: center, middle, inverse

# Extensions

---
### Making Interactive Plots

R offers a variety of interesting packages for data visualization, including those that transform `ggplot2` figures into interactive visuals.
Achieving this is straightforward: simply pass the figure object (i.e., the saved plot) into the `ggplotly()` function provided by the `plotly` package.

.pull-left[
```r
# As an example
library(plotly)

p <- ggplot(data = mtcars,
            mapping = aes(x = disp, y = mpg)) +
  geom_point() +
  labs(title = "Miles Per Gallon vs Displacement")

plotly::ggplotly(p)
```
]

.pull-right[
]

---
### Example

Let's make our plot interactive using the `plotly` package. First, save the final figure to an object, and then pass it into the `ggplotly()` function.

.panelset[
.panel[.panel-name[R Code]

```r
# Save the final figure and pass it through plotly
fig <- ggplot(data = subset(df, Year > 2015),
              mapping = aes(x = GDP.PerCap, y = Gini, label = Country)) +
  geom_point(aes(color = Region)) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "GDP Per Capita",
       y = "Gini Coefficient",
       title = "GDP Per Capita and Income Inequality")

plotly::ggplotly(fig, width = 1000, height = 400)
```
]

.panel[.panel-name[Plot]
]
]

---
### Hands-On Activity

Activities 1 to 3 in `Seminar3_demo_part2.r`