class: center, middle, inverse, title-slide

.title[
# Storyboarding
]
.author[
### Nicholas Sim
]
.date[
### 10 April 2024
]

---
class: center, middle, inverse

# Introduction

---
### Topics

* Creating a storyboard for data storytelling
* Useful packages for data storytelling: `ggtext`, `ggcharts`, `tidytext`
* Flexdashboard (for quick dashboarding) - homework

---
### Required Libraries

```r
library(tidyverse)
library(socviz)
library(ggthemes)
library(ggrepel)
library(ggtext)   # from github
library(ggcharts) # from github
theme_set(theme_minimal())
```

---
### Introduction

Ref: Chapter 8 KH

Colors, text, annotation, etc. can be used for data storytelling. We will demonstrate how data visualisation can be used to address the following questions:

1. Did party flips in counties with a sizable African-American population help Donald Trump in his 2016 Presidential victory?
2. Did Marissa Mayer underperform as CEO of Yahoo?

---
### Storyboarding

1. What is the big idea?
2. What is the intended message?
3. How do you plan to demonstrate it?
4. How would you draw attention to your findings?

Watch: David McCandless' TED Talk https://www.youtube.com/watch?v=5Zg-C8AAIGg

---
class: center, middle, inverse

# Which County Flipped?

---
### Which County Flipped?

There are two major political parties in the US - the Republican Party and the Democratic Party.

We are interested in the variable `flipped`, a "Yes"/"No" variable that indicates whether a county flipped from one party to the other in the 2016 election.

For this exercise, we use `socviz::county_data`.
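
Before plotting, it may help to glance at the variables this section relies on. The following is a minimal sketch, assuming `socviz` and `dplyr` are installed; the names `county_vars` and `flip_counts` are illustrative and not part of the original slides.

```r
library(socviz)
library(dplyr)

# Keep only the variables used in the plots that follow
county_vars <- select(county_data, state, pop, black, flipped, partywinner16)
glimpse(county_vars)

# Tabulate the `flipped` indicator; rows with a missing value are counted separately
flip_counts <- count(county_vars, flipped)
flip_counts
```

The counts give a sense of how few counties flipped relative to those that did not, which is why the later plots draw the non-flipped counties in a muted gray.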
---
### Initial Plot

Let's construct a basic scatter plot of `black/100` against `pop` for counties that did not flip during the 2016 elections.

.panelset[
.panel[.panel-name[R Code]

```r
p0 <- ggplot(data = subset(county_data, flipped == "No"),
             mapping = aes(x = pop, y = black/100))

p1 <- p0 + geom_point(alpha = 0.15, color = "gray50") +
  scale_x_log10(labels = scales::comma)
# without the scales::comma, the tick labels will be different.
# See, also, https://scales.r-lib.org/reference/number_format.html
# p1 <- p0 + geom_point(alpha = 0.15, color = "gray50") +
#   scale_x_log10(labels = scales::comma_format())

p1
```
]

.panel[.panel-name[Plot]

<img src="Storyboarding_files/figure-html/flip.1-out-1.png" style="display: block; margin: auto;" />
]
]

---
### Overlaying Flipped Counties

Let's overlay the scatter plot with the counties that flipped in 2016. We will use `partywinner16` as a color aesthetic to differentiate how these counties voted in 2016.

.panelset[
.panel[.panel-name[R Code]

```r
party_colors <- c("#2E74C0", "#CB454A") # These are the party colors.

p2 <- p1 + geom_point(data = subset(county_data, flipped == "Yes"),
                      mapping = aes(x = pop, y = black/100,
                                    color = partywinner16)) + # The party that won in the county in 2016
  scale_color_manual(values = party_colors)

p2
```
]

.panel[.panel-name[Plot]

<img src="Storyboarding_files/figure-html/flip.2-out-1.png" style="display: block; margin: auto;" />
]
]

---
### Title and Labels

Let's tidy up the plot by adding a title and editing the labels (legend title, caption, axis titles, etc.):

.panelset[
.panel[.panel-name[R Code]

```r
p3 <- p2 + scale_y_continuous(labels = scales::percent) + # change the scales to percent.
  labs(color = "County flipped to... ",
       x = "County Population (log scale)",
       y = "Black Population (percent)",
       title = "Flipped counties, 2016",
       caption = "Counties in gray did not flip.")

# We may use percent_format() from the scales package
# p3 <- p2 + scale_y_continuous(labels = scales::percent_format()) +
#   labs(color = "County flipped to... ",
#        x = "County Population (log scale)",
#        y = "Black Population (percent)",
#        title = "Flipped counties, 2016",
#        caption = "Counties in gray did not flip.")

p3
```
]

.panel[.panel-name[Plot]

<img src="Storyboarding_files/figure-html/flip.3-out-1.png" style="display: block; margin: auto;" />
]
]

---
### Label the Scatter Points

Let's label the counties that flipped and have a black population above 25%.

.panelset[
.panel[.panel-name[R Code]

```r
p4 <- p3 + geom_text_repel(data = subset(county_data, flipped == "Yes" & black > 25),
                           mapping = aes(x = pop, y = black/100, label = state),
                           size = 2) +
  theme_minimal() +
  theme(plot.title = element_text(size = rel(0.6)),
        legend.title = element_text(size = rel(0.35)),
        plot.caption = element_text(size = rel(0.35)),
        legend.position = "top")
# See https://ggplot2.tidyverse.org/reference/theme.html
# size = rel() controls the size relative to the baseline

p4
ggsave("trump2016.png", dpi = 600)
```
]

.panel[.panel-name[Plot]

<img src="Storyboarding_files/figure-html/flip.4-out-1.png" style="display: block; margin: auto;" />
]
]

---
### Economist Theme

Let's change the plot appearance by using one of the themes from the `ggthemes` package. Here is a theme based on the Economist magazine.

```r
p4 + theme_economist()
```

<img src="Storyboarding_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---
### Wall Street Journal Theme

Here is a theme based on the Wall Street Journal.

```r
p4 + theme_wsj()
```

<img src="Storyboarding_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---
### Reflection

What have you taken away from this chart?
Is this chart effective in conveying a data story about Trump's victory?

<img src="Storyboarding_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---
### Answering the Storyboard

1. What is the big idea?
    * **Trump won the 2016 elections.**
2. What is the intended message?
    * **Voter flips in counties with a sizable black population helped him win.**
3. How do you plan to demonstrate it?
    * **Scatter plot**
4. How would you draw attention to your findings?
    * **Colors, labels and annotation**

---
class: center, middle, inverse

# Did Marissa Mayer Underperform?

---
### An Actual Investor's Presentation

"In late 2015 Marissa Mayer’s performance as CEO of Yahoo was being criticized by many observers. One of them, Eric Jackson, an investment fund manager, sent a ninety-nine-slide presentation to Yahoo’s board outlining his best case against Mayer." (p. 220, KH).

<img src="yahoo.png" width="70%" style="display: block; margin: auto;" />

---
### Did Marissa Mayer Underperform?

Let's look at the dataset `socviz::yahoo`. Marissa Mayer was appointed Yahoo's CEO in 2012.
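
A quick way to look at the dataset is sketched below. It uses only the columns that appear in the plotting code of this section (`Year`, `Revenue`, `Employees`, `Mayer`); the assumption that `Mayer` is coded as "Yes"/"No" and the names `mayer_years` and `rev_per_employee` are illustrative and should be checked against the data.

```r
library(socviz)

# First few rows of the data
head(yahoo)

# Years flagged as Mayer's tenure (assumes the indicator is coded "Yes"/"No")
mayer_years <- yahoo$Year[yahoo$Mayer == "Yes"]
mayer_years

# Revenue per employee, the ratio plotted in the exercise later on
rev_per_employee <- yahoo$Revenue / yahoo$Employees
summary(rev_per_employee)
```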
---
### Did Marissa Mayer Underperform?

Let's use `geom_path()` to show the trajectory of Yahoo's revenue and headcount over the years. `geom_path()` connects the observations in the order in which they appear in the data, so adjacent rows are joined by a line. We map `Mayer` to the color aesthetic. What do we observe?

.panelset[
.panel[.panel-name[R Code]

```r
#theme_set(theme_fivethirtyeight())
p <- ggplot(data = yahoo,
            mapping = aes(x = Employees, y = Revenue))

p + geom_path(color = "gray80") +
  geom_text(aes(color = Mayer, label = Year), size = 4, fontface = "bold") +
  theme(legend.position = "bottom") +
  labs(color = "Mayer is CEO",
       x = "Employees",
       y = "Revenue (Millions)",
       title = "Yahoo Employees vs Revenues, 2004-2014") +
  scale_y_continuous(labels = scales::dollar) +
  scale_x_continuous(labels = scales::comma) +
  theme(title = element_text(size = 20),
        axis.text = element_text(size = 16),
        legend.text = element_text(size = 16))

ggsave("mayer.png", dpi = 600)
```
]

.panel[.panel-name[Plot]

<img src="Storyboarding_files/figure-html/mayer.1-out-1.png" style="display: block; margin: auto;" />
]
]

---
### Answering the Storyboard

1. What is the big idea?
    * **Marissa Mayer took over as Yahoo's CEO.**
2. What is the intended message?
    * **She underperformed.**
3. How do you plan to demonstrate it?
    * **Path plot**
4. How would you draw attention to your findings?
    * **Colors, labels and annotation**

---
### Exercise: Did Marissa Mayer Underperform?

Another approach is to plot a line, as the data are a time series. The figure below is far from perfect. How would you improve it?
.panelset[
.panel[.panel-name[R Code]

```r
theme_set(theme_minimal())
p <- ggplot(data = yahoo,
            mapping = aes(x = Year, y = Revenue/Employees))

p + geom_vline(xintercept = 2012) +
  geom_line(color = "gray60", linewidth = 1.2) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  annotate("text", x = 2013, y = 0.45, label = " Mayer becomes CEO", size = 6) +
  labs(x = "Year",
       y = "Revenue/Employees",
       title = "Yahoo Revenue to Employee Ratio, 2004-2014")
```
]

.panel[.panel-name[Plot]

<img src="Storyboarding_files/figure-html/mayer.2-out-1.png" style="display: block; margin: auto;" />
]

.panel[.panel-name[Suggestion]

<img src="Storyboarding_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
]
]

---
class: center, middle, inverse

# Useful Packages for Data Storytelling

---
### Improving Data Storytelling

* Using colors in titles or text can help to simplify your plots, for example by replacing a legend.
* We can add colors to text easily using the `ggtext` package (see https://www.r-bloggers.com/enhance-your-ggplot2-data-visualizations-with-ggtext/).
* All we need to do is wrap `<span style='color:YourColor'> </span>` around the text we want to highlight, then call `theme(plot.title = ggtext::element_markdown())`.
* https://www.youtube.com/watch?time_continue=158&v=TUKV7Xk1218&feature=emb_logo

---
### Using `ggtext`

.panelset[
.panel[.panel-name[R Code]

```r
library(ggtext)
library(ggcharts)
data(biomedicalrevenue, package = "ggcharts")

# Keep only the data for Roche and Novartis
df.filter <- biomedicalrevenue %>%
  filter(company %in% c("Roche", "Novartis"))

p <- ggplot(data = df.filter,
            mapping = aes(year, revenue, color = company))

# Wrap the words to be colored in the title with <span style='color:YourColor'> </span>
p + geom_line(linewidth = 1.2) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  labs(title = "<span style='color:#007EFE'>**Roche**</span> *overtook* <span style='color:darkorange'>**Novartis**</span> in 2016") +
  scale_color_manual(values = c("Roche" = "#007EFE", "Novartis" = "darkorange"),
                     guide = "none") +
  ggcharts::theme_hermit(ticks = "x", grid = "X") +
  theme(plot.title = ggtext::element_markdown()) # Needed so that the title is rendered as markdown/HTML
# Check the documentation on ggcharts in https://cran.r-project.org/web/packages/ggcharts/vignettes/themes.html
```
]

.panel[.panel-name[Plot]

<img src="Storyboarding_files/figure-html/nicepackages.1-out-1.png" style="display: block; margin: auto;" />
]
]

---
### Using `ggcharts` and `tidytext`

* `ggcharts` provides simplified commands for common plots. As we have seen, `ggplot2` commands are usually lengthy (like the bar chart below). `ggcharts` can achieve the same result with fewer lines of code. See https://github.com/thomas-neitmann/ggcharts.

* Re-ordering the columns in a column chart before faceting can be painful. `reorder()` orders a variable across the whole dataset, so it often does not give the desired order within each facet. To re-order the data within each group that you are faceting by, it is better to use `reorder_within()` from the `tidytext` package. See the discussion at https://github.com/thomas-neitmann/ggcharts and https://juliasilge.com/blog/reorder-within/.

---
### Using `ggcharts` and `tidytext`

.panelset[
.panel[.panel-name[R Code]

```r
library(ggcharts)
library(tidytext)
ggcharts_set_theme("theme_hermit")
data("biomedicalrevenue")

# Filter data for the years 2012, 2015, and 2018
d1 <- biomedicalrevenue %>%
  filter(year %in% c(2012, 2015, 2018))

# Group the data by year. For each year, choose the top ten observations by revenue.
# Then ungroup.
d2 <- group_by(d1, year) %>%
  top_n(10, revenue) %>%
  ungroup()

# Reorder the company variable (containing the company's name), based on revenue,
# for each year. Replace the original company variable with the reordered one.
d3 <- mutate(d2, company = tidytext::reorder_within(company, by = revenue, within = year))
# Note: the reordered company variable will look like Roche___2012, Roche___2015,
# Roche___2018, and so on: the year is appended to the company's name using the
# default separator "___". We need to call tidytext::scale_x_reordered() to
# remove these year tags from the axis labels.

ggplot(d3, aes(company, revenue)) +
  geom_col() +
  coord_flip() +
  tidytext::scale_x_reordered() +
  facet_wrap(~year, scales = "free_y")
```
]

.panel[.panel-name[Plot]

<img src="Storyboarding_files/figure-html/nicepackages.2-out-1.png" style="display: block; margin: auto;" />
]
]

---
### Another Look at Roche versus Novartis

Here is another look at Roche overtaking Novartis.

Below, we use `ggtext` to highlight Roche and Novartis in the column labels. We select the top 10 companies in each year and use `gsub()` to wrap the names Roche and Novartis in HTML color tags for emphasis.

---
### Another Look at Roche versus Novartis

.panelset[
.panel[.panel-name[R Code]

```r
library(tidytext)
library(ggcharts)
library(ggtext)
ggcharts_set_theme("theme_hermit")
data("biomedicalrevenue")

# Wrap the names Roche and Novartis with the html color tag
d3 <- biomedicalrevenue %>%
  filter(year %in% c(2012, 2015, 2018)) %>%
  group_by(year) %>%
  top_n(10, revenue) %>%
  ungroup() %>%
  mutate(company = gsub("Novartis", "<strong><i style='color:darkorange'>**Novartis**</i></strong>", company),
         company = gsub("Roche", "<strong><span style='color:#007EFE'>**Roche**</span></strong>", company)) %>%
  mutate(company = tidytext::reorder_within(company, revenue, year))

ggplot(d3, aes(company, revenue)) +
  geom_col() +
  coord_flip() +
  tidytext::scale_x_reordered() +
  facet_wrap(~year, scales = "free_y") +
  theme(axis.text.y = ggtext::element_markdown()) # Render the axis labels as markdown/HTML
```
]

.panel[.panel-name[Plot]

<img src="Storyboarding_files/figure-html/nicepackages.3-out-1.png" style="display: block; margin: auto;" />
]
]

---
### Homework: Dashboarding

A dashboard is a collection of key visualisations that help to facilitate quick business decisions.

There are two common ways of creating dashboards in R - *Shiny* and *Flexdashboard*.

Flexdashboard is an easy and quick way of turning a collection of data visualisations into a dashboard. All you need to know to start is how to partition an HTML page into regions for the different visualisations.

There are many online resources.
See https://www.youtube.com/watch?v=O3CgrEwTg1k by RStudio for an introduction. Refer to `Seminar6_flexdashboard.Rmd` for a basic dashboard that integrates the govSG API.
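
To make the idea of partitioning a page concrete, here is a minimal sketch that writes a two-column flexdashboard skeleton to a temporary file. The file name, dashboard title and panel titles are hypothetical and are not taken from `Seminar6_flexdashboard.Rmd`. In a flexdashboard document, a second-level header (a line of dashes under `Column`) starts a new column and each `###` header starts a chart box.

```r
# Hypothetical file name and panel titles, for illustration only
dashboard_file <- file.path(tempdir(), "sketch_dashboard.Rmd")
fence <- strrep("`", 3) # chunk delimiter, built here to avoid nesting code fences

dashboard_lines <- c(
  "---",
  "title: \"Flipped counties, 2016\"",
  "output: flexdashboard::flex_dashboard",
  "---",
  "",
  "Column",
  "-------------------------------------",
  "",
  "### Scatter plot",
  "",
  paste0(fence, "{r}"),
  "# plotting code for the first panel goes here",
  fence,
  "",
  "Column",
  "-------------------------------------",
  "",
  "### Summary table",
  "",
  paste0(fence, "{r}"),
  "# code for the second panel goes here",
  fence
)

writeLines(dashboard_lines, dashboard_file)

# Uncomment to build the HTML dashboard (requires flexdashboard and rmarkdown)
# rmarkdown::render(dashboard_file)
```

Each additional `Column` block adds another column to the page, and stacking several `###` boxes inside one column splits that column vertically.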