class: center, middle, inverse, title-slide

.title[
# ANL501 Data Visualisation and Storytelling
]
.author[
### Nicholas Sim
]
.date[
### 30 July 2024
]

---
class: center, middle, inverse

# Administrative Matters

---

### Overview

* Data visualisation
  - Learn to use R/RStudio (optional class on Python)
  - Storytelling with data
* Main text is Data Visualization: A Practical Introduction by Kieran Healy
  - https://socviz.co (free access to book content)
* Other useful resources:
  - R for Data Science by Hadley Wickham (https://r4ds.had.co.nz/index.html)
  - R Markdown: The Definitive Guide by Yihui Xie, J. J. Allaire, Garrett Grolemund (https://bookdown.org/yihui/rmarkdown/)
  - R Gallery Book by Kyle Brown (https://bookdown.org/content/b298e479-b1ab-49fa-b83d-a57c2b034d49/)
  - Storytelling with Data by Cole Nussbaumer Knaflic (non-R)
* More advanced plotting
  - Interactive web-based data visualization with R, plotly, and shiny. (https://plotly-r.com/)

---

### Assessments

* Assessments
  - Pre-course Quiz 10%
  - Tutor-Marked Assignment (TMA) 30%
  - Participation 10%
  - End-of-course Assignment (ECA) 50%

---

### Submission

* Assignments must be submitted via Canvas only.
* Please note that the deadlines are **hard** (mark deductions are applied automatically to late submissions).
* Please submit your assignment early if you are unable to submit on the deadline itself (e.g. travel for work).
* You must submit your assignment as a Word document. Submission of R/RMarkdown scripts only (without a Word document) will not be accepted.

---

### Attendance

* Attendance is compulsory.
* To receive SSG funding, you must achieve at least 75% attendance **AND** pass the course.

---

### Passing the Course

* You must **pass** both the OCAS (Overall Continuous Assessment Score) and the OES (Overall Examinable Score) components (i.e. `\(\ge 40\%\)`) to pass the course.
* If you do not submit the TMA,
  - but submit the ECA, you will receive an "F" grade automatically.
  - and do not submit the ECA, you will receive a "W" grade automatically.
* You **cannot pass the course** by submitting the ECA but not the TMA.

---
class: center, middle, inverse

# Analytics and Visualisation

---

### What is Data Visualisation?

Data analytics is the use of computing and statistical methods to generate information and insights from data. These techniques may involve **data visualisation** or **machine learning**.

"Data visualization is the graphical representation of information and data" (www.tableau.com). It is usually the step prior to taking the data to machine learning models.

Machine learning is the use of computer algorithms and statistical theory to generate information, often for the purpose of prediction. These approaches can broadly be classified into supervised learning, unsupervised learning, and reinforcement learning.

---

### What is Required for Successful Data Storytelling?

* Ability to manage and transform data
  - Customizing a data visualisation for storytelling requires significant work and an understanding of what data structure is needed and how to achieve it (e.g. group-level aggregation, pivoting, etc.).
* Ability to identify the message to convey
  - This requires significant exploratory data analysis, such as exploratory data visualisations that build understanding of the data but will not be reported.
* Ability to use a flexible computing tool
  - R will be used in this course.

---

### Course Coverage

* How to manage and clean data.
* Fundamental principles of data visualisation (Grammar of Graphics)
* Software (i.e. R) flexible enough for highly customized data visualisations (for storytelling)

Note: We will not cover everything about R, but we will cover enough to enable you to carry out more complex work with R.

---
class: center, middle, inverse

# Course Structure

---

### Seminars 1-3

* **Pre-Course Readings**
  1. R Operations and Programming
* **Seminar 1**
  1. Course Introduction
  2. Suggested Practices in Data Storytelling
  3. Data Management in Base R
* **Seminar 2**
  1. Data Management with Tidyverse
  2. Principles of Data Storytelling and the Grammar of Graphics
* **Seminar 3**
  1. Getting Started with Data Visualisation
  2. Aesthetics and Settings
  3. Other Layers in the Grammar of Graphics

---

### Seminars 4-6

* **Seminar 4**
  1. Introduction to RMarkdown
  2. Bar Charts, Histograms and Densities
  3. Time Series Plots
* **Seminar 5**
  1. Column Plots and Data Preprocessing
  2. Boxplots and Annotation
  3. Choropleth Maps
* **Seminar 6**
  1. Further Approaches for Spatial Visualization
  2. Storyboarding
  3. Statistics and Regression (Optional)

---

### Expectations

Hours Per Week:

1. Seminars (mandatory) - 3
2. Pre-class readings (slides) - 1
3. Post-class readings (slides and assigned text) and practice - 3
4. Assessments - 3
5. Others (e.g. Datacamp) - 0.5

---

### Success Factors

* Practice and try things out on your own.
* Play around with the source files of the slides, which are created from RMarkdown.
* Be very familiar with data frames and data wrangling tools like the `dplyr` package.
* Memorize a few basic lines on how to construct visualisations with the `ggplot2` package, especially the use of basic features such as aesthetics, settings, labs (i.e. plot labels), scales, and theme.

---
class: center, middle, inverse

# R For Data Analytics and Visualisation

---

### Why R?

1. Open Source.
2. Capability - A massive library of statistical packages.
3. Integrates easily with other software such as Python and SQL (e.g. the `reticulate` package lets R call Python functions).
4. Relatively easy to learn, particularly with the `tidyverse` library.
5. Reproducible reports, i.e. RMarkdown (covered here), Quarto.
6. Highly flexible.
7. Few data limitations, e.g. an Excel worksheet can hold ~1 million records, but R (also Python) can hold ~ `\(2^{31} - 1\)` records.
8. Popular among data analysts/scientists.

---

### Why Code?
Read: https://www.r-bloggers.com/2020/12/6-reasons-to-learn-r-for-business-2021/

<img src="ANL501Introduction_files/figure-html/unnamed-chunk-2-1.png" width="70%" style="display: block; margin: auto;" />

---

### Example - Scatter Plot

Seminar 3 (Scatter Plot with Color Aesthetics)

<img src="ANL501Introduction_files/figure-html/unnamed-chunk-3-1.png" width="70%" style="display: block; margin: auto;" />

---

### Example - Animation

Seminar 3 (Animations)

See also https://www.youtube.com/watch?v=SnCi0s0e4Io

---

### Example - Creating Facets

Seminar 4 (Line Plots with Facets)

<img src="ANL501Introduction_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" />

---

### Example - Boxplot with Flipped Coordinates

Seminar 5 (Flipped Boxplots)

<img src="ANL501Introduction_files/figure-html/box.5-1.png" width="70%" style="display: block; margin: auto;" />

---

### Example - Visualising Spatial Data

Seminar 5 (Choropleth Map)

<img src="ANL501Introduction_files/figure-html/unnamed-chunk-5-1.png" width="70%" style="display: block; margin: auto;" />

---

### Example - Using APIs

Seminar 6 (Google Maps)

<img src="dengue.png" width="526" style="display: block; margin: auto;" />

---

### Example - Creating a Data Story

Seminar 6 (Storytelling)

<img src="ANL501Introduction_files/figure-html/unnamed-chunk-7-1.png" width="70%" style="display: block; margin: auto;" />