class: center, middle, inverse, title-slide .title[ # R Operations ] .author[ ### Nicholas Sim ] .date[ ### 31 December 2024 ] --- # Topics 1. The assignment operator `<-` 2. Combine function `c()` 3. Logicals (i.e. `TRUE`, `FALSE`) e.g. + `5 >= 5` returns `TRUE` + `5 != 2` returns `TRUE` 4. Names function `names()` 5. Matrix function `matrix()` 6. Element referencing e.g. + `mat[1,2]` references Row 1 and Column 2 of matrix `mat` + `mat[1,]` references the entire Row 1 of matrix `mat` + `mat[,2]` references the entire Column 2 of matrix `mat` 7. Colon ":" for constructing sequences and matrix slicing --- class: center, middle, inverse # Introduction --- ### Installation This lecture explores some basic operations and working commands in R. Before we proceed, let's first install R and RStudio. RStudio is an IDE (Integrated Development Environment), a software you use to implement R commands. RStudio will not work on its own without R. To install R and RStudio, please visit https://cran.r-project.org/bin/windows/base/ and https://posit.co/products/open-source/rstudio/ You may also use the cloud version of RStudio. To sign up for a free account, visit https://posit.cloud/plans --- ### Commenting Rather than writing commands in the console, we may store these commands using a script, and then execute them through the script. For our own reference, it is a good practice to write notes, or "comments", inside our scripts to explain the tasks that the commands are supposed to achieve. Comments are lines that R recognizes as plain text. Unlike a command that executes a procedure, a comment (even if it contains codes) will not be implemented. A comment in R is prefaced by the "`#`" symbol. Any expression that comes after the "`#`" symbol within the same line will be recognized as a comment. 
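---

### Commenting: A Short Sketch

The lines below are a minimal sketch, not taken from the lecture, of the two usual places a comment can appear: on a line of its own, and at the end of a command. The object name `total` is made up for this illustration.

``` r
# A comment on its own line: R skips this line entirely.
total <- 2 + 2   # A trailing comment: R runs the command and skips the rest of the line.
# total <- 100   # This command is "commented out", so it is never executed.
print(total)
```

```
## [1] 4
```

Commenting out a command, as in the third line, is a common way to switch off a step temporarily without deleting it from the script.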
---

### Commenting

To illustrate, the following is a command that returns a value of 4:

``` r
2+2
```

```
## [1] 4
```

whereas, by prefacing `2+2` with the "`#`" symbol, the command becomes a comment,

``` r
# 2+2
```

which R treats as plain text and does not execute.

You are strongly encouraged to write comments in your script to increase the readability of your code.

---
class: center, middle, inverse

# Basic R Functions

---

### Calculator

R can be used as a calculator to add, subtract, multiply, divide, etc. For example:

``` r
5*2 # Multiplying 5 by 2
```

```
## [1] 10
```

``` r
5/2 # Dividing 5 by 2
```

```
## [1] 2.5
```

``` r
5^2 # Squaring 5
```

```
## [1] 25
```

``` r
5 %% 2 # %% is the modulo operator. It gives the remainder of 5/2, which is 1.
```

```
## [1] 1
```

---

### Assignment Operator

To store an output in R, we need to assign it to an object. This is done by using the assignment operator, `<-`, which is a combination of the symbols "`<`" and "`-`".

The assignment operator `<-` assigns the value on the right side of the arrow to the object on the left side. For instance, to assign a value of 2 to `x`, we write:

``` r
x <- 2
print(x) # Print the result using the print() function.
```

```
## [1] 2
```

The "`=`" symbol (mostly) works in the same way as a "right-to-left" assignment operator. For instance:

``` r
y = 2
print(y)
```

```
## [1] 2
```

In practice, the "`=`" symbol in R has other purposes than being an assignment operator (such as associating a function's argument with a value). Therefore, we usually use `<-` for assigning values to objects, although in most contexts the "`=`" symbol will be understood as an assignment operator in R as well.

---

### Creating a Vector

A vector in R is an object that contains several values of the *same* type (see the Data Types section for a list of primitive data types in R). We may create a vector by combining these values using the **combine** function, i.e. `c()`.
To illustrate, let's create a vector `v1` containing the numbers 1 to 10. To do so, we cannot merely write `v1 <- 1 2 3 4 5 6 7 8 9 10`, as R will not know what to do with these numbers. To assign multiple values to a single object, we employ the `c()` (combine) function, which combines these values into a vector.

**Note**: The combine function always returns a single flat vector. If we pass vectors into `c()`, their elements are merged into one longer vector. To keep several objects (such as vectors, matrices, or data frames) as separate components of a single entity, we use the **list** function, `list()`, which will be discussed in the *R Programming* seminar.

---

### The Combine Function

Let's create some vectors using the `c()` function.

.pull-left[
We now have two vectors, `v1` and `v2`.

Notice that the values contained in a vector must belong to the same type or class. The vector `v1` contains *numeric* values and the vector `v2` contains *string* characters. A string is just a piece of text.

To declare a value as a string, we must wrap it with " " (i.e. a pair of double quotation marks) or ' ' (i.e. a pair of single quotation marks).
]

.pull-right[
``` r
v1 <- c(1,2,3,4,5,6,7,8,9,10)
v2 <- c("A","A","A","A","B","B","B","B","C","C")
print(v1)
```

```
##  [1]  1  2  3  4  5  6  7  8  9 10
```

``` r
print(v2)
```

```
##  [1] "A" "A" "A" "A" "B" "B" "B" "B" "C" "C"
```
]

---

### Remark: R Combine Function Versus Python List

For students familiar with Python, the closest parallel to R's combine function is the Python *list*.

However, the two are different. A Python list can hold values of different types at the same time (e.g. a number next to a string). By contrast, a vector created with R's combine function holds numerical values, booleans, or characters, but all of its values must belong to the same type.

If we combine different types of data using the combine function, R converts them into a common type. For example, mixing numbers with strings converts all the values into strings.
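---

### Remark: Type Conversion in `c()`

As a minimal sketch of what happens when types are mixed inside `c()`, the example below combines three pairs of values and checks the class of each result. The object names `num.lgl`, `num.chr`, and `lgl.chr` are hypothetical and used only for this illustration.

``` r
# Hypothetical vectors, named only for this illustration.
num.lgl <- c(1.5, TRUE)   # number + logical  -> numeric (TRUE becomes 1)
num.chr <- c(1.5, "A")    # number + string   -> character
lgl.chr <- c(TRUE, "A")   # logical + string  -> character

class(num.lgl)
class(num.chr)
class(lgl.chr)
```

```
## [1] "numeric"
```

```
## [1] "character"
```

```
## [1] "character"
```

In each case R picks the most general type among the values supplied, so a string anywhere in the call turns the whole vector into strings.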
---

### Applying Mathematical Operations on Vectors

In R, a mathematical operation applied to a numeric or integer vector is carried out *element-by-element*. For example:

.pull-left[
``` r
v3 <- c(1,2,3,4)
v4 <- c(1,1,2,2)
v3 + 2 # add 2 to each element in the vector v3
```

```
## [1] 3 4 5 6
```

``` r
v3+v4 # add the 1st element in v3 to the 1st element in v4, the 2nd to the 2nd, and so on
```

```
## [1] 2 3 5 6
```
]

.pull-right[
``` r
v3/v4 # element-by-element division
```

```
## [1] 1.0 2.0 1.5 2.0
```

``` r
v4^2 # each element in v4 is squared
```

```
## [1] 1 1 4 4
```
]

**Remark**: If you add a shorter vector to a longer vector, the elements in the shorter vector will be "**recycled**" (see the Recycling Property slides). For instance, try adding `v3a <- c(1,2,3,4,5)` and `v4 <- c(1,1,2,2)`.

---
class: center, middle, inverse

# Printing and Pasting

---

### The `print()` Function

The `print()` function prints out (displays) a text or a numeric output. For example, let's display the text `Hello World`. To do so, we need to pass in `"Hello World"` (with double quotation marks) or `'Hello World'` (with single quotation marks).

``` r
print("Hello World")
```

```
## [1] "Hello World"
```

``` r
print('Hello World')
```

```
## [1] "Hello World"
```

---

### The `paste()` Function

The `paste()` function is used to paste (i.e. concatenate) text strings and numerical values.

For example, suppose we have a numerical variable called `age`, which is equal to 50. Let's construct the statement `"Karl is 50 years old."` using the `paste()` function and print it out using the `print()` function:

``` r
age <- 50
print(paste("Karl is", age, "years old."))
```

```
## [1] "Karl is 50 years old."
```

In the above, the `paste()` function puts together the text string `Karl is`, the numerical variable `age`, and the text string `years old.`. Notice that `age`, a number, is converted into a string when passed into the `paste()` function (see the Data Types section below).
Notice also that the `paste()` function, by default, adds a space to separate the three inputs. In other words, a single space is the default separator of the inputs to the `paste()` function.

---

### Separator

In general, we may specify the separator for the `paste()` function by using the `sep` argument (more on function arguments below). For example, let's remove the default space separator in `paste()`:

``` r
print(paste("Karl is", age, "years old.", sep = ""))
```

```
## [1] "Karl is50years old."
```

Here, there is no space to separate `is` and `age`, and `age` and `years`. As an experiment, let's use three spaces as the separator:

``` r
print(paste("Karl is", 50, "years old.", sep = "   "))
```

```
## [1] "Karl is   50   years old."
```

---

### The `paste0()` Function

The `paste0()` function does the same as the `paste()` function, except that it does not add a space separator by default.

``` r
print(paste0("Karl is", 50, "years old."))
```

```
## [1] "Karl is50years old."
```

The `paste0()` function is useful for pasting together date and time strings, which we will encounter later in this course.

---
class: center, middle, inverse

# Comparison Operators

---

### Comparison Operators

A comparison operator compares two objects and returns a logical, i.e. `TRUE` or `FALSE` (**upper case**), depending on whether the condition is met.

A `TRUE` value is often used to trigger a particular task. For example, if a comparison statement returns `TRUE`, then the logical `TRUE` can be used to select the specific data elements that satisfy the comparison statement.
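---

### Remark: Using Logicals to Select Data

To make the last point concrete, here is a minimal sketch of a comparison whose `TRUE` values are used to pick out elements of a vector. The exam scores and the names `scores` and `passed` are made up for this illustration.

``` r
# Hypothetical exam scores, made up for this illustration.
scores <- c(45, 72, 58, 90, 30)

passed <- scores >= 50   # one logical per score
scores[passed]           # keeps only the scores where passed is TRUE
sum(passed)              # TRUE counts as 1, so this counts the passes
```

```
## [1] 72 58 90
```

```
## [1] 3
```

Storing the comparison in its own object is a design choice, not a requirement: it lets the same set of logicals be reused for both selecting and counting.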
---

### Comparison Operators

The comparison operators are

* `>` Greater than
* `<` Less than
* `>=` Greater than or equal to
* `<=` Less than or equal to
* `==` Equal to (note the double equality sign)
* `!=` Not equal to

---

### Examples

``` r
5 > 5
```

```
## [1] FALSE
```

``` r
5 >= 5
```

```
## [1] TRUE
```

``` r
2 < 2
```

```
## [1] FALSE
```

``` r
2 <= 2
```

```
## [1] TRUE
```

``` r
5 != 2
```

```
## [1] TRUE
```

``` r
5 != 5
```

```
## [1] FALSE
```

``` r
5 == 5
```

```
## [1] TRUE
```

---

### Comparing Vectors

.pull-left[
When a comparison operator is applied to compare two vectors, the comparison is made element-by-element:
]

.pull-right[
``` r
v1 <- c(1,2,3)
v2 <- c(10,20,30)
v1 < v2
```

```
## [1] TRUE TRUE TRUE
```
]

.pull-left[
When we compare an entire vector with a single number, the comparison is also made element-by-element:
]

.pull-right[
``` r
v <- c(1, 2, 3, 4, 5)
v < 2 # checks if each element is less than 2
```

```
## [1]  TRUE FALSE FALSE FALSE FALSE
```

``` r
v == 3 # checks if each element is equal to 3
```

```
## [1] FALSE FALSE  TRUE FALSE FALSE
```
]

**Remark**: To test the relationship "v is equal to 3", we use the double-equal "`==`" symbol, as the single equal "`=`" is an assignment operator.

---
class: center, middle, inverse

# Data Types

---

### Primitive Data Types

R classifies variables into the following types: double, integer, logical, string/character, factor.

To see the type of a variable, we pass this variable into the `class()` function.

**Naming of Objects**: We may use the period "." in a variable's name. To assign two or more words as a name for a variable, we may use "." to join these words up, e.g. `v.numeric`. We may also use other conventions like an underscore "_".

---

### Numeric Variables

Numeric variables are variables that are amenable to mathematical operations. These variables come in two forms: a double or an integer.
A double, also known as a floating point value, is a numeric variable that can hold decimals. By contrast, an integer is a numeric variable without decimals.

To construct a numeric variable, we simply assign a number:

``` r
v.numeric <- 2
class(v.numeric)
```

```
## [1] "numeric"
```

To declare a numeric variable as an integer, we append "L" to the number:

``` r
v.integer <- 2L
class(v.integer)
```

```
## [1] "integer"
```

---

### Strings

Strings are simply plain text. In R, they are also known as **characters**. To create a string, we wrap the content in a pair of single or double quotation marks:

``` r
v.character <- "Hello World!"
class(v.character)
```

```
## [1] "character"
```

---

### Logicals

A logical/boolean is a `TRUE` or `FALSE` value.

``` r
v.boolean <- TRUE
class(v.boolean)
```

```
## [1] "logical"
```

---

### Factors

A factor is a variable that is defined by its unique values, known as **levels**. While a factor variable may appear as numbers or text, it is neither a numeric nor a string variable. Rather, what matters for a factor variable are the unique values that it takes, as each unique value forms a category. For instance,

``` r
v.factor <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))
class(v.factor)
```

```
## [1] "factor"
```

`v.factor` is a factor and R will recognize 1 and 2 as unique categories. Although `v.factor` looks like a numeric variable, R will not recognize it as such. Thus, we cannot apply a mathematical operation to a factor even if it contains numbers:

``` r
v.factor + 1
```

```
## Warning in Ops.factor(v.factor, 1): '+' not meaningful for factors
```

```
## [1] NA NA NA NA NA NA NA NA
```

---

### NAs

A missing value in a vector is represented by `NA`. For instance,

``` r
v.na <- c(1,2,NA,4,5)
summary(v.na)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    1.75    3.00    3.00    4.25    5.00       1 
```

The 3rd element in `v.na` is an `NA`, i.e. it is missing.
Some R functions, such as `summary()`, skip the `NA`s. For instance, in the summary above, the mean of the vector `v.na` is 3, which is `(1 + 2 + 4 + 5)/4`. Other functions, such as `mean()`, return `NA` when the vector contains a missing value, unless we set the argument `na.rm = TRUE`.

**Note**: When importing a dataset into R, an empty cell in the original dataset will typically be converted into an `NA`.

---
class: center, middle, inverse

# Useful Functions for Data Checks

---

### The `is.na()` and `any()` Functions

A useful function to test for the presence of missing values is the `is.na()` function. To illustrate,

``` r
is.na(v.na)
```

```
## [1] FALSE FALSE  TRUE FALSE FALSE
```

indicates that there is a missing value in the third position of the vector `v.na`.

To check if there is **any** missing value in a vector, we may combine the `any()` and `is.na()` functions:

``` r
any(is.na(v.na))
```

```
## [1] TRUE
```

The `any()` function checks if any of the elements in a logical vector is `TRUE`.

---

### The `which()` Function

The `which()` function shows which elements in a vector satisfy a given condition. For instance, we can confirm that the third element is missing from

``` r
which(is.na(v.na))
```

```
## [1] 3
```

---

### Reclassifying the Data Type

We may reassign the type of data by using the family of "`as.`" functions:

.pull-left[
``` r
v.1 <- c(2,2,3,3,4) # This is numeric
v.2 <- as.character(v.1)
class(v.2)
```

```
## [1] "character"
```

``` r
v.3 <- as.factor(v.1)
class(v.3)
```

```
## [1] "factor"
```
]

.pull-right[
``` r
v.4 <- as.integer(v.1)
class(v.4)
```

```
## [1] "integer"
```

``` r
v.5 <- as.numeric(v.2)
class(v.5)
```

```
## [1] "numeric"
```
]

**Note**: We may convert a number to a character, but we cannot convert a non-numeric character (e.g. `"a"`) to a numeric variable. Attempting to do so returns an `NA` with a warning.

---

### Exercise

1. Construct a vector `s1` that contains the numbers 3 and 4. Test if `s1` is equal to 4. What do you observe?

2. Convert the vector `s1` to a character vector. Call it `s2`. Test if `s2` is equal to 4. How about equal to `"4"`?
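---

### Remark: Failed Conversions Become `NA`

As a short sketch tying the "`as.`" functions back to the data checks above, the example below converts a character vector in which one entry is not a number, then locates the entry that failed. The names `v.chr` and `v.num` are hypothetical. `suppressWarnings()` is used here only to keep the output clean; without it, R warns that NAs were introduced by coercion.

``` r
v.chr <- c("2", "3", "a")                      # "a" is not a number
v.num <- suppressWarnings(as.numeric(v.chr))   # "a" cannot be converted, so it becomes NA
v.num
which(is.na(v.num))                            # position of the entry that failed to convert
```

```
## [1]  2  3 NA
```

```
## [1] 3
```

Checking with `is.na()` right after a conversion is one simple way to catch entries that were silently turned into missing values.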
---
class: center, middle, inverse

# Vectors

---

### Basics

In R, we deal mainly with rectangular data. Such data are organized as vectors or matrices.

A vector is a 1-dimensional array that holds string, numeric, or logical data elements. Recall that we may create a vector using the combine function `c()`. For example:

``` r
n.vec <- c(1, 2, 2, 4)
```

To see how a vector is reported in R, let's type the name of the vector in the console:

``` r
n.vec
```

```
## [1] 1 2 2 4
```

`n.vec` is a vector containing the values 1, 2, 2, and 4. The `[1]` at the start of the output line is the index of the first element printed on that line.

To check the type of vector `n.vec`, we may pass it into the `class()` function, which tells us that `n.vec` is numeric:

``` r
class(n.vec)
```

```
## [1] "numeric"
```

---

### Constructing a Vector

.pull-left[
To construct a vector containing characters/strings, we wrap each element in the vector with a pair of single or double quotation marks:
]

.pull-right[
``` r
c.vec <- c('A','B','C')
c.vec
```

```
## [1] "A" "B" "C"
```

``` r
class(c.vec)
```

```
## [1] "character"
```
]

.pull-left[
We may also construct a vector containing logicals/booleans:
]

.pull-right[
``` r
l.vec <- c(TRUE, FALSE)
l.vec
```

```
## [1]  TRUE FALSE
```

``` r
class(l.vec)
```

```
## [1] "logical"
```
]

---

### Vectors Cannot Have Elements with Different Data Types

We cannot construct a vector containing elements from different primitive types (i.e. we cannot mix numerical variables, strings, etc.). If we try to do so, R will force the elements into a single type.

.pull-left[
For example, if we mix numerical values with characters, R will force the numerical values into characters:
]

.pull-right[
``` r
m.vec <- c(1, 2, 'C')
m.vec
```

```
## [1] "1" "2" "C"
```

``` r
class(m.vec)
```

```
## [1] "character"
```
]

---

### Vectors Cannot Have Elements with Different Data Types

.pull-left[
If we try to combine a logical value and a number, the logical will be forced into the number 1 (for `TRUE`) or 0 (for `FALSE`).
]

.pull-right[
``` r
v <- c(FALSE, 2)
v
```

```
## [1] 0 2
```

``` r
class(v) # TRUE and FALSE will be forced into 1 and 0.
```

```
## [1] "numeric"
```
]

---

### The Sequence Function and Letters

The sequence function, `seq()`, generates a sequence of numbers from a specified minimum value to a specified maximum value.

Let's generate a sequence from 1 to 20 with increments of 2:

``` r
seq.vec <- seq(from = 1, to = 20, by = 2)
seq.vec
```

```
##  [1]  1  3  5  7  9 11 13 15 17 19
```

**Note 1**: Note that `from`, `to`, and `by` are the names of the arguments. Once you are familiar with a function, you may omit them and input the numbers directly. For example, you may just write `seq.vec <- seq(1, 20, 2)`.

**Note 2**: Notice how the "`=`" symbol is used in a function. On the left-hand side of the "`=`" symbol, we have the name of the input. On the right-hand side, we have its value. For instance, the `seq()` function has three inputs, which are named `from`, `to`, and `by`.

---

### The Slicing Notation

To generate a sequence of numbers with increments of 1, we can use the slicing notation, represented by the colon "`:`" symbol. The left side of the symbol specifies the start number and the right side specifies the end number:

``` r
seq.vec2 <- 1:10
seq.vec2
```

```
##  [1]  1  2  3  4  5  6  7  8  9 10
```

``` r
seq.vec3 <- 3:14
seq.vec3
```

```
##  [1]  3  4  5  6  7  8  9 10 11 12 13 14
```

Unlike the `seq()` function, which enables you to specify the increment step, the colon "`:`" operator always increases the sequence by 1 unit.

---

### The Alphabet Sequence

We may generate the sequence of letters in the alphabet by typing `letters`:

``` r
seq.lett <- letters
seq.lett
```

```
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
```

`letters` is not a function, but a vector that stores the lowercase letters from `a` to `z`.

---

### Naming the Vector Elements

The `names()` function assigns names to each element in a vector.
The following vector contains temperature readings for Monday to Sunday:

``` r
temps <- c(29, 26, 33, 34, 26, 35, 31)
temps
```

```
## [1] 29 26 33 34 26 35 31
```

There are 7 temperature readings, one for each day of the week. To label the days, we use `names()` to specify the day corresponding to each element:

``` r
names(temps) <- c('Mon','Tue','Wed','Thu','Fri','Sat','Sun')
temps
```

```
## Mon Tue Wed Thu Fri Sat Sun 
##  29  26  33  34  26  35  31 
```

---

### Naming the Vector Elements

Alternatively, we may first construct a variable containing the names of the days, then assign these names to the vector using the `names()` function.

``` r
days <- c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')
names(temps) <- days
```

---

### Vector Indexing

To work with specific elements in a vector, we need to first understand how the elements are indexed.

In R, each element of a vector is identified by its numerical position. For instance, the first element of a vector is indexed by 1, the second element by 2, and so on.

To pick a certain element from a vector, we pass the element index into the square *index brackets* appended to the name of that vector. The index brackets take in the element index as an input, and return the chosen element as the output.

For example, consider the `temps` vector constructed earlier. To reference the third element in `temps`, we pass the number 3 into the index brackets appended to `temps`:

``` r
temps[3]
```

```
## Wed 
##  33 
```

---

### Vector Referencing by Names

If the elements are named, we may pass the name of the element, instead of its index position, into the index brackets. For example,

``` r
temps['Mon']
```

```
## Mon 
##  29 
```

``` r
temps['Wed']
```

```
## Wed 
##  33 
```

---

### Vector Slicing

We may select a range of elements by using the colon "`:`" symbol. This is known as **slicing**.

Recall that a sequence of numbers can be generated using the colon "`:`" symbol.
For example, the line `2:4` generates the sequence `2,3,4`. Therefore, to slice out the second to fourth element in `temps`, we pass in `2:4` into the index brackets: ``` r temps[2:4] ``` ``` ## Tue Wed Thu ## 26 33 34 ``` --- ### Vector Filtering We may filter out multiple elements like, say the 1st, 3rd, 5th elements (or 'Mon', 'Wed', 'Fri'). Since there are multiple indices, we first combine them using the `c()` function before passing them into the index brackets: ``` r temps[c(1, 3, 5)] ``` ``` ## Mon Wed Fri ## 29 33 26 ``` ``` r temps[c('Mon', 'Wed', 'Fri')] ``` ``` ## Mon Wed Fri ## 29 33 26 ``` --- ### Filtering plus Ordering .pull-left[ Using the combine function, we may also call the elements out of order. For instance] .pull-right[ ``` r temps[c(3, 5, 1)] ``` ``` ## Wed Fri Mon ## 33 26 29 ``` ``` r temps[c('Wed', 'Fri', 'Mon')] ``` ``` ## Wed Fri Mon ## 33 26 29 ``` ] .pull-left[ We may also filter out all elements except the ones specified by the negation or minus "`-`" symbol. For example, ] .pull-right[ ``` r temps[-c(2)] # Select all except Tuesday ``` ``` ## Mon Wed Thu Fri Sat Sun ## 29 33 34 26 35 31 ``` ``` r temps[-c(1,4)] # Select all except Monday and Thursday ``` ``` ## Tue Wed Fri Sat Sun ## 26 33 26 35 31 ``` ] --- ### Conditional Filtering We may use comparison operators to filter out (i.e. select) elements satisfying certain conditions. These operators first create a set of logicals (i.e. T/F), known as **boolean mask**. Elements satisfying the condition (i.e. `TRUE`) will be selected. Let's select the days from the `temps` vector where temperatures are greater than 30C. 
Before we do so, we use a comparison operator to see which are the days when temperatures have exceeded 30C: ``` r temps > 30 ``` ``` ## Mon Tue Wed Thu Fri Sat Sun ## FALSE FALSE TRUE TRUE FALSE TRUE TRUE ``` If we pass `temps > 30` into the index brackets, this will filter out the elements satisfying this condition: ``` r temps[temps > 30] ``` ``` ## Wed Thu Sat Sun ## 33 34 35 31 ``` --- ### Conditional Filtering Alternatively, we may construct a logical/boolean filter first, and then pass the filter into the index brackets: ``` r filter <- temps > 30 temps[filter] ``` ``` ## Wed Thu Sat Sun ## 33 34 35 31 ``` --- ### Exercise John, Jim, Jane are 161cm, 174cm, 182cm tall, respectively. Create a vector containing their heights. Name each element in the vector by the names of the individuals. How would you subset the vector to contain only the heights of John and Jim? --- class: center, middle, inverse # Matrices --- ### Creating a Matrix A matrix is a 2-dimensional data structure containing rows and columns. To create a matrix, one approach is to use the `matrix()` function. For example, let's construct a sequence from 1 to 10 and create a matrix based on it. ``` r v <- 1:10 matrix(v) # converts a vector into a matrix ``` ``` ## [,1] ## [1,] 1 ## [2,] 2 ## [3,] 3 ## [4,] 4 ## [5,] 5 ## [6,] 6 ## [7,] 7 ## [8,] 8 ## [9,] 9 ## [10,] 10 ``` Instead of a vector, we now have a two-dimensional 10-by-1 matrix that contains 10 rows and 1 column. --- ### Creating a Matrix To convert a 10-by-1 vector `v` into a matrix containing 2 rows (and therefore, 5 columns), we specify the number of desired rows using the parameter/argument `nrow` in `matrix()`: ``` r matrix(v, nrow = 2) ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 1 3 5 7 9 ## [2,] 2 4 6 8 10 ``` --- ### Creating a Matrix By default, R will fill the first column of the matrix, then the second column, then the third, and so on. 
For instance, let's fill a matrix containing 4 rows with elements 1 to 12: ``` r matrix(1:12, nrow = 4) ``` ``` ## [,1] [,2] [,3] ## [1,] 1 5 9 ## [2,] 2 6 10 ## [3,] 3 7 11 ## [4,] 4 8 12 ``` Using the elements from the sequence 1:12, R will fill the first column of the matrix until it reaches Row 4, then it continues to populate the second column until it reaches Row 4, and so on. --- ### Creating a Matrix The default sequence by which the `matrix()` fills the values in a matrix can be replicated by the command ``` r # The numbers are filling in by column. matrix(1:12, byrow = FALSE, nrow = 4) ``` ``` ## [,1] [,2] [,3] ## [1,] 1 5 9 ## [2,] 2 6 10 ## [3,] 3 7 11 ## [4,] 4 8 12 ``` `byrow = FALSE` tells R to fill the matrix column by column, which is the default sequence. To fill the matrix row-by-row, we use the parameter `byrow = TRUE`: ``` r # The numbers are filling by row. matrix(1:12, byrow = TRUE, nrow = 4) ``` ``` ## [,1] [,2] [,3] ## [1,] 1 2 3 ## [2,] 4 5 6 ## [3,] 7 8 9 ## [4,] 10 11 12 ``` --- ### Exercise Generate a sequence from 1:30. Replicate the following 10-by-3 matrix: .pull-left[ ``` ## [,1] [,2] [,3] ## [1,] 1 11 21 ## [2,] 2 12 22 ## [3,] 3 13 23 ## [4,] 4 14 24 ## [5,] 5 15 25 ## [6,] 6 16 26 ## [7,] 7 17 27 ## [8,] 8 18 28 ## [9,] 9 19 29 ## [10,] 10 20 30 ``` ] .pull-right[ ``` ## [,1] ## [1,] 1 ## [2,] 2 ## [3,] 3 ## [4,] 1 ## [5,] 2 ## [6,] 3 ## [7,] 1 ## [8,] 2 ## [9,] 3 ## [10,] 1 ``` ] --- class: center, middle, inverse # Combining Vectors into a Matrix --- ### Using the `matrix()` Function We may combine vectors into a matrix. Suppose we have data on stock1 and stock2 from Monday to Friday. 
Let's combine them by passing in their names into the `c()` function: ``` r stock1 <- c(9.5, 9.7, 9.8, 8.7, 8.4) stock2 <- c(203, 209, 215, 206, 198) stock.data <- c(stock1, stock2) stock.data # This is a vector of 10 elements ``` ``` ## [1] 9.5 9.7 9.8 8.7 8.4 203.0 209.0 215.0 206.0 198.0 ``` This operation combines all the elements into a single vector, but not a matrix. To reshape the vector into a matrix, we will use the `matrix()` function. --- ### Using the `matrix()` Function Let's use the `matrix()` function to convert `stock.data` into a matrix. Let's represent the days of the week as rows and stocks as columns. Thus, we should convert `stock.data` into a 5-by-2 matrix. ``` r # Tell R to convert this vector into a matrix with 5 rows stock.data.matrix <- matrix(stock.data, nrow = 5) stock.data.matrix ``` ``` ## [,1] [,2] ## [1,] 9.5 203 ## [2,] 9.7 209 ## [3,] 9.8 215 ## [4,] 8.7 206 ## [5,] 8.4 198 ``` --- ### Assigning Row and Column Names We may assign row and column names using the `rownames()` and `colnames()` functions, respectively. ``` r names.stock <- c('stock1', 'stock2') days.stock <- c('Mon', 'Tue', 'Wed', 'Thu', 'Fri') rownames(stock.data.matrix) <- days.stock # There are five days of observations colnames(stock.data.matrix) <- names.stock # There are two stocks stock.data.matrix ``` ``` ## stock1 stock2 ## Mon 9.5 203 ## Tue 9.7 209 ## Wed 9.8 215 ## Thu 8.7 206 ## Fri 8.4 198 ``` --- ### Using the `cbind()` Function There is an easier way to construct matrices from vectors than to apply the `matrix()` function to a vector. This is achieved by using the **column combine** function, i.e. `cbind()`. The `cbind()` function combines the vectors into columns of a new matrix. 
Using the above example, consider the following: ``` r stock1 <- c(9.5, 9.7, 9.8, 8.7, 8.4) stock2 <- c(203, 209, 215, 206, 198) stock.data2 <- cbind(stock1, stock2) stock.data2 ``` ``` ## stock1 stock2 ## [1,] 9.5 203 ## [2,] 9.7 209 ## [3,] 9.8 215 ## [4,] 8.7 206 ## [5,] 8.4 198 ``` Notice that the column names are the original names of the vectors, `stock1` and `stock2`. **Note**: To combine two matrices or vectors by rows, we may use the **row combine**, i.e. `rbind()`. --- ### Exercise 1. Construct a 2-by-2 matrix, called `mat.1`, by applying the `matrix()` function to a sequence from 1 to 4. 2. Construct a 2-by-2 matrix, called `mat.2` by applying the `cbind()` function to two vectors: 1) a sequence from 1 to 2, and 2) a sequence from 3 to 4. 3. Name the rows and columns of `mat.2` as "R1", "R2" and "C1", "C2" using the `rownames()` and `colnames()` functions. --- ### Exercise Generate a sequence from 1:30. Construct a 10-by-3 matrix. Generate another sequence from 81 to 90, and column combine it with the matrix you generated. --- class: center, middle, inverse # Index Referencing --- ### Referencing the Elements in a Matrix Elements in a vector can be filtered (i.e. chosen) using brackets and referencing index. For example, let's construct a 4-by-3 matrix using elements from the sequence 1 to 12: ``` r mat.values <- matrix(1:12, nrow = 4) mat.values ``` ``` ## [,1] [,2] [,3] ## [1,] 1 5 9 ## [2,] 2 6 10 ## [3,] 3 7 11 ## [4,] 4 8 12 ``` To reference the elements in `mat.values`, we append the matrix name with "`[ , ]`", where the first argument is the row index and the second argument is the column index, i.e. `[row index, column index]`. For example, "3" is the value of Row 3 and Column 1 of `mat.values` ``` r mat.values[3, 1] ``` ``` ## [1] 3 ``` --- ### Referencing the Elements in a Matrix To reference an entire row, we specify the row index in the brackets but leave the column index empty. For example, to filter Row 2 (i.e. 
with the values 2,6,and 10), ``` r mat.values[2,] ``` ``` ## [1] 2 6 10 ``` To select Column 3, we specify the column index and leave the row index empty ``` r mat.values[,3] ``` ``` ## [1] 9 10 11 12 ``` --- ### Subsetting a Matrix **subsetting** refers to the creation of a subset from a matrix or data frame (which we will see later). Let's subset a matrix by **slicing**. We may slice the matrix by its rows and/or columns using the colon "`:`" symbol. .pull-left[ For example, to extract Rows 2-3 of `mat.values`, we specify 2:3 in the row argument and leave the column argument empty. ] .pull-right[ ``` r mat.values[2:3,] ``` ``` ## [,1] [,2] [,3] ## [1,] 2 6 10 ## [2,] 3 7 11 ``` ] .pull-left[ To include values from all columns (so long as they belong to Rows 2 and 3 of `mat.values`), we leave the column argument empty. To extract, say, Columns 1 and 2, we specify 1:2 in the column argument and leave the row argument empty:] .pull-right[ ``` r mat.values[, 1:2] ``` ``` ## [,1] [,2] ## [1,] 1 5 ## [2,] 2 6 ## [3,] 3 7 ## [4,] 4 8 ``` ] --- ### Subsetting a Matrix To extract Rows 2-3 and Columns 1-2, ``` r mat.values[2:3, 1:2] ``` ``` ## [,1] [,2] ## [1,] 2 6 ## [2,] 3 7 ``` We may extract multiple rows or columns using the combine function. Let's filter out Rows 1 and 4: ``` r mat.values[c(1,4), ] ``` ``` ## [,1] [,2] [,3] ## [1,] 1 5 9 ## [2,] 4 8 12 ``` --- ### Describing a Matrix Let's construct a 4-by-3 matrix from elements in the sequence 1:12 (1 to 12). ``` r mat.1 <- matrix(1:12, 4) # 4 rows ``` Let's **summarize** the matrix using the `summary()` function ``` r summary(mat.1) ``` ``` ## V1 V2 V3 ## Min. :1.00 Min. :5.00 Min. : 9.00 ## 1st Qu.:1.75 1st Qu.:5.75 1st Qu.: 9.75 ## Median :2.50 Median :6.50 Median :10.50 ## Mean :2.50 Mean :6.50 Mean :10.50 ## 3rd Qu.:3.25 3rd Qu.:7.25 3rd Qu.:11.25 ## Max. :4.00 Max. :8.00 Max. :12.00 ``` The `summary()` function summarizes the information for each column in the matrix. 
For column `v1`, we can see that the minimum value is 1, maximum value is 4, and mean is 2.5. --- ### Describing a Matrix Let's append a vector of alphabets, a, a, b, b, to the matrix `mat.1` using the column bind function `cbind()`. Then, we summarize the combined matrix, `mat.2` ``` r letters.vec <- c('a', 'a', 'b', 'b') mat.2 <- cbind(mat.1,letters.vec) # column combine summary(mat.2) ``` ``` ## V1 V2 V3 letters.vec ## Length:4 Length:4 Length:4 Length:4 ## Class :character Class :character Class :character Class :character ## Mode :character Mode :character Mode :character Mode :character ``` For column `letters.vec`, there are two entries of 'a' and two entries of 'b'. Because a matrix or vector cannot contain elements from different classes, the numeric values in `mat.1` now become string values. --- ### Describing a Matrix Several useful functions for data description may be applied to matrices. Let's reproduce our data on stocks and find the mean for each stock. .pull-left[ ``` r stock1 <- c(9.5, 9.7, 9.8, 8.7, 8.4) stock2 <- c(203, 209, 215, 206, 198) stock.data2 <- cbind(stock1, stock2) stock.data2 ``` ``` ## stock1 stock2 ## [1,] 9.5 203 ## [2,] 9.7 209 ## [3,] 9.8 215 ## [4,] 8.7 206 ## [5,] 8.4 198 ``` ] .pull-right[ ``` r colMeans(stock.data2) ``` ``` ## stock1 stock2 ## 9.22 206.20 ``` ] The results show that the average price of stock1 is 9.22 and the average price of stock2 is 206.2. --- ### Describing a Matrix There are other useful functions (though doesn't make much sense in the context of stocks), such as * colSums (Sum across each column) * rowMeans (Mean of each row) * rowSums (Sum across each row) ``` r colSums(stock.data2) ``` ``` ## stock1 stock2 ## 46.1 1031.0 ``` ``` r rowMeans(stock.data2) ``` ``` ## [1] 106.25 109.35 112.40 107.35 103.20 ``` ``` r rowSums(stock.data2) ``` ``` ## [1] 212.5 218.7 224.8 214.7 206.4 ``` --- ### Recycling Property R allows you to combine a longer vector with a shorter vector. 
To combine these vectors, R will recycle (i.e. repeat) the elements in the shorter vector to fill in the missing elements.

.pull-left[
In this example, the length of the longer vector is a multiple of the length of the shorter vector.
]

.pull-right[

``` r
v.longer <- 1:10
v.shorter <- c('A', 'B')
v.combined <- cbind(v.longer, v.shorter)
v.combined
```

```
##       v.longer v.shorter
##  [1,] "1"      "A"
##  [2,] "2"      "B"
##  [3,] "3"      "A"
##  [4,] "4"      "B"
##  [5,] "5"      "A"
##  [6,] "6"      "B"
##  [7,] "7"      "A"
##  [8,] "8"      "B"
##  [9,] "9"      "A"
## [10,] "10"     "B"
```
]

---

### Recycling Property

.pull-left[
In this example, the length of the longer vector is not a multiple of the length of the shorter one. Nevertheless, the elements in the shorter vector are recycled to meet the length of the longer vector, although R issues a warning that the number of rows is not a multiple of the shorter vector's length.
]

.pull-right[

``` r
v.longer <- 1:10
v.shorter2 <- c('A','B','C')
cbind(v.longer,v.shorter2)
```
]

---

### Exercise

Try to reproduce the results shown below.

.pull-left[
(1) Using the `matrix()` command, construct a 10-by-5 matrix `z.data` from the sequence 1:50 (1 to 50). You should reproduce
]

.pull-right[

```
##       [,1] [,2] [,3] [,4] [,5]
##  [1,]    1   11   21   31   41
##  [2,]    2   12   22   32   42
##  [3,]    3   13   23   33   43
##  [4,]    4   14   24   34   44
##  [5,]    5   15   25   35   45
##  [6,]    6   16   26   36   46
##  [7,]    7   17   27   37   47
##  [8,]    8   18   28   38   48
##  [9,]    9   19   29   39   49
## [10,]   10   20   30   40   50
```
]

.pull-left[
(2) Create a subset using Columns 1 to 3 from `z.data`. You should reproduce
]

.pull-right[

```
##       [,1] [,2] [,3]
##  [1,]    1   11   21
##  [2,]    2   12   22
##  [3,]    3   13   23
##  [4,]    4   14   24
##  [5,]    5   15   25
##  [6,]    6   16   26
##  [7,]    7   17   27
##  [8,]    8   18   28
##  [9,]    9   19   29
## [10,]   10   20   30
```
]

---

### Exercise

.pull-left[
(3) Create a subset using Rows 5 to 10 from `z.data`.
You should reproduce
]

.pull-right[

```
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    5   15   25   35   45
## [2,]    6   16   26   36   46
## [3,]    7   17   27   37   47
## [4,]    8   18   28   38   48
## [5,]    9   19   29   39   49
## [6,]   10   20   30   40   50
```
]

.pull-left[
(4) Create a subset using Rows 1, 2, and 4 from `z.data`. You should reproduce
]

.pull-right[

```
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   11   21   31   41
## [2,]    2   12   22   32   42
## [3,]    4   14   24   34   44
```
]

---

### Exercise

.pull-left[
(5) Create a subset using Rows 1, 2, and 4 and Columns 2 and 3 from `z.data`. You should reproduce
]

.pull-right[

```
##      [,1] [,2]
## [1,]   11   21
## [2,]   12   22
## [3,]   14   24
```
]

.pull-left[
(6) Name the five columns of `z.data` as `var1` (Column 1), `var2` (Column 2), `var3` (Column 3), `var4` (Column 4) and `var5` (Column 5).

Note: `var1`, `var2`, `var3`, `var4` and `var5` are characters. You need to enclose each of them in quotation marks. You should reproduce
]

.pull-right[

```
##       var1 var2 var3 var4 var5
##  [1,]    1   11   21   31   41
##  [2,]    2   12   22   32   42
##  [3,]    3   13   23   33   43
##  [4,]    4   14   24   34   44
##  [5,]    5   15   25   35   45
##  [6,]    6   16   26   36   46
##  [7,]    7   17   27   37   47
##  [8,]    8   18   28   38   48
##  [9,]    9   19   29   39   49
## [10,]   10   20   30   40   50
```
]

---

### Exercise

.pull-left[
(7) Find the column means of `z.data`, whose columns are now named `var1` (Column 1), `var2` (Column 2), `var3` (Column 3), `var4` (Column 4) and `var5` (Column 5). You should reproduce
]

.pull-right[

```
##  var1  var2  var3  var4  var5
##   5.5  15.5  25.5  35.5  45.5
```
]
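
---

### Exercise

One possible way to approach the exercises above is sketched below. It is only a sketch, not an official solution: the intermediate object names (`sub.cols`, `sub.rows`, `sub.pick`, `sub.both`, `z.means`) are made up for illustration, and using `colnames()` to name the columns is an assumption about the intended approach.

``` r
# (1) 10-by-5 matrix; matrix() fills column by column by default
z.data <- matrix(1:50, nrow = 10)

# (2) Columns 1 to 3: leave the row argument empty
sub.cols <- z.data[, 1:3]

# (3) Rows 5 to 10: leave the column argument empty
sub.rows <- z.data[5:10, ]

# (4) Rows 1, 2, and 4: non-adjacent rows need c()
sub.pick <- z.data[c(1, 2, 4), ]

# (5) Rows 1, 2, and 4 together with Columns 2 and 3
sub.both <- z.data[c(1, 2, 4), 2:3]

# (6) Name the columns (assumption: colnames() is the intended function)
colnames(z.data) <- c("var1", "var2", "var3", "var4", "var5")

# (7) Column means, labelled by the column names set in (6)
z.means <- colMeans(z.data)
z.means
```

Storing each subset in its own object leaves `z.data` unchanged, so every step can be checked against the expected output on the earlier slides.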