Lecture 1

Author

Lifeng Ren

Published

October 7, 2023

1 Introduction

In this section, we are going to learn:

  • Basic data types in R

  • Matrix/Linear Algebra calculation in R

  • Basic data structures in R

  • Basic data access and results print

2 Data Types

Data Type Description Example Check Function
numeric General number, can be integer or decimal. 5.2, 3L (the L makes it integer) is.numeric()
character Text or string data. "Hello, R!" is.character()
logical Boolean values. TRUE, FALSE is.logical()
integer Whole numbers. 2L, 100L is.integer()
complex Numbers with real and imaginary parts. 3 + 2i is.complex()
raw Raw bytes. charToRaw("Hello") is.raw()
factor Categorical data. Can have ordered and unordered categories. factor(c("low", "high", "medium")) is.factor()

The check function above would return TRUE or FALSE.

We can use several other functions to examine the data type. Most frequent use: class(), length()

Function Description Example Usage Sample Output
typeof() Returns the internal data type of the object (e.g., integer, double, character). typeof(2.5) “double”
class() Retrieves the class or type of the object, indicating how R should handle it. class(data.frame()) “data.frame”
storage.mode() Provides the mode of how the object is stored internally (often similar to typeof()). storage.mode(2L) “integer”
length() Gives the count of elements in an object. For matrices, it’s the product of rows and columns. length(c(1,2,3)) 3
attributes() Shows any metadata attributes associated with the object (e.g., names, dimensions). attributes(matrix(1:4, ncol=2)) List with $dim attribute

2.1 In-Class Exercise: Data Type

Use class(), length() and is.XXX() to examine the data types

  1. Copy the following code and run them in the R script->Data Type Section.
Code
# For 5.2
print(paste("Class of 5.2:", class(5.2)))
print(paste("Length of 5.2:", length(5.2)))
print(paste("Is it a numeric? ", is.numeric(5.2)))

# For 3L
print(paste("Class of 3L:", class(3L)))
print(paste("Length of 3L:", length(3L)))
print(paste("Is it an integer? ", is.integer(3L)))

# For "Hello, R!"
print(paste("Class of 'Hello, R!':", class("Hello, R!")))
print(paste("Length of 'Hello, R!':", length("Hello, R!")))
print(paste("Is it a character? ", is.character("Hello, R!")))

# For TRUE
print(paste("Class of TRUE:", class(TRUE)))
print(paste("Length of TRUE:", length(TRUE)))
print(paste("Is it a logical? ", is.logical(TRUE)))

# For FALSE
print(paste("Class of FALSE:", class(FALSE)))
print(paste("Length of FALSE:", length(FALSE)))
print(paste("Is it a logical? ", is.logical(FALSE)))

# For 2L
print(paste("Class of 2L:", class(2L)))
print(paste("Length of 2L:", length(2L)))
print(paste("Is it an integer? ", is.integer(2L)))

# For 100L
print(paste("Class of 100L:", class(100L)))
print(paste("Length of 100L:", length(100L)))
print(paste("Is it an integer? ", is.integer(100L)))

# For 3 + 2i
print(paste("Class of 3 + 2i:", class(3 + 2i)))
print(paste("Length of 3 + 2i:", length(3 + 2i)))
print(paste("Is it a complex? ", is.complex(3 + 2i)))

# For charToRaw("Hello")
raw_value <- charToRaw("Hello")
print(paste("Class of charToRaw('Hello'):", class(raw_value)))
print(paste("Length of charToRaw('Hello'):", length(raw_value)))
print(paste("Is it raw? ", is.raw(raw_value)))

# For factor(c("low", "high", "medium"))
factor_value <- factor(c("low", "high", "medium"))
print(paste("Class of the factor:", class(factor_value)))
print(paste("Length of the factor:", length(factor_value)))
print(paste("Is it a factor? ", is.factor(factor_value)))

3 Data Structure

In general R handles the following data structures:

Data Structure Description Creation Function Example
Vector Holds elements of the same type. c() c(1, 2, 3, 4)
Matrix Two-dimensional; elements of the same type. matrix() matrix(1:4, ncol=2)
Array Multi-dimensional; elements of the same type. array()
List Can hold elements of different types. list() list(name="John", age=30, scores=c(85, 90, 92))
Data Frame Like a table; columns can be different types. data.frame() data.frame(name=c("John", "Jane"), age=c(30, 25))
Factor For categorical data. factor() factor(c("male", "female", "male"))
Tibble Part of tidyverse; improved data frame. tibble(), as_tibble()
Time Series Used for time series data. ts()

Note: There are multiple ways to create each single data structure, the upper table is only for an example.

In this class, we are going to focus on four data structures:

  • Vector

  • Matrix

  • List

  • Data Frame

3.1 Access to the value

Data Structure Description Example Result Description
Vector Accessing values by index v <- c(10, 20, 30, 40); v[2] Gets the second element: 20
Matrix Accessing rows and columns using indices m <- matrix(1:4, 2, 2); m[1,2] Gets the value in the 1st row, 2nd column: 3
Data Frame Accessing columns by name and rows by index df <- data.frame(x=1:3, y=4:6); df$x Gets the x column: 1, 2, 3
df[1, ] Gets the first row as a data frame
List Accessing elements by index or name lst <- list(a=1, b=2, c=3); lst$a Gets the a element: 1
lst[[2]] Gets the second element: 2
Array Accessing elements using indices in each dimension arr <- array(1:8, dim=c(2,2,2)); arr[1,2,2] Accessing value in the given indices
Factor Accessing levels and values f <- factor(c("low", "high", "medium")); levels(f) Gets the levels of the factor

3.2 Vectors

3.2.1 Exercise: Vectors

  • Part (a) is for in-class use. Part (b) and Challenge Task are for your own practice at home.

  • Please do the exercise first and then check the solution.


Question 1: Basic Vector Creation

(a) In-class: Create a numeric vector named ages that contains the ages of five friends: 21, 23, 25, 27, and 29.

(b) Take-home: Create a character vector named colors with the values: “red”, “blue”, “green”, “yellow”, and “purple”.


Question 2: Accessing Vector Elements

(a) In-class: Print the age of the third friend from the ages vector.

(b) Take-home: Print the last color in the colors vector without using its numeric index.


Question 3: Vector Operations

(a) In-class: Add 2 years to each age in the ages vector.

(b) Take-home: Combine the ages and colors vectors into a single vector named combined. Print this new vector.


Question 4: Vector Filtering

(a) In-class: From the ages vector, filter and print ages less than 27.

(b) Take-home: From the colors vector, find and print colors that have the letter “e” in them.


Challenge Task!

(a) In-class: Reverse the order of the colors vector. (Hint: Think about how you might use the seq() function or indexing.)

(b) Take-home: Using a loop (advanced), print each color from the colors vector with a statement: “My favorite color is [color]”. (Replace [color] with the actual color from the vector.)]


Basic Vector Creation

Code
ages <- c(21, 23, 25, 27, 29)
print(ages)
[1] 21 23 25 27 29
Code
colors <- c("red", "blue", "green", "yellow", "purple")
print(colors)
[1] "red"    "blue"   "green"  "yellow" "purple"

Accessing Vector Elements

Code
#(a)
third_age <- ages[3]
print(third_age)
[1] 25
Code
#(b)
last_color <- tail(colors, n=1)
print(last_color)
[1] "purple"

Vector Operations

Code
#(a)
ages <- ages + 2
print(ages)
[1] 23 25 27 29 31
Code
#(b)
combined <- c(ages, colors)
print(combined)
 [1] "23"     "25"     "27"     "29"     "31"     "red"    "blue"   "green" 
 [9] "yellow" "purple"

Vector Filtering

Code
#(a)
young_ages <- ages[ages < 27]
print(young_ages)
[1] 23 25
Code
#(b)
e_colors <- colors[grepl("e", colors)]
print(e_colors)
[1] "red"    "blue"   "green"  "yellow" "purple"

Challenge Task!

Code
#(a)
reversed_colors <- colors[rev(seq_along(colors))]
print(reversed_colors)
[1] "purple" "yellow" "green"  "blue"   "red"   
Code
#(b)
for (color in colors) {
  cat("My favorite color is", color, "\n")
}
My favorite color is red 
My favorite color is blue 
My favorite color is green 
My favorite color is yellow 
My favorite color is purple 

3.3 Matrix

For the matrix data structure, you need to know three things:

  • How to create a new Matrix

  • How to do Matrix/Linear Algebra Operations

  • How to access the Matrix specific cell.

  • Review the Appendix: Matrix Operations for more information


3.3.1 How to create a new Matrix

To create a matrix in R, you can use the matrix() function. Here’s an example:

Code
# Create a matrix from a vector with 3 rows and 2 columns
my_matrix <- matrix(1:6, nrow = 3, ncol = 2)
my_matrix
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

In the above example, the sequence 1:6 generates a vector containing numbers from 1 to 6. This is then used to fill a matrix with 3 rows and 2 columns.


3.3.2 How to do Matrix/Linear Algebra Operations

Matrix operations in R are straightforward. You can use common arithmetic operations (+, -, *, /) for element-wise operations or use specific functions for matrix-specific operations.

For instance, matrix multiplication (a common operation in linear algebra) can be done using the %*% operator:

Code
# Create two matrices
A <- matrix(c(1, 2, 3, 4), nrow=2)
B <- matrix(c(2, 0, 1, 3), nrow=2)

# Matrix multiplication
result <- A %*% B
result
     [,1] [,2]
[1,]    2   10
[2,]    4   14

Note on Matrix Operators:

The difference between * and %*% can be illustrated with an example:

Let’s say we have two matrices:

mat1 = matrix(c(1, 2, 3, 4), nrow=2)
mat2 = matrix(c(2, 0, 1, 3), nrow=2)

Using *, we get element-wise multiplication:

mat1 * mat2

This will give:

     [,1] [,2]
[1,]    2    3
[2,]    0   12

Using %*%, we get standard matrix multiplication:

mat1 %*% mat2

This will give:

     [,1] [,2]
[1,]    4    6
[2,]   10   12

Thus, * multiplies corresponding elements of the matrices, whereas %*% performs matrix multiplication as defined in linear algebra.

Here, matrices A and B are multiplied together to get the result matrix.


3.3.3 How to access the Matrix specific cell

You can access a specific cell of a matrix by using the row and column indices. Here’s how you can do it:

Code
# Using the previous matrix, let's access the element at the 2nd row and 1st column
element <- my_matrix[2, 1]
element
[1] 2

The value in the second row and first column of my_matrix is extracted and stored in the element variable.


3.3.4 Exercise: Basic Matrix Algebra Calculation


Question 1: Mathematical Functions

(a) In-class:

Given the sequence seq1 <- c(3, 4, 12, 16, 5), calculate:

  1. Square root of each element.
  2. Absolute value after subtracting 7 from each element.
  3. Natural logarithm of each element.

(b) Take-home:

Given the sequence seq2 <- c(8, 14, 7, 5, 9), calculate:

  1. Factorial of each element.
  2. The exponential value of each element.
  3. Trigonometric sine of each element.

Question 2: Basic Statistics

(a) In-class:

Given the sequence data1 <- c(2, 4, 6, 8, 10), calculate:

  1. Mean value.
  2. Median value.
  3. Standard deviation.

(b) Take-home:

Given the sequence data2 <- c(3, 5, 8, 9, 12), calculate:

  1. Variance.
  2. Minimum and maximum values.
  3. The 1st quantile (25th percentile).

Question 3: Matrix Calculation

(a) In-class:

Given the matrices:

matA = matrix(c(2, 3, 1, 5), nrow=2)
matB = matrix(c(1, 0, 2, 4), nrow=2)
  1. Perform matrix addition between matA and matB.
  2. Perform element-wise multiplication between matA and matB.
  3. Transpose matA.

(b) Take-home:

  1. Perform matrix multiplication between matA and matB.
  2. Calculate the determinant of matA.
  3. Find the eigenvalues of matA.

Challenging Question:

Given the matrices:

matX = matrix(c(4, 3, 2, 1), nrow=2)
matY = matrix(c(1, 2, 3, 4), nrow=2)
  1. Prove or disprove: The matrix product of matX and matY is commutative. (i.e., show whether matX %*% matY is equal to matY %*% matX or not).

Answer Keys for the In-class Exercise on R Vectors and Matrix Operations


Question 1: Mathematical Functions

(a) In-class:

Given the sequence seq1 <- c(3, 4, 12, 16, 5):

  1. Square root of each element:
Code
seq1 <- c(3, 4, 12, 16, 5)
sqrt(seq1)
[1] 1.732051 2.000000 3.464102 4.000000 2.236068
  1. Absolute value after subtracting 7 from each element:
Code
abs(seq1 - 7)
[1] 4 3 5 9 2
  1. Natural logarithm of each element:
Code
log(seq1)
[1] 1.098612 1.386294 2.484907 2.772589 1.609438

(b) Take-home:

Given the sequence seq2 <- c(8, 14, 7, 5, 9):

  1. Factorial of each element:
Code
seq2 <- c(8, 14, 7, 5, 9)
factorial(seq2)
[1]       40320 87178291200        5040         120      362880
  1. The exponential value of each element:
Code
exp(seq2)
[1]    2980.9580 1202604.2842    1096.6332     148.4132    8103.0839
  1. Trigonometric sine of each element:
Code
sin(seq2)
[1]  0.9893582  0.9906074  0.6569866 -0.9589243  0.4121185

Question 2: Basic Statistics

(a) In-class:

Given the sequence data1 <- c(2, 4, 6, 8, 10):

  1. Mean value:
Code
data1 <- c(2, 4, 6, 8, 10)
mean(data1)
[1] 6
  1. Median value:
Code
median(data1)
[1] 6
  1. Standard deviation:
Code
sd(data1)
[1] 3.162278

(b) Take-home:

Given the sequence data2 <- c(3, 5, 8, 9, 12):

  1. Variance:
Code
data2 <- c(3, 5, 8, 9, 12)
var(data2)
[1] 12.3
  1. Minimum and maximum values:
Code
min(data2)
[1] 3
Code
max(data2)
[1] 12
  1. The 1st quantile (25th percentile):
Code
quantile(data2, 0.25)
25% 
  5 

Question 3: Matrix Calculation

(a) In-class:

Given the matrices:

Code
matA = matrix(c(2, 3, 1, 5), nrow=2)
matB = matrix(c(1, 0, 2, 4), nrow=2)
  1. Matrix addition between matA and matB:
Code
matA + matB
     [,1] [,2]
[1,]    3    3
[2,]    3    9
  1. Element-wise multiplication between matA and matB:
Code
matA * matB
     [,1] [,2]
[1,]    2    2
[2,]    0   20
  1. Transpose matA:
Code
t(matA)
     [,1] [,2]
[1,]    2    3
[2,]    1    5

(b) Take-home:

  1. Matrix multiplication between matA and matB:
Code
matA %*% matB
     [,1] [,2]
[1,]    2    8
[2,]    3   26
  1. Determinant of matA:
Code
det(matA)
[1] 7
  1. Eigenvalues of matA:
Code
eigen(matA)$values
[1] 5.791288 1.208712

Challenging Question:

Given the matrices:

Code
matX = matrix(c(4, 3, 2, 1), nrow=2)
matY = matrix(c(1, 2, 3, 4), nrow=2)
  1. Matrix product of matX and matY versus matY and matX:
Code
matX %*% matY
     [,1] [,2]
[1,]    8   20
[2,]    5   13
Code
matY %*% matX
     [,1] [,2]
[1,]   13    5
[2,]   20    8

As these results are not equal, the matrix product is not commutative for matX and matY.


3.4 List

Lists in R are a type of data structure that allow you to store elements of different types (e.g., numbers, strings, vectors, and even other lists). Here’s a comprehensive tutorial on using lists in R:


3.4.1 How to create a new List

To create a list in R, you can use the list() function. Here’s how:

Code
# Create a list containing a number, a character string, and a vector
my_list <- list(age = 25, name = "John", scores = c(85, 90, 95))
my_list
$age
[1] 25

$name
[1] "John"

$scores
[1] 85 90 95

The above code creates a list my_list with three elements: an age, a name, and a vector of scores.


3.4.2 How to modify and add elements to a List

You can modify an existing list element or add a new element by using the [[ ]] operator.

Code
# Modify the age
my_list[[1]] <- 26

# Add a new element, address
my_list$address <- "123 R Street"

my_list
$age
[1] 26

$name
[1] "John"

$scores
[1] 85 90 95

$address
[1] "123 R Street"

In this example, the age is modified, and a new element address is added to the list.


3.4.3 How to access elements in a List

To access elements in a list, you can use either the [[ ]] operator or the $ operator:

Code
# Access the name using double square brackets
person_name <- my_list[[2]]

# Access scores using the dollar sign
test_scores <- my_list$scores

person_name
[1] "John"
Code
test_scores
[1] 85 90 95

Here, the second element of my_list (name) is accessed using [[ ]], and the scores are accessed using $.


3.4.4 How to remove elements from a List

You can remove an element from a list by setting it to NULL:

Code
# Remove the address element
my_list$address <- NULL

my_list
$age
[1] 26

$name
[1] "John"

$scores
[1] 85 90 95

The element address is removed from the list in this example.


Remember, lists are versatile and can hold heterogeneous data, making them crucial for various applications in R, especially when you need to organize and structure diverse data types.

3.4.5 Exercise: List


Question 1: Creating and Modifying Lists

(a) In-class:

  1. Create a list named student_info containing the following elements:
    • name: “Alice”
    • age: 20
    • subjects: a vector with “Math”, “History”, “Biology”
    Display the created list.

(b) Take-home:

  1. Add two more elements to the student_info list:
    • grades: a vector with scores 90, 85, 88 corresponding to the subjects.
    • address: “123 Main St”
    Display the updated list.

Question 2: Accessing and Analyzing List Elements

(a) In-class:

  1. From the student_info list, extract and print:
    • The name of the student.
    • The subjects they are studying.

(b) Take-home:

  1. Calculate and display:
    • The average grade of the student using the grades element from the list.
    • The number of subjects the student is studying.

Question 3: Nested Lists

(a) In-class:

  1. Create a nested list named school_info with the following structure:
    • school_name: “Greenwood High”
    • students: a list containing two elements:
      • student1: the student_info list you created in Question 1.
      • student2: a new list with name as “Bob”, age as 22, and subjects with “Physics”, “Math”, “English”.
    Display the created nested list.

(b) Take-home:

  1. Add a new student, student3, to the students list in school_info with your own details. Display the updated school_info list.

Challenging Question:

Given the school_info list:

  1. Write a function named get_average_grade that takes in the school_info list and a student name as arguments. The function should return the average grade for the given student. If the student does not exist in the list or has no grades, return an appropriate message. Test your function with student1, student2, and another name not in the list.

Question 1: Creating and Modifying Lists

(a) In-class:

Code
# Creating the student_info list
student_info <- list(
  name = "Alice",
  age = 20,
  subjects = c("Math", "History", "Biology")
)

# Displaying the created list
student_info
$name
[1] "Alice"

$age
[1] 20

$subjects
[1] "Math"    "History" "Biology"

(b) Take-home:

Code
# Adding more elements to the list
student_info$grades <- c(90, 85, 88)
student_info$address <- "123 Main St"

# Displaying the updated list
student_info
$name
[1] "Alice"

$age
[1] 20

$subjects
[1] "Math"    "History" "Biology"

$grades
[1] 90 85 88

$address
[1] "123 Main St"

Question 2: Accessing and Analyzing List Elements

(a) In-class:

Code
# Extracting and printing the name and subjects
student_name <- student_info$name
student_subjects <- student_info$subjects

student_name
[1] "Alice"
Code
student_subjects
[1] "Math"    "History" "Biology"

(b) Take-home:

Code
# Calculating the average grade
average_grade <- mean(student_info$grades)

# Counting the number of subjects
num_subjects <- length(student_info$subjects)

average_grade
[1] 87.66667
Code
num_subjects
[1] 3

Question 3: Nested Lists

(a) In-class:

Code
# Creating the nested list school_info
school_info <- list(
  school_name = "Greenwood High",
  students = list(
    student1 = student_info,
    student2 = list(name = "Bob", age = 22, subjects = c("Physics", "Math", "English"))
  )
)

# Displaying the created nested list
school_info
$school_name
[1] "Greenwood High"

$students
$students$student1
$students$student1$name
[1] "Alice"

$students$student1$age
[1] 20

$students$student1$subjects
[1] "Math"    "History" "Biology"

$students$student1$grades
[1] 90 85 88

$students$student1$address
[1] "123 Main St"


$students$student2
$students$student2$name
[1] "Bob"

$students$student2$age
[1] 22

$students$student2$subjects
[1] "Physics" "Math"    "English"

(b) Take-home:

Code
# Adding student3 to the students list
school_info$students$student3 <- list(name = "Charlie", age = 23, subjects = c("Chemistry", "Music", "Art"))

# Displaying the updated school_info list
school_info
$school_name
[1] "Greenwood High"

$students
$students$student1
$students$student1$name
[1] "Alice"

$students$student1$age
[1] 20

$students$student1$subjects
[1] "Math"    "History" "Biology"

$students$student1$grades
[1] 90 85 88

$students$student1$address
[1] "123 Main St"


$students$student2
$students$student2$name
[1] "Bob"

$students$student2$age
[1] 22

$students$student2$subjects
[1] "Physics" "Math"    "English"


$students$student3
$students$student3$name
[1] "Charlie"

$students$student3$age
[1] 23

$students$student3$subjects
[1] "Chemistry" "Music"     "Art"      

Challenging Question:

Code
# Function to get the average grade of a student
get_average_grade <- function(school_info, student_name) {
  student_data <- school_info$students[[student_name]]
  if (!is.null(student_data) && !is.null(student_data$grades)) {
    return(mean(student_data$grades))
  } else {
    return(paste("No grades found for", student_name))
  }
}

# Testing the function
get_average_grade(school_info, "student1")
[1] 87.66667
Code
get_average_grade(school_info, "student2")
[1] "No grades found for student2"
Code
get_average_grade(school_info, "John")
[1] "No grades found for John"

3.5 Data Frame

A data frame is a table-like structure that stores data in rows and columns. Each column in a data frame can be of a different data type (e.g., numeric, character, factor), but all elements within a column must be of the same type. This makes data frames ideal for representing datasets.

3.5.1 Creating a Data Frame

Data frames can be created using the data.frame() function.

Code
# Example
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(20, 21, 19),
  Grade = c("A", "B", "A")
)

print(students)
     Name Age Grade
1   Alice  20     A
2     Bob  21     B
3 Charlie  19     A

3.5.2 Accessing Data in Data Frames (Indexing)

In R, data frames are similar to tables in that they store data in rows and columns. Each row represents an observation and each column a variable. Indexing in data frames refers to accessing specific rows, columns, or cells of the data frame.

By Column Name

You can access the data of a specific column using the $ operator or double square brackets.

Code
# Creating a sample data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(20, 21, 19),
  Grade = c("A", "B", "A")
)

# Accessing the 'Name' column using the `$` operator
names <- df$Name
print(names)
[1] "Alice"   "Bob"     "Charlie"
Code
# Accessing the 'Age' column using double square brackets
ages <- df[["Age"]]
print(ages)
[1] 20 21 19

By Column Index

You can also access a column by its numeric index.

Code
# Accessing the first column
first_column <- df[,1]
print(first_column)
[1] "Alice"   "Bob"     "Charlie"

By Row Index

You can access specific rows using their numeric indices.

Code
# Accessing the first and third rows
rows_1_and_3 <- df[c(1,3), ]
print(rows_1_and_3)
     Name Age Grade
1   Alice  20     A
3 Charlie  19     A

By Row and Column Indices

You can access a specific cell of the data frame using its row and column indices.

Code
# Accessing the age of the second student
age_of_second <- df[2, 2]
print(age_of_second)
[1] 21

By Row and Column Names

You can also use row and column names to access specific cells. Note: By default, data frames in R do not have row names unless explicitly set.

Code
# Setting row names for our data frame
rownames(df) <- c("Student1", "Student2", "Student3")

# Accessing the grade of the third student using row and column names
grade_of_third <- df["Student3", "Grade"]
print(grade_of_third)
[1] "A"

3.5.3 Modifying a Data Frame

We will talk more about this section in the Data Cleaning Session. But we can see an example on columns manipulation:

Code
# Adding a new column
students$Major <- c("Math", "Biology", "Physics")
print(students)
     Name Age Grade   Major
1   Alice  20     A    Math
2     Bob  21     B Biology
3 Charlie  19     A Physics
Code
# Modifying a column
students$Age <- students$Age + 1
print(students)
     Name Age Grade   Major
1   Alice  21     A    Math
2     Bob  22     B Biology
3 Charlie  20     A Physics
Code
# Removing a column
students$Major <- NULL
print(students)
     Name Age Grade
1   Alice  21     A
2     Bob  22     B
3 Charlie  20     A

3.5.4 Useful Functions for Data Frames

Here are some functions that are particularly useful when working with data frames:

  • head(df): Displays the first six rows of the data frame df.
  • tail(df): Displays the last six rows of the data frame df.
  • str(df): Provides a structured overview of the data frame df, showing the data types of each column and the first few entries of each column.
  • dim(df): Returns the dimensions (number of rows and columns) of the data frame df.
  • summary(df): Provides a statistical summary of each column in the data frame df.

3.5.5 Load and Save Data Frames in R

Handling data is one of the most essential aspects of data analysis in R. In this section, we’ll explore how to load data into R from external sources and save it for future use.

Read a CSV File

To read a CSV (Comma-Separated Values) file and store its contents as a data frame, use the read.csv() function.

Code
# Load a CSV file into a data frame
df <- read.csv("path_to_file.csv")

# Display the first few rows of the data frame
head(df)

Saving to CSV Files

To save a data frame to a CSV file, use the write.csv() function.

Code
# Save a data frame to a CSV file
write.csv(df_csv, "path_to_output_file.csv", row.names = FALSE)

# Note: `row.names = FALSE` ensures that row names are not written to the CSV.

Loading Data Frames from Excel Files

To work with Excel files, you might need external packages like readxl and writexl. Steps should be:

  • Install and load the haven package:install.packages("readxl")
  • Load the library: library(readxl)
  • Use data using: df_excel <- read_excel("path_to_file.xlsx")

Loading Data Frames from STATA’s .dta File

To read .dta files from Stata into R, follow these steps:

  • Install and load the haven package:install.packages("haven")
  • Load the library: library(haven)
  • Use data using: df_dta <- read_dta("path_to_file.dta")

4 Summary

By finishing this lecture, you should be able to:

  • Understand the basic data types, and data structures in R

  • For Vector, Matrix, List type of data structures, you should know:

    • Access to the value

    • Basic operations

  • Know how to save, read dataframe.


5 Comprehensive Challenge Project

  • Do not do this project this week, as I know you are busy

  • This is just for your own practice later and someone who is only use the notes to self-study.

5.1 Introduction

In this challenge project, you will demonstrate your understanding of the basic data types, structures in R, and manipulate built-in data sets to gain insights. This project will test your proficiency in accessing values in vectors, matrices, lists, and data frames, performing basic operations, and saving & reading data frames.

5.2 Dataset

We will use the built-in dataset mtcars. This data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

5.3 Tasks

5.3.1 Basic Data Types and Structures

(a) Vectors

  • Create a numeric vector that represents the miles per gallon (mpg) of the mtcars dataset.
  • Calculate and print the average mpg.
  • Access and print the mpg of the 10th car.

(b) Matrices

  • Convert the first 5 rows and 3 columns of the mtcars dataset into a matrix.
  • Perform a matrix operation: Multiply the above matrix by 2 and print the result.

(c) Lists

  • Create a list that contains:
    • A numeric vector of horsepower (hp) of the cars.
    • A character vector of car model names.
  • Access and print the name of the 5th car from the list.

5.3.2 Data Frame Operations

  • Access and print the details of the car with the highest horsepower.
  • Save this single-row data frame to a CSV file named “high_hp_car.csv”.
  • Read the “high_hp_car.csv” file back into R and print its contents to confirm the saved data.

5.3.3 Comprehensive Exploration

  • Filter cars that have an mpg greater than 20 and less than 6 cylinders.
  • For these filtered cars, calculate:
    • The average horsepower.
    • The median weight.
    • The number of cars with manual transmission (am column value is 1).

5.3.4 Challenge Question!

  • Create a matrix of dimensions 3x3 using the mpg, hp, and wt (weight) columns for the first three cars.
  • Invert this matrix. (Hint: You can use the solve() function.)
  • Check if the matrix is singular before inversion (its determinant should not be zero).

5.4 Deliverables

  1. An R script containing all the operations performed and any auxiliary functions created.
  2. The “high_hp_car.csv” file.
  3. (Optional) A brief report generated by rmd with output (in html PDF file).

5.5 Solutions

You can find the solution here.


6 Appendix

6.1 Math Operations

Function Description Example Result
sqrt() Square root sqrt(9) 3
abs() Absolute value abs(-10) 10
log() Natural logarithm (base e) log(2.72) 1
log10() Logarithm base 10 log10(100) 2
exp() Exponential function (base e) exp(1) 2.71828 (approximately e)
factorial() Factorial of a number factorial(4) 24
^ Exponentiation 2^3 8
round() Rounds a number round(2.678, 2) 2.68
ceiling() Rounds up to the nearest integer ceiling(2.1) 3
floor() Rounds down to the nearest integer floor(2.9) 2
trunc() Removes the decimal part trunc(2.9) 2
sin(), cos(), tan() Trigonometric functions sin(pi/2) 1

6.2 Basic Statistical Operations

Function Description Example Result
mean() Arithmetic mean (average) mean(c(1, 2, 3, 4, 5)) 3
median() Median (middle value) median(c(1, 3, 5, 7, 9)) 5
sd() Standard deviation sd(c(1, 2, 3, 4, 5)) 1.5811
var() Variance var(c(1, 2, 3, 4, 5)) 2.5
min() Minimum value min(c(2, 5, 1, 8, 7)) 1
max() Maximum value max(c(2, 5, 1, 8, 7)) 8
range() Range (min and max) range(c(2, 5, 1, 8, 7)) 1, 8
sum() Sum of values sum(c(1, 2, 3, 4, 5)) 15
quantile() Quantiles (e.g., quartiles) quantile(c(1, 2, 3, 4, 5), 0.25) 1.5
cor() Correlation coefficient cor(c(1, 2, 3), c(3, 2, 1)) -1
cov() Covariance cov(c(1, 2, 3), c(3, 2, 1)) -1
table() Frequency table of factors table(c("A", "A", "B", "B", "C")) A: 2, B: 2, C: 1
prop.table() Proportional table prop.table(table(c("A", "A", "B", "B", "C"))) A: 0.4, B: 0.4, C: 0.2

6.3 Vector Operations

Description Code
Create a sequence nums = 2:6
Generate repeated numbers sevens = rep(7, times=5)
Repeat a string vec_hello = rep("Hello", times = 7)
Repeat a string multiple times vec_hello_20 = rep(vec_hello, times = 2)
Sequence with incremental steps vec_range = seq(from = 3.2, to = 4.5, by = 0.2)
Initialize a zero vector outputs = integer(length=7)
Combine vectors vec_merge = c(nums, rep(seq(2, 6, 2), 2), c(2, 3, 5), 77, 5:3)
Check vector length length(vec_merge)
Access the first value vec2[1]
Omit the first value vec2[-1]
Access elements using an index range vec2[2:4]
Select specific elements vec2[c(2,3,6)]
Add a scalar to a vector vec2 + 2
Multiply all elements by a scalar 3 * vec2
Compute square root sqrt(vec2)
Add two vectors vec2 + 3*vec2
Compare vector with a scalar vec2 > 4
Check if vector contains an exact value vec2 == 3
Logical AND condition vec2 > 3 & vec2!=4
Logical OR condition vec2 > 3 | vec2!=5
Filter elements based on a condition vec2[vec2>4]
Index of elements that meet a condition which(vec2>6)
Largest element in vector max(vec2)
Index of largest element which.max(vec2)
Smallest element in vector min(vec2)
Index of smallest element which.min(vec2)
Sum of all elements in a vector sum(vec2)
Product of all elements in a vector prod(vec2)
Mean value of a vector mean(vec2)
Median value of a vector median(vec2)

6.4 Matrix Operation

Operation Symbol/Function Description Example
Matrix Addition + Element-wise addition of two matrices mat1 + mat2
Matrix Subtraction - Element-wise subtraction of two matrices mat1 - mat2
Element-wise Multiplication * Multiplies each element of one matrix with the corresponding element of another mat1 * mat2
Matrix Multiplication %*% Standard matrix multiplication mat1 %*% mat2
Matrix Transposition t() Transposes a matrix (rows become columns and vice versa) t(mat1)
Matrix Inversion solve() Inverts a matrix (only for square matrices) solve(mat1)
Determinant det() Calculates the determinant of a matrix det(mat1)
Eigenvalues and Eigenvectors eigen() Calculates the eigenvalues and eigenvectors of a matrix eigen(mat1)

7 Reference

Back to top