rm(list = ls()) # clean up the workspace
Dimensions | Homogeneous | Heterogeneous |
---|---|---|
1d | Atomic vector | List |
2d | Matrix | Data frame |
nd | Array | |
Homogeneous: all contents must be of the same type
Heterogeneous: the contents can be of different types
Vectors are the basic data structure in R.
They come in two flavors: atomic vectors and lists.
Three common properties:
Type, typeof(), what it is.
Length, length(), how many elements it contains.
Attributes, attributes(), additional arbitrary metadata.
There are no scalars in R: a single value is simply a vector of length 1.
Note: is.vector() does not test whether an object is a vector; it returns TRUE only for a vector that has no attributes other than names. Use is.atomic() or is.list() instead.
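A minimal sketch of these properties; the object names x and y are arbitrary choices for illustration:

```r
# A named double vector: its type, length, and attributes
x <- c(a = 1.5, b = 2, c = 3)
typeof(x)      # "double"
length(x)      # 3
attributes(x)  # $names: "a" "b" "c"

# A "single number" is just a vector of length 1
length(42)     # 1

# Adding a non-names attribute makes is.vector() return FALSE,
# while is.atomic() still recognizes an atomic vector
y <- 1:3
attr(y, "note") <- "extra metadata"
is.vector(y)   # FALSE
is.atomic(y)   # TRUE
is.list(list(1, "a"))  # TRUE
```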
There are four common types of atomic vectors (remember Lab 2?)
logical
integer
numeric (actually double)
character
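One way to see the four types is to call typeof() on a literal of each. Note that a plain number such as 1 is stored as a double; the L suffix makes an integer:

```r
typeof(TRUE)   # "logical"
typeof(1L)     # "integer"
typeof(1)      # "double" (reported as "numeric" by class())
typeof("a")    # "character"
```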
Many commands in R generate a vector of output, rather than a single number.
The c() command creates a vector containing the specific elements listed.
Example 1
c(7, 3, 6, 0)
## [1] 7 3 6 0
c(73:60)
## [1] 73 72 71 70 69 68 67 66 65 64 63 62 61 60
c(7:3, 6:0)
## [1] 7 6 5 4 3 6 5 4 3 2 1 0
c(rep(7:3, 6), 0)
## [1] 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 0
Example 2: The command seq() creates a sequence of numbers.
seq(7)
## [1] 1 2 3 4 5 6 7
seq(3, 70, by = 6)
## [1] 3 9 15 21 27 33 39 45 51 57 63 69
seq(3, 70, length.out = 6)
## [1] 3.0 16.4 29.8 43.2 56.6 70.0
Example 3: Nested c() calls are flattened into a single atomic vector.
c(1, c(2, c(3, 4)))
## [1] 1 2 3 4
The elements of a list can be of any type, including other lists.
Construct a list by using list() instead of c().
x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
str(x)
## List of 4
## $ : int [1:3] 1 2 3
## $ : chr "a"
## $ : logi [1:3] TRUE FALSE TRUE
## $ : num [1:2] 2.3 5.9
List elements can also be named and then accessed with $.
x.named <- list(vector = 1:3, name = "a", logical = c(TRUE, FALSE, TRUE), range = c(2.3, 5.9))
str(x.named)
## List of 4
## $ vector : int [1:3] 1 2 3
## $ name : chr "a"
## $ logical: logi [1:3] TRUE FALSE TRUE
## $ range : num [1:2] 2.3 5.9
x.named$vector
## [1] 1 2 3
x.named$range
## [1] 2.3 5.9
Lists are used to build up many of the more complicated data structures in R.
For example, both data frames (another data structure in R) and linear model objects (as produced by lm()) are lists.
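A quick check of this claim, using a small made-up data frame (the names df and fit and the numbers are arbitrary):

```r
df <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
is.list(df)      # TRUE: a data frame is a list of columns
typeof(df)       # "list"

fit <- lm(y ~ x, data = df)
is.list(fit)     # TRUE: an lm object is a list with class "lm"
class(fit)       # "lm"
"coefficients" %in% names(fit)  # TRUE: its components are list elements
```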
All objects can have arbitrary additional attributes to store metadata about the object.
Attributes can be thought of as a named list.
Use attr() to access an individual attribute or attributes() to access all attributes as a list.
By default, most attributes are lost when a vector is modified (for example, by subsetting). Only the most important ones are kept:
Names, a character vector giving each element a name.
Dimensions, used to turn vectors into matrices and arrays.
Class, used to implement the S3 object system.
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")
## [1] "This is a vector"
str(y)
## int [1:10] 1 2 3 4 5 6 7 8 9 10
## - attr(*, "my_attribute")= chr "This is a vector"
str(attributes(y))
## List of 1
## $ my_attribute: chr "This is a vector"
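To see an attribute being dropped, here is a small sketch that subsets a vector carrying both names and a custom attribute (v and w are arbitrary names); only the names survive:

```r
v <- 1:10
names(v) <- letters[1:10]
attr(v, "my_attribute") <- "This is a vector"

w <- v[1:3]              # subsetting returns a modified copy
attributes(w)            # only $names remains: "a" "b" "c"
attr(w, "my_attribute")  # NULL: the custom attribute was dropped
```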
A factor is a vector that can contain only predefined values and is used to store categorical data.
Factors are built on top of integer vectors using two attributes:
the class, “factor”, which makes them behave differently from regular integer vectors
the levels, which define the set of allowed values
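A minimal sketch showing both attributes on a small made-up factor, and what happens when a value outside the levels is assigned:

```r
f <- factor(c("a", "b", "b", "a"))
typeof(f)        # "integer": the underlying storage
attributes(f)    # $levels "a" "b" and $class "factor"
as.integer(f)    # 1 2 2 1: the codes pointing into the levels

# A value that is not one of the levels becomes NA
# (R warns: invalid factor level, NA generated)
f[2] <- "c"
f                # a <NA> b a, Levels: a b
```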
Sometimes when a data frame is read directly from a file, you may get a column of factors instead of numbers because of a non-numeric value in the column (e.g. a missing value encoded specially). Since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so such a column is read as character instead.
Possible remedy: coerce the vector from a factor to a character vector, and then from a character to a double vector.
Better: use the na.strings argument of the read.csv() function.
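A minimal sketch of both fixes. The factor z and the inline CSV text (with "." marking a missing value) are made up for illustration. Note that calling as.numeric() directly on a factor returns the level codes, not the original numbers:

```r
z <- factor(c("10", "5", "20"))
as.numeric(z)               # 1 3 2: the level codes, not the data!
as.double(as.character(z))  # 10 5 20: factor -> character -> double

# Reading data where "." marks a missing value
csv_text <- "value\n12\n1\n.\n90"
df_raw   <- read.csv(text = csv_text)                    # "." keeps the column non-numeric
df_clean <- read.csv(text = csv_text, na.strings = ".")  # "." becomes NA
class(df_raw$value)    # "character" (a factor with stringsAsFactors = TRUE)
class(df_clean$value)  # "integer"
```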
Adding a dim attribute to an atomic vector allows it to behave like a multi-dimensional array.
A matrix is a special case of an array: one with exactly two dimensions.
The matrix() command creates a matrix from the given set of values.
# Two scalar arguments to specify rows and columns
a <- matrix(1:6, ncol = 3, nrow = 2)
# One vector argument to describe all dimensions
b <- array(1:12, c(2, 3, 2))
# You can also modify an object in place by setting dim()
v <- 1:6
dim(v) <- c(3, 2)
v
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
dim(v) <- c(2, 3)
v
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
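To confirm that a matrix is just a vector with a two-element dim attribute, one can inspect the a and b objects created above (re-created here so the snippet runs on its own):

```r
a <- matrix(1:6, ncol = 3, nrow = 2)
b <- array(1:12, c(2, 3, 2))
dim(a)          # 2 3
dim(b)          # 2 3 2
attributes(a)   # $dim: 2 3
is.array(a)     # TRUE: every matrix is an array
is.matrix(b)    # FALSE: b has three dimensions
identical(a, array(1:6, c(2, 3)))  # TRUE: same vector, same dim
```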
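Exercise: Write a command to generate a random permutation of the numbers between 1 and 5 and save it to an object.
set.seed(7360) # the course seed number
perm1 <- order(runif(5)) # the sorting order of 5 uniform draws is a random permutation
perm1
## [1] 3 5 2 4 1
perm2 <- sample(1:5, 5) # sampling all 5 values without replacement
perm2
## [1] 2 1 5 3 4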
Data frames are the most common way of storing data in R.
A data frame is a list of equal-length vectors.
It is a 2-dimensional structure that shares properties of both the matrix and the list.
It has the attributes names (also available through colnames()), row.names (available through rownames()), and class.
length() of a data frame is the length of the underlying list, the same as ncol().
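A small sketch checking these properties on a made-up two-column data frame:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
typeof(df)             # "list"
names(attributes(df))  # names, class, and row.names
colnames(df)           # "x" "y", the same as names(df)
length(df)             # 2: the number of columns
ncol(df)               # 2
nrow(df)               # 3
```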
We will focus more on the tibble, which is a data frame, but more.
Functions are a fundamental building block of R.
Functions are objects in their own right (so they can have attributes()).
All R functions have three parts (the exception is primitive functions such as sum(), which call C code directly and return NULL for all three):
the formals(), the list of arguments, which controls how you can call the function
the body(), the code inside the function
the environment(), the “map” of the location of the function’s variables
f <- function(x) x^2
f
## function(x) x^2
formals(f)
## $x
body(f)
## x^2
environment(f)
## <environment: R_GlobalEnv>
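A short sketch of both points: a function can carry an attribute like any other object (the "description" attribute name is an arbitrary choice), and a primitive such as sum() has no formals or body:

```r
f <- function(x) x^2
# Functions are objects, so they can carry attributes
attr(f, "description") <- "squares its input"
attr(f, "description")  # "squares its input"
f(4)                    # 16: the attribute does not change how f is called

# Primitive functions are the exception to the three-part rule
is.primitive(sum)  # TRUE
formals(sum)       # NULL
body(sum)          # NULL
```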
There is no special syntax for defining and naming a function: simply create a function object (with function) and bind it to a name with <-.
DoNothing <- function() {
return(invisible(NULL))
}
DoNothing()
mean(1:10, na.rm = TRUE)
## [1] 5.5
args <- list(1:10, na.rm = TRUE)
do.call(mean, args)
## [1] 5.5
do.call() calls a function with its arguments supplied as a list, as in the last example.
Now let’s discuss scoping.
R uses lexical scoping, which follows four primary rules:
Name masking
Functions versus variables
A fresh start
Dynamic lookup
x <- 10
y <- 20
g02 <- function(){
x <- 1 # a local variable to the function
y <- 2
c(x, y)
}
g02()
## [1] 1 2
x <- 2
g03 <- function() {
y <- 1
c(x, y)
}
g03()
## [1] 2 1
y
## [1] 20
If a name is not defined inside the current function, R looks in the environment where the function was defined, then one level up from there, and so on, all the way up to the global environment.
Finally, R looks in other loaded packages.
y <- 10
f <- function(x) {
y <- 2
y^2 + g(x)
}
g <- function(x) {
x * y
}
What is the value of f(3)?
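To check your answer after working it out by hand, the snippet below repeats the definitions and evaluates f(3). The key point: g() was defined in the global environment, so the y it finds is the global y (10), not the local y (2) inside f():

```r
y <- 10
f <- function(x) {
  y <- 2         # local to f(); g() never sees it
  y^2 + g(x)
}
g <- function(x) {
  x * y          # y is looked up where g was defined: the global y
}
f(3)             # 2^2 + 3 * 10 = 34
```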
In R, functions are ordinary objects. This means the scoping rules described above also apply to functions.
However, the rules get complicated when functions and non-functions share the same name: when a name is used in a function call, R skips non-function objects while searching.
It is better to avoid giving the same name to different objects.
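A small illustration of that rule; the names g09 and g10 simply continue the numbering used in these notes:

```r
g09 <- function(x) x + 100
g10 <- function() {
  g09 <- 10   # a non-function object with the same name
  g09(g09)    # in call position R finds the function; as an argument, the number 10
}
g10()         # 110
```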
rm(a) # just in case...
g11 <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
a
}
g11()
## [1] 1
g11()
## [1] 1
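Each call to a function starts in a fresh environment, so the a created by one call to g11() does not survive to the next; that is why it returns 1 both times.
What happens if we do the following?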
a <- 1:5
g11()
g11()
Lexical scoping determines where to look for values.
R looks for values when the function is run, not when the function is created.
g12 <- function() x + 1
x <- 15
g12()
## [1] 16
x <- 20
g12()
## [1] 21
Depending on variables defined in the global environment can be bad!
codetools::findGlobals() lists a function's external dependencies, which can help you spot them.
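A minimal sketch, assuming the codetools package is installed (it ships with standard R installations as a recommended package). g12_safe is a hypothetical name for a version that takes x as an argument instead:

```r
g12 <- function() x + 1
codetools::findGlobals(g12)   # "+" "x": x comes from outside the function

# Safer style: pass the value in explicitly
g12_safe <- function(x) x + 1
g12_safe(15)                  # 16
```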
You can define default values for arguments.
Default values can be defined in terms of other arguments, or even in terms of variables defined later in the function.
This works because R uses lazy evaluation: function arguments are only evaluated if they are accessed.
h04 <- function(x = 1, y = x * 2, z = a + b) {
a <- 10
b <- 100
c(x, y, z)
}
h04()
## [1] 1 2 110
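Lazy evaluation can also be seen directly: an argument that is never used is never evaluated, so even an expression that would throw an error is harmless (h01 is an illustrative name):

```r
h01 <- function(x) {
  10          # x is never used
}
h01(stop("This is an error!"))  # 10: the stop() call is never evaluated
```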
... (dot-dot-dot)
Functions can have a special argument called ...
With ..., a function can take any number of additional arguments.
You can use ... to pass those additional arguments on to another function.
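A minimal sketch of both uses; count_args() and my_summary() are hypothetical helpers written only for illustration:

```r
# Collect everything passed through ... into a list and count it
count_args <- function(...) {
  length(list(...))
}
count_args(1, "a", TRUE)   # 3

# Forward extra arguments (here na.rm) on to other functions
my_summary <- function(x, ...) {
  c(mean = mean(x, ...), median = median(x, ...))
}
my_summary(c(1, 3, NA, 10), na.rm = TRUE)   # mean 4.666667, median 3
```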
Pro: ... lets lapply() pass na.rm = TRUE on to mean():
x <- list(c(1, 3, NA), c(4, NA, 6))
str(lapply(x, mean, na.rm = TRUE))
## List of 2
## $ : num 2
## $ : num 5
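Con: a misspelled argument name is silently absorbed by ... instead of raising an error. Here na_rm = TRUE (instead of na.rm = TRUE) is treated as one more value to sum, so the NA is not removed: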
sum(1, 2, NA, na_rm = TRUE)
## [1] NA
These are the basic control-flow constructs of the R language. They function in much the same way as control statements in any Algol-like (Algol short for “Algorithmic Language”) language. They are all reserved words.
keyword | usage |
---|---|
if | if(cond) expr |
if-else | if(cond) cons.expr else alt.expr |
for | for(var in seq) expr |
while | while(cond) expr |
break | breaks out of a for, while, or repeat loop |
next | halts the processing of the current iteration and advances the looping index |
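A short sketch combining for, if, next, break, and while (total and n are arbitrary names):

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # leave the loop entirely once i exceeds 7
  total <- total + i
}
total   # 1 + 3 + 5 + 7 = 16

n <- 1
while (n < 100) n <- n * 2
n       # 128, the first power of 2 that is at least 100
```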
Most functions exit in one of two ways:
return a value, indicating success
throw an error, indicating failure
There are two ways that a function can return a value:
Implicitly, as the value of the last evaluated expression:
j01 <- function(x) {
if (x < 10) {
0
} else {
10
}
}
j01(5)
## [1] 0
j01(15)
## [1] 10
Explicitly, by calling return():
j02 <- function(x) {
if (x < 10) {
return(0)
} else {
return(10)
}
}
Apply invisible() to the last value to return it invisibly, so that it is not printed automatically when the function is called:
j04 <- function() invisible(1)
j04()
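The value is still returned; it just is not printed. A few ways to see it (v is an arbitrary name):

```r
j04 <- function() invisible(1)
j04()                       # prints nothing
print(j04())                # [1] 1
(j04())                     # [1] 1: wrapping in parentheses makes it visible
withVisible(j04())$visible  # FALSE
v <- j04()                  # the value can still be stored
v                           # [1] 1
```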
If a function cannot complete its assigned task, it should throw an error with stop(), which immediately terminates the execution of the function.
j05 <- function() {
stop("I'm an error")
return(10)
}
j05()
## Error in j05(): I'm an error
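A typical use of stop() is validating input before doing any work. safe_log() below is a hypothetical function for illustration; tryCatch() is used only so the error message can be captured without halting the script:

```r
safe_log <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric, not ", class(x)[1])
  }
  log(x)  # only reached when the check passes
}
safe_log(100)   # 4.60517

msg <- tryCatch(safe_log("a"), error = conditionMessage)
msg             # "`x` must be numeric, not character"
```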
Use on.exit() to set up an exit handler that is run regardless of whether the function exits normally or with an error.
Always set add = TRUE when using on.exit(). Otherwise, each call will overwrite the previous exit handler.
j06 <- function(x) {
cat("Hello\n")
on.exit(cat("Goodbye!\n"), add = TRUE)
if (x) {
return(10)
} else {
stop("Error")
}
}
j06(TRUE)
## Hello
## Goodbye!
## [1] 10
j06(FALSE)
## Hello
## Error in j06(FALSE): Error
## Goodbye!
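To see why add = TRUE matters, compare two small illustrative functions that each register two exit handlers:

```r
# Without add = TRUE, the second on.exit() replaces the first handler
no_add <- function() {
  on.exit(cat("first handler\n"))
  on.exit(cat("second handler\n"))
  invisible(NULL)
}
no_add()    # prints only "second handler"

# With add = TRUE, handlers accumulate and run in order
with_add <- function() {
  on.exit(cat("first handler\n"), add = TRUE)
  on.exit(cat("second handler\n"), add = TRUE)
  invisible(NULL)
}
with_add()  # prints "first handler" then "second handler"
```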
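A common use of on.exit() is restoring global state. For example, with_dir() temporarily changes the working directory, evaluates code there, and changes the directory back when it exits:
with_dir <- function(dir, code) {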
old <- setwd(dir)
on.exit(setwd(old), add = TRUE)
code
}
getwd()
## [1] "/Users/xji3/Dropbox/My_Files/Tulane/Teaching/tulane-math-7360-2023.github.io/lectures/06-Data_structure"
with_dir("~", getwd())
## [1] "/Users/xji3"
getwd()
## [1] "/Users/xji3/Dropbox/My_Files/Tulane/Teaching/tulane-math-7360-2023.github.io/lectures/06-Data_structure"