R Basics, cont

Announcement
First functions to learn
Locating and deleting objects:
Vectors
Operations on vectors
Matrix
R commands on vector/matrix
Comparison (logic operator)
Other operators
Control flow
Define a function

Announcement

Email me your GitHub user name and accept the invitation to the course organization
HW1 posted (due in 2 weeks on Sept. 8th, 2023)
- learn from the past: start early
Project description (one page) due in two weeks on Sept. 8th, 2023

First functions to learn

symbol	use
?	get documentation
str	show structure

test.str <- 1:6
str(test.str)

##  int [1:6] 1 2 3 4 5 6

Locating and deleting objects:

The commands objects() and ls() will provide a list of every object that you’ve created in a session.

objects()

## [1] "test.str"

ls()

## [1] "test.str"

The rm() and remove() commands let you delete objects (tip: always clean up your workspace with the first command of a script)

rm(list=ls())  # clean up workspace
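rm() can also delete individual objects by name. A minimal sketch, where the object names tmp.a and tmp.b are made up for illustration:

```r
# Two throwaway objects (hypothetical names, for illustration only)
tmp.a <- 1
tmp.b <- 2

rm(tmp.a)          # delete only tmp.a

exists("tmp.a")    # FALSE: the object is gone
exists("tmp.b")    # TRUE: other objects are untouched
```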

Vectors

Many commands in R generate a vector of output, rather than a single number.

The c() command combines its arguments into a vector.

Example 1

c(7, 3, 6, 0)

## [1] 7 3 6 0

c(73:60)

##  [1] 73 72 71 70 69 68 67 66 65 64 63 62 61 60

c(7:3, 6:0)

##  [1] 7 6 5 4 3 6 5 4 3 2 1 0

c(rep(7:3, 6), 0)

##  [1] 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 0

Example 2 The command seq() creates a sequence of numbers.

seq(7)

## [1] 1 2 3 4 5 6 7

seq(3, 70, by = 6)

##  [1]  3  9 15 21 27 33 39 45 51 57 63 69

seq(3, 70, length.out = 6)

## [1]  3.0 16.4 29.8 43.2 56.6 70.0

Operations on vectors

Use brackets to select elements of a vector. A negative index drops the listed positions instead of selecting them.

x <- 73:60
x[2]

## [1] 72

x[2:5]

## [1] 72 71 70 69

x[-(2:5)]

##  [1] 73 68 67 66 65 64 63 62 61 60

Elements can also be accessed by “name”, which stays correct even if the order of the elements (or of rows/columns) changes.

y <- 1:3
names(y) <- c("do", "re", "mi")
y[3]

## mi 
##  3

y["mi"]

## mi 
##  3
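The following sketch illustrates why name-based access is safer. The vector scores is a made-up example: after the elements are reordered, a fixed position returns a different element, while the name still finds the same one.

```r
# Hypothetical named vector, for illustration only
scores <- c(do = 1, re = 2, mi = 3)

# Reverse the order of the elements; the names travel with the values
scores.rev <- rev(scores)

by.position <- scores.rev[3]     # position 3 now holds "do", not "mi"
by.name <- scores.rev["mi"]      # the name still picks "mi"

print(by.position)
print(by.name)
```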

Matrix

The matrix() command creates a matrix from the given set of values. Because rnorm() draws random numbers, your output will differ from the values shown here.

matrix.example <- matrix(rnorm(100), nrow = 10, ncol = 10, byrow = TRUE)
matrix.example

##              [,1]        [,2]        [,3]        [,4]        [,5]        [,6]
##  [1,] -0.79247787 -0.40769726 -0.36242586 -0.86068473  0.61676550  0.60389301
##  [2,] -0.29342927  1.95626805 -3.25017198 -0.14644821  0.01868993 -0.97801917
##  [3,]  0.31710851  0.20475630 -0.04035274 -0.97646302  1.35850653  0.12491491
##  [4,]  1.41671864  0.07356081 -1.21213895  0.08046383  2.27805421  0.06189492
##  [5,]  0.57611802 -0.76400165 -2.60136335 -1.25164721 -1.46517943  1.46662665
##  [6,]  0.94900206 -0.16397246  0.89301258 -0.31142476 -1.70174669 -1.07300719
##  [7,] -0.03117343 -0.64567427  0.39518972 -0.14867986 -0.06920866  1.11953956
##  [8,] -2.49109292  1.20514550  1.65177244  1.81235849  1.26707910  1.95343199
##  [9,]  0.70165190 -0.21433470 -0.59936898  0.66333999 -1.70872981 -0.10206605
## [10,] -1.49453257 -1.30447995  1.41099157 -0.42615005 -1.14678580  0.47538729
##             [,7]        [,8]        [,9]       [,10]
##  [1,] -0.5513043 -0.56339787 -0.14265374  0.44419440
##  [2,] -1.1681861 -0.20979295  0.79762950  0.05981776
##  [3,] -0.5858411 -0.21534572  0.81169681 -1.14022515
##  [4,]  1.0369276  1.13687680  1.11344061 -0.06540517
##  [5,]  1.5293385  0.04091718 -0.65668681  0.27070602
##  [6,]  0.1126170 -0.60767039 -1.67410183  0.96186329
##  [7,]  1.0854858 -0.45621575  0.05359429 -0.09726792
##  [8,]  0.1648262  0.78698222 -0.86067254  1.30975774
##  [9,]  1.8252697 -0.94672705  0.01949460  1.30205411
## [10,]  1.6317522  0.32867626  1.88074098  0.56400994
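With random values it is hard to see what byrow does, so here is a small sketch with the values 1 to 6 (the object names by.row and by.col are chosen for illustration). It also shows how to index a matrix with [row, column].

```r
# Fill the values 1..6 into a 2 x 3 matrix in two different orders
by.row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # rows are 1 2 3 and 4 5 6
by.col <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE)  # columns are 1 2, 3 4 and 5 6

print(by.row)
print(by.col)

# Indexing: [row, column]; leave one side empty to take a whole row or column
by.row[2, 3]   # single element: 6
by.row[1, ]    # first row: 1 2 3
by.row[, 2]    # second column: 2 5
```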

R commands on vector/matrix

command	usage
sum()	sum over elements in vector/matrix
mean()	compute average value
sort()	sort all elements of a vector/matrix (a matrix is returned as a sorted vector)
min(), max()	min and max values of a vector/matrix
length()	number of elements in a vector/matrix
summary()	returns the min, Q1, median, mean, Q3, and max values of a vector (column by column for a matrix)
dim()	dimension of a matrix
cbind()	combine vector, matrix or data-frame arguments by columns
rbind()	combine vector, matrix or data-frame arguments by rows
names()	get or set names of an object
colnames()	get or set column names of a matrix-like object
rownames()	get or set row names of a matrix-like object

sum(matrix.example)

## [1] 5.944488

mean(matrix.example)

## [1] 0.05944488

sort(matrix.example)

##   [1] -3.25017198 -2.60136335 -2.49109292 -1.70872981 -1.70174669 -1.67410183
##   [7] -1.49453257 -1.46517943 -1.30447995 -1.25164721 -1.21213895 -1.16818612
##  [13] -1.14678580 -1.14022515 -1.07300719 -0.97801917 -0.97646302 -0.94672705
##  [19] -0.86068473 -0.86067254 -0.79247787 -0.76400165 -0.65668681 -0.64567427
##  [25] -0.60767039 -0.59936898 -0.58584106 -0.56339787 -0.55130432 -0.45621575
##  [31] -0.42615005 -0.40769726 -0.36242586 -0.31142476 -0.29342927 -0.21534572
##  [37] -0.21433470 -0.20979295 -0.16397246 -0.14867986 -0.14644821 -0.14265374
##  [43] -0.10206605 -0.09726792 -0.06920866 -0.06540517 -0.04035274 -0.03117343
##  [49]  0.01868993  0.01949460  0.04091718  0.05359429  0.05981776  0.06189492
##  [55]  0.07356081  0.08046383  0.11261702  0.12491491  0.16482621  0.20475630
##  [61]  0.27070602  0.31710851  0.32867626  0.39518972  0.44419440  0.47538729
##  [67]  0.56400994  0.57611802  0.60389301  0.61676550  0.66333999  0.70165190
##  [73]  0.78698222  0.79762950  0.81169681  0.89301258  0.94900206  0.96186329
##  [79]  1.03692764  1.08548584  1.11344061  1.11953956  1.13687680  1.20514550
##  [85]  1.26707910  1.30205411  1.30975774  1.35850653  1.41099157  1.41671864
##  [91]  1.46662665  1.52933853  1.63175221  1.65177244  1.81235849  1.82526966
##  [97]  1.88074098  1.95343199  1.95626805  2.27805421

summary(matrix.example)

##        V1                V2                  V3                V4          
##  Min.   :-2.4911   Min.   :-1.304480   Min.   :-3.2502   Min.   :-1.25165  
##  1st Qu.:-0.6677   1st Qu.:-0.586180   1st Qu.:-1.0589   1st Qu.:-0.75205  
##  Median : 0.1430   Median :-0.189154   Median :-0.2014   Median :-0.23005  
##  Mean   :-0.1142   Mean   :-0.006043   Mean   :-0.3715   Mean   :-0.15653  
##  3rd Qu.: 0.6703   3rd Qu.: 0.171957   3rd Qu.: 0.7686   3rd Qu.: 0.02374  
##  Max.   : 1.4167   Max.   : 1.956268   Max.   : 1.6518   Max.   : 1.81236  
##        V5                 V6                 V7                V8          
##  Min.   :-1.70873   Min.   :-1.07301   Min.   :-1.1682   Min.   :-0.94673  
##  1st Qu.:-1.38558   1st Qu.:-0.06108   1st Qu.:-0.3853   1st Qu.:-0.53660  
##  Median :-0.02526   Median : 0.30015   Median : 0.6009   Median :-0.21257  
##  Mean   :-0.05526   Mean   : 0.36526   Mean   : 0.5081   Mean   :-0.07057  
##  3rd Qu.: 1.10450   3rd Qu.: 0.99063   3rd Qu.: 1.4184   3rd Qu.: 0.25674  
##  Max.   : 2.27805   Max.   : 1.95343   Max.   : 1.8253   Max.   : 1.13688  
##        V9                V10         
##  Min.   :-1.67410   Min.   :-1.1402  
##  1st Qu.:-0.52818   1st Qu.:-0.0341  
##  Median : 0.03654   Median : 0.3575  
##  Mean   : 0.13425   Mean   : 0.3610  
##  3rd Qu.: 0.80818   3rd Qu.: 0.8624  
##  Max.   : 1.88074   Max.   : 1.3098
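The remaining commands in the table can be tried on two short vectors. This is a minimal sketch; the names a, b, m.cols and m.rows are made up for illustration.

```r
a <- 1:3
b <- 4:6

m.cols <- cbind(a, b)   # 3 x 2 matrix, columns named "a" and "b"
m.rows <- rbind(a, b)   # 2 x 3 matrix, rows named "a" and "b"

dim(m.cols)             # 3 2
dim(m.rows)             # 2 3
length(m.cols)          # 6: total number of elements, not the number of rows

colnames(m.cols)        # "a" "b"
rownames(m.rows)        # "a" "b"

# Names can be set as well as read
colnames(m.cols) <- c("first", "second")
m.cols[, "second"]      # 4 5 6
```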

Exercise Write a command to generate a random permutation of the numbers between 1 and 5 and save it to an object.

Comparison (logic operator)

symbol	use
!=	not equal
==	equal
>	greater
>=	greater or equal
<	smaller
<=	smaller or equal
is.na	is the value “Not Available” (missing)?
complete.cases	returns a logical vector specifying which observations/rows have no missing values
is.finite	is the value finite?
all	are all values in a logical vector true?
any	is any value in a logical vector true?

test.vec <- 73:68
test.vec

## [1] 73 72 71 70 69 68

test.vec < 70

## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE

test.vec > 70

## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

test.vec[3] <- NA
test.vec

## [1] 73 72 NA 70 69 68

is.na(test.vec)

## [1] FALSE FALSE  TRUE FALSE FALSE FALSE

complete.cases(test.vec)

## [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

all(is.na(test.vec))

## [1] FALSE

any(is.na(test.vec))

## [1] TRUE

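Now let’s test the accuracy of doubles in R. Recall that double precision stores a 52-bit fraction, which gives approximately \(\log_{10}(2^{52}) \approx 16\) significant decimal digits. The precision is relative, so the smallest difference that can be detected grows with the magnitude of the number.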

test.exponent <- -(7:18)
10^test.exponent == 0

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

1 - 10^test.exponent == 1

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

7360 - 10^test.exponent == 7360

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

73600 - 10^test.exponent == 73600

##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
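A practical consequence is that == is unreliable for doubles that come out of arithmetic. The sketch below uses the familiar 0.1 + 0.2 example (not from the lecture) and compares with a tolerance via all.equal(), whose default tolerance is about 1.5e-8.

```r
# Machine epsilon: the gap between 1 and the next larger double
.Machine$double.eps            # 2^-52, about 2.2e-16

a <- 0.1 + 0.2
b <- 0.3

a == b                         # FALSE: the two doubles differ in the last bits
isTRUE(all.equal(a, b))        # TRUE: equal up to a small relative tolerance
abs(a - b) < 1e-9              # TRUE: the same idea with an explicit tolerance
```

all.equal() returns a character description instead of FALSE when the values differ, so wrap it in isTRUE() before using it in a condition.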

Other operators

%in%, match()

test.vec

## [1] 73 72 NA 70 69 68

66 %in% test.vec

## [1] FALSE

match(66, test.vec, nomatch = 0)

## [1] 0

70 %in% test.vec

## [1] TRUE

match(70, test.vec, nomatch = 0)

## [1] 4

match(70, test.vec, nomatch = 0) > 0 # the implementation of %in%

## [1] TRUE
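Both %in% and match() are vectorized over their first argument, which makes them handy for filtering. A small sketch, with the vectors wanted and pool made up for illustration:

```r
wanted <- c(66, 70, 73)              # hypothetical values to look up
pool <- c(73, 72, NA, 70, 69, 68)    # same values as test.vec above

wanted %in% pool                     # FALSE TRUE TRUE
match(wanted, pool)                  # NA 4 1: positions in pool, NA if absent

# Keep only the elements of pool that appear in wanted
pool[pool %in% wanted]               # 73 70
```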

Control flow

These are the basic control-flow constructs of the R language. They function in much the same way as control statements in any Algol-like (Algol short for “Algorithmic Language”) language. They are all reserved words.

keyword	usage
if	if(cond) expr
if-else	if(cond) cons.expr else alt.expr
for	for(var in seq) expr
while	while(cond) expr
break	breaks out of a for, while, or repeat loop
next	halts the processing of the current iteration and advances the looping index
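The constructs in the table can be combined as in the following sketch. The variable names total, n and size are made up for illustration.

```r
# for with next and break: add up the odd numbers 1, 3, 5, 7
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    next             # skip even numbers, continue with the next i
  }
  if (i > 7) {
    break            # leave the loop entirely once i exceeds 7
  }
  total <- total + i
}
total                # 16

# while: double n until it reaches at least 100
n <- 1
while (n < 100) {
  n <- n * 2
}
n                    # 128

# if-else is an expression in R, so its value can be assigned
size <- if (total > 10) "large" else "small"
size                 # "large"
```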

Define a function

Read the Functions chapter of Advanced R by Hadley Wickham. We will revisit functions in more detail.

DoNothing <- function() {
  return(invisible(NULL))
}
DoNothing()

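In general, try to avoid loops in R by vectorizing your code. If you have to loop, try a for loop first, since it runs over a fixed sequence. A while loop can be dangerous: if its condition never becomes false it runs forever, and R will not detect this for you. The function below never returns and keeps growing result until memory runs out or it is interrupted.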

DoBadThing <- function() {
  result <- NULL
  while(TRUE) {
    result <- c(result, rnorm(100))
  }
  return(result)
}
# DoBadThing()
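For contrast, here is a minimal sketch of the same computation written as a loop and in vectorized form. The names squares.loop and squares.vec are made up for illustration. If a loop is unavoidable, preallocate the result rather than growing it with c() as DoBadThing does.

```r
x <- 1:10

# Loop version: preallocate the result, then fill it in
squares.loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares.loop[i] <- x[i]^2
}

# Vectorized version: one expression, no explicit loop
squares.vec <- x^2

identical(squares.loop, squares.vec)   # TRUE
```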