If you are not familiar with shell environment, you could start with
finding where you are by type command pwd
in the terminal
window. You can also open the graphic file system by type
open .
(for mac) and xdg-open .
(for linux) in
the terminal. If you want to change to another location, try to play
with terminal using commands from the following list. Tip:
cd
Some useful commands to get you started.
pwd
print absolute path to the current working
directory (where you are right now)
ls
list contents of a directory
ls -l
list detailed contents of a directory
ls -al
list all contents of a directory, including
those start with .
(hidden files/folders)
cd
change directory
cd ..
go to the parent directory of the current
working directory
Manipulate files and directories
cp
copies file to a new location.
mv
moves file to a new location.
touch
creates a text file; if file already exists,
it’s left unchanged.
rm
deletes a file.
mkdir
creates a new directory.
rmdir
deletes an empty directory.
rm -rf
deletes a directory and all contents in that
directory (be cautious using the -f
option …).
rm(list = ls()) # clean up workspace first
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.5.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.31 R6_2.5.1 fastmap_1.1.1 xfun_0.39
## [5] cachem_1.0.8 knitr_1.42 htmltools_0.5.5 rmarkdown_2.21
## [9] cli_3.6.1 sass_0.4.6 jquerylib_0.1.4 compiler_4.3.0
## [13] rstudioapi_0.14 tools_4.3.0 evaluate_0.20 bslib_0.4.2
## [17] yaml_2.3.7 rlang_1.1.1 jsonlite_1.8.4
We will implement two simplest approaches for numerical integration
and apply them to integrate sin(x)
from \(0\) to \(\pi\).
midpoint
and
trapezoid
values.\(\mbox{midpoint} = (b - a) \times f(\frac{a + b}{2})\)
\(\mbox{trapezoid} = \frac{b - a}{2} \times (f(a) + f(b))\)
# if you are modifying this source file directly, remember to change the above flag to eval=TRUE
# finish the code below
midpoint <- function(f, a, b) {
result <-
return(result)
}
trapezoid <- function(f, a, b) {
result <-
return(result)
}
# what do you get?
midpoint(sin, 0, pi)
trapezoid(sin, 0, pi)
midpoint.composite <- function(f, a, b, n = 10) {
points <- seq(a, b, length = n + 1)
area <- 0
for (i in seq_len(n)) {
area <- area + midpoint()
}
return(area)
}
trapezoid.composite <- function(f, a, b, n = 10) {
points <- seq(a, b, length = n + 1)
area <- 0
for (i in seq_len(n)) {
area <- area + trapezoid()
}
return(area)
}
midpoint.composite(sin, 0, pi, n = 10)
midpoint.composite(sin, 0, pi, n = 100)
midpoint.composite(sin, 0, pi, n = 1000)
trapezoid.composite(sin, 0, pi, n = 10)
trapezoid.composite(sin, 0, pi, n = 100)
trapezoid.composite(sin, 0, pi, n = 1000)
Now try the above functions with n=10, 100, 1000
.
Explain your findings.
midpoint.composite.vectorize <- function(f, a, b, n = 10) {
points <- seq(a, b, length = n + 1)
areas <- midpoint(f, points[], points[]) # Tip: the first points[] should be the list of all a's and the second points[] should be the list of all b's
return(sum(areas))
}
trapezoid.composite.vectorize <- function(f, a, b, n = 10) {
points <- seq(a, b, length = n + 1)
areas <- trapezoid(f, points[], points[])
return(sum(areas))
}
midpoint.composite.vectorize(sin, 0, pi, n = 10)
midpoint.composite.vectorize(sin, 0, pi, n = 100)
midpoint.composite.vectorize(sin, 0, pi, n = 1000)
trapezoid.composite.vectorize(sin, 0, pi, n = 10)
trapezoid.composite.vectorize(sin, 0, pi, n = 100)
trapezoid.composite.vectorize(sin, 0, pi, n = 1000)
Now try the above vectorized functions with
n=10, 100, 1000
. Explain your findings.
From William Dunlap:
“User CPU time” gives the CPU time spent by the current process (i.e., the current R session) and “system CPU time” gives the CPU time spent by the kernel (the operating system) on behalf of the current process. The operating system is used for things like opening files, doing input or output, starting other processes, and looking at the system clock: operations that involve resources that many processes must share. Different operating systems will have different things done by the operating system.
system.time(midpoint.composite(sin, 0, pi, n = 10000))
system.time(trapezoid.composite(sin, 0, pi, n = 10000))
system.time(midpoint.composite.vectorize(sin, 0, pi, n = 10000))
system.time(trapezoid.composite.vectorize(sin, 0, pi, n = 10000))
Now let’s implement the Normal equations from scratch. \(\hat{\beta} = (X^{\top}X)^{-1}X^{\top}Y\).
my.normal.equations <- function(X, Y) {
if (!is.vector(Y)) {
stop("Y is not a vector!")
}
if (!is.matrix(X)) { # force X to be a matrix for now
stop("X is not a matrix!")
}
if (dim(X)[1] != length(Y)) {
stop("Dimension mismatch between X and Y!")
}
return() # finish the calculation for beta
}
set.seed(7360)
sample.size <- 100
num.col <- 2
X <- matrix(rnorm(sample.size * num.col), nrow = sample.size, ncol = num.col)
X <- cbind(1, X)
Y <- rnorm(sample.size)
system.time(result.lm <- lm(Y ~ X[, 2] + X[, 3]))
summary(result.lm)
system.time(result.my.normal.equations <- my.normal.equations(X, Y))
result.my.normal.equations
Does your result match the estimated coefficients from the
lm()
function?