HW1 due today.
Questions regarding HW1 problem 3 from Viraj (from 2 years ago, everyone should still thank him)
u
is a
column vector.Dr. Hua Zhou’s slides
To gain a deep understanding of how R works, the book Advanced R by Hadley Wickham is a must read. Read now to save numerous hours you might waste in future.
We cover select topics on coding style, benchmarking, profiling, debugging, parallel computing, byte code compiling, Rcpp, and package development.
Most important is to be consistent in your code style.
Sources:
Advanced R: http://adv-r.had.co.nz/Performance.html, https://adv-r.hadley.nz/perf-measure.html#microbenchmarking
In order to identify performance issue, we need to measure runtime accurately.
system.time
set.seed(280)
x <- runif(1e6)
system.time({sqrt(x)})
## user system elapsed
## 0.001 0.001 0.002
system.time({x ^ 0.5})
## user system elapsed
## 0.009 0.001 0.010
system.time({exp(log(x) / 2)})
## user system elapsed
## 0.007 0.001 0.008
From William Dunlap:
“User CPU time” gives the CPU time spent by the current process (i.e., the current R session) and “system CPU time” gives the CPU time spent by the kernel (the operating system) on behalf of the current process. The operating system is used for things like opening files, doing input or output, starting other processes, and looking at the system clock: operations that involve resources that many processes must share. Different operating systems will have different things done by the operating system.
microbenchmark
library("microbenchmark")
library("ggplot2")
mbm <- microbenchmark(
sqrt(x),
x ^ 0.5,
exp(log(x) / 2),
times = 100
)
## Warning in microbenchmark(sqrt(x), x^0.5, exp(log(x)/2), times = 100): less
## accurate nanosecond times to avoid potential integer overflows
mbm
## Unit: milliseconds
## expr min lq mean median uq max neval
## sqrt(x) 1.267802 1.314952 1.664725 1.411794 1.636003 3.362820 100
## x^0.5 8.876910 9.027031 9.562402 9.229080 10.125688 11.430677 100
## exp(log(x)/2) 7.079224 7.248103 7.639206 7.383546 7.638279 9.713597 100
Results from microbenchmark
can be nicely plotted in
base R or ggplot2.
boxplot(mbm)
autoplot(mbm)
bench
The bench
package is another tool for microbenchmarking.
The output is a tibble:
library(bench)
(lb <- bench::mark(
sqrt(x),
x ^ 0.5,
exp(log(x) / 2)
))
## # A tibble: 3 × 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 sqrt(x) 1.22ms 1.45ms 681. 7.63MB 197.
## 2 x^0.5 8.88ms 9.36ms 107. 7.63MB 22.3
## 3 exp(log(x)/2) 7.23ms 7.54ms 130. 7.63MB 25.0
To visualize the result
plot(lb)
## Loading required namespace: tidyr
It is colored according to GC (garbage collection) levels.
Premature optimization is the root of all evil (or at least most of it) in programming.
-Don Knuth
Sources:
library(profvis)
profvis({
data(diamonds, package = "ggplot2")
plot(price ~ carat, data = diamonds)
m <- lm(price ~ carat, data = diamonds)
abline(m, col = "red")
})
First generate test data:
times <- 4e5
cols <- 150
data <- as.data.frame(x = matrix(rnorm(times * cols, mean = 5), ncol = cols))
data <- cbind(id = paste0("g", seq_len(times)), data)
head(data)
## id V1 V2 V3 V4 V5 V6 V7 V8
## 1 g1 5.969774 3.386956 4.358790 6.237727 5.771275 5.174312 5.037446 4.430052
## 2 g2 4.921927 5.633818 5.694171 4.865483 4.153967 5.061355 4.371293 5.463183
## 3 g3 5.800428 5.549560 3.976693 4.171676 4.666550 7.123001 5.098244 5.293764
## 4 g4 7.898909 4.615669 4.377937 4.701138 5.101872 5.295811 4.312929 4.085740
## 5 g5 6.452353 5.880938 5.926650 4.594801 4.482622 5.954435 4.586131 3.444331
## 6 g6 4.469903 6.975048 4.217334 6.039186 3.682259 4.181690 5.154283 5.046682
## V9 V10 V11 V12 V13 V14 V15 V16
## 1 3.945362 6.991074 6.534173 6.317850 5.068947 5.795827 5.498428 4.983893
## 2 5.313754 5.346877 5.352962 4.297411 3.143680 5.640137 5.002181 2.896327
## 3 4.865347 4.281242 4.907693 4.644739 4.526487 5.236702 2.669363 4.749898
## 4 3.162529 3.530056 3.844779 5.075276 4.674053 4.944213 5.056897 6.646479
## 5 5.776090 3.128314 5.455446 5.906633 6.148552 4.399970 4.660836 3.235290
## 6 6.352984 6.289363 5.674194 3.793744 3.325485 5.174891 4.695926 4.282914
## V17 V18 V19 V20 V21 V22 V23 V24
## 1 4.021139 5.065696 6.310364 5.490599 5.055548 4.238561 5.797920 4.650697
## 2 4.800560 4.762682 3.607615 4.842496 6.059028 6.391931 7.143750 4.092829
## 3 5.418325 4.843147 4.703208 5.158282 4.913269 5.997505 6.404710 5.901887
## 4 4.596020 7.193654 5.363815 5.828282 4.825783 4.223200 4.941217 5.421718
## 5 5.168906 3.363993 3.267161 6.299007 3.990615 4.938966 6.021251 4.856988
## 6 3.766488 5.115333 3.881455 5.711343 5.260267 6.811890 6.109790 5.321400
## V25 V26 V27 V28 V29 V30 V31 V32
## 1 6.232738 5.156774 3.934398 3.779090 4.736236 3.174204 6.293351 5.757692
## 2 5.430874 4.537045 6.011686 4.729784 4.512812 5.574587 5.457530 4.388429
## 3 5.027204 4.896893 4.960644 4.795403 4.603998 4.743484 4.563350 3.839632
## 4 6.068461 4.881969 5.554265 4.336586 6.011450 5.118163 4.219831 5.042094
## 5 5.456461 6.307981 4.052145 5.367455 5.202699 4.090536 4.852753 3.555741
## 6 6.915472 5.672714 3.100835 7.315498 5.420842 5.900181 4.055118 4.149567
## V33 V34 V35 V36 V37 V38 V39 V40
## 1 4.955279 6.137107 3.573192 4.722608 6.093101 5.095926 4.211594 5.593647
## 2 4.955159 4.581849 5.322105 5.565669 5.911121 5.071331 3.647852 6.422367
## 3 4.702261 5.063408 5.118905 2.957961 4.564707 2.932893 4.673558 4.913145
## 4 4.108503 3.184510 4.287867 5.577840 5.290784 6.163332 5.269505 3.863132
## 5 6.238919 4.414152 4.087591 4.255676 4.117047 5.126822 5.187829 5.426037
## 6 5.278965 4.367095 2.899642 4.237844 6.699517 5.530437 5.831208 3.738705
## V41 V42 V43 V44 V45 V46 V47 V48
## 1 4.753439 5.169350 4.883482 3.657573 5.669892 4.241260 4.851962 5.456667
## 2 6.187101 4.716270 5.597624 4.413653 4.230248 4.871157 5.293433 6.435590
## 3 5.438515 4.850280 4.626902 4.459188 5.827907 5.663024 3.457956 5.721593
## 4 5.892379 5.762037 5.179667 3.942267 5.778172 5.932535 5.524665 4.771691
## 5 4.901255 3.280638 5.652811 4.206320 5.858897 6.658522 5.633509 4.963152
## 6 3.961880 4.871338 5.252205 4.892354 5.132825 5.681368 7.427848 3.656597
## V49 V50 V51 V52 V53 V54 V55 V56
## 1 3.641526 5.823936 6.202179 4.682491 6.020682 2.888328 4.884376 5.080382
## 2 5.709755 5.460538 4.667820 4.270379 4.517615 5.204095 4.635607 5.231288
## 3 4.349273 5.278583 3.925600 5.579634 4.434822 5.277278 5.749743 5.783613
## 4 6.172233 6.314465 6.106169 5.754318 5.706827 6.369271 5.991514 3.357682
## 5 4.258391 6.554396 5.381009 4.826608 5.104414 5.925495 6.495626 3.528083
## 6 5.196717 5.721946 6.507628 4.720609 4.800197 4.981737 4.804950 4.029324
## V57 V58 V59 V60 V61 V62 V63 V64
## 1 5.731241 3.716672 3.728804 4.225844 5.893676 5.236128 3.998496 5.052559
## 2 3.800630 5.785667 3.602729 5.613489 4.841687 4.655119 5.548548 3.668189
## 3 4.409653 4.344321 4.891546 4.541498 5.192576 5.162769 5.449993 4.251714
## 4 5.511138 4.634418 4.294338 5.769028 5.738561 4.282375 5.683305 3.788938
## 5 5.250808 6.254993 7.384261 5.711009 4.899243 4.683286 5.427239 4.776874
## 6 4.003427 4.317158 5.344575 7.102476 3.520562 3.071539 5.494856 6.203147
## V65 V66 V67 V68 V69 V70 V71 V72
## 1 2.297626 5.044528 4.422164 4.317439 5.039205 5.306640 4.291307 5.752981
## 2 5.042682 5.209917 6.233088 3.673400 5.550484 4.341351 5.621920 5.107727
## 3 4.954122 3.964389 3.632149 5.902254 5.180800 5.905438 6.001834 4.165093
## 4 4.699785 5.317041 4.387484 2.815468 5.945561 4.348289 4.949502 3.373685
## 5 5.688282 4.922704 5.282254 5.433178 7.002321 5.236936 6.255325 3.910518
## 6 6.325418 4.847684 7.013564 5.339205 3.909243 5.400927 5.143125 3.502081
## V73 V74 V75 V76 V77 V78 V79 V80
## 1 4.452937 4.556165 4.878821 5.494998 4.525805 5.598838 5.510052 4.335159
## 2 5.582111 5.070413 3.553271 3.302500 4.883199 5.669165 6.934187 5.756974
## 3 7.069034 5.077702 4.823577 3.950463 2.968174 2.981413 5.337635 5.745397
## 4 3.325205 5.391997 5.576931 4.869578 5.218397 3.661650 4.783694 7.297424
## 5 4.194588 5.936640 5.947325 6.494725 6.002182 4.566777 3.771062 5.382190
## 6 3.909446 4.450924 4.379779 4.845526 5.499161 2.473238 4.043085 3.965896
## V81 V82 V83 V84 V85 V86 V87 V88
## 1 3.841143 4.854982 5.862762 6.055590 5.570398 5.605576 5.207664 4.721803
## 2 6.290887 3.717678 5.352763 4.009332 4.243106 6.890147 6.300223 6.049974
## 3 4.865970 3.963592 4.502974 4.874401 4.185517 4.027433 4.543304 4.727077
## 4 5.910201 6.051057 3.789773 4.745911 4.428981 3.917806 4.718456 4.085395
## 5 4.958616 2.225116 5.862221 4.738276 5.119915 3.244853 5.478629 5.415818
## 6 3.419392 5.939352 4.620255 3.502446 7.679182 7.079275 4.400633 5.172185
## V89 V90 V91 V92 V93 V94 V95 V96
## 1 4.638047 4.648935 5.864618 4.781209 4.581879 4.869569 4.365018 3.933875
## 2 5.293047 4.015187 6.509665 4.338551 4.430226 4.618964 4.688742 5.389455
## 3 4.302911 4.715384 7.099666 4.696381 4.224483 3.484758 4.979978 5.162419
## 4 6.146197 5.816736 2.364329 5.978111 3.916872 5.119176 6.135774 6.309547
## 5 7.350988 5.664298 5.541353 6.000726 5.227135 5.361953 3.361382 4.935839
## 6 5.316849 5.509925 4.510909 3.316033 5.467789 5.780469 4.307894 5.820533
## V97 V98 V99 V100 V101 V102 V103 V104
## 1 4.926531 5.451521 5.017248 5.636656 4.776153 6.359240 5.665790 6.934336
## 2 4.288900 5.201093 5.817945 5.068234 5.083228 6.793019 5.144909 3.751889
## 3 4.980340 3.123095 4.738060 4.542765 4.638650 5.814557 7.978221 5.774194
## 4 4.931209 4.736936 3.554819 3.171603 4.795017 5.165080 4.827155 4.330540
## 5 5.301755 4.509777 5.518396 4.667021 5.465201 5.009373 5.611512 5.683406
## 6 3.036929 5.952751 3.954832 5.889513 5.194754 4.391003 5.226084 4.291935
## V105 V106 V107 V108 V109 V110 V111 V112
## 1 4.916882 3.682590 3.553191 6.178398 6.588264 3.722993 4.291036 6.209619
## 2 5.029695 4.816148 3.061924 4.118335 5.784075 5.031129 4.164354 6.112392
## 3 4.389153 4.837424 4.265490 5.130126 5.268602 5.803634 5.390342 6.347929
## 4 4.128308 5.855140 4.941786 4.834643 4.263134 3.589032 5.405492 4.955846
## 5 6.508588 4.252722 4.999025 4.962796 5.694972 4.879281 4.167013 4.673221
## 6 5.436822 4.295753 3.682477 4.373195 5.968806 4.794256 4.186473 4.809896
## V113 V114 V115 V116 V117 V118 V119 V120
## 1 3.578261 6.869507 2.238355 4.383842 4.417523 6.048996 4.675138 5.842443
## 2 5.941705 4.701242 3.920104 7.204458 4.485306 7.334596 5.865684 4.203705
## 3 7.672582 3.462561 5.254552 4.463835 4.016373 6.311201 4.517438 4.700522
## 4 4.916321 4.609474 5.251307 4.959063 3.014013 6.210320 4.482665 5.731177
## 5 4.497916 6.566364 3.930197 4.736816 5.845934 6.098909 4.571074 6.489460
## 6 4.757196 5.410635 5.014765 5.003775 4.754222 4.567107 3.474825 6.064451
## V121 V122 V123 V124 V125 V126 V127 V128
## 1 5.167029 5.750582 4.907758 5.865223 5.261798 4.956968 5.643643 5.812596
## 2 3.606904 5.627711 5.587762 5.654150 5.269067 4.922916 4.823450 5.121979
## 3 4.729896 4.529697 5.358662 5.112413 4.884113 6.012762 6.285833 4.477839
## 4 3.996006 6.642559 4.243635 6.820607 6.025009 4.956143 5.423347 5.900742
## 5 4.710165 5.496734 4.419287 4.525932 4.282508 2.885029 5.337152 4.868927
## 6 5.758978 3.747149 4.523980 5.090233 4.284195 5.217521 5.638891 3.792649
## V129 V130 V131 V132 V133 V134 V135 V136
## 1 5.078079 6.353728 4.885178 5.647268 5.396727 4.887594 3.792632 5.802780
## 2 5.395293 3.354223 5.184184 3.709929 4.930617 4.181895 5.545752 6.747582
## 3 3.045543 4.332591 3.238270 5.711071 5.764195 4.717790 4.795234 5.768002
## 4 6.472146 4.903896 3.245853 5.172849 5.101201 4.803509 6.542996 5.211635
## 5 5.371391 4.888848 4.789275 4.892961 7.136214 6.075162 5.259250 4.575743
## 6 7.184338 5.835156 4.187135 5.513385 2.575642 5.078832 4.306050 5.178530
## V137 V138 V139 V140 V141 V142 V143 V144
## 1 3.985640 5.488433 5.278389 6.098580 4.490176 4.915071 4.637860 6.020174
## 2 5.242106 6.356912 4.750170 5.434490 5.296510 5.959623 5.673556 2.154634
## 3 5.869586 5.333952 4.018724 7.537137 2.985009 6.095209 4.210550 4.409794
## 4 6.129498 4.581313 5.013140 4.977602 5.929854 5.276177 4.564731 4.240755
## 5 4.704665 4.872247 5.141363 6.538962 3.871012 5.762745 5.553001 5.418704
## 6 4.919545 6.425867 3.329013 4.154823 3.671133 6.976441 5.135457 4.695289
## V145 V146 V147 V148 V149 V150
## 1 3.860709 4.360477 4.848465 5.005363 6.678806 3.734484
## 2 6.513392 3.879315 4.890934 5.383120 5.062598 6.181172
## 3 5.687756 7.131847 6.448625 3.967368 5.851488 5.865080
## 4 3.590409 5.770194 5.654670 6.992913 6.333515 5.449932
## 5 4.207660 5.257442 4.447403 3.551037 4.092570 5.093482
## 6 5.825259 4.921798 3.497042 3.861398 4.491844 5.183752
Original code for centering columns of a dataframe:
profvis({
# Store in another variable for this run
data1 <- data
# Get column means
means <- apply(data1[, names(data1) != "id"], 2, mean)
# Subtract mean from each column
for (i in seq_along(means)) {
data1[, names(data1) != "id"][, i] <-
data1[, names(data1) != "id"][, i] - means[i]
}
})
Profile apply
vs colMeans
vs
lapply
vs vapply
:
profvis({
data1 <- data
# Four different ways of getting column means
means <- apply(data1[, names(data1) != "id"], 2, mean)
means <- colMeans(data1[, names(data1) != "id"])
means <- lapply(data1[, names(data1) != "id"], mean)
means <- vapply(data1[, names(data1) != "id"], mean, numeric(1))
})
We decide to use vapply
:
profvis({
data1 <- data
means <- vapply(data1[, names(data1) != "id"], mean, numeric(1))
for (i in seq_along(means)) {
data1[, names(data1) != "id"][, i] <- data1[, names(data1) != "id"][, i] - means[i]
}
})
Calculate mean and center in one pass:
profvis({
data1 <- data
# Given a column, normalize values and return them
col_norm <- function(col) {
col - mean(col)
}
# Apply the normalizer function over all columns except id
data1[, names(data1) != "id"] <-
lapply(data1[, names(data1) != "id"], col_norm)
})
Original code for cumulative sums:
data <- data.frame(value = runif(1e5))
profvis({
data$sum[1] <- data$value[1]
for (i in seq(2, nrow(data))) {
data$sum[i] <- data$sum[i-1] + data$value[i]
}
})
Write a function to avoid expensive indexing by $
:
profvis({
csum <- function(x) {
if (length(x) < 2) return(x)
sum <- x[1]
for (i in seq(2, length(x))) {
sum[i] <- sum[i-1] + x[i]
}
sum
}
data$sum <- csum(data$value)
})
Pre-allocate vector:
data <- data.frame(value = runif(1e6))
profvis({
csum2 <- function(x) {
if (length(x) < 2) return(x)
sum <- numeric(length(x)) # Preallocate
sum[1] <- x[1]
for (i in seq(2, length(x))) {
sum[i] <- sum[i-1] + x[i]
}
sum
}
data$sum <- csum2(data$value)
})
Modularize big projects into small functions. Profile functions as early and as frequently as possible.
Learning sources:
Video: https://vimeo.com/99375765
Advanced R: https://adv-r.hadley.nz/debugging.html
breakpoint
step in/through function
browser()
traceback()
options(error = browser)
,
options(error = NULL)
, Debug
->
On Error
-> Break in Code