rm(list = ls()) # clean-up workspace
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(faraway)
set.seed(7360)
student.groups <- matrix(c("Agbomola, Oluwasegun Joshua", "Zimmer, Mattie",
"Vaduthala, Nathaniel", "Collopy, John",
"Berko, Abena", "Weaver, Kathleen",
"Alan, Baris", "Argentino, John",
"Qu, Jiazheng", "Huang, Yongtai",
"Mehran, Syed", "Olson, Aidan",
"Kalgi, Ketan Vinod", "Zhang, Yipeng",
"Carlino, Delia", "Zhu, Kyra",
"Islam, Rubaiyat Bin", "de la Pena, Andrew",
"Lopez Santander, John Jairo", "Trinh, Lan",
"Uddin, Moslem", "Sakran, Naufil"), 11, 2, TRUE) %>%
apply(1, paste, collapse = " & ") %>%
sample() %>%
matrix(3, 4, byrow = TRUE, dimnames = list(c("1st", "2nd", "3rd"),
c("Dec 1", "Dec 4", "Dec 6", "Dec 8")))
student.groups[3, 4] <- ""
print(student.groups)
## Dec 1
## 1st "Islam, Rubaiyat Bin & de la Pena, Andrew"
## 2nd "Uddin, Moslem & Sakran, Naufil"
## 3rd "Mehran, Syed & Olson, Aidan"
## Dec 4
## 1st "Carlino, Delia & Zhu, Kyra"
## 2nd "Vaduthala, Nathaniel & Collopy, John"
## 3rd "Lopez Santander, John Jairo & Trinh, Lan"
## Dec 6
## 1st "Agbomola, Oluwasegun Joshua & Zimmer, Mattie"
## 2nd "Qu, Jiazheng & Huang, Yongtai"
## 3rd "Kalgi, Ketan Vinod & Zhang, Yipeng"
## Dec 8
## 1st "Berko, Abena & Weaver, Kathleen"
## 2nd "Alan, Baris & Argentino, John"
## 3rd ""
The question concerns data from a case-control study of esophageal cancer in Ille-et-Vilaine, France. The data is distributed with R and may be obtained, along with a description of the variables, by:
data(esoph)
help(esoph)
Comment on the relationships seen in the plots.
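# Logistic regression of coronary heart disease (chd) on height and cigarettes per day
lmod <- glm(chd ~ height + cigs, family = binomial, wcgs)
# Average the (deviance) residuals within each distinct value of cigs
gdf <- wcgs %>%
mutate(residuals = residuals(lmod), linpred = predict(lmod)) %>%
group_by(cigs) %>%
summarise(residuals = mean(residuals), count = n())
# Plot the mean residual against cigs; point size reflects the group size
gdf %>%
ggplot(mapping = aes(x = cigs, y = residuals, size = sqrt(count))) +
geom_point() +
theme_bw()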
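The same grouped-plot idea can be carried over to the esoph data. The following is only a sketch of one possible plot: it assumes the plot of interest is the proportion of cases within each alcohol group, pooled over age and tobacco, and the names `alc_df` and `p_alc` are made up for illustration. The column names `alcgp`, `ncases` and `ncontrols` are those described in help(esoph).

```r
library(dplyr)
library(ggplot2)

data(esoph)

# Proportion of cases within each alcohol group, pooling over age and tobacco
# (assumption: this is one of the plots the question has in mind)
alc_df <- esoph %>%
  group_by(alcgp) %>%
  summarise(cases = sum(ncases),
            total = sum(ncases + ncontrols),
            prop = cases / total)

# Point size indicates the number of subjects behind each proportion
p_alc <- ggplot(alc_df, aes(x = alcgp, y = prop, size = total)) +
  geom_point() +
  labs(x = "Alcohol consumption", y = "Proportion of cases") +
  theme_bw()
print(p_alc)
```

Replacing `alcgp` with `agegp` or `tobgp` would give the corresponding plots for the other two predictors.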
Use AIC as a criterion to select a model using the step function. Which model is selected?
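A minimal sketch of AIC-based selection with step follows. The starting model is an assumption (main effects plus all two-way interactions of the three factors), since the text here does not state which model to start from, and the names `full_mod` and `step_mod` are illustrative.

```r
data(esoph)

# Assumed starting point: main effects plus all two-way interactions.
# Sparse cells may trigger warnings about fitted probabilities of 0 or 1.
full_mod <- glm(cbind(ncases, ncontrols) ~ (agegp + alcgp + tobgp)^2,
                family = binomial, data = esoph)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
step_mod <- step(full_mod, trace = 0)

# Inspect the selected formula and compare AIC values
formula(step_mod)
AIC(full_mod, step_mod)
```

By default step penalises each parameter with k = 2, which corresponds to AIC; setting k = log(n) would give a BIC-style search instead.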
All three factors are ordered, so special contrasts appropriate for ordered factors, involving linear, quadratic and cubic terms, have been used. Further simplification of the model may be possible by eliminating some of these terms. Use the unclass function to convert the factors to a numerical representation and check whether the model may be simplified.
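As a sketch of the unclass idea, the comparison below assumes a main-effects model for illustration; the model actually selected in the previous step may differ. `fac_mod` and `lin_mod` are hypothetical names.

```r
data(esoph)

# Ordered-factor fit: polynomial contrasts (.L, .Q, .C, ...) for each factor
fac_mod <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
               family = binomial, data = esoph)

# unclass() turns each factor into integer scores 1, 2, 3, ...,
# so each predictor contributes a single linear term
lin_mod <- glm(cbind(ncases, ncontrols) ~
                 unclass(agegp) + unclass(alcgp) + unclass(tobgp),
               family = binomial, data = esoph)

# The linear-score model is nested in the factor model,
# so a deviance (likelihood ratio) test compares them
anova(lin_mod, fac_mod, test = "Chisq")
```

A mixed model is also possible, for example keeping a factor for one predictor while using numeric scores for the others, if only some of the higher-order terms appear to be needed.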
Does your final model fit the data? Is the test you make accurate for this data?
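One common way to check fit for grouped binomial data is to compare the residual deviance, or the Pearson statistic, with a chi-squared distribution. The sketch below assumes, for illustration only, that the final model uses linear scores for all three factors; `lin_mod` and the other names are hypothetical. The chi-squared approximation relies on reasonably large group sizes, so the code also tabulates how many groups are small.

```r
data(esoph)

# Assumed final model (illustration only): linear scores for all three factors
lin_mod <- glm(cbind(ncases, ncontrols) ~
                 unclass(agegp) + unclass(alcgp) + unclass(tobgp),
               family = binomial, data = esoph)

# Deviance-based goodness-of-fit test
dev_p <- pchisq(deviance(lin_mod), df.residual(lin_mod), lower.tail = FALSE)

# Pearson chi-squared statistic as an alternative
pearson_x2 <- sum(residuals(lin_mod, type = "pearson")^2)
pearson_p <- pchisq(pearson_x2, df.residual(lin_mod), lower.tail = FALSE)

c(deviance = dev_p, pearson = pearson_p)

# Group sizes: many small groups make the chi-squared approximation doubtful
group_size <- with(esoph, ncases + ncontrols)
table(group_size < 5)
```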
Check for outliers in your final model.
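A simple way to look for outliers is to inspect the largest residuals in absolute value. This sketch again assumes a linear-score model as the final model, and the variable names are illustrative. The halfnorm function from the faraway package, loaded earlier, could be applied to the same residuals for a graphical check.

```r
data(esoph)

# Assumed final model (illustration only)
lin_mod <- glm(cbind(ncases, ncontrols) ~
                 unclass(agegp) + unclass(alcgp) + unclass(tobgp),
               family = binomial, data = esoph)

# Deviance residuals, one per row of esoph
r <- residuals(lin_mod, type = "deviance")

# Row with the largest absolute residual, and the corresponding data
worst <- which.max(abs(r))
esoph[worst, ]

# The six largest absolute residuals, for context
head(sort(abs(r), decreasing = TRUE))
```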
What is the predicted effect of moving one category higher in alcohol consumption?
Compute a 95% confidence interval for this predicted effect.
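With a numeric score for alcohol, the coefficient is the change in log-odds for moving up one category, and exponentiating gives an odds ratio. The sketch below assumes a linear-score model for all three factors and uses a Wald interval (estimate plus or minus a normal quantile times the standard error); a profile-likelihood interval from confint is an alternative. The names are illustrative.

```r
data(esoph)

# Assumed final model (illustration only): linear scores for all three factors
lin_mod <- glm(cbind(ncases, ncontrols) ~
                 unclass(agegp) + unclass(alcgp) + unclass(tobgp),
               family = binomial, data = esoph)

# Estimate and standard error for the alcohol score, on the log-odds scale
est <- coef(summary(lin_mod))["unclass(alcgp)", ]
beta <- unname(est["Estimate"])
se <- unname(est["Std. Error"])

# 95% Wald confidence interval on the log-odds scale
ci <- beta + c(-1, 1) * qnorm(0.975) * se

# Multiplicative effect on the odds of being a case, with its interval
exp(beta)
exp(ci)
```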