---
title: "STK3100 R exercises week 34"
author: "Per August Jarval Moen"
date: "25/08/2023"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

\section{Ex 1.21}

We begin by loading in the data.

```{r}
data = read.table(file="http://users.stat.ufl.edu/~aa/glm/data/FEV.dat", header = TRUE)
```

The variable \textbf{drug} is categorical. We convert it to a factor (R's type for categorical variables) so that R treats it as such:

```{r}
data$drug = as.factor(data$drug)
```

\subsection{a)}

We fit a linear regression model with \textbf{fev1} as the response variable and \textbf{base} as the explanatory variable:

```{r}
lm1 = lm(fev1 ~ base, data = data)
summary(lm1)
```

Interpretation: The estimated mean response when \textbf{base} = 0 is `r round(summary(lm1)$coefficients[1,1],4)`. Increasing the baseline measurement by one unit is estimated to increase the mean response by `r round(summary(lm1)$coefficients[2,1],4)` units.

\subsection{b)}

We fit a linear regression model with \textbf{fev1} as the response variable and \textbf{drug} as the explanatory variable. Notice that \textbf{drug} is a categorical variable with 3 levels (a, b and p), so the linear regression model is actually

$$ E(Y_i) = \beta_0 + \beta_1 I(\text{drug}_i = a) + \beta_2 I(\text{drug}_i = b) + \beta_3 I(\text{drug}_i = p).$$

This model is not identifiable: the three indicator columns sum to the intercept column, so the corresponding design matrix does not have full rank. By default, R sets $\beta_1=0$ (the coefficient of the first level, a) for identifiability.
\leavevmode\newline

```{r}
lm2 = lm(fev1 ~ drug, data = data)
summary(lm2)
```

Using the notation above, `(Intercept)` corresponds to $\beta_0$, `drugb` corresponds to $\beta_2$ and `drugp` corresponds to $\beta_3$.
\newline
\newline
Interpretation: The estimated expected value of $Y$ when \textbf{drug} = a is `r round(summary(lm2)$coefficients[1,1],4)`.
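The rank deficiency, and the way the reported coefficients should be read, can be checked directly. The sketch below uses a hypothetical simulated data frame `toy` (made-up values, not the FEV data): it builds the design matrix with one indicator per level, confirms that its rank is smaller than its number of columns, and then checks that under R's default coding the intercept equals the sample mean of group a, while each drug coefficient equals the difference between that group's mean and the mean of group a.

```{r}
# Hypothetical toy data: 3 drug groups (a, b, p) with 4 patients each
set.seed(1)
toy = data.frame(drug = factor(rep(c("a", "b", "p"), each = 4)),
                 y = rnorm(12, mean = 3))
# Full indicator design: intercept column plus one indicator per level
X.full = cbind(1, sapply(levels(toy$drug), function(l) as.numeric(toy$drug == l)))
qr(X.full)$rank   # 3, although X.full has 4 columns
# R's default coding drops the indicator of the first level (a)
X.r = model.matrix(~ drug, data = toy)
colnames(X.r)     # "(Intercept)" "drugb" "drugp"
fit = lm(y ~ drug, data = toy)
group.means = tapply(toy$y, toy$drug, mean)
# Intercept = mean of group a; drugb, drugp = differences from group a
cbind(lm = coef(fit),
      from.means = c(group.means[1], group.means[2:3] - group.means[1]))
```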
The estimated expected value of $Y$ when \textbf{drug} = b is `r round(summary(lm2)$coefficients[1,1],4)` + `r round(summary(lm2)$coefficients[2,1],4)` = `r round(summary(lm2)$coefficients[2,1] + summary(lm2)$coefficients[1,1],4)`. The estimated expected value of $Y$ when \textbf{drug} = p is `r round(summary(lm2)$coefficients[1,1],4)` + (`r round(summary(lm2)$coefficients[3,1],4)`) = `r round(summary(lm2)$coefficients[3,1] + summary(lm2)$coefficients[1,1],4)`.

\subsection{c)}

We fit a linear regression model with \textbf{fev1} as the response variable and \textbf{base} and \textbf{drug} as the explanatory variables. As in b), R by default sets the coefficient of the first level of the categorical variable (a) to zero for identifiability.
\leavevmode\newline

```{r}
lm3 = lm(fev1 ~ base + drug, data = data)
summary(lm3)
```

Interpretation: The estimated expected value of $Y$ when \textbf{drug} = a and \textbf{base} = 0 is `r round(summary(lm3)$coefficients[1,1],4)`. The estimated expected value of $Y$ when \textbf{drug} = b and \textbf{base} = 0 is `r round(summary(lm3)$coefficients[1,1],4)` + `r round(summary(lm3)$coefficients[3,1],4)` = `r round(summary(lm3)$coefficients[3,1] + summary(lm3)$coefficients[1,1],4)`. The estimated expected value of $Y$ when \textbf{drug} = p and \textbf{base} = 0 is `r round(summary(lm3)$coefficients[1,1],4)` + (`r round(summary(lm3)$coefficients[4,1],4)`) = `r round(summary(lm3)$coefficients[4,1] + summary(lm3)$coefficients[1,1],4)`. Increasing the baseline measurement by one unit while keeping \textbf{drug} fixed is estimated to increase the mean response by `r round(summary(lm3)$coefficients[2,1],4)` units; the model assumes this effect is the same for all three drugs.

\section{Ex 1.24}

We begin by loading in the data. The data file is a text file with five lines of text preceding the actual data; these lines are skipped by including the argument `skip = 5`.
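To illustrate what `skip` does, here is a minimal sketch on a hypothetical in-memory file. The contents of `txt` are made up and only mimic the layout described above (five description lines, then a header line, then the data); `read.table()` reads them through its `text` argument instead of a URL.

```{r}
# Hypothetical file contents: five description lines, a header line, then data
txt = c("Toy version of a data file",
        "description line 2",
        "description line 3",
        "description line 4",
        "description line 5",
        "therapy before after",
        "b 80.5 82.2",
        "c 84.0 83.1")
# skip = 5 discards the description; header = TRUE takes column names from line 6
toy.anx = read.table(text = txt, skip = 5, header = TRUE)
toy.anx
```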
```{r}
data = read.table(file="http://users.stat.ufl.edu/~aa/glm/data/Anorexia.dat", skip = 5, header = TRUE)
data$therapy = as.factor(data$therapy)
```

Let's regress the weight after therapy (variable \textbf{after}) onto the therapy variable and the weight before therapy. That is, we take the weight after therapy as the response, and \textbf{therapy} and the weight before therapy (variable \textbf{before}) as explanatory variables. The first level of \textbf{therapy}, b, is the reference level.

```{r}
lm.anx = lm(after ~ therapy + before, data = data)
summary(lm.anx)
```

Interpretations: The estimated mean weight after therapy for a girl receiving therapy b and weighing 80 pounds before therapy is `r round(summary(lm.anx)$coefficients[1,1],4)` + `r round(summary(lm.anx)$coefficients[4,1],4)` $\times$ 80 = `r round(summary(lm.anx)$coefficients[1,1] + summary(lm.anx)$coefficients[4,1]*80,4)` pounds. Receiving therapy c instead of therapy b (while holding the weight before therapy fixed) is estimated to reduce the mean weight gain by `r round(-summary(lm.anx)$coefficients[2,1],4)` pounds. Receiving therapy f instead of therapy b (while holding the weight before therapy fixed) is estimated to increase the mean weight gain by `r round(summary(lm.anx)$coefficients[3,1],4)` pounds. Since the weight before therapy is held fixed in these two comparisons, a change in the mean weight after therapy is the same as a change in the mean weight gain. Increasing the weight before therapy by one pound, while holding therapy fixed, is estimated to increase the mean weight after therapy by `r round(summary(lm.anx)$coefficients[4,1],4)` pounds.
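The hand computation for therapy b and 80 pounds can be cross-checked with `predict()`, and the switch between "weight after therapy" and "weight gain" can be verified numerically. The following is a sketch on hypothetical simulated data: the data frame `sim` and all its numbers are made up, and only the variable names follow the anorexia data. Refitting with the gain `after - before` as the response leaves the intercept and the therapy coefficients unchanged and lowers the coefficient of `before` by exactly 1, because least squares is linear in the response and `before` is already a column of the design matrix.

```{r}
# Hypothetical simulated data using the same variable names as the anorexia data
set.seed(2)
sim = data.frame(therapy = factor(rep(c("b", "c", "f"), each = 10)),
                 before = runif(30, 70, 95))
effect = c(b = 0, c = -4, f = 4)   # hypothetical therapy effects
sim$after = 50 + 0.4 * sim$before + unname(effect[as.character(sim$therapy)]) +
  rnorm(30, sd = 5)
fit.after = lm(after ~ therapy + before, data = sim)
# Estimated mean weight after therapy for therapy b and 80 pounds before therapy
predict(fit.after, newdata = data.frame(therapy = "b", before = 80))
coef(fit.after)[1] + coef(fit.after)[4] * 80   # the same number, by hand
# Using the gain as response: same therapy effects, before coefficient shifted by -1
fit.gain = lm(I(after - before) ~ therapy + before, data = sim)
coef(fit.gain) - coef(fit.after)   # approximately 0, 0, 0, -1
```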