---
title: "STK3100 R exercises week 34"
author: "Per August Jarval Moen"
date: "25/08/2023"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

\section{Ex 1.21}
We begin by loading in the data. 
```{r}
data = read.table(file="http://users.stat.ufl.edu/~aa/glm/data/FEV.dat", header = TRUE)
```
The variable \textbf{drug} is categorical. We convert it to a factor (R's data type for categorical variables) so that R treats it as such:
```{r}
data$drug = as.factor(data$drug)
```
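A minimal sketch of what the conversion does, using a hypothetical toy vector rather than the FEV data: `as.factor()` stores the distinct values as levels, sorted alphabetically, and the first level is the one `lm()` later uses as the reference level under R's default treatment contrasts.

```{r}
# Hypothetical toy vector, not the FEV data
drug_toy = c("p", "a", "b", "a", "p")
drug_toy_fac = as.factor(drug_toy)
# Levels are sorted alphabetically: "a" "b" "p"
levels(drug_toy_fac)
# The underlying integer codes are positions in levels(): 3 1 2 1 3
as.integer(drug_toy_fac)
```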
\subsection{a)}
We fit a linear regression model with \textbf{fev1} as the response variable and \textbf{base} as the explanatory variable: 
```{r}
lm1 = lm(fev1 ~ base, data = data)
summary(lm1)
```
Interpretation: The estimated mean response when \textbf{base} = 0 is `r round(summary(lm1)$coefficients[1,1],4)`. Increasing the baseline measurement by one unit is estimated to increase the mean response by `r round(summary(lm1)$coefficients[2,1],4)` units.
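The interpretation of the two coefficients can be checked with `predict()`. The sketch below uses hypothetical toy data (the names `toy_lin`, `fit_lin` and `pred_lin` are illustrative only): the difference between fitted values one unit apart in the explanatory variable equals the estimated slope, and the fitted value at 0 equals the intercept.

```{r}
# Hypothetical toy data (not FEV.dat): x plays the role of base, y of fev1
toy_lin = data.frame(x = c(0, 1, 2, 3), y = c(1.1, 2.9, 5.2, 6.8))
fit_lin = lm(y ~ x, data = toy_lin)
# Fitted mean response at x = 10 and at x = 11
pred_lin = predict(fit_lin, newdata = data.frame(x = c(10, 11)))
# The difference equals the estimated slope
pred_lin[2] - pred_lin[1]
coef(fit_lin)["x"]
# The fitted value at x = 0 equals the estimated intercept
predict(fit_lin, newdata = data.frame(x = 0))
coef(fit_lin)["(Intercept)"]
```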

\subsection{b)}
We fit a linear regression model with \textbf{fev1} as the response variable and \textbf{drug} as the explanatory variable. Notice that \textbf{drug} is a categorical variable with 3 levels (a, b and p). Written with an intercept and one indicator per level, the model is
$$
E(Y_i)  = \beta_0 + \beta_1 I(\text{drug}_i = a) + \beta_2 I(\text{drug}_i = b) + \beta_3 I(\text{drug}_i = p).$$
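This model is not identifiable: the three indicators sum to one for every observation, so together they reproduce the intercept column, and the design matrix does not have full column rank. By default, R sets $\beta_1 = 0$ for identifiability, i.e. it drops the indicator for the first level (a), which then serves as the reference level.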
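The rank deficiency can be checked directly. The sketch below builds, for a hypothetical toy factor with the same three levels as \textbf{drug}, the design matrix with an intercept and all three indicators, and compares it with the design matrix R constructs by default.

```{r}
# Hypothetical toy factor (not the FEV data) with the same levels as drug
drug_rank = factor(c("a", "a", "b", "b", "p", "p"))
# Intercept column plus one indicator per level: 4 columns
X_full = cbind(1, model.matrix(~ drug_rank - 1))
ncol(X_full)
# The indicators sum to the intercept column, so the rank is only 3
qr(X_full)$rank
# R's default treatment contrasts drop the indicator for the first level "a"
X_default = model.matrix(~ drug_rank)
colnames(X_default)
qr(X_default)$rank
```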
\leavevmode\newline

```{r}
lm2 = lm(fev1 ~ drug, data = data)
summary(lm2)
```
Using the notation above, the intercept corresponds to $\beta_0$, drugb to $\beta_2$ and drugp to $\beta_3$.
\newline
\newline
Interpretation: 
The estimated expected value of $Y$ when \textbf{drug} = a is `r round(summary(lm2)$coefficients[1,1],4)`.
The estimated expected value of $Y$ when \textbf{drug} = b is `r round(summary(lm2)$coefficients[1,1],4)` + `r round(summary(lm2)$coefficients[2,1],4)` = `r round(summary(lm2)$coefficients[2,1] + summary(lm2)$coefficients[1,1],4)`.
The estimated expected value of $Y$ when \textbf{drug} = p is `r round(summary(lm2)$coefficients[1,1],4)` + (`r round(summary(lm2)$coefficients[3,1],4)`) = `r round(summary(lm2)$coefficients[3,1] + summary(lm2)$coefficients[1,1],4)`.
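With a single categorical explanatory variable, the fitted mean for each level is simply the sample mean of the response in that group. A minimal check on hypothetical toy data (not FEV.dat; the names `toy_grp`, `fit_grp` and `implied` are illustrative):

```{r}
# Hypothetical toy data (not FEV.dat): two observations per drug level
toy_grp = data.frame(y = c(1, 3, 4, 6, 2, 2),
                     drug = factor(c("a", "a", "b", "b", "p", "p")))
fit_grp = lm(y ~ drug, data = toy_grp)
cf = coef(fit_grp)
# Group means implied by the fit: intercept, intercept + drugb, intercept + drugp
implied = c(a = cf[["(Intercept)"]],
            b = cf[["(Intercept)"]] + cf[["drugb"]],
            p = cf[["(Intercept)"]] + cf[["drugp"]])
implied
# Sample means per group, for comparison (a = 2, b = 5, p = 2)
tapply(toy_grp$y, toy_grp$drug, mean)
```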

\subsection{c)}
We fit a linear regression model with \textbf{fev1} as the response variable and \textbf{base} and \textbf{drug} as the explanatory variables. Note that R by default sets the coefficient of the first level of the categorical variable to zero for identifiability. 
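Written out as in b), with R's default reference level a and the coefficients numbered in the order that summary(lm3) reports them, the model is
$$
E(Y_i) = \beta_0 + \beta_1 \, \text{base}_i + \beta_2 I(\text{drug}_i = b) + \beta_3 I(\text{drug}_i = p),
$$
so for a fixed drug level the mean response is a straight line in \textbf{base} with slope $\beta_1$, and the three drug levels give parallel lines with intercepts $\beta_0$, $\beta_0 + \beta_2$ and $\beta_0 + \beta_3$.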
\leavevmode\newline

```{r}
lm3 = lm(fev1 ~ base + drug, data = data)
summary(lm3)
```
Interpretation: 
The estimated expected value of $Y$ when \textbf{drug} = a and \textbf{base} = 0 is `r round(summary(lm3)$coefficients[1,1],4)`.
The estimated expected value of $Y$ when \textbf{drug} = b and \textbf{base} = 0 is `r round(summary(lm3)$coefficients[1,1],4)` + `r round(summary(lm3)$coefficients[3,1],4)` = `r round(summary(lm3)$coefficients[3,1] + summary(lm3)$coefficients[1,1],4)`.
The estimated expected value of $Y$ when \textbf{drug} = p and \textbf{base} = 0 is `r round(summary(lm3)$coefficients[1,1],4)` + (`r round(summary(lm3)$coefficients[4,1],4)`) = `r round(summary(lm3)$coefficients[4,1] + summary(lm3)$coefficients[1,1],4)`.
Increasing the baseline measurement by one unit while keeping \textbf{drug} fixed is estimated to increase the mean response by `r round(summary(lm3)$coefficients[2,1],4)` units.
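Because the model in c) has no interaction between \textbf{base} and \textbf{drug}, the estimated difference between two drug levels does not depend on the baseline value. A minimal check on hypothetical toy data (not FEV.dat; all names are illustrative):

```{r}
# Hypothetical toy data (not FEV.dat) with the same structure as in part c)
toy_add = data.frame(
  base = rep(c(1, 2, 3), times = 3),
  drug = factor(rep(c("a", "b", "p"), each = 3)),
  y    = c(2.0, 2.6, 3.1, 2.5, 3.0, 3.6, 1.8, 2.5, 2.9)
)
fit_add = lm(y ~ base + drug, data = toy_add)
# Same two baseline values, once with drug a and once with drug b
new_a = data.frame(base = c(1.5, 2.5), drug = factor("a", levels = levels(toy_add$drug)))
new_b = data.frame(base = c(1.5, 2.5), drug = factor("b", levels = levels(toy_add$drug)))
diff_ba = predict(fit_add, newdata = new_b) - predict(fit_add, newdata = new_a)
# Both differences equal the drugb coefficient: the fitted lines are parallel
diff_ba
coef(fit_add)[["drugb"]]
```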

\section{Ex 1.24}
We begin by loading in the data. The data file has five lines of text preceding the header line with the variable names; these are skipped with the argument skip = 5. We also convert \textbf{therapy} to a factor.
```{r}
data = read.table(file="http://users.stat.ufl.edu/~aa/glm/data/Anorexia.dat", skip=5,header = TRUE)
data$therapy = as.factor(data$therapy)
```
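The effect of skip can be seen on a small hypothetical example. Instead of a file, `read.table()` also accepts the lines through its `text` argument, which lets the sketch run without the real data file (the contents below are made up and not taken from Anorexia.dat):

```{r}
# Hypothetical file contents: two description lines, a header line, then data
raw_lines = c("Toy anorexia-style data",
              "Weights in pounds",
              "subj therapy before after",
              "1 b 80.5 82.2",
              "2 c 84.1 85.0")
# skip = 2 discards the description lines; header = TRUE reads the variable names
toy_anx = read.table(text = raw_lines, skip = 2, header = TRUE)
toy_anx$therapy = as.factor(toy_anx$therapy)
str(toy_anx)
```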
Let's take the weight after therapy (variable \textbf{after}) as the response, and the therapy received (\textbf{therapy}) and the weight before therapy (\textbf{before}) as explanatory variables.
```{r}
lm.anx = lm(after ~ therapy + before, data = data)
summary(lm.anx)
```
Interpretations: The estimated mean weight after therapy for a girl receiving therapy b who weighs 80 pounds before therapy is `r round(summary(lm.anx)$coefficients[1,1],4)` + `r round(summary(lm.anx)$coefficients[4,1],4)` $\cdot$ 80 = `r round(summary(lm.anx)$coefficients[1,1] + summary(lm.anx)$coefficients[4,1]*80,4)` pounds. Receiving therapy c instead of therapy b, while holding the weight before therapy fixed, is estimated to reduce the mean weight after therapy, and hence the mean weight gain, by `r round(-summary(lm.anx)$coefficients[2,1],4)` pounds. Receiving therapy f instead of therapy b, while holding the weight before therapy fixed, is estimated to increase the mean weight after therapy, and hence the mean weight gain, by `r round(summary(lm.anx)$coefficients[3,1],4)` pounds. Increasing the weight before therapy by one pound, while holding therapy fixed, is estimated to increase the mean weight after therapy by `r round(summary(lm.anx)$coefficients[4,1],4)` pounds.
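For reference, assuming R's default choice of therapy b as the reference level (consistent with the coefficient indices used above, with $\beta_0, \dots, \beta_3$ numbered in the order summary(lm.anx) reports them), the fitted model and the model it implies for the weight gain are
$$
\begin{aligned}
E(\text{after}_i) &= \beta_0 + \beta_1 I(\text{therapy}_i = c) + \beta_2 I(\text{therapy}_i = f) + \beta_3 \, \text{before}_i,\\
E(\text{after}_i - \text{before}_i) &= \beta_0 + \beta_1 I(\text{therapy}_i = c) + \beta_2 I(\text{therapy}_i = f) + (\beta_3 - 1) \, \text{before}_i.
\end{aligned}
$$
The two expressions differ only in the coefficient of before, so with the weight before therapy held fixed, the therapy coefficients $\beta_1$ and $\beta_2$ describe differences in mean weight after therapy and in mean weight gain alike. In particular, the estimated mean weight after therapy for therapy b and before = 80 is $\hat\beta_0 + 80\hat\beta_3$.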