Ordinal and Multinomial Logistic Regression

class: center, middle, inverse, title-slide

# Ordinal and Multinomial Logistic Regression
### Thierry Warin, PhD
### quantum simulations<a style="color:#6f97d0">*</a>

---

background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Navigation tips

- Tile view: Press O (the letter O, for Overview) at any point in the slideshow to open the tile view. Click a slide to jump to it, or press O again to exit the tile view.

- Draw: Click the pen icon (top right of the slides) to start drawing.

- Search: Click the magnifying glass icon (bottom left of the slides) to start searching.

You can also press h at any moment for more navigation tips.

---
class: inverse, center, middle

# Outline

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

### Outline

1. Ordinal logistic regression

2. Multinomial logistic regression

---
class: inverse, center, middle

# Ordinal logistic regression

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Ordinal logistic regression

A study looks at factors that influence the decision to apply to graduate school. College juniors are asked whether they are unlikely, somewhat likely, or very likely to apply. Hence, our outcome variable has three ordered categories.

- Data on parental educational status, whether the undergraduate institution is public or private, and current GPA are also collected. The researchers have reason to believe that the “distances” between these three points are not equal.

- For example, the “distance” between “unlikely” and “somewhat likely” may be shorter than the distance between “somewhat likely” and “very likely”.

```r
# load the data set
dat <- readr::read_csv("https://www.warin.ca/datalake/courses_data/qmibr/session8/ologit.csv")
```

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Ordinal logistic regression

.pull-left[

```r
head(dat)
```

```
## # A tibble: 6 x 4
##   apply           pared public   gpa
##   <chr>           <dbl>  <dbl> <dbl>
## 1 very likely         0      0  3.26
## 2 somewhat likely     1      0  3.21
## 3 unlikely            1      1  3.94
## 4 somewhat likely     0      0  2.81
## 5 somewhat likely     0      0  2.53
## 6 unlikely            0      1  2.59
```

]

.pull-right[

```r
ftable(xtabs(~ public + apply + pared, data = dat))
```

```
##                        pared   0   1
## public apply                        
## 0      somewhat likely        98  26
##        unlikely              175  14
##        very likely            20  10
## 1      somewhat likely        12   4
##        unlikely               25   6
##        very likely             7   3
```

]

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Ordinal logistic regression

```r
library(dplyr)
# recode the outcome: 0 = unlikely, 1 = somewhat likely, 2 = very likely
dat <- dat %>% mutate(apply = case_when(apply == "very likely" ~ 2,
                                        apply == "somewhat likely" ~ 1,
                                        TRUE ~ 0))
```
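
An alternative to numeric codes is an ordered factor, which keeps the labels and states the category order explicitly. The sketch below is only an illustration: `apply_chr` is a hypothetical toy vector standing in for the `apply` column, not the course data.

```r
# Hypothetical toy vector standing in for the `apply` column
apply_chr <- c("very likely", "somewhat likely", "unlikely", "somewhat likely")

# State the category order explicitly; the default alphabetical order
# (somewhat likely < unlikely < very likely) would be wrong here
apply_ord <- factor(apply_chr,
                    levels = c("unlikely", "somewhat likely", "very likely"),
                    ordered = TRUE)

levels(apply_ord)

# Integer codes minus one reproduce the 0/1/2 coding used above
as.integer(apply_ord) - 1L
```

Either coding carries the same ordering information; the ordered factor simply keeps the labels visible in tables and model output.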

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Ordinal logistic regression

.panelset[
.panel[.panel-name[R Code]

```r
library(ggplot2)
# treat the 0/1/2 outcome as categorical so that each level gets its own box
ggplot(dat, aes(x = factor(apply), y = gpa)) +
  geom_boxplot(size = .75) +
  geom_jitter(alpha = .5) +
  facet_grid(pared ~ public, margins = TRUE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
```
]

.panel[.panel-name[Output]

<img src="concept8_files/figure-html/unnamed-chunk-6-1.png" width="300px" style="display: block; margin: auto;" />
]
]

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Ordinal logistic regression

.panelset[
.panel[.panel-name[R Code]

```r
library(MASS)
## fit the ordered logit model and store the results in 'm'
m <- polr(as.factor(apply) ~ pared + public + gpa, data = dat, Hess = TRUE)

## view a summary of the model
summary(m)
```
]

.panel[.panel-name[Output]

```
## Call:
## polr(formula = as.factor(apply) ~ pared + public + gpa, data = dat, 
##     Hess = TRUE)
## 
## Coefficients:
##           Value Std. Error t value
## pared   1.04769     0.2658  3.9418
## public -0.05879     0.2979 -0.1974
## gpa     0.61594     0.2606  2.3632
## 
## Intercepts:
##     Value   Std. Error t value
## 0|1  2.2039  0.7795     2.8272
## 1|2  4.2994  0.8043     5.3453
## 
## Residual Deviance: 717.0249 
## AIC: 727.0249
```
]
]
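
`summary(m)` reports t values but no p-values. A minimal sketch of a large-sample Wald test follows, assuming the t values can be compared to a standard normal distribution. The numbers are copied by hand from the output above so the chunk runs on its own; with the fitted model, the same table is available from `coef(summary(m))`.

```r
# Estimates and standard errors copied from the summary(m) output
est     <- c(pared = 1.04769, public = -0.05879, gpa = 0.61594)
std_err <- c(pared = 0.2658,  public = 0.2979,   gpa = 0.2606)

# Wald statistic: estimate divided by its standard error
t_val <- est / std_err

# Two-sided p-value under a standard normal approximation (an assumption)
p_val <- 2 * pnorm(abs(t_val), lower.tail = FALSE)

round(cbind(Value = est, "Std. Error" = std_err,
            "t value" = t_val, "p value" = p_val), 4)
```

Under this approximation, `pared` and `gpa` are distinguishable from zero at the 5% level, while `public` is not.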

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Ordinal logistic regression

```r
exp(coef(m))
```

```
##     pared    public       gpa 
## 2.8510579 0.9429088 1.8513972
```
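
The percentages and reciprocals quoted in the interpretation panels follow directly from these odds ratios. The short check below hard-codes the values printed above so that it runs without the fitted model.

```r
# Odds ratios copied from the exp(coef(m)) output
or <- c(pared = 2.8510579, public = 0.9429088, gpa = 1.8513972)

# Percent change in the odds for a one-unit increase in each predictor
pct_change <- (or - 1) * 100

# Reciprocal: the odds ratio for the comparison read in the opposite direction
reciprocal <- 1 / or

round(rbind(odds_ratio = or, pct_change = pct_change, reciprocal = reciprocal), 2)
```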

.panelset[
.panel[.panel-name[Parental Education]

- For students whose parents attended college, the **odds** of being **more likely** to apply (i.e., **very or somewhat likely versus unlikely**) are 2.85 times those of students whose parents did not attend college (an increase of 185%), holding all other variables constant.

- For students whose parents did not attend college, the **odds** of being **less likely** to apply (i.e., **unlikely versus somewhat or very likely**) are 2.85 times those of students whose parents attended college, holding all other variables constant.
]

.panel[.panel-name[School Type]

- For students in public school, the **odds** of being **more likely** to apply (i.e., very or somewhat likely versus unlikely) are 5.71% lower [i.e., (1 - 0.943) × 100%] than those of private school students, holding all other variables constant.

- For students in private school, the **odds** of being **more likely** to apply are 1.06 times [i.e., 1/0.943] those of public school students, holding all other variables constant (an odds ratio above 1).

- For students in private school, the **odds** of being **less likely** to apply (i.e., unlikely versus somewhat or very likely) are 5.71% lower than those of public school students, holding all other variables constant.

- For students in public school, the **odds** of being **less likely** to apply are 1.06 times those of private school students, holding all other variables constant (an odds ratio above 1).
]

.panel[.panel-name[GPA]

- For every one-unit increase in a student’s GPA, the **odds** of being **more likely** to apply (very or somewhat likely versus unlikely) are multiplied by 1.85 (i.e., they increase by 85%), holding all other variables constant.

- For every one-unit decrease in a student’s GPA, the **odds** of being **less likely** to apply (unlikely versus somewhat or very likely) are multiplied by 1.85, holding all other variables constant.
]
]
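
Odds ratios can be hard to picture, so it may help to turn the estimates into predicted probabilities. The sketch below uses the `polr` parameterization, logit P(Y ≤ j) = ζ_j − η, with coefficients and intercepts copied from the model summary. The helper `predict_probs()` and the two student profiles are hypothetical, introduced only for illustration; with the fitted model, `predict(m, newdata, type = "probs")` would do the same job.

```r
# Coefficients and intercepts (cut points) copied from the summary(m) output
beta <- c(pared = 1.04769, public = -0.05879, gpa = 0.61594)
zeta <- c(2.2039, 4.2994)  # cut points 0|1 and 1|2

# Hypothetical helper: category probabilities for one student profile
predict_probs <- function(x) {
  eta <- sum(beta * x[names(beta)])   # linear predictor
  cum <- c(plogis(zeta - eta), 1)     # P(Y <= 0), P(Y <= 1), P(Y <= 2) = 1
  probs <- diff(c(0, cum))            # differences give per-category probabilities
  names(probs) <- c("unlikely", "somewhat likely", "very likely")
  probs
}

# Hypothetical profiles: private school, GPA of 3, parents without / with college
no_college <- predict_probs(c(pared = 0, public = 0, gpa = 3))
college    <- predict_probs(c(pared = 1, public = 0, gpa = 3))

round(rbind(no_college, college), 3)
```

For these two profiles, parental college attendance shifts probability mass away from "unlikely" and toward the two higher categories, which is what an odds ratio of 2.85 implies.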

---
class: inverse, center, middle

# Multinomial logistic regression

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Multinomial logistic regression

People’s occupational choices might be influenced by their parents’ occupations and their own education level.

We can study the relationship between one’s occupational choice and one’s education level and father’s occupation. Occupational choice is the outcome variable, which consists of unordered categories of occupations.

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Multinomial logistic regression

.panelset[
.panel[.panel-name[R Code]

```r
library(nnet)
library(reshape2)
ml <- readr::read_csv("https://www.warin.ca/datalake/courses_data/qmibr/session8/hsbdemo.csv")
summary(ml)
```
]

.panel[.panel-name[Output]

```
##        id            female              ses               schtyp         
##  Min.   :  1.00   Length:200         Length:200         Length:200        
##  1st Qu.: 50.75   Class :character   Class :character   Class :character  
##  Median :100.50   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.50                                                           
##  3rd Qu.:150.25                                                           
##  Max.   :200.00                                                           
##      prog                read           write            math      
##  Length:200         Min.   :28.00   Min.   :31.00   Min.   :33.00  
##  Class :character   1st Qu.:44.00   1st Qu.:45.75   1st Qu.:45.00  
##  Mode  :character   Median :50.00   Median :54.00   Median :52.00  
##                     Mean   :52.23   Mean   :52.77   Mean   :52.65  
##                     3rd Qu.:60.00   3rd Qu.:60.00   3rd Qu.:59.00  
##                     Max.   :76.00   Max.   :67.00   Max.   :75.00  
##     science          socst          honors              awards    
##  Min.   :26.00   Min.   :26.00   Length:200         Min.   :0.00  
##  1st Qu.:44.00   1st Qu.:46.00   Class :character   1st Qu.:0.00  
##  Median :53.00   Median :52.00   Mode  :character   Median :1.00  
##  Mean   :51.85   Mean   :52.41                      Mean   :1.67  
##  3rd Qu.:58.00   3rd Qu.:61.00                      3rd Qu.:2.00  
##  Max.   :74.00   Max.   :71.00                      Max.   :7.00  
##       cid       
##  Min.   : 1.00  
##  1st Qu.: 5.00  
##  Median :10.50  
##  Mean   :10.43  
##  3rd Qu.:15.00  
##  Max.   :20.00
```
]
]

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Multinomial logistic regression

- The data set contains variables on 200 students.

- The outcome variable is `$prog$`, the program type. The predictor variables are socioeconomic status, `$ses$`, a three-level categorical variable, and writing score, `$write$`, a continuous variable.

```r
with(ml, table(ses, prog))
```

```
##         prog
## ses      academic general vocation
##   high         42       9        7
##   low          19      16       12
##   middle       44      20       31
```

```r
with(ml, do.call(rbind, tapply(write, prog, function(x) c(Mean = mean(x), SD = sd(x)))))
```

```
##              Mean       SD
## academic 56.25714 7.943343
## general  51.33333 9.397775
## vocation 46.76000 9.318754
```

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Multinomial logistic regression

First, we need to choose the level of our outcome that we wish to use as our baseline and specify this in the relevel function.

```r
library(nnet)
require(reshape2)
ml$prog <- as.factor(ml$prog)
ml$prog2 <- relevel(ml$prog, ref = "academic")
```

Then, we run our model using `multinom`. The `multinom` function (from the nnet package) does not compute p-values for the regression coefficients, so we calculate p-values using Wald tests (here z-tests).

```r
test <- multinom(prog2 ~ ses + write, data = ml)
summary(test)
```

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Multinomial logistic regression

.panelset[
.panel[.panel-name[Model]

```r
test <- multinom(prog2 ~ ses + write, data = ml)
```

```
## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 179.983731
## final  value 179.981726 
## converged
```
]

.panel[.panel-name[Summary]

```r
summary(test)
```

```
## Call:
## multinom(formula = prog2 ~ ses + write, data = ml)
## 
## Coefficients:
##          (Intercept)    seslow sesmiddle       write
## general     1.689478 1.1628411 0.6295638 -0.05793086
## vocation    4.235574 0.9827182 1.2740985 -0.11360389
## 
## Std. Errors:
##          (Intercept)    seslow sesmiddle      write
## general     1.226939 0.5142211 0.4650289 0.02141101
## vocation    1.204690 0.5955688 0.5111119 0.02222000
## 
## Residual Deviance: 359.9635 
## AIC: 375.9635
```
]
]
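
The summary reports coefficients and standard errors but no p-values. A minimal sketch of the Wald z-tests follows, with both matrices copied by hand from the output above so the chunk runs on its own. With the fitted model, the same ratio is `summary(test)$coefficients / summary(test)$standard.errors`.

```r
# Row and column labels as printed in the summary(test) output
labels <- list(c("general", "vocation"),
               c("(Intercept)", "seslow", "sesmiddle", "write"))

# Coefficients and standard errors copied from the summary(test) output
est <- matrix(c(1.689478, 1.1628411, 0.6295638, -0.05793086,
                4.235574, 0.9827182, 1.2740985, -0.11360389),
              nrow = 2, byrow = TRUE, dimnames = labels)
std_err <- matrix(c(1.226939, 0.5142211, 0.4650289, 0.02141101,
                    1.204690, 0.5955688, 0.5111119, 0.02222000),
                  nrow = 2, byrow = TRUE, dimnames = labels)

# Wald z statistic and two-sided p-value from the standard normal distribution
z <- est / std_err
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

round(z, 3)
round(p, 4)
```

Each row compares one program with the baseline (academic), so a p-value refers to a single contrast rather than to the predictor as a whole.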

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%
# Multinomial logistic regression

```r
exp(coef(test))
```

```
##          (Intercept)   seslow sesmiddle     write
## general     5.416653 3.199009  1.876792 0.9437152
## vocation   69.101326 2.671709  3.575477 0.8926115
```

- The relative risk ratio (the exponentiated coefficient) for a one-unit increase in the variable `$write$` is `$.9437$` for being in the general program vs. the academic program.

- The relative risk ratio for switching from `$ses$` = low to high is `$.3126$` (i.e., `$1/3.199$`, because high is the reference level) for being in the general program vs. the academic program.
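
The two figures quoted above can be checked by hand. The sketch below hard-codes the relevant entries of the `exp(coef(test))` output; the 10-point change in `write` is an extra, hypothetical illustration of how ratios compound.

```r
# Relative risk ratios (general vs. academic) copied from exp(coef(test))
rrr_general <- c(seslow = 3.199009, sesmiddle = 1.876792, write = 0.9437152)

# high is the reference level, so moving from low to high is the reciprocal
low_to_high <- 1 / rrr_general[["seslow"]]
round(low_to_high, 4)

# Ratios multiply: a 10-point increase in write (hypothetical illustration)
write_plus_10 <- rrr_general[["write"]]^10
round(write_plus_10, 4)
```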