class: center, middle, inverse, title-slide

# Ordinal and Multinomial Logistic Regression
### Thierry Warin, PhD
### quantum simulations
---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Navigation tips

- Tile view: Press O (the letter O, for Overview) at any point in the slideshow and the tile view appears. Click on a slide to jump to it, or press O again to exit tile view.

- Draw: Click on the pen icon (top right of the slides) to start drawing.

- Search: Click on the magnifying glass icon (bottom left of the slides) to start searching.

You can also press h at any moment for more navigation tips.

---
class: inverse, center, middle

# Outline

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

### outline

1. Ordinal logistic regression

2. Multinomial Logistic Regression

---
class: inverse, center, middle

# Ordinal logistic regression

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Ordinal logistic regression

A study looks at factors that influence the decision of whether to apply to graduate school. College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school. Hence, our outcome variable has three ordered categories.

- Data on parental educational status, whether the undergraduate institution is public or private, and current GPA are also collected. The researchers have reason to believe that the “distances” between these three points are not equal.

- For example, the “distance” between “unlikely” and “somewhat likely” may be shorter than the distance between “somewhat likely” and “very likely”.
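
As a sketch of why unequal “distances” are not a problem, assume the proportional odds parameterization used by `MASS::polr` (the function fitted later in this section). The symbols `\(\zeta_j\)` and `\(\beta_k\)` are illustrative notation, and `pared`, `public`, and `gpa` stand for the three predictors described above:

`$$\operatorname{logit} P(Y \le j \mid x) = \zeta_j - (\beta_1 \, \text{pared} + \beta_2 \, \text{public} + \beta_3 \, \text{gpa}), \qquad j \in \{\text{unlikely}, \text{somewhat likely}\}$$`

Each cutpoint `\(\zeta_j\)` is estimated from the data, so the spacing between categories does not have to be equal; only the slopes `\(\beta_k\)` are shared across the two cumulative splits.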
```r
# load the data set
dat <- readr::read_csv("https://www.warin.ca/datalake/courses_data/qmibr/session8/ologit.csv")
```

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Ordinal logistic regression

.pull-left[

```r
head(dat)
```

```
## # A tibble: 6 x 4
##   apply           pared public   gpa
##   <chr>           <dbl>  <dbl> <dbl>
## 1 very likely         0      0  3.26
## 2 somewhat likely     1      0  3.21
## 3 unlikely            1      1  3.94
## 4 somewhat likely     0      0  2.81
## 5 somewhat likely     0      0  2.53
## 6 unlikely            0      1  2.59
```

]

.pull-right[

```r
ftable(xtabs(~ public + apply + pared, data = dat))
```

```
##                        pared   0   1
## public apply
## 0      somewhat likely        98  26
##        unlikely              175  14
##        very likely            20  10
## 1      somewhat likely        12   4
##        unlikely               25   6
##        very likely             7   3
```

]

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Ordinal logistic regression

```r
library(dplyr)
# recode the outcome as ordered integers: 0 = unlikely, 1 = somewhat likely, 2 = very likely
dat <- dat %>%
  mutate(apply = case_when(apply == "very likely" ~ 2,
                           apply == "somewhat likely" ~ 1,
                           TRUE ~ 0))
```

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Ordinal logistic regression

.panelset[
.panel[.panel-name[R Code]

```r
library(ggplot2)
ggplot(dat, aes(x = apply, y = gpa)) +
  geom_boxplot(size = .75) +
  geom_jitter(alpha = .5) +
  facet_grid(pared ~ public, margins = TRUE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
```
]

.panel[.panel-name[Output]

<img src="concept8_files/figure-html/unnamed-chunk-6-1.png" width="300px" style="display: block; margin: auto;" />

]
]

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Ordinal logistic regression

.panelset[
.panel[.panel-name[R Code]

```r
library(MASS)
## fit ordered logit model and store results 'm'
m <- polr(as.factor(apply) ~ pared + public + gpa, data = dat, Hess=TRUE)
## view a summary of the model
summary(m)
```
]

.panel[.panel-name[Output]
```
## Call:
## polr(formula = as.factor(apply) ~ pared + public + gpa, data = dat,
##     Hess = TRUE)
##
## Coefficients:
##           Value Std. Error t value
## pared   1.04769     0.2658  3.9418
## public -0.05879     0.2979 -0.1974
## gpa     0.61594     0.2606  2.3632
##
## Intercepts:
##     Value  Std. Error t value
## 0|1 2.2039 0.7795     2.8272
## 1|2 4.2994 0.8043     5.3453
##
## Residual Deviance: 717.0249
## AIC: 727.0249
```
]
]

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Ordinal logistic regression

```r
exp(coef(m))
```

```
##     pared    public       gpa
## 2.8510579 0.9429088 1.8513972
```

.panelset[
.panel[.panel-name[Parental Education]

- For students whose parents did attend college, the **odds** of being **more likely** (i.e., **very or somewhat likely versus unlikely**) to apply are 2.85 times (i.e., 185% higher than) those of students whose parents did not go to college, holding constant all other variables.

- For students whose parents did not attend college, the **odds** of being **less likely** to apply (i.e., **unlikely versus somewhat or very likely**) are 2.85 times those of students whose parents did go to college, holding constant all other variables.
]

.panel[.panel-name[School Type]

- For students in public school, the **odds** of being **more likely** (i.e., very or somewhat likely versus unlikely) to apply are 5.71% lower [i.e., (1 - 0.943) x 100%] than those of private school students, holding constant all other variables.

- For students in private school, the **odds** of being **more likely** to apply are 1.06 times [i.e., 1/0.943] those of public school students, holding constant all other variables (odds ratio greater than 1).

- For students in private school, the **odds** of being **less likely** to apply (i.e., unlikely versus somewhat or very likely) are 5.71% lower than those of public school students, holding constant all other variables.
- For students in public school, the **odds** of being **less likely** to apply are 1.06 times those of private school students, holding constant all other variables (odds ratio greater than 1).
]

.panel[.panel-name[GPA]

- For every one-unit increase in a student’s GPA, the **odds** of being **more likely** to apply (very or somewhat likely versus unlikely) are multiplied by 1.85 (i.e., increase by 85%), holding constant all other variables.

- For every one-unit decrease in a student’s GPA, the **odds** of being **less likely** to apply (unlikely versus somewhat or very likely) are multiplied by 1.85, holding constant all other variables.
]
]

---
class: inverse, center, middle

# Multinomial logistic regression

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Multinomial logistic regression

People’s occupational choices might be influenced by their parents’ occupations and their own education level. We can study the relationship of one’s occupation choice with education level and father’s occupation. The occupational choices will be the outcome variable which consists of categories of occupations.

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Multinomial logistic regression

.panelset[
.panel[.panel-name[R Code]

```r
library(nnet)
library(reshape2)
ml <- readr::read_csv("https://www.warin.ca/datalake/courses_data/qmibr/session8/hsbdemo.csv")
summary(ml)
```
]

.panel[.panel-name[Output]

```
##        id            female              ses               schtyp
##  Min.   :  1.00   Length:200         Length:200         Length:200
##  1st Qu.: 50.75   Class :character   Class :character   Class :character
##  Median :100.50   Mode  :character   Mode  :character   Mode  :character
##  Mean   :100.50
##  3rd Qu.:150.25
##  Max.   :200.00
##      prog                read           write            math
##  Length:200         Min.   :28.00   Min.   :31.00   Min.   :33.00
##  Class :character   1st Qu.:44.00   1st Qu.:45.75   1st Qu.:45.00
##  Mode  :character   Median :50.00   Median :54.00   Median :52.00
##                     Mean   :52.23   Mean   :52.77   Mean   :52.65
##                     3rd Qu.:60.00   3rd Qu.:60.00   3rd Qu.:59.00
##                     Max.   :76.00   Max.   :67.00   Max.   :75.00
##     science          socst          honors              awards
##  Min.   :26.00   Min.   :26.00   Length:200         Min.   :0.00
##  1st Qu.:44.00   1st Qu.:46.00   Class :character   1st Qu.:0.00
##  Median :53.00   Median :52.00   Mode  :character   Median :1.00
##  Mean   :51.85   Mean   :52.41                      Mean   :1.67
##  3rd Qu.:58.00   3rd Qu.:61.00                      3rd Qu.:2.00
##  Max.   :74.00   Max.   :71.00                      Max.   :7.00
##       cid
##  Min.   : 1.00
##  1st Qu.: 5.00
##  Median :10.50
##  Mean   :10.43
##  3rd Qu.:15.00
##  Max.   :20.00
```
]
]

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Multinomial logistic regression

- The data set contains variables on 200 students.

- The outcome variable is `\(prog\)`, program type. The predictor variables are socioeconomic status, `\(ses\)`, a three-level categorical variable, and writing score, `\(write\)`, a continuous variable.

```r
with(ml, table(ses, prog))
```

```
##         prog
## ses      academic general vocation
##   high         42       9        7
##   low          19      16       12
##   middle       44      20       31
```

```r
with(ml, do.call(rbind, tapply(write, prog, function(x) c(Mean = mean(x), SD = sd(x)))))
```

```
##              Mean       SD
## academic 56.25714 7.943343
## general  51.33333 9.397775
## vocation 46.76000 9.318754
```

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Multinomial logistic regression

First, we need to choose the level of our outcome that we wish to use as our baseline and specify this in the `relevel` function.

```r
library(nnet)
library(reshape2)
# relevel() needs a factor, so convert prog before setting the baseline
ml$prog <- as.factor(ml$prog)
ml$prog2 <- relevel(ml$prog, ref = "academic")
```

Then, we run our model using `multinom`. The `multinom` function (from the nnet package) does not report p-values for the regression coefficients, so we calculate p-values using Wald tests (here z-tests).
```r
test <- multinom(prog2 ~ ses + write, data = ml)
summary(test)
```

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Multinomial logistic regression

.panelset[
.panel[.panel-name[Model]

```r
test <- multinom(prog2 ~ ses + write, data = ml)
```

```
## # weights:  15 (8 variable)
## initial  value 219.722458
## iter  10 value 179.983731
## final  value 179.981726
## converged
```
]

.panel[.panel-name[Summary]

```r
summary(test)
```

```
## Call:
## multinom(formula = prog2 ~ ses + write, data = ml)
##
## Coefficients:
##          (Intercept)    seslow sesmiddle       write
## general     1.689478 1.1628411 0.6295638 -0.05793086
## vocation    4.235574 0.9827182 1.2740985 -0.11360389
##
## Std. Errors:
##          (Intercept)    seslow sesmiddle      write
## general     1.226939 0.5142211 0.4650289 0.02141101
## vocation    1.204690 0.5955688 0.5111119 0.02222000
##
## Residual Deviance: 359.9635
## AIC: 375.9635
```
]
]

---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Multinomial logistic regression

```r
exp(coef(test))
```

```
##          (Intercept)   seslow sesmiddle     write
## general     5.416653 3.199009  1.876792 0.9437152
## vocation   69.101326 2.671709  3.575477 0.8926115
```

- The relative risk ratio (the ratio of the odds of being in the general program vs. the academic program) for a one-unit increase in the variable `\(write\)` is `\(.9437\)`.

- The relative risk ratio for switching from low to high `\(ses\)` is `\(.3126\)` [i.e., 1/3.199, because high `\(ses\)` is the reference level in this output] for being in the general program vs. the academic program.
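
---
background-image: url(./images/qslogo.PNG)
background-size: 100px
background-position: 90% 8%

# Multinomial logistic regression

The slides note that `multinom` does not report p-values. A minimal sketch of the Wald z-test follows, assuming the coefficients and standard errors printed in the model summary. The matrices are typed in by hand so that the chunk runs without the fitted object, and the names `coefs`, `std_errs`, `z`, and `p` are illustrative.

```r
# Values copied by hand from the summary(test) output (an assumption of this sketch)
row_names <- c("general", "vocation")
col_names <- c("(Intercept)", "seslow", "sesmiddle", "write")

coefs <- matrix(c(1.689478, 1.1628411, 0.6295638, -0.05793086,
                  4.235574, 0.9827182, 1.2740985, -0.11360389),
                nrow = 2, byrow = TRUE,
                dimnames = list(row_names, col_names))

std_errs <- matrix(c(1.226939, 0.5142211, 0.4650289, 0.02141101,
                     1.204690, 0.5955688, 0.5111119, 0.02222000),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(row_names, col_names))

# Wald z statistic: estimate divided by its standard error
z <- coefs / std_errs

# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

round(z, 3)
round(p, 4)
```

With the fitted model in memory, `summary(test)$coefficients` and `summary(test)$standard.errors` should provide the same two matrices, so the last four lines can be applied to them directly.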