A short tutorial on how to interpret regression coefficients, including interaction coefficients.
Get source code for this RMarkdown script here.
This tutorial provides a step-by-step introduction to interpreting regression coefficients in linear models. I will use the built-in dataset mtcars.
General guidelines for interpreting regression coefficients
library(data.table) # to manipulate dataframes
library(interactions) # to plot interactions later on
library(ggplot2) # to make plots
Have a look at the mtcars dataset.
dt1 <- as.data.table(mtcars) # convert to datatable
dt1
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
11: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
15: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
16: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
17: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
18: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
19: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
20: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
21: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
22: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
24: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
25: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
26: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
27: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
29: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
30: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
31: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
32: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
mpg cyl disp hp drat wt qsec vs am gear carb
head(dt1) # check data
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
model_continuous_predictor <- lm(mpg ~ wt, dt1)
# summary(model_continuous_predictor)
coef(model_continuous_predictor)
(Intercept) wt
37.285126 -5.344472
-5.34: when wt increases by 1 (unit), mpg changes by this amount
37.29: when wt is 0, mpg is this value (i.e., intercept: the value of mpg when wt = 0, or the value of the outcome variable when the predictor is 0)
Note that in the data, wt only takes on values between about 1.5 and 5.4, so the intercept of 37.29 is an extrapolation of the regression line to wt values that don't exist in our data (see figure below).
ggplot(dt1, aes(wt, mpg)) +
geom_vline(xintercept = 0) +
geom_point() +
geom_smooth(method = 'lm', formula = y ~ poly(x, 1), fullrange = TRUE) +
scale_x_continuous(limits = c(-1, 7), breaks = -1:7) +
annotate("text", x = 1.7, y = coef(model_continuous_predictor)[1] + 2,
label = paste0(round(coef(model_continuous_predictor)[1], 2), " (intercept)"),
size = 6)
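As a quick sanity check of the slope interpretation (an illustrative sketch, not part of the original tutorial), predictions at wt values exactly 1 unit apart should differ by exactly the wt coefficient:

```r
# predict mpg at wt = 3 and wt = 4 (1 unit apart)
preds <- predict(model_continuous_predictor, newdata = data.frame(wt = c(3, 4)))
preds[[2]] - preds[[1]] # -5.344472, identical to the wt coefficient
```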
head(dt1) # check data (vs is a binary variable with just 0 and 1)
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
dt1[, vs_factor := as.factor(vs)] # turn vs into a factor
model_categorical_predictor <- lm(mpg ~ vs_factor, dt1)
# summary(model_categorical_predictor)
coef(model_categorical_predictor)
(Intercept) vs_factor1
16.616667 7.940476
When the categorical predictor has only two levels (coded 0 and 1), we can use the numeric variable as the predictor. We’ll get the same results as above.
coef(lm(mpg ~ vs_factor, dt1)) # factor predictor
(Intercept) vs_factor1
16.616667 7.940476
coef(lm(mpg ~ vs, dt1)) # numeric predictor
(Intercept) vs
16.616667 7.940476
7.94: when vs_factor increases by 1 (unit), mpg changes by this amount; here, vs = 0 is one categorical level/condition, and vs = 1 is the second categorical level/condition; thus this value refers to the difference in mean values between the two conditions
16.62: when vs_factor is 0, mpg is this value (i.e., intercept: the value of y when x = 0); thus, the intercept is the mean of the mpg values when vs = 0.
To show that the interpretation of the coefficients is indeed correct, let's manually compute the means of the two conditions (vs = 0, vs = 1) and compute their difference.
# compute mean mpg for each vs condition
vs_condition_means <- dt1[, .(mpg_group_mean = mean(mpg)), keyby = vs]
vs_condition_means
vs mpg_group_mean
1: 0 16.61667
2: 1 24.55714
The mean mpg value for the group vs = 0 is the same as the intercept value from the regression above (16.62).
# compute difference in mpg value between vs conditions
vs_condition_means$mpg_group_mean[2] - vs_condition_means$mpg_group_mean[1]
[1] 7.940476
The difference in mean mpg values between the two vs conditions is the same as the slope (beta coefficient) from the regression above (7.94).
ggplot(dt1, aes(vs, mpg)) +
geom_point() +
geom_smooth(method = 'lm', formula = y ~ poly(x, 1), fullrange = TRUE)
head(dt1) # check data (cyl is a categorical predictor with 3 levels)
mpg cyl disp hp drat wt qsec vs am gear carb vs_factor
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 1
dt1[, cyl_factor := as.factor(cyl)] # turn cyl into a factor
model_categorical_predictor_3 <- lm(mpg ~ cyl_factor, dt1)
# summary(model_categorical_predictor_3)
coef(model_categorical_predictor_3)
(Intercept) cyl_factor6 cyl_factor8
26.663636 -6.920779 -11.563636
When the categorical predictor has three or more levels, we can’t use the numeric variable as the predictor because the coefficients will be different.
coef(lm(mpg ~ cyl, dt1)) # numeric predictor
(Intercept) cyl
37.88458 -2.87579
coef(lm(mpg ~ cyl_factor, dt1)) # factor predictor
(Intercept) cyl_factor6 cyl_factor8
26.663636 -6.920779 -11.563636
Interpreting the coefficients in the model with the categorical predictor
When we convert variables to factors or characters, R automatically treats the "smallest" level (1 is smaller than 9; "a" is smaller than "b") as the reference condition, which becomes the intercept. In other words, this reference level is assigned the value 0, and each of the other levels gets its own 0/1 indicator variable. That is, R by default uses "dummy coding" (also known as treatment coding).
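You can inspect the dummy coding R applies to a factor with contrasts() (a quick check on the cyl_factor variable created above; this call is not in the original tutorial):

```r
# rows are the factor levels, columns are the dummy (0/1) variables;
# level 4 is the reference: its row is all zeros, so it becomes the intercept
contrasts(dt1$cyl_factor)
```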
26.66: when cyl_factor is 4 (the "smallest" cyl_factor level in the dataset), mpg is this value (i.e., intercept); thus, the intercept is the mean of the mpg values when cyl_factor = 4.
-6.92: difference in mean mpg values between the conditions cyl_factor = 4 and cyl_factor = 6
-11.56: difference in mean mpg values between the conditions cyl_factor = 4 and cyl_factor = 8
To show that the interpretation of the coefficients is indeed correct, let's manually compute the means of the three conditions (cyl_factor is 4, 6, or 8) and compute their differences.
# compute mean mpg for each cyl condition
cyl_condition_means <- dt1[, .(mpg_group_mean = mean(mpg)), keyby = cyl_factor]
cyl_condition_means
cyl_factor mpg_group_mean
1: 4 26.66364
2: 6 19.74286
3: 8 15.10000
The mean mpg value for the group cyl_factor = 4 is the same as the intercept value from the regression above (26.66).
# compute difference in mpg value between cyl = 6 and cyl = 4
cyl_condition_means$mpg_group_mean[2] - cyl_condition_means$mpg_group_mean[1]
[1] -6.920779
coef(model_categorical_predictor_3)[2] # beta coefficient
cyl_factor6
-6.920779
# compute difference in mpg value between cyl = 8 and cyl = 4
cyl_condition_means$mpg_group_mean[3] - cyl_condition_means$mpg_group_mean[1]
[1] -11.56364
coef(model_categorical_predictor_3)[3] # beta coefficient
cyl_factor8
-11.56364
ggplot(dt1, aes(cyl, mpg)) +
geom_point() +
geom_smooth(method = 'lm', formula = y ~ poly(x, 1), fullrange = TRUE)
When fitting the regression model, R uses dummy coding by default. Hence, the condition cyl = 4 is actually assigned 0 (and thus is the intercept).
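If you prefer a different reference level, you can change it with relevel() (an aside not covered in the original tutorial; cyl_factor8ref is a new variable name introduced here for illustration):

```r
# make cyl = 8 the reference level instead of cyl = 4
dt1[, cyl_factor8ref := relevel(cyl_factor, ref = "8")]
# the intercept is now the mean mpg of the cyl = 8 group (15.1),
# and the other coefficients are differences from that group
coef(lm(mpg ~ cyl_factor8ref, dt1))
```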
Let’s fit a regression model that includes an interaction term.
model_interaction1 <- lm(mpg ~ disp * vs_factor, data = dt1)
coef(model_interaction1)
(Intercept) disp vs_factor1 disp:vs_factor1
25.63755459 -0.02936965 8.39770888 -0.04218648
How do we interpret the interaction coefficient?
For every 1 unit increase in vs_factor (coded 0 and 1), the coefficient of disp changes by -0.042. READ THAT SENTENCE AGAIN TO SLOWLY DIGEST IT! It's the change in the COEFFICIENT of disp when vs_factor increases by 1 (unit).
Let's fit separate models for the two vs_factor conditions to verify the statement/interpretation above.
Fit linear models (mpg ~ disp) separately for vs_factor = 0 and vs_factor = 1.
model_mpg_disp_vs0 <- lm(mpg ~ disp, data = dt1[vs_factor == 0]) # blue line in figure below
model_mpg_disp_vs1 <- lm(mpg ~ disp, data = dt1[vs_factor == 1]) # orange line in figure below
Check the coefficients of disp for these two models.
coef(model_mpg_disp_vs0)
(Intercept) disp
25.63755459 -0.02936965
coef(model_mpg_disp_vs1)
(Intercept) disp
34.03526346 -0.07155613
Here's a reminder (again) of how to interpret the disp:vs_factor1 interaction coefficient in the interaction model (mpg ~ disp * vs_factor): For every 1 unit increase in vs_factor (coded 0 and 1), the coefficient of disp changes by -0.042. Or, put differently, it's the change in the COEFFICIENT of disp when vs_factor increases by 1 (unit).
Let's compute the difference of the disp coefficients in the two models above (where vs_factor is 0 and 1).
coef(model_mpg_disp_vs1)['disp'] - coef(model_mpg_disp_vs0)['disp']
disp
-0.04218648
The difference in the disp coefficients (-0.042) between the two models (where vs_factor is 1 or 0) is identical to the interaction coefficient (disp:vs_factor1: -0.042) in the model_interaction1 model.
In other words, the interaction coefficient is the difference between the values of the two slopes (i.e., coefficients) (see figure below).
mpg ~ disp when vs_factor = 0: disp coefficient is -0.029
mpg ~ disp when vs_factor = 1: disp coefficient is -0.072
The disp coefficient in the mpg ~ disp model is more negative (by the interaction coefficient disp:vs_factor1: -0.042) when vs_factor = 1 than when vs_factor = 0.
interact_plot(model_interaction1, pred = disp, modx = vs_factor)
You can interpret the interaction coefficients in all models (continuous or categorical variables) the same way.
model_interaction2 <- lm(mpg ~ disp * wt, data = dt1) # all continuous predictors
coef(model_interaction2)
(Intercept) disp wt disp:wt
44.08199770 -0.05635816 -6.49567966 0.01170542
disp:wt = 0.012: the change in the coefficient of disp when wt increases by 1 unit (or the reverse is also fine: the change in the coefficient of wt when disp increases by 1 unit)
When all predictors are continuous variables, the convention is to plot the effect of one regressor at different levels (+/- 1 SD and mean value) of the other regressor.
interact_plot(model_interaction2, pred = wt, modx = disp)
interact_plot(model_interaction2, pred = disp, modx = wt)
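To see algebraically why this interpretation holds (an illustrative check, not part of the original tutorial): in the model mpg = b0 + b1*disp + b2*wt + b3*(disp * wt), the slope of wt at a given disp value is b2 + b3*disp, so that slope changes by exactly b3 whenever disp increases by 1 unit. The disp values 100 and 101 below are arbitrary:

```r
b <- coef(model_interaction2)
# slope of wt when disp is held at 100 vs 101 (1 unit apart)
slope_disp100 <- b[["wt"]] + 100 * b[["disp:wt"]]
slope_disp101 <- b[["wt"]] + 101 * b[["disp:wt"]]
slope_disp101 - slope_disp100 # 0.0117, identical to the disp:wt coefficient
```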
No matter how complicated your interaction terms are (3 or 4 or 10-way interactions), you interpret the coefficients the same way!
model_interaction4 <- lm(mpg ~ disp * wt * qsec * drat, data = dt1) # all continuous predictors
coef(model_interaction4)["disp:wt:qsec:drat"] # the 4-way interaction
disp:wt:qsec:drat
0.0301177
There are many ways to interpret the coefficient disp:wt:qsec:drat = 0.03:
When disp increases by 1 (unit), the wt:qsec:drat coefficient (slope) changes by 0.03
When wt increases by 1 (unit), the disp:qsec:drat coefficient (slope) changes by 0.03
When qsec increases by 1 (unit), the disp:wt:drat coefficient (slope) changes by 0.03
When drat increases by 1 (unit), the disp:wt:qsec coefficient (slope) changes by 0.03
You can also interpret the three- or two-way interactions in the same model in the same way. You get it…
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hauselin/rtutorialsite, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Lin (2019, July 6). Data science: Interpreting regression coefficients (including interaction coefficients). Retrieved from https://hausetutorials.netlify.com/posts/2019-07-06-interpreting-interaction-regression-coefficients/
BibTeX citation
@misc{lin2019interpreting,
  author = {Lin, Hause},
  title = {Data science: Interpreting regression coefficients (including interaction coefficients)},
  url = {https://hausetutorials.netlify.com/posts/2019-07-06-interpreting-interaction-regression-coefficients/},
  year = {2019}
}