A short tutorial on how to interpret regression coefficients, including interaction coefficients.
This tutorial provides a step-by-step introduction to interpreting regression coefficients in linear models. I will use the built-in dataset mtcars.
General guidelines for interpreting regression coefficients
library(data.table) # to manipulate dataframes
library(interactions) # to plot interactions later on
library(ggplot2) # to make plots
Have a look at the mtcars dataset.
dt1 <- as.data.table(mtcars) # convert to datatable
dt1
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
11: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
15: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
16: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
17: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
18: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
19: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
20: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
21: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
22: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
24: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
25: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
26: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
27: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
29: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
30: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
31: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
32: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
mpg cyl disp hp drat wt qsec vs am gear carb
head(dt1) # check data
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
model_continuous_predictor <- lm(mpg ~ wt, dt1)
# summary(model_continuous_predictor)
coef(model_continuous_predictor)
(Intercept) wt
37.285126 -5.344472
-5.34: when wt increases by 1 (unit), mpg changes by this amount
37.29: when wt is 0, mpg is this value (i.e., the intercept: the value of mpg when wt = 0, or the value of the outcome variable when the predictor is 0)
Note that in the data, wt only takes on values between about 1.5 and 5.4, so the intercept of 37.29 is an extrapolation of the regression line to wt values that don't exist in our data (see figure below).
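As a quick check of these interpretations, here is a minimal sketch (wt = 3 is just an arbitrary example value) showing that a model prediction is simply intercept + slope * wt:
# predicted mpg for a car with wt = 3, straight from the fitted model
predict(model_continuous_predictor, newdata = data.frame(wt = 3))
# the same prediction computed by hand from the coefficients: 37.29 + (-5.34 * 3)
coef(model_continuous_predictor)["(Intercept)"] + coef(model_continuous_predictor)["wt"] * 3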
ggplot(dt1, aes(wt, mpg)) +
geom_vline(xintercept = 0) +
geom_point() +
geom_smooth(method = 'lm', formula = y ~ poly(x, 1), fullrange = TRUE) +
scale_x_continuous(limits = c(-1, 7), breaks = -1:7) +
annotate("text", x = 1.7, y = coef(model_continuous_predictor)[1] + 2,
label = paste0(round(coef(model_continuous_predictor)[1], 2), " (intercept)"),
size = 6)

head(dt1) # check data (vs is a binary variable with just 0 and 1)
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
dt1[, vs_factor := as.factor(vs)] # turn vs into a factor
model_categorical_predictor <- lm(mpg ~ vs_factor, dt1)
# summary(model_categorical_predictor)
coef(model_categorical_predictor)
(Intercept) vs_factor1
16.616667 7.940476
When the categorical predictor has only two levels (coded 0 and 1), we can use the numeric variable as the predictor. We’ll get the same results as above.
coef(lm(mpg ~ vs_factor, dt1)) # factor predictor
(Intercept) vs_factor1
16.616667 7.940476
coef(lm(mpg ~ vs, dt1)) # numeric predictor
(Intercept) vs
16.616667 7.940476
7.94: when vs_factor increases by 1 (unit), mpg changes by this amount; here, vs = 0 is one categorical level/condition and vs = 1 is the second level/condition, so this value is the difference in mean mpg between the two conditions
16.62: when vs_factor is 0, mpg is this value (i.e., the intercept: the value of y when x = 0); thus, the intercept is the mean mpg of the vs = 0 condition
To show that this interpretation of the coefficients is indeed correct, let's manually compute the mean mpg of the two conditions (vs = 0, vs = 1) and then compute their difference.
# compute mean mpg for each vs condition
vs_condition_means <- dt1[, .(mpg_group_mean = mean(mpg)), keyby = vs]
vs_condition_means
vs mpg_group_mean
1: 0 16.61667
2: 1 24.55714
The mean mpg value for the group vs = 0 is the same as the intercept value from the regression above (16.62).
# compute difference in mpg value between vs conditions
vs_condition_means$mpg_group_mean[2] - vs_condition_means$mpg_group_mean[1]
[1] 7.940476
The difference in mean mpg values between the two vs conditions is the same as the slope (beta coefficient) from the regression above (7.94).
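Equivalently, a minimal sketch: predict() on the fitted model returns exactly these two group means.
# predicted mpg for vs_factor = 0 and vs_factor = 1 (should equal the two group means)
predict(model_categorical_predictor, newdata = data.frame(vs_factor = factor(c(0, 1))))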
ggplot(dt1, aes(vs, mpg)) +
geom_point() +
geom_smooth(method = 'lm', formula = y ~ poly(x, 1), fullrange = TRUE)

head(dt1) # check data (cyl is a categorical predictor with 3 levels)
mpg cyl disp hp drat wt qsec vs am gear carb vs_factor
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 0
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 1
dt1[, cyl_factor := as.factor(cyl)] # turn cyl into a factor
model_categorical_predictor_3 <- lm(mpg ~ cyl_factor, dt1)
# summary(model_categorical_predictor_3)
coef(model_categorical_predictor_3)
(Intercept) cyl_factor6 cyl_factor8
26.663636 -6.920779 -11.563636
When the categorical predictor has three or more levels, we can't simply use the numeric variable as the predictor: the coefficients will be different, because the numeric version fits a single straight line across the levels instead of estimating each group's mean separately.
coef(lm(mpg ~ cyl, dt1)) # numeric predictor
(Intercept) cyl
37.88458 -2.87579
coef(lm(mpg ~ cyl_factor, dt1)) # factor predictor
(Intercept) cyl_factor6 cyl_factor8
26.663636 -6.920779 -11.563636
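To see why the two versions differ, here is a small sketch comparing their fitted values at cyl = 4, 6, 8: the numeric predictor forces the three predictions onto one straight line, whereas the factor predictor reproduces each group's mean.
# fitted values from the numeric-predictor model at cyl = 4, 6, 8 (lie on one straight line)
predict(lm(mpg ~ cyl, dt1), newdata = data.frame(cyl = c(4, 6, 8)))
# fitted values from the factor-predictor model (equal the three group means)
predict(lm(mpg ~ cyl_factor, dt1), newdata = data.frame(cyl_factor = factor(c(4, 6, 8))))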
Interpreting the coefficients in the model with the categorical predictor
When we convert variables to factors or characters, R automatically treats the "smallest" level (1 is smaller than 9; "a" is smaller than "b") as the reference level, which becomes the intercept. In other words, that condition is assigned the value 0, and each of the other conditions gets its own 0/1 dummy variable. That is, R by default uses "dummy coding".
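To see this dummy coding directly, you can inspect the design matrix R builds for the model (a quick sketch):
# the design matrix shows the 0/1 dummy variables R creates for cyl_factor
# (cyl = 4 is the reference level and is absorbed into the intercept column)
head(model.matrix(mpg ~ cyl_factor, dt1))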
26.66: when cyl_factor is 4 (or the “smallest” cyl_factor value in the dataset), mpg is this value (i.e., intercept); thus, the intercept is the mean of the values when cyl_factor = 4.
-6.92: difference in mean mpg values between the conditions cyl_factor = 4 and cyl_factor = 6
-11.56: difference in mean mpg values between the conditions cyl_factor = 4 and cyl_factor = 8
To show you the interpretation of the coefficients is indeed correct, let’s manually compute the mean of the three conditions (cyl_factor is 4, 6, 8) and compute their differences.
# compute mean mpg for each cyl condition
cyl_condition_means <- dt1[, .(mpg_group_mean = mean(mpg)), keyby = cyl_factor]
cyl_condition_means
cyl_factor mpg_group_mean
1: 4 26.66364
2: 6 19.74286
3: 8 15.10000
The mean mpg value for the group cyl_factor = 4 is the same as the intercept value from the regression above.
# compute difference in mpg value between cyl = 6 and cyl = 4
cyl_condition_means$mpg_group_mean[2] - cyl_condition_means$mpg_group_mean[1]
[1] -6.920779
coef(model_categorical_predictor_3)[2] # beta coefficient
cyl_factor6
-6.920779
# compute difference in mpg value between cyl = 8 and cyl = 4
cyl_condition_means$mpg_group_mean[3] - cyl_condition_means$mpg_group_mean[1]
[1] -11.56364
coef(model_categorical_predictor_3)[3] # beta coefficient
cyl_factor8
-11.56364
ggplot(dt1, aes(cyl, mpg)) +
geom_point() +
geom_smooth(method = 'lm', formula = y ~ poly(x, 1), fullrange = TRUE)

When fitting the regression model, R uses dummy coding by default. Hence, the condition cyl = 4 is actually assigned 0 (and thus is the intercept).
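If you want a different condition to serve as the reference level (i.e., the intercept), you can relevel the factor. A sketch using cyl = 8 as the reference: the intercept is then the mean of the cyl = 8 group, and the other coefficients are differences from that group.
# refit with cyl = 8 as the reference level
coef(lm(mpg ~ relevel(cyl_factor, ref = "8"), data = dt1))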
Let’s fit a regression model that includes an interaction term.
model_interaction1 <- lm(mpg ~ disp * vs_factor, data = dt1)
coef(model_interaction1)
(Intercept) disp vs_factor1 disp:vs_factor1
25.63755459 -0.02936965 8.39770888 -0.04218648
How do we interpret the interaction coefficient?
For every 1 unit increase in vs_factor (coded 0 and 1), the coefficient of disp changes by -0.042. READ THAT SENTENCE AGAIN TO SLOWLY DIGEST IT! It’s the change in the COEFFICIENT of disp when vs_factor increases by 1 (unit).
Let’s fit separate models for the two vs_factor conditions to verify the statement/interpretation above.
Fit linear models (mpg ~ disp) separately for vs_factor = 0 and vs_factor = 1.
model_mpg_disp_vs0 <- lm(mpg ~ disp, data = dt1[vs_factor == 0]) # blue line in figure below
model_mpg_disp_vs1 <- lm(mpg ~ disp, data = dt1[vs_factor == 1]) # orange line in figure below
Check the coefficients of disp in these two models.
coef(model_mpg_disp_vs0)
(Intercept) disp
25.63755459 -0.02936965
coef(model_mpg_disp_vs1)
(Intercept) disp
34.03526346 -0.07155613
Here’s a reminder (again) of how to interpret the disp * vs_factor interaction coefficient in the interaction model (mpg ~ disp * vs_factor): For every 1 unit increase in vs_factor (coded 0 and 1), the coefficient of disp changes by -0.042. Or the change in the COEFFICIENT of disp when vs_factor increases by 1 (unit).
Let’s compute the difference of the disp coefficients in the two models above (where vs is 0 and 1).
coef(model_mpg_disp_vs1)['disp'] - coef(model_mpg_disp_vs0)['disp']
disp
-0.04218648
The difference between the disp coefficients (-0.042) in the two models (where vs_factor is 1 or 0) is identical to the interaction coefficient (disp:vs_factor1 = -0.042) in the model_interaction1 model.
In other words, the interaction coefficient is the difference between the values of the two slopes (i.e., coefficients) (see figure below).
mpg ~ disp when vs_factor = 0: the disp coefficient is -0.029
mpg ~ disp when vs_factor = 1: the disp coefficient is -0.072
The disp coefficient in the mpg ~ disp model is more negative (by the interaction coefficient disp:vs_factor1 = -0.042) when vs_factor = 1 than when vs_factor = 0.
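You can also recover the vs_factor = 1 slope directly from the interaction model, without refitting separate models. A quick sketch:
# slope of disp when vs_factor = 1, from the interaction model:
# disp coefficient + interaction coefficient = -0.029 + (-0.042) = -0.072
coef(model_interaction1)["disp"] + coef(model_interaction1)["disp:vs_factor1"]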
interact_plot(model_interaction1, pred = disp, modx = vs_factor)

You can interpret the interaction coefficients in all models (continuous or categorical variables) the same way.
model_interaction2 <- lm(mpg ~ disp * wt, data = dt1) # all continuous predictors
coef(model_interaction2)
(Intercept) disp wt disp:wt
44.08199770 -0.05635816 -6.49567966 0.01170542
disp:wt = 0.012: the change in the coefficient of disp when wt increases by 1 unit (or, equivalently, the change in the coefficient of wt when disp increases by 1 unit)
When all predictors are continuous variables, the convention is to plot the effect of one regressor at different levels (mean and +/- 1 SD) of the other regressor.
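To make the "change in the coefficient" idea concrete for two continuous predictors, here is a small sketch that computes the simple slope of wt at a few disp values (the mean and +/- 1 SD, matching the convention mentioned above):
# simple slope of wt at different disp values: coef(wt) + coef(disp:wt) * disp
disp_values <- mean(dt1$disp) + c(-1, 0, 1) * sd(dt1$disp)
coef(model_interaction2)["wt"] + coef(model_interaction2)["disp:wt"] * disp_values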
interact_plot(model_interaction2, pred = wt, modx = disp)

interact_plot(model_interaction2, pred = disp, modx = wt)

No matter how complicated your interaction terms are (3 or 4 or 10-way interactions), you interpret the coefficients the same way!
model_interaction4 <- lm(mpg ~ disp * wt * qsec * drat, data = dt1) # all continuous predictors
coef(model_interaction4)["disp:wt:qsec:drat"] # the 4-way interaction
disp:wt:qsec:drat
0.0301177
There are many ways to interpret the coefficient disp:wt:qsec:drat = 0.03:
when disp increases by 1, the wt:qsec:drat coefficient (slope) changes by 0.03
when wt increases by 1, the disp:qsec:drat coefficient (slope) changes by 0.03
when qsec increases by 1, the disp:wt:drat coefficient (slope) changes by 0.03
when drat increases by 1, the disp:wt:qsec coefficient (slope) changes by 0.03
You can also interpret the three- or two-way interactions in the same model in the same way. You get it…