Learn and visualize how melt reshapes dataframes from long to wide
This tutorial is also on Medium, Towards Data Science.
Get source code for this RMarkdown script here.
Donate and become a patron: If you find value in what I do and have learned something from my site, please consider becoming a patron. It takes me many hours to research, learn, and put together tutorials. Your support really matters.
How do you reshape a dataframe from wide to long form in R? How does the melt()
function reshape dataframes in R? This tutorial will walk you through reshaping dataframes using the melt
function.
Common terms for this transformation are melt, pivot-long, unpivot, gather, stack, and reshape. Many functions have been written to convert data from wide to long form, but I believe melt()
from the data.table
library is the best. See melt()
documentation here.
melt
function/method that works the same way (see my pandas melt tutorial here)gather
and pivot_longer
are often just wrapper functions for melt()
or reshape()
—these other functions simplify melt
and often can’t deal with more complex transformationsmelt
is more powerful but isn’t any more complicated than the other functions…melt
alone is often enough for all your wide-to-long transformations; don’t need to learn gather
, pivot_longer
, and melt
; melt
alone is enough!data.table
package’s implementation of melt
, which is extremely powerful—much more efficient and powerful than the reshape library’s melt
function. According to the documentation:The melt and dcast functions for data.tables are for reshaping wide-to-long and long-to-wide, respectively; the implementations are specifically designed with large in-memory data (e.g. 10Gb) in mind.
Reminder: We’re using melt
from the data.table
library, not reshape
library! Compare the documentation of the melt
functions from the two libraries to the differences: ?data.table::melt
and ?reshape::melt
# load data.table library so we use its melt implementation
library(data.table)
It’s easiest to understand what a wide dataframe is or looks like if we look at one and compare it with a long dataframe.
And below is the corresponding dataframe (with the same information) but in the long form:
Before we begin our melt
tutorial, let’s recreate the wide dataframe above.
df_wide <- data.table(
student = c("Andy", "Bernie", "Cindey", "Deb"),
school = c("Z", "Y", "Z", "Y"),
english = c(10, 100, 1000, 10000), # eng grades
math = c(20, 200, 2000, 20000), # math grades
physics = c(30, 300, 3000, 30000) # physics grades
)
df_wide
student school english math physics
1: Andy Z 10 20 30
2: Bernie Y 100 200 300
3: Cindey Z 1000 2000 3000
4: Deb Y 10000 20000 30000
Note that I like to use data.tables
instead of data.frames
because data.tables
are much more powerful. If you data isn’t a data.table (check by running class(your_dataframe)
in your console), I highly recommend you convert it to a data.table
.
class(df_wide) # data.table and data.frame
[1] "data.table" "data.frame"
Simply load the data.table
library and use the setDT
function to convert any dataframe to data.table
setDT(df_wide) # no reassignment required! no need to write df_wide <- setDT(df_wide)
melt
We melt by specifying the identifier columns via id.vars
. The “leftover” non-identifier columns (english, math, physics) will be melted or stacked onto each other into one column.
A new indicator column will be created (contains values english, math, physics) and we can rename this new column (cLaSs) via variable.name
. We can also rename the column in which all the actual grades are contained (gRaDe) via value.name
.
df_long <- melt(data = df_wide,
id.vars = c("student", "school"),
variable.name = "cLaSs",
value.name = "gRaDe")
df_long
student school cLaSs gRaDe
1: Andy Z english 10
2: Bernie Y english 100
3: Cindey Z english 1000
4: Deb Y english 10000
5: Andy Z math 20
6: Bernie Y math 200
7: Cindey Z math 2000
8: Deb Y math 20000
9: Andy Z physics 30
10: Bernie Y physics 300
11: Cindey Z physics 3000
12: Deb Y physics 30000
You can use measure.vars
to specify which columns you want to melt or stack into column (here, we exclude physics column, so measure.vars = c("english", "math")
). We also drop the school column from id.vars
.
df_long <- melt(data = df_wide,
id.vars = "student",
measure.vars = c("english", "math"),
variable.name = "cLaSs",
value.name = "gRaDe")
df_long
student cLaSs gRaDe
1: Andy english 10
2: Bernie english 100
3: Cindey english 1000
4: Deb english 10000
5: Andy math 20
6: Bernie math 200
7: Cindey math 2000
8: Deb math 20000
Finally, let’s see what happens if we specify only the student column as the identifier column (id.vars = "student"
) but do not specify which columns you want to stack via measure.vars
. As a result, all non-identifier columns (school, english, math, physics) will be stacked into one column.
The resulting long dataframe looks wrong because now the cLaSs and gRaDe columns contain values that shouldn’t be there. The point here is to show you how melt
works.
df_long <- melt(data = df_wide,
id.vars = "student",
variable.name = "cLaSs",
value.name = "gRaDe")
df_long
student cLaSs gRaDe
1: Andy school Z
2: Bernie school Y
3: Cindey school Z
4: Deb school Y
5: Andy english 10
6: Bernie english 100
7: Cindey english 1000
8: Deb english 10000
9: Andy math 20
10: Bernie math 200
11: Cindey math 2000
12: Deb math 20000
13: Andy physics 30
14: Bernie physics 300
15: Cindey physics 3000
16: Deb physics 30000
This table looks wrong because the school
column in df_wide
doesn’t belong—school
should be another identifier column (see Melt 1 above). The melt function also also returned a warning message that tells you your column (gRaDe
) have values of different types (i.e., character and numeric).
I hope now you have a better understanding of how melt
performs wide-to-long transformations If you find this post useful, check out my other articles and follow me on Medium.
If you’ve enjoyed my article, support my work and become a patron here!
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hauselin/rtutorialsite, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Lin (2020, June 10). Data science: Reshape R dataframes from wide to long with melt. Retrieved from https://hausetutorials.netlify.com/posts/2020-06-27-reshape-r-dataframes-from-wide-to-long-with-melt/
BibTeX citation
@misc{lin2020reshape, author = {Lin, Hause}, title = {Data science: Reshape R dataframes from wide to long with melt}, url = {https://hausetutorials.netlify.com/posts/2020-06-27-reshape-r-dataframes-from-wide-to-long-with-melt/}, year = {2020} }