Reshape R dataframes from wide to long with melt

Learn and visualize how melt reshapes dataframes from long to wide

Hause Lin true
06-10-2020

Table of Contents


This tutorial is also on Medium, Towards Data Science.

Get source code for this RMarkdown script here.

Consider being a patron and supporting my work?

Donate and become a patron: If you find value in what I do and have learned something from my site, please consider becoming a patron. It takes me many hours to research, learn, and put together tutorials. Your support really matters.

How do you reshape a dataframe from wide to long form in R? How does the melt() function reshape dataframes in R? This tutorial will walk you through reshaping dataframes using the melt function.

Summary of how melt() works

Common terms for this transformation are melt, pivot-long, unpivot, gather, stack, and reshape. Many functions have been written to convert data from wide to long form, but I believe melt() from the data.table library is the best. See melt() documentation here.

The melt and dcast functions for data.tables are for reshaping wide-to-long and long-to-wide, respectively; the implementations are specifically designed with large in-memory data (e.g. 10Gb) in mind.

Reminder: We’re using melt from the data.table library, not reshape library! Compare the documentation of the melt functions from the two libraries to the differences: ?data.table::melt and ?reshape::melt


# load data.table library so we use its melt implementation
library(data.table)

Wide data

It’s easiest to understand what a wide dataframe is or looks like if we look at one and compare it with a long dataframe.

Wide dataframe can be melted/stacked using melt

And below is the corresponding dataframe (with the same information) but in the long form:

Long dataframe can be “unmelted” using dcast

Before we begin our melt tutorial, let’s recreate the wide dataframe above.


df_wide <- data.table(
  student = c("Andy", "Bernie", "Cindey", "Deb"),
  school = c("Z", "Y", "Z", "Y"),
  english = c(10, 100, 1000, 10000),  # eng grades
  math = c(20, 200, 2000, 20000),  # math grades
  physics = c(30, 300, 3000, 30000)   # physics grades
)
df_wide

   student school english  math physics
1:    Andy      Z      10    20      30
2:  Bernie      Y     100   200     300
3:  Cindey      Z    1000  2000    3000
4:     Deb      Y   10000 20000   30000

Note that I like to use data.tables instead of data.frames because data.tables are much more powerful. If you data isn’t a data.table (check by running class(your_dataframe) in your console), I highly recommend you convert it to a data.table.


class(df_wide) # data.table and data.frame

[1] "data.table" "data.frame"

Simply load the data.table library and use the setDT function to convert any dataframe to data.table


setDT(df_wide) # no reassignment required! no need to write df_wide <- setDT(df_wide)

Wide to long with melt

Melt example 1

We melt by specifying the identifier columns via id.vars. The “leftover” non-identifier columns (english, math, physics) will be melted or stacked onto each other into one column.

A new indicator column will be created (contains values english, math, physics) and we can rename this new column (cLaSs) via variable.name. We can also rename the column in which all the actual grades are contained (gRaDe) via value.name.


df_long <- melt(data = df_wide, 
                id.vars = c("student", "school"),
                variable.name = "cLaSs",
                value.name = "gRaDe")
df_long

    student school   cLaSs gRaDe
 1:    Andy      Z english    10
 2:  Bernie      Y english   100
 3:  Cindey      Z english  1000
 4:     Deb      Y english 10000
 5:    Andy      Z    math    20
 6:  Bernie      Y    math   200
 7:  Cindey      Z    math  2000
 8:     Deb      Y    math 20000
 9:    Andy      Z physics    30
10:  Bernie      Y physics   300
11:  Cindey      Z physics  3000
12:     Deb      Y physics 30000
Wide to long: new indicator column “cLaSs” + values melted/stacked “gRaDe” column

Melt example 2

You can use measure.vars to specify which columns you want to melt or stack into column (here, we exclude physics column, so measure.vars = c("english", "math")). We also drop the school column from id.vars.


df_long <- melt(data = df_wide, 
                id.vars = "student",
                measure.vars = c("english", "math"),
                variable.name = "cLaSs",
                value.name = "gRaDe")
df_long

   student   cLaSs gRaDe
1:    Andy english    10
2:  Bernie english   100
3:  Cindey english  1000
4:     Deb english 10000
5:    Andy    math    20
6:  Bernie    math   200
7:  Cindey    math  2000
8:     Deb    math 20000
Wide to long: original columns school and physics have been dropped

Melt example 3

Finally, let’s see what happens if we specify only the student column as the identifier column (id.vars = "student") but do not specify which columns you want to stack via measure.vars. As a result, all non-identifier columns (school, english, math, physics) will be stacked into one column.

The resulting long dataframe looks wrong because now the cLaSs and gRaDe columns contain values that shouldn’t be there. The point here is to show you how melt works.


df_long <- melt(data = df_wide,
                id.vars = "student",
                variable.name = "cLaSs",
                value.name = "gRaDe")
df_long

    student   cLaSs gRaDe
 1:    Andy  school     Z
 2:  Bernie  school     Y
 3:  Cindey  school     Z
 4:     Deb  school     Y
 5:    Andy english    10
 6:  Bernie english   100
 7:  Cindey english  1000
 8:     Deb english 10000
 9:    Andy    math    20
10:  Bernie    math   200
11:  Cindey    math  2000
12:     Deb    math 20000
13:    Andy physics    30
14:  Bernie physics   300
15:  Cindey physics  3000
16:     Deb physics 30000

This table looks wrong because the school column in df_wide doesn’t belong—school should be another identifier column (see Melt 1 above). The melt function also also returned a warning message that tells you your column (gRaDe) have values of different types (i.e., character and numeric).

Wide to long: school column isn’t an identifier column

Support my work

I hope now you have a better understanding of how melt performs wide-to-long transformations If you find this post useful, check out my other articles and follow me on Medium.

If you’ve enjoyed my article, support my work and become a patron here!

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hauselin/rtutorialsite, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Lin (2020, June 10). Data science: Reshape R dataframes from wide to long with melt. Retrieved from https://hausetutorials.netlify.com/posts/2020-06-27-reshape-r-dataframes-from-wide-to-long-with-melt/

BibTeX citation

@misc{lin2020reshape,
  author = {Lin, Hause},
  title = {Data science: Reshape R dataframes from wide to long with melt},
  url = {https://hausetutorials.netlify.com/posts/2020-06-27-reshape-r-dataframes-from-wide-to-long-with-melt/},
  year = {2020}
}