Data science: Reshape R dataframes from wide to long with melt

Hause Lin

This tutorial is also on Medium, Towards Data Science.

Get source code for this RMarkdown script here.

Consider being a patron and supporting my work?

Donate and become a patron: If you find value in what I do and have learned something from my site, please consider becoming a patron. It takes me many hours to research, learn, and put together tutorials. Your support really matters.

How do you reshape a dataframe from wide to long form in R? How does the melt() function reshape dataframes in R? This tutorial will walk you through reshaping dataframes using the melt function.

Common terms for this transformation are melt, pivot-long, unpivot, gather, stack, and reshape. Many functions have been written to convert data from wide to long form, but I believe melt() from the data.table library is the best. See melt() documentation here.

Python’s pandas library also has the equivalent melt function/method that works the same way (see my pandas melt tutorial here)
other functions like gather and pivot_longer are often just wrapper functions for melt() or reshape()—these other functions simplify melt and often can’t deal with more complex transformations
melt is more powerful but isn’t any more complicated than the other functions…
melt alone is often enough for all your wide-to-long transformations; don’t need to learn gather, pivot_longer, and melt; melt alone is enough!
data.table package’s implementation of melt, which is extremely powerful—much more efficient and powerful than the reshape library’s melt function. According to the documentation:

The melt and dcast functions for data.tables are for reshaping wide-to-long and long-to-wide, respectively; the implementations are specifically designed with large in-memory data (e.g. 10Gb) in mind.

Reminder: We’re using melt from the data.table library, not reshape library! Compare the documentation of the melt functions from the two libraries to the differences: ?data.table::melt and ?reshape::melt


# load data.table library so we use its melt implementation
library(data.table)

Wide data

It’s easiest to understand what a wide dataframe is or looks like if we look at one and compare it with a long dataframe.

And below is the corresponding dataframe (with the same information) but in the long form:

Long dataframe can be “unmelted” using dcast

Before we begin our melt tutorial, let’s recreate the wide dataframe above.


df_wide <- data.table(
  student = c("Andy", "Bernie", "Cindey", "Deb"),
  school = c("Z", "Y", "Z", "Y"),
  english = c(10, 100, 1000, 10000),  # eng grades
  math = c(20, 200, 2000, 20000),  # math grades
  physics = c(30, 300, 3000, 30000)   # physics grades
)
df_wide


   student school english  math physics
1:    Andy      Z      10    20      30
2:  Bernie      Y     100   200     300
3:  Cindey      Z    1000  2000    3000
4:     Deb      Y   10000 20000   30000

Note that I like to use data.tables instead of data.frames because data.tables are much more powerful. If you data isn’t a data.table (check by running class(your_dataframe) in your console), I highly recommend you convert it to a data.table.


class(df_wide) # data.table and data.frame


[1] "data.table" "data.frame"

Simply load the data.table library and use the setDT function to convert any dataframe to data.table


setDT(df_wide) # no reassignment required! no need to write df_wide <- setDT(df_wide)

Wide to long with `melt`

Melt example 1

We melt by specifying the identifier columns via id.vars. The “leftover” non-identifier columns (english, math, physics) will be melted or stacked onto each other into one column.

A new indicator column will be created (contains values english, math, physics) and we can rename this new column (cLaSs) via variable.name. We can also rename the column in which all the actual grades are contained (gRaDe) via value.name.


df_long <- melt(data = df_wide, 
                id.vars = c("student", "school"),
                variable.name = "cLaSs",
                value.name = "gRaDe")
df_long


    student school   cLaSs gRaDe
 1:    Andy      Z english    10
 2:  Bernie      Y english   100
 3:  Cindey      Z english  1000
 4:     Deb      Y english 10000
 5:    Andy      Z    math    20
 6:  Bernie      Y    math   200
 7:  Cindey      Z    math  2000
 8:     Deb      Y    math 20000
 9:    Andy      Z physics    30
10:  Bernie      Y physics   300
11:  Cindey      Z physics  3000
12:     Deb      Y physics 30000

Wide to long: new indicator column “cLaSs” + values melted/stacked “gRaDe” column

Melt example 2

You can use measure.vars to specify which columns you want to melt or stack into column (here, we exclude physics column, so measure.vars = c("english", "math")). We also drop the school column from id.vars.


df_long <- melt(data = df_wide, 
                id.vars = "student",
                measure.vars = c("english", "math"),
                variable.name = "cLaSs",
                value.name = "gRaDe")
df_long


   student   cLaSs gRaDe
1:    Andy english    10
2:  Bernie english   100
3:  Cindey english  1000
4:     Deb english 10000
5:    Andy    math    20
6:  Bernie    math   200
7:  Cindey    math  2000
8:     Deb    math 20000

Wide to long: original columns school and physics have been dropped

Melt example 3

Finally, let’s see what happens if we specify only the student column as the identifier column (id.vars = "student") but do not specify which columns you want to stack via measure.vars. As a result, all non-identifier columns (school, english, math, physics) will be stacked into one column.

The resulting long dataframe looks wrong because now the cLaSs and gRaDe columns contain values that shouldn’t be there. The point here is to show you how melt works.


df_long <- melt(data = df_wide,
                id.vars = "student",
                variable.name = "cLaSs",
                value.name = "gRaDe")
df_long


    student   cLaSs gRaDe
 1:    Andy  school     Z
 2:  Bernie  school     Y
 3:  Cindey  school     Z
 4:     Deb  school     Y
 5:    Andy english    10
 6:  Bernie english   100
 7:  Cindey english  1000
 8:     Deb english 10000
 9:    Andy    math    20
10:  Bernie    math   200
11:  Cindey    math  2000
12:     Deb    math 20000
13:    Andy physics    30
14:  Bernie physics   300
15:  Cindey physics  3000
16:     Deb physics 30000

This table looks wrong because the school column in df_wide doesn’t belong—school should be another identifier column (see Melt 1 above). The melt function also also returned a warning message that tells you your column (gRaDe) have values of different types (i.e., character and numeric).

Wide to long: school column isn’t an identifier column

Support my work

I hope now you have a better understanding of how melt performs wide-to-long transformations If you find this post useful, check out my other articles and follow me on Medium.

If you’ve enjoyed my article, support my work and become a patron here!

Comment on this article Share:

Reshape R dataframes from wide to long with melt

Table of Contents

Consider being a patron and supporting my work?

Wide data

Wide to long with `melt`

Melt example 1

Melt example 2

Melt example 3

Support my work

Corrections

Reuse

Citation

Reshape R dataframes from wide to long with melt

Table of Contents

Consider being a patron and supporting my work?

Wide data

Wide to long with melt

Melt example 1

Melt example 2

Melt example 3

Support my work

Corrections

Reuse

Citation

Wide to long with `melt`