Reshape Python pandas dataframe from wide to long with pd.melt

Learn and visualize how pd.melt reshapes data from wide to long form

Hause Lin true
05-14-2020

Table of Contents


Get source code for this RMarkdown script here.

This Python tutorial is on Medium, Towards Data Science. Click here if you’re looking for the tutorial for the R version of melt (also the melt function in R).

Consider being a patron and supporting my work?

Donate and become a patron: If you find value in what I do and have learned something from my site, please consider becoming a patron. It takes me many hours to research, learn, and put together tutorials. Your support really matters.

How does the pandas melt() method reshape dataframes? How do you reshape a dataframe from wide to long form? This tutorial will walk you through reshaping dataframes using pd.melt() or the melt method associated with pandas dataframes.

Summary of how pd.melt() works

Also, you might want to check out the official pandas documentation and my numpy reshape tutorial.

Let’s load the reticulate library in R so we can run Python in RStudio. I’ll then load my conda python environment. I highly recommend you try the code in Python while you read this article. Try running this tutorial on my shared DeepNote notebook (you can only run but not edit this notebook).


library(reticulate)
# load conda python environment
use_condaenv(condaenv = "python376", conda = "/opt/anaconda3/bin/conda")

import pandas as pd

Wide data

It’s easiest to understand what a wide dataframe is or looks like if we look at one and compare it with a long dataframe.

Wide pandas dataframe can be melted/stacked using pd.melt()

And below is the corresponding dataframe (with the same information) but in the long form:

Long pandas dataframe can be “unmelted” using pd.pivot_table()

Before we begin our pd.melt tutorial, let’s recreate the wide dataframe above in Python with pd.DataFrame.


df_wide = pd.DataFrame(
  {"student": ["Andy", "Bernie", "Cindy", "Deb"],
   "school":  ["Z", "Y", "Z", "Y"],
   "english": [10, 100, 1000, 10000],  # eng grades
   "math":    [20, 200, 2000, 20000],  # math grades
   "physics": [30, 300, 3000, 30000]   # physics grades
  }
)
df_wide

  student school  english   math  physics
0    Andy      Z       10     20       30
1  Bernie      Y      100    200      300
2   Cindy      Z     1000   2000     3000
3     Deb      Y    10000  20000    30000

Wide to long with melt

Common terms for this transformation are melt, unpivot, gather, stack. See pd.melt() documentation here.

Melt example 1

We melt the dataframe by specifying the identifier columns via id_vars. The “leftover” non-identifier columns (english, math, physics) will be melted or stacked onto each other into one column.

A new indicator column will be created (contains values english, math, physics) and we can rename this new column (cLaSs) via var_name. We can also rename the column in which all the actual grades are contained (gRaDe) via value_name.


pd.melt(frame=df_wide, 
        id_vars=["student", "school"], 
        var_name="cLaSs", 
        value_name="gRaDe")

# or you can use the melt method associated with a dataframe
# df_wide.melt(id_vars=["student", "school"], var_name="cLaSs", value_name="gRaDe")

   student school    cLaSs  gRaDe
0     Andy      Z  english     10
1   Bernie      Y  english    100
2    Cindy      Z  english   1000
3      Deb      Y  english  10000
4     Andy      Z     math     20
5   Bernie      Y     math    200
6    Cindy      Z     math   2000
7      Deb      Y     math  20000
8     Andy      Z  physics     30
9   Bernie      Y  physics    300
10   Cindy      Z  physics   3000
11     Deb      Y  physics  30000
Wide to long: new indicator column “cLaSs” + values melted/stacked “gRaDe” column

Melt example 2

You can use value_vars to specify which columns you want to melt or stack into column (here, we exclude physics column, so value_vars=["english", "math"]). We also drop the school column from id_vars.


pd.melt(frame=df_wide, 
  id_vars="student", 
  value_vars=["english","math"], 
  var_name="cLaSs", 
  value_name="gRaDe")

# or you can use the melt method associated with a dataframe
# df_wide.melt(id_vars="student", value_vars=["english","math"], var_name="cLaSs", value_name="gRaDe")

  student    cLaSs  gRaDe
0    Andy  english     10
1  Bernie  english    100
2   Cindy  english   1000
3     Deb  english  10000
4    Andy     math     20
5  Bernie     math    200
6   Cindy     math   2000
7     Deb     math  20000
Wide to long: original columns school and physics have been dropped

Melt example 3

Finally, let’s see what happens if we specify only the student column as the identifier column (id_vars="student") but do not specify which columns you want to stack via value_vars. As a result, all non-identifier columns (school, english, math, physics) will be stacked into one column.

The resulting long dataframe looks wrong because now the cLaSs and gRaDe columns contain values that shouldn’t be there. The point here is to show you how pd.melt works.


pd.melt(frame=df_wide,
        id_vars="student",
        var_name="cLaSs", 
        value_name="gRaDe")

# or you can use the melt method associated with a dataframe
# df_wide.melt(id_vars="student", var_name="cLaSs", value_name="gRaDe")

   student    cLaSs  gRaDe
0     Andy   school      Z
1   Bernie   school      Y
2    Cindy   school      Z
3      Deb   school      Y
4     Andy  english     10
5   Bernie  english    100
6    Cindy  english   1000
7      Deb  english  10000
8     Andy     math     20
9   Bernie     math    200
10   Cindy     math   2000
11     Deb     math  20000
12    Andy  physics     30
13  Bernie  physics    300
14   Cindy  physics   3000
15     Deb  physics  30000

This table looks wrong because the school column in df_wide doesn’t belong—school should be another identifier column (see Melt 1 above).

Wide to long: school column isn’t an identifier column

Support my work

I hope now you have a better understanding of how pd.melt reshapes dataframes. If you find this post useful, check out my other articles and follow me on Medium.

If you’ve enjoyed my article, support my work and become a patron here!

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hauselin/rtutorialsite, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Lin (2020, May 14). Data science: Reshape Python pandas dataframe from wide to long with pd.melt. Retrieved from https://hausetutorials.netlify.com/posts/2020-05-14-reshaping-data-in-python-pandas/

BibTeX citation

@misc{lin2020reshape,
  author = {Lin, Hause},
  title = {Data science: Reshape Python pandas dataframe from wide to long with pd.melt},
  url = {https://hausetutorials.netlify.com/posts/2020-05-14-reshaping-data-in-python-pandas/},
  year = {2020}
}