Learn and visualize how pd.melt reshapes data from wide to long form
Get source code for this RMarkdown script here.
This Python tutorial is on Medium, Towards Data Science. Click here if you’re looking for the tutorial for the R version of melt (also the melt
function in R).
Donate and become a patron: If you find value in what I do and have learned something from my site, please consider becoming a patron. It takes me many hours to research, learn, and put together tutorials. Your support really matters.
How does the pandas melt()
method reshape dataframes? How do you reshape a dataframe from wide to long form? This tutorial will walk you through reshaping dataframes using pd.melt()
or the melt
method associated with pandas dataframes.
Also, you might want to check out the official pandas documentation and my numpy reshape tutorial.
Let’s load the reticulate
library in R so we can run Python in RStudio. I’ll then load my conda python environment. I highly recommend you try the code in Python while you read this article. Try running this tutorial on my shared DeepNote notebook (you can only run but not edit this notebook).
library(reticulate)
# load conda python environment
use_condaenv(condaenv = "python376", conda = "/opt/anaconda3/bin/conda")
import pandas as pd
It’s easiest to understand what a wide dataframe is or looks like if we look at one and compare it with a long dataframe.
And below is the corresponding dataframe (with the same information) but in the long form:
Before we begin our pd.melt
tutorial, let’s recreate the wide dataframe above in Python with pd.DataFrame
.
df_wide = pd.DataFrame(
{"student": ["Andy", "Bernie", "Cindy", "Deb"],
"school": ["Z", "Y", "Z", "Y"],
"english": [10, 100, 1000, 10000], # eng grades
"math": [20, 200, 2000, 20000], # math grades
"physics": [30, 300, 3000, 30000] # physics grades
}
)
df_wide
student school english math physics
0 Andy Z 10 20 30
1 Bernie Y 100 200 300
2 Cindy Z 1000 2000 3000
3 Deb Y 10000 20000 30000
melt
Common terms for this transformation are melt, unpivot, gather, stack. See pd.melt()
documentation here.
We melt the dataframe by specifying the identifier columns via id_vars
. The “leftover” non-identifier columns (english, math, physics) will be melted or stacked onto each other into one column.
A new indicator column will be created (contains values english, math, physics) and we can rename this new column (cLaSs) via var_name
. We can also rename the column in which all the actual grades are contained (gRaDe) via value_name
.
pd.melt(frame=df_wide,
id_vars=["student", "school"],
var_name="cLaSs",
value_name="gRaDe")
# or you can use the melt method associated with a dataframe
# df_wide.melt(id_vars=["student", "school"], var_name="cLaSs", value_name="gRaDe")
student school cLaSs gRaDe
0 Andy Z english 10
1 Bernie Y english 100
2 Cindy Z english 1000
3 Deb Y english 10000
4 Andy Z math 20
5 Bernie Y math 200
6 Cindy Z math 2000
7 Deb Y math 20000
8 Andy Z physics 30
9 Bernie Y physics 300
10 Cindy Z physics 3000
11 Deb Y physics 30000
You can use value_vars
to specify which columns you want to melt or stack into column (here, we exclude physics column, so value_vars=["english", "math"]
). We also drop the school column from id_vars
.
pd.melt(frame=df_wide,
id_vars="student",
value_vars=["english","math"],
var_name="cLaSs",
value_name="gRaDe")
# or you can use the melt method associated with a dataframe
# df_wide.melt(id_vars="student", value_vars=["english","math"], var_name="cLaSs", value_name="gRaDe")
student cLaSs gRaDe
0 Andy english 10
1 Bernie english 100
2 Cindy english 1000
3 Deb english 10000
4 Andy math 20
5 Bernie math 200
6 Cindy math 2000
7 Deb math 20000
Finally, let’s see what happens if we specify only the student column as the identifier column (id_vars="student"
) but do not specify which columns you want to stack via value_vars
. As a result, all non-identifier columns (school, english, math, physics) will be stacked into one column.
The resulting long dataframe looks wrong because now the cLaSs and gRaDe columns contain values that shouldn’t be there. The point here is to show you how pd.melt
works.
pd.melt(frame=df_wide,
id_vars="student",
var_name="cLaSs",
value_name="gRaDe")
# or you can use the melt method associated with a dataframe
# df_wide.melt(id_vars="student", var_name="cLaSs", value_name="gRaDe")
student cLaSs gRaDe
0 Andy school Z
1 Bernie school Y
2 Cindy school Z
3 Deb school Y
4 Andy english 10
5 Bernie english 100
6 Cindy english 1000
7 Deb english 10000
8 Andy math 20
9 Bernie math 200
10 Cindy math 2000
11 Deb math 20000
12 Andy physics 30
13 Bernie physics 300
14 Cindy physics 3000
15 Deb physics 30000
This table looks wrong because the school
column in df_wide
doesn’t belong—school
should be another identifier column (see Melt 1 above).
I hope now you have a better understanding of how pd.melt
reshapes dataframes. If you find this post useful, check out my other articles and follow me on Medium.
If you’ve enjoyed my article, support my work and become a patron here!
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hauselin/rtutorialsite, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Lin (2020, May 14). Data science: Reshape Python pandas dataframe from wide to long with pd.melt. Retrieved from https://hausetutorials.netlify.com/posts/2020-05-14-reshaping-data-in-python-pandas/
BibTeX citation
@misc{lin2020reshape, author = {Lin, Hause}, title = {Data science: Reshape Python pandas dataframe from wide to long with pd.melt}, url = {https://hausetutorials.netlify.com/posts/2020-05-14-reshaping-data-in-python-pandas/}, year = {2020} }