Articles and posts

Reshape R dataframes from wide to long with melt

Learn and visualize how melt reshapes dataframes from wide to long

Reshape Python pandas dataframe from long to wide with pivot_table

Learn and visualize how pd.pivot_table reshapes data from long to wide form

Reshape Python pandas dataframe from wide to long with pd.melt

Learn and visualize how pd.melt reshapes data from wide to long form

Causal inference and Lord's Paradox: Change score or covariate?

Use directed acyclic graphs and structural equation modeling to understand Lord's paradox.

Reshape and stack multi-dimensional arrays in Python numpy

The only tutorial and cheatsheet you'll need to understand how Python numpy reshapes and stacks multidimensional arrays

Neural networks: Deriving the sigmoid derivative via chain and quotient rules

Deriving the derivative of the sigmoid function for neural networks

Resources for making R packages

Resources for making R packages

Use median absolute deviation instead of z-score to detect outliers

Why it is bad to use z-scores to detect outliers and why you should use median absolute deviation instead

Resources for principal components analysis and dimension reduction

Resources for learning principal components analysis and dimension reduction

Good online classes (MOOCs) for data science

A list of online classes I've completed or am planning to take

What happens when you set the intercept to 0 in regression models

What happens when you force the intercept to be 0 in a regression model and why you should (generally) never do it

Interpreting regression coefficients (including interaction coefficients)

A short tutorial on how to interpret regression coefficients, including interaction coefficients.

Gentle intro to logistic regression

A gentle, step-by-step intro to logistic regression, inverse logit and logit functions, and maximum likelihood estimation in the context of logistic regression

Resources for data.table

Resources for the amazing R data.table package

Bayesian inference and MCMC sampling introduction

A gentle and intuitive introduction to Markov Chain Monte Carlo sampling with rejection sampling.

Dockerize a ShinyApp and host it on your own server with DigitalOcean

Step-by-step instructions describing how I deployed my ShinyApp with Docker and hosted it on a web server with DigitalOcean (using Mac OS X).

Why bar (dynamite) plots are terrible (use ggbeeswarm instead)

Why are barplots or dynamite plots so bad? Comparing four different types of plots: barplot, boxplot, violinplot, and geom_quasirandom plot

Complete vs. partial vs. no-pooling: Fit the same model to different groups (just one simple line of code)

What complete pooling, no-pooling, and partial pooling are, and how to use data.table for no-pooling models (fit the same model, but separately to each group)

If you see mistakes or want to suggest changes, please create an issue on the source repository.


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".