Deriving the derivative of the sigmoid function for neural networks



The sigmoid function \(\sigma(x)=\frac{1}{1+e^{-x}}\) is frequently used in neural networks because its derivative has a simple closed form that reuses the function's own output, which makes it cheap to compute during backpropagation.

Let’s denote the sigmoid function as the following:

\[\sigma(x)=\frac{1}{1+e^{-x}}\]

Another way to express the sigmoid function:

\[\sigma(x)=\frac{e^{x}}{e^{x}+1}\]

You can easily derive the second equation from the first equation:

\[\frac{1}{1+e^{-x}}= \frac{1}{1+e^{-x}} \frac{e^{x}}{e^{x}} =\frac{e^{x}}{e^{x}+1} \]

Since \(\frac{e^x}{e^x} = 1\), we’re in essence just multiplying \(\frac{1}{1+e^{-x}}\) by 1.
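As a quick sanity check, here is a minimal Python sketch that evaluates both forms at a few points and confirms they agree. The function names `sigmoid_form1` and `sigmoid_form2` are made up for this illustration and are not part of the original tutorial.

```python
import math


def sigmoid_form1(x: float) -> float:
    """First form: 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_form2(x: float) -> float:
    """Second form: e^{x} / (e^{x} + 1)."""
    return math.exp(x) / (math.exp(x) + 1.0)


# Compare the two forms at a handful of moderate inputs.
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    a = sigmoid_form1(x)
    b = sigmoid_form2(x)
    # The two expressions are algebraically identical, so they should
    # differ only by floating-point rounding.
    assert math.isclose(a, b, rel_tol=1e-12)
    print(f"x={x:+.1f}  form1={a:.6f}  form2={b:.6f}")
```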

The derivative of the sigmoid function \(\sigma(x)\) is the sigmoid function \(\sigma(x)\) multiplied by \(1 - \sigma(x)\).

\[\sigma(x)=\frac{1}{1+e^{-x}}\]

\[\sigma'(x)=\frac{d}{dx}\sigma(x)=\sigma(x)(1-\sigma(x))\]
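Before deriving this identity by hand, we can check it numerically. The sketch below (plain Python; the helper names and the step size `h` are assumptions chosen for illustration) compares \(\sigma(x)(1-\sigma(x))\) against a central finite-difference estimate of the slope. Note how the analytic version needs only the value of \(\sigma(x)\) itself.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """Analytic derivative: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)  # the forward value is reused; no extra exp() call
    return s * (1.0 - s)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical slope estimate: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    analytic = sigmoid_derivative(x)
    numeric = central_difference(sigmoid, x)
    # The two estimates should agree up to finite-difference error.
    assert abs(analytic - numeric) < 1e-8
    print(f"x={x:+.1f}  analytic={analytic:.8f}  numeric={numeric:.8f}")
```

At \(x = 0\) we have \(\sigma(0) = 0.5\), so the derivative is \(0.5 \times 0.5 = 0.25\).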

Before we begin, here’s a reminder of how to find the derivatives of exponential functions.

\[ \frac{d}{dx}e^x = e^x\]

\[ \frac{d}{dx}e^{-3x^2 + 2x} = (-6x + 2)e^{-3x^2 + 2x}\]

Chain rule: \(\frac{d}{dx} \left[ f(g(x)) \right] = f'\left[g(x) \right] * g'(x)\).

Example: Find the derivative of \(f(x) = (x^2 + 1)^3\):

\[\begin{aligned} f'(x) &= 3(x^2 + 1)^{3-1} * 2x^{2-1}\\ &= 3(x^2 + 1)^2(2x) \\ &= 6x(x^2 + 1)^2 \end{aligned}\]

Line 2 of the sigmoid derivation below uses this rule.
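The worked chain-rule example above can also be written as code. This is only a sketch: the split into `outer` and `inner` functions and all of the names are assumptions made for illustration, with \(f(u) = u^3\) as the outer function and \(g(x) = x^2 + 1\) as the inner function.

```python
import math


def inner(x: float) -> float:
    """g(x) = x^2 + 1"""
    return x**2 + 1.0


def inner_prime(x: float) -> float:
    """g'(x) = 2x"""
    return 2.0 * x


def outer_prime(u: float) -> float:
    """f'(u) = 3u^2, the derivative of the outer function f(u) = u^3"""
    return 3.0 * u**2


def chain_rule_derivative(x: float) -> float:
    """Chain rule: f'(g(x)) * g'(x)."""
    return outer_prime(inner(x)) * inner_prime(x)


def closed_form(x: float) -> float:
    """Simplified result from the worked example: 6x(x^2 + 1)^2."""
    return 6.0 * x * (x**2 + 1.0) ** 2


for x in (-2.0, -0.5, 0.0, 1.0, 2.0):
    # Applying the rule step by step should match the simplified answer.
    assert math.isclose(chain_rule_derivative(x), closed_form(x))
    print(f"x={x:+.1f}  f'(x)={chain_rule_derivative(x):.4f}")
```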

\[\begin{aligned} \frac{d}{dx} \sigma(x) &= \frac{d}{dx} \left[ \frac{1}{1+e^{-x}} \right] =\frac{d}{dx}(1+e^{-x})^{-1} \\ &=-1*(1+e^{-x})^{-2}(-e^{-x}) \\ &=\frac{-e^{-x}}{-(1+e^{-x})^{2}} \\ &=\frac{e^{-x}}{(1+e^{-x})^{2}} \\ &=\frac{1}{1+e^{-x}} \frac{e^{-x}}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \frac{e^{-x} + (1 - 1)}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \frac{(1 + e^{-x}) - 1}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \left[ \frac{(1 + e^{-x})}{1+e^{-x}} - \frac{1}{1+e^{-x}} \right] \\ &=\frac{1}{1+e^{-x}} \left[ 1 - \frac{1}{1+e^{-x}} \right] \\ &=\sigma(x) (1-\sigma(x)) \\ \end{aligned}\]

Quotient rule: If \(f(x) = \frac{g(x)}{h(x)}\), then \(f'(x) = \frac{g'(x)h(x) - h'(x)g(x)}{(h(x))^2}\).

Example: Find the derivative of \(f(x) = \frac{3x}{1 + x}\):

\[\begin{aligned} f'(x) &= \frac{(\frac{d}{dx}(3x))*(1+x) - (\frac{d}{dx}(1+x)) * (3x)} {(1+x)^2} \\ &= \frac{3(1 + x) - 1(3x)}{(1+x)^2} \\ &= \frac{3 + 3x - 3x}{(1+x)^2} \\ &= \frac{3}{(1+x)^2} \end{aligned}\]

Line 2 of the sigmoid derivation below uses this rule.
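The quotient-rule example above can be sketched in code the same way. The generic `quotient_rule` helper and the other names here are hypothetical and exist only for this illustration; the numerator is \(g(x) = 3x\) and the denominator is \(h(x) = 1 + x\), so the sketch is valid for \(x \neq -1\).

```python
import math


def quotient_rule(g, g_prime, h, h_prime, x: float) -> float:
    """Quotient rule: (g'(x)h(x) - h'(x)g(x)) / h(x)^2."""
    return (g_prime(x) * h(x) - h_prime(x) * g(x)) / h(x) ** 2


def quotient_rule_derivative(x: float) -> float:
    """Apply the rule to f(x) = 3x / (1 + x)."""
    return quotient_rule(
        g=lambda t: 3.0 * t,        # numerator g(x) = 3x
        g_prime=lambda t: 3.0,      # g'(x) = 3
        h=lambda t: 1.0 + t,        # denominator h(x) = 1 + x
        h_prime=lambda t: 1.0,      # h'(x) = 1
        x=x,
    )


def closed_form(x: float) -> float:
    """Simplified result from the worked example: 3 / (1 + x)^2."""
    return 3.0 / (1.0 + x) ** 2


for x in (-0.5, 0.0, 1.0, 2.0):
    # The step-by-step rule should match the simplified answer.
    assert math.isclose(quotient_rule_derivative(x), closed_form(x))
    print(f"x={x:+.1f}  f'(x)={quotient_rule_derivative(x):.6f}")
```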

\[\begin{aligned} \frac{d}{dx} \sigma(x) &= \frac{d}{dx} \left[ \frac{1}{1+e^{-x}} \right] \\ &=\frac{(0)(1 + e^{-x}) - (-e^{-x})(1)}{(1 + e^{-x})^2} \\ &=\frac{e^{-x}}{(1 + e^{-x})^2} \\ &=\frac{1}{1+e^{-x}} \frac{e^{-x}}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \frac{e^{-x} + (1 - 1)}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \frac{(1 + e^{-x}) - 1}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \left[ \frac{(1 + e^{-x})}{1+e^{-x}} - \frac{1}{1+e^{-x}} \right] \\ &=\frac{1}{1+e^{-x}} \left[ 1 - \frac{1}{1+e^{-x}} \right] \\ &=\sigma(x) (1-\sigma(x)) \\ \end{aligned}\]

- Khan Academy 5-min video on chain rule
- Khan Academy 4-min video on quotient rule
- StackExchange derivation (chain rule)
- YouTube partial derivative of sigmoid function via chain rule
- Derivation via quotient rule

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hauselin/rtutorialsite, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Lin (2019, Oct. 1). Data science: Neural networks: Deriving the sigmoid derivative via chain and quotient rules. Retrieved from https://hausetutorials.netlify.com/posts/2019-12-01-neural-networks-deriving-the-sigmoid-derivative/

BibTeX citation

@misc{lin2019neural, author = {Lin, Hause}, title = {Data science: Neural networks: Deriving the sigmoid derivative via chain and quotient rules}, url = {https://hausetutorials.netlify.com/posts/2019-12-01-neural-networks-deriving-the-sigmoid-derivative/}, year = {2019} }