Neural networks: Deriving the sigmoid derivative via chain and quotient rules

Deriving the derivative of the sigmoid function for neural networks

Hause Lin
10-01-2019


Sigmoid function (aka logistic or inverse logit function)

The sigmoid function \(\sigma(x)=\frac{1}{1+e^{-x}}\) is frequently used in neural networks because its derivative can be written in terms of the function’s own output, which makes it simple and cheap to compute during backpropagation.

Let’s denote the sigmoid function as the following:

\[\sigma(x)=\frac{1}{1+e^{-x}}\]

Another way to express the sigmoid function:

\[\sigma(x)=\frac{e^{x}}{e^{x}+1}\]

You can easily derive the second equation from the first:

\[\frac{1}{1+e^{-x}}= \frac{1}{1+e^{-x}} \frac{e^{x}}{e^{x}} =\frac{e^{x}}{e^{x}+1} \]

Since \(\frac{e^x}{e^x} = 1\), we’re just multiplying \(\frac{1}{1+e^{-x}}\) by 1.
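The two forms are algebraically identical, but they behave differently in floating-point arithmetic: the first evaluates \(e^{-x}\), which can overflow for large negative \(x\), and the second evaluates \(e^{x}\), which can overflow for large positive \(x\). Here is a minimal Python sketch that picks whichever form is safe. The function name `sigmoid_stable` is made up for illustration and is not part of the original tutorial.

```python
import math


def sigmoid_stable(x: float) -> float:
    """Evaluate the sigmoid, choosing the algebraic form that avoids overflow."""
    if x >= 0:
        # First form: e^{-x} <= 1 here, so math.exp cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # Second form: e^{x} < 1 here, so math.exp cannot overflow.
    ex = math.exp(x)
    return ex / (ex + 1.0)


# Both forms agree wherever both can be evaluated.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    first_form = 1.0 / (1.0 + math.exp(-x))
    second_form = math.exp(x) / (math.exp(x) + 1.0)
    assert math.isclose(first_form, second_form, rel_tol=1e-12)
    assert math.isclose(sigmoid_stable(x), first_form, rel_tol=1e-12)
```

With Python's `math.exp`, calling the first form directly at something like `x = -1000` raises `OverflowError`, whereas `sigmoid_stable(-1000.0)` simply returns a value that underflows to zero.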

Sigmoid derivative

The derivative of the sigmoid function \(\sigma(x)\) is the sigmoid function \(\sigma(x)\) multiplied by \(1 - \sigma(x)\).

\[\sigma(x)=\frac{1}{1+e^{-x}}\]

\[\sigma'(x)=\frac{d}{dx}\sigma(x)=\sigma(x)(1-\sigma(x))\]
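Before working through the algebra, it can be reassuring to check this identity numerically. The sketch below compares \(\sigma(x)(1-\sigma(x))\) with a central finite-difference approximation of the slope. The helper names and the step size `h` are illustrative choices, not something from the original post.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    # The closed form claimed above: sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    # Numerical slope estimate: (f(x + h) - f(x - h)) / (2h).
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    numeric = central_difference(sigmoid, x)
    analytic = sigmoid_derivative(x)
    assert math.isclose(numeric, analytic, abs_tol=1e-8)
```

A numerical check like this does not prove the identity, but it catches sign errors and dropped factors quickly.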

Before we begin, here’s a reminder of how to find the derivatives of exponential functions.

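\[ \frac{d}{dx}e^x = e^x\]

More generally, if the exponent is a function \(u(x)\), then \(\frac{d}{dx}e^{u(x)} = u'(x)e^{u(x)}\). For example:

\[ \frac{d}{dx}e^{-3x^2 + 2x} = (-6x + 2)e^{-3x^2 + 2x}\]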

Sigmoid derivative via chain rule

Chain rule: \(\frac{d}{dx} \left[ f(g(x)) \right] = f'\left[g(x) \right] * g'(x)\).

Example: Find the derivative of \(f(x) = (x^2 + 1)^3\):

\[\begin{aligned} f'(x) &= 3(x^2 + 1)^{3-1} * 2x^{2-1}\\ &= 3(x^2 + 1)^2(2x) \\ &= 6x(x^2 + 1)^2 \end{aligned}\]
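The chain rule can also be written as a small higher-order function, which makes the roles of the outer and inner functions explicit. This is only a sketch: `chain_rule` and its parameter names are hypothetical, and the caller has to supply the derivatives by hand.

```python
import math


def chain_rule(outer_prime, inner, inner_prime):
    """Return the derivative of outer(inner(x)) as a function of x."""
    return lambda x: outer_prime(inner(x)) * inner_prime(x)


# Worked example: f(x) = (x^2 + 1)^3, with outer(u) = u^3 and inner(x) = x^2 + 1.
f_prime = chain_rule(
    outer_prime=lambda u: 3.0 * u ** 2,   # d/du of u^3
    inner=lambda x: x ** 2 + 1.0,
    inner_prime=lambda x: 2.0 * x,        # d/dx of x^2 + 1
)

# The result matches the simplified answer 6x(x^2 + 1)^2.
for x in (-2.0, 0.0, 0.5, 3.0):
    assert math.isclose(f_prime(x), 6.0 * x * (x ** 2 + 1.0) ** 2)
```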

Line 2 of the sigmoid derivation below uses this rule.

\[\begin{aligned} \frac{d}{dx} \sigma(x) &= \frac{d}{dx} \left[ \frac{1}{1+e^{-x}} \right] =\frac{d}{dx}(1+e^{-x})^{-1} \\ &=-1*(1+e^{-x})^{-2}(-e^{-x}) \\ &=\frac{-e^{-x}}{-(1+e^{-x})^{2}} \\ &=\frac{e^{-x}}{(1+e^{-x})^{2}} \\ &=\frac{1}{1+e^{-x}} \frac{e^{-x}}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \frac{e^{-x} + (1 - 1)}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \frac{(1 + e^{-x}) - 1}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \left[ \frac{(1 + e^{-x})}{1+e^{-x}} - \frac{1}{1+e^{-x}} \right] \\ &=\frac{1}{1+e^{-x}} \left[ 1 - \frac{1}{1+e^{-x}} \right] \\ &=\sigma(x) (1-\sigma(x)) \\ \end{aligned}\]
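One way to sanity-check a long derivation like this is to evaluate several of its lines at the same point and confirm they all give the same number. The sketch below does that for four of the lines. The function name `derivation_steps` is made up for illustration.

```python
import math


def derivation_steps(x: float) -> list:
    """Evaluate selected lines of the chain-rule derivation at a point x."""
    e = math.exp(-x)
    s = 1.0 / (1.0 + e)  # sigma(x)
    return [
        -1.0 * (1.0 + e) ** -2 * (-e),  # line 2: power rule times inner derivative
        e / (1.0 + e) ** 2,             # line 4: the two minus signs cancel
        s * (e / (1.0 + e)),            # line 5: split into two factors
        s * (1.0 - s),                  # last line: sigma(x) * (1 - sigma(x))
    ]


for x in (-3.0, 0.0, 2.0):
    steps = derivation_steps(x)
    assert all(math.isclose(step, steps[0], rel_tol=1e-9) for step in steps)
```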

Sigmoid derivative via quotient rule

Quotient rule: If \(f(x) = \frac{g(x)}{h(x)}\), then \(f'(x) = \frac{g'(x)h(x) - h'(x)g(x)}{(h(x))^2}\).

Example: Find the derivative of \(f(x) = \frac{3x}{1 + x}\):

\[\begin{aligned} f'(x) &= \frac{(\frac{d}{dx}(3x))*(1+x) - (\frac{d}{dx}(1+x)) * (3x)} {(1+x)^2} \\ &= \frac{3(1 + x) - 1(3x)}{(1+x)^2} \\ &= \frac{3 + 3x - 3x}{(1+x)^2} \\ &= \frac{3}{(1+x)^2} \end{aligned}\]
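The quotient rule translates almost directly into code. The sketch below builds the derivative of \(\frac{g(x)}{h(x)}\) from the four pieces the rule needs and checks it against the worked example. `quotient_rule` is a hypothetical helper, and it assumes \(h(x) \neq 0\) at the points where it is evaluated.

```python
import math


def quotient_rule(g, g_prime, h, h_prime):
    """Return the derivative of g(x) / h(x) as a function of x."""
    return lambda x: (g_prime(x) * h(x) - h_prime(x) * g(x)) / h(x) ** 2


# Worked example: f(x) = 3x / (1 + x).
f_prime = quotient_rule(
    g=lambda x: 3.0 * x,
    g_prime=lambda x: 3.0,
    h=lambda x: 1.0 + x,
    h_prime=lambda x: 1.0,
)

# The result matches the simplified answer 3 / (1 + x)^2.
for x in (0.0, 1.0, 2.5):
    assert math.isclose(f_prime(x), 3.0 / (1.0 + x) ** 2)
```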

Line 2 of the sigmoid derivation below uses this rule.

\[\begin{aligned} \frac{d}{dx} \sigma(x) &= \frac{d}{dx} \left[ \frac{1}{1+e^{-x}} \right] \\ &=\frac{(0)(1 + e^{-x}) - (-e^{-x})(1)}{(1 + e^{-x})^2} \\ &=\frac{e^{-x}}{(1 + e^{-x})^2} \\ &=\frac{1}{1+e^{-x}} \frac{e^{-x}}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \frac{e^{-x} + (1 - 1)}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \frac{(1 + e^{-x}) - 1}{1+e^{-x}} \\ &=\frac{1}{1+e^{-x}} \left[ \frac{(1 + e^{-x})}{1+e^{-x}} - \frac{1}{1+e^{-x}} \right] \\ &=\frac{1}{1+e^{-x}} \left[ 1 - \frac{1}{1+e^{-x}} \right] \\ &=\sigma(x) (1-\sigma(x)) \\ \end{aligned}\]
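The practical payoff of the form \(\sigma(x)(1-\sigma(x))\) is that the derivative needs only the sigmoid's output, which a network has already computed in its forward pass. Here is a rough sketch of how that reuse might look for a single layer of activations. The names `sigmoid_forward`, `sigmoid_backward`, and `upstream_grad` are assumptions for illustration and do not come from any particular library.

```python
import math


def sigmoid_forward(z):
    """Forward pass: apply the sigmoid to each pre-activation in the list z."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def sigmoid_backward(upstream_grad, activations):
    """Backward pass: scale each upstream gradient by sigma * (1 - sigma).

    `activations` is the cached output of sigmoid_forward, so no further
    exponentials are evaluated here.
    """
    return [g * a * (1.0 - a) for g, a in zip(upstream_grad, activations)]


z = [-2.0, 0.0, 2.0]                 # illustrative pre-activations
a = sigmoid_forward(z)               # cached during the forward pass
grad_z = sigmoid_backward([1.0, 1.0, 1.0], a)
```

With an upstream gradient of 1 for every unit, `grad_z` is just the sigmoid derivative at each point, and it peaks at 0.25 where the pre-activation is 0.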


Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hauselin/rtutorialsite, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Lin (2019, Oct. 1). Data science: Neural networks: Deriving the sigmoid derivative via chain and quotient rules. Retrieved from https://hausetutorials.netlify.com/posts/2019-12-01-neural-networks-deriving-the-sigmoid-derivative/

BibTeX citation

@misc{lin2019neural,
  author = {Lin, Hause},
  title = {Data science: Neural networks: Deriving the sigmoid derivative via chain and quotient rules},
  url = {https://hausetutorials.netlify.com/posts/2019-12-01-neural-networks-deriving-the-sigmoid-derivative/},
  year = {2019}
}