
Taking the derivative of sigmoid

A comprehensive walk-through of taking the derivative of the sigmoid function.

One of the most frequently used activation functions in machine learning, and more specifically in neural networks, is the sigmoid function. In the backpropagation step of training a neural network, you have to find the derivative of the loss function with respect to each weight in the network. To do this, you also have to find the derivative of your activation function. This article aims to clear up any confusion about finding the derivative of the sigmoid function.

To begin, here is the sigmoid function:

$\sigma(x)=\frac{1}{1+e^{-x}}$

For a test, take the sigmoid of 5 on your calculator. You should get 0.99330714907.
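
If you would rather verify this in Python than on a calculator, here is a minimal sketch (the function name `sigmoid` is my own choice):

```python
import math

def sigmoid(x):
    """Sigmoid activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(5))  # 0.9933071490757153
```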

For the purposes of the derivative, this function can also be written as:

$\sigma(x)=(1+e^{-x})^{-1}$

First Impressions

The first thing I noticed about this function is that it is a composition of functions, the first function being

$\sigma(x)=(m)^{-1}$

and the second being

$m=1+e^{-x}$

Recall from calculus that when there is a composition of functions, the derivative is the derivative of the first function with respect to the second, multiplied by the derivative of the second function with respect to the variable, in this case x. Like this:

$\frac{d\sigma}{dx}=\frac{d\sigma}{dm}\frac{dm}{dx}$

So, the derivative of the sigmoid with respect to x is the derivative of the sigmoid function with respect to m times the derivative of m with respect to x. You can think of this function composition rule (the chain rule) as a kind of intermediate computation that recovers the original derivative you wanted through cross cancellation:

$\frac{d\sigma}{dx}=\frac{d\sigma}{\cancel{dm}}\frac{\cancel{dm}}{dx}=\frac{d\sigma}{dx}$
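
As a quick warm-up, here is the same rule applied to an example of my own (not part of the sigmoid derivation): take $f(x)=(1+x^2)^{-1}$ with the intermediate value $m=1+x^2$:

$\frac{df}{dx}=\frac{df}{dm}\frac{dm}{dx}=-m^{-2}\times 2x=\frac{-2x}{(1+x^2)^2}$

The sigmoid case works exactly the same way, just with $e^{-x}$ in place of $x^2$.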

Now that we know the sigmoid function is a composition of functions, all we have to do to find the derivative, is:

  1. Find the derivative of the sigmoid function with respect to m, our intermediate value
  2. Find the derivative of m with respect to x
  3. Multiply those values together

1. Find the derivative of the sigmoid with respect to m

Let’s look back at what the sigmoid function looks like with m as our intermediate value:

$\sigma(x)=(m)^{-1}$

Finding the derivative of this with respect to m is fairly simple if we remember the power rule:

$\frac{d(x^n)}{dx}=n\times x^{n-1}$

The derivative of x^n is n times x to the power of n-1.

So,

$\frac{d(m^{-1})}{dm}=-1\times m^{-1-1}=-m^{-2}$

Now, if we substitute our original value of m back into the equation, we get

$-m^{-2}=-(1+e^{-x})^{-2}$

Finally,

$\frac{d\sigma}{dm}=-(1+e^{-x})^{-2}$
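
If you want to double-check this step symbolically, a quick SymPy sketch (treating m as a free variable) agrees:

```python
import sympy as sp

m = sp.symbols('m')
# Power rule applied to m^(-1)
print(sp.diff(m**-1, m))  # -1/m**2
```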

Yay! We completed step 1.

2. Find the derivative of m with respect to x

Here’s m:

$m=1+e^{-x}$

To find the derivative, we have to find the derivative of each term with respect to x. The first term is easy:

$\frac{d(1)}{dx}=0$

The second term is a bit more complicated.

Let’s let

$y=e^u$

and

$u=-x$

We know that

$\frac{dy}{du}=e^u$

$\frac{du}{dx}=-1$

If getting to $e^u$ is not clear, recall that the exponential function is its own derivative: $\frac{d(e^u)}{du}=e^u$.

Now, using the chain rule once again,

$\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}$

So, we multiply the derivatives we just calculated to get the derivative with respect to x:

$\frac{dy}{dx}=e^u \times (-1)=-e^u=-e^{-x}$

All in all for step 2,

$\frac{dm}{dx}=-e^{-x}$
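
And, if you like, the same kind of symbolic spot-check for this step:

```python
import sympy as sp

x = sp.symbols('x')
# Derivative of m = 1 + e^(-x)
print(sp.diff(1 + sp.exp(-x), x))  # -exp(-x)
```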

3. Multiply the derivatives

Recall that once we found the two intermediate derivatives, we had to multiply them. So, here is a quick summary:

$\frac{d\sigma}{dx}=\frac{d\sigma}{dm}\frac{dm}{dx}$

$\frac{d\sigma}{dm}=-(1+e^{-x})^{-2}$

$\frac{dm}{dx}=-e^{-x}$

Now, if you remember how to multiply :), we can finally finish this!

$\frac{d\sigma}{dx}=-(1+e^{-x})^{-2} \times -e^{-x}=\frac{e^{-x}}{(1+e^{-x})^2}$
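
As a sanity check, this closed form should agree with a numerical estimate of the slope. Here is a minimal sketch using a central difference (the test point x = 5 and the step size h are my own choices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    """Closed form derived above: e^(-x) / (1 + e^(-x))^2."""
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

x, h = 5.0, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
print(sigmoid_deriv(x))  # ~0.0066480567
print(numeric)           # agrees to about 6 decimal places
```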

You can now take this value and use it as the derivative of the sigmoid function. An interesting thing happens when you manipulate this result, though. It turns out that you can rewrite the derivative like this:

$\frac{d\sigma}{dx}=\sigma(x)(1-\sigma(x))$

The derivative of the sigmoid function is the sigmoid times 1 minus the sigmoid. Wow. I feel cheated. :)
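
If you want to see where that identity comes from, split the fraction and use the definition of $\sigma$:

$\frac{e^{-x}}{(1+e^{-x})^2}=\frac{1}{1+e^{-x}}\times\frac{e^{-x}}{1+e^{-x}}=\frac{1}{1+e^{-x}}\times\frac{(1+e^{-x})-1}{1+e^{-x}}=\sigma(x)\left(1-\sigma(x)\right)$

This is also why the sigmoid is so convenient in backpropagation: if you cache $\sigma(x)$ from the forward pass, its derivative costs just one subtraction and one multiplication.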

If you even remotely enjoyed this article, keep learning about AI and deep learning! 👍

