Derivatives

Earlier, we saw that functions are differentiated by taking the limit of a difference quotient. But vectors, as we know, are not like scalars in that we cannot divide by them, which creates the need for new definitions for vector-valued functions.

We can define a vector function F(x) as a function that takes in a scalar value x as input and outputs a vector. So, the derivative of F is defined as follows:
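$$\frac{dF}{dx} = \lim_{\delta x \to 0} \frac{F(x + \delta x) - F(x)}{\delta x}$$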

In the preceding equation, δx is a small perturbation of x. Additionally, F is only differentiable if the following applies:
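$$F(x + \delta x) = F(x) + F'(x)\,\delta x + o(\delta x)$$

Here, $o(\delta x)$ denotes a remainder that goes to zero faster than $\delta x$ does.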

We can also write the preceding differential as follows:
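$$dF = F'(x)\,dx$$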

Generally, we differentiate vectors component-wise, so the preceding differential becomes this:
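$$dF = \sum_{i} dF_i\,\mathbf{e}_i = \sum_{i} \frac{dF_i}{dx}\,dx\,\mathbf{e}_i$$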

Here, $\mathbf{e}_i$ is an orthonormal basis vector.

Some rules for vector differentiation are shown in the following list:
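For vector functions F and G and a scalar function u, all of the variable x, a standard set of such rules (linearity and the product rules for scalar, dot, and cross products) is the following:

- $\dfrac{d}{dx}(F + G) = \dfrac{dF}{dx} + \dfrac{dG}{dx}$
- $\dfrac{d}{dx}(uF) = \dfrac{du}{dx}F + u\dfrac{dF}{dx}$
- $\dfrac{d}{dx}(F \cdot G) = \dfrac{dF}{dx} \cdot G + F \cdot \dfrac{dG}{dx}$
- $\dfrac{d}{dx}(F \times G) = \dfrac{dF}{dx} \times G + F \times \dfrac{dG}{dx}$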

We know from earlier that we use the concept of limits to find the derivative of a function. So, let's see how we can find the limit of a vector. We use the concept of norms here: we say that $F(x) \to \mathbf{L}$ as $x \to c$ if $\|F(x) - \mathbf{L}\| \to 0$ as $x \to c$, that is, if the distance between $F(x)$ and $\mathbf{L}$, measured by the norm, goes to zero.
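As a quick numerical illustration of both ideas, here is a minimal sketch (using NumPy; the function F is an arbitrary choice made purely for illustration) that computes the difference quotient for shrinking perturbations and measures its distance from the exact component-wise derivative with a norm:

```python
import numpy as np

def F(x):
    # An arbitrarily chosen vector-valued function F: R -> R^3, used only for illustration
    return np.array([np.sin(x), np.cos(x), x**2])

def F_prime(x):
    # Its exact derivative, computed component-wise
    return np.array([np.cos(x), -np.sin(x), 2 * x])

x = 1.0
for dx in [1e-1, 1e-3, 1e-5]:
    quotient = (F(x + dx) - F(x)) / dx               # the difference quotient
    error = np.linalg.norm(quotient - F_prime(x))    # distance measured with a norm
    print(f"dx = {dx:.0e}, error = {error:.2e}")
```

With a forward difference like this, the error shrinks roughly in proportion to δx.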

Generally, the derivative captures the change of a function in all possible directions. But what if we want to find it along only one particular direction $\mathbf{n}$ (a unit vector)? Then, assuming $\delta\mathbf{r} = h\mathbf{n}$, we have the following:
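$$\delta f = f(\mathbf{r} + h\mathbf{n}) - f(\mathbf{r})$$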

From this, we can derive the directional derivative to be the following:
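$$\frac{\partial f}{\partial \mathbf{n}} = \lim_{h \to 0} \frac{f(\mathbf{r} + h\mathbf{n}) - f(\mathbf{r})}{h}$$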

This gives us the rate of change of f in this direction.

Suppose now that we have $\mathbf{n} = \mathbf{e}_i$. Then, our directional derivative becomes the following:
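$$\frac{\partial f}{\partial \mathbf{e}_i} = \lim_{h \to 0} \frac{f(\mathbf{r} + h\mathbf{e}_i) - f(\mathbf{r})}{h} = \frac{\partial f}{\partial x_i}$$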

Therefore, we have the following:

And so, the condition of differentiability now becomes the following:
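$$f(\mathbf{r} + \delta\mathbf{r}) = f(\mathbf{r}) + \sum_{i} \frac{\partial f}{\partial x_i}\,\delta x_i + o(\|\delta\mathbf{r}\|)$$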

We can express this in differential notation, as follows:
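$$df = \sum_{i} \frac{\partial f}{\partial x_i}\,dx_i$$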

This looks very similar to something we encountered earlier. It's the chain rule for partial derivatives.
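To make the directional derivative concrete, here is a minimal numerical sketch (NumPy; the function f and the direction n are arbitrary illustrative choices, not taken from the text) comparing the limit definition, approximated with a small h, against the sum of partial derivatives weighted by the components of n:

```python
import numpy as np

def f(x):
    # An arbitrarily chosen scalar function of a vector input, for illustration only
    return x[0] ** 2 + 3 * x[0] * x[1] + np.sin(x[1])

def grad_f(x):
    # Its partial derivatives, collected component-wise
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0] + np.cos(x[1])])

r = np.array([1.0, 2.0])
n = np.array([3.0, 4.0])
n /= np.linalg.norm(n)                      # make n a unit vector

h = 1e-6
numerical = (f(r + h * n) - f(r)) / h       # limit definition, approximated with small h
analytical = grad_f(r) @ n                  # partial derivatives weighted by the components of n

print(numerical, analytical)                # the two values agree closely
```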

Let's now take a function $f$ that takes in a vector input $\mathbf{x}$, such that $f: \mathbb{R}^n \to \mathbb{R}$. The partial derivatives of this function are written as follows:
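$$\frac{\partial f}{\partial x_1},\; \frac{\partial f}{\partial x_2},\; \ldots,\; \frac{\partial f}{\partial x_n}$$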

We can then collect these into an $n \times 1$ vector, known as the gradient of $f$, which we write as follows:
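$$\nabla f(\mathbf{x}) = \begin{pmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{pmatrix}$$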

Let's go a step further and imagine a vector function made up of m different scalar functions, each of which takes the vector x as input. We will write this as y = f(x).

Expanding y = f(x), we get the following:
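$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{pmatrix}$$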

Let's revisit the Jacobian matrix briefly. As you can see, it is simply an (m×n) matrix containing all the partial derivatives of the earlier vector function. We can see what this looks like here:
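$$\mathbf{J} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{pmatrix}$$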

Let's go a step further and extend this definition to multiple functions. Here, we have y, which is the sum of two functions f and g, each taking in a different vector input, which gives us the following:
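$$\mathbf{y} = f(\mathbf{a}) + g(\mathbf{b})$$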

And for the sake of simplicity, f, g, a, and b are all n-dimensional, which results in an n×n matrix, as follows:

We can differentiate this with respect to a or b and find the Jacobian matrix for each.

By differentiating with respect to a, we get the following:
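$$\frac{\partial \mathbf{y}}{\partial \mathbf{a}} = \left[\frac{\partial y_i}{\partial a_j}\right] = \left[\frac{\partial f_i(\mathbf{a})}{\partial a_j}\right]$$

The term $g(\mathbf{b})$ does not depend on $\mathbf{a}$, so its derivative with respect to $\mathbf{a}$ vanishes.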

By differentiating with respect to b, we get the following:
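$$\frac{\partial \mathbf{y}}{\partial \mathbf{b}} = \left[\frac{\partial y_i}{\partial b_j}\right] = \left[\frac{\partial g_i(\mathbf{b})}{\partial b_j}\right]$$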

We can do the same for any type of element-wise operation on the two functions. 
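The following minimal sketch (NumPy finite differences; the element-wise choices f = sin and g = exp, and the inputs a and b, are arbitrary and only for illustration) computes both Jacobians numerically:

```python
import numpy as np

def jacobian(func, x, h=1e-6):
    # Numerical Jacobian of func at x via forward differences (illustrative helper)
    y0 = func(x)
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        x_step = x.copy()
        x_step[j] += h
        J[:, j] = (func(x_step) - y0) / h
    return J

# Arbitrarily chosen element-wise functions f and g
f = np.sin
g = np.exp

a = np.array([0.1, 0.5, 0.9])
b = np.array([1.0, 2.0, 3.0])

# y = f(a) + g(b): differentiate with respect to a, then with respect to b
J_a = jacobian(lambda a_: f(a_) + g(b), a)   # approximately diag(cos(a))
J_b = jacobian(lambda b_: f(a) + g(b_), b)   # approximately diag(exp(b))

print(np.round(J_a, 3))
print(np.round(J_b, 3))
```

Because f and g act element-wise, each entry y_i depends only on a_i and b_i, which is why both Jacobians come out diagonal.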

As in single variable and multivariable calculus, we have a chain rule for vector differentiation as well.

Let's take the composition of two vector functions that take in a vector input, $\mathbf{y} = f(g(\mathbf{x}))$, and so the gradient of this will be $\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \frac{\partial f}{\partial g}\frac{\partial g}{\partial \mathbf{x}}$, which looks similar to the chain rule we encountered before. Let's expand this further, as follows:
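$$\frac{\partial y_i}{\partial x_j} = \sum_{k} \frac{\partial f_i}{\partial g_k}\,\frac{\partial g_k}{\partial x_j}$$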

In the majority of cases (for instance, when f and g are applied element-wise), the entries of the Jacobian matrix with i ≠ j are zero, which leads us to the following definitions:
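$$\frac{\partial f_i}{\partial g_j} = \begin{cases} f'(g_i), & i = j \\ 0, & i \neq j \end{cases} \qquad \text{and} \qquad \frac{\partial g_i}{\partial x_j} = \begin{cases} g'(x_i), & i = j \\ 0, & i \neq j \end{cases}$$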

And so, the following applies:
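$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \frac{\partial f}{\partial g}\,\frac{\partial g}{\partial \mathbf{x}} = \operatorname{diag}\!\big(f'(g_1)\,g'(x_1),\; \ldots,\; f'(g_n)\,g'(x_n)\big)$$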

As we can see, this is a diagonal matrix.
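To check this numerically, here is a minimal sketch (NumPy; the element-wise choices g = tanh and f = exp, and the input x, are arbitrary illustrative assumptions) that compares a finite-difference Jacobian of the composition against the product of the two diagonal Jacobians:

```python
import numpy as np

def jacobian(func, x, h=1e-6):
    # Numerical Jacobian via forward differences (illustrative helper)
    y0 = func(x)
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        x_step = x.copy()
        x_step[j] += h
        J[:, j] = (func(x_step) - y0) / h
    return J

x = np.array([0.2, 0.7, 1.3])

# Arbitrarily chosen element-wise functions: g is the inner function, f the outer one
g = np.tanh
f = np.exp

J_numerical = jacobian(lambda x_: f(g(x_)), x)

# Chain rule with diagonal Jacobians: diag(f'(g(x))) @ diag(g'(x))
J_chain = np.diag(np.exp(np.tanh(x))) @ np.diag(1.0 - np.tanh(x) ** 2)

print(np.round(J_numerical, 3))
print(np.round(J_chain, 3))      # off-diagonal entries are zero; the two matrices match
```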