One naive way to evaluate gradients on a computer is the method of finite differences, which applies the limit definition of the gradient (Eq. 1) directly. Concretely, we iterate over each dimension of x, evaluate Eq. 1 with a small value of h for that dimension, and record the resulting slope. Since this requires at least one extra function evaluation per dimension, it becomes very slow when x is high-dimensional.
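As a rough illustration, here is a minimal sketch of a finite-difference gradient in NumPy. The function name `numerical_gradient`, the choice of `h`, and the forward-difference form are assumptions for illustration, not a reference implementation.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at x, one dimension at a time (Eq. 1 with a small h)."""
    grad = np.zeros_like(x)
    fx = f(x)                         # value at the original point
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h           # perturb only dimension i
        fxh = f(x)
        x.flat[i] = old               # restore the original value
        grad.flat[i] = (fxh - fx) / h # slope along dimension i
    return grad
```

Note that each dimension of x costs an extra call to f, which is why this approach does not scale.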
But thankfully, we do not have to do that. We can use calculus to compute an analytic gradient, i.e. to write down an expression for what the gradient should be.
In summary, there are two ways to compute gradients: the numerical gradient (finite differences) and the analytic gradient (calculus).
In practice, we should always use the analytic gradient, but verify the implementation against the numerical gradient. This is called a gradient check (see the sketch below).
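Below is a hedged example of what a gradient check might look like. The test function f(x) = sum(x**2) and its analytic gradient 2x are chosen only for illustration, and `numerical_gradient` refers to the sketch above; the relative-error threshold is a common rule of thumb, not a fixed standard.

```python
import numpy as np

f = lambda x: np.sum(x ** 2)      # toy function: f(x) = sum of squares
x = np.random.randn(5)

analytic_grad = 2 * x             # gradient derived with calculus
numeric_grad = numerical_gradient(f, x)   # slow approximation from the sketch above

# Relative error between the two; values around 1e-5 or smaller
# usually indicate the analytic gradient is implemented correctly.
rel_error = np.abs(analytic_grad - numeric_grad) / np.maximum(
    1e-8, np.abs(analytic_grad) + np.abs(numeric_grad)
)
print(rel_error.max())
```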
torch.autograd is PyTorch’s automatic differentiation engine that helps us to compute gradients.
We first create a tensor x with requires_grad=True. This signals to autograd that every operation on it should be tracked. When we call .backward() on z, autograd computes the gradient of z with respect to x and stores it in the tensor's .grad attribute. Hence, we can see the gradients in x.grad.
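A minimal sketch of this workflow, assuming a simple scalar z built from x (the specific values and the sum-of-squares function are illustrative only):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # track operations on x
y = x ** 2
z = y.sum()      # z is a scalar, so .backward() needs no extra arguments

z.backward()     # autograd computes dz/dx and stores it in x.grad
print(x.grad)    # tensor([2., 4., 6.]), i.e. 2 * x
```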