What is the correct way to perform gradient clipping in PyTorch?
I have an exploding gradients problem, and I need to program my way around it.
A more complete example:
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
loss.backward()
# Rescale the total gradient norm in-place before the optimizer step
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
optimizer.step()
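If you would rather clamp each gradient element to a fixed range instead of rescaling the overall norm, PyTorch also provides torch.nn.utils.clip_grad_value_. A minimal sketch of the same loop with value clipping (the 1.0 threshold is just an illustrative value):

optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
loss.backward()
# Clamps each gradient element into [-clip_value, clip_value] in-place.
# Unlike norm clipping, this can change the direction of the gradient.
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)
optimizer.step()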
And if you are using Automatic Mixed Precision (AMP), you need to do a bit more before clipping:
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
scaler.scale(loss).backward()
# Unscales the gradients of the optimizer's assigned params in-place
scaler.unscale_(optimizer)
# Since the gradients of the optimizer's assigned params are now unscaled, clip as usual:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
# The optimizer's gradients are already unscaled, so scaler.step does not unscale them again,
# although it still skips optimizer.step() if the gradients contain infs or NaNs.
scaler.step(optimizer)
# Updates the scale for the next iteration.
scaler.update()
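The snippet above assumes a GradScaler was created beforehand and that the forward pass runs under autocast. A minimal sketch of that setup, using the torch.cuda.amp API:

# Created once, before the training loop
scaler = torch.cuda.amp.GradScaler()

# Inside the loop: run the forward pass in mixed precision,
# then scale the loss outside the autocast context
with torch.cuda.amp.autocast():
    loss, hidden = model(data, hidden, targets)
scaler.scale(loss).backward()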
Well, I ran into the same error. I tried gradient norm clipping, but it didn't work.
I didn't want to change the network or add regularizers, so I switched the optimizer to Adam, and that worked.
Then I used the model pretrained with Adam to initialize training and fine-tuned with SGD + momentum. It is working now.
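A minimal sketch of that two-phase workflow (the checkpoint path and hyperparameters are just illustrative):

# Phase 1: pretrain with Adam, then save the weights
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# ... training loop ...
torch.save(model.state_dict(), "pretrained_adam.pt")

# Phase 2: load the Adam-pretrained weights and fine-tune with SGD + momentum
model.load_state_dict(torch.load("pretrained_adam.pt"))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)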