tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)
This is an example that could be used, but where do I introduce it? In the definition of the RNN?
lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Split data because rnn cell needs a list of inputs for the RNN inner loop
_X = tf.split(0, n_steps, _X) # n_steps
tf.clip_by_value(_X, -1, 1, name=None)
But this doesn't make sense, as the tensor _X is the input, not the gradient that is to be clipped.
Do I have to define my own Optimizer for this, or is there a simpler option?
You don't need a custom optimizer: clipping happens after the gradients are computed but before they are applied, so you compute, clip, and apply them explicitly instead of letting the optimizer do both steps at once:
optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimize = optimizer.apply_gradients(zip(gradients, variables))
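For context (not part of the original answer): tf.clip_by_global_norm scales all the tensors by a single factor, clip_norm / max(global_norm, clip_norm), so their relative scales are preserved. A minimal sketch in TF2 eager mode, with illustrative constants:
import tensorflow as tf

# Illustrative sketch (not library internals): every gradient is scaled by the
# same factor, so the gradients keep their relative scales.
grads = [tf.constant([3.0, 4.0]), tf.constant([0.0, 12.0])]
global_norm = tf.sqrt(sum(tf.reduce_sum(g ** 2) for g in grads))   # 13.0
scale = 5.0 / tf.maximum(global_norm, 5.0)
manually_clipped = [g * scale for g in grads]

clipped, norm = tf.clip_by_global_norm(grads, 5.0)
# `clipped` matches `manually_clipped`; `norm` is the pre-clipping global norm.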
Clipping each gradient matrix individually changes their relative scales, but it is also possible:
optimizer = tf.train.AdamOptimizer(1e-3)
gradients, variables = zip(*optimizer.compute_gradients(loss))
gradients = [
    None if gradient is None else tf.clip_by_norm(gradient, 5.0)
    for gradient in gradients]
optimize = optimizer.apply_gradients(zip(gradients, variables))
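For comparison, a minimal sketch (with an illustrative constant) of what tf.clip_by_norm does to one tensor: it rescales the tensor only when its own L2 norm exceeds the threshold.
import tensorflow as tf

g = tf.constant([6.0, 8.0])        # L2 norm is 10
clipped = tf.clip_by_norm(g, 5.0)  # rescaled by 5/10 to [3.0, 4.0]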
In TensorFlow 2, a tape computes the gradients, the optimizers come from Keras, and we don't need to store the update op because it runs automatically without passing it to a session:
optimizer = tf.keras.optimizers.Adam(1e-3)
# ...
with tf.GradientTape() as tape:
    loss = ...
variables = ...
gradients = tape.gradient(loss, variables)
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimizer.apply_gradients(zip(gradients, variables))
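A self-contained version of the same pattern; the model, data, and loss below are illustrative placeholders rather than part of the original answer:
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(1e-3)
x = tf.random.normal([32, 4])   # dummy inputs
y = tf.random.normal([32, 1])   # dummy targets

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))
gradients = tape.gradient(loss, model.trainable_variables)
gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))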
It's easy for tf.keras!
optimizer = tf.keras.optimizers.Adam(clipvalue=1.0)
This optimizer will clip every gradient value to the range [-1.0, 1.0].
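A brief usage sketch (the model and data here are illustrative): pass the clipping optimizer to compile and the clipping is applied during fit. Keras optimizers also accept a clipnorm argument if you prefer norm-based clipping.
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.Adam(clipvalue=1.0), loss="mse")

x = tf.random.normal([32, 4])
y = tf.random.normal([32, 1])
model.fit(x, y, epochs=1, verbose=0)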