
How to apply gradient clipping in TensorFlow?

  • I would like to know how to apply gradient clipping to this network, an RNN where there is a possibility of exploding gradients.
    tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)

    This is an example that could be used, but where do I introduce it? In the definition of the RNN?

     lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
     # Split data because rnn cell needs a list of inputs for the RNN inner loop
     _X = tf.split(0, n_steps, _X)  # n_steps
     tf.clip_by_value(_X, -1, 1, name=None)

    But this doesn't make sense, as the tensor _X is the input and not the gradient, which is what needs to be clipped.

    Do I have to define my own Optimizer for this or is there a simpler option?

     

      December 14, 2020 7:17 PM IST
  • Despite what seems to be popular, you probably want to clip the whole gradient by its global norm:
    optimizer = tf.train.AdamOptimizer(1e-3)
    gradients, variables = zip(*optimizer.compute_gradients(loss))
    gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
    optimize = optimizer.apply_gradients(zip(gradients, variables))
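
    As a side note, tf.clip_by_global_norm also returns the global norm the gradients had before clipping, so instead of discarding it with _ you can keep it to monitor how close training comes to the threshold (an optional variant of the snippet above):

    gradients, global_norm = tf.clip_by_global_norm(gradients, 5.0)
    # Log the pre-clipping norm, e.g. to TensorBoard
    tf.summary.scalar('gradient_norm', global_norm)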

    Clipping each gradient matrix individually changes their relative scale, but it is also possible:

    optimizer = tf.train.AdamOptimizer(1e-3)
    gradients, variables = zip(*optimizer.compute_gradients(loss))
    gradients = [
        None if gradient is None else tf.clip_by_norm(gradient, 5.0)
        for gradient in gradients]
    optimize = optimizer.apply_gradients(zip(gradients, variables))

    In TensorFlow 2, a tape computes the gradients, the optimizers come from Keras, and we don't need to store the update op because it runs automatically without passing it to a session:

    optimizer = tf.keras.optimizers.Adam(1e-3)
    # ...
    with tf.GradientTape() as tape:
      loss = ...
    variables = ...
    gradients = tape.gradient(loss, variables)
    gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
    optimizer.apply_gradients(zip(gradients, variables))
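
    For completeness, here is a minimal self-contained sketch of how that TF2 snippet fits into a training step; the tiny model, loss and random data are only placeholders for illustration:

    import tensorflow as tf

    # Toy model and data, purely for illustration
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    x = tf.random.normal([32, 4])
    y = tf.random.normal([32, 1])

    optimizer = tf.keras.optimizers.Adam(1e-3)

    @tf.function
    def train_step(x, y):
      with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
      gradients = tape.gradient(loss, model.trainable_variables)
      # Clip by global norm before applying the update
      gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
      optimizer.apply_gradients(zip(gradients, model.trainable_variables))
      return loss

    loss = train_step(x, y)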

     

      December 16, 2020 1:21 PM IST
  • Gradient clipping helps in the case of exploding or vanishing gradients. Say your loss is too high; this will result in exponentially growing gradients flowing through the network, which may produce NaN values. To overcome this, we clip the gradients to a specific range (-1 to 1, or whatever range fits your situation).

    clipped_grads_and_vars = [(tf.clip_by_value(grad, -clip_range, clip_range), var)
                              for grad, var in grads_and_vars]

    where grads_and_vars are the pairs of gradients (which you compute via optimizer.compute_gradients) and the variables they will be applied to, and clip_range is the chosen clipping bound.

    After clipping we simply apply the clipped gradients using the optimizer: optimizer.apply_gradients(clipped_grads_and_vars), as in the sketch below.
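
    Putting those pieces together, a minimal graph-mode sketch (assuming loss is already defined; clip_range is just the bound you pick) might look like:

    optimizer = tf.train.AdamOptimizer(1e-3)
    grads_and_vars = optimizer.compute_gradients(loss)
    clip_range = 1.0
    # Clip each gradient element-wise to [-clip_range, +clip_range], skipping missing gradients
    clipped_grads_and_vars = [
        (tf.clip_by_value(grad, -clip_range, clip_range), var)
        for grad, var in grads_and_vars
        if grad is not None]
    train_op = optimizer.apply_gradients(clipped_grads_and_vars)
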
      October 26, 2021 12:56 PM IST
  • It's easy with tf.keras!

    optimizer = tf.keras.optimizers.Adam(clipvalue=1.0)
    

     

    This optimizer will clip every gradient value to the range [-1.0, 1.0].
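
    For example, a minimal usage sketch (the toy model and random data are only placeholders); Keras optimizers also accept clipnorm if you prefer to clip each gradient by its norm instead:

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

    # Every gradient value is clipped to [-1.0, 1.0] before the update is applied
    optimizer = tf.keras.optimizers.Adam(clipvalue=1.0)
    # Alternative: tf.keras.optimizers.Adam(clipnorm=1.0)

    model.compile(optimizer=optimizer, loss="mse")
    model.fit(tf.random.normal([32, 4]), tf.random.normal([32, 1]), epochs=1, verbose=0)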

     

      October 27, 2021 1:53 PM IST