init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
mse = tf.reduce_mean(tf.square(out - out_))
train_step = tf.train.GradientDescentOptimizer(0.3).minimize(mse)
The thing here is that I don't know how to set an update rule for the learning rate or a decay value for that.
How can I use an adaptive learning rate here?
This post was edited by Shivakumar Kota at August 24, 2020 2:01 PM ISTlearning_rate = tf.placeholder(tf.float32, shape=[])
# ...
train_step = tf.train.GradientDescentOptimizer(
learning_rate=learning_rate).minimize(mse)
sess = tf.Session()
# Feed different values for learning rate to each training step.
sess.run(train_step, feed_dict={learning_rate: 0.1})
sess.run(train_step, feed_dict={learning_rate: 0.1})
sess.run(train_step, feed_dict={learning_rate: 0.01})
sess.run(train_step, feed_dict={learning_rate: 0.01})
From tensorflow official docs
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
tf.train.GradientDescentOptimizer(learning_rate)
.minimize(...my loss..., global_step=global_step))
# Optimizer: set up a variable that's incremented once per batch and
# controls the learning rate decay.
batch = tf.Variable(0)
learning_rate = tf.train.exponential_decay(
0.01, # Base learning rate.
batch * BATCH_SIZE, # Current index into the dataset.
train_size, # Decay step.
0.95, # Decay rate.
staircase=True)
# Use simple momentum for the optimization.
optimizer = tf.train.MomentumOptimizer(learning_rate,
0.9).minimize(loss,
global_step=batch)
Note the global_step=batch parameter to minimize. That tells the optimizer to helpfully increment the 'batch' parameter for you every time it trains.Gradient descent algorithm uses the constant learning rate which you can provide in during the initialization. You can pass various learning rates in a way showed by Mrry.
But instead of it you can also use more advanced optimizers which have faster convergence rate and adapts to the situation.
Here is a brief explanation based on my understanding:
Adam or adaptive momentum is an algorithm similar to AdaDelta. But in addition to storing learning rates for each of the parameters it also stores momentum changes for each of them separately