
What does it mean to unroll a RNN dynamically?

  • What does it mean to "unroll an RNN dynamically"? I've seen this specifically mentioned in the TensorFlow source code, but I'm looking for a conceptual explanation that extends to RNNs in general.

    In the TensorFlow rnn method, the documentation says:

    If the sequence_length vector is provided, dynamic calculation is performed. This method of calculation does not compute the RNN steps past the maximum sequence length of the minibatch (thus saving computational time),
    But the dynamic_rnn method's documentation says:

    The parameter sequence_length is optional and is used to copy-through state and zero-out outputs when past a batch element's sequence length. So it's more for correctness than performance, unlike in rnn().
    So does this mean rnn is more performant for variable-length sequences? What is the conceptual difference between dynamic_rnn and rnn?
      June 11, 2019 4:06 PM IST
    0
  • From the documentation, I understand that the sequence_length parameter in the rnn method affects performance: when it is set, rnn performs dynamic calculation and stops computing before the padded end of each sequence.

    For example, if the longest input sequence has a length of 50 and the other sequences are shorter, it is better to set sequence_length for each sequence, so that the computation stops when each sequence ends instead of processing the padding zeros up to timestep 50. If sequence_length is not provided, every sequence is assumed to have the same length, and the padding zeros are treated as ordinary items in the sequence.

    This does not mean that dynamic_rnn is less performant; the documentation says that in dynamic_rnn the sequence_length parameter does not affect performance, because the computation there is already dynamic, and it is used only to copy the state through and zero out outputs past each sequence's end.
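
    As a minimal TF 1.x sketch of that behaviour (the cell size, feature size and placeholder shapes below are made up purely for illustration): with sequence_length set, dynamic_rnn zeroes the outputs past each sequence's real length and copies its state through.

        import numpy as np
        import tensorflow as tf  # assumes TensorFlow 1.x APIs

        # Two sequences padded to a common length of 5; real lengths are 3 and 5.
        batch = np.zeros((2, 5, 8), dtype=np.float32)
        batch[0, :3, :] = 1.0
        batch[1, :5, :] = 1.0
        lengths = [3, 5]

        inputs = tf.placeholder(tf.float32, [None, 5, 8])   # [batch, time, features]
        seq_len = tf.placeholder(tf.int32, [None])
        cell = tf.nn.rnn_cell.BasicLSTMCell(16)

        # With sequence_length, outputs after step 3 of the first sequence are
        # zeroed and its final state is the state at step 3, not at step 5.
        outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                           sequence_length=seq_len,
                                           dtype=tf.float32)

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            out = sess.run(outputs, {inputs: batch, seq_len: lengths})
            print(out[0, 3:])  # all zeros: past the first sequence's real length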

    Also, according to this post about RNNs in TensorFlow:

    Internally, tf.nn.rnn creates an unrolled graph for a fixed RNN length. That means, if you call tf.nn.rnn with inputs having 200 time steps you are creating a static graph with 200 RNN steps. First, graph creation is slow. Second, you’re unable to pass in longer sequences (> 200) than you’ve originally specified.

    tf.nn.dynamic_rnn solves this. It uses a tf.While loop to dynamically construct the graph when it is executed. That means graph creation is faster and you can feed batches of variable size. What about performance? You may think the static rnn is faster than its dynamic counterpart because it pre-builds the graph. In my experience that’s not the case.

    In short, just use tf.nn.dynamic_rnn. There is no benefit to tf.nn.rnn and I wouldn’t be surprised if it was deprecated in the future.
    So dynamic_rnn is as fast as (or faster than) rnn, and the author suggests using dynamic_rnn anyway.
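
    To make that difference concrete, here is a rough TF 1.x sketch (cell size, feature size and scope names are arbitrary; in later 1.x releases tf.nn.rnn is exposed as tf.nn.static_rnn). The static version adds one copy of the cell's ops per time step when the graph is built, while dynamic_rnn builds a single tf.while_loop that runs the cell at execution time.

        import tensorflow as tf  # assumes TensorFlow 1.x APIs

        # Static unrolling: the graph is fixed at 200 steps, so graph creation
        # is slow and sequences longer than 200 steps cannot be fed.
        static_cell = tf.nn.rnn_cell.BasicLSTMCell(16)
        static_inputs = [tf.placeholder(tf.float32, [None, 8]) for _ in range(200)]
        static_outputs, _ = tf.nn.static_rnn(static_cell, static_inputs,
                                             dtype=tf.float32, scope="static")

        # Dynamic unrolling: a single tf.while_loop, so graph creation is fast
        # and the time dimension can vary from batch to batch.
        dynamic_cell = tf.nn.rnn_cell.BasicLSTMCell(16)
        dynamic_inputs = tf.placeholder(tf.float32, [None, None, 8])  # [batch, time, features]
        dynamic_outputs, _ = tf.nn.dynamic_rnn(dynamic_cell, dynamic_inputs,
                                               dtype=tf.float32, scope="dynamic")
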
      June 11, 2019 4:16 PM IST
    0
  • An LSTM (or GRU) cell is the basis of both.

    Imagine an RNN as a stacked deep net with

    • weight sharing (the weight and bias matrices are the same in all layers)
    • input coming "from the side" into each layer
    • outputs interpreted by higher layers (e.g. a decoder), one per layer

    The depth of this net should depend on (in fact, be equal to) the actual input and output lengths, and nothing else, since the weights are the same in all the layers anyway.
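
    To make the weight-sharing picture concrete, here is a minimal NumPy sketch of an unrolled vanilla RNN (sizes and names are made up): the same matrices are applied at every step, and the number of steps, i.e. the depth of the stacked net, is simply the sequence length.

        import numpy as np

        def rnn_step(x_t, h_prev, W_x, W_h, b):
            # One cell application; the same W_x, W_h and b are reused at every step.
            return np.tanh(x_t @ W_x + h_prev @ W_h + b)

        def unrolled_rnn(x, W_x, W_h, b):
            # "Unrolling" = applying the shared-weight step once per time step.
            # A static unroll fixes len(x) when the graph is built; a dynamic
            # unroll lets the loop length follow the real sequence length at run time.
            h = np.zeros(W_h.shape[0])
            outputs = []
            for x_t in x:                  # depth of the stacked net = sequence length
                h = rnn_step(x_t, h, W_x, W_h, b)
                outputs.append(h)
            return outputs, h

        # Hypothetical sizes, for illustration only: 8 input features, 16 hidden units.
        rng = np.random.default_rng(0)
        W_x, W_h, b = rng.normal(size=(8, 16)), rng.normal(size=(16, 16)), np.zeros(16)
        outs, final_h = unrolled_rnn(rng.normal(size=(5, 8)), W_x, W_h, b)  # 5 real steps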

    Now, the classic way to build this is to group input-output pairs into buckets with fixed maximum lengths (e.g. model_with_buckets()). dynamic_rnn breaks with this constraint and adapts to the actual sequence lengths.

    So there is no real trade-off here, except that you may have to rewrite older code in order to adapt.

      June 14, 2019 1:02 PM IST
    0