QBoard » Artificial Intelligence & ML » AI and ML - Tensorflow » What is num_units in tensorflow BasicLSTMCell?

What is num_units in tensorflow BasicLSTMCell?

  • In MNIST LSTM examples, I don't understand what "hidden layer" means. Is it the imaginary-layer formed when you represent an unrolled RNN over time?

    Why is the num_units = 128  in most cases ?

      September 8, 2020 3:53 PM IST
    0
  • The number of hidden units is a direct representation of the learning capacity of a neural network -- it reflects the number of learned parameters. The value 128 was likely selected arbitrarily or empirically. You can change that value experimentally and rerun the program to see how it affects the training accuracy (you can get better than 90% test accuracy with a lot fewer hidden units). Using more units makes it more likely to perfectly memorize the complete training set (although it will take longer, and you run the risk of over-fitting).
    The key thing to understand, which is somewhat subtle in the famous Colah's blog post (find "each line carries an entire vector"), is that X is an array of data (nowadays often called a tensor) -- it is not meant to be a scalar value. Where, for example, the tanh function is shown, it is meant to imply that the function is broadcast across the entire array (an implicit for loop) -- and not simply performed once per time-step.
    As such, the hidden units represent tangible storage within the network, which is manifest primarily in the size of the weights array. And because an LSTM actually does have a bit of it's own internal storage separate from the learned model parameters, it has to know how many units there are -- which ultimately needs to agree with the size of the weights. In the simplest case, an RNN has no internal storage -- so it doesn't even need to know in advance how many "hidden units" it is being applied to.
    This post was edited by Rakesh Racharla at September 8, 2020 3:57 PM IST
      September 8, 2020 3:56 PM IST
    0
  • Since I had some problems to combine the information from the different sources I created the graphic below which shows a combination of the blog post (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) and (https://jasdeep06.github.io/posts/Understanding-LSTM-in-Tensorflow-MNIST/) where I think the graphics are very helpful but an error in explaining the number_units is present.

    Several LSTM cells form one LSTM layer. This is shown in the figure below. Since you are mostly dealing with data that is very extensive, it is not possible to incorporate everything in one piece into the model. Therefore, data is divided into small pieces as batches, which are processed one after the other until the batch containing the last part is read in. In the lower part of the figure you can see the input (dark grey) where the batches are read in one after the other from batch 1 to batch batch_size. The cells LSTM cell 1 to LSTM cell time_step above represent the described cells of the LSTM model (http://colah.github.io/posts/2015-08-Understanding-LSTMs/). The number of cells is equal to the number of fixed time steps. For example, if you take a text sequence with a total of 150 characters, you could divide it into 3 (batch_size) and have a sequence of length 50 per batch (number of time_steps and thus of LSTM cells). If you then encoded each character one-hot, each element (dark gray boxes of the input) would represent a vector that would have the length of the vocabulary (number of features). These vectors would flow into the neuronal networks (green elements in the cells) in the respective cells and would change their dimension to the length of the number of hidden units (number_units). So the input has the dimension (batch_size x time_step x features). The Long Time Memory (Cell State) and Short Time Memory (Hidden State) have the same dimensions (batch_size x number_units). The light gray blocks that arise from the cells have a different dimension because the transformations in the neural networks (green elements) took place with the help of the hidden units (batch_size x time_step x number_units). The output can be returned from any cell but mostly only the information from the last block (black border) is relevant (not in all problems) because it contains all information from the previous time steps.

    LSTM architecture_new

      November 24, 2021 12:09 PM IST
    0
  • From this brilliant article

    num_units can be interpreted as the analogy of hidden layer from the feed forward neural network.The number of nodes in hidden layer of a feed forward neural network is equivalent to num_units number of LSTM units in a LSTM cell at every time step of the network.

    See the image there too!

                                           

      September 8, 2020 4:00 PM IST
    0
    • Rakesh Racharla
      Rakesh Racharla @Viaan Prakash,Excellent block diagram for LSTM, Can you explain with diagram what exactly is inside of the units in the num_units of each LSTM cell, as each LSTM cell contains Input Gate, output Gate and Forget gates respectively.
      September 8, 2020
    • Viaan Prakash
      Viaan Prakash @Rakesh Racharla,Inside the LSTM cell are LSTM units. In the article cited, each of the num_units in each LSTM cells receives one pixel of a certain row of an image. The size of the image is 28x28 pixels. In the example, they used 28 num_units and 28...  more
      September 8, 2020