Pay attention to the distinction between two different things: the number of layers in your recurrent neural network, and the number of time steps over which the Back Propagation Through Time algorithm unrolls that network to cover the sequence length.
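As a rough sketch of the difference (the sizes here are only illustrative): stacking cells controls the depth of the network, while the length of the time axis of the input you feed in controls how many steps the network gets unrolled.
import tensorflow as tf

num_units = 200
num_layers = 3   # depth of the network: three stacked GRU layers
max_time = 50    # number of time steps BPTT unrolls over (the sequence length)

cells = [tf.contrib.rnn.GRUCell(num_units) for _ in range(num_layers)]
stacked = tf.contrib.rnn.MultiRNNCell(cells)

# The second dimension (time) is what gets unrolled, independently of
# how many layers the stacked cell contains.
inputs = tf.placeholder(tf.float32, [None, max_time, 28])
outputs, state = tf.nn.dynamic_rnn(stacked, inputs, dtype=tf.float32)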
Recurrent networks like LSTM and GRU are powerful sequence models. I will explain how to create recurrent networks in TensorFlow and use them for sequence classification and labelling tasks.
If you are not familiar with recurrent networks, I suggest you take a look at Christopher Olah’s great article first. On the TensorFlow part, I also expect some basic knowledge. The official tutorials are a good place to start.
To use recurrent networks in TensorFlow, we first need to define the network architecture, consisting of one or more layers, the cell type, and possibly dropout between the layers. In TensorFlow, we build recurrent networks out of so-called cells that wrap each other.
import tensorflow as tf
num_units = 200
num_layers = 3
dropout = tf.placeholder(tf.float32)
cells = []
for _ in range(num_layers):
    cell = tf.contrib.rnn.GRUCell(num_units)  # Or LSTMCell(num_units)
    cell = tf.contrib.rnn.DropoutWrapper(
        cell, output_keep_prob=1.0 - dropout)
    cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)
We can now add the operations to the graph that simulate the recurrent network over the time steps of the input. We do this using TensorFlow's dynamic_rnn() operation. It takes a tensor holding the input sequences and returns the output activations and the last hidden state as tensors.
# Batch size x time steps x features.
data = tf.placeholder(tf.float32, [None, None, 28])
output, state = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)
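For sequence classification, one common pattern (a sketch, not part of the original code; num_classes and target are names introduced here) is to take the activation of the last time step and feed it through a softmax layer:
num_classes = 10  # assumed number of target classes

# Activation of the last time step for every sequence in the batch.
# This assumes the sequences are not padded; for padded batches you would
# gather the last relevant step using the true sequence lengths instead.
last_output = output[:, -1, :]

logits = tf.layers.dense(last_output, num_classes)
prediction = tf.nn.softmax(logits)

target = tf.placeholder(tf.float32, [None, num_classes])
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=target, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)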
You can recover the LSTM weights from your TensorFlow session "sess" as follows:
tvars = tf.trainable_variables()
trainable_vars_dict = {}
for key in tvars:
    trainable_vars_dict[key.name] = sess.run(key)
    # Checking the names of the keys
    print(key.name)
From this code you will get the key names. One of the keys corresponds to a single matrix containing all of the LSTM's weights. In your case that key should be named "LSTM/rnn/basic_lstm_cell/weights:0". Assuming the size of your input is input_size, you have to do:
import numpy as np

lstm_weight_vals = trainable_vars_dict["LSTM/rnn/basic_lstm_cell/weights:0"]

# The kernel concatenates the four gate matrices along the second axis:
# input gate, candidate cell values, forget gate, output gate.
w_i, w_C, w_f, w_o = np.split(lstm_weight_vals, 4, axis=1)

# The first input_size rows act on the input x, the remaining rows on the
# previous hidden state h.
w_xi = w_i[:input_size, :]
w_hi = w_i[input_size:, :]
w_xC = w_C[:input_size, :]
w_hC = w_C[input_size:, :]
w_xf = w_f[:input_size, :]
w_hf = w_f[input_size:, :]
w_xo = w_o[:input_size, :]
w_ho = w_o[input_size:, :]
The matrices with "h" in them should be square at the end (of size 128 × 128 in your case). I think for you the input size is 28.
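As a quick sanity check (a sketch; num_units = 128 and input_size = 28 are the sizes assumed above), the shapes should come out as:
num_units = 128
input_size = 28

# Input-to-gate matrices: input_size x num_units.
assert w_xi.shape == (input_size, num_units)
assert w_xf.shape == (input_size, num_units)

# Hidden-to-gate matrices: square, num_units x num_units.
assert w_hi.shape == (num_units, num_units)
assert w_hf.shape == (num_units, num_units)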
You can see that the weights are not shared by executing the following script:
import tensorflow as tf
import numpy as np

with tf.variable_scope("scope1") as vs:
    cell = tf.nn.rnn_cell.GRUCell(10)
    stacked_cell = tf.nn.rnn_cell.MultiRNNCell([cell] * 2)
    stacked_cell(tf.Variable(np.zeros((100, 100), dtype=np.float32), name="moo"),
                 tf.Variable(np.zeros((100, 100), dtype=np.float32), name="bla"))

    # Retrieve just the variables created inside this scope.
    vars = [v.name for v in tf.all_variables()
            if v.name.startswith(vs.name)]
    print(vars)
You will see that, besides the dummy variables, it returns two sets of GRU weights: one under "Cell0" and one under "Cell1".
To make them shared, you can implement your own cell class that inherits from GRUCell and always reuses the weights by always using the same variable scope:
import tensorflow as tf

class SharedGRUCell(tf.nn.rnn_cell.GRUCell):
    def __init__(self, num_units, input_size=None, activation=tf.nn.tanh):
        tf.nn.rnn_cell.GRUCell.__init__(self, num_units, input_size, activation)
        self.my_scope = None

    def __call__(self, a, b):
        if self.my_scope is None:
            # Capture the variable scope on the first call ...
            self.my_scope = tf.get_variable_scope()
        else:
            # ... and reuse its variables on every later call.
            self.my_scope.reuse_variables()
        return tf.nn.rnn_cell.GRUCell.__call__(self, a, b, self.my_scope)
with tf.variable_scope("scope2") as vs:
cell = SharedGRUCell(10)
stacked_cell = tf.nn.rnn_cell.MultiRNNCell([cell] * 2)
stacked_cell(tf.Variable(np.zeros((20, 10), dtype=np.float32), name="moo"), tf.Variable(np.zeros((20, 10), dtype=np.float32), "bla"))
# Retrieve just the LSTM variables.
vars = [v.name for v in tf.all_variables()
if v.name.startswith(vs.name)]
print vars
This way the variables between the two GRUCells are shared. Note that you need to be careful with shapes, since the same cell needs to work with both the raw input and its own output.
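For instance (a sketch with assumed sizes: input_size = 28, num_units = 10, batch_size = 20; the input_projection matrix is something introduced here, not part of the original answer), one way to satisfy this is to project the raw input down to num_units first, so the shared weights always see inputs of the same size as the cell's own output:
num_units = 10
input_size = 28
batch_size = 20

with tf.variable_scope("scope3"):
    cell = SharedGRUCell(num_units)

    # Project the raw input features down to num_units.
    x = tf.Variable(np.zeros((batch_size, input_size), dtype=np.float32), name="x")
    proj_w = tf.get_variable("input_projection", [input_size, num_units])
    x_proj = tf.matmul(x, proj_w)

    state = tf.Variable(np.zeros((batch_size, num_units), dtype=np.float32), name="h")

    # First layer: projected raw input, output has num_units features.
    out1, state1 = cell(x_proj, state)
    # Second layer: the first layer's output already has the right size,
    # so the same (shared) weights apply without any shape mismatch.
    out2, state2 = cell(out1, state1)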