Where do I call the BatchNormalization function in Keras?

  • If I want to use the BatchNormalization function in Keras, do I need to call it only once at the beginning?

    I read this documentation for it: http://keras.io/layers/normalization/

    I don't see where I'm supposed to call it. Below is my code attempting to use it:

    model = Sequential()
    keras.layers.normalization.BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9, weights=None)
    model.add(Dense(64, input_dim=14, init='uniform'))
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(64, init='uniform'))
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(2, init='uniform'))
    model.add(Activation('softmax'))
    
    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='binary_crossentropy', optimizer=sgd)
    model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, validation_split=0.2, verbose=2)

    I ask because if I run the code with the second line (the batch normalization) and then run it without that line, I get similar outputs. So either I'm not calling the function in the right place, or I guess it doesn't make that much of a difference.

      September 9, 2020 11:27 AM IST
  • Just to answer this question in a little more detail, and as Pavel said, Batch Normalization is just another layer, so you can use it as such to create your desired network architecture.

    The general use case is to use BN between the linear and non-linear layers in your network, because it normalizes the input to your activation function, so that you're centered in the linear section of the activation function (such as sigmoid). There's a small discussion of it here.

    In your case above, this might look like

    # import BatchNormalization
    from keras.layers.normalization import BatchNormalization
    
    # instantiate model
    model = Sequential()
    
    # we can think of this chunk as the input layer
    model.add(Dense(64, input_dim=14, init='uniform'))
    model.add(BatchNormalization())
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    
    # we can think of this chunk as the hidden layer    
    model.add(Dense(64, init='uniform'))
    model.add(BatchNormalization())
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    
    # we can think of this chunk as the output layer
    model.add(Dense(2, init='uniform'))
    model.add(BatchNormalization())
    model.add(Activation('softmax'))
    
    # setting up the optimization of our weights 
    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='binary_crossentropy', optimizer=sgd)
    
    # running the fitting
    model.fit(X_train, y_train, nb_epoch=20, batch_size=16, show_accuracy=True, validation_split=0.2, verbose = 2)
    


    Hope this clarifies things a bit more.

      September 9, 2020 12:19 PM IST
  • Batch normalization works best after the activation function, and here or here is why: it was developed to prevent internal covariate shift, which occurs when the distribution of a layer's activations shifts significantly throughout training. Batch normalization is used so that the distribution of the inputs to a specific layer (and those inputs are literally the result of an activation function) doesn't change over time due to parameter updates from each batch (or at least, so that it changes in an advantageous way). It uses batch statistics to do the normalizing, and then uses the batch normalization parameters (gamma and beta in the original paper) "to make sure that the transformation inserted in the network can represent the identity transform" (quote from the original paper).

    But the point is that we're trying to normalize the inputs to a layer, so it should always go immediately before the next layer in the network. Whether or not that's after an activation function depends on the architecture in question.
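
    For what it's worth, a minimal sketch of that post-activation placement, reusing the question's layer sizes and the same older Keras API used elsewhere in this thread (only the relevant layers are shown):

    # minimal sketch: BatchNormalization placed after the activation,
    # i.e. immediately before the next Dense layer
    from keras.models import Sequential
    from keras.layers import Dense, Activation
    from keras.layers.normalization import BatchNormalization

    model = Sequential()
    model.add(Dense(64, input_dim=14, init='uniform'))
    model.add(Activation('tanh'))
    model.add(BatchNormalization())  # normalizes the activations themselves
    model.add(Dense(2, init='uniform'))
    model.add(Activation('softmax'))
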
      September 9, 2020 12:21 PM IST
    • Mitali Bhavsar: I saw in the deeplearning.ai class that Andrew Ng says there is a debate on this in the deep learning community. He prefers applying batch normalization before the non-linearity.
      September 9, 2020
  • The authors of Batch Normalization say that it should be applied immediately before the non-linearity of the current layer. The reason, quoted from the original paper:

    "We add the BN transform immediately before the nonlinearity, by normalizing x = Wu+b. We could have also normalized the layer inputs u, but since u is likely the output of another nonlinearity, the shape of its distribution is likely to change during training, and constraining its first and second moments would not eliminate the covariate shift. In contrast, Wu + b is more likely to have a symmetric, non-sparse distribution, that is “more Gaussian” (Hyv¨arinen & Oja, 2000); normalizing it is likely to produce activations with a stable distribution."

      September 9, 2020 12:24 PM IST
  • It's almost become a trend now to have a Conv2D followed by a ReLU followed by a BatchNormalization layer. So I made up a small function to call all of them at once. It makes the model definition look a whole lot cleaner and easier to read.

    def Conv2DReluBatchNorm(n_filter, w_filter, h_filter, inputs):
        conv = Convolution2D(n_filter, w_filter, h_filter, border_mode='same')(inputs)
        relu = Activation('relu')(conv)
        return BatchNormalization()(relu)
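
    A hypothetical usage sketch (assuming the older Keras 1.x functional API; the input shape and filter counts are made up for illustration):

    from keras.layers import Input

    inputs = Input(shape=(32, 32, 3))          # assumed channels-last 32x32 RGB input
    x = Conv2DReluBatchNorm(32, 3, 3, inputs)  # 32 filters, 3x3 kernel
    x = Conv2DReluBatchNorm(64, 3, 3, x)
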
      September 9, 2020 12:26 PM IST
  • Keras now supports the use_bias=False option, so we can save some computation by writing the layers like this (the bias term is redundant here, since the BatchNormalization layer that follows subtracts the batch mean and adds its own learned offset beta anyway):

    model.add(Dense(64, use_bias=False))
    model.add(BatchNormalization(axis=bn_axis))
    model.add(Activation('tanh'))

    or

    model.add(Convolution2D(64, 3, 3, use_bias=False))
    model.add(BatchNormalization(axis=bn_axis))
    model.add(Activation('relu'))
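
    (Here bn_axis is the axis to normalize over: Keras' BatchNormalization defaults to axis=-1, which is the channels axis for channels-last data, so bn_axis = -1 in the common case, or 1 for channels-first convolution outputs.)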
     
      September 9, 2020 12:28 PM IST