
What are logits? What is the difference between softmax and softmax_cross_entropy_with_logits?

  • In the TensorFlow API docs there is a parameter called logits. What is it? A lot of methods are written like:

    tf.nn.softmax(logits, name=None)

    If logits is just a generic Tensor input, why is it named logits?

    Secondly, what is the difference between the following two methods?

    tf.nn.softmax(logits, name=None)
    tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)
    I know what tf.nn.softmax does, but not the other. An example would be really helpful.

      August 6, 2021 10:05 PM IST
  • Logits simply means that the function operates on the unscaled output of earlier layers and that the relative scale used to understand the units is linear. In particular, the sum of the inputs may not equal 1 and the values are not probabilities (you might have an input of 5).

    tf.nn.softmax produces just the result of applying the softmax function to an input tensor. The softmax "squishes" the inputs so that the outputs sum to 1: it's a way of normalizing. The output of softmax has the same shape as the input: it just normalizes the values. The outputs of softmax can be interpreted as probabilities.

    import numpy as np
    import tensorflow as tf  # TF 1.x style session API

    a = tf.constant(np.array([[.1, .3, .5, .9]]))
    with tf.Session() as s:
        print(s.run(tf.nn.softmax(a)))  # [[ 0.16838508  0.205666  0.25120102  0.37474789]]


    In contrast, tf.nn.softmax_cross_entropy_with_logits computes the cross entropy of the result after applying the softmax function (but it does it all together in a more mathematically careful way). It's similar to the result of:

    sm = tf.nn.softmax(x)
    ce = cross_entropy(sm, labels)  # cross_entropy here is pseudocode, not an actual TF op


    The cross entropy is a summary metric: it sums across the elements. The output of tf.nn.softmax_cross_entropy_with_logits on a shape [2,5] logits tensor is of shape [2]: one cross-entropy value per row (the first dimension is treated as the batch).

    If you want to do optimization to minimize the cross entropy AND you're softmaxing after your last layer, you should use tf.nn.softmax_cross_entropy_with_logits instead of doing it yourself, because it covers numerically unstable corner cases in the mathematically right way. Otherwise, you'll end up hacking it by adding little epsilons here and there.
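
    To make this concrete, here is a minimal sketch (the logits and labels values below are made up for illustration) that compares the manual two-step version with the fused op and shows the per-example output shape:

    import numpy as np
    import tensorflow as tf  # TF 1.x style, matching the example above

    logits = tf.constant([[2.0, 1.0, 0.1],
                          [0.5, 2.5, 0.3]])   # shape [2, 3]: a batch of 2 examples
    labels = tf.constant([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])   # one-hot targets

    # Manual version: softmax, then cross entropy summed over the class axis.
    sm = tf.nn.softmax(logits)
    manual_ce = -tf.reduce_sum(labels * tf.log(sm), axis=1)

    # Fused, numerically stable version.
    fused_ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

    with tf.Session() as s:
        print(s.run(manual_ce))  # approx. [0.417  0.220], one value per example
        print(s.run(fused_ce))   # same values, shape [2]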

    Edited 2016-02-07: If you have single-class labels, where an object can only belong to one class, you might now consider using tf.nn.sparse_softmax_cross_entropy_with_logits so that you don't have to convert your labels to a dense one-hot array. This function was added after release 0.6.0.
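
    For that single-class case, a hedged sketch of the sparse variant (same made-up logits as above, but integer class ids instead of one-hot rows) might look like this:

    import tensorflow as tf  # TF 1.x style

    logits = tf.constant([[2.0, 1.0, 0.1],
                          [0.5, 2.5, 0.3]])
    sparse_labels = tf.constant([0, 1])   # one class index per example, no one-hot needed

    sparse_ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=sparse_labels,
                                                               logits=logits)
    with tf.Session() as s:
        print(s.run(sparse_ce))           # same values as the one-hot version above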

      August 7, 2021 1:02 PM IST
  • One more thing that I would definitely like to highlight: a logit is just a raw output, generally the output of the last layer. It can be a negative value as well. If we use it as-is for a "cross entropy" evaluation such as the one below:

    -tf.reduce_sum(y_true * tf.log(logits))

    then it won't work, since the log of a negative number is not defined. Applying a softmax activation first overcomes this problem.
    This is my understanding; please correct me if I'm wrong.
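
    A small sketch (with made-up values) of the point above: taking the log of raw logits produces nan as soon as a logit is negative, while the log of the softmax output is always defined because softmax maps everything into (0, 1):

    import tensorflow as tf  # TF 1.x style

    logits = tf.constant([[-1.0, 2.0, 0.5]])   # raw last-layer output; can be negative
    y_true = tf.constant([[0.0, 1.0, 0.0]])

    bad  = -tf.reduce_sum(y_true * tf.log(logits))                  # log(-1.0) -> nan propagates
    good = -tf.reduce_sum(y_true * tf.log(tf.nn.softmax(logits)))   # well-defined

    with tf.Session() as s:
        print(s.run(bad))    # nan
        print(s.run(good))   # approx. 0.24, a valid cross-entropy value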


      August 7, 2021 11:12 PM IST
  • tf.nn.softmax computes the forward propagation through a softmax layer. You use it during evaluation of the model when you compute the probabilities that the model outputs.

    tf.nn.softmax_cross_entropy_with_logits computes the cost for a softmax layer. It is only used during training.

    The logits are the unnormalized log probabilities output by the model (the values output before the softmax normalization is applied to them).
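
    A minimal end-to-end sketch of that split (toy data and a single linear layer, all made up for illustration): the fused op produces the training loss directly from logits, and tf.nn.softmax is applied only when probabilities are needed at evaluation time.

    import numpy as np
    import tensorflow as tf  # TF 1.x style, matching the rest of the thread

    # Toy data: 4 examples, 3 features, 2 classes (one-hot labels).
    features = tf.constant(np.random.rand(4, 3).astype(np.float32))
    labels   = tf.constant([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])

    weights = tf.Variable(tf.zeros([3, 2]))
    bias    = tf.Variable(tf.zeros([2]))
    logits  = tf.matmul(features, weights) + bias   # unnormalized scores: the "logits"

    # Training: the fused op turns logits + labels into the loss directly.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Evaluation: apply softmax only when you actually need probabilities.
    probs = tf.nn.softmax(logits)

    with tf.Session() as s:
        s.run(tf.global_variables_initializer())
        for _ in range(100):
            s.run(train_op)
        print(s.run(probs))   # each row sums to 1
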
      August 9, 2021 1:25 PM IST