
What is the difference between a sigmoid followed by the cross entropy and sigmoid_cross_entropy_with_logits in TensorFlow?

  • When trying to compute the cross-entropy with a sigmoid activation function, there is a difference between

    loss1 = -tf.reduce_sum(p*tf.log(q), 1)
    
    loss2 = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(labels=p, logits=logit_q),1)
    

    But they are the same with the softmax activation function.

    Here is the sample code:

    import tensorflow as tf

    sess = tf.InteractiveSession()
    p = tf.placeholder(tf.float32, shape=[None, 5])        # labels
    logit_q = tf.placeholder(tf.float32, shape=[None, 5])  # logits
    q = tf.nn.sigmoid(logit_q)                              # sigmoid "probabilities"
    sess.run(tf.global_variables_initializer())

    feed_dict = {p: [[0, 0, 0, 1, 0], [1, 0, 0, 0, 0]],
                 logit_q: [[0.2, 0.2, 0.2, 0.2, 0.2], [0.3, 0.3, 0.2, 0.1, 0.1]]}
    loss1 = -tf.reduce_sum(p * tf.log(q), 1).eval(feed_dict)
    loss2 = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(labels=p, logits=logit_q), 1).eval(feed_dict)

    print(p.eval(feed_dict), "\n", q.eval(feed_dict))
    print("\n", loss1, "\n", loss2)
      August 28, 2021 11:37 PM IST
  • You can understand the difference between softmax and sigmoid cross-entropy in the following way:

    1. Softmax cross-entropy works with a single probability distribution over all classes.
    2. Sigmoid cross-entropy works with multiple independent binary probability distributions; each binary distribution can be treated as a two-class distribution.

    In both cases the basic cross-entropy term is:

       p * -tf.log(q)

    For softmax cross-entropy the loss looks exactly like the formula above.

    For sigmoid cross-entropy it looks slightly different, because there is one binary probability distribution per component; for each such distribution the loss is:

    p * -tf.log(q) + (1 - p) * -tf.log(1 - q)

    Here p and (1 - p) act as the two class probabilities within each binary distribution; a small sketch below checks this against the TensorFlow op.
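
    To make that concrete, here is a minimal sketch (my own check, assuming TF 1.x and the placeholder style used elsewhere in this thread) that compares the element-wise binary formula, the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)) given in the TensorFlow documentation for this op, and the built-in op itself:

    import tensorflow as tf

    z = tf.placeholder(tf.float32, shape=[None, 5])   # labels p
    x = tf.placeholder(tf.float32, shape=[None, 5])   # logits
    q = tf.nn.sigmoid(x)

    # Naive element-wise binary cross-entropy.
    ce_naive = z * -tf.log(q) + (1 - z) * -tf.log(1 - q)
    # Numerically stable form documented for sigmoid_cross_entropy_with_logits.
    ce_stable = tf.maximum(x, 0) - x * z + tf.log(1 + tf.exp(-tf.abs(x)))
    # The built-in op.
    ce_builtin = tf.nn.sigmoid_cross_entropy_with_logits(labels=z, logits=x)

    with tf.Session() as sess:
        feed = {z: [[0, 0, 0, 1, 0]], x: [[0.2, 0.2, 0.2, 0.2, 0.2]]}
        print(sess.run([ce_naive, ce_stable, ce_builtin], feed))  # all three should match element-wise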

     
      August 30, 2021 1:23 PM IST
  • You're confusing the cross-entropy for binary and multi-class problems.

    Multi-class cross-entropy
    The formula that you use is correct and it directly corresponds to tf.nn.softmax_cross_entropy_with_logits:

    -tf.reduce_sum(p * tf.log(q), axis=1)

    p and q are expected to be probability distributions over N classes. In particular, N can be 2, as in the following example:

    p = tf.placeholder(tf.float32, shape=[None, 2])
    logit_q = tf.placeholder(tf.float32, shape=[None, 2])
    q = tf.nn.softmax(logit_q)
    
    feed_dict = {
      p: [[0, 1],
          [1, 0],
          [1, 0]],
      logit_q: [[0.2, 0.8],
                [0.7, 0.3],
                [0.5, 0.5]]
    }
    
    prob1 = -tf.reduce_sum(p * tf.log(q), axis=1)
    prob2 = tf.nn.softmax_cross_entropy_with_logits(labels=p, logits=logit_q)
    print(prob1.eval(feed_dict))  # [ 0.43748799  0.51301527  0.69314718]
    print(prob2.eval(feed_dict))  # [ 0.43748799  0.51301527  0.69314718]

    Note that q is computed with tf.nn.softmax, i.e. it outputs a probability distribution. So this is still the multi-class cross-entropy formula, only with N = 2.
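
    As a side note (my own illustration, not part of the original answer): one quick way to see why softmax gives a distribution while sigmoid does not is to compare row sums:

    import tensorflow as tf

    logit_q = tf.placeholder(tf.float32, shape=[None, 2])
    softmax_q = tf.nn.softmax(logit_q)   # each row sums to 1
    sigmoid_q = tf.nn.sigmoid(logit_q)   # components are independent probabilities

    with tf.Session() as sess:
        feed = {logit_q: [[0.2, 0.8], [0.7, 0.3]]}
        print(sess.run(tf.reduce_sum(softmax_q, axis=1), feed))  # [1. 1.]
        print(sess.run(tf.reduce_sum(sigmoid_q, axis=1), feed))  # roughly [1.24 1.24], not a distribution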

    Binary cross-entropy
    This time the correct formula is

    p * -tf.log(q) + (1 - p) * -tf.log(1 - q)
    


    Though mathematically it's a special case of the multi-class formula, the meaning of p and q is different. In the simplest case, each p and q is a single number, corresponding to the probability of the class A.

    Important: Don't get confused by the common p * -tf.log(q) part and the sum. Previously p was a one-hot vector; now it's a number, zero or one. Same for q: it was a probability distribution, now it's a single number (a probability).

    If p is a vector, each individual component is considered an independent binary classification. See this answer that outlines the difference between softmax and sigmoid functions in TensorFlow. So the definition p = [0, 0, 0, 1, 0] doesn't mean a one-hot vector, but 5 different features, 4 of which are off and 1 is on. The definition q = [0.2, 0.2, 0.2, 0.2, 0.2] means that each of the 5 features is on with 20% probability.

    This explains the use of the sigmoid function before the cross-entropy: its goal is to squash the logit to the [0, 1] interval.

    The formula above still holds for multiple independent features, and that's exactly what tf.nn.sigmoid_cross_entropy_with_logits computes:

    p = tf.placeholder(tf.float32, shape=[None, 5])
    logit_q = tf.placeholder(tf.float32, shape=[None, 5])
    q = tf.nn.sigmoid(logit_q)
    
    feed_dict = {
      p: [[0, 0, 0, 1, 0],
          [1, 0, 0, 0, 0]],
      logit_q: [[0.2, 0.2, 0.2, 0.2, 0.2],
                [0.3, 0.3, 0.2, 0.1, 0.1]]
    }
    
    prob1 = -p * tf.log(q)
    prob2 = p * -tf.log(q) + (1 - p) * -tf.log(1 - q)
    prob3 = p * -tf.log(tf.sigmoid(logit_q)) + (1-p) * -tf.log(1-tf.sigmoid(logit_q))
    prob4 = tf.nn.sigmoid_cross_entropy_with_logits(labels=p, logits=logit_q)
    print(prob1.eval(feed_dict))
    print(prob2.eval(feed_dict))
    print(prob3.eval(feed_dict))
    print(prob4.eval(feed_dict))

    You should see that the last three tensors are equal, while prob1 is only part of the cross-entropy, so it contains the correct value only where p is 1:

    [[ 0.          0.          0.          0.59813893  0.        ]
     [ 0.55435514  0.          0.          0.          0.        ]]
    [[ 0.79813886  0.79813886  0.79813886  0.59813887  0.79813886]
     [ 0.5543552   0.85435522  0.79813886  0.74439669  0.74439669]]
    [[ 0.7981388   0.7981388   0.7981388   0.59813893  0.7981388 ]
     [ 0.55435514  0.85435534  0.7981388   0.74439663  0.74439663]]
    [[ 0.7981388   0.7981388   0.7981388   0.59813893  0.7981388 ]
     [ 0.55435514  0.85435534  0.7981388   0.74439663  0.74439663]]


    Now it should be clear that taking a sum of -p * tf.log(q) along axis=1 doesn't make sense in this setting, though it would be a valid formula in the multi-class case.
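
    As a concrete sanity check on one entry (my own arithmetic, not part of the original answer): the first row has logit 0.2 everywhere, with p = 1 in the fourth position and p = 0 elsewhere, so:

    import math

    logit = 0.2
    q = 1 / (1 + math.exp(-logit))   # sigmoid(0.2) is about 0.5498
    print(-math.log(q))              # about 0.5981, the p = 1 entry above
    print(-math.log(1 - q))          # about 0.7981, the p = 0 entries above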
      September 17, 2021 1:26 PM IST
  • sigmoid_cross_entropy_with_logits solves N binary classifications at once. ... sigmoid_cross_entropy additionally allows setting in-batch weights, i.e. making some examples more important than others.
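
    As a rough sketch of what per-example weighting means (my own illustration, built only on tf.nn.sigmoid_cross_entropy_with_logits and a hand-rolled weight vector rather than any particular weighted-loss op):

    import tensorflow as tf

    labels = tf.placeholder(tf.float32, shape=[None, 5])
    logits = tf.placeholder(tf.float32, shape=[None, 5])
    example_weights = tf.placeholder(tf.float32, shape=[None])  # one weight per example

    # Per-example loss: sum the 5 independent binary cross-entropies.
    per_example_loss = tf.reduce_sum(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits), axis=1)

    # Weighted mean: up-weighted examples contribute more to the final scalar loss.
    loss = tf.reduce_sum(example_weights * per_example_loss) / tf.reduce_sum(example_weights)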
      August 31, 2021 3:48 PM IST