Does TensorFlow have something similar to scikit-learn's OneHotEncoder?


Tags : tensorflow python machine-learning neural-network

Viaan Prakash

461

Does TensorFlow have something similar to scikit-learn's OneHotEncoder for processing categorical data? Would a placeholder of type tf.string behave as categorical data?

I realize I can manually pre-process the data before sending it to TensorFlow, but having it built in would be very convenient.

December 22, 2020 5:52 PM IST

0
Tarun Reddy

84
A simple and short way to one-hot encode any integer or list of integers:
```
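import numpy as np
import tensorflow as tf

a = 5
b = [1, 2, 3]
# one-hot an integer: pick row `a` of a 10x10 identity matrix
one_hot_a = tf.nn.embedding_lookup(np.identity(10), a)
# one-hot a list of integers: one identity row per entry of `b`
one_hot_b = tf.nn.embedding_lookup(np.identity(max(b)+1), b)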
```
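The trick works because row i of an identity matrix is exactly the one-hot vector for label i, so looking up rows by index is the encoding. A minimal NumPy-only sketch of that idea follows; the variable `depth` and the choice of `max(b) + 1` classes are illustrative assumptions, not part of the answer above.

```python
import numpy as np

b = [1, 2, 3]
depth = max(b) + 1  # assumed number of classes: largest label + 1

# Row i of the identity matrix is the one-hot vector for label i,
# so integer indexing selects one row per label.
one_hot_b = np.identity(depth)[b]
print(one_hot_b)
```

The result has one row per label and `depth` columns, with a single 1.0 in each row.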
This post was edited by Tarun Reddy at December 22, 2020 6:38 PM IST
December 22, 2020 6:38 PM IST

0
Nitara Bobal

53
As of TensorFlow 0.8, there is a native one-hot op, tf.one_hot, that converts a set of sparse labels to a dense one-hot representation. This is in addition to tf.nn.sparse_softmax_cross_entropy_with_logits, which in some cases lets you compute the cross entropy directly on the sparse labels instead of converting them to one-hot.
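As a rough illustration of both ops, here is a small sketch that assumes TensorFlow 2.x with eager execution; the label and logit values are made up for the example.

```python
import tensorflow as tf

labels = tf.constant([0, 2, 1])            # sparse integer class labels
logits = tf.constant([[2.0, 0.5, 0.1],
                      [0.2, 0.3, 3.0],
                      [0.1, 1.5, 0.4]])    # made-up model outputs

# Dense one-hot representation, shape (3, 3), float32 by default.
one_hot_labels = tf.one_hot(labels, depth=3)

# Cross entropy computed from the one-hot labels ...
dense_loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=one_hot_labels, logits=logits)
# ... and directly from the sparse labels, skipping the conversion.
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
```

Both loss tensors should agree up to floating-point error, which is why the sparse variant can often replace an explicit one-hot step when the labels are only needed for the loss.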

Previous answer, in case you want to do it the old way: @Salvador's answer is correct - there (used to be) no native op to do it. Instead of doing it in numpy, though, you can do it natively in tensorflow using the sparse-to-dense operators:
```
num_labels = 10

# label_batch is a tensor of numeric labels to process
# 0 <= label < num_labels

sparse_labels = tf.reshape(label_batch, [-1, 1])
derived_size = tf.shape(label_batch)[0]
indices = tf.reshape(tf.range(0, derived_size, 1), [-1, 1])
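# Pre-1.0 signatures: from TensorFlow 1.0 on, tf.concat takes (values, axis)
# and tf.pack is named tf.stack.
concated = tf.concat(1, [indices, sparse_labels])
outshape = tf.pack([derived_size, num_labels])
# Write 1.0 at each (row, label) index and 0.0 everywhere else.
labels = tf.sparse_to_dense(concated, outshape, 1.0, 0.0)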
```
The output, labels, is a one-hot matrix of batch_size x num_labels.
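For readers on a current release, the same scatter-by-index idea can be sketched with tf.scatter_nd. This is a hedged translation to TensorFlow 2.x names, not the answer's original code, and the sample `label_batch` values are made up.

```python
import tensorflow as tf

num_labels = 10
label_batch = tf.constant([3, 0, 9])  # made-up labels, 0 <= label < num_labels

derived_size = tf.shape(label_batch)[0]
# Pair each row index with its label: [[0, 3], [1, 0], [2, 9]].
indices = tf.stack([tf.range(derived_size), label_batch], axis=1)
# Write 1.0 at each (row, label) position of a zero matrix.
labels = tf.scatter_nd(indices,
                       tf.ones(tf.shape(label_batch)),
                       tf.stack([derived_size, num_labels]))
```

In practice tf.one_hot(label_batch, num_labels) gives the same matrix in one call; the sketch only shows what the older snippet was doing step by step.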

Note also that as of 2016-02-12 (which I assume will eventually be part of a 0.7 release), TensorFlow also has the tf.nn.sparse_softmax_cross_entropy_with_logits op, which in some cases can let you do training without needing to convert to a one-hot encoding.

Edited to add: At the end, you may need to explicitly set the shape of labels. The shape inference doesn't recognize the size of the num_labels component. If you don't need a dynamic batch size with derived_size, this can be simplified.
December 22, 2020 6:44 PM IST

0

Sowkya Annam

TensorFlow 2.0 compatible answer: you can do it efficiently using TensorFlow Transform.

Code for one-hot encoding with TensorFlow Transform is shown below. It assumes NUMERIC_FEATURE_KEYS and CATEGORICAL_FEATURE_KEYS are lists of feature names defined elsewhere:

```
def get_feature_columns(tf_transform_output):
  """Returns the FeatureColumns for the model.

  Args:
    tf_transform_output: A `TFTransformOutput` object.

  Returns:
    A list of FeatureColumns.
  """
  # Wrap scalars as real valued columns.
  real_valued_columns = [tf.feature_column.numeric_column(key, shape=())
                         for key in NUMERIC_FEATURE_KEYS]

  # Wrap categorical columns.
  one_hot_columns = [
      tf.feature_column.categorical_column_with_vocabulary_file(
          key=key,
          vocabulary_file=tf_transform_output.vocabulary_file_by_name(
              vocab_filename=key))
      for key in CATEGORICAL_FEATURE_KEYS]

  return real_valued_columns + one_hot_columns
```

December 22, 2020 10:34 PM IST
