Using a pre-trained word embedding (word2vec or Glove) in TensorFlow

  • I've recently reviewed an interesting implementation for convolutional text classification. However, all the TensorFlow code I've reviewed uses random (not pre-trained) embedding vectors, like the following:
    with tf.device('/cpu:0'), tf.name_scope("embedding"):
        W = tf.Variable(
            tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
            name="W")
        self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
        self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

    Does anybody know how to use the results of Word2vec or a GloVe pre-trained word embedding instead of a random one?

     
      August 3, 2021 10:49 PM IST
    0
  • I use this method to load and share embeddings:

    # Create a variable from the pre-trained matrix; trainable=False keeps it frozen
    W = tf.get_variable(name="W", shape=embedding.shape, initializer=tf.constant_initializer(embedding), trainable=False)
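    As a minimal sketch of where the embedding matrix above might come from, the following builds it from a GloVe text file and looks it up with TF1-style ops; the file name, vocabulary, and dimensions are illustrative assumptions, not part of the original answer.

    import numpy as np
    import tensorflow as tf

    # Illustrative inputs: a vocabulary (in index order) and a GloVe text file.
    vocab = ["the", "cat", "sat"]
    embedding_size = 100  # must match the dimension of the GloVe file

    # Parse the GloVe file into a {token: vector} dict.
    glove = {}
    with open("glove.6B.100d.txt", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            glove[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

    # Build the matrix in vocabulary order; tokens missing from GloVe stay random.
    embedding = np.random.uniform(-1.0, 1.0, (len(vocab), embedding_size)).astype(np.float32)
    for i, token in enumerate(vocab):
        if token in glove:
            embedding[i] = glove[token]

    # Load the matrix into a (frozen) variable and use it for lookups.
    W = tf.get_variable(name="W", shape=embedding.shape,
                        initializer=tf.constant_initializer(embedding),
                        trainable=False)
    input_x = tf.placeholder(tf.int32, shape=[None, None])
    embedded_chars = tf.nn.embedding_lookup(W, input_x)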
    
      August 17, 2021 1:06 PM IST
    0
  • The answer of @mrry is not right, because it causes the embedding weights to be overwritten every time the network is run. If you are following a minibatch approach to train your network, you end up overwriting the pre-trained embedding weights on every step. So, in my view, the right way to use pre-trained embeddings is:

    embeddings = tf.get_variable("embeddings", shape=[dim1, dim2],
                                 initializer=tf.constant_initializer(np.array(embeddings_matrix)))
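    A minimal sketch of this pattern inside a minibatch training loop, using a toy loss and randomly generated data as stand-ins for a real model (embeddings_matrix here is also a random stand-in for the pre-trained vectors):

    import numpy as np
    import tensorflow as tf

    embeddings_matrix = np.random.rand(5000, 100).astype(np.float32)  # stand-in for GloVe/word2vec
    dim1, dim2 = embeddings_matrix.shape

    # Created once when the graph is built; never re-assigned inside the loop.
    embeddings = tf.get_variable("embeddings", shape=[dim1, dim2],
                                 initializer=tf.constant_initializer(np.array(embeddings_matrix)),
                                 trainable=True)  # False keeps the vectors frozen

    input_ids = tf.placeholder(tf.int32, shape=[None, None])
    looked_up = tf.nn.embedding_lookup(embeddings, input_ids)

    # Toy objective standing in for the real model and loss.
    loss = tf.reduce_mean(tf.square(looked_up))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # pre-trained values loaded here, once
        for _ in range(3):  # minibatch loop
            batch_ids = np.random.randint(0, dim1, size=(32, 10))
            sess.run(train_op, feed_dict={input_ids: batch_ids})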
    
      October 30, 2021 2:29 PM IST
    0
  • With TensorFlow version 2 it's quite easy if you use the Embedding layer:

    X = tf.keras.layers.Embedding(input_dim=vocab_size,
                                  output_dim=300,
                                  input_length=Length_of_input_sequences,
                                  # pass the pre-trained matrix wrapped in a constant initializer
                                  embeddings_initializer=tf.keras.initializers.Constant(
                                      matrix_of_pretrained_weights)
                                  )(ur_inp)
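
    A minimal end-to-end sketch of this layer inside a Keras model; vocab_size, the sequence length, and the random pretrained_matrix below are illustrative stand-ins for your vocabulary and GloVe/word2vec vectors:

    import numpy as np
    import tensorflow as tf

    vocab_size = 10000
    embedding_dim = 300
    sequence_length = 50
    pretrained_matrix = np.random.rand(vocab_size, embedding_dim)  # replace with your GloVe/word2vec matrix

    inputs = tf.keras.Input(shape=(sequence_length,), dtype="int32")
    x = tf.keras.layers.Embedding(
        input_dim=vocab_size,
        output_dim=embedding_dim,
        embeddings_initializer=tf.keras.initializers.Constant(pretrained_matrix),
        trainable=False)(inputs)  # freeze the pre-trained vectors
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()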

     

      November 15, 2021 12:41 PM IST
    0
  • 2.0 Compatible Answer: There are many pre-trained embeddings which have been developed by Google and open-sourced.

    Some of them are Universal Sentence Encoder (USE), ELMo, BERT, etc., and it is very easy to reuse them in your code.

    Code to reuse the Universal Sentence Encoder pre-trained embedding is shown below:

      !pip install "tensorflow_hub>=0.6.0"
      !pip install "tensorflow>=2.0.0"

      import tensorflow as tf
      import tensorflow_hub as hub

      # Load the Universal Sentence Encoder from TF Hub as a Keras layer
      module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
      embed = hub.KerasLayer(module_url)
      embeddings = embed(["A long sentence.", "single-word",
                          "http://example.com"])
      print(embeddings.shape)  # (3, 512)


    For more information on the pre-trained embeddings developed and open-sourced by Google, refer to the TF Hub link.
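
    As a follow-up, a minimal sketch of plugging the same TF Hub layer into a Keras classifier; the Dense head sizes and the binary-classification setup are illustrative assumptions:

      import tensorflow as tf
      import tensorflow_hub as hub

      module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"

      # The hub layer maps raw strings to 512-dimensional sentence embeddings.
      model = tf.keras.Sequential([
          hub.KerasLayer(module_url, input_shape=[], dtype=tf.string, trainable=False),
          tf.keras.layers.Dense(64, activation="relu"),
          tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g. a binary sentiment head
      ])
      model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
      model.summary()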


      August 16, 2021 3:07 PM IST
    0