Does anybody know how to use the results of Word2vec or a GloVe pre-trained word embedding instead of a random one?
with tf.device('/cpu:0'), tf.name_scope("embedding"):
    W = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
        name="W")
    self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
I use this method to load and share the embedding:
W = tf.get_variable(name="W", shape=embedding.shape, initializer=tf.constant_initializer(embedding), trainable=False)
The answer from @mrry is not right, because it causes the embedding weights to be overwritten each time the network is run; if you train with minibatches, you overwrite the embedding weights on every step. So, in my view, the right way to load pre-trained embeddings is:
embeddings = tf.get_variable("embeddings", shape=[dim1, dim2], initializer=tf.constant_initializer(np.array(embeddings_matrix)))
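Both snippets above assume the pre-trained vectors already sit in a matrix whose row i is the vector for word id i. Here is a minimal sketch of building such a matrix from GloVe-style text lines ("word v1 v2 ... vN"). The helper `build_embedding_matrix`, the `word_to_id` vocabulary and the toy vectors are made up for illustration, and it assumes tokens contain no spaces. Words missing from the pre-trained file keep a random row in [-1, 1], like the random init in the question.

```python
import random

def build_embedding_matrix(lines, word_to_id, dim, seed=0):
    """Return a list of rows; row i holds the vector for the word with id i.

    `lines` is any iterable of GloVe-style text lines: "word v1 v2 ... vdim".
    Words that never appear in `lines` keep a random vector in [-1, 1].
    """
    rng = random.Random(seed)
    # Start from random rows, mirroring the uniform(-1, 1) init in the question.
    matrix = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
              for _ in range(len(word_to_id))]
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        # Skip words outside the vocabulary and lines of the wrong width.
        if word in word_to_id and len(values) == dim:
            matrix[word_to_id[word]] = [float(v) for v in values]
    return matrix

# Toy data standing in for a real GloVe file (hypothetical values).
glove_lines = ["the 0.1 0.2 0.3", "cat 0.4 0.5 0.6", "dog 0.7 0.8 0.9"]
word_to_id = {"<pad>": 0, "the": 1, "cat": 2}

embeddings_matrix = build_embedding_matrix(glove_lines, word_to_id, dim=3)
print(embeddings_matrix[2])  # row for "cat": [0.4, 0.5, 0.6]
```

With a real file you would pass the open file handle as `lines`; `np.array(embeddings_matrix)` then gives the array that the `tf.constant_initializer(...)` calls above take.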
With TensorFlow 2 it's quite easy if you use the Embedding layer:
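# embeddings_initializer expects an initializer, so wrap the matrix in Constant
X = tf.keras.layers.Embedding(input_dim=vocab_size,
                              output_dim=300,
                              input_length=Length_of_input_sequences,
                              embeddings_initializer=tf.keras.initializers.Constant(matrix_of_pretrained_weights)
                              )(ur_inp)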
!pip install "tensorflow_hub>=0.6.0"
!pip install "tensorflow>=2.0.0"
import tensorflow as tf
import tensorflow_hub as hub
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
embed = hub.KerasLayer(module_url)
embeddings = embed(["A long sentence.", "single-word",
                    "http://example.com"])
print(embeddings.shape)  # (3, 512): this module returns 512-dimensional vectors
For more information on the pre-trained embeddings developed and open-sourced by Google, refer to the TF Hub link.