Saving a trained TensorFlow model to run inference on another machine [duplicate]

  • I'm relatively new to machine learning and the TensorFlow framework. I trained a model, heavily influenced by the code presented here, on the MNIST handwritten digit dataset, and I want to run inference on test examples that I have created. However, I am doing the training on a remote machine with a GPU, and I am trying to save the model to a directory so that I can transfer it to a local machine and run inference there.
    It seems that I was able to save some of the model with tf.saved_model.simple_save; however, I'm unsure how to use the saved data to run inference and make a prediction given a new image. There seem to be multiple ways to save a model, but I am unsure what the convention, or the "correct way", is to do it with the TensorFlow framework.
    So far, this is the line that I think I would need, but I am unsure if it is correct.
    tf.saved_model.simple_save(sess, 'mnist_model', inputs={'x': self.x}, outputs={'y_': self.y_, 'y_conv': self.y_conv})
    If someone could point me in the direction of how to properly save trained models, and which variables to use to run inference with the saved model, I'd really appreciate it.
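    Edit: for what it's worth, my current guess at the loading side, pieced together from the SavedModel docs, looks something like the following, though I haven't verified it:
    import tensorflow as tf

    # Load the directory written by simple_save above; the 'x' and 'y_conv'
    # keys match the inputs/outputs dicts passed to the save call.
    with tf.Session(graph=tf.Graph()) as sess:
        meta_graph_def = tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING], 'mnist_model')
        signature = meta_graph_def.signature_def[
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
        x = sess.graph.get_tensor_by_name(signature.inputs['x'].name)
        y_conv = sess.graph.get_tensor_by_name(signature.outputs['y_conv'].name)
        # new_image would be a [1, 784] float array holding one flattened digit:
        # prediction = sess.run(y_conv, feed_dict={x: new_image})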
      August 23, 2021 8:04 PM IST
  • A way you could do this is to create a tf.train.Saver() object in your graph definition, then use that to save the network to a specified directory. The weights in that directory can then be downloaded from the remote machine to your local one and restored locally. Here is a small example network:
    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

    # >>>> Config. Vars <<<<
    TRAIN_STEPS = 1000
    SAVE_EVERY = 100

    # >>>> Network <<<<
    inputs = tf.placeholder(tf.float32, shape=[None, 784])
    labels = tf.placeholder(tf.float32, shape=[None, 10])

    h1 = tf.layers.dense(inputs, 256, activation=tf.nn.relu, use_bias=True)
    logits = tf.layers.dense(h1, 10, use_bias=True)
    predictions = tf.nn.softmax(logits)
    prediction_ids = tf.argmax(predictions, axis=1)

    # >>>> Loss & Optimisation <<<<
    loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)
    opt = tf.train.AdamOptimizer().minimize(loss)

    # >>>> Utilities <<<<
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(init)

        # >>>> Training - run on remote, comment out locally <<<<
        for i in range(TRAIN_STEPS):
            print("Train step {}".format(i), end="\r")
            batch_data, batch_labels = mnist.train.next_batch(batch_size=128)
            feed_dict = {
                inputs: batch_data,
                labels: batch_labels
            }
            l, _ = sess.run([loss, opt], feed_dict=feed_dict)
            if i % SAVE_EVERY == 0:
                saver.save(sess, "saved_model/network_weights.ckpt")

        # >>>> Using the network - run locally to use the network <<<<
        saver.restore(sess, "saved_model/network_weights.ckpt")
        test_data, test_labels = mnist.test.images, mnist.test.labels
        feed_dict = {
            inputs: test_data,
            labels: test_labels
        }
        preds = sess.run(prediction_ids, feed_dict=feed_dict)
        print(preds)
    So once you define the saver in the graph, you can use it to save the weights to the specified directory - in this case the directory "saved_model", which you'll need to have created before you run this particular code.
    Restoring the model is then as simple as calling saver.restore() and passing it the session and the path to where your weights are stored. So you can run this code on your remote machine, download the "saved_model" directory to your local machine, and then run this code with the training part commented out to actually use the model.
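    If you'd rather not re-run the graph-definition code on the local machine, note that tf.train.Saver also writes a .meta file next to the checkpoint, so (untested sketch) you should be able to rebuild the graph from that file and then restore the weights into it:
    import tensorflow as tf

    with tf.Session() as sess:
        # import_meta_graph rebuilds the graph structure stored alongside the checkpoint
        saver = tf.train.import_meta_graph("saved_model/network_weights.ckpt.meta")
        saver.restore(sess, "saved_model/network_weights.ckpt")
        # Tensors are then looked up by name; the exact name depends on how the
        # graph was built, e.g. something like:
        # prediction_ids = sess.graph.get_tensor_by_name("ArgMax:0")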
      August 24, 2021 4:35 PM IST
  • There are different ways to save TensorFlow models depending on the API you're using. This guide uses tf.keras, a high-level API to build and train models in TensorFlow. For other approaches see the TensorFlow Save and Restore guide or Saving in eager.

    Model progress can be saved during and after training. This means a model can resume where it left off and avoid long training times. Saving also means you can share your model and others can recreate your work. When publishing research models and techniques, most machine learning practitioners share:
    
    code to create the model, and
    the trained weights, or parameters, for the model
    Sharing this data helps others understand how the model works and try it themselves with new data.
    
    Caution: TensorFlow models are code and it is important to be careful with untrusted code. See Using TensorFlow Securely for details.
    
    Setup
    Installs and imports
    Install and import TensorFlow and dependencies:
    
    
    pip install pyyaml h5py  # Required to save models in HDF5 format
    
    import os
    
    import tensorflow as tf
    from tensorflow import keras
    
    print(tf.version.VERSION)
    
    2.5.0
    Get an example dataset
    To demonstrate how to save and load weights, you'll use the MNIST dataset. To speed up these runs, use the first 1000 examples:
    
    
    (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
    
    train_labels = train_labels[:1000]
    test_labels = test_labels[:1000]
    
    train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
    test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0
    
    Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
    11493376/11490434 [==============================] - 0s 0us/step
    Define a model
    Start by building a simple sequential model:
    
    
    # Define a simple sequential model
    def create_model():
      model = tf.keras.models.Sequential([
        keras.layers.Dense(512, activation='relu', input_shape=(784,)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10)
      ])
    
      model.compile(optimizer='adam',
                    loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
                    metrics=[tf.metrics.SparseCategoricalAccuracy()])
    
      return model
    
    # Create a basic model instance
    model = create_model()
    
    # Display the model's architecture
    model.summary()
    
    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    dense (Dense)                (None, 512)               401920    
    _________________________________________________________________
    dropout (Dropout)            (None, 512)               0         
    _________________________________________________________________
    dense_1 (Dense)              (None, 10)                5130      
    =================================================================
    Total params: 407,050
    Trainable params: 407,050
    Non-trainable params: 0
    _________________________________________________________________
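
    As a quick sanity check on those parameter counts: each Dense layer has (inputs × units) weights plus units biases, which you can verify by hand:

    assert 784 * 512 + 512 == 401920  # dense
    assert 512 * 10 + 10 == 5130      # dense_1
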
    Save checkpoints during training
    You can use a trained model without having to retrain it, or pick up training where you left off in case the training process was interrupted. The tf.keras.callbacks.ModelCheckpoint callback allows you to continually save the model both during and at the end of training.
    
    Checkpoint callback usage
    Create a tf.keras.callbacks.ModelCheckpoint callback that saves weights only during training:
    
    
    checkpoint_path = "training_1/cp.ckpt"
    checkpoint_dir = os.path.dirname(checkpoint_path)
    
    # Create a callback that saves the model's weights
    cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                     save_weights_only=True,
                                                     verbose=1)
    
    # Train the model with the new callback
    model.fit(train_images, 
              train_labels,  
              epochs=10,
              validation_data=(test_images, test_labels),
              callbacks=[cp_callback])  # Pass callback to training
    
    # This may generate warnings related to saving the state of the optimizer.
    # These warnings (and similar warnings throughout this notebook)
    # are in place to discourage outdated usage, and can be ignored.
    
    Epoch 1/10
    32/32 [==============================] - 1s 7ms/step - loss: 1.1572 - sparse_categorical_accuracy: 0.6500 - val_loss: 0.7440 - val_sparse_categorical_accuracy: 0.7800
    
    Epoch 00001: saving model to training_1/cp.ckpt
    Epoch 2/10
    32/32 [==============================] - 0s 4ms/step - loss: 0.4429 - sparse_categorical_accuracy: 0.8740 - val_loss: 0.5538 - val_sparse_categorical_accuracy: 0.8260
    
    Epoch 00002: saving model to training_1/cp.ckpt
    Epoch 3/10
    32/32 [==============================] - 0s 4ms/step - loss: 0.2902 - sparse_categorical_accuracy: 0.9310 - val_loss: 0.5104 - val_sparse_categorical_accuracy: 0.8410
    
    Epoch 00003: saving model to training_1/cp.ckpt
    Epoch 4/10
    32/32 [==============================] - 0s 4ms/step - loss: 0.2225 - sparse_categorical_accuracy: 0.9430 - val_loss: 0.4639 - val_sparse_categorical_accuracy: 0.8530
    
    Epoch 00004: saving model to training_1/cp.ckpt
    Epoch 5/10
    32/32 [==============================] - 0s 4ms/step - loss: 0.1649 - sparse_categorical_accuracy: 0.9610 - val_loss: 0.4476 - val_sparse_categorical_accuracy: 0.8610
    
    Epoch 00005: saving model to training_1/cp.ckpt
    Epoch 6/10
    32/32 [==============================] - 0s 4ms/step - loss: 0.1192 - sparse_categorical_accuracy: 0.9800 - val_loss: 0.4489 - val_sparse_categorical_accuracy: 0.8570
    
    Epoch 00006: saving model to training_1/cp.ckpt
    Epoch 7/10
    32/32 [==============================] - 0s 4ms/step - loss: 0.0888 - sparse_categorical_accuracy: 0.9870 - val_loss: 0.4190 - val_sparse_categorical_accuracy: 0.8650
    
    Epoch 00007: saving model to training_1/cp.ckpt
    Epoch 8/10
    32/32 [==============================] - 0s 4ms/step - loss: 0.0674 - sparse_categorical_accuracy: 0.9920 - val_loss: 0.4086 - val_sparse_categorical_accuracy: 0.8670
    
    Epoch 00008: saving model to training_1/cp.ckpt
    Epoch 9/10
    32/32 [==============================] - 0s 4ms/step - loss: 0.0507 - sparse_categorical_accuracy: 0.9960 - val_loss: 0.4145 - val_sparse_categorical_accuracy: 0.8630
    
    Epoch 00009: saving model to training_1/cp.ckpt
    Epoch 10/10
    32/32 [==============================] - 0s 4ms/step - loss: 0.0385 - sparse_categorical_accuracy: 0.9990 - val_loss: 0.4140 - val_sparse_categorical_accuracy: 0.8670
    
    Epoch 00010: saving model to training_1/cp.ckpt
    <tensorflow.python.keras.callbacks.History at 0x7f213dba4fd0>
    This creates a single collection of TensorFlow checkpoint files that are updated at the end of each epoch:
    
    
    os.listdir(checkpoint_dir)
    
    ['cp.ckpt.index', 'cp.ckpt.data-00000-of-00001', 'checkpoint']
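    If a training run leaves more than one checkpoint in the directory, tf.train.latest_checkpoint is a convenient way to locate the most recent one before loading; a small sketch:

    latest = tf.train.latest_checkpoint(checkpoint_dir)  # e.g. 'training_1/cp.ckpt'
    model.load_weights(latest)
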
    As long as two models share the same architecture you can share weights between them. So, when restoring a model from weights-only, create a model with the same architecture as the original model and then set its weights.
    
    Now rebuild a fresh, untrained model and evaluate it on the test set. An untrained model will perform at chance levels (~10% accuracy):
    
    
    # Create a basic model instance
    model = create_model()
    
    # Evaluate the model
    loss, acc = model.evaluate(test_images, test_labels, verbose=2)
    print("Untrained model, accuracy: {:5.2f}%".format(100 * acc))
    
    32/32 - 0s - loss: 2.3208 - sparse_categorical_accuracy: 0.0990
    Untrained model, accuracy:  9.90%
    Then load the weights from the checkpoint and re-evaluate:
    
    
    # Loads the weights
    model.load_weights(checkpoint_path)
    
    # Re-evaluate the model
    loss, acc = model.evaluate(test_images, test_labels, verbose=2)
    print("Restored model, accuracy: {:5.2f}%".format(100 * acc)​
      August 25, 2021 2:48 PM IST