
How to save labels during training to a file so that we can use them again during inference?

  • Suppose we are training a Keras model on 1000 images of 3 classes and the labels list is ["label1", "label3", "label2", ......"label3"]. How can we save these labels to a file and use them again during prediction to get the label name from the prediction array?
      January 11, 2021 5:19 PM IST
    0
  • We can save the class names to a NumPy (.npy) file:

    import numpy as np
    from tensorflow import keras
    from sklearn.preprocessing import LabelEncoder

    le = LabelEncoder()
    lab = le.fit_transform(labels)          # text labels -> integer ids
    unique_labels = le.classes_             # sorted array of the unique label names
    np.save("labels.npy", unique_labels)    # persist the label names for inference
    num_labels = len(unique_labels)

    labels = keras.utils.to_categorical(lab)  # integer ids -> one-hot matrix
    print(labels)

    In the above code, we use sklearn's LabelEncoder to convert the text labels to integers, and then keras.utils.to_categorical to convert the integer labels to a NumPy matrix of one-hot (binary) values. The unique label names are saved to the NumPy file 'labels.npy'.
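
    For instance, with a tiny hypothetical label list (not the actual data from the question), the intermediate values look like this:

    import numpy as np
    from tensorflow import keras
    from sklearn.preprocessing import LabelEncoder

    labels = ["label1", "label3", "label2", "label3"]   # hypothetical example labels
    le = LabelEncoder()
    lab = le.fit_transform(labels)     # -> [0, 2, 1, 2]
    print(le.classes_)                 # -> ['label1' 'label2' 'label3'] (sorted order)
    print(keras.utils.to_categorical(lab))
    # [[1. 0. 0.]
    #  [0. 0. 1.]
    #  [0. 1. 0.]
    #  [0. 0. 1.]]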

    During inference, the label names can be read back from the file to map the model's predictions to class names:

    unique_labels = np.load("labels.npy", allow_pickle=True)
    yhat = model.predict(images)
    yhat = np.array(yhat)
    indices = np.argmax(yhat, axis=1)               # index of the highest-scoring class per image
    scores = yhat[np.arange(len(yhat)), indices]    # confidence of each predicted class
    predicted_categories = [unique_labels[i] for i in indices]  # map indices back to label names
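
    Alternatively (not what the code above does), the fitted LabelEncoder itself can be persisted, for example with joblib, so that inverse_transform maps the predicted indices straight back to names. A minimal sketch, with an arbitrary file name:

    import joblib

    # Training time: persist the fitted encoder (file name is an arbitrary choice)
    joblib.dump(le, "label_encoder.joblib")

    # Inference time: reload it and invert the integer predictions
    le = joblib.load("label_encoder.joblib")
    predicted_categories = le.inverse_transform(indices)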

     

      January 11, 2021 5:26 PM IST
    0
  • Here is a simple example using the TensorFlow 2.0 SavedModel format (which is the recommended format, according to the docs) for a simple MNIST classifier, using the Keras functional API, without too much fancy going on:

    # Imports
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.layers import Input, Dense, Flatten
    from tensorflow.keras.models import Model
    import matplotlib.pyplot as plt
    
    # Load data
    mnist = tf.keras.datasets.mnist # 28 x 28
    (x_train,y_train), (x_test, y_test) = mnist.load_data()
    
    # Normalize pixels [0,255] -> [0,1]
    x_train = tf.keras.utils.normalize(x_train,axis=1)
    x_test = tf.keras.utils.normalize(x_test,axis=1)
    
    # Create model
    input = Input(shape=(28,28), dtype='float64', name='graph_input')
    x = Flatten()(input)
    x = Dense(128, activation='relu')(x)
    x = Dense(128, activation='relu')(x)
    output = Dense(10, activation='softmax', name='graph_output', dtype='float64')(x)
    model = Model(inputs=input, outputs=output)
    
    model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
    
    # Train
    model.fit(x_train, y_train, epochs=3)
    
    # Save model in SavedModel format (Tensorflow 2.0)
    export_path = 'model'
    tf.saved_model.save(model, export_path)
    
    # ... possibly another python program 
    
    # Reload model
    loaded_model = tf.keras.models.load_model(export_path) 
    
    # Get an image sample for testing
    index = 0
    img = x_test[index][np.newaxis, ...]  # normalized earlier; add a batch dimension, which the serving signature expects
    
    # Predict using the signature definition (Tensorflow 2.0)
    predict = loaded_model.signatures["serving_default"]
    prediction = predict(tf.constant(img))
    
    # Show results
    print(np.argmax(prediction['graph_output']))   # prints the class number
    plt.imshow(x_test[index], cmap=plt.cm.binary)  # displays the image
    plt.show()

    What is serving_default? It's the name of the signature def of the tag you selected (in this case, the default serve tag was selected). The tags and signatures of a saved model can also be listed with the saved_model_cli tool.

    Disclaimers: this is just a basic example if you want to get it up and running, but it is by no means a complete answer; maybe I can update it in the future. I just wanted to give a simple example using the SavedModel format in TF 2.0 because I haven't seen one, even this simple, anywhere. @Tom's answer is a SavedModel example, but it will not work on TensorFlow 2.0 because, unfortunately, there are some breaking changes. @Vishnuvardhan Janapati's answer says TF 2.0, but it's not for the SavedModel format.
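
    For reference, the available signatures and the inputs they expect can also be inspected directly from Python. This is a minimal sketch reusing loaded_model from the code above; saved_model_cli show --dir model --all prints the same information from the command line:

    # Inspect the signatures of the reloaded SavedModel (uses `loaded_model` from above)
    print(list(loaded_model.signatures.keys()))       # e.g. ['serving_default']
    sig = loaded_model.signatures["serving_default"]
    print(sig.structured_input_signature)             # expected input tensor specs
    print(sig.structured_outputs)                     # output tensor specs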

      January 3, 2022 2:00 PM IST
    0
  • You should have two files in your current working directory (the sketch after the list below shows how they might have been created):

    • model.pkl
    • scaler.pkl
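
    These two files would have been produced during training by pickling a fitted model and scaler. Here is a minimal sketch of that step; the particular model (LogisticRegression) and scaler (MinMaxScaler) are illustrative assumptions, not something stated in this thread:

    # Hypothetical training-time step that produces model.pkl and scaler.pkl
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.linear_model import LogisticRegression
    from pickle import dump
    # prepare dataset (same split parameters as the loading code below)
    X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
    X_train, _, y_train, _ = train_test_split(X, y, test_size=0.33, random_state=1)
    # fit the scaler on the training data only, then scale it
    scaler = MinMaxScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    # fit a model on the scaled training data
    model = LogisticRegression()
    model.fit(X_train_scaled, y_train)
    # save both objects with pickle
    dump(model, open('model.pkl', 'wb'))
    dump(scaler, open('scaler.pkl', 'wb'))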

    # load model and scaler and make predictions on new data
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from pickle import load
    # prepare dataset
    X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
    # split data into train and test sets
    _, X_test, _, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
    # load the model
    model = load(open('model.pkl', 'rb'))
    # load the scaler
    scaler = load(open('scaler.pkl', 'rb'))
    # check scale of the test set before scaling
    print('Raw test set range')
    for i in range(X_test.shape[1]):
    	print('>%d, min=%.3f, max=%.3f' % (i, X_test[:, i].min(), X_test[:, i].max()))
    # transform the test dataset
    X_test_scaled = scaler.transform(X_test)
    print('Scaled test set range')
    for i in range(X_test_scaled.shape[1]):
    	print('>%d, min=%.3f, max=%.3f' % (i, X_test_scaled[:, i].min(), X_test_scaled[:, i].max()))
    # make predictions on the test set
    yhat = model.predict(X_test_scaled)
    # evaluate accuracy
    acc = accuracy_score(y_test, yhat)
    print('Test Accuracy:', acc)
      January 5, 2022 2:07 PM IST
    0