
How to fit TensorFlow Serving Client API in a Python lambda?

  • I'm trying to build a Python Lambda to send images to TensorFlow Serving for inference. I have at least two dependencies: CV2 and tensorflow_serving.apis. I've run through multiple tutorials showing it's possible to run TensorFlow in a Lambda, but they provide a prebuilt package to install and don't explain how they got it to fit under the limit of 256 MB unzipped.

    I've tried following the official instructions for packaging, but this alone downloads 475 MB of dependencies:

    $ python -m pip install tensorflow-serving-api --target .
    Collecting tensorflow-serving-api
      Downloading https://files.pythonhosted.org/packages/79/69/1e724c0d98f12b12f9ad583a3df7750e14ec5f06069aa4be8d75a2ab9bb8/tensorflow_serving_api-1.12.0-py2.py3-none-any.whl
    ...
    $ du -hs .
    475M    .

     

    I see that others have fought this dragon and won (1) (2) by doing contortions to rip out all unused libraries from all dependencies or compile from scratch. But such extremes strike me as complicated and hopefully outdated in a world where data science and lambdas are almost mainstream. Is it true that so few people are using TensorFlow Serving with Python that I'll have to jump through such hoops to get one working as a Lambda? Or is there an easier way?

     
      January 10, 2022 12:39 PM IST
  • TL;DR: min-tfs-client is a minimal Python client for TensorFlow Serving that allows you to use serverless services (e.g. AWS Lambda) that have deployment size limits (250 MB uncompressed, at the time of writing). It works by removing TensorFlow as a dependency for creating the tensor protobufs in the prediction request. Feel free to check out the repository if you'd like to contribute.
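
    To give a feel for the interface, here is a minimal usage sketch. The names below (TensorServingClient, predict_request, tensor_proto_to_ndarray) and the example host/port, model, and tensor keys are taken from the project's documentation and are placeholders; verify them against the repository before relying on them:

        import numpy as np
        from min_tfs_client.requests import TensorServingClient
        from min_tfs_client.tensors import tensor_proto_to_ndarray

        # Connect to a running TFS instance (host/port are placeholders).
        client = TensorServingClient(host="127.0.0.1", port=8500, credentials=None)

        # Input keys and dtypes must match your model's serving signature.
        response = client.predict_request(
            model_name="default",
            model_version=1,
            input_dict={"float_input": np.array([0.1, 0.2], dtype=np.float32)},
        )
        output = tensor_proto_to_ndarray(response.outputs["float_output"])

    The point is that nothing here imports tensorflow, so the dependency tree stays small enough for a Lambda deployment package.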

    TensorFlow Serving (TFS) is a serving system for machine learning (ML) models, primarily used for models built in TensorFlow. In this blog post we introduce a lightweight Python client for TFS that allows Python apps to make gRPC requests to an instance of TFS without having to install TensorFlow.

    Serving ML models in production is often the last, and trickiest, part of completing the development of an ML product. In this phase of development, Data Scientists, ML Engineers, and Software Engineers must all collaborate to integrate the ML stack with the broader product stack. TFS is a gRPC/HTTP server written in C++ and distributed by Google to accelerate the deployment of TensorFlow models to production environments. In our experience, it provides a robust, highly scalable, and reasonably configurable platform for serving models. We've used it successfully on the Answer Bot product, running our semantic models on both CPU- and GPU-based infrastructure.

    Although TFS now supports REST, we elected to use gRPC internally because 1) REST wasn't available when we were productionizing TensorFlow models, and 2) the use of protobufs in gRPC requests makes API contract management slightly more robust. There are other benefits associated with using gRPC, but we won't re-litigate them in this blog post. That said, using gRPC also introduces complexities that are directly tied to its benefits; specifically, the requirement that the API contract-defining protobufs be compiled and circulated to all clients in order to communicate with the server.

    The protobufs required to communicate with TFS using python are contained within the package tensorflow-serving-api, distributed by the TensorFlow team at Google. These protobufs have been compiled into Python classes that can be instantiated and loaded with attributes that correspond to protobuf fields. Collectively, these protobufs contain the definition of objects including the payload of a TFS request (prediction_service_pb2_grpc.py), a tensor in TensorFlow (tensor.proto), and even the shape of a tensor in TensorFlow (tensor_shape.proto). An important property of protobufs is their ability to support interdependency — for instance, loading a TFS request protobuf depends on TensorFlow’s tensor protobuf, which in turn depends on the tensor shape protobuf, to handle the shape of the tensor being loaded.

    These dependencies create a situation where the tensorflow Python package is a dependency of tensorflow-serving-api, as it necessarily provides the protobuf definitions for a tensor in order to make a request and deserialize the response from TFS. TensorFlow provides this functionality through the tf.make_tensor_proto and tf.make_ndarray functions. However, installing the tensorflow Python package just to gain access to the protobuf definitions for tensors is far from ideal; at the time of writing, TensorFlow 2.1.0 requires 544 MB for tensorflow_core, and another 10 MB for TensorBoard. The tensorflow package contains all the code required to train, monitor, evaluate, and export models — code that is not required to communicate with TFS in a production environment.
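
    For reference, a typical gRPC prediction request made with tensorflow-serving-api looks roughly like the sketch below (the model name, signature, and input/output keys are placeholders). The calls to tf.make_tensor_proto and tf.make_ndarray are exactly where the full tensorflow dependency enters the picture:

        import grpc
        import numpy as np
        import tensorflow as tf
        from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

        # Open a gRPC channel to the TFS instance.
        channel = grpc.insecure_channel("localhost:8500")
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

        request = predict_pb2.PredictRequest()
        request.model_spec.name = "my_model"  # placeholder model name
        request.model_spec.signature_name = "serving_default"
        # This one call is what drags in the entire tensorflow package.
        request.inputs["input"].CopyFrom(
            tf.make_tensor_proto(np.zeros((1, 224, 224, 3), dtype=np.float32))
        )

        response = stub.Predict(request, 10.0)  # 10-second timeout
        output = tf.make_ndarray(response.outputs["output"])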

    The size of the TensorFlow package creates complications when trying to build apps that need to communicate with TFS. In particular, it greatly increases the size of a Docker container, and in the case of serverless function services like AWS Lambda, it results in a virtual environment that exceeds the maximum deployment size. This makes it impossible to deploy a Lambda that communicates with TFS by installing tensorflow-serving-api.

      February 11, 2022 12:22 PM IST
  • The goal is to not actually have TensorFlow on the client side, since it uses a ton of space but isn't really needed for inference. Unfortunately, tensorflow-serving-api requires the entire tensorflow package, which by itself is too big to fit into a Lambda. What you can do instead is build your own client rather than using that package. This involves using the grpcio-tools package for the protobuf communication, together with the various .proto files from tensorflow and tensorflow serving. Specifically, you'll want to package up these files:

    tensorflow/serving/  
      tensorflow_serving/apis/model.proto
      tensorflow_serving/apis/predict.proto
      tensorflow_serving/apis/prediction_service.proto
    tensorflow/tensorflow/  
      tensorflow/core/framework/resource_handle.proto
      tensorflow/core/framework/tensor_shape.proto
      tensorflow/core/framework/tensor.proto
      tensorflow/core/framework/types.proto

     

    From there you can generate the Python protobuf files.
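
    As a rough sketch, assuming the .proto files above have been copied into a working directory that preserves their import paths, the generation step with grpcio-tools might look like this:

        $ python -m pip install grpcio-tools
        $ python -m grpc_tools.protoc \
            --proto_path=. \
            --python_out=. \
            --grpc_python_out=. \
            tensorflow_serving/apis/*.proto \
            tensorflow/core/framework/*.proto

    With the generated modules on your path, you can then build the TensorProto by hand instead of calling tf.make_tensor_proto, which is what keeps TensorFlow out of the deployment package. A minimal float32-only sketch, with placeholder model and tensor names:

        import grpc
        import numpy as np
        from tensorflow.core.framework import tensor_pb2, tensor_shape_pb2, types_pb2
        from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

        def float_tensor_proto(array):
            # Hand-rolled replacement for tf.make_tensor_proto (float32 only).
            dims = [tensor_shape_pb2.TensorShapeProto.Dim(size=s) for s in array.shape]
            return tensor_pb2.TensorProto(
                dtype=types_pb2.DT_FLOAT,
                tensor_shape=tensor_shape_pb2.TensorShapeProto(dim=dims),
                tensor_content=array.astype(np.float32).tobytes(),
            )

        channel = grpc.insecure_channel("localhost:8500")
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

        request = predict_pb2.PredictRequest()
        request.model_spec.name = "my_model"  # placeholder
        request.inputs["input"].CopyFrom(
            float_tensor_proto(np.zeros((1, 3), dtype=np.float32))
        )
        response = stub.Predict(request, 10.0)  # 10-second timeout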

     
      January 14, 2022 1:58 PM IST