Tag Archives: Docker

[ 翻譯 ] 2018.12.07- 部署機器學習模型更簡單!!如何使用TensorFlow Serving搭配Docker?






Designed by Freepik


Gautam Vasudevan與Abhijit Karmarkar,特此致謝!






Serving machine learning models quickly and easily is one of the key challenges when moving from experimentation into production. Serving machine learning models is the process of taking a trained model and making it available to serve prediction requests. When serving in production, you want to make sure your environment is reproducible, enforces isolation, and is secure. To this end, one of the easiest ways to serve machine learning models is by using TensorFlow Serving with Docker. Docker is a tool that packages software into units called containers that include everything needed to run the software.

當從實驗階段移至生產階段,如何快速、簡單地部署機器學習模型,是其中一項關鍵性挑戰。部署機器學習模型,是取用一個訓練好的模型並使此模型能回應預測請求的過程。當把模型部署至實際生產時,使用者想要確保環境是可重現、獨立且安全的。至此,部署機器學習模型最簡單的方法之一,是使用TensorFlow Serving搭配Docker。什麼是Docker呢?Docker是一項把軟體打包成一個個單元的工具,而這樣的單元被稱作「容器」,它包含了運作該軟體的一切所需。


Since the release of TensorFlow Serving 1.8, we’ve been improving our support for Docker. We now provide Docker images for serving and development for both CPU and GPU models. To get a sense of how easy it is to deploy a model using TensorFlow Serving, let’s try putting the ResNet model into production. This model is trained on the ImageNet dataset and takes a JPEG image as input and returns the classification category of the image.

自TensorFlow Serving 1.8發佈以來,我們持續改善對Docker的支援。現在,我們提供了Docker 映像檔,讓使用者可針對CPU與GPU模型進行部署和開發。為讓讀者們了解,運用TensorFlow Serving部署模型到底有多簡單,讓我們試著使ResNet模型進行生產。這個模型是以ImageNet資料集來訓練,以JPEG圖像作為輸入,並會回傳此圖像的分類結果。


Our example will assume you’re running Linux, but it should work with little to no modification on macOS or Windows as well.



Serving ResNet with TensorFlow Serving and Docker

The first step is to install Docker CE. This will provide you all the tools you need to run and manage Docker containers.

TensorFlow Serving uses the SavedModel format for its ML models. A SavedModel is a language-neutral, recoverable, hermetic serialization format that enables higher-level systems and tools to produce, consume, and transform TensorFlow models. There are several ways to export a SavedModel(including from Keras). For this exercise, we will simply download a pre-trained ResNet SavedModel:

運用TensorFlow Serving與Docker部署ResNet

第一步是安裝Docker CE,它將提供運作並管理Docker容器所需的工具。

針對其下各種機器學習模型,TensorFlow Serving使用SavedModel格式。SavedModel是一種語言中立、可回復、密閉序列化的格式,使高階系統和工具得以產生、運用並轉化TensorFlow模型。匯出SavedModel格式(包括Keras的模型)的方法相當多元,本範例將下載預先訓練好的ResNet SavedModel。

$ mkdir /tmp/resnet
$ curl -s https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz


We should now have a folder inside /tmp/resnet that has our model. We can verify this by running:


$ ls /tmp/resnet


Now that we have our model, serving it with Docker is as easy as pulling the latest released TensorFlow Serving serving environment image, and pointing it to the model:

有了模型之後,要運用Docker部署模型就簡單了,只要使用pull指令取得最新發佈的TensorFlow Serving的serving environment映像檔,並且將serving environment映像檔指向模型即可:

$ docker pull tensorflow/serving
$ docker run -p 8501:8501 --name tfserving_resnet \
--mount type=bind,source=/tmp/resnet,target=/models/resnet \
-e MODEL_NAME=resnet -t tensorflow/serving &
… main.cc:327] Running ModelServer at…
… main.cc:337] Exporting HTTP/REST API at:localhost:8501 …


Breaking down the command line arguments, we are:


  • -p 8501:8501: Publishing the container’s port 8501 (where TF Serving responds to REST API requests) to the host’s port 8501

發佈容器埠8501(TensorFlow Serving在此回應REST API請求)對應到主機埠8501。


  • –name tfserving_resnet: Giving the container we are creating the name “tfserving_resnet” so we can refer to it later



  • –mount type=bind,source=/tmp/resnet,target=/models/resnet: Mounting the host’s local directory (/tmp/resnet) on the container (/models/resnet) so TF Serving can read the model from inside the container.

運用mount命令,將主機的本地目錄(/tmp/resnet)掛載至容器上(/models/resnet)。這樣,TensorFlow Serving可從容器內讀取模型。


  • -e MODEL_NAME=resnet: Telling TensorFlow Serving to load the model named “resnet”

告訴TensorFlow Serving,載入名稱為「resnet」的模型。


  • -t tensorflow/serving: Running a Docker container based on the serving image “tensorflow/serving”



Next, let’s download the python client script, which will send the served model images and get back predictions. We will also measure server response times.


$ curl -o /tmp/resnet/resnet_client.py https://raw.githubusercontent.com/tensorflow/serving/master/tensorflow_serving/example/resnet_client.py


This script will download an image of a cat and send it to the server repeatedly while measuring response times, as seen in the main loop of the script:


# The server URL specifies the endpoint of your server running the ResNet
# model with the name "resnet" and using the predict interface.
SERVER_URL = 'http://localhost:8501/v1/models/resnet:predict'


# Send few actual requests and time average latency.                                                                                                                                                                   
total_time = 0
num_requests = 10
for _ in xrange(num_requests):
    response = requests.post(SERVER_URL, data=predict_request)
total_time += response.elapsed.total_seconds()
prediction = response.json()['predictions'][0]

print('Prediction class: {}, avg latency: {} ms'.format(
prediction['classes'], (total_time*1000)/num_requests))


This script uses the requests module, so you’ll need to install it if you haven’t already. By running this script, you should see output that looks like:


$ python /tmp/resnet/resnet_client.py
Prediction class: 282, avg latency: 185.644 ms


As you can see, bringing up a model using TensorFlow Serving and Docker is pretty straight forward. You can even create your own custom Docker imagethat has your model embedded, for even easier deployment.

從上面的實例可知,運用TensorFlow Serving與Docker部署模型十分直接。讀者甚至可以建置自己的客製Docker映像檔,其中內嵌您的模型,部署起來更加容易。


Improving performance by building an optimized serving binary

Now that we have a model being served in Docker, you may have noticed a log message from TensorFlow Serving that looks like:


既然我們已將模型部署至Docker中,讀者可能已經注意到一則來自TensorFlow Serving的log訊息,如下:

Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA


The published Docker images for TensorFlow Serving are intended to work on as many CPU architectures as possible, and so some optimizations are left out to maximize compatibility. If you don’t see this message, your binary is likely already optimized for your CPU.

針對TensorFlow Serving已發佈的Docker映像檔,其目的是盡可能在更多CPU架構上運作,所以會放棄某些最佳化措施來提高相容性。如果您並未看見上述訊息,代表您的二元檔案可能已針對您的CPU進行了最佳化。


Depending on the operations your model performs, these optimizations may have a significant effect on your serving performance. Thankfully, putting together your own optimized serving image is straightforward.



First, we’ll want to build an optimized version of TensorFlow Serving. The easiest way to do this is to build the official Tensorflow Serving development environment Docker image. This has the nice property of automatically generating an optimized TensorFlow Serving binary for the system the image is building on. To distinguish our created images from the official images, we’ll be prepending $USER/ to the image names. Let’s call this development image we’re building $USER/tensorflow-serving-devel:

首先,我們要建置一個TensorFlow Serving經過最佳化的版本。最簡單的方式,是建置官方的TensorFlow Serving開發環境的Docker映像檔。這個映像檔有個很棒的屬性,就是可針對映像檔立基的系統,自動產生一個經過最佳化的TensorFlow二元檔。為區分我們自己建置的映像檔和官方的映像檔,我們將在自己建置的映像檔名稱前加上「$USER/」,讓我們把它命名為「$USER/tensorflow-serving-devel」:

$ docker build -t $USER/tensorflow-serving-devel \
-f Dockerfile.devel \ 


Building the TensorFlow Serving development image may take a while, depending on the speed of your machine. Once it’s done, let’s build a new serving image with our optimized binary and call it $USER/tensorflow-serving:

建置自己的TensorFlow Serving開發映像檔根據您的電腦規格可能會花上一段時間。一旦建置完成,就能運用最佳化後的二元檔來建置一個新的部署映像檔,並將它命名為「$USER/tensorflow-serving」:

$ docker build -t $USER/tensorflow-serving \
--build-arg TF_SERVING_BUILD_IMAGE=$USER/tensorflow-serving-devel \ https://github.com/tensorflow/serving.git#:tensorflow_serving/tools/docker


Now that we have our new serving image, let’s start the server again:


$ docker kill tfserving_resnet
$ docker run -p 8501:8501 --name tfserving_resnet \
  --mount type=bind,source=/tmp/resnet,target=/models/resnet \
  -e MODEL_NAME=resnet -t $USER/tensorflow-serving &


And finally run our client:


$ python /tmp/resnet/resnet_client.py
Prediction class: 282, avg latency: 84.8849 ms


On our machine, we saw a speedup of over 100ms (119%) on average per prediction with our native optimized binary. Depending on your machine (and model), you may see different results.



Finally, feel free to kill the TensorFlow Serving container:

最後,請用「kill」指令來終止TensorFlow Serving容器:

$ docker kill tfserving_resnet


Now that you have TensorFlow Serving running with Docker, you can deploy your machine learning models in containers easily while maximizing ease of deployment and performance.

既然,您已讓TensorFlow Serving與Docker一同運作,自然能輕鬆地部署機器學習模型至容器內,不但能輕鬆部署,效能也能最大化。


Please read our Using TensorFlow Serving via Docker documentation for more details, and star our GitHub project to stay up to date.