[Translation] 2018.12.07 - Deploying machine learning models made easier: how to use TensorFlow Serving with Docker








Special thanks to Gautam Vasudevan and Abhijit Karmarkar!





Serving machine learning models quickly and easily is one of the key challenges when moving from experimentation into production. Serving a machine learning model means taking a trained model and making it available to answer prediction requests. In production, you want your serving environment to be reproducible, isolated, and secure. One of the easiest ways to meet these requirements is to serve models with TensorFlow Serving running in Docker. Docker is a tool that packages software into units called containers, each of which includes everything needed to run the software.

Since the release of TensorFlow Serving 1.8, we've been improving our support for Docker. We now provide Docker images for serving and for development, for both CPU and GPU models. To get a sense of how easy it is to deploy a model with TensorFlow Serving, let's put the ResNet model into production. This model is trained on the ImageNet dataset; it takes a JPEG image as input and returns the image's classification category.

Our example assumes you're running Linux, but it should work on macOS or Windows with little or no modification.

Serving ResNet with TensorFlow Serving and Docker

The first step is to install Docker CE, which provides all the tools you need to run and manage Docker containers. TensorFlow Serving uses the SavedModel format for its ML models. A SavedModel is a language-neutral, recoverable, hermetic serialization format that enables higher-level systems and tools to produce, consume, and transform TensorFlow models. There are several ways to export a SavedModel (including from Keras). For this exercise, we will simply download a pre-trained ResNet SavedModel:

$ mkdir /tmp/resnet
$ curl -s https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz

We should now have a folder inside /tmp/resnet that contains our model. We can verify this by running:

$ ls /tmp/resnet
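TensorFlow Serving expects a model's base path to contain numbered version subdirectories and, by default, serves the highest-numbered version it finds, so the listing should show a numeric directory rather than the model files themselves. A minimal Python sketch of that check, assuming each version directory holds a `saved_model.pb`; `latest_version_dir` is a hypothetical helper that only imitates the default version policy and is not part of TensorFlow Serving:

```python
import os
from typing import Optional


def latest_version_dir(base_path: str) -> Optional[str]:
    """Return the highest-numbered version directory under base_path that
    contains a saved_model.pb, or None if there is none.

    Hypothetical helper that imitates TensorFlow Serving's default policy of
    serving the latest numeric version found under a model's base path.
    """
    versions = []
    for name in os.listdir(base_path):
        path = os.path.join(base_path, name)
        # Only purely numeric subdirectories are treated as model versions.
        if name.isdecimal() and os.path.isdir(path):
            if os.path.isfile(os.path.join(path, 'saved_model.pb')):
                versions.append(name)
    if not versions:
        return None
    # Compare numerically, so version "10" beats version "9".
    return os.path.join(base_path, max(versions, key=int))


if __name__ == '__main__':
    # Check the model downloaded earlier in this walkthrough.
    if os.path.isdir('/tmp/resnet'):
        print(latest_version_dir('/tmp/resnet'))
```

Run after the download step, it should print the path of the version directory under /tmp/resnet, or `None` if no valid version was found.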

Now that we have our model, serving it with Docker is as easy as pulling the latest released TensorFlow Serving image and pointing it at the model:

$ docker pull tensorflow/serving
$ docker run -p 8501:8501 --name tfserving_resnet \
--mount type=bind,source=/tmp/resnet,target=/models/resnet \
-e MODEL_NAME=resnet -t tensorflow/serving &
… main.cc:327] Running ModelServer at…
… main.cc:337] Exporting HTTP/REST API at:localhost:8501 …

Breaking down the command-line arguments, we are:

  • -p 8501:8501: Publishing the container’s port 8501 (where TF Serving responds to REST API requests) to the host’s port 8501

  • --name tfserving_resnet: Giving the container we are creating the name “tfserving_resnet” so we can refer to it later


  • --mount type=bind,source=/tmp/resnet,target=/models/resnet: Mounting the host’s local directory (/tmp/resnet) on the container (/models/resnet) so TF Serving can read the model from inside the container.

  • -e MODEL_NAME=resnet: Telling TensorFlow Serving to load the model named “resnet”

  • -t tensorflow/serving: Running a Docker container based on the serving image “tensorflow/serving” (-t itself allocates a pseudo-TTY; tensorflow/serving is the image name)
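The flags above do not depend on ResNet, so the same command can be produced for any model name. A small Python sketch that builds the argument list, assuming the stock `tensorflow/serving` image, which by default looks for the model under /models/<MODEL_NAME> (hence the mount target); `docker_run_argv` and the `tfserving_<model>` container-name pattern are hypothetical conveniences, not part of Docker or TensorFlow Serving:

```python
import shlex
from typing import List


def docker_run_argv(model_name: str, host_model_dir: str,
                    host_port: int = 8501,
                    image: str = 'tensorflow/serving') -> List[str]:
    """Build the `docker run` arguments used in this walkthrough for any model.

    Hypothetical helper; the tfserving_<model> container name just follows
    the naming used above.
    """
    return [
        'docker', 'run',
        # Publish container port 8501 (TF Serving's REST port) on the host.
        '-p', f'{host_port}:8501',
        '--name', f'tfserving_{model_name}',
        # Bind-mount the model so the server finds it under /models/<name>.
        '--mount',
        f'type=bind,source={host_model_dir},target=/models/{model_name}',
        # The image's startup command reads MODEL_NAME to pick the model.
        '-e', f'MODEL_NAME={model_name}',
        # Allocate a pseudo-TTY and run the given serving image.
        '-t', image,
    ]


if __name__ == '__main__':
    # Print the equivalent shell command for the ResNet example.
    print(shlex.join(docker_run_argv('resnet', '/tmp/resnet')))
```

Passing the list to `subprocess.Popen(argv)` would start the server much like the shell command with its trailing `&`; keeping the arguments in a list also sidesteps shell-quoting problems with unusual paths.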

Next, let’s download the Python client script, which sends images to the served model and gets back predictions. We will also measure server response times.

$ curl -o /tmp/resnet/resnet_client.py https://raw.githubusercontent.com/tensorflow/serving/master/tensorflow_serving/example/resnet_client.py

This script downloads an image of a cat and sends it to the server repeatedly while measuring response times, as seen in the main loop of the script:

import requests

# The server URL specifies the endpoint of your server running the ResNet
# model with the name "resnet" and using the predict interface.
SERVER_URL = 'http://localhost:8501/v1/models/resnet:predict'

# predict_request (the JSON body holding the base64-encoded cat image) is
# built earlier in the script, before this loop.

# Send a few actual requests and compute the average latency.
total_time = 0
num_requests = 10
for _ in range(num_requests):
    response = requests.post(SERVER_URL, data=predict_request)
    response.raise_for_status()
    total_time += response.elapsed.total_seconds()
    prediction = response.json()['predictions'][0]

print('Prediction class: {}, avg latency: {} ms'.format(
    prediction['classes'], (total_time * 1000) / num_requests))
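The loop posts `predict_request`, a JSON string that the full script builds beforehand from the downloaded cat image. TensorFlow Serving's REST API expects binary inputs such as JPEG bytes to be base64-encoded and wrapped in an object with a single `b64` key. A minimal sketch of that step, assuming the image bytes are already in memory; `build_predict_request` is a hypothetical helper name:

```python
import base64
import json


def build_predict_request(jpeg_bytes: bytes) -> str:
    """Hypothetical helper: wrap raw JPEG bytes in a TF Serving REST predict body.

    Binary inputs are sent base64-encoded under a "b64" key, one object per
    instance in the "instances" list.
    """
    encoded = base64.b64encode(jpeg_bytes).decode('utf-8')
    return json.dumps({'instances': [{'b64': encoded}]})


if __name__ == '__main__':
    # Placeholder bytes for illustration; the real script uses a downloaded JPEG.
    print(build_predict_request(b'\xff\xd8\xff'))
```

In the loop, `data=build_predict_request(jpeg_bytes)` would take the place of `predict_request`.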

This script uses the requests module, so you’ll need to install it if you haven’t already. When you run the script, you should see output like this:

$ python /tmp/resnet/resnet_client.py
Prediction class: 282, avg latency: 185.644 ms

As you can see, bringing up a model using TensorFlow Serving and Docker is pretty straightforward. You can even create your own custom Docker image that has your model embedded, for even easier deployment.

Improving performance by building an optimized serving binary

Now that we have a model being served in Docker, you may have noticed a log message from TensorFlow Serving that looks like:

Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
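This warning lists instruction-set extensions (here AVX2 and FMA) that the host CPU supports but the prebuilt binary was not compiled to use. On x86 Linux, you can check which of them your CPU advertises by reading /proc/cpuinfo. A minimal sketch, assuming the usual `flags : ...` line format; `missing_cpu_flags` is a hypothetical helper name, and other operating systems report CPU features differently:

```python
import os
from typing import Iterable, Set


def missing_cpu_flags(cpuinfo_text: str, wanted: Iterable[str]) -> Set[str]:
    """Return the flags from `wanted` that no 'flags' line in cpuinfo lists."""
    available = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lines look like "flags\t\t: fpu vme ... avx2 fma ...".
        key, sep, value = line.partition(':')
        if sep and key.strip() == 'flags':
            available.update(value.split())
    return {flag for flag in wanted if flag not in available}


if __name__ == '__main__':
    if os.path.exists('/proc/cpuinfo'):
        with open('/proc/cpuinfo') as f:
            text = f.read()
        # /proc/cpuinfo spells the extensions from the log in lowercase.
        print('missing:', missing_cpu_flags(text, ['avx2', 'fma']) or 'none')
```

An empty result means the CPU offers both extensions, so a natively built binary could take advantage of them.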

The published Docker images for TensorFlow Serving are intended to work on as many CPU architectures as possible, so some optimizations are left out to maximize compatibility. If you don’t see this message, your binary is likely already optimized for your CPU.

Depending on the operations your model performs, these optimizations may have a significant effect on serving performance. Thankfully, putting together your own optimized serving image is straightforward.

First, we’ll build an optimized version of TensorFlow Serving. The easiest way to do this is to build the official TensorFlow Serving development environment Docker image, which automatically generates a TensorFlow Serving binary optimized for the system the image is built on. To distinguish our images from the official ones, we’ll prepend $USER/ to the image names. Let’s call the development image we’re building $USER/tensorflow-serving-devel:

$ docker build -t $USER/tensorflow-serving-devel \
-f Dockerfile.devel \ 

Building the TensorFlow Serving development image may take a while, depending on the speed of your machine. Once it’s done, let’s build a new serving image with our optimized binary and call it $USER/tensorflow-serving:

$ docker build -t $USER/tensorflow-serving \
  --build-arg TF_SERVING_BUILD_IMAGE=$USER/tensorflow-serving-devel \
  https://github.com/tensorflow/serving.git#:tensorflow_serving/tools/docker

Now that we have our new serving image, let’s start the server again (the old container must be removed as well as stopped, since Docker will not reuse the name tfserving_resnet while it exists):

$ docker kill tfserving_resnet
$ docker rm tfserving_resnet
$ docker run -p 8501:8501 --name tfserving_resnet \
  --mount type=bind,source=/tmp/resnet,target=/models/resnet \
  -e MODEL_NAME=resnet -t $USER/tensorflow-serving &
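Loading the model takes a moment after the container starts, so a client that runs too early may fail. TensorFlow Serving's REST API reports model state at `GET /v1/models/<model name>`, with a `model_version_status` list whose entries carry a `state` such as `LOADING` or `AVAILABLE`. A minimal polling sketch, assuming the server from the command above on localhost:8501; `is_available` and `wait_until_available` are hypothetical helper names:

```python
import json
import time
import urllib.request

# Assumes the server started above, serving a model named "resnet".
STATUS_URL = 'http://localhost:8501/v1/models/resnet'


def is_available(status: dict) -> bool:
    """True if any version in a model-status response is in state AVAILABLE."""
    return any(v.get('state') == 'AVAILABLE'
               for v in status.get('model_version_status', []))


def wait_until_available(url: str = STATUS_URL, timeout_s: float = 60.0) -> bool:
    """Poll the model status endpoint until the model is servable or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_available(json.load(resp)):
                    return True
        except (OSError, ValueError):
            # Connection refused, HTTP error, or unparsable body: the server
            # or model is probably still starting, so retry.
            pass
        time.sleep(1)
    return False
```

For example, calling `wait_until_available()` before running resnet_client.py returns False if the model is not reported as AVAILABLE within the timeout.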

And finally, run our client:

$ python /tmp/resnet/resnet_client.py
Prediction class: 282, avg latency: 84.8849 ms
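The two average latencies printed by the client (185.644 ms before, 84.8849 ms after) can be compared in more than one way. A short worked check of the "over 100 ms (119%)" figure: 119% is the speedup relative to the optimized latency (the optimized server is about 2.19x as fast), which is a different number from the share of latency removed:

```python
# Average latencies reported by resnet_client.py in this walkthrough (ms).
baseline_ms = 185.644   # stock tensorflow/serving image
optimized_ms = 84.8849  # locally built $USER/tensorflow-serving image

saved_ms = baseline_ms - optimized_ms
# Speedup: how much faster the optimized server is, relative to its own latency.
speedup_pct = (baseline_ms / optimized_ms - 1) * 100
# The same change expressed as a cut in latency, relative to the baseline.
reduction_pct = saved_ms / baseline_ms * 100

print(f'saved {saved_ms:.1f} ms per request')
print(f'speedup {speedup_pct:.0f}% ({baseline_ms / optimized_ms:.2f}x)')
print(f'latency reduced by {reduction_pct:.0f}%')
```

Both views describe the same measurement; reporting the latency reduction (about 54%) alongside the speedup avoids reading "119%" as more than the whole original latency.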

On our machine, the natively optimized binary cut the average latency per prediction from 185.644 ms to 84.8849 ms, a saving of just over 100 ms (a 119% speedup). Depending on your machine (and model), you may see different results.

Finally, feel free to kill the TensorFlow Serving container:

$ docker kill tfserving_resnet

Now that you have TensorFlow Serving running with Docker, you can deploy your machine learning models in containers with little effort, and build an optimized serving image when performance matters.

Please read our Using TensorFlow Serving via Docker documentation for more details, and star our GitHub project to stay up to date.