Intro

Perhaps the following conversation sounds familiar, or worse, it is something you have been through before or are struggling with right now:

   Frank (the ML person):  Hey Jane, how are things going with getting
                           my ML project launched into production?

Jane (the DevOps person):  (sighs) It's going. We have a lot on our plate
                           right now so it's going to take more time.

                   Frank:  (trying not to get upset) But that's what
                           you said like a month or two ago..

                    Jane:  (coolly) Well, there is a lot that needs
                           to happen before your project can be
                           production-ready. We have to set up
                           the build pipeline to package up your
                           python code, write tests, add continuous
                           integration, sort out logging and monitoring.
                           Oh, and there's error handling,
                           security audits, auto-scaling, load balancing.
                           After that, go through integration and user
                           acceptance testing, before we can launch
                           into production.

                   Frank:  (frustration creeping in) That sounds
                           like a lot of hoops to jump through and
                           I have no idea if all that is necessary to
                           get it launched. Can it be made simpler?
                           I was talking to an ML friend of mine and he
                           mentioned they launched his project using
                           <name some shiny object here>. Can't we _just_
                           use that and get mine launched also?

                    Jane:  (frowns and replies tensely) Let's take this
                           offline. My team just needs more time to work
                           through the steps to get the project
                           launched.

                   Frank:  (steaming)...

Whether you have been on one side of this conversation or the other, interactions like this are unpleasant, and the worst part is that the ML project has not gotten any closer to being deployed. The DevOps side has good reasons for its process, while the ML side is frustrated at being stonewalled from delivering its work and feels there is little it can do.

Can such a situation be avoided or at least managed better? Is there anything the ML side can do to help the process along?

The Lowdown

The reality is that the closer the two sides can come to their handoff point, the smoother and faster the process will be. We will get into more detail on where these rendezvous points may be, but conceptually they sit somewhere along that long list of steps Jane said are needed to get an ML model (or any new project, for that matter) deployed. The more of those items the ML side can take care of (up to a certain point), the less work there is for the devops folks, and the shorter the process becomes.

This doesn’t mean the ML side needs to become devops in order to get projects deployed. It’s more about structuring your projects like other software packages out there, so that they “blend in” with the rest of the software being deployed. This greatly reduces the overhead of making a new package conform, and lowers the anxiety of onboarding an unknown quantity.

The good news is that the work involved isn’t that much once you become familiar with the steps. In this and follow-on posts, we will walk you through examples of packaging ML models so that they sail through your deployment workflows just like any other package.

Let’s get started.

Note: The resulting code for this example ML package can be found here.

Step 0: The Plan

Essentially, we want to make Frank’s ML model as easy to use for his clients (Jane in this example) as this pseudocode:

import frank's-ml-model
create an instance of frank's-ml-model
load an image (or whatever the input might be)
get predictions of the image using frank's-ml-model
do something with the predictions

As a matter of fact, by the end of this tutorial we will have turned the above pseudocode into Python code that you can add to the README page of the ML model, so that your clients can try out your model by simply copying and pasting the code.

The larger goal is to make Frank’s ML model behave no differently than other Python packages, so that the devops team can deploy it like any other package they already handle.

Step 1: Gather the Ingredients

Let’s assume that you have reached the following point in researching and refining an ML model for a particular use case, such that you have:

An ML model trained to an accuracy that meets your requirements
An input-output format that supports the typical use case of your pipeline
Any supporting code for pre- and post-processing the input and output data

To give an example, an image classification model might consist of:

A ResNet-50 model you trained/fine-tuned to classify 23 types of fruits
Inputs that are RGB image matrices (rather than URLs or files), and outputs that are the top k classes with their prediction probabilities (rather than an array of 23 floats)
Some Python functions to resize, crop, and normalize the input images

A different scenario would be predicting the home price (a single integer) from the metadata of a home listing, formatted as a CSV row (or JSON, XML, a Pandas DataFrame, etc.). The specifics don’t matter; the point is to decide what input, in what format, your ML model needs in order to function correctly, and how the output will be returned to requesters so they can use the results.

The nice thing about ML is that this input-output pair is usually well understood, since it is already expressed in the training data. You just need to choose the input and output formats that work for the consumers of your model. With that, you have defined an interface to the outside world, and everything else is internal to your domain. This lets you update or even swap out your ML model later on, without the outside world needing to change anything or even noticing!
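To make the idea of a stable interface concrete, here is a minimal sketch in which two interchangeable models sit behind one method signature. The names ImageClassifier, Prediction, predict_topk, ModelV1, and ModelV2 are hypothetical stand-ins invented for illustration; they are not part of any package used in this post, and the hard-coded predictions only mimic what a trained model would return.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass(frozen=True)
class Prediction:
    """One entry of the public output format: a label and its probability."""
    label: str
    probability: float


class ImageClassifier(Protocol):
    """Hypothetical public interface: RGB pixel rows in, top-k predictions out."""

    def predict_topk(
        self, image: Sequence[Sequence[Sequence[int]]], k: int
    ) -> List[Prediction]:
        ...


class ModelV1:
    """Stand-in for a first model; a real one would run a trained network."""

    def predict_topk(self, image, k):
        return [Prediction("apple", 0.9), Prediction("pear", 0.1)][:k]


class ModelV2:
    """Stand-in replacement model that honors the same interface."""

    def predict_topk(self, image, k):
        return [Prediction("apple", 0.95), Prediction("pear", 0.05)][:k]


def client_code(model: ImageClassifier) -> str:
    """The client depends only on the interface, so either model works."""
    image = [[[0, 0, 0]]]  # a 1x1 RGB image as nested lists
    return model.predict_topk(image, k=1)[0].label


print(client_code(ModelV1()))
print(client_code(ModelV2()))
```

Because client_code only relies on predict_topk, swapping ModelV1 for ModelV2 requires no change on the client side, which is the property we are after.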

Step 2: Pick a Box for the Package

This used to be the step that was a pain in the butt, but nowadays there’s a de facto way to package Python software. If you have ever used pip to install Python packages such as NumPy, TensorFlow, or PyTorch, you know how easy they are to install. That’s what we will be creating for Frank’s ML model.

Creating Python packages isn’t super complicated, but it does involve several manual steps. Fortunately, people have created templates that generate the boilerplate code, so we need not become packaging experts. For this tutorial, we will use a simple Cookiecutter-based template I modified for PyTorch models, named cookiecutter-pytorch-basic, although it can be used for non-PyTorch projects without much adaptation. Cookiecutter asks for some essential details about the package, clones the template, and generates an empty project structure customized for packaging our model, a.k.a. the box. The steps are pretty simple:

# install the cookiecutter tool
pip install -U cookiecutter
# generate the Python package project
cookiecutter https://github.com/ml-illustrated/cookiecutter-pytorch-basic

You’ll be asked for some details about the package, most importantly its name. This should be a unique name, since in the future people will install the package via pip install <name_of_your_ml_package>. Here’s an example:

project_name [Name of your Pytorch package]: franks-ml-model
author_full_name []: Frank Torch
author_email []: frank@my-co.com
github_username []: frank_torch
project_slug [franks_ml_model]:
project_short_description []: Model for classifying fruits
pypi_username [frank_torch]:
version [0.0.1]:
create_author_file [y]:
Select open_source_license:
1 - MIT license
2 - BSD license
3 - ISC license
4 - Apache Software License 2.0
5 - GNU General Public License v3
6 - Not open source
Choose from 1, 2, 3, 4, 5, 6 [1]: 6
package_name [franks_ml_model]:
package_url [https://github.com/frank_torch/franks_ml_model]:
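Notice that the prompts use two spellings of the name: franks-ml-model with hyphens for the project, and franks_ml_model with underscores for the package you import. The sketch below illustrates why both exist. It assumes the name normalization described in PEP 503 for distribution names, and the helper function names are made up for this example.

```python
import re


def normalized_distribution_name(name: str) -> str:
    """Normalize a distribution name as described in PEP 503:
    runs of '-', '_' and '.' collapse to a single '-', all lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_importable_name(name: str) -> bool:
    """An import name must be a valid Python identifier, so no hyphens."""
    return name.isidentifier()


print(normalized_distribution_name("franks_ml_model"))  # franks-ml-model
print(is_importable_name("franks-ml-model"))            # False
print(is_importable_name("franks_ml_model"))            # True
```

In other words, pip treats the hyphen and underscore spellings as the same distribution, while the import statement only accepts the underscore form.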

Cookiecutter will create the following directory structure:

franks_ml_model/
├── .editorconfig
├── .gitignore
├── LICENSE
├── MANIFEST.in
├── README.md
├── franks_ml_model       <-- this is the empty box
│   └── __init__.py
├── requirements.txt
├── requirements_dev.txt
├── setup.cfg
└── setup.py

This is our “box”, ready to be populated with our ML model. We will customize it further later on, but by and large these files are preconfigured and ready to go, so Cookiecutter has saved us the time and energy of setting up the packaging project.

Step 3: Place the Content into the Box

Now let’s copy or move your ML model definition into this project, placing the files in franks_ml_model/franks_ml_model/. For this tutorial we will use an example PyTorch model from the excellent EfficientNet-PyTorch project by Luke Melas-Kyriazi. It’s very self-contained and comes with pretrained models, which makes it ideal for this tutorial. It stands in for the code of Frank’s hypothetical ML model prior to being packaged.

git clone https://github.com/lukemelas/EfficientNet-PyTorch

cp EfficientNet-PyTorch/efficientnet_pytorch/*.py franks_ml_model/franks_ml_model/

With that, we technically have finished packaging our ML model. Quite easy, isn’t it? At this point you can run the build command, upload the package, and it’d be ready (kinda) for others like Jane to use.

However, this package would not meet the interface requirement for our input-output pair. Thus, let’s do a bit more work to make it as dead simple as possible.

Step 4: Add Some Bubble Wrap to Encapsulate the Content

What we have completed so far is a package that lets your clients do the following (this is the original interface of EfficientNet):

import torch
from efficientnet_pytorch import EfficientNet

model = EfficientNet.from_pretrained('efficientnet-b0')

# img must already be a preprocessed image tensor (batched, resized, normalized)
model.eval()
with torch.no_grad():
    outputs = model(img)

While that seems simple enough, the model requires the image to be a tensor that’s been properly transformed first. However, we don’t necessarily want every client to have to know what this transform entails, since this is an internal detail we’d like to hide from them. Same thing goes for the outputs of the model. It’s a tensor of n floats, which we can convert into a friendlier format. This is what “wrapping” refers to, to create a simpler interface and hide internal details.
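As a rough illustration of the output side of this wrapping, the sketch below turns a plain list of raw scores into a short list of [index, label, probability] style entries. It is written in pure Python rather than with tensors so it can be run anywhere, and it assumes the model emits raw scores (logits) that need a softmax. The function names and the three example labels are invented for this example.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_predictions(
    scores: Sequence[float], labels: Sequence[str], k: int = 5
) -> List[Tuple[int, str, float]]:
    """Return the k most likely classes as (index, label, probability)."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, labels[i], probs[i]) for i in ranked[:k]]


# Hypothetical scores for a three-class fruit model
labels = ["apple", "banana", "cherry"]
for index, label, probability in topk_predictions([2.0, 0.5, 1.0], labels, k=2):
    print(index, label, round(probability, 3))
```

The client never sees the raw scores or the softmax; it only receives the ranked, labeled entries.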

The process is relatively simple: we could either modify the original EfficientNet implementation and add convenience methods, or create a new (small) class for the wrapper. Without going into the details, it is usually cleaner to do the latter.

At a high level, we will create a new “infer” class (or “predict” if you prefer) that gets initialized with the name of the architecture (e.g., efficientnet-b0). It exposes only a simple method the client can call to get class predictions for an image, conforming to the interface we decided on earlier. Everything else is considered an implementation detail within the model.

The simplified code for this new class that we’d put into franks_ml_model/infer.py would be along the lines of:

class EfficientNetInfer(object):
    def __init__( self, architecture_name='efficientnet-b0' ):
        self.effnet_model = EfficientNet.from_pretrained(architecture_name)
        self.transforms = transforms.Compose(...)

    def infer_image( self, fn_image, topk=5 ):
        batch_image_tensor = self.load_and_transform_image( fn_image )
        return self.infer_batch_image_tensor( batch_image_tensor, topk=topk )

    ...

Omitting the internal details of the EfficientNetInfer class, the point is that clients only need to know how to 1) instantiate the model, 2) get predictions, and 3) interpret the results.

To make this super easy, we can add a demo command line tool to this class so people can try out the model without even copying and pasting code, i.e., via python franks_ml_model/infer.py <an_image>.jpg:

if __name__ == '__main__':
    import sys
    fn_image = sys.argv[1]

    model = EfficientNetInfer()
    top_predictions = model.infer_image( fn_image )

    for row in top_predictions:
        print( row )

Can’t get much simpler than this for the clients! The desired interface for our ML model is now implemented by the wrapper class EfficientNetInfer, so we can move on to packaging it.
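If the demo tool ever needs more than a single positional argument, the standard library’s argparse is one option. The sketch below is only a suggestion and not part of the example package: the --topk flag and the build_parser and main functions are hypothetical, while the import path and the infer_image( fn_image, topk=5 ) signature are taken from the class outline above.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Describe the demo tool's command line: one image path, optional --topk."""
    parser = argparse.ArgumentParser(
        description="Classify an image and print the top predictions."
    )
    parser.add_argument("image", help="path to the image file")
    parser.add_argument(
        "--topk", type=int, default=5, help="number of predictions to print"
    )
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    """Parse the arguments, run the model, and print one prediction per line."""
    args = build_parser().parse_args(argv)
    # Imported here so that --help works without loading the model.
    from franks_ml_model.infer import EfficientNetInfer

    model = EfficientNetInfer()
    for row in model.infer_image(args.image, topk=args.topk):
        print(row)
```

Calling main() under the usual if __name__ == '__main__': guard would then replace the sys.argv handling, and users get a --help message for free.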

Step 5: Seal and Add Shipping Labels

The last step is to build the package so others can install it via pip. Fortunately all of the hard work was taken care of by Cookiecutter, so we just need to build the package via:

cd franks_ml_model
python setup.py bdist_wheel

and you’ll see output similar to the following:

running bdist_wheel
running build
running build_py
creating build
creating build/lib
creating build/lib/franks_ml_model
copying franks_ml_model/__init__.py -> build/lib/franks_ml_model
...

and the end result would be a .whl file under the dist directory, such as:

ls -l dist/
total 8
-rw-r--r--  1 gerald  staff  1692 Feb 23 09:13 franks_ml_model-0.0.1-py2.py3-none-any.whl
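The wheel’s filename encodes what it is compatible with, and the file itself is an ordinary zip archive. As a quick sanity check before handing it to anyone, a sketch like the following can decode the name and list what actually got packaged. The two helper functions are written for this example and assume a filename without the optional build tag.

```python
import zipfile
from typing import Dict, List


def parse_wheel_filename(filename: str) -> Dict[str, str]:
    """Split a wheel filename into its fields (assumes no optional build tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected number of fields in: {filename}")
    keys = ["distribution", "version", "python_tag", "abi_tag", "platform_tag"]
    return dict(zip(keys, parts))


def list_wheel_contents(path: str) -> List[str]:
    """A wheel is a zip archive, so zipfile can show what was packaged."""
    with zipfile.ZipFile(path) as wheel:
        return wheel.namelist()


info = parse_wheel_filename("franks_ml_model-0.0.1-py2.py3-none-any.whl")
print(info["distribution"], info["version"], info["python_tag"])
# prints: franks_ml_model 0.0.1 py2.py3
```

Here py2.py3 is the Python tag, none means no ABI requirement, and any means the wheel is not tied to a platform. Running list_wheel_contents on the file in dist/ shows whether your model code made it into the package.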

An optional but nice touch is to add sample code to the README file of this project so your clients can test out the package without digging through code. This effectively converts the pseudocode we outlined in Step 0 into functional Python code for this model:

from franks_ml_model import EfficientNetInfer
fn_image='<path_to_example_image>.jpg'
model = EfficientNetInfer() # defaults to 'efficientnet-b0'
top_predictions = model.infer_image( fn_image ) # defaults to topk=5
print(top_predictions)

Step 6: Test out the Package Locally

To be thorough, an optional step is to simulate your clients installing and trying out your packaged ML model. This will not be foolproof, since everyone’s computer can be slightly (or very) different, but it had better work on yours!

To test this out, let’s create a clean virtualenv, install our package and make sure our sample code works:

cd .. # or somewhere else outside the project source
virtualenv -p python3 ~/.virtualenvs/test_franks_model
source ~/.virtualenvs/test_franks_model/bin/activate
pip install franks_ml_model/dist/franks_ml_model-0.0.1-py2.py3-none-any.whl

and if you run pip freeze, you should see that our package was successfully installed:

# pip freeze
franks-ml-model==0.0.1
numpy==1.18.1
torch==1.4.0
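The same check can be done from Python, which is handy inside a smoke-test script. This is a small sketch that assumes Python 3.8 or newer, where importlib.metadata is part of the standard library; the installed_version helper is named for this example only.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the wheel is installed in this environment,
# otherwise None.
print(installed_version("franks-ml-model"))
```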

Let’s try out the sample code we added to our README page:

>>> from franks_ml_model import EfficientNetInfer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gerald/.virtualenvs/test_franks_model/lib/python3.7/site-packages/franks_ml_model/__init__.py", line 3, in <module>
    from .infer import EfficientNetInfer
  File "/Users/gerald/.virtualenvs/test_franks_model/lib/python3.7/site-packages/franks_ml_model/infer.py", line 3, in <module>
    from PIL import Image
ModuleNotFoundError: No module named 'PIL'

Oops, looks like we forgot some of our dependencies. Good thing we tested out the package before shipping it! For this we just need to update setup.py so the requirements would look like the following:

requirements = [
    'numpy',
    'torch',
    'torchvision',
    'Pillow',
]
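One way to catch this kind of omission before building is to compare what the source imports against what the requirements declare. The sketch below is a rough, standard-library-only approach and not something the template provides: the function names are hypothetical, the import-to-distribution mapping only covers the PIL/Pillow case from this example, and standard library modules have to be passed in via ignore.

```python
import ast
from typing import Iterable, Set

# Import names that differ from their distribution names (illustrative only)
IMPORT_TO_DISTRIBUTION = {"PIL": "Pillow"}


def top_level_imports(source: str) -> Set[str]:
    """Collect the top-level module names imported by a piece of source code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from .utils import x"
            modules.add(node.module.split(".")[0])
    return modules


def missing_requirements(
    source: str, requirements: Iterable[str], ignore: Iterable[str] = ()
) -> Set[str]:
    """Return distributions the source imports but the requirements omit."""
    imported = top_level_imports(source) - set(ignore)
    needed = {IMPORT_TO_DISTRIBUTION.get(name, name) for name in imported}
    return needed - set(requirements)


example_source = "import sys\nimport torch\nfrom PIL import Image\nfrom .utils import helper\n"
print(missing_requirements(example_source, ["numpy", "torch"], ignore={"sys"}))
# prints: {'Pillow'}
```

This is no substitute for installing into a clean virtualenv as we did above, but it is cheap enough to run on every change.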

Then rebuild the wheel package, pip install it (make sure you add the --upgrade flag), and try out the Python code.

cd franks_ml_model
python setup.py bdist_wheel

cd ..
source ~/.virtualenvs/test_franks_model/bin/activate
pip install --upgrade franks_ml_model/dist/franks_ml_model-0.0.1-py2.py3-none-any.whl

Let’s try again:

>>> from franks_ml_model import EfficientNetInfer
>>> model = EfficientNetInfer()
>>> fn_image='dog.jpg'
>>> top_predictions = model.infer_image( fn_image )
>>> top_predictions
[[207, 'golden retriever', 0.5610832571983337], [213, 'Irish setter, red setter', 0.22866328060626984],...

Works! We have achieved what we set out to do in our initial plan, which is to package Frank’s ML model so it can be pip installed and invoked with 3 lines of Python code. The improved packaging and interface help Frank’s model “blend in” with other software packages, which makes deployment that much simpler.

As a quick note, we will not worry about the output format in this post, but as a rule of thumb, it’s generally not a good idea to pass arrays back to the clients. For now, let’s assume that’s fine with them.

Step 7: Ship it!

This step will be unique to your situation. The most common route is to upload the package to PyPI, the public repository for Python packages. However, unless your model is open source like the EfficientNet model in this example, you will most likely not want to make your package publicly available. You have two options:

Set up a private PyPI server for your organization (if one is not already there)
As a fallback, distribute the .whl file directly

Obviously, if your organization already has a private PyPI server, use that and follow its procedure for uploading your package. If not, and if you want to help your devops team set one up, Nexus is a good option. This will be well worth the work, especially if you plan on adding many more ML model packages, as well as releasing new versions of existing ones.

Short of that, you might be able to get away with handing out the .whl file directly, so your clients have something to work with. It’s a stopgap, since distributing the file again every time you build a new version of your package can become a huge headache. With a PyPI server set up, publishing is as simple as a single twine command, which is beyond the scope of this post.

Conclusion

There you have it: we have taken an ML model, settled on an interface, and packaged the model so that its clients can treat it much like any other Python package they are used to working with and deploying. As you saw, the work involved isn’t monumental, especially once you know how to save time using tools like Cookiecutter.

So hopefully the next time you meet your devops team, the conversation will go a lot smoother, perhaps something along the lines of:

   Frank (the ML person):  Hey Jane, I know you're quite busy, so
                           I did some work and simplified my ML model's
                           interface, and packaged it into a Python
                           wheel. Now it can be pip-installed
                           and only takes 3 lines of code to get
                           model predictions.

Jane (the DevOps person):  That sounds great. Send me a link to the
                           repo and I'll take a look. If it looks easy
                           enough we may be able to add the package
                           to our next Docker build, scheduled in
                           the next sprint or two.

                   Frank:  That'd be great! Looking forward to it
                           and let me know if I can help in any way.

This is an idealized scenario. More likely than not, there would be additional steps the devops team would need to get your model fully deployable, which we will explore further in future posts. Nevertheless, packaging your ML model is a prerequisite step anyway, not to mention the fun part of naming your own package!

How to Package your ML Project for Easy Distribution