Test and iterate on your model servers with scaffolds

Alex Gillmor

Today, we’re excited to share more about scaffolds, the technology that underpins the model serving experience on BaseTen. Put simply, a BaseTen scaffold is a context for building a container for serving predictions from a model. Scaffolds are powered by familiar technologies: Docker, KServe, and Python. We’ve added some light opinions and stitched them together.

Scaffolds enable functionality including complex pre-processing of model inputs and client deployments so you can quickly test and iterate on your model servers locally before deploying on BaseTen. You can also use scaffolds independently from BaseTen if you want to build your own container and deploy to your own server. And this is just a start—we’re building towards an increasingly robust scaffolds ecosystem, enabling you to do things like build observability pipelines and dynamically define and auto-document model server interfaces in the future.

Here, we’ll walk through a simple example to show how to set up a scaffold for a project. Then, we’ll use a second example to show the power of scaffolds for local iteration on a custom model.

Example 1: Setting up scaffolds

To show how scaffolds work, we’ll start with a simple model. We love scikit-learn here so let’s use a random forest classifier and the classic iris dataset.

First, we train the model:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
data_x = iris['data']
data_y = iris['target']
rfc_model = RandomForestClassifier()
rfc_model.fit(data_x, data_y)

Once the model is trained, it is trivial to create a scikit-learn scaffold around the model using the BaseTen client package:

import baseten

scaffold = baseten.build_baseten_scaffold(
   rfc_model,
   target_directory='scaffold_rfc',
   model_name='scaffold_rfc'
)

INFO To build this model server locally execute `docker build  -f scaffold_rfc/sklearn-server.Dockerfile scaffold_rfc -t scaffold_rfc`
INFO To run this model server locally execute `docker run --rm -p 8080:8080 scaffold_rfc`
INFO To use the Python shell locally execute `docker run --rm -it scaffold_rfc python3`

Et voilà, we now have a scaffold object in Python memory and on our local disk in the directory scaffold_rfc/.

Next, we could choose to deploy this scaffold onto the BaseTen infrastructure and start quickly building a user-facing application powered by our model:

baseten.deploy_scaffold(
   scaffold,
   model_name='Iris Random Forest Classifier',
)

However, we don’t have to deploy the scaffold onto the BaseTen infrastructure to interact with our model’s inference capabilities. We can also call inference on the model directly in the scaffold like this:

>>> scaffold.predict([[1,2,3,4]])
{'predictions': array([2]), 'probabilities': [[0.0, 0.33, 0.67]]}

From the INFO statements provided to us at instantiation, we can build the container for the model server locally and make calls into it via HTTP with JSON. Here’s how we build the container image:

$ docker build  -f scaffold_rfc/sklearn-server.Dockerfile scaffold_rfc -t scaffold_rfc

Next, we can run the container:

$ docker run --rm -p 8080:8080 scaffold_rfc

We can now interact with the model through the containerized web server rather than only through direct calls in Python. This lets us test our request and response structures:

$ curl \
   -H 'Content-Type: application/json' \
   -X POST http://localhost:8080/v1/models/model:predict \
   -d '{"inputs": [[0,0,0,0]]}'

{"predictions": [0], "probabilities": [[1.0, 0.0, 0.0]]}
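The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the container from the previous step is listening on localhost:8080 and reuses the URL and the "inputs" key from the curl example. The helper names `build_predict_request` and `predict` are ours for illustration and are not part of the BaseTen client.

```python
import json
import urllib.request

# Endpoint copied from the curl example; assumes the container is running locally.
PREDICT_URL = "http://localhost:8080/v1/models/model:predict"


def build_predict_request(inputs, url=PREDICT_URL):
    """Build (but do not send) a JSON POST request for the model server.

    Hypothetical helper for illustration, not a BaseTen API.
    """
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(inputs, url=PREDICT_URL, timeout=10):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(inputs, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, calling `predict([[0, 0, 0, 0]])` should return a dictionary with the same keys as the JSON response shown above. Building the request separately from sending it makes it easy to inspect the payload before anything goes over the wire.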

Example 2: Using scaffolds for local iteration on a custom model

A big reason to use scaffolds is that they make it easier to find problems and debug your model server before it gets deployed onto BaseTen’s infrastructure. For high-touch areas such as integrations, it’s especially helpful to be able to fix issues in the same environment where you trained your model. Let’s walk through how this works with a custom model. While scaffolds for simple models in scikit-learn, Keras, and PyTorch “just work”, the custom scaffold allows more flexibility in implementation.

Here we define a custom model that we’ll use in a scaffold:

import json

class SimpleBuggyModel:

   def __init__(self):
       pass

   def load(self):
       pass

   def predict(self, inputs) -> list:
       predictions = []
       for this_input in inputs:
           try:
               predictions.append(json.loads(this_input))
           except:
               predictions.append(
                   {
                       "error": f"Could not parse input correctly. Please ensure that input is formatted correctly."
                   }
               )
       return predictions

The model here is very simple: it’s just an identity function. The code is designed to pass through the input the model receives if it can parse it. Nothing is outwardly wrong with this code, so let’s test it.

First we create a custom BaseTen scaffold:

scaffold = baseten.build_custom_scaffold(
   model_name='Simple Model',
   model_files=['simple_buggy_model.py'],
   model_class='SimpleBuggyModel',
   target_directory='simple_model/'
)

Next, we build and run the scaffold locally. Running the scaffold should look something like this:

$ docker run --rm -p 8080:8080 simple_model
{"asctime": "2021-12-15 00:17:38,790", "levelname": "INFO", "message": "Registering model: model"}
{"asctime": "2021-12-15 00:17:38,792", "levelname": "INFO", "message": "Listening on port 8080"}
{"asctime": "2021-12-15 00:17:38,796", "levelname": "INFO", "message": "Will fork 1 workers"}

If we call the endpoint provided by this model, it should return the same JSON we call it with. Let’s give it a try!

$ curl \
 -X POST http://localhost:8080/v1/models/model:predict \
 -H 'Content-Type: application/json' \
 -d '{"instances": [[0,0,0,0]]}'

$ curl \
 -X POST http://localhost:8080/v1/models/model:predict \
 -H 'Content-Type: application/json' \
 -d '{"instances": ["hello world"]}'

In both cases, we see this response: {"error": "Could not parse input correctly. Please ensure that input is formatted correctly."}

Do you see the bug? It turns out we are calling json.loads on an object that has already been parsed from JSON into Python types by the model server. The JSON parsing has already been done for us.
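The failure is easy to reproduce outside the container. This short sketch assumes, as described above, that the server hands `predict` the already-parsed values from the "instances" list, and it calls `json.loads` on them directly:

```python
import json

# Values as the model would receive them after the server has parsed the
# request body (an assumption based on the behavior described above).
already_parsed_inputs = [[0, 0, 0, 0], "hello world"]

errors = []
for value in already_parsed_inputs:
    try:
        json.loads(value)
    except (TypeError, json.JSONDecodeError) as exc:
        # A list is not str/bytes, so json.loads raises TypeError.
        # The bare string "hello world" is not valid JSON, so it raises JSONDecodeError.
        errors.append(type(exc).__name__)

print(errors)  # ['TypeError', 'JSONDecodeError']
```

The bare `except` in `SimpleBuggyModel.predict` catches both exceptions, which is why two very different requests produce the same error message.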

If we change the existing custom server code from predictions.append(json.loads(this_input)) to predictions.append(this_input), the code path should work. Thanks to the scaffold, we can quickly make this change, rebuild and run the new container, and test it locally:

$ curl \
 -X POST http://localhost:8080/v1/models/model:predict \
 -H 'Content-Type: application/json' \
 -d '{"instances": [[0,0,0,0]]}'

{"predictions": [[0, 0, 0, 0]]}

$ curl \
 -X POST http://localhost:8080/v1/models/model:predict \
 -H 'Content-Type: application/json' \
 -d '{"instances": ["hello world"]}'

{"predictions": ["hello world"]}

After making a change to our model and rebuilding the container, the request is successfully processed! We were able to fix a bug in the data formats our model server expects without wasting time waiting for deployment on BaseTen. Now, we can deploy our custom model on BaseTen with greater confidence.
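For reference, a corrected version of the class might look like the following. The class name `SimpleModel` is a placeholder of ours, and the direct `predict` calls at the bottom are a quick sanity check that can be run in plain Python before rebuilding the container.

```python
class SimpleModel:
    """Identity model: returns each input unchanged.

    Hypothetical corrected version of SimpleBuggyModel; the name is illustrative.
    """

    def __init__(self):
        pass

    def load(self):
        # Nothing to load for an identity function.
        pass

    def predict(self, inputs) -> list:
        # The server has already parsed the request JSON,
        # so each input is passed through as-is.
        return [this_input for this_input in inputs]


model = SimpleModel()
model.load()
print(model.predict([[0, 0, 0, 0]]))   # [[0, 0, 0, 0]]
print(model.predict(["hello world"]))  # ['hello world']
```

Dropping the `try`/`except` along with `json.loads` is a deliberate choice in this sketch: with no parsing step left, there is nothing for the handler to catch, and any unexpected error will surface instead of being hidden behind a generic message.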

This just scratches the surface of scaffold functionality. Scaffolds do not have to live on our infrastructure: you can deploy scaffolds on your own server. The scaffold structure contains Dockerfiles you can edit, and more. Read our scaffolds documentation for additional information. And please reach out with any questions or ideas; we’d love to hear from you.