
Deploy-Predict-Monitor explainable ML models with MLServe.com

  • Writer: Nick Gavriil
  • Sep 14
  • 5 min read

Updated: Sep 14


Problem Statement

For most companies, 95% of what’s needed to put a machine learning model into production can already be done by a data scientist (e.g. data gathering and analysis, model design and development, model performance monitoring and explainability), but the remaining 5% (real-time inference, scalability, low latency) can derail timelines, budgets, and morale.

Usually a business will consider three options:

  1. Hire people

  2. Buy software

  3. Up-skill the existing team


For big organisations, hiring people might bring ROI, especially if they have tens to hundreds of use cases and operate in industries where low latency is critical (e.g. finance). The problem I have identified lies with smaller organisations, where it is much less clear which option is optimal. Let me expand a bit more.


Here are some common pitfalls for small organisations:

  • Overestimating use cases and rushing investments.

  • Overpaying upfront in a hurry to generate value.

  • Extrapolating early wins without accounting for diminishing returns.

  • Copying big players without matching their scale or needs.


So if they invest heavily in people, they risk ending up underwater on that investment. If they invest in tooling, they might pick solutions that, while valuable for bigger organisations, are not the best fit for smaller ones, leading to delays. Keep in mind that in a niche as specialised as MLOps, vendors tailor their solutions to players with many use cases and a highly specialised workforce.


This is the challenge I signed up for:

Support the smaller players making their first steps into ML and MLOps in the most efficient way possible.

The Ideal Candidate

To design MLServe.com I had to start from the ideal candidate. This would be a data scientist who:

  1. Is a high-performing generalist. To be a generalist and still stay out of the "jack of all trades" category, you have to be very mindful of the tools you learn and use. A generalist will not learn five flavours of scikit-learn for various cloud vendors or four plotting libraries. This individual has a strong preference for open-source tools like sklearn or xgboost that can solve 99% of problems.

  2. Has access to Jupyter-style notebooks (Hex being my favourite) with a couple of requirements:

    1. Enough resources to perform model training.

    2. Scheduled runs


The Solution

MLServe.com is an inference-first MLOps platform whose goal is to make MLOps as simple as possible, so that data scientists can use technologies they are already familiar with and be productive from minute one.

The product consists of:

  1. An API service that allows you to deploy ML models, get real-time or batch predictions, run A/B tests and monitor models (latency, drift, online performance). This service integrates with your main product / backend and serves predictions.

  2. A Python SDK (API wrapper) that lets the data scientist do all of the above in a more Pythonic way.

  3. A Streamlit app (SDK wrapper) that lets data scientists or business folks deploy, monitor and test models in an even more user-friendly package.

All of the above are protected with JWT authentication and end-to-end encryption, and the MLServe servers operate behind a Cloudflare Tunnel, so strict security practices are maintained throughout.
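
To make these pieces concrete, here is a minimal sketch of what a raw call to the API service might look like. The base URL, endpoint path and payload fields are illustrative assumptions for this post rather than the documented MLServe API; only the JWT bearer-token pattern comes from the description above.

import requests

# Hypothetical base URL and JWT -- placeholders, not the real MLServe endpoints.
BASE_URL = "https://api.mlserve.example.com"
TOKEN = "your-jwt-token"

headers = {"Authorization": f"Bearer {TOKEN}"}

# Illustrative request: ask a deployed model for a real-time prediction,
# optionally with SHAP explanations attached.
payload = {
    "model_name": "churn_classifier",
    "model_version": "1.0.0",
    "inputs": [[34, 2, 199.9], [51, 7, 12.5]],  # one row per prediction
    "explain": True,
}

resp = requests.post(f"{BASE_URL}/predict", json=payload, headers=headers, timeout=10)
resp.raise_for_status()
print(resp.json())  # predictions (and explanations, if requested)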

Here’s a closer look at what MLServe offers.


Use Cases

  • Deploy Models in Minutes (API, SDK, Streamlit UI)

  • Get Predictions Anywhere (real-time, batch, Redis feature store)

  • Monitor and Retrain with Confidence (latency, drift, SHAP, retraining triggers)

  • Experiment and Optimize (A/B testing, sticky bucketing, multivariate tests)


Let's go through them in more detail.


Model Deployment

This is what you need to deploy your ML model:

  1. Your model (this one is obvious).

  2. A model name and version, for easier tracking and management.

  3. A list of feature names. These are used to provide interpretable insights (e.g. data quality) on the input data sent at inference time, without the user having to supply the feature names again with every request.

  4. A background dataset (just a couple of hundred rows is needed), used to:

    1. Generate a SHAP explainer for explainable inference. This is critical for industries like healthcare or finance.

    2. Set a baseline for feature distributions that is later used to detect drift.

  5. Offline metrics. These can be used to track progress across your models, or even to build retraining rules based on the spread between offline and online performance.

  6. The task type, so that the platform can generate a range of task-specific online performance metrics based on inference feedback.


Running this method returns the endpoint, so that you can integrate predictions into your product.
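
As a rough illustration of the list above, a deployment call through the Python SDK could look like the sketch below. The mlserve package, the MLServeClient class and the argument names are assumptions made for readability, not the published interface; only the sklearn/xgboost parts are real libraries.

import pandas as pd
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

from mlserve import MLServeClient  # hypothetical SDK import; real names may differ

# Toy training data standing in for your real feature table.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train = pd.DataFrame(X, columns=["tenure", "n_orders", "avg_basket", "days_since_login"])

model = XGBClassifier(n_estimators=50).fit(X_train, y)

client = MLServeClient(api_key="your-jwt-token")

endpoint = client.deploy(
    model=model,
    model_name="churn_classifier",
    model_version="1.0.0",
    feature_names=list(X_train.columns),  # reused for data-quality insights at inference time
    background_data=X_train.sample(200),  # builds the SHAP explainer and the drift baseline
    offline_metrics={"roc_auc": 0.91},    # later compared against online performance
    task_type="binary_classification",    # determines which online metrics are generated
)
print(endpoint)  # the endpoint to integrate predictions into your product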


Inference

For inference you have quite a few options:

  • Option 1: Pass in your input features (and optionally feature names if you have changed columns for some reason), the model name / version and whether you would like SHAP explanations for your model.

  • Option 2: Use a single endpoint to get predictions, either from the live version or while running A/B tests on multiple versions. In this case you can optionally provide entity ids so that MLServe can use sticky bucketing (each entity is always served by the same model version), which is the standard in RCTs.

  • Option 3 (under development): If you have a feature store (at the moment only Redis is supported), you can send entity ids instead of the input features and MLServe will fetch the features in the backend and serve your predictions.
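
Here is a sketch of how the three options might look through the SDK; the method and argument names are my own assumptions, kept consistent with the deployment example above.

from mlserve import MLServeClient  # hypothetical SDK import

client = MLServeClient(api_key="your-jwt-token")

# Option 1: explicit features against a specific model version, with SHAP explanations.
result = client.predict(
    model_name="churn_classifier",
    model_version="1.0.0",
    inputs=[[12, 3, 45.0, 7]],
    feature_names=["tenure", "n_orders", "avg_basket", "days_since_login"],
    explain=True,
)

# Option 2: a single endpoint that serves the live version (or an A/B test);
# entity ids enable sticky bucketing so each entity always hits the same version.
result = client.predict_live(
    model_name="churn_classifier",
    inputs=[[12, 3, 45.0, 7]],
    entity_ids=["customer-123"],
)

# Option 3 (under development): send entity ids only and let MLServe fetch
# the features from the Redis feature store in the backend.
result = client.predict_from_feature_store(
    model_name="churn_classifier",
    entity_ids=["customer-123"],
)
print(result)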


Monitoring

When it comes to monitoring you have access to:

  • Inference performance metrics like throughput and latency (avg, p50, p95, p99), as well as per-element metrics (per prediction instead of per batch).

  • Data quality metrics on drift, outliers and missing values (e.g. PSI, KS, average values), along with statuses (warning, alert) based on industry standards.

  • Online model performance metrics. When you provide feedback (true labels, or rewards such as revenue gain) on prediction_ids you have already received, you get access to online performance metrics.

  • Online metrics by version: especially useful for spotting when you are hitting diminishing returns and resources might be better invested in other models.
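
Put together, a monitoring workflow through the SDK might look like the sketch below; again, the method names and the shape of the returned metrics are assumptions for illustration.

from mlserve import MLServeClient  # hypothetical SDK import

client = MLServeClient(api_key="your-jwt-token")

# Inference and data-quality metrics: latency percentiles, PSI/KS drift, outliers, missing values.
metrics = client.get_metrics(model_name="churn_classifier", model_version="1.0.0")
print(metrics["latency"]["p95"], metrics["drift"]["status"])

# Online performance: attach true labels (or rewards, e.g. revenue gain)
# to prediction_ids you have already received.
client.send_feedback(
    prediction_ids=["pred-001", "pred-002"],
    labels=[1, 0],
)

# Online metrics by version, to spot diminishing returns across retrains.
by_version = client.get_online_metrics(model_name="churn_classifier", group_by="version")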


Experimentation

You can run experiments (A/B or multivariate) on model versions. When you use the predict_weighted method, each prediction is routed to a model version based on the assigned weights (in this case there is no sticky bucketing, as entity ids are not provided).
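
The predict_weighted call itself could look roughly like this; the weight format (a version-to-traffic-share mapping) is an assumption on my part.

from mlserve import MLServeClient  # hypothetical SDK import

client = MLServeClient(api_key="your-jwt-token")

# Route traffic across versions by weight. With no entity ids there is no sticky
# bucketing, so each prediction is assigned to a version independently.
result = client.predict_weighted(
    model_name="churn_classifier",
    inputs=[[12, 3, 45.0, 7]],
    weights={"1.0.0": 0.8, "1.1.0": 0.2},  # assumed format: version -> traffic share
)
print(result)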


Retraining

If you have access to a scheduled notebook environment, you can also retrain new versions based on various criteria. You simply schedule your notebook to run (say, every week or month) and use the drift / model performance metrics as conditions for retraining. If the conditions are met, you update your model and the latest version automatically goes into production.
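
A scheduled notebook cell implementing this loop might look like the sketch below, continuing the hypothetical names (and the X_train / y toy data) from the deployment sketch above; the metric keys and thresholds are illustrative.

from xgboost import XGBClassifier

from mlserve import MLServeClient  # hypothetical SDK import

client = MLServeClient(api_key="your-jwt-token")

# Use drift status and the offline/online performance spread as retraining conditions.
metrics = client.get_metrics(model_name="churn_classifier", model_version="1.0.0")
needs_retrain = (
    metrics["drift"]["status"] == "alert"
    or metrics["online"]["roc_auc"] < metrics["offline"]["roc_auc"] - 0.05
)

if needs_retrain:
    # Retrain on fresh data (X_train / y here stand in for your updated dataset)
    # and deploy a new version, which is assumed to become the live one.
    new_model = XGBClassifier(n_estimators=50).fit(X_train, y)
    client.deploy(
        model=new_model,
        model_name="churn_classifier",
        model_version="1.1.0",
        feature_names=list(X_train.columns),
        background_data=X_train.sample(200),
        task_type="binary_classification",
    )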


Streamlit UI

If you don't want to run a single line of code, you can still get all of the above functionality through the Streamlit UI. The goal here is to stick to open-source options as much as possible, so that any data science professional feels at home with the product and can be productive from minute one.





What's Next

MLServe.com is in private beta, perfect for individual data scientists and small DS teams looking for production-ready MLOps without the enterprise overhead. Join the waitlist today or DM me on LinkedIn to get early access, and have a look at the demo notebooks here.

 
 
 


