Pre-Production Testing
Along with KServe, the following tools can be used for pre-production testing:
- Iter8: A Kubernetes release optimizer built for DevOps, MLOps, SRE, and data science teams.
- Grafana k6: An open-source tool and cloud service that makes deployment testing easy for developers and QA engineers.
Canary Deployment
Canary deployment is a progressive rollout strategy that splits traffic between the currently deployed version of an application and a new version, exposing the new version to a subset of users before rolling it out fully.
Example (using KServe)
KServe supports canary rollouts for inference services: a new revision of an InferenceService receives a configurable percentage of traffic while the previous revision serves the rest. The canary strategy can be rolled out in multiple steps, and a step that fails can be rolled back to the previous revision.
For an example taken from the KServe documentation, go to:
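As an illustrative sketch only (not the official example linked above), a canary update via the KServe Python SDK might look like the following; the service name, namespace, and storage URI are hypothetical:

```python
# Minimal sketch of a KServe canary rollout using the KServe Python SDK.
# The name, namespace, and storage URI are hypothetical placeholders.
from kubernetes import client
from kserve import (
    KServeClient,
    constants,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)

kserve_client = KServeClient()

# Update an existing InferenceService so the new model revision
# receives only 10% of the traffic; the previous revision keeps 90%.
isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_V1BETA1,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="kserve-test"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            canary_traffic_percent=10,
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://example-bucket/sklearn/iris/v2"
            ),
        )
    ),
)

# replace() pushes the updated spec; KServe keeps the previously
# rolled-out revision and splits traffic per canary_traffic_percent.
kserve_client.replace("sklearn-iris", isvc)
```

Once the canary revision proves healthy, it is promoted by raising canaryTrafficPercent to 100 (or removing the field) in a subsequent step.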
Example (using Iter8)
For an AI Platform example using Iter8 for canary deployment, go to:
Shadow/Mirror Deployment
Shadow deployment is a method of testing a candidate model for production in which production data runs through the model without the model actually returning predictions to the service or customers. This essentially simulates how the model would perform in the production environment without any user-facing risk.
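To make the mechanics concrete, here is a minimal, framework-agnostic sketch in Python; in practice the mirroring is usually done by the service mesh or by tools such as Iter8 rather than in application code, and both endpoint URLs below are hypothetical:

```python
# Minimal sketch of shadow (mirror) deployment logic.
# Both endpoint URLs are hypothetical placeholders; in a real cluster
# mirroring is typically handled by the service mesh, not the app.
import threading
import requests

PRIMARY_URL = "http://prod-model.example.com/v1/models/model:predict"
SHADOW_URL = "http://candidate-model.example.com/v1/models/model:predict"


def _mirror(payload: dict) -> None:
    """Send a copy of the request to the shadow model.

    The shadow response is only recorded for offline comparison and is
    never returned to the caller, so the candidate cannot affect users.
    """
    try:
        requests.post(SHADOW_URL, json=payload, timeout=5)
    except requests.RequestException:
        pass  # shadow failures must never impact production traffic


def predict(payload: dict) -> dict:
    # Fire-and-forget copy of the request to the shadow model.
    threading.Thread(target=_mirror, args=(payload,), daemon=True).start()
    # Only the production model's prediction is returned to the client.
    response = requests.post(PRIMARY_URL, json=payload, timeout=5)
    return response.json()
```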
Example (using Iter8)
For an example from the Iter8 documentation of shadow deployment for Kubernetes-based model services, go to:
For an AI Platform example, go to:
Load Testing
Before taking a deployment to production, it is important to test the endpoint and check how the API performs under various conditions.
The different types of tests are as follows (a minimal sketch follows the list):
- Smoke tests: validate that your script works and that the system performs adequately under minimal load.
- Average-load tests: assess how your system performs under expected normal conditions.
- Stress tests: assess how a system performs at its limits when load exceeds the expected average.
- Soak tests: assess the reliability and performance of your system over extended periods.
- Spike tests: validate the behavior and survival of your system in cases of sudden, short, and massive increases in activity.
- Breakpoint tests: gradually increase load to identify the capacity limits of the system.
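Tools such as k6 and Iter8 are the practical choice here, but the underlying idea can be sketched in a few lines of Python; the endpoint URL, payload, user count, and duration below are assumptions:

```python
# Bare-bones load-test sketch: send concurrent requests for a fixed
# duration and report latency percentiles and the error count.
# The endpoint URL and payload are hypothetical placeholders.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://sklearn-iris.kserve-test.example.com/v1/models/sklearn-iris:predict"
PAYLOAD = {"instances": [[6.8, 2.8, 4.8, 1.4]]}
VIRTUAL_USERS = 10   # concurrent "users"
DURATION_S = 60      # how long the test runs


def worker(deadline: float) -> tuple[list[float], int]:
    latencies, errors = [], 0
    while time.monotonic() < deadline:
        start = time.monotonic()
        try:
            r = requests.post(URL, json=PAYLOAD, timeout=10)
            r.raise_for_status()
            latencies.append(time.monotonic() - start)
        except requests.RequestException:
            errors += 1
    return latencies, errors


deadline = time.monotonic() + DURATION_S
with ThreadPoolExecutor(max_workers=VIRTUAL_USERS) as pool:
    results = list(pool.map(worker, [deadline] * VIRTUAL_USERS))

latencies = [lat for lats, _ in results for lat in lats]
errors = sum(e for _, e in results)
p95 = statistics.quantiles(latencies, n=100)[94] if len(latencies) >= 2 else float("nan")
print(f"requests={len(latencies)} errors={errors} p95={p95:.3f}s")
```

Varying VIRTUAL_USERS and DURATION_S turns the same skeleton into a smoke test (minimal load), an average-load test, a stress or spike test (load at or above the expected peak), or a soak test (extended duration).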
More information about these tests and their applicability can be found here:
Example (using Iter8)
For an AI Platform example using Iter8, go to:
Example (using k6)
For an AI Platform example of conducting load testing on the KServe inference service, go to:
A/B Testing
A/B testing enables you to compare two versions of a deployed ML model in a real-life setting, and select a winner based on a (business) reward metric.
You can learn about A/B testing from this article:
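As a rough illustration of the "select a winner" step (Iter8 automates this, as in the example below), a two-proportion z-test on a binary reward metric can be sketched in plain Python; all counts here are made up:

```python
# Minimal sketch of picking a winner in an A/B test using a
# two-proportion z-test on a binary (business) reward metric,
# e.g. "user accepted the recommendation". Counts are hypothetical.
import math

# Observed outcomes per model version (made-up numbers).
conversions_a, users_a = 120, 2000   # current model
conversions_b, users_b = 150, 2000   # candidate model

p_a = conversions_a / users_a
p_b = conversions_b / users_b
p_pool = (conversions_a + conversions_b) / (users_a + users_b)

# Standard two-proportion z statistic under the pooled null hypothesis.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / users_a + 1 / users_b))
z = (p_b - p_a) / se

# Two-sided p-value from the standard normal CDF.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"reward A={p_a:.3f}, reward B={p_b:.3f}, z={z:.2f}, p={p_value:.4f}")
if p_value < 0.05 and p_b > p_a:
    print("B wins: promote the candidate model")
else:
    print("no significant winner: keep collecting data or keep A")
```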
Example (using Iter8)
For an example taken from the Iter8 documentation, go to: