Pre-Production Testing
Along with KServe, the following tools can be used for pre-production testing:
- Iter8: A Kubernetes release optimizer built for DevOps, MLOps, SRE, and data science teams.
- Grafana k6: An open-source tool and cloud service that makes deployment testing easy for developers and QA engineers.
Canary Deployment
Canary deployment is a progressive rollout strategy that splits traffic between the currently deployed version of an application and a new version, exposing the new version to a subset of users before rolling it out fully.
Example (using KServe)
KServe supports canary rollouts for inference services: a new revision of an InferenceService receives a configurable percentage of traffic while the previous revision serves the rest. The canary strategy can be rolled out in multiple steps, and a step that fails can be rolled back to the previous revision.
For an example taken from the KServe documentation, go to:
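As an illustrative sketch only (not the official example linked above), a canary update via the KServe Python SDK might look like the following; the service name, namespace, and storage URI are hypothetical:

```python
# Minimal sketch of a KServe canary rollout using the KServe Python SDK.
# The name, namespace, and storage URI are hypothetical placeholders.
from kubernetes import client
from kserve import (
    KServeClient,
    constants,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)

kserve_client = KServeClient()

# Update an existing InferenceService so the new model revision
# receives only 10% of the traffic; the previous revision keeps 90%.
isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_V1BETA1,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="kserve-test"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            canary_traffic_percent=10,
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://example-bucket/sklearn/iris/v2"
            ),
        )
    ),
)

# replace() pushes the updated spec; KServe keeps the previously
# rolled-out revision and splits traffic per canary_traffic_percent.
kserve_client.replace("sklearn-iris", isvc)
```

Once the canary revision proves healthy, it is promoted by raising canaryTrafficPercent to 100 (or removing the field) in a subsequent step.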
Example (using Iter8)
For an AI Platform example using Iter8 for canary deployment, go to:
Shadow/Mirror Deployment
Shadow deployment is a method of testing a candidate model for production in which production data runs through the model without the model actually returning predictions to the service or customers. This essentially simulates how the model would perform in the production environment without any user-facing risk.
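To make the mechanics concrete, here is a minimal, framework-agnostic sketch in Python; in practice the mirroring is usually done by the service mesh or by tools such as Iter8 rather than in application code, and both endpoint URLs below are hypothetical:

```python
# Minimal sketch of shadow (mirror) deployment logic.
# Both endpoint URLs are hypothetical placeholders; in a real cluster
# mirroring is typically handled by the service mesh, not the app.
import threading
import requests

PRIMARY_URL = "http://prod-model.example.com/v1/models/model:predict"
SHADOW_URL = "http://candidate-model.example.com/v1/models/model:predict"


def _mirror(payload: dict) -> None:
    """Send a copy of the request to the shadow model.

    The shadow response is only recorded for offline comparison and is
    never returned to the caller, so the candidate cannot affect users.
    """
    try:
        requests.post(SHADOW_URL, json=payload, timeout=5)
    except requests.RequestException:
        pass  # shadow failures must never impact production traffic


def predict(payload: dict) -> dict:
    # Fire-and-forget copy of the request to the shadow model.
    threading.Thread(target=_mirror, args=(payload,), daemon=True).start()
    # Only the production model's prediction is returned to the client.
    response = requests.post(PRIMARY_URL, json=payload, timeout=5)
    return response.json()
```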
Example (using Iter8)
For an example from the Iter8 documentation of shadow deployment for Kubernetes-based model services, go to:
For an AI Platform example, go to:
Load Testing
Before taking a deployment to production, it is important to test the endpoint and check how the API performs under various conditions.
The different types of tests are as follows (a minimal sketch follows the list):
- Smoke tests: validate that your script works and that the system performs adequately under minimal load.
- Average-load tests: assess how your system performs under expected normal conditions.
- Stress tests: assess how a system performs at its limits when load exceeds the expected average.
- Soak tests: assess the reliability and performance of your system over extended periods.
- Spike tests: validate the behavior and survival of your system in cases of sudden, short, and massive increases in activity.
- Breakpoint tests: gradually increase load to identify the capacity limits of the system.
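Tools such as k6 and Iter8 are the practical choice here, but the underlying idea can be sketched in a few lines of Python; the endpoint URL, payload, user count, and duration below are assumptions:

```python
# Bare-bones load-test sketch: send concurrent requests for a fixed
# duration and report latency percentiles and the error count.
# The endpoint URL and payload are hypothetical placeholders.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://sklearn-iris.kserve-test.example.com/v1/models/sklearn-iris:predict"
PAYLOAD = {"instances": [[6.8, 2.8, 4.8, 1.4]]}
VIRTUAL_USERS = 10   # concurrent "users"
DURATION_S = 60      # how long the test runs


def worker(deadline: float) -> tuple[list[float], int]:
    latencies, errors = [], 0
    while time.monotonic() < deadline:
        start = time.monotonic()
        try:
            r = requests.post(URL, json=PAYLOAD, timeout=10)
            r.raise_for_status()
            latencies.append(time.monotonic() - start)
        except requests.RequestException:
            errors += 1
    return latencies, errors


deadline = time.monotonic() + DURATION_S
with ThreadPoolExecutor(max_workers=VIRTUAL_USERS) as pool:
    results = list(pool.map(worker, [deadline] * VIRTUAL_USERS))

latencies = [lat for lats, _ in results for lat in lats]
errors = sum(e for _, e in results)
p95 = statistics.quantiles(latencies, n=100)[94] if len(latencies) >= 2 else float("nan")
print(f"requests={len(latencies)} errors={errors} p95={p95:.3f}s")
```

Varying VIRTUAL_USERS and DURATION_S turns the same skeleton into a smoke test (minimal load), an average-load test, a stress or spike test (load at or above the expected peak), or a soak test (extended duration).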
More information about these tests and their applicability can be found here:
Example (using Iter8)
For an AI Platform example using Iter8, go to:
Example (using k6)
For an AI Platform example of conducting load testing on the KServe inference service, go to:
A/B Testing
A/B testing enables you to compare two versions of a deployed ML model in a real-life setting, and select a winner based on a (business) reward metric.
You can learn about A/B testing from this article:
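As a rough illustration of the "select a winner" step (Iter8 automates this, as in the example below), a two-proportion z-test on a binary reward metric can be sketched in plain Python; all counts here are made up:

```python
# Minimal sketch of picking a winner in an A/B test using a
# two-proportion z-test on a binary (business) reward metric,
# e.g. "user accepted the recommendation". Counts are hypothetical.
import math

# Observed outcomes per model version (made-up numbers).
conversions_a, users_a = 120, 2000   # current model
conversions_b, users_b = 150, 2000   # candidate model

p_a = conversions_a / users_a
p_b = conversions_b / users_b
p_pool = (conversions_a + conversions_b) / (users_a + users_b)

# Standard two-proportion z statistic under the pooled null hypothesis.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / users_a + 1 / users_b))
z = (p_b - p_a) / se

# Two-sided p-value from the standard normal CDF.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"reward A={p_a:.3f}, reward B={p_b:.3f}, z={z:.2f}, p={p_value:.4f}")
if p_value < 0.05 and p_b > p_a:
    print("B wins: promote the candidate model")
else:
    print("no significant winner: keep collecting data or keep A")
```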
Example (using Iter8)
For an example taken from the Iter8 documentation, go to: