
Model management with MLflow

Model registry

What is the model registry in MLflow?

When using MLflow in Azure Machine Learning, the model registry is a central store for managing models across their lifecycle. It supports:

  • Versioning: Each model registered under the same name gets a new version.
  • Metadata tracking: Tags and properties can store details like the Git commit ID, dataset version, and training config.
  • Lineage and reproducibility: Models are linked to the code, environment, and data that produced them—enabling full traceability.

The model registry is a key part of MLOps in Azure ML, tracking models much as Git tracks code. It ensures models can be reproduced, compared, and deployed reliably across environments.
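As a minimal sketch of how this looks in practice, the snippet below trains a scikit-learn model, logs metadata, and registers it through MLflow. It assumes the MLflow tracking URI already points at an Azure ML workspace; the model name, tag values, and parameters are illustrative placeholders.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumes the MLflow tracking URI is already set to the Azure ML workspace's
# tracking URI (e.g. via mlflow.set_tracking_uri(...)).
X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)

    # Metadata the registry tracks alongside each model version.
    mlflow.log_param("max_iter", 200)        # training config
    mlflow.set_tag("git_commit", "abc1234")  # hypothetical commit ID
    mlflow.set_tag("dataset_version", "v3")  # hypothetical dataset tag

    # Passing registered_model_name registers the model; calling this again
    # under the same name creates version 2, 3, and so on.
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        registered_model_name="iris-classifier",  # hypothetical registry name
    )
```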

MLflow model registry in Azure ML

What should an Azure ML model registry store?

A robust Azure ML model registry should capture all information needed for model lineage, reproducibility, and deployment (a registration sketch follows this list):

  • Model Artifacts:
    Store the trained model files (for example, Pickle, ONNX, MLflow, TensorFlow formats) for easy deployment and reuse.

  • Metadata & Lineage:
    Track model name, version, registration time, creator, and tags. Include references to training code (e.g., Git commit), environment (Docker/Conda), and training data location or snapshot.

  • Parameters & Metrics:
    Store input parameters (hyperparameters) and evaluation metrics (accuracy, F1, etc.) for each model version to enable comparison and monitoring.

  • Environment Details:
    Reference or store the software environment (Dockerfile, Conda YAML) used for training and inference to ensure reproducibility.

  • Data References:
    Link to the original training data or a snapshot in Azure Blob Storage or other supported datastores.
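To make this concrete, here is a hedged sketch of registering a model with the Azure ML Python SDK v2 (azure.ai.ml) while attaching lineage references as tags. The workspace coordinates, job path, model name, and tag values are all placeholders.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Model
from azure.identity import DefaultAzureCredential

# Placeholder workspace coordinates.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

model = Model(
    # Model artifacts produced by a training job (placeholder job name).
    path="azureml://jobs/<job-name>/outputs/artifacts/paths/model/",
    type=AssetTypes.MLFLOW_MODEL,
    name="credit-default-classifier",  # hypothetical model name
    description="Classifier trained on the July data snapshot.",
    tags={
        # Lineage references stored as metadata (all values are placeholders).
        "git_commit": "abc1234",
        "training_data": "azureml:credit-data:3",
        "conda_env": "environments/train-env.yml",
    },
)

registered = ml_client.models.create_or_update(model)
print(registered.name, registered.version)
```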

Model packaging

What is model packaging and serialization?

  • Serialization is the process of converting a trained ML model into a file format (like Pickle, ONNX, or TensorFlow SavedModel) so it can be saved, transferred, and loaded elsewhere (see the sketch after this list).
  • Model packaging in Azure ML goes further: it bundles the serialized model plus all its dependencies (such as Python packages, environment files, and configuration) into a single, portable unit. This ensures the model can be reliably deployed and run in different environments.
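As a minimal illustration of serialization (not Azure ML packaging itself), the sketch below pickles a scikit-learn model and reloads it; the model choice and file name are arbitrary examples.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Serialize: write the in-memory model object to a file.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Deserialize: load the file back into an equivalent model object
# (in practice, on another machine or in a deployment environment).
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(X[:3]))
```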

Serialization formats

Here are some popular serialization formats:

Sr. No. | Format | File Extension | Framework | Quantization
1 | Pickle | .pkl | scikit-learn | No
2 | HDF5 | .h5 | Keras | Yes
3 | ONNX | .onnx | TensorFlow, PyTorch, scikit-learn, Caffe, Keras, MXNet, iOS Core ML | Yes
4 | PMML | .pmml | scikit-learn | No
5 | TorchScript | .pt | PyTorch | Yes
6 | Apple ML Model | .mlmodel | iOS Core ML | Yes
7 | MLeap | .zip | PySpark | No
8 | Protobuf | .pb | TensorFlow | Yes

Addressing Interoperability

Most of the formats above are framework-specific, so a model serialized by one framework cannot be loaded by another. ONNX addresses this by providing a common representation: models can be exported from frameworks like scikit-learn, PyTorch, and TensorFlow and run with a shared runtime. This supports framework-independent deployment, federated learning, and batch inference.
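For example, a scikit-learn model can be exported to ONNX and then served with ONNX Runtime, with no scikit-learn dependency at inference time. This is a sketch assuming the skl2onnx and onnxruntime packages are installed; the model and file names are illustrative.

```python
# Requires: pip install scikit-learn skl2onnx onnxruntime
import numpy as np
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Convert the scikit-learn model to ONNX, declaring the input name and shape.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 4]))]
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Run the exported model with ONNX Runtime; scikit-learn is no longer needed
# at inference time.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
labels = session.run(None, {"input": X[:3].astype(np.float32)})[0]
print(labels)
```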