MLOps with MLflow on Kraken CI

Apr 29, 2022

Besides building, testing and deploying, Kraken CI is also a pretty
nice tool for building an MLOps pipeline. This article shows how to leverage Kraken CI to build a CI workflow for machine learning using
MLflow.

Kraken CI is a new Continuous Integration tool. It is a modern, open-source, on-premise CI/CD system that is highly scalable and focused on testing. It is licensed under the Apache 2.0 license, and its source code is available on the Kraken CI GitHub page.

This article is the eighth installment in a series of articles about Kraken CI. The previous parts are:

Part 1, Kraken CI, New Kid on the CI block
Part 2, Your First Workflow in Kraken CI
Part 3, Autoscaling CI with Kraken CI
Part 4, Webhooks in Kraken CI for GitHub, GitLab and Gitea
Part 5, Autoscaling CI on Kubernetes in Kraken CI
Part 6, Tests Basics in Kraken CI
Part 7, What is Wrong with Your Testing?

MLOps and MLflow

MLOps is a set of practices that aims to build and maintain machine
learning models in production reliably and efficiently. One of the
prominent tools in this area is MLflow.

MLflow is an open-source platform for managing the end-to-end machine
learning lifecycle. It tackles four primary functions:

Tracking experiments to record and compare parameters and results
(MLflow Tracking).
Packaging ML code in a reusable, reproducible form to share
with other data scientists or transfer to production (MLflow
Projects).
Managing and deploying models from various ML libraries to a
variety of model serving and inference platforms (MLflow Models).
Providing a central model store to collaboratively manage the entire
lifecycle of an MLflow Model, including model versioning, stage
transitions, and annotations (MLflow Model Registry).

MLflow in Kraken CI

In the following sections, I will describe how to prepare a workflow
in Kraken CI to train an ML model. This is an LSTM model that will
predict stock prices based on historical data.

The workflow will be:

pulling live stock data and preparing it for training (source 1, source 2)
performing the training (source 3)
storing model metrics in Kraken CI for charting
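
To make the data preparation step more concrete, here is a minimal sketch of how a price series could be turned into training pairs for a sequence model such as an LSTM. It is not taken from the example project: the function names make_windows and min_max_scale, the window length and the sample prices are all assumptions made for illustration, and the real project may prepare its data differently.

```python
from typing import List, Tuple


def min_max_scale(prices: List[float]) -> List[float]:
    """Scale prices into the range [0, 1]; a constant series maps to zeros."""
    low, high = min(prices), max(prices)
    span = high - low
    if span == 0:
        return [0.0 for _ in prices]
    return [(p - low) / span for p in prices]


def make_windows(
    prices: List[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """Split a series into (input window, next value) training pairs."""
    if window < 1 or window >= len(prices):
        raise ValueError("window must be at least 1 and shorter than the series")
    inputs, targets = [], []
    for start in range(len(prices) - window):
        inputs.append(prices[start:start + window])
        targets.append(prices[start + window])
    return inputs, targets


# Made-up closing prices, used only for illustration.
closing = [10.0, 12.0, 11.0, 13.0, 14.0]
x, y = make_windows(min_max_scale(closing), window=3)
```

Each entry of x holds three consecutive scaled prices and the matching entry of y holds the price that follows them, which is the shape of input a next-step predictor is usually trained on.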

The MLflow project is described in MLproject.

Workflow Definition

The whole Kraken CI workflow is defined here.

There are 3 steps:

Check out the mlflow example project sources
Run the mlflow project, i.e. download data, prepare it, run the
training and, at the end, store metrics about the trained model in
metrics.json
Upload the collected metrics together with hyperparameters from
params.json to the Kraken server

The last step allows for charting accuracy and RMS of the model over
builds.
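
As a rough illustration of what the training step might leave behind for the upload step, the sketch below computes a root-mean-square error (one possible reading of "RMS" here) and writes metrics.json and params.json. Only the two file names come from the workflow; the key names, the hyperparameter values, the sample predictions and the helper functions rmse and write_json are assumptions, and the layout Kraken CI actually expects is not shown.

```python
import json
import math
import os
import tempfile
from typing import Dict, List


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def write_json(path: str, values: Dict[str, float]) -> None:
    """Write a flat dictionary of numbers as a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(values, handle, indent=2, sort_keys=True)


# Hypothetical hyperparameters and predictions, for illustration only.
params = {"epochs": 10, "window": 3}
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 2.0]
metrics = {"rmse": rmse(actual, predicted)}

# A temporary directory stands in for the job's working directory.
out_dir = tempfile.mkdtemp()
write_json(os.path.join(out_dir, "params.json"), params)
write_json(os.path.join(out_dir, "metrics.json"), metrics)
```

Keeping both files as flat name-to-number mappings makes each build's values easy to compare with the previous ones, which is what charting over builds relies on.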

The workflow definition uses a pre-prepared image with mlflow.
It is available on Docker Hub: krakenci/mlflow.

The whole workflow example is available in Kraken lab:
https://lab.kraken.ci/branches/32/ci. Check the step definitions on
the branch management page.

Execution and Monitoring

Besides the workflow definition, the Kraken UI also shows the collected
data and the charts drawn from it; see the charts tab at
https://lab.kraken.ci/test_case_results/595950.

The right chart shows the value of loss collected over time.

Summary

This article shows how Kraken CI can be used to build an MLOps
pipeline. The pipeline downloads raw data, prepares the data for
training and then executes the training. The trained model
metrics are collected and charted in Kraken UI at the end.