Kubeflow & MLflow
This tutorial shows you how to use Boxkite in the context of a Kubeflow cluster with MLflow.
Launch the Test Drive
Note: the test drive doesn't work in Safari yet, so please use Chrome or Firefox for now. It also won't work in Private/Incognito windows.
Use the following test drive to launch a temporary Kubernetes cluster with the tutorial running in it:
At busy times, you may need to wait a few minutes for a test drive environment to become available.
Note that the environment will shut down automatically 1 hour after you start using it.
If you get a black screen on "Booting VM", please be patient - it's loading. Failing that, scroll down to the bottom of this page to see a video of the demo.
Start Kubeflow Notebook Server
Click the "Kubeflow" button inside the test drive frame, and you should see a "CoreOS" login screen (this is the default Kubeflow "dex" login screen). You may want to arrange this tutorial window side by side with the Kubeflow one so that you can easily follow along.
Log into Kubeflow with:
Click the "burger bar" (three lines) if necessary, and navigate to Notebook Servers.
Click "+ New Server".
Change the following settings from the defaults:
- Name: Name the notebook server anything you like, such as
- Image: Tick the "Custom image" checkbox and enter:
This preinstalls the required dependencies and makes the demo notebook available.
- Workspace Volume: Tick the "Don't use Persistent Storage for User's home" box. Then click "dismiss" on the warning that pops up. This is so that the demo notebook shows up in your home directory.
- Configurations: Click "Configurations" and then select "MLflow". This will set up the notebook environment so that it can talk to MLflow automatically.
Now click the blue "Launch" button at the bottom of the screen. The notebook server may take a few moments to start up.
Run demo notebook
Once the notebook server has started, click the "Connect" button.
Open the demo.ipynb notebook and click the "play" icon for each of the cells in turn.
This demonstrates training a model, recording the model and its training data distribution (as a histogram) to MLflow, then deploying the model and running a load test against it.
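To make the idea of "recording the training data distribution as a histogram" concrete, here is a minimal sketch in plain Python. It is not Boxkite's implementation or API: the function name `histogram`, the equal-width binning, and the sample values are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], bins: int = 5) -> Tuple[List[float], List[int]]:
    """Return (bin_edges, counts) using equal-width bins over [min, max].

    Hypothetical helper for illustration only; not part of Boxkite or MLflow.
    """
    if not values:
        raise ValueError("values must not be empty")
    if bins < 1:
        raise ValueError("bins must be at least 1")

    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0
    edges = [lo + i * width for i in range(bins + 1)]

    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


# Example: summarise one (made-up) training feature.
training_feature = [1.0, 2.0, 2.5, 3.0, 9.0]
edges, counts = histogram(training_feature, bins=4)
```

A summary like this is small enough to store next to the model, which is what makes it practical to compare against production traffic later.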
Inspect the model in MLflow
Click the "MLflow" button in the test drive interface above. Observe that the model has been recorded in the MLflow model registry along with the histogram.
This lets you maintain a central "model registry" that records which models you've trained along with their training distributions, for improved collaboration and governance in your team.
Click the "Grafana" button in the test drive interface above.
Log into Grafana with:
Click on the Dashboards icon on the left (four boxes).
Then click Manage -> MLOps -> Model Metrics.
Observe that the load test you started in the demo notebook is visible in the Grafana dashboard.
This lets you monitor how the model's input data and predictions drift from the data it was trained on.
Note that Grafana here aggregates the statistics over the three model servers you deployed from the notebook, so monitoring works in high-availability (HA) mode!
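The two ideas above, aggregating statistics across model servers and comparing them with the training distribution, can be sketched in plain Python. This is not the dashboard's actual query, nor the metric Boxkite uses: the population stability index (PSI) is swapped in here as one common drift measure, and the function names and bucket counts are hypothetical.

```python
import math
from typing import List, Sequence


def aggregate(per_server_counts: Sequence[Sequence[int]]) -> List[int]:
    """Sum histogram bucket counts bucket by bucket across servers."""
    return [sum(bucket) for bucket in zip(*per_server_counts)]


def psi(expected: Sequence[int], actual: Sequence[int], eps: float = 1e-6) -> float:
    """Population stability index between two histograms with identical bins.

    0.0 means the bucket proportions match; larger values mean more drift.
    """
    if len(expected) != len(actual):
        raise ValueError("histograms must have the same number of buckets")
    e_total, a_total = sum(expected), sum(actual)
    if e_total == 0 or a_total == 0:
        raise ValueError("histograms must not be empty")

    score = 0.0
    for e, a in zip(expected, actual):
        # Floor the proportions at eps so empty buckets do not break the log.
        p = max(e / e_total, eps)
        q = max(a / a_total, eps)
        score += (q - p) * math.log(q / p)
    return score


# Made-up bucket counts: one training histogram, three model servers.
training = [30, 50, 20]
servers = [[5, 10, 5], [6, 9, 5], [4, 11, 5]]
drift = psi(training, aggregate(servers))
```

Summing counts before comparing means the drift score reflects all traffic, regardless of which replica served a given request.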
Notes for advanced users
- You can also use the SSH tab above to poke around the cluster with
- You can also view the Terraform used for the tutorial environment and use it to replicate the environment yourself: Terraform for MLOps stack.
This video shows the above tutorial in action.