Get your MLOps (Level 1) platform started and going fast.

jmeisele, last update: Dec 31, 2022

MLOps

Cloud agnostic tech stack for starting an MLOps platform (Level 1)

"We'll build a pipeline - after we deploy the model."


Model drift will hit when it's least convenient for you

To run: Make sure Docker is running and you have Docker Compose installed.

  1. Clone the project

    git clone https://github.com/jmeisele/ml-ops.git
  2. Change directories into the repo

    cd ml-ops
  3. Run database migrations and create the first Airflow user account.

    docker-compose up airflow-init
  4. Pull and build our images, then launch with Docker Compose

    docker-compose pull && docker-compose up
  5. Open a browser and log in to MinIO

    user: minioadmin

    password: minioadmin

    Create a bucket called mlflow


  6. Open a browser and log in to Grafana

    user: admin

    password: admin


    Both Prometheus and InfluxDB data sources have already been provisioned, along with an MLOps Demo Dashboard and a Notification Channel.

  7. Add the notification channel to some panels

  8. Start the send_data.py script which sends a POST request every 0.1 seconds

  9. Open a browser and turn on the Airflow DAG used to retrain our ML model

    user: airflow

    password: airflow


  10. Lower the alarm threshold to see the Airflow DAG pipeline get triggered

  11. Check MLflow after the Airflow DAG has run to see the model artifacts stored using MinIO as the object storage layer.

  12. (Optional) Send a POST request to our model service API endpoint

    curl -v -H "Content-Type: application/json" -X POST -d \
    '{
        "median_income_in_block": 8.3252,
        "median_house_age_in_block": 41,
        "average_rooms": 6,
        "average_bedrooms": 1,
        "population_per_block": 322,
        "average_house_occupancy": 2.55,
        "block_latitude": 37.88,
        "block_longitude": -122.23
    }' \
    http://localhost/model/predict
  13. (Optional) If you are so bold, you can also simulate production traffic using Locust. Keep in mind that you already have a lot of services running on your local machine; you would never deploy a production ML API on your local machine to handle production traffic.
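
The send_data.py step and the optional curl request above both come down to posting a JSON feature payload to the model endpoint. The following is a minimal Python sketch of that idea, not the repo's actual send_data.py: the function names (`build_payload`, `post_json`, `send_repeatedly`) are hypothetical, the payload is copied from the curl example, and the real script may well vary the feature values between requests.

```python
import json
import time
import urllib.request

# URL taken from the curl example; adjust if your nginx port differs.
PREDICT_URL = "http://localhost/model/predict"


def build_payload():
    """Return the example feature payload used in the curl command."""
    return {
        "median_income_in_block": 8.3252,
        "median_house_age_in_block": 41,
        "average_rooms": 6,
        "average_bedrooms": 1,
        "population_per_block": 322,
        "average_house_occupancy": 2.55,
        "block_latitude": 37.88,
        "block_longitude": -122.23,
    }


def post_json(url, payload, timeout=5.0):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def send_repeatedly(count, interval=0.1, send=post_json, sleep=time.sleep):
    """Send `count` requests, pausing `interval` seconds after each one."""
    statuses = []
    for _ in range(count):
        statuses.append(send(PREDICT_URL, build_payload()))
        sleep(interval)
    return statuses
```

With the stack running, `send_repeatedly(count=10)` would send ten requests roughly 0.1 seconds apart. The `send` and `sleep` parameters exist only so the loop can be exercised without a live service.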

Level 1 Workflow & Platform Architecture

[Figure: MLOps Level 1 workflow and platform architecture]

Model Serving Architecture

[Figure: API worker architecture]

Services

  • nginx: Load Balancer
  • python-model-service1: FastAPI Machine Learning API 1
  • python-model-service2: FastAPI Machine Learning API 2
  • postgresql: RDBMS
  • rabbitmq: Message Queue
  • rabbitmq workers: Workers listening to RabbitMQ
  • locust: Load testing and simulated production traffic
  • prometheus: Metrics scraping
  • minio: Object storage
  • mlflow: Machine Learning Experiment Management
  • influxdb: Time Series Database
  • chronograf: Admin & WebUI for InfluxDB
  • grafana: Performance Monitoring
  • redis: Cache
  • airflow: Workflow Orchestrator
  • bridge server: Receives webhooks from Grafana and translates them into Airflow REST API calls
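
To make the bridge server's role more concrete, here is a rough Python sketch of the translation it performs, under several assumptions. The Airflow address and DAG id are hypothetical placeholders, the webhook fields (`state`, `ruleName`, `message`) follow Grafana's legacy alert webhook format as I understand it, and the target is Airflow 2's stable REST API endpoint for triggering a DAG run. The repo's actual bridge may be structured quite differently.

```python
import json

# Hypothetical values: the real Airflow address and DAG id are defined in the repo.
AIRFLOW_BASE_URL = "http://airflow-webserver:8080"
DAG_ID = "retrain_model"


def alert_to_dag_run(alert, base_url=AIRFLOW_BASE_URL, dag_id=DAG_ID):
    """Map a Grafana webhook payload to an Airflow "trigger DAG run" request.

    Returns a dict describing the HTTP request, or None when the alert
    is not in the "alerting" state (for example "ok" or "no_data").
    """
    if alert.get("state") != "alerting":
        return None

    # Pass some alert context along so the DAG run can log why it was started.
    body = {
        "conf": {
            "rule_name": alert.get("ruleName", ""),
            "message": alert.get("message", ""),
        }
    }
    return {
        "method": "POST",
        "url": f"{base_url}/api/v1/dags/{dag_id}/dagRuns",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

A real bridge would also need to authenticate against Airflow (for example with the airflow user created during setup) and actually send the request. Keeping the mapping as a pure function makes it easy to test without either service running.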

Gotchas:

Postgres:

Warning: scripts in /docker-entrypoint-initdb.d are only run if you start the container with a data directory that is empty; any pre-existing database will be left untouched on container startup.

Contributors

Thanks goes to these incredible people:
