MLOps

Cloud agnostic tech stack for starting an MLOps platform (Level 1)

"We'll build a pipeline - after we deploy the model."

Model drift will hit when it's least convenient for you

To run: make sure Docker is running and Docker Compose is installed.

Clone the project

```
git clone https://github.com/jmeisele/ml-ops.git
```

Change directories into the repo
```
cd ml-ops
```
Run database migrations and create the first Airflow user account.
```
docker-compose up airflow-init
```
Build our images and launch with docker compose
```
docker-compose pull && docker-compose up
```
Open a browser and log in to MinIO

user: minioadmin

password: minioadmin

Create a bucket called mlflow

Open a browser and log in to Grafana

user: admin

password: admin

Both Prometheus and InfluxDB data sources have already been provisioned, along with an MLOps Demo Dashboard and a Notification Channel.

Add the notification channel to some panels

Start the send_data.py script, which sends a POST request every 0.1 seconds

Open a browser and turn on the Airflow DAG used to retrain our ML model

user: airflow

password: airflow
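
For orientation, here is a minimal sketch of what a traffic script like send_data.py might do: POST a feature payload to the model service at a fixed 0.1 second interval. It is not the repo's actual script. The function names, the jitter applied to the features, and the `seed` parameter are assumptions made for illustration; the URL and baseline values are taken from the curl example in this README.

```python
import json
import random
import time
import urllib.request
from typing import Callable, Dict, Union

Number = Union[int, float]

# Endpoint and baseline features copied from the curl example in this README.
PREDICT_URL = "http://localhost/model/predict"
BASE_FEATURES: Dict[str, Number] = {
    "median_income_in_block": 8.3252,
    "median_house_age_in_block": 41,
    "average_rooms": 6,
    "average_bedrooms": 1,
    "population_per_block": 322,
    "average_house_occupancy": 2.55,
    "block_latitude": 37.88,
    "block_longitude": -122.23,
}


def jittered_features(rng: random.Random, scale: float = 0.05) -> Dict[str, Number]:
    """Return the baseline features, each scaled by a random factor within +/- scale."""
    jittered: Dict[str, Number] = {}
    for name, value in BASE_FEATURES.items():
        noisy = value * (1 + rng.uniform(-scale, scale))
        # Keep integer-valued fields as integers in case the API schema requires them.
        jittered[name] = round(noisy) if isinstance(value, int) else noisy
    return jittered


def post_features(features: Dict[str, Number], url: str = PREDICT_URL) -> int:
    """POST one payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def run_at_interval(
    send: Callable[[Dict[str, Number]], object],
    interval: float = 0.1,
    count: int = 10,
    seed: int = 0,
) -> int:
    """Call send(payload) count times, pacing calls interval seconds apart."""
    rng = random.Random(seed)
    next_tick = time.monotonic()
    sent = 0
    for _ in range(count):
        send(jittered_features(rng))
        sent += 1
        # Schedule against a monotonic clock so slow requests do not accumulate drift.
        next_tick += interval
        time.sleep(max(0.0, next_tick - time.monotonic()))
    return sent
```

With the stack running, `run_at_interval(post_features, interval=0.1, count=100)` would send one hundred requests at roughly ten per second.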

Lower the alarm threshold to see the Airflow DAG pipeline get triggered
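
The alert-to-retrain hop can be pictured with the sketch below: a Grafana notification channel posts a webhook, and a small bridge turns it into a request that triggers a DAG run. This is an illustration under assumptions, not the repo's bridge server. It assumes Grafana's legacy alert webhook fields (`state`, `ruleName`, `message`) and the Airflow 2 stable REST API route `/api/v1/dags/{dag_id}/dagRuns`; the DAG id `retrain_model` and the Airflow address are hypothetical.

```python
import json
from typing import Optional, Tuple

AIRFLOW_BASE_URL = "http://localhost:8080"  # assumption: Airflow webserver address
DAG_ID = "retrain_model"  # hypothetical DAG id, not taken from the repo


def webhook_to_dag_run(
    alert: dict,
    dag_id: str = DAG_ID,
    base_url: str = AIRFLOW_BASE_URL,
) -> Optional[Tuple[str, bytes]]:
    """Translate a Grafana alert webhook into (url, json_body) for Airflow.

    Returns None when the alert is not firing, so recoveries ("ok")
    do not trigger a retraining run.
    """
    if alert.get("state") != "alerting":
        return None
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    # Pass alert context along as the DAG run's conf.
    body = {
        "conf": {
            "rule_name": alert.get("ruleName"),
            "message": alert.get("message"),
        }
    }
    return url, json.dumps(body).encode("utf-8")
```

A real bridge would then POST the body to that URL with a JSON content type and Airflow credentials. The sketch stops at building the request so the translation step stays easy to inspect.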

Check MLflow after the Airflow DAG has run to see the model artifacts stored using MinIO as the object storage layer.
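
As a rough sketch of how a client is typically pointed at MinIO for MLflow artifacts: MLflow reads `MLFLOW_S3_ENDPOINT_URL`, and the underlying S3 client reads the standard AWS credential variables. The endpoint address below is an assumption (MinIO's usual port), the credentials are the MinIO login from this README, and the helper name is hypothetical. The repo's own configuration may differ.

```python
import os
from typing import Dict


def mlflow_minio_env(
    endpoint_url: str = "http://localhost:9000",  # assumption: MinIO API address
    access_key: str = "minioadmin",
    secret_key: str = "minioadmin",
) -> Dict[str, str]:
    """Environment variables that direct MLflow's S3 artifact store at MinIO."""
    return {
        "MLFLOW_S3_ENDPOINT_URL": endpoint_url,
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }


# Apply the settings to the current process before any MLflow logging calls.
os.environ.update(mlflow_minio_env())
```

With these set, an artifact location such as `s3://mlflow` would resolve to the bucket created in MinIO earlier rather than to AWS S3.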

(Optional) Send a POST request to our model service API endpoint

```
curl -v -H "Content-Type: application/json" -X POST -d \
'{
    "median_income_in_block": 8.3252,
    "median_house_age_in_block": 41,
    "average_rooms": 6,
    "average_bedrooms": 1,
    "population_per_block": 322,
    "average_house_occupancy": 2.55,
    "block_latitude": 37.88,
    "block_longitude": -122.23
}' \
http://localhost/model/predict
```

(Optional) If you are so bold, you can also simulate production traffic using Locust. Keep in mind that you already have a lot of services running on your local machine, and you would never deploy a production ML API on your local machine to handle production traffic.

Level 1 Workflow & Platform Architecture

Model Serving Architecture

Services

nginx: Load Balancer
python-model-service1: FastAPI Machine Learning API 1
python-model-service2: FastAPI Machine Learning API 2
postgresql: RDBMS
rabbitmq: Message Queue
rabbitmq workers: Workers listening to RabbitMQ
locust: Load testing and simulated production traffic
prometheus: Metrics scraping
minio: Object storage
mlflow: Machine Learning Experiment Management
influxdb: Time Series Database
chronograf: Admin & WebUI for InfluxDB
grafana: Performance Monitoring
redis: Cache
airflow: Workflow Orchestrator
bridge server: Receives the webhook from Grafana and translates it into an Airflow REST API call
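
To illustrate the serving layout in the list above, here is a small sketch of how a load balancer spreads requests over the two model services. It assumes plain round-robin, which is nginx's default balancing method; the repo's nginx configuration may use something else. The function names are hypothetical.

```python
from itertools import cycle
from typing import Iterator, List

# Service names as listed above.
UPSTREAMS = ["python-model-service1", "python-model-service2"]


def round_robin(upstreams: List[str]) -> Iterator[str]:
    """Yield upstream names in rotation, one per incoming request."""
    return cycle(upstreams)


def assign(n_requests: int, upstreams: List[str] = UPSTREAMS) -> List[str]:
    """Return which upstream would serve each of n_requests consecutive requests."""
    picker = round_robin(upstreams)
    return [next(picker) for _ in range(n_requests)]
```

Under this assumption, consecutive requests alternate between the two FastAPI services, so each handles about half of the traffic.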

gotchas:

Postgres:

Warning: scripts in /docker-entrypoint-initdb.d are only run if you start the container with a data directory that is empty; any pre-existing database will be left untouched on container startup.
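
The rule in the warning above can be expressed as a quick check. This is a simplified sketch that follows the wording of the warning (an empty or missing data directory); the image's entrypoint may use a different test internally, and the function name is hypothetical.

```python
from pathlib import Path


def init_scripts_would_run(data_dir: Path) -> bool:
    """Return True when the Postgres data directory is missing or empty.

    Per the warning, only then are the scripts in
    /docker-entrypoint-initdb.d executed on container startup.
    """
    return not data_dir.exists() or not any(data_dir.iterdir())
```

If the data lives in a named Docker volume, `docker-compose down -v` removes that volume so the scripts run again on the next start. It also deletes the existing database, so use it only when that data is disposable.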