This project builds an MLOps monitoring pipeline that uses Evidently to track key metrics of a machine learning model, including prediction drift and the median fare amount. Prefect orchestrates the workflow, managing tasks such as database updates and metric calculations. Grafana visualizes the results in interactive dashboards for real-time analysis. A Docker Compose environment runs PostgreSQL, Adminer, and Grafana, which handle data storage, database management, and visualization.
docker-compose up
This will initiate the following steps:
This notebook develops a regression model on New York City taxi data to predict trip durations or fare amounts. It uses pandas for data manipulation, scikit-learn for model building and evaluation, and matplotlib and seaborn for visualization. The workflow covers data cleaning, exploratory data analysis, feature engineering, model training, and validation. The notebook also integrates Evidently to monitor prediction drift, the number of drifted columns, missing values, and regression performance quality, and to track the median fare amount.
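To make two of the monitored quantities concrete, here is a minimal standard-library sketch of the median fare amount and the share of missing values in a batch. It does not call Evidently, which computes these and the drift metrics itself; the function names, the column names (`fare_amount`, `passenger_count`), and the sample rows are assumptions made up for illustration.

```python
from statistics import median


def median_fare(rows, column="fare_amount"):
    """Median of the non-missing values in `column` (column name is an assumption)."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return median(values)


def missing_share(rows, columns):
    """Fraction of cells that are None across the given columns."""
    total = len(rows) * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    return missing / total if total else 0.0


# Made-up sample batch, not real taxi data.
batch = [
    {"fare_amount": 7.5, "passenger_count": 1},
    {"fare_amount": 12.0, "passenger_count": None},
    {"fare_amount": None, "passenger_count": 2},
    {"fare_amount": 9.0, "passenger_count": 1},
]

print(median_fare(batch))                                      # 9.0
print(missing_share(batch, ["fare_amount", "passenger_count"]))  # 0.25
```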
This Python script uses the Evidently library to monitor key model metrics such as prediction drift, the number of drifted columns, and the median fare amount. Prefect orchestrates the pipeline, managing tasks such as database preparation and daily metric calculations. The metrics generated by Evidently are stored in a PostgreSQL database, with the psycopg library handling the SQL operations. Grafana provides interactive dashboards for real-time monitoring and analysis. The Docker Compose setup provides the PostgreSQL, Adminer, and Grafana services.
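The shape of the script described above (prepare the database, calculate metrics per day, store them) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL and psycopg, plain functions in place of Prefect tasks, and a direct median calculation in place of an Evidently report. The table name, column names, and sample fares are hypothetical.

```python
import sqlite3
from datetime import date, timedelta
from statistics import median

# Hypothetical table layout; the real schema is not given in this README.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS metrics (
    day TEXT PRIMARY KEY,
    median_fare REAL,
    num_rows INTEGER
)
"""


def prepare_database(conn):
    """Create the metrics table if it does not exist yet."""
    conn.execute(CREATE_TABLE)
    conn.commit()


def calculate_daily_metrics(day, fares):
    """Stand-in for the Evidently report: summarize one day's fares."""
    return {"day": day.isoformat(), "median_fare": median(fares), "num_rows": len(fares)}


def store_metrics(conn, metrics):
    """Insert one day's metrics, replacing any existing row for that day."""
    conn.execute(
        "INSERT OR REPLACE INTO metrics (day, median_fare, num_rows) VALUES (?, ?, ?)",
        (metrics["day"], metrics["median_fare"], metrics["num_rows"]),
    )
    conn.commit()


def run_pipeline(conn, fares_by_day):
    """Prepare the database, then process each day in chronological order."""
    prepare_database(conn)
    for day, fares in sorted(fares_by_day.items()):
        store_metrics(conn, calculate_daily_metrics(day, fares))


# In-memory database and made-up fares, for illustration only.
conn = sqlite3.connect(":memory:")
start = date(2022, 2, 1)
sample = {start: [8.0, 10.0, 12.5], start + timedelta(days=1): [6.0, 9.5]}
run_pipeline(conn, sample)
rows = conn.execute("SELECT day, median_fare, num_rows FROM metrics ORDER BY day").fetchall()
print(rows)  # [('2022-02-01', 10.0, 3), ('2022-02-02', 7.75, 2)]
```

In a Prefect-based version, functions like `prepare_database` and `calculate_daily_metrics` would typically be decorated as tasks and `run_pipeline` as a flow, with the SQLite connection replaced by a psycopg connection to the PostgreSQL service.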
Navigate to http://localhost:3000/
Open the Dashboard titled 'Taxi Duration Prediction'
