An end-to-end GoodReads data pipeline for building a data lake, a data warehouse, and an analytics platform.
The pipeline consists of several modules:
EMR - I used a 3-node cluster with the following instance types:
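As a sketch of how such a cluster could be launched (the cluster name, instance type, release label, and roles below are assumptions for illustration, not taken from the source), the AWS CLI supports creating a 3-node EMR cluster with Spark installed:

```shell
# Hypothetical example: launch a 3-node EMR cluster with Spark.
# The instance type (m5.xlarge) and release label are assumptions;
# substitute the values used in your own setup.
aws emr create-cluster \
    --name "goodreads-pipeline" \
    --release-label emr-5.33.0 \
    --applications Name=Spark \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles
```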
Finally, PySpark defaults to Python 2 on EMR. To switch to Python 3, set the environment variable…
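One common way to do this (a sketch; `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` are standard PySpark settings, but the exact interpreter path may differ per EMR image) is to export the interpreter variables before launching PySpark:

```shell
# Point PySpark's driver and executors at the Python 3 interpreter.
# The /usr/bin/python3 path is an assumption; adjust for your EMR image.
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
```

These can also be set cluster-wide via an EMR configuration classification at cluster creation time rather than per-session exports.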