7 Components & Libraries
With Dagster, you declare—as Python functions—the data assets that you want to build. Dagster then helps you run your functions at the right time and keep your assets up-to-date. Here is an example o…
These three functionalities enable a variety of use cases for data scientists, machine learning engineers, and data engineers: From here, you can quickly log a dataset: And there you have it, you now…
The pipeline consists of various modules: EMR - I used a 3-node cluster with the instance types below: Finally, PySpark uses Python 2 by default on EMR. To change to Python 3, set the environment variable…
Covalent is a Python library for AI/ML engineers, developers, and researchers. It provides a straightforward approach to running compute jobs, like LLMs, generative AI, and scientific research, on v…
Beneath is a serverless real-time data platform. Our goal is to create one end-to-end platform for data workers that combines data storage, processing, and visualization with data quality management …
This repository contains a Docker Compose script that creates an open-source data analytics stack on your local machine. Currently, the stack consists of multiple components: I plan to add more components …