Data-science Libraries

286 Components & Libraries

🌎 Machine learning tutorials (mainly in Python 3)

This is a continuously updated repository that documents a personal journey of learning data science and machine learning related topics. Curated notes on deep learning. Notes related to advertising domai…

🙌 Welcome open-source Python mini-project contributions!

A collection of small, easy Python projects to help you improve your programming skills. As a Python newbie, I understand the problems that people face when they first begin studying and attempting to…

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

These three functionalities enable a variety of use cases for data scientists, machine learning engineers, and data engineers: From here, you can quickly log a dataset: And there you have it, you now…

Lecture Notes for Linear Algebra Featuring Python. This series of lecture notes will walk you through all the must-know concepts that set the foundation of data science or advanced quantitative skillsets. Suitable for statisticians/econometricians, quantitative analysts, data scientists, etc. to quickly refresh their linear algebra with the assistance of Python computation and visualization.

These lecture notes are intended for introductory linear algebra courses, suitable for university students, programmers, data analysts, algorithmic traders, etc. The lecture notes are loosely bas…

Library to scrape and clean web pages to create massive datasets.

A straightforward library that allows you to crawl, clean up, and deduplicate webpages to create massive monolingual datasets. Using this library, you should be able to create datasets larger than th…
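
The deduplication step mentioned above can be illustrated with a minimal sketch. This is not the library's API or its actual method (the description does not say how it deduplicates); the function names `normalize` and `deduplicate` are hypothetical, and the sketch only removes pages whose text is identical after whitespace and case normalization.

```python
import hashlib
import re


def normalize(text):
    """Collapse runs of whitespace and lowercase the text (an assumed cleaning rule)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(pages):
    """Keep the first occurrence of each page, comparing hashes of normalized text."""
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(normalize(page).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


if __name__ == "__main__":
    pages = ["Hello   World", "hello world", "Something else"]
    print(deduplicate(pages))
```

Hashing the normalized text keeps memory use proportional to the number of distinct pages rather than their total size; near-duplicate detection (for example, pages that differ by a few words) would need a different technique.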

A Django app that creates automatic web UIs for Python scripts.

Wooey is a simple web interface to run command line Python scripts. Think of it as an easy way to get your scripts up on the web for routine data analysis, file processing, or anything else. Wooey wa…

Turn Python scripts into handouts with Markdown and figures

Turn Python scripts into handouts with Markdown comments and inline figures. An alternative to Jupyter notebooks without hidden state that supports any text editor. You use Python Handout as a librar…

The purpose of this project is to share knowledge on how awesome Streamlit is and can be

This project provides … This repo is maintained by me :-) Thanks. In the pull request you should … Please note that your app should not require high compute power as…

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.

Currently, you can use DataPrep to: DataPrep.EDA is the fastest and the easiest EDA (Exploratory Data Analysis) tool in Python. It allows you to understand a Pandas/Dask DataFrame with a few lines of…

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

The algorithms studied are in various commonly used open source implementations like In summary, we are focusing on which algos/implementations can be used to train relatively accurate binary classif…

✨ Argilla: Open-source platform to build better language models through human feedback

Argilla Client: a powerful Python library for reading and writing data into Argilla, using all the libraries you love (transformers, spaCy, datasets, and any other). Argilla Server and UI: the API an…

Torchmetrics - Machine learning metrics for distributed, scalable PyTorch applications.

Simple installation from PyPI. Install using conda. Pip from source. Pip from archive. Extra dependencies for specialized metrics. Install latest developer version. TorchMetrics is a collection of 100+ Py…

A site that displays up-to-date COVID-19 stats, powered by fastpages.

The content of this site shows statistics and reports regarding COVID-19.

[UNMAINTAINED] Automated machine learning for analytics & production

Automated machine learning for production and analytics. auto_ml is designed for production. Here's an example that includes serializing and loading the trained model, then getting predictions on sing…

Algorithmic Trading in Python with Machine Learning

With PyBroker, you'll have all the tools you need to create winning trading strategies backed by data and machine learning. Start using PyBroker today and take your trading to the next level! Or you …

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

The reference book for these and other Spark related topics is: The following notebooks can be examined individually, although there is a more or less linear 'story' when followed in sequence. By usi…

Detecting silent model failure. NannyML estimates performance for regression and classification models using tabular data. It alerts you when and why it changed. It is the only open-source library capable of fully capturing the impact of data drift on performance.

Because NannyML can estimate performance, it is possible to weed out data drift alerts that do not impact expected performance, combatting alert fatigue. Besides linking data drift issues to drops in…

Using Python and scikit-learn to make stock predictions

While I would not live trade based on the predictions from this exact code, I do believe that you can use this project as a starting point for a profitable trading system – I have actually used cod…

Notes, examples, and Python demos for the 2nd edition of the textbook "Machine Learning Refined" (published by Cambridge University Press).

We believe mastery of a certain machine learning concept/topic is achieved only when the answer to each of the following three questions is affirmative. Example "roadmaps" shown below provide suggest…

Modern columnar data format for ML implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.

Lance is a modern columnar data format that is optimized for ML workflows and datasets. Lance is perfect for: The key features of Lance include: Download the sift1m subset Convert it to Lance Build …

MLBox is a powerful Automated Machine Learning Python library.

MLBox has been developed and used by many active community members. Your help is very valuable to make it better for everyone.

fklearn: Functional Machine Learning

To install via pip: You can also install from the source:

Tutorials and training material for the H2O Machine Learning Platform

This document contains tutorials and training materials for H2O-3. If you find any problems with the tutorial code, please open an issue in this repository. There are a number of tutorials on all so…

Python Client for Supabase. Query Postgres from Flask, Django, FastAPI. Python user authentication, security policies, edge functions, file storage, and realtime data streaming. Good first issue.

Using venv (Python 3 built-in): Using conda: Install the package (for > Python 3.7): Set your Supabase environment variables in a dotenv file, or using the shell: Init client: Use the supabase cli…

Multi-class confusion matrix library in Python

Fig. 1: ConfusionMatrix block diagram. After that, two scores are calculated for each confusion matrix: overall and class-based. The overall score is the average of the score of seven overall benchmar…
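
To make the distinction between overall and class-based statistics concrete, here is a minimal standard-library sketch. It does not use or reproduce the listed library's API or its benchmark-based scoring; the helper names `confusion_matrix`, `per_class_scores`, and `overall_accuracy` are hypothetical, and accuracy, precision, and recall are chosen only as familiar examples.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix); matrix[i][j] counts items whose true class is
    labels[i] and whose predicted class is labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def per_class_scores(labels, matrix):
    """Class-based statistics: precision and recall for each label."""
    scores = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        actual_total = sum(matrix[i])                     # row sum
        predicted_total = sum(row[i] for row in matrix)   # column sum
        scores[label] = {
            "precision": true_positive / predicted_total if predicted_total else 0.0,
            "recall": true_positive / actual_total if actual_total else 0.0,
        }
    return scores


def overall_accuracy(matrix):
    """Overall statistic: fraction of items on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    actual = ["cat", "dog", "bird", "cat", "dog", "bird", "cat"]
    predicted = ["cat", "dog", "cat", "cat", "bird", "bird", "dog"]
    labels, matrix = confusion_matrix(actual, predicted)
    print(labels)
    for row in matrix:
        print(row)
    print(overall_accuracy(matrix))
    print(per_class_scores(labels, matrix))
```

Rows index the true class and columns the predicted class, so a row sum gives the number of actual items in a class and a column sum gives the number of items predicted as that class.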

Deep Learning Toolkit for Medical Image Analysis

If you use DLTK in your work please refer to this citation for the current version: Setup a virtual environment and activate it. Although DLTK<=0.2.1 supports and python 2.7, we will not support i…

✨ Rubrix, open-source framework for data-centric NLP. Data annotation and monitoring for enterprise NLP

Why Rubrix? Interactive weak supervision. Building a news classifier with user search queries: Getting started with Rubrix is as easy as: Then simply run: After a few iterations of data annotation, w…
