Simple Python examples including data analysis, ETL, web scraping

WilliamQLiu. Last update: Dec 12, 2023

Python Projects

This is mainly a list of heavily commented one-off programs that I make when trying out new libraries. They cover using Python's standard library, creating tests (unit, functional), connecting to databases, working with JSON and regex, doing statistical analysis, plotting data, generating fake data, using elasticsearch, deploying code, spinning up AWS EC2 servers, and general web scraping.

beautifulsoup - Web scraping with bs4 (get a website's HTML, pull out specific data in a clean format)

boto - Interface with Amazon Web Services (spin up new servers, connect to storage (EBS, S3), shut down servers, etc.)

celery - How to queue up work and distribute it across workers with a broker (e.g. redis)

csv - How to process csv data (read from csv, catch errors, write to csv)

datascience_ga - Data Science/Machine Learning course that I completed at General Assembly NYC; goes over regressions, random forests, naive Bayes, k-means, dimension reduction, and ensemble methods, with a lot of scikit-learn

django - The Django introductory Polls tutorial and Tango with Rango tutorial. For detailed examples of Django, projects are in their own repo.

doctest - Creating tests for Python code using doctest

docopt - A library for parsing command line arguments (good for making a python program that needs command line args)

fabric - Streamlining SSH to automate tasks (application deployment to system administration tasks) for nginx and gunicorn on Ubuntu

faker - Generate mass fake data/records (e.g. dates, names)

elasticsearch - How to do a full text search with indexed data; good for exploring data and solving searches

ggplot - A Python port of R's ggplot2 for plotting (Note: the library still seems very rough; it is much easier to just switch to R)

googleappengine - hello world and guestbook examples using Google's App Engine backend

hashlib_and_hmac - Hashing and keyed message digests with hashlib and hmac; a quick and dirty way of obfuscating data

joblib - How to do parallel computing for extremely large calculations

l2h - basic data transformations in Excel files and scraping web stats

mechanize - web scraping using mechanize

mock - Creating mock objects for unit testing

mongodb_class - MongoDB's intro class for working with MongoDB for Python Developers

nltk - The Natural Language Toolkit, a library for natural language processing / machine learning (mainly text); e.g. count words, lexical diversity of Game of Thrones books

pandas - How to reorganize and do data analysis (e.g. use DataFrames)

pickle - How to serialize and deserialize a Python object (aka serialization, marshalling, flattening)

projecteuler - Math and programming questions from projecteuler.net

pycrypto - Simple encryption and decryption (don't use this code for external programs)

pyprind - Create progress bar on the command line

pytesseract - Uses the Python Imaging Library (PIL) to take an image and try to return the text in it (OCR)

pytest - Creating tests for Python code using pytest

random - For the times when you need to shuffle a large dataset that Unix can't handle with the shuf or gshuf command

redis - Broker system used with celery

requests - Using the third-party requests library (not part of Python's standard library) to make web requests

regex - Doing Regular Expressions in Python

rest - Django Rest Framework to convert data (Django querysets, models) to native Python datatypes that can then be rendered into JSON, XML, etc.

robotgame - Create the AI for robots to battle one another on robotgame.net

scipy - Library of probability distributions and statistical functions

seaborn - Data Plotting library

sklearn - Machine Learning library

sqlalchemy - An Object Relational Mapper for connecting Python to databases through code

starcluster - Spin up clusters of EC2 machines for machine learning

statsmodels - Statistical analysis with statsmodels, using patsy formulas to describe the models

std_lib - Python's standard library, covers a lot of the Python basics (e.g. decorators, classes)

tkinter - Create GUIs for the desktop with Python

vowpal_wabbit - Machine learning library

xkcd - Plotting with matplotlib, but with the xkcd look
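
As a rough illustration of the kind of one-off these folders hold, here is a minimal sketch of the pattern the csv entry describes (read from csv, catch errors, write to csv). It is not taken from the repo: the `clean_rows` helper, the `name`/`age` columns, and the sample rows are all hypothetical, and it uses in-memory text instead of real files so it runs anywhere.

```python
import csv
import io


def clean_rows(src, dst):
    """Copy rows from src to dst, skipping rows whose 'age' is not an integer.

    Hypothetical helper for illustration; returns the line numbers skipped.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["name", "age"], lineterminator="\n")
    writer.writeheader()
    skipped = []
    # The header is line 1 of the input, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            age = int(row["age"])
        except (TypeError, ValueError):
            # TypeError: the field is missing (None); ValueError: not a number.
            skipped.append(line_no)
            continue
        writer.writerow({"name": row["name"], "age": age})
    return skipped


# Made-up sample data standing in for a file on disk.
raw = io.StringIO("name,age\nAda,36\nBob,unknown\nCy,41\n")
out = io.StringIO()
skipped = clean_rows(raw, out)
print(out.getvalue(), end="")
print("skipped lines:", skipped)
```

With real files, the same function works on handles opened with `open(path, newline="")`, which is what the csv module documentation recommends.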
