Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices

TrainingByPackt Last update: Jan 13, 2024

Data Wrangling with Python by Packt

Data is the new oil, and it shapes modern life through smart tools and transformative technologies. But oil does not come out of the rig in its final form; it has to be refined through a complex processing network. Similarly, data needs to be curated, massaged, and refined before it can be used in intelligent algorithms and consumer products. This is called wrangling, and (according to Forbes) good data scientists spend 60-80% of their time on it, every day, on every project. It involves scraping raw data from multiple sources (including the web and database tables), then imputing, formatting, and transforming it so that it is ready to be used in the modeling process. This course aims to teach you the core ideas behind this process and to equip you with knowledge of the most popular tools and techniques in the domain. As the programming framework, we have chosen Python, the most widely used language for data science. We work through real-life examples, not toy datasets. By the end of this course, you will be confident handling a myriad of sources to extract, clean, transform, and format your data for the great machine learning app you are thinking of building. Hop on and be a part of this exciting journey.

What you will learn

Manipulate simple and complex data structures using Python and its built-in functions
Use fundamental and advanced features of Pandas DataFrames and NumPy arrays, and manipulate them at run time
Extract and format data from various textual formats: plain text files, SQL, CSV, Excel, JSON, and XML
Perform web scraping using Python libraries such as BeautifulSoup4 and html5lib
Perform advanced string search and manipulation using Python and regular expressions (regex)
Handle outliers, apply advanced programming tricks, and perform data imputation using Pandas
Apply basic descriptive statistics and plotting techniques in Python for quick examination of data
Practice data wrangling and modeling using random data generation techniques (bonus topic)
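
To give a flavor of the topics above, here is a minimal sketch of a small wrangling pass. It is not taken from the course material: the sample data and the helper names (`clean_record`, `load_records`, `impute_median_age`) are hypothetical, and it uses only the Python standard library rather than Pandas so that it runs without extra installs. It reads rows from CSV and JSON text, normalizes strings with a regular expression, and imputes a missing value with the median.

```python
import csv
import io
import json
import re
import statistics

# Hypothetical raw inputs, invented for illustration only.
RAW_CSV = """name,age,city
 Alice ,34,new york
BOB,,Boston
carol,29, chicago
"""
RAW_JSON = '[{"name": "Dave", "age": "41", "city": "DENVER"}]'


def clean_record(record):
    """Normalize whitespace and casing; turn a blank age into None."""
    name = re.sub(r"\s+", " ", record["name"]).strip().title()
    city = re.sub(r"\s+", " ", record["city"]).strip().title()
    age_text = (record.get("age") or "").strip()
    age = int(age_text) if age_text.isdigit() else None
    return {"name": name, "age": age, "city": city}


def load_records(csv_text, json_text):
    """Read rows from CSV and JSON text and return cleaned dictionaries."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.extend(json.loads(json_text))
    return [clean_record(row) for row in rows]


def impute_median_age(records):
    """Fill missing ages with the median of the known ages."""
    known = [r["age"] for r in records if r["age"] is not None]
    median_age = statistics.median(known)
    return [
        dict(r, age=r["age"] if r["age"] is not None else median_age)
        for r in records
    ]


records = impute_median_age(load_records(RAW_CSV, RAW_JSON))
for record in records:
    print(record)
```

With Pandas, the same steps would typically be expressed on a DataFrame; the standard-library version is used here only to keep the sketch self-contained.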

Hardware requirements

For an optimal student experience, we recommend the following hardware configuration:

OS: Windows 7 SP1 64-bit, Windows 8.1 64-bit or Windows 10 64-bit, Ubuntu Linux, or the latest version of macOS
Processor: Intel Core i5 or equivalent
Memory: 8GB RAM or more
Hard disk: 40GB or more
Stable Internet connection

Software requirements

You'll also need the following software installed in advance:

Browser: latest version of Google Chrome or Mozilla Firefox
Python 3.4+ (preferably Python 3.6)
Python libraries as needed (Jupyter, NumPy, Pandas, Matplotlib, BeautifulSoup4, and so on)
Notepad++/Sublime Text (latest version), Atom IDE (latest version) or other similar text editor applications.

The following Python libraries are needed:

NumPy
Pandas
SciPy
scikit-learn
Matplotlib
BeautifulSoup4
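
One way to confirm that these libraries are available before starting is a short check like the following. This is a sketch, not part of the course: the `REQUIRED` mapping and the `find_missing` helper are hypothetical names, and the mapping pairs each library with the module name it is normally imported under (for example, scikit-learn is imported as `sklearn` and BeautifulSoup4 as `bs4`).

```python
import importlib.util

# Library name -> module name used in an import statement.
REQUIRED = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "BeautifulSoup4": "bs4",
}


def find_missing(modules):
    """Return the labels whose modules cannot be found in this environment."""
    return [
        label
        for label, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = find_missing(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Any library reported as missing can then be installed with your package manager of choice before you begin.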
