SQL with Python in the middle
SPyQL is a query language that combines:
- the simplicity and structure of SQL;
- with the power and readability of Python.
SQL provides the structure of the query, while Python is used to define expressions, bringing along a vast ecosystem of packages.
SPyQL is fast and memory efficient. Take a look at the benchmarks with GB-size JSON data.
SPyQL offers a command-line interface that allows running SPyQL queries on top of text data (e.g. CSV, JSON). Data can come from files but also from data streams, such as Kafka, or from databases such as PostgreSQL. Basically, data can come from any command that outputs text :-). Moreover, data can be generated by a Python expression! And since SPyQL also writes to different formats, it makes it easy to convert between data formats.
Take a look at the Command line examples to see how to query parquet, process API calls, traverse directories of zipped JSONs, convert CSV to JSON, and import JSON/CSV data into SQL databases, among many other things.
SPyQL is also available as a Python module. In addition to the CLI features, you can also:
- query variables (e.g. lists of dicts);
- get results into in-memory data structures.
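A minimal sketch of module usage, assuming the `Q` class from `spyql.query` described in the project docs; the variable and field names here are illustrative:

```python
from spyql.query import Q

# Compile a query once, then run it over an in-memory list of dicts.
# `data` is the name the query gives to the variable passed as a keyword
# argument; `name`/`age` are hypothetical fields.
query = Q("SELECT data->name as name, data->age + 1 as next_age FROM data")
result = query(data=[{"name": "Alice", "age": 20}, {"name": "Bob", "age": 30}])
```

Consult the official documentation for the exact API and return types.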
We aim for SPyQL to be:
- Simple: simple to use, with a straightforward implementation;
- Familiar: you should feel at home if you are acquainted with SQL and Python;
- Light: small memory footprint that allows you to process large data that fit on your machine;
- Useful: it should make your life easier, filling a gap in the ecosystem.
- Row order guarantee
- Natural window for aggregations
- No distinction between aggregate and window functions
- IMPORT clause
- Natural support for lists, sets, dictionaries, objects, etc.
- 1-liner by design
- Multiple data formats supported
"I'm very impressed - this is some very neat pragmatic software design."
Simon Willison, Creator of Datasette, co-creator of Django
"I love this tool! I use it every day..."
Alin Panaitiu, Creator of Lunar
"Brilliant tool, thanks a lot for creating it and for the example here!"
Greg Sadetsky, Co-founder and CTO at Decibel Ads
The official documentation of SPyQL can be found at: https://spyql.readthedocs.io/.
The easiest way to install SPyQL is from pip:
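```shell
pip install spyql
```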
To test your installation run in the terminal:
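A constant `SELECT` works without any input, so it makes a good smoke test:

```shell
spyql "SELECT 'Hello world' as Message"
```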
Output:
Message
-----------
Hello world
You can try changing the output format to JSON or CSV, and adding more columns. e.g. run in the terminal:
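A variant along these lines (the `TO json` clause selects the output format; the extra column is illustrative):

```shell
spyql "SELECT 'Hello world' as message, 1+2 as three TO json"
```

which should print one JSON object per output row, e.g. `{"message": "Hello world", "three": 3}`.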
You can run the following example queries in the terminal: `spyql "the_query" < a_data_file`
Example data files are not provided in most cases.
Sample input:
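The original sample file is not included; JSON lines consistent with the output below would be:

```json
{"id": 1028, "name": "tomatoes", "price": 1.5}
{"id": 1028, "name": "bananas", "price": 2.0}
{"id": 1029, "name": "peaches", "price": 3.12}
```

A hypothetical query converting these JSON lines to CSV (file name is a placeholder):

```shell
spyql "SELECT json->id, json->name, json->price FROM json TO csv" < sample.jsonl
```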
Output:
id, name, price
1028, tomatoes, 1.5
1028, bananas, 2.0
1029, peaches, 3.12
Here we import `hashlib` to calculate an MD5 hash for each input line. `hashlib` is part of the Python standard library, so no extra installation is required.
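A sketch of such a query (the file name is hypothetical; when reading `FROM text`, `col1` holds each input line):

```shell
spyql "IMPORT hashlib SELECT hashlib.md5(col1.encode()).hexdigest() as md5 FROM text" < a_data_file
```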
Totals by player, alphabetically ordered.
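A hypothetical query for this, assuming JSON input with `player` and `score` fields (`sum_agg` follows SPyQL's `*_agg` naming convention for aggregates):

```shell
spyql "SELECT json->player, sum_agg(json->score) as total FROM json GROUP BY 1 ORDER BY 1 TO json" < games.jsonl
```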
Calculating the cumulative sum of a variable using the `PARTIALS` modifier. Also demoing the `lag` aggregator.
Sample input:
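The original file is not included; input consistent with the output below (rows without `new_entries` yield nulls) would be:

```json
{"new_entries": 10}
{"new_entries": 5}
{"new_entries": 25}
{}
{}
{"new_entries": 100}
```

A query along these lines (file name hypothetical) should produce it:

```shell
spyql "SELECT PARTIALS json->new_entries, sum_agg(json->new_entries) as cum_new_entries, lag(json->new_entries) as prev_entries FROM json TO json" < entries.jsonl
```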
Output:
{"new_entries" : 10, "cum_new_entries" : 10, "prev_entries": null}
{"new_entries" : 5, "cum_new_entries" : 15, "prev_entries": 10}
{"new_entries" : 25, "cum_new_entries" : 40, "prev_entries": 5}
{"new_entries" : null, "cum_new_entries" : 40, "prev_entries": 25}
{"new_entries" : null, "cum_new_entries" : 40, "prev_entries": null}
{"new_entries" : 100, "cum_new_entries" : 140, "prev_entries": null}
If `PARTIALS` was omitted, the result would be equivalent to the last output row.
To run the following examples, type `Ctrl-x Ctrl-e` in your terminal. This will open your default editor (e.g. emacs or vim). Paste the code of one of the examples, save, and exit.
Here, `find` traverses a directory and executes `parquet-tools` for each parquet file, dumping each file to JSON format. `jq -c` makes sure that the output has one JSON object per line before handing over to spyql. This is far from an efficient way to query parquet files, but it might be a handy option if you need to do a quick inspection.
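A sketch of such a pipeline (the exact `parquet-tools` flags depend on the distribution you have installed; the field name is illustrative):

```shell
find . -name "*.parquet" -exec parquet-tools cat --json {} \; | jq -c '.' | spyql "SELECT json->a_field FROM json"
```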
yq converts YAML, XML and TOML files to JSON, making it easy to query any of these formats with spyql.
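For example (flags are for the Go implementation of `yq`; the file and field names are illustrative; `jq -c` flattens the pretty-printed JSON to one object per line):

```shell
yq -o=json '.' config.yaml | jq -c '.' | spyql "SELECT json->name FROM json"
```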
Read data from a Kafka topic and write to the PostgreSQL table `customer`.
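A sketch using `kafkacat` as the consumer and `psql` as the sink (broker, topic, field and connection names are placeholders; `-Otable=customer` passes the table name to SPyQL's SQL writer):

```shell
kafkacat -b a_broker -C -t a_topic | spyql -Otable=customer "SELECT json->id, json->name FROM json TO sql" | psql -h a_host a_database
```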
Read data from a Kafka topic, continuously calculating statistics.
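A sketch (placeholders as above; `PARTIALS` makes the aggregates print after every message instead of only at the end; the aggregate names follow SPyQL's `*_agg` convention):

```shell
kafkacat -b a_broker -C -t a_topic | spyql "SELECT PARTIALS count_agg(1) as messages, avg_agg(json->value) as avg_value FROM json TO json"
```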
A special file format (spy) is used to efficiently pipe data between queries.
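For example (field names hypothetical; assuming the spy format preserves column names and types between processes), the first query writes `TO spy` and the second reads `FROM spy`:

```shell
spyql "SELECT json->value as value FROM json TO spy" < data.jsonl | spyql "SELECT value * 2 as double FROM spy TO json"
```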
It is possible to make simple (LEFT) JOIN operations based on dictionary lookups.
Given `numbers.json`:
Query:
Output:
If you want an INNER JOIN instead of a LEFT JOIN, you can add a criterion to the WHERE clause, e.g.:
Output:
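The data and queries for this example are not included, but the underlying mechanism is a plain dictionary lookup, sketched here in Python with hypothetical data:

```python
# numbers.json would provide the lookup table (hypothetical contents)
lookup = {1: "one", 2: "two", 3: "three"}

rows = [{"n": 1}, {"n": 5}, {"n": 3}]

# LEFT JOIN: every row is kept; keys missing from the lookup yield None (null)
left = [{"n": r["n"], "name": lookup.get(r["n"])} for r in rows]
print(left)   # [{'n': 1, 'name': 'one'}, {'n': 5, 'name': None}, {'n': 3, 'name': 'three'}]

# INNER JOIN: a WHERE-style criterion drops rows without a match
inner = [r for r in left if r["name"] is not None]
print(inner)  # [{'n': 1, 'name': 'one'}, {'n': 3, 'name': 'three'}]
```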
Plotting with matplotcli
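matplotcli's `plt` command reads JSON lines from stdin and evaluates matplotlib expressions over them; a sketch (field and file names hypothetical):

```shell
spyql "SELECT json->x, json->y FROM json TO json" < data.jsonl | plt "scatter(x, y)"
```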
This package was created with Cookiecutter and the `audreyr/cookiecutter-pypackage` project template.