pytablereader
pytablereader is a Python library for loading structured table data from files, strings, or URLs in various data formats: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.
- Extract structured tabular data from various data formats:
    - CSV / Tab separated values (TSV) / Space separated values (SSV)
    - Microsoft Excel™ file
    - Google Sheets
    - HTML (table tags)
    - JSON
    - Labeled Tab-separated Values (LTSV)
    - Line-delimited JSON (LDJSON) / NDJSON / JSON Lines
    - Markdown
    - MediaWiki
    - SQLite database file
- Supported data sources are:
    - Files on a local file system
    - Accessible URLs
    - str instances
- Loaded table data can be used as:
    - pandas.DataFrame instance
    - dict instance
- Sample Code
    import pytablereader as ptr
    import pytablewriter as ptw


    # prepare data ---
    file_path = "sample_data.csv"
    csv_text = "\n".join([
        '"attr_a","attr_b","attr_c"',
        '1,4,"a"',
        '2,2.1,"bb"',
        '3,120.9,"ccc"',
    ])

    with open(file_path, "w") as f:
        f.write(csv_text)

    # load from a csv file ---
    loader = ptr.CsvTableFileLoader(file_path)
    for table_data in loader.load():
        print("\n".join([
            "load from file",
            "==============",
            "{:s}".format(ptw.dumps_tabledata(table_data)),
        ]))

    # load from a csv text ---
    loader = ptr.CsvTableTextLoader(csv_text)
    for table_data in loader.load():
        print("\n".join([
            "load from text",
            "==============",
            "{:s}".format(ptw.dumps_tabledata(table_data)),
        ]))
- Output
    load from file
    ==============
    .. table:: sample_data

        ======  ======  ======
        attr_a  attr_b  attr_c
        ======  ======  ======
             1     4.0  a
             2     2.1  bb
             3   120.9  ccc
        ======  ======  ======

    load from text
    ==============
    .. table:: csv2

        ======  ======  ======
        attr_a  attr_b  attr_c
        ======  ======  ======
             1     4.0  a
             2     2.1  bb
             3   120.9  ccc
        ======  ======  ======
- Sample Code
- Output
         a    b
    0    1    2
    1  3.3  4.4
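A minimal sketch of getting loaded table data as a pandas.DataFrame instance. It assumes that the objects yielded by `loader.load()` provide an `as_dataframe()` method and that pandas is installed. The function name `load_csv_text_as_dataframes` is hypothetical and not part of pytablereader.

```python
def load_csv_text_as_dataframes(csv_text):
    """Return one pandas.DataFrame per table found in csv_text.

    Hypothetical helper, not part of pytablereader.
    """
    # Imported lazily so this module can be imported even when the
    # optional dependencies (pytablereader, pandas) are not installed.
    import pytablereader as ptr

    loader = ptr.CsvTableTextLoader(csv_text)

    # Assumption: each loaded table object exposes as_dataframe().
    return [table_data.as_dataframe() for table_data in loader.load()]
```

Calling `load_csv_text_as_dataframes("a,b\n1,2\n3.3,4.4")` and printing each returned frame should give a two-column table with columns `a` and `b`.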
More examples are available at https://pytablereader.rtfd.io/en/latest/pages/examples/index.html
pip install pytablereader
Some formats require additional dependency packages, which can be installed as follows:
- Excel
pip install pytablereader[excel]
- Google Sheets
pip install pytablereader[gs]
- Markdown
pip install pytablereader[md]
- MediaWiki
pip install pytablereader[mediawiki]
- SQLite
pip install pytablereader[sqlite]
- Load from URLs
pip install pytablereader[url]
- All of the extra dependencies
pip install pytablereader[all]
sudo add-apt-repository ppa:thombashi/ppa
sudo apt update
sudo apt install python3-pytablereader
- logging extras
    - loguru: used for logging if the package is installed
- excel extras
- md extras
- mediawiki extras
- sqlite extras
- url extras
- pandas
    - required to get table data as a pandas data frame
- lxml
- libxml2 (faster HTML conversion)
- pandoc (required when loading a MediaWiki file)
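Since these packages are optional, it can help to check which ones are importable in the current environment before relying on the related features. The following is a rough standard-library sketch. The helper `find_optional_packages` is hypothetical and not part of pytablereader, and the tuple lists only the Python packages named in this list.

```python
from importlib.util import find_spec

# Optional Python packages named in this README; extend as needed.
OPTIONAL_PACKAGES = ("loguru", "pandas", "lxml")


def find_optional_packages(names=OPTIONAL_PACKAGES):
    """Map each top-level package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in names}


for name, available in find_optional_packages().items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

A package reported as missing can be installed with the matching `pip install pytablereader[...]` extra or directly with pip.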
https://pytablereader.rtfd.io/
- pytablewriter
    - Tabular data loaded by pytablereader can be written to another tabular data format with pytablewriter.
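As an illustration of that round trip, the sketch below loads CSV text with pytablereader and renders each table as Markdown with pytablewriter. It assumes pytablewriter's `MarkdownTableWriter` class with its `from_tabledata()` and `dumps()` methods. The function name `csv_text_to_markdown` is hypothetical.

```python
def csv_text_to_markdown(csv_text):
    """Return a list of Markdown strings, one per table in csv_text.

    Hypothetical helper, not part of pytablereader or pytablewriter.
    """
    # Imported lazily so this module can be imported even when the
    # two libraries are not installed.
    import pytablereader as ptr
    import pytablewriter as ptw

    loader = ptr.CsvTableTextLoader(csv_text)
    outputs = []
    for table_data in loader.load():
        # Assumption: the writer accepts the loaded table object directly.
        writer = ptw.MarkdownTableWriter()
        writer.from_tabledata(table_data)
        outputs.append(writer.dumps())
    return outputs
```

Other writer classes from pytablewriter could be swapped in for `MarkdownTableWriter` to target a different output format.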