Free and Open Source SEO Software. Automate Google Sheets without plug-ins. Build Google Analytics and Search Console Dashboards. Learn Python in Jupyter Notebook. Move your scripts to generic Linux schedulers.

miklevin | Last update: Aug 11, 2022

Pipulate

Free & Open Source SEO Software


Install

pip install -U pipulate


How to use

It's Time To Learn a Little Python!

Pipulate is designed to make playing with data in Jupyter Notebook easy, and to set the stage for moving your code over to cheap generic Linux servers for automation. The easiest way to install Pipulate from Jupyter is to execute this command from within a Jupyter Terminal:

pip install -U pipulate

If you're not using Anaconda, then you will have to install Pandas first. But you should really be using Anaconda. The idea is that you experiment in Jupyter Notebook, which is a great learning environment, and move your scripts as they mature over to a more automated generic Linux server.

The first time you import Pipulate, you will receive a Google OAuth prompt to allow access to the Google Spreadsheet you want to edit. Click the big link and paste the resulting token back into the field shown in Jupyter Notebook or on your command line. Once this is done, the basic use is:

import pipulate
pipulate.sheet('119mnC8Day78KexU_yv7J_wfA3p7iZeXa0YEtmg1Igu4')  # replace with yours
cl, df = pipulate.pull(tab=0, rows='A1:J10')
df.loc[:,:] = 'foo'
pipulate.push(0, cl, df)

This loads the rectangular region you defined with the rows and columns into memory in a way where you can treat it a lot like a tab in Microsoft Excel or a table in SQL. You can manipulate the pandas "DataFrame" (abbreviated as df), and then push the changes back out to the Google Sheet. This is very open-ended, designed to make automation of traditionally tedious SEO tasks much simpler. Once you're happy with your script, you can copy/paste it into a .py file and schedule it with a standard Linux scheduler, which is another part of this project I'll be expanding considerably (generic task-scheduling under Linux).
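If you want to get a feel for this Excel-like access pattern before touching a real sheet, here's a standalone pandas sketch. Note this is not Pipulate itself; the data and column labels are made up, no Google account needed:

```python
import pandas as pd

# Fake a 3x2 pulled range the way Pipulate labels it by default:
# Excel-style letter labels for columns.
df = pd.DataFrame(
    [['amazon.com', ''], ['apple.com', ''], ['google.com', '']],
    columns=['A', 'B'],
)

# Treat it like a spreadsheet: derive column B from column A.
df['B'] = 'https://' + df['A']

print(df)
```

In a real Pipulate session the only difference is that df comes from pipulate.pull() and the result goes back with pipulate.push().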

There is some flexibility in the API, including the ability to assign column labels from row 1, allowing you to adapt it to your style. Here are some variations:

cl, df = pipulate.pull('Sheet1', rows='A1:J5')                # Most common usage
cl, df = pipulate.pull('Sheet1', rows='A1:J5', columns=True)  # Turn row 1 into column labels
cl, df = pipulate.pull(tab=0, rows='A1:J5')                   # Address tab by index
cl, df = pipulate.pull(0, rows='A1:J5')                       # Address tab by index, no label
cl, df = pipulate.pull(0, rows='A1:J')                        # Automatically figure out number of rows
cl, df = pipulate.pull(0, rows=(1,5), cols=('A','J'))         # Use the rows=, cols= interface
cl, df = pipulate.pull(0, (1,5), ('A','J'))                   # Use row, col interface, no labels
cl, df = pipulate.pull(0, (1,5), (1,10))                      # Use row, col with numeric column index
cl, df = pipulate.pull(2, rows='A1:J5')                       # Work on the 3rd tab
cl, df = pipulate.pull(wksht, rows='A1:J5')                   # Use GSpread Worksheet object

Using argument labels is optional. The exact string name, 0-based numerical index, or a GSpread Worksheet object must be in the first position. The 2nd position is the "rows" value, which may be either an Excel-like range or a row-range. If it is a row-range, then you must also have a col-range in position 3 (or label the argument "cols"). The details of how you do it will vary with your project. It is often useful to name your tabs so you can apply different manipulations to different tabs without worrying about their order changing in the spreadsheet's Web user interface.

If you don't give any column labels, Pipulate will assign them automatically using the Excel-like letter labels for columns. If you want to name your columns, you can either use row 1 by setting the columns parameter to True, or you can provide your own list of column names:

cols = ['one', 'two', 'three', 'four', 'five']
cl, df = pipulate.pull(tab=0, rows='A1:J5', columns=cols)  # Set column labels from list

After you make your cl (GSpread cell_list) and df (pandas DataFrame) selection, you can modify your df and push it back into Google Sheets with the symmetrical push command:

pipulate.push(0, cl, df)  # Works in most cases

Real Life Example

Say you have a spreadsheet with a list of URLs that were provided to you from some unreliable source in column A, starting from A1. You don't know whether each should have a www at the beginning or if the protocol is correct:

amazon.com
apple.com
google.com

And now let's say you write a simple function that will test what the real homepage URL is. This example uses the popular Requests package for making HTTP requests:

import requests

def cleanurl(provided):
    guess = provided
    if provided[:4].lower() != 'http':
        guess = 'https://%s' % provided
    try:
        r = requests.head(guess, allow_redirects=True)
        resolved = r.url
    except requests.RequestException:
        resolved = "Can't find %s" % provided
    return resolved

To use Pipulate and Pandas to apply this function to each line of column A, you would first select columns A & B into a pandas DataFrame:

import pipulate
pipulate.sheet('119mnC8Day78KexU_yv7J_wfA3p7iZeXa0YEtmg1Igu4')
cl, df = pipulate.pull(tab=0, rows='A1:B3')

Then you apply the function to each cell in column A of your DataFrame, put the results in column B, and push the results back up into Google Sheets:

df['B'] = df['A'].apply(cleanurl)
pipulate.push(0, cl, df)

And that's it! Column B will now contain:

https://www.amazon.com/
https://www.apple.com/
https://www.google.com/

Imagine the utility! In addition to working magic in Google Sheets, you're acquiring incredibly useful Python Pandas skills!

Parametrizing Your Arguments for Scheduling

The time has come, the Walrus said, to talk about automation. After you get your script working in Jupyter Notebook, what are you going to do? Unless you want to sit there and press a button every day like in Lost, you're going to have to put the code in some sort of scheduling system. When you do this, it is best to do it in a parameterized fashion, meaning you can feed the one script different sites, Google Analytics IDs, keywords, filters and whatnot, allowing the same code to drive different dashboards.

Generic Linux script scheduling is too much to cover here, but I may as well show you how to develop with parameterized arguments in Jupyter Notebook, then have those same arguments usable on the command line when invoked from a scheduler. So if you have filename.ipynb (a Jupyter Notebook) and you copy/paste its content into filename.py, then you can change its internal variable values by the way you run it from the Unix shell, thereby using it to drive many different Google Sheet dashboards. You're welcome:

(py36) MikeL@LunderVand:$ filename.py --kung "foo"
kung: foo

You can alternatively use:

(py36) MikeL@LunderVand:$ filename.py -k "foo"
kung: foo

To support more arguments, just make more parser.add_argument() calls and access them through the args dictionary. Here's the code in filename.py used in the above commands. When this is run from Jupyter Notebook, the "foo" value is set in the "if jn:" block. It's fairly likely you'll have code like this at the top of any Pipulate script that starts out in Jupyter Notebook and is destined for scheduling. Clear? Enjoy!

name = 'kung'
jn = True
try:
    get_ipython()
except NameError:
    jn = False
if jn:
    val = 'foo'
else:
    import argparse
    parser = argparse.ArgumentParser(description='Parses args when run from console.')
    parser.add_argument('-k', '--kung', default='foo', help='', required=False)
    args = vars(parser.parse_args())
    val = args[name]
print('%s: %s' % (name, val))
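With the argparse block in place, scheduling is just a matter of adding crontab entries, one per dashboard. Here's a hypothetical sketch; the paths, run times, and argument values are assumptions you'd replace with your own:

```shell
# Edit the current user's crontab with: crontab -e
# Then add one line per dashboard, e.g. run at 6:00 and 6:15 each morning.
# (Interpreter path, script path, and --kung values are hypothetical.)
0 6 * * *   /usr/bin/python /home/mike/filename.py --kung "dashboard-one"
15 6 * * *  /usr/bin/python /home/mike/filename.py --kung "dashboard-two"
```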

Fear Is The Mind Killer

Why do I call Google Sheets automation components "SEO software"? It's because this is the main missing ingredient in building all those custom reports necessary to surface new actionable data every day. It's also your easy route into data manipulation with Python/Pandas and the repurposing of your aging SEO career into something simultaneously more timely and timeless. SEO is dead. Long live SEO! Sound familiar? Well, the other shoe is finally dropping with all the Machine Learning tricks ol' Uncle Google is learning, and if you're a dinosaur then y'all better learn to fly. If you can do Excel macros, then you can survive, evolve and thrive.

Learning Linux, Python, vim and git (LPvg) will set you on a good course for at least the next 10 years. If you're still scared of the Unix shell, that old text-based command-line interface which is both the past and future of tech, then stick your hand into the Gom Jabbar pain-box of Linux and cut your teeth on my other repo here on GitHub, Levinux <https://github.com/miklevin/levinux>. Now repeat after me: I must not fear. Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me.
