Warning: WP_Syntax::substituteToken(): Argument #1 ($match) must be passed by reference, value given in /home/djones/public_html/blog/wp-content/plugins/wp-syntax/wp-syntax.php on line 383

Warning: WP_Syntax::substituteToken(): Argument #1 ($match) must be passed by reference, value given in /home/djones/public_html/blog/wp-content/plugins/wp-syntax/wp-syntax.php on line 383

Warning: WP_Syntax::substituteToken(): Argument #1 ($match) must be passed by reference, value given in /home/djones/public_html/blog/wp-content/plugins/wp-syntax/wp-syntax.php on line 383

The last post documented early explorations of Jupyter notebooks ending with a simple query of a Moodle database. This post takes the first baby steps toward some sort of indicators platform using Jupyter notebooks, Python and github. The focus here is to find some form of ORM or other form of database independent layer.

Problem: the code from the last post was specific to Postgresql. If you’re Moodle database is based on another database that code won’t work. The aim here is to enable some level of sharing of code/analysis/indicators. This means needed a way to keep the code independent of database specifics. This is where object relational mappers (ORMS) enter the picture. See this for an argument why this is a good idea.

Pandas and SQLAlchemy

This book offers some examples using sqlalchemy with pandas. A likely combination. This sqlalchemy cheatsheet offers some useful examples.

Need to install sqlalchemy, it appears. Actually just updated

conda install -c anaconda sqlalchemy

Oh dear, time wasted. Needed to restart the notebook server after that.

Process is

  1. Modify the config stuff to create an sqlalchemy engine.
  2. Read the data

Ends up with the following code

import json
 
# Put the config outside the Indicators directory, keep it out of git
with open('../config.json') as f:
    conf = json.load(f)
 
from sqlalchemy.engine.url import URL 
from sqlalchemy import create_engine
 
engine = create_engine(URL(**conf))
 
df = pd.read_sql('select id,username,firstname,lastname ' +
                 'from moodle.mdl_user where id<100 ',engine)
df

The config.json file looks something like the following. The current plan is that this sits above this directory, as this directory and its contents will eventually end up in github

{
  "drivername": "postgresql",
  "database": "someDbaseName",
  "username": "user", 
  "host": "localhost",
  "port": "5432",
  "password": "myPassword"
}

What’s next?

That works and seems a reasonable. Some ideas for the next step

  • Figure out how to remove/handle the moodle schema that’s in the SQL above, not to mention the mdl_ prefix on the table.

    Linked to allowing the code to be run across different institution’s easily.

  • Move the config code into a library for this work?
  • Set up the github repository, get this code shared and start working with @beerc on this.
  • Experiment with how the assumptions built into the Perl code I was using can be translated appropriately into this environment.

    How to develop the class hierarchy (or if) using sqlalchemy.

    How the perl idioms translate into python, sqlalchemy and pandas. Pandas has some nice features that might eliminate some practices.