PINQ to SQLAlchemy

from macropy.experimental.pinq import macros, sql, query, generate_schema
from sqlalchemy import *

# prepare database
engine = create_engine("sqlite://")
for line in open("macropy/experimental/test/world.sql").read().split(";"):
    engine.execute(line.strip())

db = generate_schema(engine)

# Countries in Europe with a GNP per Capita greater than the UK
results = query[(
    x.name for x in db.country
    if x.gnp / x.population > (
        y.gnp / y.population for y in db.country
        if y.name == 'United Kingdom'
    ).as_scalar()
    if (x.continent == 'Europe')
)]
for line in results: print(line)
# (u'Austria',)
# (u'Belgium',)
# (u'Switzerland',)
# (u'Germany',)
# (u'Denmark',)
# (u'Finland',)
# (u'France',)
# (u'Iceland',)
# (u'Liechtenstein',)
# (u'Luxembourg',)
# (u'Netherlands',)
# (u'Norway',)
# (u'Sweden',)

PINQ (Python INtegrated Query) to SQLAlchemy is inspired by C#’s LINQ to SQL. In short, code used to manipulate lists is lifted into an AST which is then cross-compiled into a snippet of SQL. In this case, it is the query macro which does this lifting and cross-compilation. Instead of performing the manipulation locally on some data structure, the compiled query is sent to a remote database to be performed there.

This allows you to write queries to a database in the same way you would write queries on in-memory lists, which is really very nice. The translation is a relatively thin layer of over the SQLAlchemy Query Language, which does the heavy lifting of converting the query into a raw SQL string:. If we start with a simple query:

# Countries with a land area greater than 10 million square kilometers
print(query[((x.name, x.surface_area) for x in db.country if x.surface_area > 10000000)])
# [(u'Antarctica', Decimal('13120000.0000000000')), (u'Russian Federation', Decimal('17075400.0000000000'))]

This is to the equivalent SQLAlchemy query:

print(engine.execute(select([country.c.name, country.c.surface_area]).where(country.c.surface_area > 10000000)).fetchall())

To verify that PINQ is actually cross-compiling the python to SQL, and not simply requesting everything and performing the manipulation locally, we can use the sql macro to perform the lifting of the query without executing it:

query_string = sql[((x.name, x.surface_area) for x in db.country if x.surface_area > 10000000)]
print(type(query_string))
# <class 'sqlalchemy.sql.expression.Select'>
print(query_string)
# SELECT country_1.name, country_1.surface_area
# FROM country AS country_1
# WHERE country_1.surface_area > ?

As we can see, PINQ converts the python list-comprehension into a SQLAlchemy Select, which when stringified becomes a valid SQL string. The ? are there because SQLAlchemy uses parametrized queries, and doesn’t interpolate values into the query itself.

Consider a less trivial example: we want to find all countries in europe who have a GNP per Capita greater than the United Kingdom. This is the SQLAlchemy code to do so:

query = select([db.country.c.name]).where(
    db.country.c.gnp / db.country.c.population > select(
        [(db.country.c.gnp / db.country.c.population)]
    ).where(
            db.country.c.name == 'United Kingdom'
    ).as_scalar()
).where(
    db.country.c.continent == 'Europe'
)

The SQLAlchemy query looks pretty odd, for somebody who knows python but isn’t familiar with the library. This is because SQLAlchemy cannot “lift” Python code into an AST to manipulate, and instead have to construct the AST manually using python objects. Although it works pretty well, the syntax and semantics of the queries is completely different from python.

Already we are bumping into edge cases: the db.country in the nested query is referred to the same way as the db.country in the outer query, although they are clearly different! One may wonder, what if, in the inner query, we wish to refer to the outer query’s values? Naturally, there will be solutions to all of these requirements. In the end, SQLAlchemy ends up effectively creating its own mini programming language, with its own concept of scoping, name binding, etc., basically duplicating what Python already has but with messier syntax and subtly different semantics.

In the equivalent PINQ code, the scoping of which db.country you are referring to is much more explicit, and in general the semantics are identical to a typical python comprehension:

query = sql[(
    x.name for x in db.country
    if x.gnp / x.population > (
        y.gnp / y.population for y in db.country
        if y.name == 'United Kingdom'
    ).as_scalar()
    if (x.continent == 'Europe')
)]

As we can see, rather than mysteriously referring to the db.country all over the place, we clearly bind it in two places: once to the variable x in the outer query, once to the variable y in the inner query. Overall, we make use of Python’s syntax and semantics (scoping, names, etc.) rather than having to re-invent our own, which is a big win for anybody who already understands Python.

Executing either of these will give us the same answer:

print(query)
# SELECT country_1.name
# FROM country AS country_1
# WHERE country_1.gnp / country_1.population > (SELECT country_2.gnp / country_2.population AS anon_1
# FROM country AS country_2
# WHERE country_2.name = ?) AND country_1.continent = ?

results = engine.execute(query).fetchall()

for line in results: print(line)
# (u'Austria',)
# (u'Belgium',)
# (u'Switzerland',)
# (u'Germany',)
# (u'Denmark',)
# (u'Finland',)
# (u'France',)
# (u'Iceland',)
# (u'Liechtenstein',)
# (u'Luxembourg',)
# (u'Netherlands',)
# (u'Norway',)
# (u'Sweden',)

Although PINQ does not support the vast capabilities of the SQL language, it supports a useful subset, like JOINs:

# The number of cities in all of Asia
query = sql[(
    func.count(t.name)
    for c in db.country
    for t in db.city
    if t.country_code == c.code
    if c.continent == 'Asia'
)]
print(query)
# SELECT count(city_1.name) AS count_1
# FROM city AS city_1, country AS country_1
# WHERE city_1.country_code = country_1.code AND country_1.continent = ?

result = engine.execute(query).fetchall()

print(result)
[(1766,)]

As well as ORDER BY, with LIMIT and OFFSET:

# The top 10 largest countries in the world by population
query = sql[
    (c.name for c in db.country)
    .order_by(c.population.desc())
    .limit(10)
]

print(query)
# SELECT country_1.name
# FROM country AS country_1
# ORDER BY country_1.population DESC
# LIMIT ? OFFSET ?

res = engine.execute(query).fetchall()
for line in res:
    print(line)
# (u'China',)
# (u'India',)
# (u'United States',)
# (u'Indonesia',)
# (u'Brazil',)
# (u'Pakistan',)
# (u'Russian Federation',)
# (u'Bangladesh',)
# (u'Japan',)
# (u'Nigeria',)

You can emulate a select like SELECT * FROM country by using:

.. code:: python
query = sql[(row for row in db.country)]

i.e. if you don’t specify any column name, the comprehension will become equivalent to selecting all the columns.

In general, apart from the translation of generator expressions (and their guards) into SELECT an WHERE clauses, the rest of the functionality of SQL (like the .order_by(), .limit(), etc. functions shown above) is accessed as in the SQLAlchemy Expression Language. See the unit tests for a fuller set of examples of what PINQ can do, or browse the SQLAlchemy docs mentioned earlier.

PINQ demonstrates how easy it is to use macros to lift python snippets into an AST and cross-compile it into another language, and how nice the syntax and semantics can be for these embedded DSLs. PINQ’s entire implementation comprises about 100 lines of code, which really isn’t much considering how much it does for you!