
Wednesday, May 20, 2009

SQLAlchemy Bind and Result Processors

With any given object-relational mapper, the basic use case is the ability to map abstract data types to tables and instances of these types to rows in those tables. There is also a need to map primitive programming language types to primitive database column types. This is by no means easily achieved, especially if the object-relational mapper in question supports multiple databases. Any popular object-relational mapper will support more than a single database; if an ORM supported only one database, there really wouldn't be much need for the ORM to begin with. In fact, if only a single database needed to be supported, it might be beneficial not to use an ORM at all, because of the overhead that would be removed. This is rarely the case, so developers can generally assume that support for multiple database technologies is a given. Going back to the problem of primitive type mapping, how would an ORM accomplish this? One approach might be to implement the specific column types for each supported database. This would mean much duplicated functionality and would not be good object-oriented practice. SQLAlchemy takes a different approach to handling the very disparate column types found across database technologies.

SQLAlchemy defines two abstract type classes: AbstractType and TypeEngine. These classes not only provide the base type interfaces used in SQLAlchemy, but also supply the default implementations for their methods. When mapping a primitive programming language type to a primitive database column type, the mapping can go in two directions: when supplying bind parameters to SQL queries, values start out as primitive programming language types; when retrieving results from an executed query, the mapping goes from the primitive database column type back to the primitive programming language type.

The AbstractType class defines two methods to accomplish this: bind_processor() and result_processor(). Both methods accept a database dialect parameter so they know which database technology they are dealing with, and the idea is to return a callable that performs the necessary transformations before the data is used in a query or read from a result set. In other words, bind_processor() returns a function that makes the bind parameters used in a query actually usable, while result_processor() applies the inverse concept. The SQLAlchemy String type is a good example of why these methods are useful: with strings, there are many questions of encoding to settle before they can be used with a database.
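The pattern can be sketched in plain Python. FakeDialect and SketchStringType below are invented stand-ins, not SQLAlchemy's real classes, but they follow the shape described above: each processor method takes a dialect and hands back a callable.

```python
# A minimal sketch of the bind/result processor pattern. "FakeDialect"
# and "SketchStringType" are illustrative names only; the real
# implementations live in sqlalchemy.types.

class FakeDialect:
    """Stands in for a database dialect; here it only carries an encoding."""
    encoding = "utf-8"

class SketchStringType:
    def bind_processor(self, dialect):
        # Returns a callable that prepares a Python value for use as a
        # bind parameter -- here, encoding unicode text for the database.
        def process(value):
            return value.encode(dialect.encoding) if value is not None else None
        return process

    def result_processor(self, dialect):
        # The inverse: a callable that converts a raw database value
        # back into the Python-side type.
        def process(value):
            return value.decode(dialect.encoding) if value is not None else None
        return process

dialect = FakeDialect()
col_type = SketchStringType()
to_db = col_type.bind_processor(dialect)
from_db = col_type.result_processor(dialect)
stored = to_db("café")        # bytes headed into the query
roundtrip = from_db(stored)   # text coming back out of a result row
```

Returning a callable, rather than transforming the value directly, lets the type build the processor once per statement and then apply it cheaply to every row.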

One implementation note on SQLAlchemy's AbstractType and TypeEngine classes: TypeEngine directly inherits from AbstractType, yet it unnecessarily overrides the bind_processor() and result_processor() methods. Both the signature and the implementation of the two methods are the same in both classes; all that changes is the API documentation.

Friday, March 20, 2009

SQL engine dialect in SQLAlchemy

Object-relational mapping technology is used in object-oriented systems to provide a layer between the objects in the system and the underlying relational database. ORMs help reduce the time investment, and thus the cost, during development, because developers can keep thinking in an object-oriented way while interacting with the database. Tables in the database are classes in the code; instances of those classes are rows in the database tables. Of course, this is an over-simplification: it isn't as straightforward as generating standardized SQL statements based on what an instance is doing in the code. Not all database systems use the same SQL, and to further complicate matters, the various popular database systems don't even support the same features.

In any given language, there are different dialects. For instance, if two people were engaged in a conversation, one from the west coast of the country and the other from the east, there is bound to be some ambiguity even though they are speaking the same language, whether from regional politics or cultural customs. The same effect happens in technology. The language is SQL, and while some database systems are very similar in many ways, they can be just different enough that an application needs a huge amount of duplicated code to support more than one of them. These are systems that are supposed to speak the same language. SQLAlchemy addresses this issue by introducing the concept of a dialect: every database system supported by SQLAlchemy defines one. This isn't a dialect for the SQL language alone but for the entire database system; we can do a lot more than merely execute SQL statements with SQLAlchemy, so this is a requirement of the Dialect interface. This interface specifies what each dialect for a specific database must look like. For instance, a dialect specifies whether its database accepts unicode SQL statements. SQLAlchemy also has other classes that handle communication with the database, such as preparing SQL statements and generating schemas; all of these are specified in the Dialect interface.

SQLAlchemy also defines a DefaultDialect class, which implements the Dialect interface. This class provides concrete versions of some of the methods and attributes specified by the interface, the ones that are common across all supported database systems. The attributes and methods that are specialized for a given database are simply overridden by that particular dialect implementation.

When a connection is established in SQLAlchemy, a new engine instance is constructed. This engine then uses the dialect specified in the connection string; for example, mysql is a dialect. Method calls made through the constructed engine instance are delegated to the dialect, which acts accordingly for the database system it is driving.

The Dialect interface and the implemented dialect classes in SQLAlchemy serve as a good example of polymorphism in object-oriented programming. There is a single interface that is used to invoke behavior, and that behavior varies by type. This is a resilient design because it is loosely coupled and highly replaceable. We can implement our own database dialect and plug it into SQLAlchemy if we are so inclined.
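This polymorphism can be sketched in a few lines of plain Python. The classes below are simplified stand-ins, not SQLAlchemy's actual dialect hierarchy, though MySQL really does quote identifiers with backticks while most databases use double quotes.

```python
# A sketch of the Dialect pattern: an interface, a default
# implementation with shared behavior, and a subclass that
# overrides only what differs for its database.

class Dialect:
    """The interface: what every dialect must answer for."""
    supports_unicode_statements = False

    def quote_identifier(self, name):
        raise NotImplementedError

class DefaultDialect(Dialect):
    """Shared behavior: most databases quote identifiers with double quotes."""
    def quote_identifier(self, name):
        return '"%s"' % name

class MySQLishDialect(DefaultDialect):
    """Overrides only what differs: MySQL quotes with backticks."""
    def quote_identifier(self, name):
        return "`%s`" % name

def render_column(dialect, name):
    # Callers program against the interface; the behavior they get
    # depends on which dialect instance was handed in.
    return dialect.quote_identifier(name)
```

Calling render_column(DefaultDialect(), "user") yields "user" in double quotes, while render_column(MySQLishDialect(), "user") yields it in backticks; the caller never changes.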

Friday, March 13, 2009

How Pylons connects to the ORM

The Pylons Python web application framework manages database connections for any given application written in the framework. It does so by using a PackageHub class for SQLObject, which is actually similar to how TurboGears manages database connections. The database functionality for the Pylons web framework is defined in the database.py module. Things are slightly different for SQLAlchemy support in Pylons: in order to provide it, Pylons attempts to define the several functions SQLAlchemy requires to connect to the database.

Something I find a little strange is the way support for both SQLAlchemy and SQLObject is handled by Pylons. Pylons attempts to import SQLAlchemy and handles the import error. However, Pylons always attempts to import SQLObject and does not handle the import failure if that library isn't installed on the system. For instance, the following is a high-level view of how the database ORM libraries are imported in Pylons.
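The asymmetric handling just described can be sketched as follows; import_orm is an invented helper for illustration, not Pylons' actual code.

```python
# Sketch of the asymmetric ORM import handling: one library's absence
# is tolerated, the other's is not.
import importlib

def import_orm(name, required):
    """Import an ORM module; tolerate absence only when not required."""
    try:
        return importlib.import_module(name)
    except ImportError:
        if required:
            raise  # an unguarded import would propagate the same way
        return None

# Mirrors the behavior described above:
sqlalchemy_mod = import_orm("sqlalchemy", required=False)  # None if missing
# sqlobject_mod = import_orm("sqlobject", required=True)   # raises if missing
```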
There is a slight asymmetry here. At the very least, I think SQLObject import errors should be handled as well. But what would happen in the event that no supported ORM library is available for import? That would obviously leave us with a non-functional web application. A nice feature to have, and this really isn't Pylons-specific, is the ability to specify through configuration which ORM library is preferred. The database.py module could then base its imports on this preference. For instance, illustrated below is how the ORM importing mechanism in Pylons might work if the ORM preference could be specified.
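A preference-driven version might look like this; the "orm" configuration key and the import_preferred_orm helper are invented for illustration, not part of Pylons.

```python
# Hypothetical preference-based ORM import: the configuration decides
# which library to load, with a sensible default.
import importlib

def import_preferred_orm(config):
    """Import the ORM named in the configuration, defaulting to SQLAlchemy."""
    preference = config.get("orm", "sqlalchemy")
    try:
        return importlib.import_module(preference)
    except ImportError:
        # Complain loudly instead of silently falling back.
        raise RuntimeError("preferred ORM %r is not installed" % preference)
```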

Here, the flow is quite simple. We load the configuration data, check which ORM was specified, and attempt to import it. On import failure, we complain about the ORM not being available. Of course, we will most likely want a default ORM preference if one is not provided. I think that would be a much cleaner design than basing the ORM preference on what can be imported. There is certain enhancement functionality whose availability can reasonably be based on whether a library can be imported, such as a plugin for a system. But those are only enhancements; we can't really make such assumptions about a core component like a database connection manager.

The SQLAlchemy connection functionality in Pylons has no notion of a connection hub. There is absolutely no problem with this: the functions are what is needed to establish a connection with SQLAlchemy, and they work. For illustration purposes, let's make up a pretend class that doesn't exist, called SQLAlchemyHub, that groups all the SQLAlchemy-related functions together. The following is what we would end up with when visualizing the database.py module.
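Such a grouping might be sketched like this. SQLAlchemyHub and every method on it are invented for visualization only; neither Pylons nor SQLAlchemy defines them, and placeholders stand in for the real engine and session objects so the sketch runs anywhere.

```python
# Purely illustrative: a class-shaped view of database.py's
# module-level SQLAlchemy helpers. Nothing here is real Pylons code.

class SQLAlchemyHub:
    """Hypothetical grouping of the SQLAlchemy connection functions."""

    def __init__(self, uri):
        self.uri = uri
        self.engine = None

    def create_engine(self):
        # Real code would call sqlalchemy.create_engine(self.uri);
        # a tuple placeholder stands in for the engine object.
        self.engine = ("engine", self.uri)
        return self.engine

    def make_session(self):
        # Real code would bind a Session to the engine.
        if self.engine is None:
            self.create_engine()
        return ("session", self.engine)
```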

It is important to remember that the SQLAlchemyHub class isn't real. It is there to help us visualize the SQLAlchemy abstraction within the context of the module.

Monday, January 26, 2009

ORM strengths and shortcomings

Object-Relational Mapper technology is used in object-oriented languages to try to reduce the amount of SQL contained inside application logic. ORM libraries exist for several dynamically-typed languages. Two popular Python ORM libraries are SQLObject and SQLAlchemy. The basic idea behind an ORM is that persistent objects within an application are mapped to a database table. The table schema is derived from the class declaration of the object to be stored.

For instance, here is an example of a BlogEntry class using SQLObject:

#SQLObject declaration example.

from uuid import uuid4

from sqlobject import *

def gen_uuid():
    """Generate a string UUID as the default for new rows."""
    return str(uuid4())

class BlogEntry(SQLObject):
    """An abstraction representing a BlogEntry."""

    class sqlmeta:
        table = 'blog_entry'
        idName = 'id'

    uuid = StringCol(length=36,
                     unique=True,
                     alternateID=True,
                     alternateMethodName='by_uuid',
                     default=gen_uuid)
    title = StringCol(length=80, default="Title Placeholder")
    body = StringCol(default="")
    user = ForeignKey('User', default=None)
Here, we have a blog entry abstraction. The BlogEntry class defines an inner class called sqlmeta, which is used to specify table-specific information the database needs when the table is created. For instance, the underlying table in the example will be called blog_entry and will use the id column as its primary key. We have also defined several columns for our class; these columns serve as attributes on any instance of the class. Once an instance of BlogEntry is created, the ORM will automatically create a corresponding table row in the database.

I consider this to be a real strength of ORM technology. It drastically simplifies the storage requirements of an abstraction. There is no need to write SQL CREATE statements, or INSERT and UPDATE statements for that matter. There is nothing specifically wrong with SQL; it is extremely expressive and powerful. The problem arises when SQL is combined with application logic in an interleaved manner, which leads to unmaintainable systems.

One approach to decoupling the SQL required for persistent objects from the behavior implemented by those objects is to define SQL templates. For example, we might have a define_blog_entry.sql template file. This file could then be read by some database module that executes the SQL. The developer would then write several other templates for UPDATE, INSERT, and various other database activities. ORM libraries achieve this decoupling very well on their own: they provide a transparent layer that manages persistence.

OK, so how about querying? How do we get our objects back from the database? Well, the ORM does this too. In the example above, our BlogEntry class inherits a select() class method from the SQLObject class. Using this method, we can pass various criteria in order to retrieve BlogEntry instances.

I think this is the key weakness of the ORM. In a large percentage of cases, it serves well: all we want to retrieve are blog entries. But what about when we need multiple types of objects? There is really no way to do this, at least not sensibly. In our example, all we can do is BlogEntry.select(); there is no BlogEntryUser.select() method to retrieve BlogEntry instances and User instances in the same query. Multiple types means multiple queries in ORM land.

SQL and relational databases are indispensable, especially the SELECT statement. It is by far the most effective way to retrieve complex data. ORM technology has done a great job exploiting most of the power SQL has to offer; I just don't think the querying functionality is as flexible as it could be in most cases.

Saturday, January 10, 2009

TurboGears 2.0 and SQLObject support

Since I use the TurboGears Python framework quite often, I was curious to see whether SQLObject would still be supported in TurboGears 2.0. Searching around on the web yielded no results, so I took a look at the TurboGears source.

It does not look like SQLObject will be around in TurboGears 2.0.