Showing posts with label sqlobject. Show all posts

Friday, May 15, 2009

The SQLObject attribute collection functionality

One of the many available Python database object-relational mappers, SQLObject has some interesting meta class functionality. There are many ways to declare a database table using an object-relational mapper such as SQLObject. However, since the mapper already deals with declarative meta data in the database, it is a logical step to treat the Python declarations as meta data as well. A developer using SQLObject to declare a database table defines a class that extends the SQLObject class. The fields of the database table then correspond to the class attributes of the newly derived class. These can't just be any primitive Python type attributes; they have to be of a specific SQLObject column type. When dealing with meta attributes such as these, much examination of the classes is required, and SQLObject has functionality that will collect meta attributes from classes.

In the context of SQLObject, the rationale behind class attribute collection is simple. The SQLObject base classes are not always going to know which parameters are used with a given class. When instantiating SQLObject instances, the keyword parameters correlate to the fields declared for that instance's database table. In several scenarios, SQLObject needs to inspect a given class and return a list of the attributes that were passed to the constructor. This list of parameters is unknown to the class at that point, although the attribute collection functionality assumes that each attribute in the list is a class attribute of the class being inspected. Since much of the behavior invoked on SQLObject instances is transformed and called by meta classes, the attributes may not necessarily be needed by the instance. The attribute collection functionality therefore has an opportunity to remove these attributes from the class and perform any other transformations it sees fit. It also accepts parameters to help it determine what exactly it is modifying.

The _collectAttributes() function is where this functionality is implemented. The parameters that can be passed to this function are as follows.

  • cls - This is the actual class that is being inspected. It is assumed that all attributes being collected are a part of this class.
  • new_attrs - This parameter is the dictionary of keyword parameters that are passed to the constructor of SQLObject instances. The values in this dictionary are SQLObject column types.
  • look_for_class - The column type to look for. Only attributes of this type will be returned.
  • delete - If true, the collected attributes will be deleted from the class.
  • set_name - The actual attribute name will change to the key of the dictionary item corresponding to the attribute.
  • sort - If true, a sorted list will be returned. The sorting is based on field creation order.
Illustrated below is an over-simplified conceptualization of how the _collectAttributes() function is executed.
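This sketch is an approximation only; the collect_attributes name, its exact signature, and the Col stand-in class are mine, not SQLObject's real internals.

```python
import itertools

_creation_counter = itertools.count()

class Col:
    """Stand-in for a SQLObject column type."""
    def __init__(self):
        self.name = None
        self.creation_order = next(_creation_counter)

def collect_attributes(cls, new_attrs, look_for_class,
                       delete=False, set_name=False, sort=False):
    """Conceptual sketch of SQLObject's _collectAttributes()."""
    collected = []
    for name, value in list(new_attrs.items()):
        if isinstance(value, look_for_class):
            if set_name:
                value.name = name          # rename after the dictionary key
            if delete and hasattr(cls, name):
                delattr(cls, name)         # strip the attribute from the class
            collected.append(value)
    if sort:
        # field creation order, tracked per column instance
        collected.sort(key=lambda col: col.creation_order)
    return collected
```

Only attributes whose values are instances of look_for_class are collected; anything else in new_attrs is ignored.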

Friday, March 13, 2009

How Pylons connects to the ORM

The Pylons Python web application framework manages database connections for any given application written in the framework. For SQLObject, it does so using a PackageHub class, which is similar to how TurboGears manages database connections. The database functionality for the Pylons web framework is defined in the database.py module. Things are slightly different for SQLAlchemy support: in order to provide it, Pylons defines several functions SQLAlchemy requires to connect to the database.

Something I find a little strange is the way the support for both SQLAlchemy and SQLObject is handled by Pylons. Pylons will attempt to import SQLAlchemy and handle the import error. However, Pylons will always attempt to import SQLObject and will not handle an import failure if the library isn't installed on the system. For instance, the following is a high level view of how the database ORM libraries are imported in Pylons.
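The high-level view can be sketched as follows; the optional_import helper is mine, standing in for the try/except that Pylons wraps around the SQLAlchemy import (the real database.py imports the modules directly).

```python
import importlib

def optional_import(name):
    """Import a module, returning None when it isn't installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# SQLAlchemy's absence is tolerated...
sqlalchemy = optional_import("sqlalchemy")

# ...but SQLObject is imported unconditionally, so a missing install
# raises ImportError and takes the application down with it:
# import sqlobject
```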
There is a slight asymmetry here. At the very least, I think SQLObject import errors should be handled as well. But what would happen if no supported ORM library were available for import? That would obviously leave us with a non-functional web application. A nice feature to have, and this really isn't Pylons-specific, is the ability to specify through configuration which ORM library is preferred. The database.py module could then base its imports on this preference. For instance, illustrated below is how the ORM importing mechanism in Pylons might work if the ORM preference could be specified.
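A sketch of what preference-driven importing might look like. The 'orm' configuration key, the default, and the load_preferred_orm name are all hypothetical, not part of Pylons.

```python
import importlib

DEFAULT_ORM = "sqlobject"  # hypothetical default preference

def load_preferred_orm(config):
    """Import whichever ORM the application configuration names."""
    name = config.get("orm", DEFAULT_ORM)  # 'orm' is a hypothetical key
    try:
        return importlib.import_module(name)
    except ImportError:
        # complain loudly instead of silently falling back
        raise RuntimeError("preferred ORM %r is not installed" % name)
```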

Here, the flow is quite simple. We load the configuration data, check which ORM was specified, and attempt to import it. On import failure, we complain about the ORM not being available. Of course, we will most likely want a default ORM preference if one is not provided. I think that would provide a much cleaner design than basing the ORM preference on what happens to be importable. There is certain enhancement functionality, such as a plugin system, where basing availability on whether a library can be imported makes sense. But these are only enhancements. We can't really make these assumptions about a core component like a database connection manager.

The SQLAlchemy connection functionality in Pylons has no notion of a connection hub. There is absolutely no problem with this. The functions are what is needed to establish a SQLAlchemy connection, and they work. For illustration purposes, let's make up a pretend class that doesn't exist, called SQLAlchemyHub, that groups all SQLAlchemy-related functions together. The following is what we would end up with when visualizing the database.py module.
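Visualized as the pretend class, the module might look something like this. Every method name here is illustrative only, since Pylons' database.py actually defines free functions rather than a class.

```python
class SQLAlchemyHub:
    """Imaginary grouping of the SQLAlchemy helpers in database.py.

    The method names are illustrative, not the real Pylons API.
    """

    def __init__(self, dburi):
        self.dburi = dburi
        self._engine = None

    def get_engine(self):
        # would lazily call sqlalchemy.create_engine(self.dburi)
        if self._engine is None:
            self._engine = object()  # placeholder for the real engine
        return self._engine

    def make_session(self):
        # would build a session bound to the shared engine
        return {"bound_to": self.get_engine()}
```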

It is important to remember that the SQLALchemyHub class isn't real. It is there to help us visualize the SQLAlchemy abstraction within the context of the module.

Thursday, March 12, 2009

ECP and SQLObject

It seems that every day we have a new reason to move away from using SQLObject as an object-relational mapper in ECP. The latest issue with SQLObject has been rather challenging to work around. The problem comes from the ErrorMessage class defined in SQLObject. Here is what the class looks like.
#SQLObject ErrorMessage class.

class ErrorMessage(str):
    def __new__(cls, e, append_msg=''):
        obj = str.__new__(cls, e[1] + append_msg)
        obj.code = int(e[0])
        obj.module = e.__module__
        obj.exception = e.__class__.__name__
        return obj
The problem we are experiencing with ECP is that this class always raises an IndexError. The reason is that the ErrorMessage.__new__() method assumes the 0 and 1 indices will always be available in the e parameter. The e parameter is supposed to be an instance of Exception.
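The failure mode is easy to reproduce without SQLObject installed. In Python 2, indexing an exception (e[1]) meant indexing e.args, so any exception constructed with fewer than two arguments trips the assumption:

```python
# ErrorMessage.__new__ assumes e.args has at least two items
# (in Python 2, e[1] was shorthand for e.args[1]).
e = Exception("connection refused")  # args == ("connection refused",)

try:
    code, text = e.args[0], e.args[1]  # what ErrorMessage effectively does
    outcome = "ok"
except IndexError:
    outcome = "IndexError"
```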

The question that now arises is how we handle this. In this case, we have exceptions being raised by other exceptions. The ErrorMessage class could simply be fixed by adding exception handling for IndexError exceptions. However, once the error is fixed, how do we ship the fix along with our application? ECP will currently install SQLObject from PyPI. One solution would be to build our own SQLObject package and point the ECP setup to a custom repository containing this patched version. One problem I find with this solution is that it could potentially introduce a myriad of other deployment problems.

Another solution is to perform the patch inside of ECP. In this scenario, we don't actually patch the SQLObject package. The SQLObject package would remain as is on the system so that other Python applications using SQLObject wouldn't experience any side-effects as a result of ECP providing a different SQLObject. And this is the approach we are taking. Once ECP has started up, we import the mysqlconnection module and replace ErrorMessage entirely. Here is how it is done.
#ECP approach to patching SQLObject.

from sqlobject.mysql import mysqlconnection

class NewErrorMessage(str):
    def __new__(cls, e):
        if not isinstance(e, Exception):
            e = e.args[0]
        else:
            try:
                dummy = e[1]
            except IndexError:
                e = e.args[0]

        obj = str.__new__(cls, e[1])
        obj.code = int(e[0])
        obj.module = e.__module__
        obj.exception = e.__class__.__name__
        return obj

mysqlconnection.ErrorMessage = NewErrorMessage
What is shown here is a new implementation of the ErrorMessage class: NewErrorMessage. The interface of the original class is kept intact. What has changed is the exception handling inside the exception class itself. We first test whether the e parameter is in fact an Exception instance. Next, we test for IndexError exceptions and rebuild the e parameter if necessary. The method then continues on as in the original implementation. Finally, we replace the ErrorMessage class with NewErrorMessage. This all happens in enomalism2d so that the new error message class is available right away, before it is actually needed.

As an afterthought, I'm wondering what led SQLObject to this issue to begin with. That is, how can a class so tightly associated with exception handling be the culprit for bigger problems such as this one? Is it that this class is taking on too many responsibilities and thus adding to the risk of raising unforeseen exceptions itself? I wouldn't think so. The ErrorMessage.__new__() method isn't exactly overwhelmed with code. Besides, the exceptions defined in ECP do a fair amount of work when instantiated (including interacting with SQLObject). When the ECP exceptions are raised, they never raise inadvertent exceptions.

Perhaps special care needs to be taken when defining exceptions that do any work. If nothing else, SQLObject provides us with a lesson learned. As developers, we need to be absolutely certain that any exception we define ourselves can be raised under any circumstances. They cannot fail. It would obviously be nice if no code failed at all throughout an entire application, but that is not a reality. The code will fail at some point, and having stable exceptions to deal with brings your code one step closer to being fail-safe.

Monday, January 26, 2009

ORM strengths and shortcomings

Object-Relational Mapper technology is used in object-oriented languages to try to reduce the amount of SQL contained inside application logic. ORM libraries exist for several dynamically-typed languages. Two popular Python ORM libraries are SQLObject and SQLAlchemy. The basic idea behind an ORM is that persistent objects within an application are mapped to a database table. The table schema is derived from the class declaration of the object to be stored.

For instance, here is an example of a BlogEntry class using SQLObject:
#SQLObject declaration example.

from sqlobject import *

class BlogEntry(SQLObject):
    """An abstraction representing a BlogEntry."""

    class sqlmeta:
        table = 'blog_entry'
        idName = 'id'

    # gen_uuid is assumed to be a helper, defined elsewhere,
    # that returns a new UUID string.
    uuid = StringCol(length=36,
                     unique=True,
                     alternateID=True,
                     alternateMethodName='by_uuid',
                     default=gen_uuid)
    title = StringCol(length=80, default="Title Placeholder")
    body = StringCol(default="")
    user = ForeignKey('User', default=None)
Here, we have a blog entry abstraction. The BlogEntry class defines an inner class called sqlmeta. This class is used to specify table-specific information used by the database when the table is created. For instance, the underlying table in the example will be called blog_entry and will use the id column as the primary key. We have also defined several columns for our class. These columns will serve as attributes for any instances of this class. Once an instance of BlogEntry is created, the ORM will automatically create a table row in the database.
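The mapping idea itself can be shown in a few lines of plain Python. This toy sketch (none of it is SQLObject code) derives a CREATE TABLE statement from a class declaration, which is essentially what the ORM does behind the scenes:

```python
# Toy illustration of the core ORM idea: the table schema is derived
# from the class declaration. Not SQLObject internals.
class Column:
    def __init__(self, sqltype):
        self.sqltype = sqltype

class BlogEntryDecl:
    table = "blog_entry"
    title = Column("VARCHAR(80)")
    body = Column("TEXT")

def create_table_sql(cls):
    """Build a CREATE TABLE statement from the Column class attributes."""
    cols = ["id INTEGER PRIMARY KEY"]
    for name in sorted(vars(cls)):
        value = vars(cls)[name]
        if isinstance(value, Column):
            cols.append("%s %s" % (name, value.sqltype))
    return "CREATE TABLE %s (%s)" % (cls.table, ", ".join(cols))
```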

I consider this to be a real strength of ORM technology. It drastically simplifies the abstraction storage requirement. There is no need to write SQL CREATE statements, or INSERT and UPDATE statements for that matter. There is nothing specifically wrong with SQL; it is extremely expressive and powerful. The problem arises when combining SQL with application logic in an interleaved manner, which leads to unmaintainable systems.

One approach to decoupling the SQL required for persistent objects from the behavior implemented by the objects in an object-oriented system is to define SQL templates. For example, we might have a define_blog_entry.sql template file. This file could then be read by some database module that executes the SQL. The developer would then write several other templates for UPDATE, INSERT, and various other database activities. ORM libraries handle this work very well: they provide a very transparent layer that manages persistence.

OK, so how about querying? How do we get our objects back from the database? Well, the ORM also does this. From the example, our BlogEntry class inherits a select() class method from the SQLObject class. Using this method we can pass various criteria in order to retrieve BlogEntry instances.

I think this is the key weakness in the ORM. In a large percentage of cases, it serves well. All we want to retrieve are blog entries. What about when we need multiple types of objects? There is really no way to do this. At least not sensibly. In our example, all we can do is BlogEntry.select(). There is no BlogEntryUser.select() method to retrieve BlogEntry instances and User instances in the same query. Multiple types means multiple queries in ORM land.
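For contrast, here is the kind of single-query, multi-table retrieval that plain SQL makes trivial, shown with the standard library's sqlite3 module. The schema is a simplified stand-in for the blog example above.

```python
import sqlite3

# One JOIN query returns data from both tables at once -- the case
# where a per-class select() forces multiple queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blog_entry (id INTEGER PRIMARY KEY,
                             title TEXT, user_id INTEGER);
    INSERT INTO user VALUES (1, 'alice');
    INSERT INTO blog_entry VALUES (1, 'Hello', 1);
""")
rows = conn.execute("""
    SELECT blog_entry.title, user.name
    FROM blog_entry JOIN user ON blog_entry.user_id = user.id
""").fetchall()
# each row carries columns from both tables
```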

SQL and relational databases are indispensable, especially the SELECT statement. It is by far the most effective way to retrieve complex data. ORM technology has done a great job exploiting most of the power SQL has to offer. I just don't think the querying functionality is as flexible as it could be in most cases.

Saturday, January 10, 2009

TurboGears 2.0 and SQLObject support

Since I use the TurboGears Python framework quite often, I was curious to see whether SQLObject would still be supported by TurboGears 2.0. Searching around on the web yielded no results, so I took a look at the TurboGears source.

It does not look like SQLObject will be around in TurboGears 2.0.

Tuesday, July 15, 2008

Flexible SQLObject queries

Although SQLObject is considered an ORM (Object Relational Mapper), it can still be used for traditional SQL queries. The question is, why bother adding this ORM overhead when the queries can be executed directly by a SQL library?

For anything but the most trivial queries, SQL syntax is still the superior method to manipulate data sets. When using a SQL library in Python without an ORM, the result sets need to be constructed by hand. This boilerplate code can be handled by SQLObject while still having the ability to use more elegant, or in some cases extremely complex, SQL. For example, if you have defined a couple of SQLObject classes, blogAccount and blogEntry, we can do something similar to:

result = blogEntry.select(
    """blog_account.id = blog_entry.blog_account_id
       AND blog_account.user_id = %s""" % getCurrentUserID(),
    clauseTables=['blog_account'])

This result set will contain a list of blogEntry instances. This is useful because no code was needed to construct the result set. This is obviously not the world's most complex query, but it does illustrate that fancy SQL queries can still be combined with the concept of an ORM.