Tuesday, August 18, 2009

Trac Database Connection Wrapping

Web applications written in Python tend to use some kind of object-relational mapping technology such as SQLObject or SQLAlchemy. Trac, however, has not introduced such dependencies, nor does it define an object-relational mapper of its own. Trac uses the more traditional database connection and database cursor abstractions. This isn't to say that there aren't any higher-level abstractions defined. There are, and they are designed quite well and are therefore easy to comprehend.

Of course, object-relational mapping technology isn't the be-all and end-all of all web applications. In fact, if you can do without the overhead involved you may be better off as long as you can clearly define a separation of concerns.

For instance, Trac defines database drivers for MySQL as well as other databases. Multiple database support is typically the selling point for object-relational technology. Trac's approach is to define two base classes for the core database abstractions: IterableCursor and ConnectionWrapper. These classes provide the generic functionality shared by all database drivers. A brief idea of the design is illustrated below.
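The following is a minimal sketch of that idea, not Trac's actual source; the wrappers delegate to the underlying DB-API objects and add iteration on top.

#A rough sketch of connection and cursor wrapping in the style of Trac.
class IterableCursor(object):
    """Wrap a DB-API cursor so query results can be iterated over."""
    def __init__(self, cursor):
        self.cursor = cursor

    def __getattr__(self, name):
        #Delegate everything else to the wrapped cursor.
        return getattr(self.cursor, name)

    def __iter__(self):
        #Yield rows until the cursor is exhausted.
        while True:
            row = self.cursor.fetchone()
            if row is None:
                return
            yield row

class ConnectionWrapper(object):
    """Wrap a DB-API connection; each database driver subclasses this."""
    def __init__(self, cnx):
        self.cnx = cnx

    def __getattr__(self, name):
        #Delegate everything else to the wrapped connection.
        return getattr(self.cnx, name)

    def cursor(self):
        #Hand back cursors pre-wrapped for iteration.
        return IterableCursor(self.cnx.cursor())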

Thursday, May 21, 2009

Persisting Python Objects With pod.

Some Python applications are so lightweight that it is hard to justify incorporating a big RDBMS into the ORM logic. Popular Python ORM packages such as SQLObject and SQLAlchemy support many major relational database management systems, but those systems are hardly suitable for more modest Python packages. The pod Python package offers a nice alternative to the more popular Python ORM technologies. At the core of the pod storage engine is SQLite. Although the other popular ORM technologies also support SQLite, they carry support for the larger RDBMS packages even when you don't need it, so the pod codebase is much smaller than any equivalent. Similar ORM concepts apply when using pod to store Python instances. The following is an example of the pod Python library in use.
#Example: Using the pod Python Object Database.

#Import the library.
import pod

#Initialize the database connection.
db = pod.Db(file='myblog.sqlite3', very_chatty=False)

#The blog schema. Extends the core pod.Object class.
class Blog(pod.Object):
    #Schema fields.
    name = pod.column.String(index=False)

    #Constructor.
    def __init__(self, **kw):
        pod.Object.__init__(self, **kw)

    #Called before any instances of this class are deleted
    #from the database.
    def pre_delete(self):
        #Ensure that all blog entries associated with this
        #blog are also deleted from the database.
        print "Destroying blog entries."
        for entry in BlogEntry.owner == self:
            entry.delete()

#The blog entry schema. Extends the core pod.Object class.
class BlogEntry(pod.Object):
    #Schema fields.
    owner = pod.column.Object(index=True)
    title = pod.column.String(index=False)
    body = pod.column.String(index=False)

    #Constructor.
    def __init__(self, **kw):
        pod.Object.__init__(self, **kw)

#Instantiate the main blog instance.
def make_main_blog():
    print "Creating main blog."
    blog = Blog(name="main")
    db.commit()
    return blog

#Instantiate two blog entry instances.
def make_entries():
    print "Creating blog entries."
    blog_obj = get_main_blog()
    BlogEntry(owner=blog_obj, title="Title1", body="Body1")
    BlogEntry(owner=blog_obj, title="Title2", body="Body2")
    db.commit()

#Retrieve the main blog instance, creating it if not found.
def get_main_blog():
    for blog in Blog.name == "main":
        return blog
    return make_main_blog()

#Retrieve the blog entries associated with the main blog.
def get_entries():
    entries = []
    for entry in BlogEntry.owner == get_main_blog():
        entries.append(entry)
    return entries

#Delete the main blog from the database.
def cleanup():
    print "Destroying main blog."
    blog_obj = get_main_blog()
    blog_obj.delete()
    db.commit()

#Main program.
if __name__ == "__main__":
    #Build the main blog instance.
    blog_obj = get_main_blog()
    #Build the blog entry instances.
    make_entries()
    #Retrieve and display the blog entries.
    for entry in get_entries():
        print "Entry Title:", entry.title
    #Destroy the blog.
    cleanup()

Wednesday, May 20, 2009

SQLAlchemy Bind and Result Processors

With any given object-relational mapper technology, the basic use case is the ability to map abstract data types to tables and instances of those types to rows in the tables. There is also a need to map primitive programming language types to primitive database column types. This is by no means easily achieved, especially if the object-relational mapper in question supports multiple databases. Any popular object-relational mapper will support more than a single database; if only a single database were supported by an ORM, there really wouldn't be a need for the ORM to begin with. In fact, if only a single database needed to be supported, it might be beneficial to not use an ORM at all because of the overhead that would be removed. This is rarely the case, so developers can generally assume that support for multiple database technologies is a given. Going back to the problem of dealing with primitive type mapping, how would an ORM accomplish this? One approach might be to implement the specific column types for each supported database. This would mean much duplicated functionality and would not be good object-oriented practice. SQLAlchemy takes a different approach to handling the very disparate column types between different database technologies.

SQLAlchemy defines two abstract type classes: AbstractType and TypeEngine. These two classes not only provide the base type interfaces used in SQLAlchemy, but also provide the default implementations for their methods. When mapping between a primitive programming language type and a primitive database column type, there are two directions the mapping can take: supplying bind parameters to SQL queries, where the values start out as primitive programming language types, and retrieving results from an executed query, where the mapping goes from the primitive database column type back to the primitive programming language type.

The AbstractType class defines two methods to accomplish this: bind_processor() and result_processor(). Both methods accept a database dialect parameter so they know which database technology they are dealing with, and the idea is to return a callable that performs the necessary transformations before the data is used in a query or result set. In other words, bind_processor() returns a function that makes the bind parameters actually usable in a query, and result_processor() is its inverse. The SQLAlchemy String type is a good example of why these methods are useful; with strings, there are many questions of encoding to settle before they can be used in a database.
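As a rough sketch of the idea, and not SQLAlchemy's actual source, a string type's processors might look something like this, with the encoding handling reduced to its bare bones:

#Simplified sketch of the processor concept for a string type.
class SketchString(object):
    def __init__(self, encoding='utf-8'):
        self.encoding = encoding

    def bind_processor(self, dialect):
        #Return a callable that prepares a Python value for use
        #as a bind parameter in a query.
        def process(value):
            if isinstance(value, unicode):
                return value.encode(self.encoding)
            return value
        return process

    def result_processor(self, dialect):
        #Return a callable that converts a fetched column value
        #back into a Python value.
        def process(value):
            if value is not None:
                return value.decode(self.encoding)
            return value
        return process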

One implementation note on SQLAlchemy's AbstractType and TypeEngine classes: as illustrated below, the TypeEngine class directly inherits from AbstractType, yet it unnecessarily overrides the bind_processor() and result_processor() methods. Both the signature and the implementation of the two methods are the same in both classes. All that changes is the API documentation.
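The structure, with the method bodies reduced to the default no-transformation behavior, looks roughly like this sketch:

#Sketch of the type class structure; not the actual SQLAlchemy source.
class AbstractType(object):
    def bind_processor(self, dialect):
        #Default: no transformation is necessary.
        return None

    def result_processor(self, dialect):
        #Default: no transformation is necessary.
        return None

class TypeEngine(AbstractType):
    #These overrides add nothing; only the docstrings differ.
    def bind_processor(self, dialect):
        return None

    def result_processor(self, dialect):
        return None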

Friday, May 15, 2009

The SQLObject attribute collection functionality

One of the many available Python database object-relational mappers, SQLObject has some interesting metaclass functionality. There are many ways to declare a database table using an object-relational mapper such as SQLObject. However, since the database already deals in declarative metadata, it is a logical step to treat the Python declarations as metadata as well. A developer using SQLObject to declare a database table will define a class that extends the SQLObject class. The fields of the database table then correspond to the class attributes of the newly derived SQLObject class. These can't just be any primitive Python type attributes; they have to be of a specific SQLObject type. When dealing with meta attributes such as these, much examination of these classes is required. SQLObject has functionality which will collect meta attributes from classes.

In the context of SQLObject, the rationale behind a requirement such as class attribute collection is simple. The SQLObject base classes are not always going to know which parameters are used with the class. When instantiating SQLObject instances, the keyword parameters correlate to the fields declared for that instance's database table. In several scenarios, SQLObject needs to inspect a given class and return a list of attributes that were passed to the constructor. This list of parameters is unknown to the class at this point, although the attribute collection functionality assumes that each attribute in the list is a class attribute of the class being inspected. Since much of the behavior invoked on SQLObject instances is transformed and called by metaclasses, the attributes may not necessarily be needed by the instance. The attribute collection functionality in SQLObject has an opportunity to remove these attributes from the class and perform any other transformations it sees fit. It also accepts parameters to help it determine what exactly it is modifying.

The _collectAttributes() function is where this functionality is implemented. The parameters that can be passed to this function are as follows.

  • cls - This is the actual class that is being inspected. It is assumed that all attributes being collected are a part of this class.
  • new_attrs - This parameter is the dictionary of keyword parameters that are passed to the constructor of SQLObject instances. The values in this dictionary are SQLObject column types.
  • look_for_class - The column type to look for. Only attributes of this type will be returned.
  • delete - If true, all attributes will be deleted from the class.
  • set_name - When set, the attribute's name is changed to the key of the dictionary item corresponding to it.
  • sort - If true, a sorted list will be returned. The sorting is based on field creation order.
Illustrated below is an over-simplified conceptualization of how the _collectAttributes() function is executed.
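The sketch below approximates that behavior; it is not SQLObject's actual source, and the creationOrder attribute used for sorting is assumed to exist on each column instance.

#Over-simplified sketch of the _collectAttributes() behavior.
def _collectAttributes(cls, new_attrs, look_for_class,
                       delete=True, set_name=True, sort=False):
    collected = []
    for name, value in new_attrs.items():
        #Only collect attributes of the requested column type.
        if isinstance(value, look_for_class):
            if set_name:
                #Rename the attribute to its dictionary key.
                value.name = name
            collected.append(value)
            if delete and hasattr(cls, name):
                #Remove the attribute from the class itself.
                delattr(cls, name)
    if sort:
        #Order the collected columns by field creation order.
        collected.sort(key=lambda col: col.creationOrder)
    return collected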

Friday, April 3, 2009

Magic methods of the Django QuerySet

The QuerySet class defined in the Django Python web application framework is used to manage query results returned from the database. From the developer perspective, QuerySet instances provide a high-level interface for dealing with query results. Part of the reason for this is that instances of this class behave very similarly to other Python primitive types such as lists. This is accomplished by defining "magic" methods, or, operator overloading. If an operator for a specific instance is overloaded, it simply means that the default behavior for that operator has been replaced when the operand involves this instance. This means that we can do things such as combine two QuerySet instances with the & operator and invoke custom behavior rather than the default behavior. This is useful because it is generally easier and more intuitive to use operators in certain contexts than to invoke methods. In the case of combining two instances, it makes more sense to use the built-in & operator than it does to invoke an equivalent method on the instance. Django provides several "magic" methods that do just this and we will discuss some of them below.

Python instances that need to provide custom pickling behavior need to implement the __getstate__() method. The QuerySet class provides an implementation of this method. The Django QuerySet implementation removes any references to self by copying the __dict__ attribute. Here is an example of how this method might get invoked.
import pickle
pickle.dumps(query_set_obj)

The representation of Python instances is provided by the __repr__() method if it is defined. The Django QuerySet implementation will call the builtin list() function on itself. The size of the resulting list is based on the REPR_OUTPUT_SIZE variable which defaults to 20. The result of calling the builtin repr() function on this new list is then returned. Here is an example of how this method might get invoked.
print query_set_obj

The length of a QuerySet instance can be obtained by calling the builtin len() function with the instance as the parameter. In order for this to work, a __len__() method must be defined by the class. In the case of Django QuerySet instances, the first thing checked is the result cache. The result cache is simply a Python list that exists in memory to try to save on database query costs. If the result cache is empty, it is filled by the QuerySet iterator; if it is not empty, it is extended by the QuerySet iterator. This means that the result cache is updated with any results that should be in the cache but are not at the time of the __len__() invocation. The builtin len() function is then called on the result cache and the result returned. Here is an example of how this method might get invoked.
print len(query_set_obj)

Python allows user-defined instances to participate in iterations. In order to do so, the class must define an __iter__() method. This method defines the behavior for how individual elements in a set are returned in an iteration. The QuerySet implementation of this method first checks if the result cache exists. If the result cache is empty, the iterator defined for the QuerySet instance is returned. The iterator does the actual SQL querying, so in this scenario the iteration loses out on the performance gained by having cached results. However, if a result cache does exist in the QuerySet instance, the builtin iter() function is called on the cache and this new iterator is returned. Here is an example of how the __iter__() method might be invoked.
for row in query_set_obj:
print row

Python instances can also take part in if statements and invoke custom behavior defined by the __nonzero__() method. If an instance defines this method, it will be invoked when the instance is an operand in a truth test. The Django QuerySet implementation of this method first checks for a result cache, as do most of the other "magic" methods. If the result cache does exist, it returns the result of calling the builtin bool() function on the result cache. If there is no result cache yet, the method attempts to retrieve the first item by performing an iteration on the QuerySet instance. If the first item cannot be found, False is returned. Here is an example of how __nonzero__() might be invoked.
if query_set_obj:
print "NOT EMPTY"

Finally, Python instances may take part in & and | expressions. The Django QuerySet class defines both __and__() and __or__() methods, which overload these two operators. When these methods are invoked, they change the underlying SQL query by returning a new QuerySet instance. Here is an example of how both of these methods may be used.
print query_set_obj1 & query_set_obj2
print query_set_obj1 | query_set_obj2
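To tie these together, the following is a minimal sketch of the result cache pattern the methods above share; it is not Django's actual implementation, and the _rows attribute stands in for the SQL-querying iterator.

#Minimal sketch of the QuerySet result cache pattern.
class SketchQuerySet(object):
    def __init__(self, rows):
        #Stand-in for the rows a real QuerySet fetches via SQL.
        self._rows = rows
        self._result_cache = None

    def __len__(self):
        #Fill the cache before measuring it.
        if self._result_cache is None:
            self._result_cache = list(self._rows)
        return len(self._result_cache)

    def __iter__(self):
        if self._result_cache is None:
            #No cache yet; iterate over freshly queried results.
            return iter(self._rows)
        return iter(self._result_cache)

    def __nonzero__(self):
        if self._result_cache is not None:
            return bool(self._result_cache)
        #Try to retrieve the first item.
        for row in self:
            return True
        return False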

Friday, March 20, 2009

SQL engine dialect in SQLAlchemy

Object relational mapping technology is used in object-oriented systems to provide a layer between the objects in the system and the underlying relational database. ORMs help reduce the time investment, and thus the cost, during development because developers can still think in an object-oriented way while interacting with the database. Tables in the database are classes in the code; instances of these classes are rows in the database tables. That is an over-simplification, though. It isn't as straightforward as generating standardized SQL statements based on what an instance is doing in the code, because not all database systems use a standard set of SQL. To further complicate matters, the various popular database systems don't even support the same features.

In any given language, there are different dialects of that language. For instance, if two people were engaged in a conversation, one person from the west coast of the country and the other from the east, there is bound to be some ambiguity even though they are speaking the same language. This could arise from a number of factors such as politics or culturally-related customs. The same effect happens in technology. The language is SQL, and while some database systems are very similar in some ways, they can be just different enough that an application requires a huge amount of duplicated code to support multiple database systems. These are systems that are supposed to speak the same language. SQLAlchemy addresses this issue by introducing the concept of a dialect. Every database system supported by SQLAlchemy defines a dialect, and not just a dialect for the SQL language alone, but for the entire database system; we can do a lot more than merely execute SQL statements with SQLAlchemy, so this breadth is a requirement of the Dialect interface. The interface specifies what each dialect for a specific database must look like. For instance, a dialect can specify whether the database accepts unicode SQL statements or not. SQLAlchemy has other classes that do work related to communicating with the database, such as preparing SQL statements and generating schemas, and all of these are specified in the Dialect interface.

SQLAlchemy also defines a DefaultDialect class which implements the Dialect interface. This class provides concrete implementations of the methods and attributes that are common across all supported database systems. The attributes and methods that are specialized for any given database are simply overridden by that particular dialect implementation.
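The pattern looks roughly like the sketch below; the attribute and method names here are illustrative rather than SQLAlchemy's actual ones.

#Sketch of the dialect override pattern.
class DefaultDialect(object):
    #A default shared by most databases.
    supports_unicode_statements = False

    def table_names(self, connection):
        raise NotImplementedError

class SketchMySQLDialect(DefaultDialect):
    #Override only what differs for MySQL.
    supports_unicode_statements = True

    def table_names(self, connection):
        return [row[0] for row in connection.execute("SHOW TABLES")]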

When a connection is established in SQLAlchemy, it constructs a new engine instance. This engine then uses the dialect specified in the connection string; for example, mysql would be considered a dialect. Method calls made through the constructed database engine instance are then delegated to the dialect, which acts accordingly for the database system it is driving.
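For example, the dialect is selected by the prefix of the URL passed to create_engine(); the credentials and database name below are placeholders.

#The "mysql" prefix selects the MySQL dialect.
from sqlalchemy import create_engine
engine = create_engine("mysql://user:password@localhost/mydb")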

The Dialect interface and the implemented dialect classes in SQLAlchemy serve as a good example of polymorphism in object-oriented programming. There is a single interface that is used to invoke behavior, and that behavior varies by type. This is a resilient design because it is loosely coupled and highly replaceable. We can implement our own database dialect and plug it into SQLAlchemy if we are so inclined.

Monday, January 26, 2009

ORM strengths and shortcomings

Object-Relational Mapper technology is used in object-oriented languages to try to reduce the amount of SQL contained inside application logic. ORM libraries exist for several dynamically-typed languages. Two popular Python ORM libraries are SQLObject and SQLAlchemy. The basic idea behind an ORM is that persistent objects within an application are mapped to a database table. The table schema is derived from the class declaration of the object to be stored.

For instance, here is an example of a BlogEntry class using SQLObject.
#SQLObject declaration example.

from uuid import uuid4

from sqlobject import *

#Helper used as the default value for the uuid column; assumed
#to be defined elsewhere in the original application.
def gen_uuid():
    return str(uuid4())

class BlogEntry(SQLObject):
    """An abstraction representing a BlogEntry."""
    class sqlmeta:
        table = 'blog_entry'
        idName = 'id'

    uuid = StringCol(length=36,
                     unique=True,
                     alternateID=True,
                     alternateMethodName='by_uuid',
                     default=gen_uuid)
    title = StringCol(length=80, default="Title Placeholder")
    body = StringCol(default="")
    user = ForeignKey('User', default=None)
Here, we have a blog entry abstraction. The BlogEntry class defines a nested class called sqlmeta. This class is used to specify table-specific information used by the database when the table is created. For instance, the underlying table in the example will be called blog_entry and will use the id column as its primary key. We have also defined several columns for our class. These columns will serve as attributes for any instances of this class. Once an instance of BlogEntry is created, the ORM will automatically create a table row in the database.
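A minimal usage sketch, assuming a SQLite connection URI and that the User class referenced by the foreign key is defined elsewhere:

#Connect, create the table, and persist an entry.
from sqlobject import connectionForURI, sqlhub

sqlhub.processConnection = connectionForURI('sqlite:/:memory:')
BlogEntry.createTable(ifNotExists=True)

#Instantiating the class automatically inserts a row.
entry = BlogEntry(title="First Post", body="Hello, world.")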

I consider this to be a real strength of ORM technology. It drastically simplifies the abstraction storage requirement. There is no need to write SQL CREATE statements, or INSERT and UPDATE statements for that matter. There is nothing specifically wrong with SQL; it is extremely expressive and powerful. The problem arises when combining SQL with application logic in an interleaved manner. This leads to unmaintainable systems.

One approach to decoupling the SQL required for persistent objects from the behavior implemented by the objects in an object-oriented system is to define SQL templates. For example, we might have a define_blog_entry.sql template file. This file could then be read by some database module that executes the SQL. The developer would then write several other templates for UPDATE, INSERT, and various other database activities. ORM libraries make all of this unnecessary; they provide a very transparent layer that manages persistence.

OK, so how about querying? How do we get our objects back from the database? Well, the ORM also does this. From the example, our BlogEntry class inherits a select() class method from the SQLObject class. Using this method we can pass various criteria in order to retrieve BlogEntry instances.
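For example, we might select all entries, or filter them using the q namespace; the title value here is a placeholder.

#Retrieve all entries, or filter using the q namespace.
all_entries = BlogEntry.select()
matches = BlogEntry.select(BlogEntry.q.title == "Title Placeholder")
for entry in matches:
    print entry.title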

I think this is the key weakness in the ORM. In a large percentage of cases, it serves well. All we want to retrieve are blog entries. What about when we need multiple types of objects? There is really no way to do this. At least not sensibly. In our example, all we can do is BlogEntry.select(). There is no BlogEntryUser.select() method to retrieve BlogEntry instances and User instances in the same query. Multiple types means multiple queries in ORM land.

SQL and relational databases are indispensable, especially the SELECT statement. It is by far the most effective way to retrieve complex data. ORM technology has done a great job exploiting most of the power SQL has to offer. I just don't think the querying functionality is as flexible as it could be in most cases.