
Friday, November 6, 2009

Django Cache Nodes

The Django Python web application framework includes a powerful template system. Many other Python web application frameworks rely on external template rendering packages, whereas Django ships this functionality itself. Normally, it is a good idea not to re-invent the wheel and to use existing functionality provided by other packages, especially specialized packages that do only one thing, like render templates. Django, however, is a batteries-included type of framework that doesn't really have external package dependencies.

The Django templating system is not all that different from other Python template rendering systems. It is quite straightforward for both developers and UI designers to use.

Down at the code level, everything in a template is considered to be a node. In fact, there is a Node class that every template component inherits from. It could have been named TemplateComponent but that isn't the best name for a class. Node sounds better.

One type of node that may be found in Django templates is a cache node. These are template fragments that may be stored in the Django caching system once they have been rendered. Underlying these template cache nodes is the CacheNode class, illustrated below.



As mentioned, every Django template node type extends the Node class, and CacheNode is no different. Also, as shown in the illustration, the render() method is overridden to provide the specific caching functionality.

Illustrated below is an activity depicting how the CacheNode.render() method will attempt to retrieve a cached version of the node that has already been rendered and return that value instead if possible.



In this illustration, we start off with two objects, cache and args. The cache object represents the Django cache system as a whole. The args object is the context of the specific node about to be rendered. Next, the context is turned into an MD5 value. The reason for this is to produce a suitable value that can be used to construct a cache key. Once this operation has completed, we have a hashed copy of the context. Next, we construct a cache key; this key serves as the query we submit to the Django cache system. We then perform the query, asking the cache system for a value based on the key we have just constructed. Finally, if a value was returned by the cache system, that is the rendered value we return. If no value exists in the cache system, we need to give it one. This means that we must render the node, pass the result to the cache system, and return the rendered value.
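To make this flow concrete, here is a minimal sketch of a render() method following the steps above. It mirrors how Django's CacheNode behaves, but the constructor parameters and the cache key format here are assumptions rather than code copied from the framework.

import hashlib

from django.core.cache import cache
from django.template import Node

class SketchCacheNode(Node):
    def __init__(self, nodelist, expire_time, fragment_name, vary_on):
        self.nodelist = nodelist          #Nodes making up the cached fragment.
        self.expire_time = expire_time    #Cache timeout in seconds.
        self.fragment_name = fragment_name
        self.vary_on = vary_on            #Context variables to key on.

    def render(self, context):
        #Hash the context values the fragment varies on.
        args = hashlib.md5(
            ':'.join([str(context[v]) for v in self.vary_on])).hexdigest()
        #Construct the cache key that serves as the query.
        cache_key = 'template.cache.%s.%s' % (self.fragment_name, args)
        value = cache.get(cache_key)
        if value is None:
            #Cache miss: render the node and give the cache a value.
            value = self.nodelist.render(context)
            cache.set(cache_key, value, self.expire_time)
        return value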

Wednesday, October 21, 2009

Applying Django Middleware

The Django Python web application framework supports the notion of middleware. What exactly is middleware? In the Django context, middleware components are Python packages that do not explicitly belong to any particular Django application. Nor do these packages belong to the Django package itself, although Django does ship with some common middleware. Independently installed middleware packages sit in the middle.

So what's the point of separately installing Django components if they aren't part of either Django or the applications that use them? Well, since middleware components can be enabled in a Django application's settings, they may be shared between different applications that use the same Django installation. This keeps the Django core small.

The actual middleware components themselves are composed of classes. These classes must implement at least one of the Django middleware methods: process_request(), process_view(), process_response(), and process_exception(). A minimal example is sketched below.
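This sketch uses the settings-based middleware style of the time; the class and header names are made up for illustration, and the component is enabled by listing its dotted path in MIDDLEWARE_CLASSES in settings.py.

class SimpleHeaderMiddleware(object):
    def process_request(self, request):
        #Returning None tells Django to continue processing the request.
        return None

    def process_response(self, request, response):
        #Every outgoing response passes through here before the client sees it.
        response['X-Sketch-Middleware'] = 'enabled'
        return response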

The middleware components that an application wants to use are invoked automatically by the Django framework. All the application needs to do is enable the desired middleware components. The Django framework uses the BaseHandler class and the WSGIHandler class to invoke the middleware behavior. These classes are illustrated below.



Below is an illustration showing how the two classes, BaseHandler and WSGIHandler, collaborate to process all enabled middleware methods.

Monday, October 19, 2009

Popular Python Frameworks

The Python programming language is great for building web application frameworks. The two main reasons for this are that it is a very simple language to use and understand and it has fantastic networking libraries. Given these two facts, it is no wonder that there exist dozens of relatively solid web application frameworks written in Python.

The more popular web frameworks are the stable ones that have been around for some years. These frameworks have stood the test of time and have a large feature set.

I found another interesting way to look at which frameworks are popular: using the Python package index. I browsed the available packages by framework to see which ones have the most packages. The Zope world is still dominating the Python web application framework market, while Django is slowly catching up. At the time of this writing, here are the top frameworks listed by number of available packages.
This list also demonstrates which frameworks are extensible, because the easier it is to extend a software package, the more willing developers are to extend it with other packages and release them. What is surprising is the small number of Twisted and Trac packages; both frameworks are well written and easily extensible. Having said that, the number of packages listed isn't entirely accurate because not every conceivable framework package lives in the Python package index. Also, there are most likely some categorization errors to take into account.

Tuesday, October 13, 2009

Shrinking Python Frameworks

An older entry by Ian Bicking talks about the shrinking world of Python web application frameworks. It is still a valid statement today, I think. There is no shortage of Python web application frameworks to choose from. Quite the contrary, it seems that a new one springs into existence every month. This often happens because a set of developers have a very niche application to develop and the existing web application frameworks don't cut it. Either that, or the existing frameworks are missing a feature or two, or have too many moving parts, and so the developers make some modifications. Whatever the reason, some developers will release their frameworks as open source projects.

The shrinking aspect refers to the number of frameworks which are a realistic choice for implementing a production grade application. Most of the newer Python web application frameworks, still in their infancy, are simply not stable enough.

Take Pylons and TurboGears for instance. Both are still OK web frameworks, and you can't have TurboGears without Pylons now. However, they are somewhat problematic to implement applications with. Even if stable enough, they carry complexities that shouldn't exist in a framework. Besides, I have yet to see a stable TurboGears release.

Taking complexity to a new level is Zope. This framework has been around for a long time and is extremely stable. But unless you have been using it for several years, it isn't really worth it because the potential for misuse is so high.

The choice of which Python web application framework to use really comes down to how much programming freedom you want. If you want everything included, Django does everything and is very stable. However, if there are still many unknowns in the application in question, there are many stable frameworks that will simply expose WSGI controllers and allow the developers to do as they will.

Thursday, October 1, 2009

Django Form Fields

The Django Python web application framework comes with almost every component that one might need to construct a full-featured application. One common component of web applications is the widget. Widgets are similarly defined in most desktop GUI libraries. A widget is simply a smaller part of the GUI whole. Widgets typically have a tightly defined concept and purpose. For example, a button widget is meant to be clicked, and it is meant to be visually striking enough that this purpose is obvious. Django comes with a set of widgets that are featured in almost every web application in one way or another.

Web applications generally contain forms. Nobody has gotten away with using the web without having to fill out a form at some point. The pieces of these forms are often referred to as fields. This is from a data or domain perspective. When talking about a field, you are generally talking about what data that field contains, how that data is validated and so on. When talking about widgets in a form, you are generally talking about the visual aspect, like what can this widget do and how does it look when it does it.

Django provides abstractions within the framework for both of these concepts. The two concepts are closely related and thus tightly coupled. In this particular situation, tight coupling is absolutely necessary. You can't have a field without a widget and vice-versa. The alternative would be to implement all the field functionality inside the widget abstraction which would give us something bloated and hard to understand. Modularity is a good thing even when tight coupling is necessary. The two classes representing these concepts are Widget and Field. The two classes and how they are related to one another are illustrated below.
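As a quick usage sketch of this pairing, here is a hypothetical form in which each Field handles validation while its widget controls the rendered HTML; the form and field names are made up.

from django import forms

class CommentForm(forms.Form):
    #CharField validates the data; its default TextInput widget renders it.
    name = forms.CharField(max_length=50)
    #Swapping the widget changes the presentation, not the validation.
    body = forms.CharField(widget=forms.Textarea)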

Diesel Web

Diesel Web is a Python web application framework containing very simplistic components. Rather than focusing on providing a rich set of application components, the focus is on raw performance. With this aspect of the web application taken care of, developers can focus on functionality, with more freedom than most other frameworks can offer.

The performance offered by Diesel Web is achieved through non-blocking, asynchronous IO. This is different from most other web application frameworks in that most use the thread pool pattern. In the thread pool pattern, a main thread listens for incoming requests and passes control to a thread in the pool. Once the request has been handed off to the thread inside the pool, the main thread can then go back to listening for requests and doesn't need to worry about blocking while processing individual requests. This is the common way that concurrency is achieved in web application frameworks.
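Here is a minimal sketch of the thread pool pattern just described, using only the standard library; it is purely illustrative and has nothing to do with Diesel Web's actual code.

import Queue
import threading

requests = Queue.Queue()

def worker():
    while True:
        request = requests.get()
        #Process the request without blocking the listening thread.
        print "handling %r" % request
        requests.task_done()

#Start a small pool of worker threads.
for _ in range(4):
    thread = threading.Thread(target=worker)
    thread.setDaemon(True)
    thread.start()

#The main thread enqueues each request and immediately resumes listening.
for i in range(10):
    requests.put("request-%d" % i)
requests.join()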

The asynchronous IO approach taken by Diesel Web scales better than the thread pool approach because there aren't any locking and synchronization primitives that accumulate overhead. IO events dictate what happens at the OS level and can thus scale better. The real benefit to using this approach would become apparent with thousands of users.

One other topic of interest with Diesel Web is the complete lack of dependencies, which is always a good thing. The framework appears to be striving for simplicity, another good thing, and doesn't really need much help from external packages. Basic HTTP protocol support is built in and that is really all that is needed as a starting point. It will be interesting to see if many other larger applications get built using this framework.

Friday, September 11, 2009

Django Content Files

The Django web application framework, written in Python, defines a file abstraction used for working with files within the file system. Web applications often have to deal with many very different file formats. These aren't just static files that get served to the client; the abstraction is also used to parse and retrieve useful file metadata. An example of this useful metadata would be the image dimensions of an image file. There is other useful file metadata that the clients of the system may not necessarily be concerned with, although the system may be.

The File class is the base Django file abstraction. Instances of this class extend the concept of file-like objects in regular Python applications. This abstraction comes in handy when iterating through file contents. Django has a certain iteration style used throughout the framework, and this abstraction helps maintain it. The File class is also helpful when dealing with images; it is the base class of the ImageFile class.

Code in Django applications can remain consistent with the file abstraction even when using regular content, similar to how StringIO works. The ContentFile class inherits from the File class, and ContentFile instances behave just like File instances do. The main difference, of course, is that a ContentFile uses raw data instead of data that lives on the file system. This gives Django a huge interface consistency boost when dealing with string data.
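A quick sketch of what that consistency looks like in practice; the string content is arbitrary.

from django.core.files.base import ContentFile

#Wrap raw string data in the file abstraction; no file system involved.
content = ContentFile("raw string data")

#ContentFile answers the same interface as File.
print content.size
for chunk in content.chunks():
    print chunk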

Wednesday, August 19, 2009

Django Boundary Iterators

The Django Python web application framework provides tools to parse multi-part form data as any web framework should. In fact, the developer responsible for creating Django web applications need not concern themselves with the underlying parsing machinery. However, it is still there and very accessible.

The Django boundary iterators are used to help parse multi-part form data. These iterators are also general enough to be used in different contexts. The BoundaryIter class collaborates with other iterator classes such as ChunkIter and LazyStream. An example of these classes collaborating is illustrated below.
#Example: using the Django boundary iterators.

#Imports.
from StringIO import StringIO
from django.http.multipartparser import ChunkIter, BoundaryIter, LazyStream

if __name__=="__main__":
    #Boundary data.
    b_boundary="-B-"
    c_boundary="-C-"

    #Example message with two boundaries.
    _message="bdata%scdata%s"%(b_boundary, c_boundary)

    #Instantiate a chunking iterator for file-like objects.
    _chunk_iter=ChunkIter(StringIO(_message))

    #Instantiate a lazy data stream using the chunking iterator.
    _lazy_stream=LazyStream(_chunk_iter)

    #Instantiate two boundary iterators.
    _bboundary_iter=BoundaryIter(_lazy_stream, b_boundary)
    _cboundary_iter=BoundaryIter(_lazy_stream, c_boundary)

    #Display the parsed boundary data.
    for data in _bboundary_iter:
        print "%s: %s"%(b_boundary, data)

    for data in _cboundary_iter:
        print "%s: %s"%(c_boundary, data)
In this example, we have a sample message containing two boundaries that is to be parsed. In order to achieve this, we create three iterators and a stream. The _chunk_iter iterator is a ChunkIter instance, a very general iterator used to read a single chunk of data at a time. This iterator expects a file-like object to iterate over. The two boundary iterators, _bboundary_iter and _cboundary_iter, are instances of the BoundaryIter class. These iterators expect both a data stream and a boundary. In this case, a LazyStream instance is passed to the boundary iterators.

We finally display the results of iterating over the boundary iterators. By the time the second iterator is reached, the data stream has become shorter in length.

Wednesday, June 10, 2009

Sending Django Dispatch Signals

In any given software system, there exist events that take place. Without events, the system would in fact not be a system at all. Instead, we would have nothing more than a schema. In addition to events taking place, there are often, but not always, responses to those events. Events can be thought of abstractly or modeled explicitly. For instance, the method invocation "obj.do_something()" could be considered an invocation event or a "do something" event. This would be an abstract way of thinking about events in an object oriented system. Developers may not even think of a method invocation as an event taking place. However, the abstraction is there if needed. A method invocation is an event when it needs to be because it has a location in both space and time. Events can also be modeled explicitly in code. This is the case when designing a system that employs a publish-subscribe event system. Events are explicitly published while the responses to events can subscribe to them. Another form of event terminology that is often used is to replace event with signal. This is the terminology used by the Django Python web application framework dispatching system.

Django defines a single Signal base class in dispatcher.py, and it is a crucial part of the dispatching system. The responsibility of the Signal class is to serve as a base class for all signal types that may be dispatched in the system. In the Django signal dispatching system, signal instances are dispatched to receivers. Signal instances can't just spontaneously decide to send themselves. There has to be some motivating party, and in the Django signal dispatching system, this concept is referred to as the sender. Thus, the three core concepts of the Django signal dispatching system are signal, sender, and receiver. The relationship between the three concepts is illustrated below.



Senders of signals may dispatch a signal to zero or more receivers. The only way that zero receivers receive a given signal is if zero receivers have been connected to that signal. Additionally, receivers, once connected to a given signal, have the option of only accepting signals from a specific sender.

So how does one wire the required connections between these signal concepts in the Django signal dispatching system? Receivers can connect to specific signal types by invoking the Signal.connect() method on the desired signal instance. The receiver that is being connected to the signal is passed to this method as a parameter. If this receiver is to only accept these signals from specific senders, the sender can also be specified as a parameter to this method. Once connected, the receiver will be activated whenever a signal of this type is sent by a sender. A sender can send a signal by invoking the Signal.send() method. The sender itself is passed as a parameter to this method. This is a required parameter even though the receiver may not necessarily care who sent the signal. However, it is good practice not to take chances here; if there is always consistency, from the signal sending point of view, in regards to who the sender is, there is a new level of flexibility on the receiving end. Illustrated below is a sample interaction between a sender and a receiver using the Django signal dispatching system to send a signal.
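Here is a concrete, if contrived, sketch of that wiring in code; the signal name, sender class, and providing_args values are all made up for illustration.

import django.dispatch

#A custom signal type; instances of Signal are dispatched to receivers.
pizza_done = django.dispatch.Signal(providing_args=["size"])

def pizza_receiver(sender, **kwargs):
    #Receivers get the sender plus any arguments the signal provides.
    print "signal from %r, size=%s" % (sender, kwargs.get("size"))

class PizzaShop(object):
    def bake(self):
        #The sender passes itself so receivers can filter on it.
        pizza_done.send(sender=self, size="large")

#Wire the receiver to the signal, then trigger it.
pizza_done.connect(pizza_receiver)
PizzaShop().bake()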



The fact that the signal instances themselves are responsible for connecting receivers to signals as well as the actual sending of the signals may seem counter-intuitive at first. Especially if one is used to working with publish-subscribe style event systems. In these event systems, the publishing and subscribing mechanisms are independent from the publisher and subscriber entities. However, in the end, the same effect is achieved.

Friday, May 8, 2009

Restish Resources In Python

RESTful resource design is not only a common theme in the Python web application framework world, but in countless other languages and frameworks. However, Python has the advantage of rapid development. Not just for the developers who use the frameworks in question, but also for the developers creating these frameworks. The lessons learned from previous framework implementations have a very rapid turnaround. The restish Python web application framework is an example of just this. The package started out as the Pylons web application framework. However, the developers of restish soon realized that Pylons had shortcomings for what they were trying to achieve. They then started to remove the problematic components from Pylons, and what they eventually ended up with was a web application framework that was stripped down significantly. Just like Pylons, the restish package relies on the paster utility to create and run restish projects. As the name of the project indicates, the framework is best suited for designing RESTful resources.

With restish, there is more than one way to define a resource, as is commonly the case with any software component. The best design is to represent a resource by building a class for that resource. Luckily, this is a no-brainer with restish. As a developer, all that needs to be built for a resource is a class that extends resource.Resource. The methods defined by this class are then responsible for returning the appropriate content. These resource methods are decorated to indicate which HTTP method they correspond with. This is incredibly powerful and insightful; methods corresponding to methods. That is not all the decorated resource methods are capable of, but I won't dig any further here. Resources can also be nested; the parent controller simply returns an instance of the child controller.
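As a rough sketch, a resource class might look like the following; the decorator and helper names reflect the restish API as I understand it, so treat the details as assumptions.

from restish import http, resource

class HelloResource(resource.Resource):
    @resource.GET()
    def get(self, request):
        #The decorator maps this method to the HTTP GET method.
        return http.ok([('Content-Type', 'text/plain')], "Hello, restish!")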

The abstractions provided by the restish package are very valuable to developers who want to design RESTful resources. This is a common design requirement that is elegantly implemented here. The sheer simplicity involved puts a smile on my face.

Monday, May 4, 2009

Django Templates and NodeLists

Whether dealing with a complex web application or a simple web site with relatively few pages, using templates is generally a good idea. With complex page structures, this is unavoidable; the alternative would be to implement the page structure manually. Even with simplistic web pages, there is often a dynamic element that is undesirable to edit manually. If the web application is built with a Python web application framework, developers have several template engines to choose from. TurboGears, for instance, offers support for several third-party template engines. Django, on the other hand, offers its own template rendering system. The Django template engine provides many of the same features as other Python template engines, and the syntax used in Django templates is simplistic and easy to use. Inside the template engine are two closely related classes that represent key template concepts and are illustrated below. These classes are Template and NodeList.



The Template class represents a template in its entirety. During the construction of Template instances, the template is compiled; since this process takes place in the Template constructor, the template string is a required constructor parameter, and any given template is compiled only once. Since any given Template instance is compiled, it may be rendered at any time. Template instances support iteration. Each iteration through a particular Template instance yields a node from the NodeList instance that resulted from compiling the template. Invoking Template.render() will in turn invoke NodeList.render(), passing along the specified context.

The NodeList represents a collection of nodes that result from compiling a template string. The majority of these nodes are HTML elements, template variables, or some Python language construct such as an if statement. NodeList instances are also instances of the Python list primitive. This means that each node contained within the list behaves just like a Python list element. The NodeList.render() method will cumulatively invoke render() on each node contained within the list. The end result is the rendered template. This rendering behavior provided by the NodeList.render() method can be considered polymorphic. The method iteratively invokes render() on anonymous instances. All it cares about is the render() interface.
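A minimal standalone sketch of this compile-then-render cycle; settings.configure() is called only so the template engine can run outside a project.

from django.conf import settings
settings.configure()  #Minimal settings so the engine runs standalone.

from django.template import Template, Context

#Compilation happens once, inside the constructor.
template = Template("Hello, {{ name }}!")

#Iterating a Template yields nodes from its compiled NodeList.
for node in template:
    print node

#render() delegates to NodeList.render() with the given context.
print template.render(Context({"name": "Django"}))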

Friday, April 24, 2009

An idea for loading Pylons middleware

Any given web application framework, Python based or otherwise, must support the concept of middleware. Some frameworks have better support for middleware than others, but they all support it, even if the support is rigid and implicit. The reason to associate the concept of middleware with web application frameworks is that middleware is code that runs in between the request and the response. In most cases, this is an HTTP request-response pair, but the same concept applies to other protocols. In the Python world of web applications, there isn't much support for this idea. Most Python web application frameworks support the concept of extensions or plugins. These components can achieve the same effect as middleware components but are more geared toward domain-specific problems. Middleware is supposed to be more solution-generic in that it affects the framework as a whole. The best support for this concept is in the Pylons Python web application framework.

The middleware concept plays a big role in Pylons. In fact, several of the core Pylons components act as middleware. For instance, the routing of an HTTP request, the HTTP session, and caching are all middleware components in Pylons. These components, as well as other middleware components, may affect the state of the core Pylons WSGI application object. Some of these middleware components will surely affect the state of the HTTP response that is sent back to the client. The middleware architecture in web application frameworks is a wise design decision because it helps reduce the complexity of the code found in the core application object by providing a better distribution of responsibilities. If all this code were located in the core application object, it would be quite messy and hard to maintain. The middleware construct also offers developers more extensibility opportunities. The entire framework functionality may be altered on account of some middleware component. Below is a simple illustration of how the Pylons web application framework interacts with middleware components.



When a new Pylons skeleton project is generated, a middleware Python configuration script is built and placed in the project directory. It is this generated Python script that is responsible for loading the middleware components to be used in the application. This is an important task to perform when the application is started because the core application object needs its core middleware components; without them, much crucial functionality would be missing. However, a nice experiment to try with the Pylons framework would be to alter this generated middleware loading code. Rather than have static Python code which directly instantiates these components, they could be loaded by iterating through a set of entry points. This way, Pylons middleware components could be discovered and loaded as they are installed on the system. This, however, could be a more challenging feat in reality. The default Pylons middleware components would need to be entry points themselves. These components could always have entry points added to them. At the very least, it would be an idea worth exploring.
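A rough sketch of that idea using pkg_resources; the entry point group name 'pylons.middleware' is hypothetical.

import pkg_resources

def load_middleware(app):
    #Discover middleware advertised under a (hypothetical) entry point
    #group instead of instantiating it statically.
    for entry_point in pkg_resources.iter_entry_points('pylons.middleware'):
        #Each entry point is assumed to resolve to a WSGI middleware
        #factory that wraps the application object.
        factory = entry_point.load()
        app = factory(app)
    return app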

Thursday, April 23, 2009

TurboGears i18n catalog loading

In a perfect world, every single application would have full translations of every major spoken language. This ability is referred to as internationalization. Internationalization becomes increasingly important when dealing with web applications. The whole idea behind web applications is portability amongst many disparate clients. Thus, it is safe to assume that not all of these clients are going to speak a universal language. Fortunately, there are many internationalization tools available at developers' disposal. One of the more popular tools on Linux platforms is the GNU gettext utility. In fact, Python has a built-in module that is built on top of this technology. Taking this a step further, some Python developers have built abstractions on top of the gettext module, providing further internationalization capabilities for Python web applications. The gettext Python module uses message catalogs to store the translated messages used throughout any given application. These catalogs can then be compiled into a more efficient format for performance purposes. This method of using a single message catalog would suffice for monolithic applications that cannot be extended. However, all modern Python web application frameworks can be extended one way or another, and the components that extend these frameworks are more than likely to need internationalization of messages. The problem of storing and using translated messages suddenly becomes much more difficult.

The TurboGears Python web application framework provides internationalization support. TurboGears provides a function that will translate the specified string input. Of course, the message must exist in a message catalog, but that is outside the scope of this discussion. The tg_gettext() function is the default implementation provided by the framework. The reason it is the default implementation and not the only implementation is that a custom text translation function may be specified in the project configuration. Either way, custom or default, the text translation function becomes the global _() function that can then be used anywhere in the application. Below is an illustration of how the tg_gettext() function works.



The get_catalog() function does as the name says and returns the message catalog of the specified locale. The function will search the locale directory specified in the i18n.locale_dir configuration value. Once a message catalog has been loaded by TurboGears, it is then stored in the _catalogs dictionary. The _catalogs dictionary acts as a cache for message translations. Illustrated below is the work-flow of this function.



As mentioned earlier, this method of using a single message catalog is all well and good until your application needs to be extended by third-party components. If these components have their own message catalogs, any text translation calls made to _() will use the base application catalog. The reason being, the first time a catalog is loaded, it stays loaded because of the _catalogs cache. This means that even if the extension to the web application framework were to update the i18n.locale_dir configuration value, it would make no difference if the specified locale has already been cached. Ideally, the get_catalog() function could check the _catalogs cache based on the locale directory as well as the locale.
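A sketch of that fix using the standard library's gettext module; the function and cache names mirror the TurboGears ones discussed above, but the implementation is illustrative.

import gettext

_catalogs = {}

def get_catalog(locale, locale_dir):
    #Key the cache on the directory as well as the locale, so an
    #extension's catalog is not shadowed by the base application's.
    key = (locale_dir, locale)
    if key not in _catalogs:
        _catalogs[key] = gettext.translation(
            'messages', locale_dir, languages=[locale], fallback=True)
    return _catalogs[key]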

Friday, April 17, 2009

The Django paginator

Most, if not all, modern web applications need pagination in one form or another. Pagination is the act of transforming a large data set into pages of a more manageable size. Google gives us a perfect example case of pagination. Google constantly deals with enormous data sets, and users who perform searches would promptly switch to a different search engine if there were no pagination provided. On the other side of the coin, pagination also provides more manageable data sets for the server code to deal with. Instead of the client saying "give me this entire large data set", the client says "give me page one of this large data set, page size being ten". From the developer perspective, pagination isn't always the most enjoyable task. Paginators are error prone if not implemented correctly and can pose challenges when different query constraints come into play. An additional challenge with pagination is the fact that different web application frameworks use slightly different approaches in their pagination implementations. Another approach could be to implement the functionality directly in the ORM. This would obviously only work with database query results, but that is probably the most common use for pagination. If an ORM like SQLAlchemy, for instance, were to implement pagination functionality, it could be used by developers inside a web application framework while still being functional outside the framework. The Django Python web application framework offers a good pagination implementation. It is not restricted to database query results and is easy for developers to understand and use. There is also much room in the implementation for extended functionality if so desired. The two classes used to implement the pagination functionality in Django are Paginator and Page, as illustrated below.





The main class used by the Django web application framework is the Paginator class. The Paginator class deals directly with the data set in question. The constructor accepts an object_list as a required parameter. The constructor also accepts a per_page parameter that specifies the page size; this parameter is also required. Once the Paginator class has been instantiated with a data set, it can be used to generate Page instances. This is done by invoking the Paginator.page() method, specifying the desired page number. The Paginator class also defines several managed properties that can be used to query the state of the Paginator instance. Managed properties in Python are simply methods that may be accessed as attributes. The count attribute will return the number of objects in the data set. The num_pages attribute will return the number of pages that the data set contains, based on the specified page size. Finally, the page_range attribute will return a list containing each page number.
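A quick usage sketch of the attributes just described, run against an arbitrary in-memory list:

from django.core.paginator import Paginator

objects = ["apple", "banana", "cherry", "damson", "elderberry"]
paginator = Paginator(objects, 2)  #A per_page value of two.

print paginator.count       #Five objects in the data set.
print paginator.num_pages   #Three pages at this page size.
print paginator.page_range  #[1, 2, 3]

page = paginator.page(2)    #Generate a Page instance.
print page.object_list      #['cherry', 'damson']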

There is room for extended functionality in the Paginator class. Mainly, there could be some operators overloaded to make Paginator instances behave more like new-style Python instances. The Paginator class could define a __len__() method. This way, developers could invoke the builtin len() function on Paginator instances to retrieve the number of pages. An __iter__() method could also be defined for the Paginator class. This would allow instances of this class to be used in iterations. Each iteration would yield a new Page instance. Finally, a __getitem__() method would allow Paginator instances to behave like Python lists. The desired page number could be specified as an index.
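To make the suggestion concrete, here is a hypothetical subclass sketching those three methods; it is not part of Django.

from django.core.paginator import Paginator

class MagicPaginator(Paginator):
    def __len__(self):
        #len(paginator) yields the number of pages.
        return self.num_pages

    def __iter__(self):
        #Iteration yields each Page instance in order.
        for number in self.page_range:
            yield self.page(number)

    def __getitem__(self, number):
        #paginator[2] behaves like paginator.page(2).
        return self.page(number)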

The Page class complements the Paginator class in that it represents a specific page within the data set managed by the Paginator instance. The has_next(), has_previous() and has_other_pages() methods of Page instances are useful in determining if the page has any neighbouring pages. The start_index() method will return the index, within the original data set owned by the parent Paginator instance, of the first object on the page, while end_index() returns the index of the last.

Like the Paginator class, there is also room for extended functionality here to make Page instances behave more like new-style Python instances. The Page class could define a __len__() method that could return the page size. The Page class could also define an __iter__() method that could enable Page instances to be used in iterations. Finally, the __getitem__() method, if it were defined, could return the specified object from the original object list.

Tuesday, April 7, 2009

Introducing AmiNation

AmiNation is a full-featured Python web application framework and a product of orangoolabs.com. AmiNation is the driving force behind the Skeletonz content management system written in Python. AmiNation is not a formal Python package; in fact, it isn't a package at all. It is a collection of smaller projects that form a coherent whole. The component projects that make up the AmiNation web framework are actually independent of one another, which keeps the coupling of projects that employ AmiNation much smaller. If, for instance, there was no longer a need for the AmiDb component, it could safely be removed without affecting the AmiFormat component. The AmiNation web framework provides a good solution for developers looking to implement a Python component-based web architecture. There are three components that make up the AmiNation framework. The AJS component is a lightweight javascript library. The AmiFormat component is an HTML formatter that provides a wiki-like syntax and produces HTML output for the web. Finally, AmiDb is a small database wrapper that sits on top of several other Python database libraries.

The AJS javascript library is very lightweight. Having lightweight javascript libraries is an important requirement given the likelihood of the library being retrieved several times in the same day by the same user. The AJS library is lighter, that is, smaller in size, than most commonly used javascript libraries today, such as jQuery. The AJS javascript library provides functionality for easily creating DOM elements on the fly. This dynamic DOM creation is an important feature of any modern day javascript library, as is the requirement of making remote, ajax-style requests, which the AJS library also supports. The DOM creation functions provided by the library are easy to use because they are named after the element the developer wants to create, for instance, SPAN(). DOM elements often need to be created in response to some GUI event or in response to a remote request response. Often, DOM elements will need to be queried. For instance, a developer may implement a query that checks which classes a certain set of DOM elements contain. Although this is supported by the AJS javascript library, the syntax can be awkward at times, similar to the way jQuery syntax can grow to be over-complicated if not used with caution.

The AmiFormat component of the AmiNation web framework can be thought of as a formatting engine. It provides a formatHTML() method that can be invoked on formatter instances. This method will take a string as a parameter and replace any custom syntax it finds in the string with HTML formatted for the web. It is trivial for developers to add new syntax to the formatter. It mostly involves creating a new class with some parameter validation and rendering functionality. This class is then registered with the formatter instance. The syntax used in the string passed to the formatter follows a square bracket element style. Inside these square brackets, any number of named parameters can be passed to the element. It is mostly a matter of the element supporting the parameter by checking for it and using it when rendering the template. The square bracket style of element syntax is great when using two brackets. However, this syntax can look rather clumsy when using four brackets while closing the second pair.

The AmiDb component is a simple database wrapper. It sits on top of SQLAlchemy and other Python database libraries. The main reason for the AmiDb component is to offer more flexibility than what traditional ORM technology offers when executing complex database queries. There are a limited number of functions in this component that allow developers less restrictive functionality. SQLAlchemy is actively maintained whereas AmiDb is not. In fact, the AmiNation project as a whole is not actively maintained. Hopefully, this trend is reversed, because there is a lot of useful design here that could help simplify many Python web application projects that currently use much more than what is needed in terms of library functionality. If nothing else, the AmiNation project will serve as an exemplary Python project with outstanding design elements hidden within.

Friday, April 3, 2009

Magic methods of the Django QuerySet

The QuerySet class defined in the Django Python web application framework is used to manage query results returned from the database. From the developer perspective, QuerySet instances provide a high-level interface for dealing with query results. Part of the reason for this is that instances of this class behave very similarly to other Python primitive types such as lists. This is accomplished by defining "magic" methods, or, operator overloading. If an operator for a specific instance is overloaded, it simply means that the default behavior for that operator has been replaced when the operand involves this instance. This means that we can do things such as combine two QuerySet instances with the & operator and have custom behavior invoked rather than the default behavior. This is useful because it is generally easier and more intuitive to use operators in certain contexts than to invoke methods. In the case of combining two instances, it makes more sense to use an operator than it does to invoke some and() method on the instance. Django provides several "magic" methods that do just this, and we will discuss some of them below.

Python instances that need to provide custom pickling behavior need to implement the __getstate__() method. The QuerySet class provides an implementation of this method. The Django QuerySet implementation removes any references to self by copying the __dict__ attribute. Here is an example of how this method might get invoked.
import pickle
pickle.dumps(query_set_obj)

The representation of Python instances is provided by the __repr__() method if it is defined. The Django QuerySet implementation will call the builtin list() function on itself. The size of the resulting list is based on the REPR_OUTPUT_SIZE variable which defaults to 20. The result of calling the builtin repr() function on this new list is then returned. Here is an example of how this method might get invoked.
print query_set_obj

The length of the QuerySet instance can be obtained by calling the builtin len() function while using the instance as the parameter. In order for this to work, a __len__() method must be defined by the instance. In the case of Django QuerySet instances, the first thing checked is the length of the result cache. The result cache is simply a Python list that exists in Python memory to try and save on database query costs. However, if the result cache is empty, it is filled by the QuerySet iterator. If the result cache is not empty, it is extended by the QuerySet iterator. This means that the result cache is updated with any results that should be in the result cache but are not at the time of the __len__() invocation. The builtin len() function is then called on the result cache and returned. Here is an example of how this method might get invoked.
print len(query_set_obj)

Python allows user-defined instances to participate in iterations. In order to do so, the class must define an __iter__() method. This method defines the behavior for how individual elements in a set are returned in an iteration. The QuerySet implementation of this method will first check if the result cache exists. If the QuerySet result cache is empty, the iterator defined for the QuerySet instance is returned. The iterator does the actual SQL querying, so in this scenario the iteration loses out on the performance gained by having cached results. However, if a result cache does exist in the QuerySet instance, the builtin iter() function is called on the cache and this new iterator is returned. Here is an example of how the __iter__() method might be invoked.
for row in query_set_obj:
print row

Python instances can also take part in if statements and invoke custom behavior defined by the __nonzero__() method. If an instance defines this method, it will be invoked when the instance is an operand in a truth test. The Django QuerySet implementation of this method first checks for a result cache, as do most of the other "magic" methods. If the result cache does exist, it will return the result of calling the builtin bool() function on the result cache. If there is no result cache yet, the method will attempt to retrieve the first item by performing an iteration on the QuerySet instance. If the first item cannot be found, False is returned. Here is an example of how __nonzero__() might be invoked.
if query_set_obj:
print "NOT EMPTY"

Finally, Python instances may take part in expressions using the & and | operators. The Django QuerySet class defines both __and__() and __or__() methods, which these operators invoke. When these methods are invoked, they will change the underlying SQL query used by returning a new QuerySet instance. Note that the Python and and or keywords do not invoke these methods; they rely on truth testing instead. Here is an example of how both these methods may be used.
print query_set_obj1 & query_set_obj2
print query_set_obj1 | query_set_obj2

Thursday, March 26, 2009

The Trac component loader

All modern web application frameworks need replaceable application components. Reasons for this requirement are plenty. Some applications will share common functionality with other applications, such as identity management. However, having the ability to extend this functionality or replace it entirely is absolutely crucial. Technology requirements change too fast to assume that a single implementation of some feature will ever be sufficient for any significant length of time. Moreover, tools that are better suited to the job your component currently does will emerge, and developers need a way to exploit the benefit of these tools without having to hack the core system. Trac is a Python web framework in its own right. That is, it implements several framework capabilities found in other frameworks such as Django and TurboGears. Trac is highly specialized as a project management system, so you wouldn't want to use Trac as a generalized web framework for some other domain. Project management for software projects is such a huge domain by itself that it makes sense to have a web framework centered around it. Trac defines its own component system that allows developers to create new components that build on existing Trac functionality or replace it entirely. The component framework is flexible enough to allow loading of multiple component formats: eggs and .py files. The component loader used by the Trac component system is in fact so useful that it sets the standard for how other Python web frameworks should load components.

The Trac component framework will load all defined components when the environment starts its HTTP server, and uses the load_components() function to do so. This function is defined in the loader.py module. The load_components() function can be thought of as the aggregate component loader, as it is responsible for loading all component types. It accepts a list of loaders in a keyword parameter and uses these loaders to differentiate between component types. The parameter has two default loaders that will load egg components and .py source file components. The load_components() function will also accept an extra path parameter which allows the specified path to be searched for additional Trac components. This is useful because developers may want to maintain a repository of Trac components that do not reside in site-packages or the Trac environment. The load_components() function also needs an environment parameter. This parameter refers to the Trac environment in which the loader is currently executing. This environment is needed by various loaders in order to determine if the loaded components should be enabled. This would also be a requirement of a custom loader if a developer were so inclined to write one. There is other useful environment information available to new loaders that could potentially provide more enhanced functionality.

As mentioned, the load_components() function specifies two default loaders for loading Trac components. These loaders are actually factories that build and return a loader function. This is done so that data from the load_components() function can be built into the loader function without having to alter the loader signature that load_components() invokes. This offers maximum flexibility. The first default loader, load_eggs(), will load Trac components in the egg format. It does so by iterating through the specified component search paths. The plugins directory of the current Trac environment is part of the included search path by default. For each egg file found, the working set object, which is part of the pkg_resources package, is extended with the found egg file. Next, the distribution object, which represents the egg, is checked for trac.plugins entry points. Each found entry point is then loaded. What is interesting about this approach is that it allows subordinate Trac components to be loaded. This means that if a found egg distribution contains a large amount of code and a couple of small Trac components, only the Trac components are loaded. The same cannot be said about the load_py_files() loader, which is the second default loader provided by load_components(). This function works in the same way as the load_eggs() function in that it will search the same paths, except instead of looking for egg files, it looks for .py files. When found, the loader will import the entire file, even if there are no subordinate Trac components within the module. In both default loaders, if the path in which any components were found is the plugins directory of the current Trac environment, that component will automatically be enabled. This is done so that the act of placing the component in the plugins directory also acts as an enabling action, thus eliminating a step.
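A rough sketch of what a custom loader built on this factory pattern might look like; the loader signature is inferred from the description above, so treat it as an assumption rather than Trac's actual interface.

import os

def load_marker_files(marker_ext='.component'):
    #Factory: configuration is baked into the returned loader so the
    #signature invoked by load_components() stays unchanged.
    def _load(env, search_path, auto_enable=None):
        for path in search_path:
            for name in os.listdir(path):
                if name.endswith(marker_ext):
                    env.log.debug('found component marker: %s' % name)
    return _load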

There are some limitations with the Trac component framework. The _log_error() function nested inside the _load_eggs() loader shouldn't be a nested function; there is no real rationale for doing so. Also, loading Python source files as Trac components is quite limiting because we lose any notion of subordinate components. This is because we can't define entry points inside Python source files. If building Trac components, I would recommend using eggs as the format.

Monday, March 16, 2009

Initializing the CherryPy server object

The CherryPy Python web application framework contains several abstractions related to an HTTP server. One of these is the Server class defined in _cpserver.py. In fact, the top-level server instance of the CherryPy package is an instance of this class. The Server class inherits from the ServerAdapter class, which is defined in servers.py. Interestingly, the class serves as a wrapper for other HTTP server classes. The constructor of ServerAdapter will accept both a bus and an httpserver parameter. The bus parameter is a CherryPy website process bus as described here. The httpserver parameter can be any instance that implements the CherryPy server interface. Although this interface isn't formally defined, it can be easily inferred by looking at the CherryPy code.

So we have a Server class that inherits from ServerAdapter. The idea here is that many server instances may be started on a single CherryPy website process bus. One question I thought to ask was "if Server inherits from ServerAdapter, and ServerAdapter is expecting an httpserver attribute, how do we specify this if the ServerAdapter constructor isn't invoked with this parameter?" In other words, the ServerAdapter isn't given a parameter that it needs by the Server class.

It turns out that the start() method is overridden by Server. This method will then ensure that an httpserver object exists by invoking Server.httpserver_from_self(). Developers can even specify an httpserver object after the Server instance has been created by setting the Server.instance attribute to the server object we would like to use. This is the first attribute checked by the Server.httpserver_from_self() method. The Server.instance attribute defaults to None, and if this is still the case once Server.httpserver_from_self() is invoked, it will simply create and return a new CPWSGIServer instance to use.

Now we can feel safe, knowing that there will always be an object available for ServerAdapter to manipulate. We left off at the Server.start() method creating the httpserver attribute. Once this is accomplished, the ServerAdapter.start() method is invoked because it is now safe to do so. One observation about the implementation: I'm not convinced that calling the ServerAdapter.start() method with self passed explicitly is the best design. This is the only way that Server instances can invoke behaviour on the ServerAdapter class, even though in theory it is the same instance by inheritance. At the same time, we wouldn't be able to override the method and then call the inherited method if we were to call ServerAdapter.__init__() from Server.__init__(). The alternative would be to have unique method names between the two classes. Then again, this might end up taking away from the design quality of ServerAdapter. So the question is, which class is more important in terms of design? Just something to think about, not that the current implementation is defective by any means. CherryPy is probably one of the more stable Python packages in existence.
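A toy sketch of the pattern being discussed; the class and method names follow CherryPy's, but the bodies are simplified illustrations.

class DefaultWSGIServer(object):
    #Stand-in for CherryPy's CPWSGIServer.
    def start(self):
        print "serving..."

class ServerAdapter(object):
    def __init__(self, bus, httpserver=None):
        self.bus = bus
        self.httpserver = httpserver

    def start(self):
        self.httpserver.start()

class Server(ServerAdapter):
    instance = None  #May be set to a server object after creation.

    def httpserver_from_self(self):
        #Prefer an explicitly supplied instance, else build a default.
        return self.instance or DefaultWSGIServer()

    def start(self):
        #Guarantee an httpserver exists before delegating upward.
        if not self.httpserver:
            self.httpserver = self.httpserver_from_self()
        ServerAdapter.start(self)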

Friday, March 13, 2009

How Pylons connects to the ORM

The Pylons Python web application framework manages database connections for any given application written in the framework. It does so by using a PackageHub class for SQLObject. This is actually similar to how TurboGears manages database connections. The database functionality for the Pylons web framework is defined in the database.py module. Things are slightly different for SQLAlchemy support in Pylons; in order to provide it, Pylons will attempt to define the several functions SQLAlchemy requires to connect to the database.

Something I find a little strange is the way the support for both SQLAlchemy and SQLObject is handled by Pylons. Pylons will attempt to import SQLAlchemy and handle the import error. However, Pylons will always attempt to import SQLObject and will not handle an import failure if the library isn't installed on the system. The following is a high-level view of how the database ORM libraries are imported in Pylons.
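A simplified sketch of that import pattern, reconstructed from the description rather than copied from Pylons:

#SQLAlchemy is optional; a missing installation is tolerated.
try:
    import sqlalchemy
    have_sqlalchemy = True
except ImportError:
    have_sqlalchemy = False

#SQLObject is imported unconditionally, so a missing installation
#raises ImportError immediately.
import sqlobject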
There is a slight asymmetry here. At the very least, I think SQLObject import errors should be handled as well. But what would happen in the event that there are no supported ORM libraries available for import? That would obviously leave us with a non-functional web application. A nice feature to have, and this really isn't Pylons-specific, is the ability to specify, in a configuration sort of way, which ORM library is preferred. The database.py module could then base the imports on this preference. For instance, illustrated below is how the ORM importing mechanism in Pylons might work if the ORM preference could be specified.
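A sketch of that preference-driven flow; the configuration key 'orm.preference' and the load_orm() helper are made up for illustration.

def load_orm(config):
    #Default to SQLObject when no preference is configured.
    preferred = config.get('orm.preference', 'sqlobject')
    try:
        return __import__(preferred)
    except ImportError:
        #Complain about the preferred ORM not being available.
        raise RuntimeError("ORM %r is not installed" % preferred)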

Here, the flow is quite simple. We load the configuration data, check which ORM was specified, and attempt to import it. On import failure, we complain about the ORM not being available. Of course, we will most likely want a default ORM preference if one is not provided. I think that would provide a much cleaner design than basing the ORM preference on what can be imported. There is certain enhancement functionality whose availability can reasonably be based on whether a library imports, such as a plugin for a system. But these are only enhancements; we can't really make these assumptions about a core component like a database connection manager.

The SQLAlchemy connection functionality in Pylons has no notion of a connection hub. There is absolutely no problem with this. The functions are what is needed to establish a connection through SQLAlchemy, and they work. For illustration purposes, let's make a new pretend class that doesn't exist, called SQLAlchemyHub, that groups all SQLAlchemy-related functions together. The following is what we would end up with when visualizing the database.py module.

It is important to remember that the SQLAlchemyHub class isn't real. It is there to help us visualize the SQLAlchemy abstraction within the context of the module.

Friday, March 6, 2009

Django WSGI handlers

WSGI handlers in the Django Python web application framework are used to provide the WSGI Python interface. This means that the Django framework can easily support middleware projects that manipulate the response sent back to the client. The code for the WSGIHandler class and the supporting WSGIRequest class can be found in the wsgi.py Python module. Here is a simple illustration of what the module provides.



Just as a brief overview of what this module contains, I'll simply describe each of the four classes shown above. First, the HttpRequest class provides the basic HTTP request attributes and methods. The BaseHandler class provides attributes and methods common to all handlers. The WSGIRequest class is a specialized HTTP request. The WSGIHandler class is a specialized handler.

The WSGIHandler class depends on the WSGIRequest class because it instantiates the request instances. This is done by setting the WSGIRequest class as the request_class attribute. There are two aspects of the WSGIHandler class that I find interesting.

First, this class isn't instantiated. There is no __init__() method defined and all attributes are class attributes. However, the __call__() method is defined, so this class can actually be invoked as if we were creating an instance, only we get a response object returned instead. At the beginning of the __call__() method, the WSGIHandler class acquires a threading lock. This lock is then released after the middleware has been loaded.

Second, each middleware method is then cumulatively invoked on the response so as to filter it. Very cool!
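For reference, here is a minimal sketch of wiring the WSGIHandler up as a WSGI application; the wsgiref server and the 'mysite.settings' module are assumptions used only to keep the example self-contained.

import os
from wsgiref.simple_server import make_server

#Assumes a project settings module is importable on the path.
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'

from django.core.handlers.wsgi import WSGIHandler

application = WSGIHandler()
#make_server('', 8000, application).serve_forever()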