Thursday, October 22, 2009

Publishing CSV

I recently came across this Python recipe for serving CSV data from a CherryPy web controller. The recipe itself is fairly straightforward. It is a simple decorator that can be applied to a method which is also a CherryPy controller. The decorator will take the returned list data and transform it into a CSV format before returning the response.

You sometimes have to wonder about the CSV format: it is not very descriptive for humans to read. Fortunately, that isn't the reason it was created. The CSV format is easy for software to understand without requiring a lot of heavy lifting. This is why it is still widely used. Virtually any data can be represented with it. But what happens when something goes wrong? Typically, if an application complains about some kind of data it is trying to read, a developer of some sort needs to take a look at it. Good luck with trying to diagnose malformed CSV data, especially a large chunk of it.

The fact of the matter is, the CSV format is still a requirement in many contexts, especially when it comes to providing data as opposed to consuming it. It is most likely going to be easier to provide a CSV format to a client than it is to say the client needs to be altered to support SOAP messages. If that were the case, there would certainly be many upset people using your service, or nobody using your service at all.

As the recipe shows, transforming primitive data structures into CSV format isn't that difficult, especially in high-level languages that have a nice CSV library, as Python does. HTTP is more than likely going to be the protocol of choice. In this case, the important HTTP headers to set are Content-Type and Content-Length.
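To make the idea concrete, here is a minimal sketch of such a decorator. It is not the recipe itself. The name csv_response and the set_header callable are my own assumptions; with CherryPy, set_header could plausibly write to cherrypy.response.headers, but here a plain dictionary stands in so the example runs on its own.

```python
import csv
import io
from functools import wraps


def csv_response(set_header):
    """Build a decorator that renders a handler's rows as CSV bytes.

    set_header is a hypothetical callable taking (name, value); it stands
    in for whatever the web framework uses to set response headers.
    """
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            rows = handler(*args, **kwargs)
            buffer = io.StringIO(newline="")
            csv.writer(buffer).writerows(rows)
            body = buffer.getvalue().encode("utf-8")
            # The two headers the post calls out as important.
            set_header("Content-Type", "text/csv; charset=utf-8")
            set_header("Content-Length", str(len(body)))
            return body
        return wrapper
    return decorator


# Usage: a plain dict collects the headers in place of a real response.
headers = {}


@csv_response(headers.__setitem__)
def report():
    return [["name", "qty"], ["apple", 3], ["pear, green", 5]]


body = report()
```

Note that the csv module quotes the field containing a comma, which is exactly the kind of detail that is easy to get wrong when building CSV by hand.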

Odds are, the CSV format isn't the format of choice for most web application designs. That is, the developers building APIs aren't going to use CSV. They are probably going to use something more verbose like JSON or some XML variant. This just makes life much easier for the clients that need to interact with the data. These formats are also much more common. Chances are that CSV support will be an afterthought. This isn't an uncommon change request for any web application project though. No one building a web application can expect the initially chosen format to suffice throughout the lifetime of the application. What this means is that exposing the data to the web is the easy part. It is coming up with a sustainable design that is the challenge.

Many web application frameworks provide support for multiple response formats. And, if CSV isn't one of them, chances are that there is a plugin that will do it.

So, even if the framework doesn't support CSV data transformation functionality, as is the case with CherryPy, the Python CSV functionality will do just fine on its own. That is, the controller can be extended with CSV capabilities as is the case in the recipe. Below is an illustration of a web controller with CSV capabilities.

Here the Controller class inherits CSV capabilities. The DomainObject class, for our purposes here, could be anything that is part of the application domain and needs to be exposed to the web. With this design, as is the case with CSV functionality offered by the web framework of choice, the responsibility of the CSV transformation falls outside of the domain entirely.
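In code, this first design might look roughly like the sketch below. Controller and DomainObject are the names used above; the CSVCapable mixin, its to_csv() method and the name and price attributes are assumptions made purely for illustration.

```python
import csv
import io


class CSVCapable:
    """Hypothetical mixin that turns a list of rows into CSV text."""

    def to_csv(self, rows):
        buffer = io.StringIO(newline="")
        csv.writer(buffer).writerows(rows)
        return buffer.getvalue()


class DomainObject:
    """Stand-in for any domain class; it knows nothing about CSV."""

    def __init__(self, name, price):
        self.name = name
        self.price = price


class Controller(CSVCapable):
    """The web controller inherits the CSV capabilities."""

    def __init__(self, objects):
        self.objects = objects

    def index(self):
        rows = [["name", "price"]]
        rows += [[obj.name, obj.price] for obj in self.objects]
        return self.to_csv(rows)


controller = Controller([DomainObject("widget", 9.5), DomainObject("gadget", 12)])
csv_text = controller.index()
```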

Below is an alternate design that gives CSV data transformation capabilities directly to the DomainObject class.
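A rough sketch of this alternate design, under the same caveats: the CSVCapable mixin, the csv_fields attribute and the to_csv_row() method are hypothetical names chosen for this example only.

```python
import csv
import io


class CSVCapable:
    """Hypothetical mixin: an object renders itself as one CSV row."""

    csv_fields = ()

    def to_csv_row(self):
        buffer = io.StringIO(newline="")
        values = [getattr(self, field) for field in self.csv_fields]
        csv.writer(buffer).writerow(values)
        return buffer.getvalue()


class DomainObject(CSVCapable):
    """The domain class now carries the CSV capability itself."""

    csv_fields = ("name", "price")

    def __init__(self, name, price):
        self.name = name
        self.price = price


class Controller:
    """The controller only joins rows; it no longer knows about CSV."""

    def __init__(self, objects):
        self.objects = objects

    def index(self):
        return "".join(obj.to_csv_row() for obj in self.objects)


controller = Controller([DomainObject("widget", 9.5), DomainObject("gadget", 12)])
csv_text = controller.index()
```

The domain object can now be serialized from a script or a batch job without any web controller being involved, which is the portability argument in a nutshell.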

So which design is more realistic? Probably the former, simply because the use case for CSV data transformation outside of the web controller context isn't all that common. But does the latter design even make sense? Are we giving too much responsibility to a business logic class? Well, I would argue that it depends on how portable the domain classes need to be. Sure, with the CSV capabilities being given to a domain class, we are violating a separation of concerns principle, albeit only slightly. It isn't as though the class itself is being altered to support a specific data format. If this is a valid trade-off in a given context, I would strongly consider keeping data format transformation out of the web controllers.

Saturday, August 22, 2009

Stand Alone Python Servers

The Python programming language provides a full-featured suite of both high and low-level networking capabilities. This allows for stand-alone web servers to be written in Python. For instance, CherryPy is an object publishing framework for Python with a very powerful built-in HTTP server.
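As a small illustration of how little is needed, here is a sketch of a stand-alone server built only on the standard library's http.server module. This is not CherryPy's server, just the simplest thing that works; HelloHandler and fetch_once are names made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"Hello from a stand-alone Python server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start the server on a free local port, request "/", then shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_once()
```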

In production environments, Python is often run behind a more production-ready web server such as Apache. The main reason for doing so is performance. Web servers such as Apache are written in low-level languages and thus have a raw computing advantage. They are also generally mature as software packages: they have been tried and tested for much longer than most Python web servers.

Another reason may be architecture. If there are several other production services that already use Apache, then it would make sense to use it for consistency. It is much easier to maintain a single web server than many, often very different, web servers.

If one were to deploy a single Python web application with a single web server, is deploying to a production-ready HTTP server really necessary? Are the performance gains really all that noticeable in the grand end-user scheme of things? As for stability, this is the kind of thing that should be rigorously tested before even considering placing the stable label on the server to begin with.

Wednesday, March 18, 2009

Evaluating file-monitoring techniques in Python

There is a general need to monitor changes made to files in any computer system. The question is, why? The short answer is that when a file has changed, the state of the system has also changed, and there are going to be reactions to that change in state. These events that take place in response to file changes usually happen at a very low system level. At the application level, there is also a need to monitor the system state or sub-states such as files. For instance, if we are working with a web development framework such as TurboGears and we want the development HTTP server to reload every time a source code file changes, those files need to be monitored. Once a change has been detected by the monitoring process, the process can then reload the HTTP server. Another use for files is to communicate between different processes within a system. One process, or many processes, can monitor a given file and react accordingly when the state of that file changes.

There are two approaches I'm going to evaluate here. The first is a generic method, which is blocking. The second is the CherryPy method, which is non-blocking. Although the two methods differ at a higher level, they are the same in that they determine whether a file has changed state based on its modification date.

The first of the two methods is a blocking method. This method will block the flow of control within an application. This means that any code that comes after this logic will not be executed until the file monitoring code is complete. The reason this method is blocking to begin with is that it involves a loop, and breaking out of it is the only way to terminate the file monitoring logic. The developer using this method can specify an interval at which the logic will test the specified files for changes. It will check the modification date of the file and compare it to the last modification date recorded by this method. If the date is later than the last recorded date, the file has changed. Obviously, the more files being monitored, the more overhead is involved, so care must be taken not to overload the system by monitoring too many files. This method of file monitoring is generic and can be used in many contexts with little modification since it wasn't designed with any particular application domain in mind.
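A minimal sketch of the blocking approach follows. The function names and the max_checks parameter are my own additions; max_checks exists only so that the loop can end in an example, whereas a real monitor would typically loop until interrupted.

```python
import os
import tempfile
import time


def changed_files(paths, last_seen):
    """Return the paths whose modification time advanced since the last pass."""
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        # The first time a path is seen, only its timestamp is recorded.
        if mtime > last_seen.get(path, mtime):
            changed.append(path)
        last_seen[path] = mtime
    return changed


def watch_blocking(paths, on_change, interval=1.0, max_checks=None):
    """Poll in a loop; the caller is blocked until the loop ends."""
    last_seen = {}
    changed_files(paths, last_seen)  # record the baseline timestamps
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)
        for path in changed_files(paths, last_seen):
            on_change(path)
        checks += 1


# Usage: a single pass is driven by hand so the outcome is predictable.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    watched = handle.name
seen = {}
first_pass = changed_files([watched], seen)   # baseline only
info = os.stat(watched)
os.utime(watched, (info.st_atime, info.st_mtime + 10))  # simulate an edit
second_pass = changed_files([watched], seen)  # the change is reported
os.remove(watched)
```

Calling watch_blocking(paths, callback) without max_checks never returns, which is precisely the blocking behaviour described above.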

The non-blocking method is based on the CherryPy Python web application framework. CherryPy uses this non-blocking method to monitor changes made to Python modules within a given CherryPy application. Once a module state change has been detected, it is an indication that the HTTP server should be restarted to reflect those changes in the running application. The monitoring logic is periodically run in a separate thread of control at a specified interval. This means that if the server is in the middle of processing a request, the control flow does not block in the middle of the request. The method used to determine whether the file has been changed is the same as in the blocking method: the modification date is compared to the previous modification date. This method of monitoring for file state changes on a file system is a very elegant solution. The main downside is that it is context-specific. It was designed with CherryPy in mind. However, it is not so tightly coupled to CherryPy that it could not be used somewhere else. Some minor changes would do the trick.
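The sketch below shows the general shape of the non-blocking approach using a background thread. It is not CherryPy's actual code; the FileMonitor class and its method names are assumptions for illustration only.

```python
import os
import tempfile
import threading


class FileMonitor:
    """Check modification times periodically in a background thread."""

    def __init__(self, paths, callback, interval=1.0):
        self.paths = list(paths)
        self.callback = callback
        self.interval = interval
        self.mtimes = {}
        self._stop_event = threading.Event()
        self._thread = None

    def check(self):
        """Run one pass and call back for each path whose mtime advanced."""
        for path in self.paths:
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # the file vanished; skip it in this pass
            previous = self.mtimes.setdefault(path, mtime)
            if mtime > previous:
                self.mtimes[path] = mtime
                self.callback(path)

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop_event.wait(self.interval):
            self.check()

    def start(self):
        self.check()  # record the baseline before the thread starts
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()


# Usage: the main thread stays free while the monitor runs.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    watched = handle.name
changed = threading.Event()
monitor = FileMonitor([watched], lambda path: changed.set(), interval=0.05)
monitor.start()
info = os.stat(watched)
os.utime(watched, (info.st_atime, info.st_mtime + 10))  # simulate an edit
detected = changed.wait(timeout=5)
monitor.stop()
os.remove(watched)
```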

Lastly, if you need to monitor file system state changes, which method do you want to use? That is, which method is best suited for your application? There are a couple of factors to consider. First, if your application can only support a single process, the blocking method is out of the question. However, this is rarely the case. The application could simply spawn a new file monitoring process. This could introduce a new problem though, because the file monitoring process would need to communicate with the main application process. Having done this, you will have introduced a new communication channel in your application and thus increased the complexity considerably. The CherryPy, non-blocking, file monitoring approach could prove to be the better approach if the application you are developing needs to react to file system state changes as well as changes in state from other resources. The challenge here is that it is not nearly as generic as the blocking method and would require a larger development time investment. Do some investigation as to what state changes your application must respond to. If it only needs to respond to file system state changes, the blocking method may suffice. In most other cases, the non-blocking approach may be better suited.

Monday, March 16, 2009

Initializing the CherryPy server object

The CherryPy Python web application framework contains several abstractions related to an HTTP server. One of these is the Server class defined in _cpserver.py. In fact, the top-level server instance of the CherryPy package is an instance of this class. The Server class inherits from the ServerAdapter class, which is defined in servers.py. Interestingly, the class serves as a wrapper for other HTTP server classes. The constructor of ServerAdapter accepts both a bus and a httpserver parameter. The bus parameter is a CherryPy website process bus as described here. The httpserver parameter can be any instance that implements the CherryPy server interface. Although this interface isn't formally defined, it can be easily inferred by looking at the CherryPy code.

So we have a Server class that inherits from ServerAdapter. The idea here is that many server instances may be started on a single CherryPy website process bus. One question I thought to ask was "if Server inherits from ServerAdapter and ServerAdapter is expecting a httpserver attribute, how do we specify this if the ServerAdapter constructor isn't invoked with this parameter?" In other words, the ServerAdapter isn't given a parameter that it needs by the Server class.

It turns out that the start() method is overridden by Server. This method will then ensure that a httpserver object exists by invoking Server.httpserver_from_self(). Developers can even specify an httpserver object after the Server instance has been created by setting the Server.instance attribute with the server object we would like to use. This is the first attribute checked by the Server.httpserver_from_self() method. The Server.instance attribute defaults to None, and if this is still the case once Server.httpserver_from_self() is invoked, it will simply create and return a new CPWSGIServer instance to use.
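The following is a deliberately simplified model of that relationship, not CherryPy's source code. The class, method and attribute names mirror the ones discussed above, but DummyHTTPServer is a made-up stand-in for the default server, and the method bodies are my own reading of the behaviour described here.

```python
class DummyHTTPServer:
    """Made-up stand-in for the default HTTP server CherryPy would create."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True


class ServerAdapter:
    """Wraps any object offering the (informal) server interface."""

    def __init__(self, bus, httpserver=None):
        self.bus = bus
        self.httpserver = httpserver

    def start(self):
        self.httpserver.start()


class Server(ServerAdapter):
    def __init__(self):
        # The adapter's constructor is not invoked with an httpserver;
        # the attributes are simply initialised here instead.
        self.bus = None
        self.httpserver = None
        self.instance = None

    def httpserver_from_self(self):
        # A server supplied by the developer wins over the default one.
        if self.instance is not None:
            return self.instance
        return DummyHTTPServer()

    def start(self):
        if self.httpserver is None:
            self.httpserver = self.httpserver_from_self()
        # Only now is it safe to hand control to the adapter.
        ServerAdapter.start(self)


default_server = Server()
default_server.start()

custom = DummyHTTPServer()
custom_server = Server()
custom_server.instance = custom
custom_server.start()
```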

Now we can feel safe, knowing that there will always be an object available for ServerAdapter to manipulate. We left off at the Server.start() method creating the httpserver attribute. Once this is accomplished, the ServerAdapter.start() method is invoked because it is now safe to do so. One observation about the implementation: I'm not convinced that calling the ServerAdapter.start() method with an instance of self is the best design. This is the only way that Server instances can invoke behaviour on the ServerAdapter instance, even though in theory it is the same instance by inheritance. At the same time, we wouldn't be able to override the method and then call the inherited method if we were to call ServerAdapter.__init__() from Server.__init__(). The alternative would be to have unique method names between the two classes. Then again, this might end up taking away from the design quality of ServerAdapter. So the question is, which class is more important in terms of design? Just something to think about, not that the current implementation is defective by any means. CherryPy is probably one of the more stable Python packages in existence.

Friday, March 13, 2009

The need for a simplified PyPI package.

Given the growing complexity of many Python applications these days, developers often use other packages and libraries to help manage this complexity. TurboGears, for instance, will fetch several other packages from PyPI when it is installed and install these packages as well. PyPI provides access to packages that the Python community has provided, possibly because their authors feel they will be useful in a different context.

However, the PyPI code itself isn't exactly a simple Python package used to host egg files. It is a full-featured, hosted solution. The setuptools package can fetch eggs listed on a simple HTML page. In this case, you wouldn't even need anything other than Apache. However, what would be nice is a middle ground: a Python package that uses CherryPy or some other web framework to host the actual packages and provides a very simple management interface. I think something like this would be very valuable for packages that are limited by having to retrieve dependencies from PyPI and would need their own repository.

I'm not too sure how difficult this would actually be to implement; I'm only thinking of the need for such a solution at the moment. Perhaps I'll do some experimentation and write about what I find.

Thursday, March 5, 2009

The CherryPy Web Site Process Bus

The CherryPy web application framework defines several process utilities for communication across several website components. One such utility is the Web Site Process Bus. This bus is implemented as a class called Bus. The bus implements a micro publish-subscribe event system along with some default subscribers to carry out the default server behaviour. An interesting note about this publish-subscribe framework: every subscriber is guaranteed to execute, even if previous subscribers raise an exception.

The Bus class defines several methods that will publish basic server state change events such as starting and stopping. Here is an illustration of what the Bus class looks like.

Here, we can see that the Bus class depends on the _StateEnum enumeration. This enumeration holds all possible states a CherryPy server may be in at any given time.

The subscribe() method will subscribe the specified callback to the specified channel. There is also an optional priority parameter that may be set.

The unsubscribe() method will unsubscribe the specified callback from the specified channel.

The publish() method will publish the specified parameters to the specified channel.

The _clean_exit() method will check if the bus is in an EXITING state before exiting the bus.

The start() method will publish an event on the start channel, putting the bus in the STARTING state and then the STARTED state.

The exit() method will publish an event on the exit channel and put the bus in an EXITING state.

The restart() method will set the execv attribute to True and invoke the exit() method.

The graceful() method will publish an event on the graceful channel.

The block() method will block execution in the current thread of control until the bus is in the EXITING state. Once reached, all threads are then joined so the current thread will wait until they have terminated. Finally, if the execv attribute is set to True, _do_execv() is invoked.

The wait() method will block execution until the bus is in the specified state.

The _do_execv() method will attempt to restart the process in which CherryPy is running.

The stop() method will publish an event on the stop channel, putting the bus in the STOPPING state and then the STOPPED state.

The start_with_callback() method will start a new thread for the specified callback and then invoke the start() method.
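To illustrate the publish-subscribe idea and the state changes described above, here is a toy bus. It is not the CherryPy Bus class. The name MiniBus is invented, only a few of the methods are modelled, and two choices are assumptions of this sketch: exceptions are collected in an errors list, and subscribers with a lower priority number run first.

```python
class MiniBus:
    """Toy model of a process bus: channels, subscribers and a state."""

    STOPPED, STARTING, STARTED, STOPPING, EXITING = (
        "STOPPED", "STARTING", "STARTED", "STOPPING", "EXITING")

    def __init__(self):
        self.state = self.STOPPED
        self.listeners = {}  # channel name -> list of (priority, callback)
        self.errors = []     # exceptions raised by subscribers

    def subscribe(self, channel, callback, priority=50):
        self.listeners.setdefault(channel, []).append((priority, callback))

    def unsubscribe(self, channel, callback):
        entries = self.listeners.get(channel, [])
        self.listeners[channel] = [e for e in entries if e[1] != callback]

    def publish(self, channel, *args, **kwargs):
        """Call every subscriber, even if an earlier one raises."""
        results = []
        ordered = sorted(self.listeners.get(channel, []), key=lambda e: e[0])
        for _, callback in ordered:
            try:
                results.append(callback(*args, **kwargs))
            except Exception as exc:
                self.errors.append(exc)
        return results

    def start(self):
        self.state = self.STARTING
        self.publish("start")
        self.state = self.STARTED

    def stop(self):
        self.state = self.STOPPING
        self.publish("stop")
        self.state = self.STOPPED


calls = []


def failing():
    raise RuntimeError("boom")


def recording():
    calls.append("ran")


bus = MiniBus()
bus.subscribe("start", failing, priority=10)
bus.subscribe("start", recording, priority=20)
bus.start()
```

Even though the first subscriber raises, the second one still runs and the bus still reaches its started state.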

I think having a platform-independent bus like this in place is a very smart design decision. It is very useful for a web application to be able to reason about the state of the process in which it is running. This is especially true in today's distributed environments.

Tuesday, November 4, 2008

Some TurboGears configuration thoughts

The TurboGears Python web framework uses CherryPy as the web server. CherryPy offers several useful features beyond strictly "web server" functionality. One of these features is project configuration. TurboGears basically builds on the CherryPy configuration functionality. TurboGears also uses a Python package called ConfigObj to help distribute responsibilities. If there is one thing the TurboGears project does well, it would have to be reusing existing functionality rather than rebuilding it.

When retrieving a configuration value in a TurboGears project using get(), TurboGears simply uses the CherryPy configuration. However, the TurboGears framework is responsible for keeping the CherryPy configuration component up to date. For example, let's say you need to update the project configuration dynamically as the result of some client request. This can be done in one of two ways. First, you could pass a key/value dictionary to update(). Second, you could pass a configuration file to update_config(). In both cases, we are essentially updating the CherryPy configuration. The ConfigObj package comes in handy when we need to read configuration files, not when we already have a key/value configuration dictionary.

My main criticism of the TurboGears config module is that I wish there were a factory function that simply generated a configuration instance. This instance would be a representation of the configuration for the entire project. All the functions that are currently defined in the TurboGears config module could be instance methods of this new configuration class. I haven't yet looked in the trunk to see what's happening there with the config module.
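Here is a sketch of what I have in mind. Nothing below is TurboGears code: ProjectConfig and make_config are hypothetical names, the keys are example values, and the standard library's configparser stands in for ConfigObj so that the example has no dependencies.

```python
import configparser
import os
import tempfile


class ProjectConfig:
    """Hypothetical object representing the whole project configuration."""

    def __init__(self, values=None):
        self._values = dict(values or {})

    def get(self, key, default=None):
        return self._values.get(key, default)

    def update(self, values):
        """Merge a key/value dictionary into the configuration."""
        self._values.update(values)

    def update_config(self, configfile):
        """Merge every option found in an INI-style file (values stay strings)."""
        parser = configparser.ConfigParser(interpolation=None)
        with open(configfile) as handle:
            parser.read_file(handle)
        for section in parser.sections():
            for key, value in parser.items(section):
                self._values[key] = value


def make_config(defaults=None):
    """Factory function returning a fresh configuration instance."""
    return ProjectConfig(defaults)


# Usage with example keys and a temporary configuration file.
config = make_config({"server.socket_port": 8000})
config.update({"app.debug": True})
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as handle:
    handle.write("[global]\nserver.socket_port = 8080\n")
    cfg_path = handle.name
config.update_config(cfg_path)
os.remove(cfg_path)
```

With an instance like this, two configurations could live side by side, which is awkward when the configuration is a set of module-level functions.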

Tuesday, October 14, 2008

Enomaly ECP update

Lots of new ideas for Enomaly ECP 2.2 have been brewing and work to implement these ideas has commenced. One new piece of functionality that has already made its way in is a set of new HTTP request testing functions:

is_get_request()
is_post_request()
is_put_request()
is_delete_request()

These functions have made their way into the REST_library module in an effort to globally decouple CherryPy from the Enomaly ECP modules. This isn't to say that CherryPy isn't going to be used as the HTTP server in 2.2. It is just good programming practice.
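To give a feel for the decoupling, here is a sketch of how such functions could be written. This is not the Enomaly ECP implementation. The set_method_source() hook is an assumption of mine; with CherryPy, the registered callable could plausibly return cherrypy.request.method, while the rest of the code base would only ever call the four test functions.

```python
_method_source = None


def set_method_source(func):
    """Register a callable returning the current HTTP method (hypothetical hook)."""
    global _method_source
    _method_source = func


def _request_method():
    if _method_source is None:
        raise RuntimeError("no request method source has been registered")
    return _method_source().upper()


def is_get_request():
    return _request_method() == "GET"


def is_post_request():
    return _request_method() == "POST"


def is_put_request():
    return _request_method() == "PUT"


def is_delete_request():
    return _request_method() == "DELETE"


# Usage: a dictionary plays the role of the web server's request object.
current_request = {"method": "post"}
set_method_source(lambda: current_request["method"])
post_seen = is_post_request()
get_seen = is_get_request()
```

Swapping the HTTP server then means changing one registration call rather than every module that inspects the request method.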

Just today, I started adding extensible JavaScript and CSS capabilities for ECP extension modules. I'll have more to say about that once it is further developed.
