Friday, November 6, 2009

Django Cache Nodes

Django, the Python web application framework, includes a powerful template system. Many other Python web application frameworks rely on external template rendering packages, whereas Django builds this functionality in. Normally, it is a good idea not to reinvent the wheel and to use functionality that other packages already provide, especially specialized packages that do only one thing, such as rendering templates. Django, however, is a batteries-included type of framework that doesn't really have external package dependencies.

The Django templating system is not all that different from other Python template rendering systems. It is quite straightforward for both developers and for UI designers to use.

Down at the code level, everything in a template is considered to be a node. In fact, there is a Node class that every template component inherits from. It could have been named TemplateComponent but that isn't the best name for a class. Node sounds better.
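To make the node idea concrete, here is a minimal sketch of a node hierarchy. These classes are simplified stand-ins written for this post, not Django's actual implementation; the names only loosely echo Django's, and the context is assumed to be a plain dictionary.

```python
# Simplified stand-ins for illustration; these are not Django's real classes.
class Node(object):
    """Base class: every template component knows how to render itself."""

    def render(self, context):
        raise NotImplementedError("subclasses must implement render()")


class TextNode(Node):
    """Literal text between template tags; it renders unchanged."""

    def __init__(self, text):
        self.text = text

    def render(self, context):
        return self.text


class VariableNode(Node):
    """A placeholder that looks its name up in the context."""

    def __init__(self, name):
        self.name = name

    def render(self, context):
        # Missing names render as an empty string in this sketch.
        return str(context.get(self.name, ""))


def render_nodes(nodes, context):
    """Render each node in order and join the results."""
    return "".join(node.render(context) for node in nodes)


nodes = [TextNode("Hello, "), VariableNode("user"), TextNode("!")]
output = render_nodes(nodes, {"user": "Ada"})
```

Because every component shares the same render() interface, the code that walks the template never needs to know which kind of node it is dealing with.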

One type of node that may be found in Django templates is a cache node. These are template fragments that may be stored in the Django caching system once they have been rendered. The CacheNode class underlies these template cache nodes and is illustrated below.

As mentioned, every Django template node type extends the Node class, and CacheNode is no different. As the illustration shows, it overrides the render() method to provide the caching functionality.

Illustrated below is an activity depicting how the CacheNode.render() method will attempt to retrieve a cached version of the node that has already been rendered and return that value instead if possible.

In this illustration, we start off with two objects, cache and args. The cache object represents the Django cache system as a whole. The args object is the context of the specific node about to be rendered. First, the context is turned into an MD5 value, which gives us something suitable for constructing a cache key. With this converted copy of the context in hand, we construct the cache key, which serves as the query we submit to the Django cache system. We then perform the query, asking the cache system for a value based on the key we have just constructed. If the cache system returns a value, that is the rendered value we return. If no value exists in the cache system, we need to give it one: we render the node, pass the result to the cache system, and return the rendered value.
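The flow described above can be sketched in a few lines of plain Python. This is only an approximation of the idea and not Django's CacheNode source: DictCache, CacheNodeSketch, render_func, and the "fragment.<name>.<digest>" key layout are all assumptions made for illustration.

```python
import hashlib


class DictCache(object):
    """Hypothetical in-memory stand-in for the cache system."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value, timeout=None):
        # The timeout is accepted but ignored in this sketch.
        self._store[key] = value


class CacheNodeSketch(object):
    """Renders a fragment once per distinct args, then serves it from cache."""

    def __init__(self, fragment_name, render_func, timeout=300):
        self.fragment_name = fragment_name
        self.render_func = render_func
        self.timeout = timeout
        self.render_count = 0  # how many times we really rendered

    def render(self, cache, args):
        # 1. Turn the node's context values into an MD5 digest.
        joined = ":".join(str(a) for a in args)
        digest = hashlib.md5(joined.encode("utf-8")).hexdigest()
        # 2. Build the cache key from the fragment name and the digest.
        cache_key = "fragment.%s.%s" % (self.fragment_name, digest)
        # 3. Ask the cache for a previously rendered value.
        value = cache.get(cache_key)
        if value is None:
            # 4. Cache miss: render, store the result, then return it.
            value = self.render_func(args)
            self.render_count += 1
            cache.set(cache_key, value, self.timeout)
        return value


cache = DictCache()
node = CacheNodeSketch(
    "sidebar",
    lambda args: "<ul>%s</ul>" % "".join(str(a) for a in args),
)
first = node.render(cache, ["alice", 1])
second = node.render(cache, ["alice", 1])  # served from the cache
```

Hashing the context keeps the key short and free of awkward characters regardless of what the values contain, which is presumably why a digest is used rather than the raw values.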

Friday, September 25, 2009

Granular Django Cache

Like many other web application frameworks, Django has a built-in caching system. Unlike other web application frameworks, the Django cache system is relatively straightforward to configure and use. Configuring the cache system can be as simple as specifying where the cached items are stored. With the Django cache system, developers have plenty of options. There is even a dummy cache storage that can be used for development purposes. Whichever back-end cache system you decide to use, it can be specified in the CACHE_BACKEND configuration value.
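As a rough illustration, a settings fragment for that setting might look like the following. The backend URIs follow the style documented for Django releases of this period; the host, port, and file path are placeholders. Later Django releases replaced this single setting with a CACHES dictionary, so check the documentation for the version you run.

```python
# settings.py (fragment)

# Development: the dummy backend implements the cache interface
# without actually storing anything.
CACHE_BACKEND = 'dummy:///'

# Other backends use the same URI style. Uncomment one to try it;
# the host, port, and path below are placeholders.
# CACHE_BACKEND = 'locmem:///'
# CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
# CACHE_BACKEND = 'file:///var/tmp/django_cache'
```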

Once the cache storage location has been set up, caching can be implemented at any number of levels, from per-site down to low-level. The most effective way to implement Django caching, I find, is on a per-view basis. With this method, a cached item is created for each requested URL, provided the view mapped to that URL is cached. The lower-level Django cache constructs are nearly impossible to manage in larger, more complex applications. They do exist, however, for niche situations.

The cache_page() function is responsible for creating a page cache. The function takes a view to be cached and a timeout as parameters. Once the timeout has expired, any cached items are no longer valid. Although the cache_page() function can be used as a decorator on the view declaration, it makes more sense to pass the view as a parameter to cache_page() within the URL configuration. This is the more portable way of doing things and is better aligned logically since the URL serves as the cache key, not the view name.
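To show why wrapping the view in the URL configuration lines up with the URL acting as the cache key, here is a self-contained sketch. It does not use Django: cache_page_sketch is a hypothetical stand-in that only mimics the (view, timeout) call shape described above, the "request" is reduced to a path string, and the URL configuration is reduced to a dictionary.

```python
import time


def cache_page_sketch(view, timeout):
    """Hypothetical stand-in: cache a view's output per URL path."""
    store = {}  # path -> (expiry timestamp, rendered response)

    def cached_view(path):
        now = time.time()
        hit = store.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]  # still valid: reuse the cached response
        response = view(path)
        store[path] = (now + timeout, response)
        return response

    return cached_view


calls = []  # records every real rendering, for demonstration


def article_detail(path):
    calls.append(path)
    return "rendered %s" % path


# The view itself stays cache-agnostic; the wrapping happens where the
# URL is mapped, with a 15 minute timeout.
cached_detail = cache_page_sketch(article_detail, 60 * 15)
urlpatterns = {
    "/articles/1/": cached_detail,
    "/articles/2/": cached_detail,
}

first = urlpatterns["/articles/1/"]("/articles/1/")
again = urlpatterns["/articles/1/"]("/articles/1/")  # cache hit
other = urlpatterns["/articles/2/"]("/articles/2/")  # separate entry
```

The view function is rendered once per distinct URL, and the same undecorated view could be mapped elsewhere with a different timeout or with no caching at all.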

Thursday, January 8, 2009

Evolution of updating the cache in the vmfeed module

Remote packages in ECP are managed through an extension module called vmfeed. This module is a core extension module and is distributed along with the application. Remote repositories are essentially RSS feeds that are read by vmfeed and each entry is then updated in the local database (the cached entries).

Within the vmfeed extension module there is a RepoFeed class that represents an installed repository. The RepoFeed.update_cache() method is responsible for reading the feed XML, and updating the database with each entry that is found. Here is what the ECP 2.1 version of the method looks like.

#ECP 2.1 version of RepoFeed.update_cache()

def update_cache(self):
 """This method will update the cache (the RepoEntry rows) with the new
    versions of all of the data in the database.
    @param self: The method class.
    @type self: L{vmfeed.model.RepoFeed}
    @return: None
    @rtype: None
    @raise None: No exceptions are raised by this method.
    @status: Stable
    @see: L{vmfeed.model.RepoFeed.validate_enclosure}"""
 if not self.cache:
     return False
 self.retrieved_on=datetime.datetime.now()
 feed=ET.fromstring(self.cache)
 e2_log('Got a good feed %s'%feed, location=__name__)
 feedname=None
 feedname=feed.find('channel/title')
 if feedname!=None and feedname.text.strip()!="":
     e2_log('Updating feed name to %s'%feedname, location=__name__)
     self.name=feedname.text.strip()
 description=None
 description=feed.find('channel/description')
 if description!=None and description.text.strip()!="":
     e2_log('Updating description to %s'%description,\
             location=__name__)
     self.description=description.text.strip()         
 items=feed.findall('channel/item')
 if not len(items):
     return 0
 for i in items:
     name=i.find('title').text.strip()
     description=i.find('title').text.strip()
     try:
         description=i.find('description').text.strip()
     except:
         pass
     U=None
     U=i.find('uuid')
     if U!=None:
         U=U.text.strip().lower()
     else:
         U=gen_uuid()
     enclosure=None
     enclosures=i.findall('enclosure')
      
     #Only the first enclosure matters, unless it is an egg, in which
     #case we need to look at ALL of them and get the matching python
     #release version.
     #This should all get refactored actually...
     for enclosure in enclosures:
         e2_log('Enclosure is %s'%enclosure.attrib,\
                 location=__name__)
         if enclosure!=None:
             mime=self.enclosure_2_mime(enclosure)
             if not mime:
                 enclosure=None
                 continue
             elif not self.validate_enclosure(enclosure,mime):
                 enclosure=None
                 continue #Only known mime types get stored.
             else:
                 break;
              
     if enclosure!=None:
         #mime=self.enclosure_2_mime(enclosure)
         if not mime:
             continue #Only known mime types get stored.
         e2_log('Found an enclosure %s'%enclosure.attrib['url'],\
                 location=__name__)
         url=enclosure.attrib['url']
         url=self.normalize_url(url)
         try:
             if U:
                 re=RepoEntry.by_uuid(U)
             else:
                 re=RepoEntry.by_url(url)
         except:
             re=RepoEntry(url=url,\
                          name=name,\
                          description=description,\
                          feed=self)
         re.set(description=description,\
                name=name,\
                url=url,\
                retrieved_on=datetime.datetime.now(),\
                mime=mime,\
                uuid=U,\
                )
         re.sync()
         #re.retrieved_on=datetime.datetime.now()
         #re.description=description
         #re.mime=mime
         #re.uuid=U
     else:
         e2_log('No enclosures found in entry %s'%name,\
                 location=__name__)
         if enclosure:
             e2_log('Enclosure had attribs %s'%enclosure.attrib,\
                     location=__name__)
 return 1

The method signals success or failure only through its return value (False, 0, or 1). This means that when the method fails, the invoking process is given no useful information about what went wrong.
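A small sketch shows how little those return codes tell the caller. The first function only mimics the False, 0, and 1 return paths of the listing above; the second is one possible alternative using exceptions. The function names and both exception classes are hypothetical and are not part of ECP.

```python
def update_cache_status(cache, items):
    """Mimics the return codes of the 2.1 method (simplified stand-in)."""
    if not cache:
        return False  # nothing cached to parse
    if not len(items):
        return 0      # feed parsed, but it contains no items
    return 1          # entries were processed


no_cache = update_cache_status("", [])
no_items = update_cache_status("<rss/>", [])
# In Python, False == 0, so a caller comparing results cannot tell
# these two failure modes apart.
caller_cannot_tell = (no_cache == no_items)


class EmptyFeedCache(Exception):
    """Hypothetical: the feed has no cached XML to parse."""


class NoFeedItems(Exception):
    """Hypothetical: the feed parsed but contained no items."""


def update_cache_checked(cache, items):
    """Alternative sketch: each failure mode gets its own exception."""
    if not cache:
        raise EmptyFeedCache("feed has no cached XML to parse")
    if not items:
        raise NoFeedItems("feed contains no item elements")
    return len(items)  # number of entries processed
```

With distinct exceptions, the invoking process can decide for itself which conditions are fatal and which can be ignored.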

The main problem the ECP development team found with this method is that it is not very cohesive. The responsibilities of this method are very broad:

Parse XML
Initialize repository entry parameters
Iterate through item elements (while performing XML operations)
Iterate through enclosure elements (while performing XML operations)
Check if the repository entry exists and create it if not.

Finally, there is excessive logging that doesn't help with the complexity.

Here is a taste of what the ECP 2.2 version of the same method will look like.

#ECP 2.2 version of RepoFeed.update_cache()

def update_cache(self, tx=None):
  """This method will update the cache (the RepoEntry rows) with the new
    versions of all of the data in the database.
    @param self: The method class.
    @type self: L{vmfeed.model.RepoFeed}
    @return: None
    @rtype: None
    @raise None: No exceptions are raised by this method.
    @status: Stable
    @see: L{vmfeed.model.RepoFeed.validate_enclosure}"""
  self.retrieved_on=datetime.datetime.now()
  feed_xml=get_element(self.cache)
  try:
      name=get_element_text(feed_xml, element='channel/title')
      self.name=name.strip()
  except AttributeError:
      pass
  try:
      desc=get_element_text(feed_xml, element='channel/description')
      self.description=desc.strip()
  except AttributeError:
      pass
  for item in VMFeedTools.get_items_xml(feed_xml):
      try:
          name=get_element_text(item, element='title')
          name=name.strip()
      except AttributeError:
          name=None
      try:
          desc=get_element_text(item, element='description')
          desc=desc.strip()
      except AttributeError:
          desc=None
      try:
          item_uuid=get_element_text(item, element='uuid')
          item_uuid=item_uuid.strip().lower()
      except AttributeError:
          item_uuid=gen_uuid()
      enclosure=VMFeedTools.get_valid_enclosure_xml(item)
      url=get_element_property(enclosure, None, 'url')
      mime=VMFeedTools.enclosure_2_mime(enclosure)
      try:
          entry_obj=VMFeedTools.get_repo_entry(item_uuid)
      except RepoEntryNotFound, e:
          e.store_traceback()
          entry_obj=RepoEntry(uuid=item_uuid,\
                              url=url,\
                              name=name,\
                              description=desc,\
                              retrieved_on=datetime.datetime.now(),\
                              mime=mime,\
                              feed=self)
      else:
          entry_obj.set(uuid=item_uuid,\
                        url=url,\
                        name=name,\
                        description=desc,\
                        retrieved_on=datetime.datetime.now(),\
                        mime=mime)
          entry_obj.sync()
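The helper functions called in the 2.2 listing are not shown in this post. As a rough guess at their shape, the sketch below implements get_element() and get_element_text() on top of the standard library's ElementTree. The bodies are assumptions, as is the sample feed; the only thing taken from the listing is the call style and the fact that a missing element surfaces as an AttributeError.

```python
import xml.etree.ElementTree as ET


def get_element(xml_text):
    """Assumed behaviour: parse an XML string into its root element."""
    return ET.fromstring(xml_text)


def get_element_text(parent, element):
    """Assumed behaviour: return the text of the child found at a path.

    find() returns None when the element is missing, so the attribute
    access raises AttributeError, which the caller is expected to handle.
    """
    return parent.find(element).text


# A made-up feed, used only to exercise the helpers.
FEED = """<rss><channel>
  <title> Example Repo </title>
  <item><title>pkg-one</title><uuid>ABC-123</uuid></item>
  <item><title>pkg-two</title></item>
</channel></rss>"""

feed_xml = get_element(FEED)
feed_name = get_element_text(feed_xml, element='channel/title').strip()

uuids = []
for item in feed_xml.findall('channel/item'):
    try:
        item_uuid = get_element_text(item, element='uuid')
        uuids.append(item_uuid.strip().lower())
    except AttributeError:
        # No uuid element: the real method generates one at this point.
        uuids.append(None)
```

Pushing the XML lookups into small helpers like these is what lets the 2.2 method read as a sequence of steps instead of interleaving parsing details with the repository logic.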