Wednesday, October 12, 2011

Static Tests And Dynamic Tests

How do you automate the delivery of solid software? Software without bugs, software that fulfills what the customer has envisioned? It's difficult to determine how software is going to achieve it's role when dropped into problem domain battle field. Adroit human beings — the software development team — collaborate and work closely with the customer to work all this out.

Typically, this process never ends — even long after the software has been deployed and has become a possession of the customer. We'd like for our software to work well enough that it can handle subtle mistakes in configuration — in other words, the unexpected. Most experienced computer users have had experiences where they know ahead of time the software is going to fail as a result of their actions. The nice surprise these folks get is when the software unexpectedly adapts to the situation and might even offer the user some advice.

These advanced features — little nuggets of exceptionally usable software don't come naturally — not without a lot of testing. Back to automating the development process — the repetitive pieces that are better performed by unit test bots. We, as software developers, are constantly pushing the limits of what it means to be stable — how can we find our weak points and design a test that will cover them when we want to step through the testing procedure?

The question I have is this — can you devise creative automated unit tests without manually testing the software yourself?

Static software design
When I talk about static software design, what I'm actually referring to is not the design itself, but the methodology used to generate a design. In the end, what we're looking for is something that satisfies the requirements of our customers. The customers rarely glimpse into the process that produces their code. So if a development team follows the waterfall approach — if we're to assume that the requirements are Newtonian and unchanging, that'd be fine by them.

As we all know, the waterfall approach has a pretense of completeness about it. We're assuming, under this mode of operation, that nothing about the project will change. I'm sure there have been a few instances of this in the real world — hello world programs, for example.

Static software design means that we're not incorporating the emergent system properties into the final product. During development, we're bound to make discoveries about our approach, the problem domain, or even the technology itself. We don't have robots that'll write code for us yet — this means that humans still play a pivotal role in the design of software. And where there are humans, there are false assumptions.

Is this necessarily a bad thing? No, because if there is something unique within our species, it's our ability to problem solve and adapt quickly. There are simply too many variables in software design to fully automate the process — and even if we could take humans out of the driver seat when it comes to optimal programming language use, we've still got humans supplying the input.

The iterative approach to software development is really nothing more than a means to organize the natural learning that takes place during evolution of the system. Each iteration has new input. The first iteration has what the customer envisions — each subsequent iteration has our previous attempts at prototyping the system along with new problems we've introduced for ourselves. This is as good as it gets — we not only improve the software system we're building, but as developers, we're building our own repertoire of software development knowledge. You don't get this with a static software design methodology.

Dynamic software testing
Write the tests first, build software that pass those tests. An elegant solution — perhaps an obvious solution to building quality software in as little time as possible. It's like when you're figuring out a problem mentally — it helps if you have a set of smaller problems — questions you can explicitly ask yourself before answering. The same goes with software — it's much easier to write code that'll pass small tests than it is to write software that'll answer some grand philosophical question.

This is indeed a great starting point — an atomic means to get your feet wet with each iteration of the product life cycle. As a bonus, you've now got a suite of tests you can continue to apply throughout the development of the software.

But here's the thing — the tests you write for your software are like a little software project on their own. The tests are the nurturing parent that makes sure the child is able to make it in the real world without walking out onto the highway and getting run-over. Thus, if our software is continuously evolving, than so must our unit tests. As we write our tests, we're going to discover what works and what doesn't. But as we learn and improve the software, so too must the tests be improved.

As the system grows, we'll need to introduce new tests that aren't necessarily related to the overall requirements of the system. Perhaps there is a something a little less tangible — something related to a technological dependency that caused problems during development. These are useful tests to pass and are impossible to collect up-front.

So if tests are also evolutionary, they should become an integral part of the software development process. Incorporating automated unit tests into the project isn't anything new — what I'm suggesting is that they're treated as the parent of the software product and follow the same evolutionary course. Just as static software development isn't a tool humans can readily use, neither are static up-front unit tests. New unforeseen scenarios need tests. These don't reveal themselves till several iterations into the project. And, old tests should eventually be culled — much later on. Mature software products should have sophisticated unit tests that supply unique input as a result of evolution.

Thursday, January 8, 2009

Evolution of the updating the cache in vmfeed module.

Remote packages in ECP are managed through an extension module called vmfeed. This module is a core extension module and is distributed along with the application. Remote repositories are essentially RSS feeds that are read by vmfeed and each entry is then updated in the local database (the cached entries).

Within the vmfeed extension module there is a RepoFeed class that represents an installed repository. The RepoFeed.update_cache() method is responsible for reading the feed XML, and updating the database with each entry that is found. Here is what the ECP 2.1 version of the method looks like.

#ECP 2.1 version of RepoFeed.update_cache()

def update_cache(self):
 """This method will update the cache (the RepoEntry rows) with the new
    versions of all of the data in the database.
    @param self: The method class.
    @type self: L{vmfeed.model.RepoFeed}
    @return: None
    @rtype: None
    @raise None: No exceptions are raised by this method.
    @status: Stable
    @see: L{vmfeed.model.RepoFeed.validate_enclosure}"""
 if not self.cache:
     return False
 self.retrieved_on=datetime.datetime.now()
 feed=ET.fromstring(self.cache)
 e2_log('Got a good feed %s'%feed, location=__name__)
 feedname=None
 feedname=feed.find('channel/title')
 if feedname!=None and feedname.text.strip()!="":
     e2_log('Updating feed name to %s'%feedname, location=__name__)
     self.name=feedname.text.strip()
 description=None
 description=feed.find('channel/description')
 if description!=None and description.text.strip()!="":
     e2_log('Updating description to %s'%description,\
             location=__name__)
     self.description=description.text.strip()         
 items=feed.findall('channel/item')
 if not len(items):
     return 0
 for i in items:
     name=i.find('title').text.strip()
     description=i.find('title').text.strip()
     try:
         description=i.find('description').text.strip()
     except:
         pass
     U=None
     U=i.find('uuid')
     if U!=None:
         U=U.text.strip().lower()
     else:
         U=gen_uuid()
     enclosure=None
     enclosures=i.findall('enclosure')
      
     #Only the first enclosure matters, unless it is an egg, in which
     #case we need to look at ALL of them and get the matching python
     #release version.
     #This should all get refactored actually...
     for enclosure in enclosures:
         e2_log('Enclosure is %s'%enclosure.attrib,\
                 location=__name__)
         if enclosure!=None:
             mime=self.enclosure_2_mime(enclosure)
             if not mime:
                 enclosure=None
                 continue
             elif not self.validate_enclosure(enclosure,mime):
                 enclosure=None
                 continue #Only known mime types get stored.
             else:
                 break;
              
     if enclosure!=None:
         #mime=self.enclosure_2_mime(enclosure)
         if not mime:
             continue #Only known mime types get stored.
         e2_log('Found an enclosure %s'%enclosure.attrib['url'],\
                 location=__name__)
         url=enclosure.attrib['url']
         url=self.normalize_url(url)
         try:
             if U:
                 re=RepoEntry.by_uuid(U)
             else:
                 re=RepoEntry.by_url(url)
         except:
             re=RepoEntry(url=url,\
                          name=name,\
                          description=description,\
                          feed=self)
         re.set(description=description,\
                name=name,\
                url=url,\
                retrieved_on=datetime.datetime.now(),\
                mime=mime,\
                uuid=U,\
                )
         re.sync()
         #re.retrieved_on=datetime.datetime.now()
         #re.description=description
         #re.mime=mime
         #re.uuid=U
     else:
         e2_log('No enclosures found in entry %s'%name,\
                 location=__name__)
         if enclosure:
             e2_log('Enclosure had attribs %s'%enclosure.attrib,\
                     location=__name__)
 return 1

The success of the methods' execution is based on the return value. This means that when the method fails, the invoking process is given no useful information when the method fails.

The main problem the ECP development team found with this method is that it is not very cohesive. The responsibilities of this method are very broad:

Parse XML
Initialize repository entry parameters
Iterate through item elements (while performing XML operations)
Iterate through enclosure elements (while performing XML operations)
Check if the repository entry exists and create it if not.

Finally, there is excessive logging that doesn't help with the complexity.

Here is a taste what what the ECP 2.2 version of the same method will look like.

#ECP 2.2 version of RepoFeed.update_cache()

def update_cache(self, tx=None):
  """This method will update the cache (the RepoEntry rows) with the new
    versions of all of the data in the database.
    @param self: The method class.
    @type self: L{vmfeed.model.RepoFeed}
    @return: None
    @rtype: None
    @raise None: No exceptions are raised by this method.
    @status: Stable
    @see: L{vmfeed.model.RepoFeed.validate_enclosure}"""
  self.retrieved_on=datetime.datetime.now()
  feed_xml=get_element(self.cache)
  try:
      name=get_element_text(feed_xml, element='channel/title')
      self.name=name.strip()
  except AttributeError:
      pass
  try:
      desc=get_element_text(feed_xml, element='channel/description')
      self.description=desc.strip()
  except AttributeError:
      pass
  for item in VMFeedTools.get_items_xml(feed_xml):
      try:
          name=get_element_text(item, element='title')
          name=name.strip()
      except AttributeError:
          name=None
      try:
          desc=get_element_text(item, element='description')
          desc=desc.strip()
      except AttributeError:
          desc=None
      try:
          item_uuid=get_element_text(item, element='uuid')
          item_uuid=item_uuid.strip().lower()
      except AttributeError:
          item_uuid=gen_uuid()
      enclosure=VMFeedTools.get_valid_enclosure_xml(item)
      url=get_element_property(enclosure, None, 'url')
      mime=VMFeedTools.enclosure_2_mime(enclosure)
      try:
          entry_obj=VMFeedTools.get_repo_entry(item_uuid)
      except RepoEntryNotFound, e:
          e.store_traceback()
          entry_obj=RepoEntry(uuid=item_uuid,\
                              url=url,\
                              name=name,\
                              description=desc,\
                              retrieved_on=datetime.datetime.now(),\
                              mime=mime,\
                              feed=self)
      else:
          entry_obj.set(uuid=item_uuid,\
                        url=url,\
                        name=name,\
                        description=desc,\
                        retrieved_on=datetime.datetime.now(),\
                        mime=mime)
          entry_obj.sync()