
Friday, April 29, 2011

Statistical Objects

Software is really good at keeping statistical records. We can write code that stores the raw data we want to keep track of. We can then display this data in a user friendly way, maybe as a chart of some kind. In addition, the stored raw data can be mined for trends and other anomalies. Imagine trying to function in a modern scientific discipline without this capability - software that aids in statistical analysis. It would be nearly impossible. There is simply too much data in the world for us to process without tools that extract meaning for us. This same complexity phenomenon is prevalent in modern software systems. External tools will monitor the externally-visible performance characteristics of our running system for us. Inside the code, however, is a completely different ball game. We really don't know what is happening at a level we can grasp. Logs help - they're good at reporting events that take place, things like simple tasks, errors, and values for debugging. But the traditional style of logging can only take us so far in describing the true nature of how our software behaves in its environment. Object systems should retain this information instead of broadcasting it to a log file and forgetting about it.

Statistics software is different from software statistics. Software statistics are data about the software itself, things like how long it takes to process an event or respond to a request. Running software can track and store data like this in logs. Why do we need this data? The short answer: we need it to gauge characteristics, the otherwise intangible attributes our software exhibits during its lifespan. The system logs, in their chronological format, can answer questions like “when was the last failure?” and “what was the value of the url parameter in the last parse() call?”. Further questions, questions about an aggregate measure of quality or otherwise, require a tool that will parse these logs. Something that will answer questions such as “what is the average response time for the profile view?” or “which class produces the most failures?”.

These latter questions need a little more work to produce answers. But first, are runtime characteristics the same thing as application logs?

Well, yes and no. Logs are statements that we're making about an occurrence in time, an event. This is why logs are typically timestamped, and it is what makes them such an effective debugging tool. Something went wrong at 10:34:11? The error logs just before then say something about permission problems. I now make a few adjustments, perhaps even modify the logging a little, and the problem is solved. Characteristics of a running system, on the other hand, equate to qualities that cannot be measured by a single event. The characteristics of running software change over time. We can say things like “the system was performing well yesterday” or “two hours ago, the disk activity spiked for five minutes”.

Software logs are like a newspaper. Not an individual paper, but the entire publication process. Events take place in the world and each event gets logged. We, as external observers, read about them and draw our own conclusions. Software characteristics are more like the readout on a car dashboard that tells you what the fuel economy has been like over the past several kilometres. This can answer questions such as “how much money will I need this week for gas?”.

It's not as though we cannot log these characteristics to files for external tools to analyze and provide us with meaningful insight. We can, but that doesn't really suit the intent of logging events. Events are one-time occurrences. Characteristics, or traits, are established over time. Our system needs to run and interact with its environment before anything interesting can be accumulated and measured. The question is, how is this done? If we want to measure characteristics of our software, how it behaves in its environment over time, we'll need an external observer to take the measurements for us. External tools can give us performance characteristics or point to trends that cause our software to fail. These things are limiting in some respects because they say nothing about how the objects inside the system interact with one another and the resulting idiosyncrasies.

In a system composed of objects, wouldn't it be nice to know how they interact with one another? That is, store a statistical record of the system's behaviour at both an individual object level and at a class level? This type of information about our software is meta-statistical – stats that only exist during the lifetime of our software. Once it's no longer running, the behavioural data stored about our objects is meaningless because this could change entirely once the software starts up again. If we can't use a report generated by a running system to improve something, say, our code, or development process, or whatever, what value does it have?

For example, suppose we want to know how often an instance of the Road class is created. We might also be interested in how the instance is created – which object was responsible for its instantiation? If I want to find out, I generally have to write some code that will log what I'm looking for and remove it when I'm done. This is typical of the debugging process – make a change that will produce some information that we'd otherwise not be interested in. This is why we remove the debugging code when we're finished with it – it's in the way. Running systems tested as stable don't need the added overhead of keeping track of things like which object has been around the longest or which class has the most instances. These statistics don't seem valuable at first, but we add code to produce them when we need them. When something goes wrong, they certainly come in handy. Maybe we don't need to get rid of them entirely.
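
As a minimal sketch of the idea, assuming a hypothetical Road class and a module-level instance_counts dictionary, the record could be kept in memory rather than logged and thrown away:

#Sketch: keeping per-class instance counts in memory.
#The Road class and instance_counts dictionary are hypothetical.
import collections

#Maps a class name to the number of instances created so far.
instance_counts=collections.defaultdict(int)

class Road(object):
    def __init__(self):
        #Record the instantiation instead of logging it and
        #forgetting about it.
        instance_counts[self.__class__.__name__]+=1

if __name__=="__main__":
    for i in range(3):
        Road()
    #The running system can now be queried directly.
    print instance_counts["Road"]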

Maybe we want to use statistical objects in our deployed, production systems after all. Maybe this will prevent us from having to piece together logic that helps us diagnose what went wrong. Logs are also handy in this sense, for figuring out what happened leading up to a failure. Recall that logs are system occurrences, a chronological ordering of stuff that happens. We can log as much or as little as we please about things that take place while our software is running. But too much information in the log files is just as useless as not having enough information to begin with.

The trouble with storing statistical data about our running system – data about software objects in their environment – is the overhead. If overhead weren't an issue, we probably wouldn't bother removing our debug code in the first place. In fact, we might build it into the system. Code that stores interesting information. Insightful information. Developers dream of having the capability to query a running system for any characteristic they can think of. Obviously this isn't feasible. To store everything we want to know about our running system would be an exercise in futility. It simply cannot be done, even if we had unlimited memory. The complexities involved are too great.

So what can we do with statistical software objects that speak to meta properties of the system? Properties that will guide us in maintenance or further modifications to the software in the future. It's simple, really. We only store the interesting meta data about our objects. Just as we only log events we feel are relevant, we only keep a record of system properties that are statistically relevant to a better performing system. Perhaps a more stable system, or any other unforeseen improvement made as a result of the meta data being made available to us. This way we can be selective in our resource usage. Storing things that don't help us doesn't make sense, although the uses sometimes won't reveal themselves until it's too late, when you don't have the data you need and are left with a heroic debugging effort.

For example, say you're building a painter application, one that allows you to draw simple shapes and move them around. You can keep track of things like how many Circle and Rectangle objects are created. These are domain statistics, however, because this is what the application does. This information is potentially useful to the user, but not so much to the developer or the application itself. It is more a characteristic of the user than of the software. But what if we knew how each shape was instantiated? Did the user select the shape from a menu, or did they use the toolbar button? Perhaps these two user interface components have different factories that create the objects. Which one is more popular? How can we, and by we I mean the software, exploit this information to function better? Using this information is really implementation dependent, if it is used at all. For example, the painter application could implement a cache for each factory that creates shapes. The cache stores a shape prototype that gets copied onto the drawing canvas when created. Armed with meta-statistical information about our system, we can treat one factory preferentially over another, perhaps allocating it a larger cache size.
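
A rough sketch of the idea, using hypothetical ShapeFactory and Circle classes; each factory records its own usage so the system can later size the caches accordingly:

#Sketch: factories that record their own usage.
#The ShapeFactory and Circle classes are hypothetical.
import copy

class Circle(object):
    pass

class ShapeFactory(object):
    def __init__(self, prototype):
        self.prototype=prototype
        #Meta-statistic: how often this factory has been used.
        self.created=0

    def create(self):
        self.created+=1
        #Copy the cached prototype onto the canvas.
        return copy.copy(self.prototype)

if __name__=="__main__":
    menu_factory=ShapeFactory(Circle())
    toolbar_factory=ShapeFactory(Circle())
    toolbar_factory.create()
    toolbar_factory.create()
    menu_factory.create()
    #The more popular factory could be given a larger cache.
    print menu_factory.created, toolbar_factory.created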

The preceding example is still reflective of the domain itself. Sure, the implementation of our software could certainly benefit from having it, but what about problematic scenarios that are independent of the domain? For example, disk latency may be high in one class of objects, while not as high in another. Again, this does depend on the user and what they're doing, but also on factors external to the software, such as hardware or other software processes competing for resources. Whatever the cause, given sufficient data, we give our system a fighting chance to adapt. Sometimes, however, there really isn't anything that can be done to improve the software during runtime. Sometimes, external factors are simply too limiting, or maybe there is a problem with the design. In either case, the developers can query the system and say “wow, this should definitely be running with more memory available” or “the BoundedBox class works great, except when running in threads”.

Of course, we're assuming we have the physical resources to store all this meta data about our software, data that highlights its running characteristics. We might not have the luxury of free memory, or maybe writing to the disk frequently is out of the question. In these situations, it might make sense to have the ability to turn off statistical objects. You could run your software with them turned on in an environment that can handle them. When it comes to deploying to the live, production environment, shut off the extraneous stuff that causes unacceptable overhead. More often than not, however, there are more than enough physical resources in today's hardware deployments to handle the extra data and processing required by statistical objects. If you've got the space, use it for better software. The reality is, as our software grows more complex, we'll have no choice but to generate and use this type of information to cope with factors we cannot understand.
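
One way the switch might be done, as a minimal sketch with a hypothetical STATS_ENABLED flag and record() helper:

#Sketch: statistics that can be switched off in production.
#The STATS_ENABLED flag and record() helper are hypothetical.
STATS_ENABLED=True

stats={}

def record(key):
    #A cheap no-op when statistical objects are turned off.
    if not STATS_ENABLED:
        return
    stats[key]=stats.get(key, 0)+1

if __name__=="__main__":
    record("road.instantiated")
    print stats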

Tuesday, March 2, 2010

Exceptional Development

Exceptions are raised when an exceptional condition is true within a program. Most object-oriented languages define a base exception class which developers can use to derive custom exception hierarchies.

These exception hierarchies can grow to be quite large, even in production systems. This makes sense if we want to use the exception handling mechanism in a polymorphic way. We handle all exceptions at one level of the hierarchy, including all descendant exceptions, while ignoring all exceptions at higher levels.

These hierarchies allow us to reason about what has gone wrong. So using an aggressive approach to exception handling during initial development might make a lot of sense. Construct a list of every conceivable exceptional path that isn't part of the successful path. Generalize some of the exceptions so you avoid unnecessary duplication.
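
For instance, a small hierarchy in Python might look like the following sketch, with a handler that catches one level polymorphically; the exception names here are hypothetical:

#Sketch: a small exception hierarchy, handled polymorphically.
#The exception names here are hypothetical.
class ApplicationError(Exception):
    pass

class StorageError(ApplicationError):
    pass

class DiskFullError(StorageError):
    pass

try:
    raise DiskFullError("no space left on device")
except StorageError, error:
    #Catches StorageError and all of its descendants, while
    #exceptions higher up the hierarchy propagate to the caller.
    print "storage problem:", error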

With this initial exception hierarchy, you can pound out a first iteration quickly. The fact that none of the exceptions in the hierarchy are raised is an indicator that the iteration is complete.

Friday, October 16, 2009

Python Super Classes

The Python programming language is considered to be an object-oriented language. This means that not only must it support classes, but it must also support inheritance in one form or another. Inheritance is the principle of object-oriented software development that allows developers to say class A "is a kind of" class B.

Not all object-oriented languages support it, but multiple inheritance is another form of inheritance, one that allows developers to say class A "is a kind of" class B "and is also a kind of" class C. The Python programming language does support multiple inheritance and can support designs that employ the principle when needed.

Opponents of multiple inheritance say that it is an unnecessary feature of object-oriented languages, and in most cases they are correct. Not necessarily correct about the fact that multiple inheritance shouldn't be a language feature, but about the design itself. Like anything else, multiple inheritance can be abused and can actually hurt the software. Most of the time it isn't needed, and something simpler in design terms is ideal.

Consider the following class hierarchy. Here we have a Person class that acts as the root of the hierarchy. Next, we have an Adult and a Remote class, both of which inherit directly from Person. Finally, the Student class inherits from both Adult and Remote. The Student class uses multiple inheritance.



This is an example of where multiple inheritance may come in handy. The Remote class represents something that isn't local. This could be a Student or something else in the system. Since it is required that Student inherit from Adult, it makes sense that it also be able to inherit from Remote. A student can be both things.

Below is an example of this class hierarchy defined in Python. The super() function really helps us here because the Student class would otherwise need to invoke the individual constructors of each of its super classes. Not only is this less code, it is also a more generic design. All of a Student instance's base class constructors will continue to be invoked correctly, even as these base classes change.
#Example; Using the super() function.

#Root class.
class Person(object):
    def __init__(self):
        super(Person, self).__init__()
        print "Person"

#Adult class. Inherits from Person.
class Adult(Person):
    def __init__(self):
        super(Adult, self).__init__()
        print "Adult"

#Remote class. Inherits from Person.
class Remote(Person):
    def __init__(self):
        super(Remote, self).__init__()
        print "Remote"

#Student class. Inherits from both Adult and Remote.
class Student(Adult, Remote):
    def __init__(self):
        super(Student, self).__init__()
        print "Student"

#Main.
if __name__=="__main__":
    #Create a student.
    student_obj=Student()

Tuesday, July 7, 2009

Leaky Abstractions

In an interesting entry, the issue of failed software abstractions is brought up. The title of the entry may be slightly misleading, stating that all abstractions are failed abstractions. In the majority of cases, I would agree that a given abstraction is destined to fail at one point or another. However, there are certain abstractions that can't fail. These are the types of abstractions that are minuscule by comparison with the everyday abstractions a developer is likely to see. They are less likely to fail simply because of their small size. Attempting to stuff the entire world into a single class is infinitely stupid and should be avoided, obviously. That being said, there are small abstractions that fail, as is illustrated in the entry.

The entry also discusses "leaky" abstractions. What exactly is a "leaky" abstraction? In object-oriented software development, a leaky abstraction can be viewed as an abstraction that doesn't hide underlying problems or inconsistencies. One of the key principles of object-oriented software development is encapsulation, or information hiding. So it is not unusual for a given high-level software abstraction to use low-level functionality under the covers. This is exactly where the leaks can occur. Technology is almost never perfect, especially when attempting to design code that will run on multiple platforms. If this isn't taken into consideration, leaks will occur. A developer could design the world's most pristine abstraction, and it could still end up leaking because he didn't consider a subtlety in the underlying technology encapsulated within the class.

The entry uses an object-relational mapper to illustrate an abstraction leak. This has become a popular abstraction for developers in recent years, and the problems caused by it are apparent across most, if not all, software that provides this technology. This particular problem would be an example of a solution domain abstraction leak, since the abstraction applies to any business domain using it. One may argue that there exist countless stable software solutions that use object-relational mapper technology, and I would agree. I would argue that these projects also had to implement their own niche object-relational mapper abstractions on top of the third party packages just to make them functional. And there is nothing wrong with this, because these projects now contain abstractions that do not leak. If this additional abstraction layer weren't implemented, it would be nearly impossible to discover where a problem originated, hence the term leak. These abstraction leak bug-hunting sessions aren't all that different from memory leak bug-hunting sessions.

In addition to the solution domain abstraction leak category, which the object-relational mapper problem falls under, there can also be leaks associated with problem domain abstractions. If the underlying business logic is not fully realized by the abstraction, it can leak in mysterious ways. Abstraction leaks can be as serious a bug as a memory leak and can be just as difficult to locate and correct. The main difference is that once a memory leak has been fixed, it is fixed. This is a low-level solution domain issue. Fixing an abstraction leak can have adverse effects on the abstraction itself, since the quality of the abstraction matters. It is less than ideal to patch an abstraction, especially in the problem domain.

Wednesday, May 13, 2009

Two Ways to Visualize Method Invocations

Analogy making lies at the heart of object-oriented software development. Developers make analogies to objects in the domain of interest to yield software objects. It is this concept of classes, and instances of these classes, that makes object orientation such a powerful paradigm. When designing object-oriented software, it is sometimes helpful to visualize more than one abstraction for any given solution. As any developer knows, there are several ways to implement a given solution. The act of visualizing different approaches to these abstractions will often yield something in between. The same idea can be applied to understanding and designing the behavioral aspects of software objects. Designing the behavior of software objects equates to building methods for those objects and deciding how these methods are invoked. There is more than one way to visualize these method invocations.

Method invocations can be visualized as the sending of a message. In this abstraction, the instance that makes the invocation can be thought of as the sender. The instance that implements the method can be thought of as the receiver. The method name and method parameters can collectively be thought of as the message. This approach is illustrated below.
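
This visualization can even be made literal in Python. Here is a minimal sketch, where the send() helper and Dog class are hypothetical:

#Sketch: a method invocation written as sending a message.
#The send() helper and Dog class are hypothetical.
def send(receiver, message, *parameters):
    #Look up the method by name and invoke it on the receiver.
    return getattr(receiver, message)(*parameters)

class Dog(object):
    def speak(self, times):
        return "woof "*times

if __name__=="__main__":
    #Equivalent to Dog().speak(2).
    print send(Dog(), "speak", 2)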



An alternative way to visualize method invocations is as the publishing of an event. In the event publishing abstraction, the instance that implements the method can be thought of as the subscriber. The instance that invokes the method can be thought of as the publisher. The method name and the method parameters can collectively be thought of as the event. This approach, not all that different from the first, is illustrated below.
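
Again, a minimal sketch; the Publisher class and on_saved() subscriber here are hypothetical:

#Sketch: a method invocation written as publishing an event.
#The Publisher class and on_saved() subscriber are hypothetical.
class Publisher(object):
    def __init__(self):
        self.subscribers=[]

    def subscribe(self, method):
        self.subscribers.append(method)

    def publish(self, *parameters):
        #A single event may invoke a method on many subscribers.
        for method in self.subscribers:
            method(*parameters)

def on_saved(name):
    print "saved:", name

if __name__=="__main__":
    publisher=Publisher()
    publisher.subscribe(on_saved)
    publisher.publish("entry.txt")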



Which approach is the right one? Either. Since this is simply an abstraction visualization strategy, the right approach is the one that yields the better code, and thus, the better software. However, some developers may find the message method invocation visualization approach to be more useful when designing single, one-time method invocations. The event method invocation visualization might prove more useful when designing a polymorphic method invocation over a set of instances.

Friday, February 6, 2009

Python CPU Usage

I recently needed to obtain the CPU usage in a Pythonic way. After some searching, I found this example. I've adapted it to fit my needs and a more general purpose usage. Here is my version of the example.
#Python CPU usage example.

import time

class CPUsage:
    def __init__(self, interval=0.1, percentage=True):
        self.interval=interval
        self.percentage=percentage
        self.result=self.compute()

    def get_time(self):
        stat_file=file("/proc/stat", "r")
        time_list=stat_file.readline().split(" ")[2:6]
        stat_file.close()
        for i in range(len(time_list)):
            time_list[i]=int(time_list[i])
        return time_list

    def delta_time(self):
        x=self.get_time()
        time.sleep(self.interval)
        y=self.get_time()
        for i in range(len(x)):
            y[i]-=x[i]
        return y

    def compute(self):
        t=self.delta_time()
        if self.percentage:
            result=100-(t[len(t)-1]*100.00/sum(t))
        else:
            result=sum(t)
        return result

    def __repr__(self):
        return str(self.result)

if __name__ == "__main__":
    print CPUsage()
Here, we have a class called CPUsage. This class is capable of reporting the CPU usage as a percentage or in raw clock ticks. This is specified by the percentage constructor parameter. The interval constructor parameter specifies the length of time we are measuring, defaulting to 0.1 seconds. Finally, the constructor invokes the compute() method to store the result.

The get_time() method returns the list of CPU counters, read from /proc/stat, needed to compute how long the CPU has been in use during the interval we are measuring.

The delta_time() method calls get_time() twice, sleeping for the measurement interval in between, and returns the element-wise difference between the two readings.

The compute() method will calculate the percentage of CPU usage, or simply return the total CPU time in clock ticks.

To me, this makes for a more object-oriented approach. This class can also be modified to suit a different usage scenario.

Friday, October 10, 2008

Object orientation

This is a continuation of my previous object orientation discussion. There, I gave my introductory thoughts on why object orientation is beneficial in most circumstances. Not just as a fancy paradigm buzzword, but as actual sustainable design. The class provides developers with a taxonomic mechanism to represent abstractions in both the problem domain and the solution domain.

Encapsulation

Encapsulation in the context of object orientation means data concealment. But why do developers want to hide the internal data structures? Moreover, in cases where developers are working alone on some component, they are basically hiding the data from themselves.

So why is encapsulation an important concept in object orientation? The goal behind encapsulation is to hide irrelevant details from the outside world. A real life example of encapsulation is as follows. When you drive a car, several engineering details are hidden away from the driver. All the driver wants to do is move forward. When she puts the gearshift into "drive", several complex actions are executed by complex structures. Why doesn't the driver care about any of this? Because she can accomplish her goal without this knowledge overhead.

There are two sides to encapsulation. The first side is data concealment. The second is behavioural concealment. There are times when hiding functionality from the outside world is useful, such as when there exists a method that is only used by other methods of the same class.
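
In Python, behavioural concealment is usually expressed with the leading underscore convention. A minimal sketch, using a hypothetical Invoice class:

#Sketch: behavioural concealment with a hypothetical Invoice class.
class Invoice(object):
    def total(self):
        return self._apply_tax(100.00)

    def _apply_tax(self, amount):
        #The leading underscore signals that this method is for
        #internal use by other methods of the same class.
        return amount*1.13

if __name__=="__main__":
    print Invoice().total()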

The following example is Python code that does not incorporate encapsulation into the design:

class BlogEntry:
    def __init__(self):
        self.title=''
        self.body=''

if __name__=='__main__':
    blog_entry_obj=BlogEntry()
    blog_entry_obj.title='Test'
    blog_entry_obj.body='This is a test.'

Although by default all attributes of Python instances are publicly accessible, this example directly uses attributes of the instance. This is not good programming practice and violates the idea of an encapsulated abstraction. Here is a new implementation:

class BlogEntry:
    def __init__(self):
        self.title=''
        self.body=''

    def set_title(self, title):
        self.title=title

    def set_body(self, body):
        self.body=body

    def get_title(self):
        return self.title

    def get_body(self):
        return self.body

if __name__=='__main__':
    blog_entry_obj=BlogEntry()
    blog_entry_obj.set_title('Test')
    blog_entry_obj.set_body('This is a test.')
All we have done here is add a few methods to the class definition and invoke them where instances of the class are used. The key idea here is that we hide the internal structure from the main program (in this example). In the first example, we alter the instance state directly. In the last example, we alter the instance state behaviourally. This is the proper way to interact with software objects. We are only concerned with how we can behaviourally affect instances.

Thursday, September 25, 2008

Object orientation

I figured I would give my thoughts on object orientation and its benefits. Also, why it makes a difference using an object-oriented programming language as opposed to a functional or procedural language.

I think a definition would serve a purpose here. What is object orientation? Object orientation in the context of software development is a practice that developers undertake in order to better represent abstractions. Notice I'm referring to the idea of object orientation and not the features that define an object-oriented programming language. You could, in theory, practice object orientation in a functional language; you would just need to implement the object-oriented language features using the language itself, which doesn't gain anyone anything.

Lots of developers use functional languages today, and there is absolutely nothing wrong with that if it is done right. Good design is good design. The C programming language has been around for decades and isn't going anywhere. It is cross-platform, fast, and relatively easy to program in.

C++ is the object-oriented C (an increment of C). It adds the object-oriented constructs necessary for a C-family language to be considered object-oriented. Enough of the history lesson already.

Class

The class lies at the very heart of object-orientation. The class classifies things. This makes sense in software because even when developing in a functional language, you are still implicitly modeling abstractions. For example, consider the following functional code that defines a user and some behavior that the user is capable of.

def create_user(user_name=None, password=None, is_active=None):
    user={}
    user['user_name']=user_name
    user['password']=password
    user['is_active']=is_active
    return user

def is_user_active(user):
    if user['is_active']:
        return True

if __name__=='__main__':
    user=create_user(user_name='test',
                     password='123',
                     is_active=True)
    print 'Is the user active?',is_user_active(user)

Here we have a user abstraction that is very difficult to comprehend. Let's take a look at the equivalent object-oriented functionality.

class User:
    def __init__(self, user_name=None, password=None, is_active=None):
        self.user_name=user_name
        self.password=password
        #Stored with a leading underscore so the attribute
        #doesn't shadow the is_active() method below.
        self._is_active=is_active

    def is_active(self):
        return self._is_active

if __name__=='__main__':
    user=User(user_name='test',
              password='123',
              is_active=True)
    print 'Is the user active?',user.is_active()

Here, it is much more obvious that we are dealing with a user abstraction. In the main program, we actually create a user object. This object then encapsulates the user essence. That is, the behavior associated with the concept of a user is bound to that object. This is illustrated when we invoke user.is_active().

I'll continue this discussion as part two (or something like that) because I have much to say about other object-orientation topics.