
Monday, April 26, 2010

Value In The Cloud

Cloud computing is definitely one of the top buzzwords in technology today. Yet there doesn't seem to be any consensus on what it all means. Many people have their own opinions or definitions of the topic, each slightly different from the next, and there aren't many agreed-upon components that make up cloud computing technology. One thing in particular remains unclear: where is the value for the end user in all of this?

To better understand some of the more obscure value propositions offered by cloud computing technology, we need to understand the shortcomings of the current technology, that is, the technology that allows people to make applications available over the web. There is no shortage of hosting providers that make it easy for both businesses and individuals to deploy an application over the web. In fact, these providers often have the application pre-built and ready to go. It really depends on what you are trying to do on the web; if it falls within the realm of the commonplace, you're in luck.

Once you need customizations that these pre-built applications simply cannot provide, it is time to build something. If you're not a developer, you need a team of them to build a web application that does what you need. This means that you need to figure out which framework to use, what platform it will run on, and so on. These are details that just don't interest non-technical folks. Nevertheless, dealing with them is a reality of surviving on the web these days.

Whether you use a pre-built or a custom-built web application, having a presence on the web is the ultimate goal. The Internet has an ever-growing user base and if you're not there, you'll go unnoticed. The web is a big part of cloud computing, but it isn't the whole story. What about desktop applications? Do they simply have no place in cloud computing? Like it or not, desktop applications are still in heavy use today. Some development efforts involve re-creating the same desktop applications so that they run in a web browser. This is nice to have, but it would also be nice if we could move desktop applications into the cloud without re-inventing the wheel.

Virtualization allows us to do just that. We can run our desktop applications inside a virtual machine without having to re-write the entire program. We can also access these virtual machines remotely, so we don't necessarily need to see the user interface inside a web browser, even though it is possible to do so. Virtualization is another key component of cloud computing. In fact, it is probably the key to differentiating cloud computing from a more traditional web application deployment. Virtual machines can be created and destroyed upon request. They can also be moved to different physical nodes without interrupting the application running inside the virtual machine.

There are several well-known virtualization technologies available today that service providers can use to their advantage. If they have the hardware, they can make these cloud computing resources available to their customers. One shortfall of these virtualization platforms is that they are missing several key components necessary to provide a cloud offering. This is the gap we're working on here at Enomaly with our cloud computing platform, ECP. ECP offers features that are essential to service providers, such as multi-tenancy and a highly customizable user portal.

Having said all this, what is it about cloud computing that really sets it apart from a more traditional web application deployment? That is, where is the value? I believe the value of cloud computing is enabled by virtualization. This gives us a previously unheard-of level of freedom to do what we want with our deployed applications. What can now be done with cloud computing can also be done with a more traditional deployment; it will just cost a lot more. When you lose the ability to create your own environment precisely as you need it, it suddenly becomes much more difficult to do things.

Virtual machines are self-contained environments, so they can also be copied. That is, once you have an environment set up the way it needs to be, that same work is never done twice. You simply clone the virtual machine if you need more capacity, or for some other reason, and that translates to a lot of time saved.

This customization work can even be performed by the cloud providers themselves. This gives them an opportunity to not only provide the cloud computing services, but to also add value by providing appliances to their customers. These appliances are virtual machines that have been built for a common purpose. Typically, they will have some commonly used application stack installed and minimally configured.

In summation, there is value in cloud computing that is often overlooked because it is obscured by the more technical aspects of trying to define what exactly cloud computing is. The hidden added value cloud computing offers is sometimes hard to see unless you're using the technology that makes it possible. Mind you, this technology is still in its infancy, but the end results all look very promising.

Friday, October 2, 2009

Python Libvirt Example

Just like block device statistics can be retrieved from libvirt guest domains, network interface statistics can also be retrieved. The main modification is that the XML data comes from a different element and a different method on the domain is invoked. Here is an example of how network interface statistics are retrieved from libvirt.

#Example; Libvirt network stats.

#We need libvirt and ElementTree.
import libvirt
from xml.etree import ElementTree

#Function to return a list of network devices used.
def get_target_devices(dom):
  #Create a XML tree from the domain XML description.
  tree=ElementTree.fromstring(dom.XMLDesc(0))

  #The list of network device names.
  devices=[]

  #Iterate through all network interface target elements of the domain.
  for target in tree.findall("devices/interface/target"):
      #Get the device name.
      dev=target.get("dev")
  
      #Check if we have already found the device name for this domain.
      if not dev in devices:
          devices.append(dev)
      
  #Completed device name list.
  return devices

if __name__=="__main__":
  #Connect to some hypervisor.
  conn=libvirt.open("qemu:///system")

  #Iterate through all available domains.
  for id in conn.listDomainsID():
      #Initialize the domain object.
      dom=conn.lookupByID(id)
  
      #Initialize our interface stat counters.
      rx_bytes=0
      rx_packets=0
      rx_errs=0
      rx_drop=0
      tx_bytes=0
      tx_packets=0
      tx_errs=0
      tx_drop=0
  
      #Iterate through each device name used by this domain.
      for dev in get_target_devices(dom):
          #Retrieve the interface stats for this device used by this domain.
          stats=dom.interfaceStats(dev)
      
          #Update the interface stat counters
          rx_bytes+=stats[0]
          rx_packets+=stats[1]
          rx_errs+=stats[2]
          rx_drop+=stats[3]
          tx_bytes+=stats[4]
          tx_packets+=stats[5]
          tx_errs+=stats[6]
          tx_drop+=stats[7]
      
      #Display the results for this domain.
      #Display the results for this domain.
      #print() with a single argument works on both Python 2 and Python 3.
      print("\n%s Interface Stats"%(dom.UUIDString()))
      print("Read Bytes:      %s"%(rx_bytes))
      print("Read Packets:    %s"%(rx_packets))
      print("Read Errors:     %s"%(rx_errs))
      print("Read Drops:      %s"%(rx_drop))
      print("Written Bytes:   %s"%(tx_bytes))
      print("Written Packets: %s"%(tx_packets))
      print("Write Errors:    %s"%(tx_errs))
      print("Write Drops:     %s"%(tx_drop))
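
To see what the findall() path expression matches without a running hypervisor, here is a small sketch that runs the same lookup against a hand-written domain description. The XML is a trimmed-down, made-up stand-in for what XMLDesc() returns (the device names vnet0, vnet1 and vda are assumptions), and target_devices() is a hypothetical helper, not part of libvirt.

```python
from xml.etree import ElementTree

# Hypothetical, heavily trimmed domain XML for illustration only.
SAMPLE_XML = """
<domain type='kvm'>
  <name>demo</name>
  <devices>
    <disk type='file' device='disk'>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <target dev='vnet0'/>
    </interface>
    <interface type='bridge'>
      <target dev='vnet1'/>
    </interface>
  </devices>
</domain>
"""

def target_devices(xml_text, kind):
    """Return the unique target device names for 'disk' or 'interface'."""
    tree = ElementTree.fromstring(xml_text)
    devices = []
    for target in tree.findall("devices/%s/target" % kind):
        dev = target.get("dev")
        # Skip targets without a dev attribute and avoid duplicates.
        if dev is not None and dev not in devices:
            devices.append(dev)
    return devices

print(target_devices(SAMPLE_XML, "interface"))
print(target_devices(SAMPLE_XML, "disk"))
```

Only the middle element of the path changes between the network and block device cases, which is why one helper can serve both.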

Thursday, September 10, 2009

Python Libvirt Example

The libvirt virtualization library is a programming API used to manage virtual machines with a variety of hypervisors. There are several language bindings available for the libvirt library including Python. Within a given Python application that uses the libvirt library, the application can potentially control every virtual machine running on the host if used correctly. Libvirt also has the ability to assume control of remote hypervisors.

Virtual machines, or guest domains, have primary disks and potentially secondary disks attached to them. These block devices can even be added to a running virtual machine. But just like on a physical host, it helps to know exactly how the virtual block devices for a given virtual machine are being utilized. This way, potential problems may be addressed before they occur. Libvirt provides the ability to retrieve such statistics for these devices. Here is a Python example of how to do this.

#Example; Libvirt block stats.

#We need libvirt and ElementTree.
import libvirt
from xml.etree import ElementTree

#Function to return a list of block devices used.
def get_target_devices(dom):
   #Create a XML tree from the domain XML description.
   tree=ElementTree.fromstring(dom.XMLDesc(0))
  
   #The list of block device names.
   devices=[]
  
   #Iterate through all disk target elements of the domain.
   for target in tree.findall("devices/disk/target"):
       #Get the device name.
       dev=target.get("dev")
      
       #Check if we have already found the device name for this domain.
       if not dev in devices:
           devices.append(dev)
          
   #Completed device name list.
   return devices

if __name__=="__main__":
   #Connect to some hypervisor.
   conn=libvirt.open("qemu:///system")
  
   #Iterate through all available domains.
   for id in conn.listDomainsID():
       #Initialize the domain object.
       dom=conn.lookupByID(id)
      
       #Initialize our block stat counters.
       rreq=0
       rbytes=0
       wreq=0
       wbytes=0
      
       #Iterate through each device name used by this domain.
       for dev in get_target_devices(dom):
           #Retrieve the block stats for this device used by this domain.
           stats=dom.blockStats(dev)
          
           #Update the block stat counters
           rreq+=stats[0]
           rbytes+=stats[1]
           wreq+=stats[2]
           wbytes+=stats[3]
          
       #Display the results for this domain.
       #print() with a single argument works on both Python 2 and Python 3.
       print("\n%s Block Stats"%(dom.UUIDString()))
       print("Read Requests:  %s"%(rreq))
       print("Read Bytes:     %s"%(rbytes))
       print("Write Requests: %s"%(wreq))
       print("Written Bytes:  %s"%(wbytes))
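
The summing step can be tried without a hypervisor by swapping in a stand-in for the domain object. This is only a sketch: FakeDomain, BlockStats and total_block_stats() are hypothetical names, the counter values are made up, and the field names assume that blockStats() returns read requests, read bytes, write requests, write bytes and an error count, in that order. The error count is ignored here, as it is in the example above.

```python
from collections import namedtuple

# Assumed field order of the tuple returned by blockStats().
BlockStats = namedtuple("BlockStats", "rd_req rd_bytes wr_req wr_bytes errs")

class FakeDomain(object):
    """Stand-in for a libvirt domain that returns canned counters."""
    def __init__(self, stats_by_device):
        self._stats = stats_by_device

    def blockStats(self, dev):
        return self._stats[dev]

def total_block_stats(dom, devices):
    """Sum the per-device counters into one dictionary per domain."""
    totals = {"rd_req": 0, "rd_bytes": 0, "wr_req": 0, "wr_bytes": 0}
    for dev in devices:
        stats = BlockStats(*dom.blockStats(dev))
        for field in totals:
            totals[field] += getattr(stats, field)
    return totals

# Made-up counters for two devices.
dom = FakeDomain({
    "vda": (10, 4096, 5, 2048, -1),
    "vdb": (1, 512, 2, 1024, -1),
})
print(total_block_stats(dom, ["vda", "vdb"]))
```

Naming the fields makes it harder to mix up an index than the positional stats[0] to stats[3] lookups.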

Tuesday, March 17, 2009

ECP three-level machine abstraction

The Enomaly Elastic Computing Platform is a platform for managing distributed virtual machines. Therefore, we need some type of abstract representation of this concept. This requirement isn't really any different from any other software problem. There is a problem domain which contains concepts unique to that domain. Developers then try to capture what a concept represents in that domain by creating an abstraction. By creating abstractions in this manner, we lower the representational gap between the domain and how concepts in that domain are realized in the solution. In the case of ECP, there is a real need to represent the idea of machines.

In any given software solution, the abstraction created by developers may be a very simple, single layer abstraction architecture or there could potentially be several layers within the architecture, yielding an extremely complex architecture. In the latter case, without a well thought out design, we start to lose the value that creating an abstraction brings in the first place. Sometimes, when dealing with a large abstraction, further dividing this abstraction into layers can help to better understand what you as a developer are actually implementing. Often, the abstraction design is further complicated by constraints imposed by the system or framework within which we are developing. Rationale, interfaces, and consistency in general, need to be taken into consideration when constructing a layered abstraction architecture.

To implement the machine abstraction, ECP uses a three-level approach. In this architecture, each level is a class that realizes a different level of the "machine" concept, and serves a different purpose than the other levels. In this implementation, the three levels are hierarchical. At the top level, we have a class called ActualMachine which implements several methods for invoking machine behavior. The next level contains a class called DummyMachine that inherits from ActualMachine and doesn't do much. Finally, we have a Machine class that can store persistent data to the database. Hierarchically, the DummyMachine and Machine classes are at the same level since they both inherit from ActualMachine. In this discussion, however, the levels aren't necessarily based on the class hierarchy but rather on the rationale behind each class.

The ActualMachine class is meant to most closely represent the concept of "machine" in the context of ECP. The name symbolizes that this is the underlying machine, not a Python object. Obviously, instances of ActualMachine are Python objects, but when using these objects, we are more interested in the underlying technology. This class is where all the behavior for the machine concept is defined. This class doesn't define any data attributes.

The DummyMachine class is exactly what the name implies; a dummy. The class simply defines a constructor that allows attributes to be set. Also, the class inherits all the behavior from ActualMachine. Instances of DummyMachine can set attributes in the constructor and invoke behavior provided by ActualMachine.

The Machine class provides persistence for the machine abstraction in ECP. The class also inherits behavior from the ActualMachine class. Machine functions similarly to DummyMachine in that they both provide the same behavior. The difference between the two is that DummyMachine stores attributes in memory while Machine stores attributes in the database.

The rationale behind this architecture is that we want to be able to instantiate machine instances without affecting the database. The opposite is also true; we need to be able to instantiate machines that will have an immediate effect on the database. Within the context of the ECP RESTful API, machines that are not stored on the local machine (they are retrieved from another ECP host) will need to be instantiated. That is, we want to have an abstraction available to use once the remote machine data has arrived. This can be done by using some primitive data construct such as a list or a dictionary, but by doing this we lose the machine concept. The behavioral aspect of the machine concept is gone because you can't tell a dictionary to shut down.
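
The relationship between the three classes can be sketched roughly as follows. This is not the ECP source: the shutdown() method body, the constructor signatures and the FAKE_DB dictionary standing in for the database are all assumptions made for illustration.

```python
class ActualMachine(object):
    """Behavior only; no data attributes are defined at this level."""
    def shutdown(self):
        # A real implementation would ask the hypervisor to stop the guest.
        return "shutting down %s" % self.name

class DummyMachine(ActualMachine):
    """Keeps attributes in memory, e.g. data received from a remote host."""
    def __init__(self, **attrs):
        for key, value in attrs.items():
            setattr(self, key, value)

# Hypothetical stand-in for a database table, keyed by machine uuid.
FAKE_DB = {}

class Machine(ActualMachine):
    """Persists its attributes as soon as it is instantiated."""
    def __init__(self, uuid, name):
        self.uuid = uuid
        self.name = name
        FAKE_DB[uuid] = {"name": name}

# Data that might arrive from another host as a plain dictionary.
remote_data = {"uuid": "1234", "name": "web01"}
remote = DummyMachine(**remote_data)
local = Machine("5678", "db01")

print(remote.shutdown())   # behavior is available on the in-memory copy
print(sorted(FAKE_DB))     # only the Machine instance touched the "database"
```

The dictionary of remote data gains behavior by being wrapped in a DummyMachine, while the stand-in database is only written to by Machine.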

There are still several limitations to this approach. For instance, not all ActualMachine behavior will be supported by the DummyMachine instances that are created. This is simply a limitation of the three classes and their inter-relationships. It is still an improvement over representing domain concepts using primitive types. We give ourselves more control in the three-level architecture over what happens when requested behavior cannot be fulfilled. The DummyMachine layer is an example of mixing the problem domain with the solution domain. The class came into existence because the solution demanded it. But this design allows for the behavior provided by the machine instances to still behave like "machines" without conforming too much to the solution constraints.

A similar approach is taken in ECP with other abstractions such as packages. The architecture hasn't been fully implemented for every abstraction within the platform. It will hopefully prove to add some balance between constraints and offered functionality.

I'm sure this approach will prove useful in many other application areas. As objects become more and more distributed, we'll need a better way to represent their data when used locally while preserving the behavior of those objects.

Monday, March 9, 2009

Trying to break Python memory consumption

I've been trying to find a consistent method to raise a MemoryError exception in Python and have so far been unsuccessful. I'm mostly interested in real-world usage scenarios such as executing arithmetic on huge numbers. Here is what my latest test looks like.

#Example; Massive memory consumption.

if __name__=="__main__":
   list_obj=[]
   cnt=0
   while cnt<10**24:
       list_obj.append(cnt**2)
       cnt+=1

Here, we are simply appending progressively larger integers to a list. Executing this on my Ubuntu laptop results in a huge increase in memory consumption. Because we are constructing a massive list object, this makes sense. However, a MemoryError isn't raised for me. Figuring that my laptop has enough memory to keep any kind of memory mishap from taking place for a while, I fired up a virtual machine with much less memory and executed the same program. The Python process ends up being killed. No MemoryError exception. What gives?

If I want to handle memory errors in Python, how can I deal with it if the process is terminated before I get a chance to?
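
One likely explanation, offered as a sketch rather than a diagnosis of the run above: Linux typically overcommits memory, so allocations keep succeeding until the kernel's out-of-memory killer terminates the process, and Python never sees a failed allocation. If the process is given an address-space limit through the standard resource module, allocations fail inside the process instead and Python raises MemoryError. The 512 MB limit and 16 MB chunk size below are arbitrary assumptions, the approach assumes Linux and Python 3, and the allocation loop runs in a child process so that the limit does not affect the caller.

```python
import subprocess
import sys

# Script executed in a child process so the limit stays contained there.
CHILD = r"""
import resource

limit = 512 * 1024 * 1024
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)
# Lower only the soft limit; the hard limit is left untouched.
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

chunks = []
try:
    while True:
        chunks.append(bytearray(16 * 1024 * 1024))
except MemoryError:
    count = len(chunks)
    del chunks[:]  # release memory before doing anything else
    print("MemoryError after %d chunks" % count)
"""

def run_limited():
    """Run the allocation loop under a memory limit and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()

print(run_limited())
```

With the limit in place the exception handler gets a chance to run, which is what the unlimited version never allowed.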

Thursday, March 5, 2009

Libvirt 0.6.1

Looks like the latest Libvirt is now available (0.6.1). Only a couple of new API features and some maintenance tasks, but moving forward nonetheless.

Tuesday, February 24, 2009

Optimistic provisioning in the cloud

One of the technological problems that cloud computing technologies are supposed to solve is the lack of computing power when it is needed. Computing on demand, so to speak. The elasticity of the cloud enables this.

The classic example of this is when a web site gets "slashdotted" and does not have the computing resources required to fulfil the requests: your site dies and readers (soon to be ex-readers) will be disappointed. Luckily, if your site is running in a cloud environment, it has the ability to "expand" its computing when the demand requires it.

What happens when the actual expansion takes place? Generally, a new virtual machine is created and that machine's resources are now available to the process that requires them. The process in this context refers to the overall business requirement that caused the expansion event in the first place. The process that says "give me more computing power" may in fact result from a general discussion amongst several nodes in the cloud.

Here, we have a simple controlling process that handles requests. These may be client requests or requests from other nodes in the same cloud. The controlling process then forwards the request to a resource management process. It is the responsibility of the resource manager to ensure that computing resources are available to fulfil every request. This is where the bottleneck lies, in has_resources(). In the most common case, there are plenty of resources available and has_resources() has very little work to do. However, when resources start to dry up, it needs to make more resources. This is where the costly work of the resource manager lives. It would be great if there were some way to know ahead of time what the peak resource demand will be.

Unfortunately, there is really no reliable way to do this. The best we can do in this situation is guesswork. The resource manager could monitor how much the resource requirements change from one time interval to the next. Certain thresholds could then be set and, once they are reached, we could provide resources based on what the probable resource demand will be in the near future.

For instance, let's say I have a simple blog running within a cloud environment. I post a new entry, "a ton of traffic". Now, before I post this entry, I have an average demand of 5 requests per hour. An hour after posting, the resource manager notices that my average has doubled to 10 requests per hour. This is something that could be handled very comfortably by my service. However, the suddenness of this relatively large change could put the resource manager on alert. Now, in hour two after posting "a ton of traffic", the number of requests reaches 20 per hour. It seems that this trend of rising demand is continuing. The resource manager would then proceed to make more resources available.
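
The reasoning in this example can be sketched as a small monitor that compares each hourly sample with the previous one. Everything here is an assumption made for illustration: the TrendMonitor name, the doubling threshold of 2.0, and the rule that one large jump raises an alert while a second consecutive jump triggers provisioning.

```python
class TrendMonitor(object):
    """Watch requests per hour and suggest when to provision resources."""

    def __init__(self, growth_threshold=2.0):
        self.growth_threshold = growth_threshold
        self.previous = None
        self.on_alert = False

    def observe(self, requests_per_hour):
        """Return 'steady', 'alert' or 'provision' for the new sample."""
        action = "steady"
        grew = (
            self.previous
            and float(requests_per_hour) / self.previous >= self.growth_threshold
        )
        if grew:
            # First large jump: go on alert. Second in a row: provision.
            action = "provision" if self.on_alert else "alert"
            self.on_alert = True
        else:
            self.on_alert = False
        self.previous = requests_per_hour
        return action

monitor = TrendMonitor()
# Baseline of 5 requests per hour, then 10 and 20 after the post.
for sample in (5, 10, 20):
    print("%d requests/hour: %s" % (sample, monitor.observe(sample)))
```

A real resource manager would likely smooth the samples and tune the threshold, but the alert-then-provision sequence is the same.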

With this approach, there is always the risk of over-provisioning resources. This type of data can be misleading. However, it does lend a guiding light toward proactive provisioning. Besides, if the statistical data is misleading, it is better to clean up over-provisioned resources than to attempt a huge provisioning job during a period of high resource demand.

Wednesday, February 4, 2009

How to count Python virtual instructions

Recently, I've had a need to count how many virtual instructions will be executed by the Python virtual machine for a given function or method. It turns out, there is no standard way to do this. But, Python being Python, there is always an easy way out.

The built-in dis module allows us to disassemble the byte-code for any given Python object. However, the challenges are that the results are printed rather than returned. Also, even if the output were returned, we still need to perform some action that will count the number of instructions.

Here is a simple example of what I came up with.

#Virtual instruction count

import sys
import dis

def my_function(op1, op2):
   result=op1**op2
   return result

class VICount:
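   def __init__(self, obj):
       self.instructions=[]
       #Redirect stdout to this object while disassembling.
       sys.stdout=self
       try:
           dis.dis(obj)
       finally:
           #Always restore stdout, even if dis.dis() fails.
           sys.stdout=sys.__stdout__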
      
   def write(self, string):
       if string.strip() in dis.opname:
           self.instructions.append(string)
      
   def count(self):
       return len(self.instructions)

print VICount(my_function).count()

In this example, I have a function called my_function(). I would like to determine the number of virtual instructions the Python virtual machine will execute when this function is invoked. For this purpose, I've created a VICount class that is used to count the virtual instructions. The constructor of this class will accept an object to disassemble. We also initialize the list that will store the names of the virtual instructions that are found.

Next, in the constructor, we need to change where the print statements executed by the dis.dis() function go. We do this by changing the sys.stdout file object to self. This is legal since VICount provides a writable file object interface by implementing a write method.

Finally, in the constructor, we need to restore the sys.stdout file object to its original state.

The write() method is invoked by the print statements executed by dis.dis(). If the string being written is a valid instruction, it is appended to the instruction list.

The count() method simply returns the length of the instruction list. In the case of my_function(), we have six virtual instructions.
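
The example above depends on how Python 2's dis module prints its output. On Python 3.4 and later, dis.get_instructions() yields the instructions as objects, so no output redirection is needed. A minimal sketch of the same count follows; count_instructions() is a hypothetical helper name, and the number it reports can differ between Python versions, so it may not be six.

```python
import dis

def my_function(op1, op2):
    result = op1 ** op2
    return result

def count_instructions(obj):
    """Count the bytecode instructions reported for a function or method."""
    return sum(1 for _ in dis.get_instructions(obj))

print(count_instructions(my_function))
```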

Wednesday, January 28, 2009

ECP 2.2 released

ECP 2.2 has finally been released. Outlined below are the changes.

Core

The ECP installer will now automatically generate a uuid for the host. Also, the installer will now synchronize with the local package repository if one exists. Several Xen fixes are now carried out by the installer that allow ECP to better manage Xen machines. The exception handling has also been drastically improved in the installation process.
The core ECP data module has many new features as well as many bug fixes. Several subtle but detrimental object retrieval issues have been resolved. This alone fixed several issues that were thought to be GUI related in previous ECP versions. The new features include added flexibility to existing querying components and newer, higher level, components have been implemented. These newer components build on the existing components and will provide faster querying in ECP.
The configuration system has gone through a major improvement. It is now much easier and more efficient to both retrieve and store configuration data. This affects nearly any ECP component that requires configuration values.
The extension module API now allows extension modules to register static directories as well as JavaScript. Some of the core extension modules are already taking advantage of this new capability. This helps balance the distribution of responsibilities and increases the separation of concerns among ECP components.

GUI

There have been many template improvements that promote cross-browser compatibility. Many superfluous HTML elements have been removed and others now better conform to the HTML standard.
A new jQuery dialog widget has been implemented. This widget is much more robust and visually appealing than the dialog used in previous ECP versions.
General JavaScript enhancements will give the client a nice performance boost and improve the overall client experience.

Testing

With an emphasis on improving the ECP RESTful API design in this release, the requirement for automatically invoking various ECP resources came about. Included in this release is a new client testing facility that can run tests on any ECP installation. Although the tests are limited, they continue to be expanded with each new ECP release.

vmfeed (extension module)

A big effort has been undertaken in analyzing the deficiencies of previous versions of the vmfeed extension module in order to drastically improve its design for this release. One of the major problems was the lack of consistency in the RESTful API provided by vmfeed. Some of the resource names within the API were ambiguous at best while some important resources were missing entirely. There has been a big improvement in both areas for this release.
Another problem was the design of the code that drives the extension module. Much of the code in vmfeed has been refactored in order to produce a coherent unit of functionality. As always, there is still room for improvement, which will come much more easily in future iterations as a result of these changes.

machinecontrol (extension module)

In previous ECP versions, when operating in clustered mode, removal of remote hosts was not possible. This has been corrected in this release.
The machinecontrol extension module will now take advantage of the new ECP configuration functionality.
When deleting machines, they are now actually undefined by libvirt.

static_networks (extension module)

The static_networks extension module will now use the newer ECP core functionality in determining the method of the HTTP request.
Refactoring has taken place to move the static_networks JavaScript out of the core and into the actual extension module package. This improves the design of the static_networks extension module while reducing the complexity of the ECP core.
The static_networks extension module will now take advantage of the new ECP configuration functionality.

transactioncontrol (extension module)

The transactioncontrol extension module will now use the newer ECP core functionality in determining the method of the HTTP request.

clustercontrol (extension module)

Major improvement in the RESTful API design. Some invalid resources were removed while others were improved upon.
The clustercontrol extension module will now use the newer ECP core functionality in determining the method of the HTTP request.

Wednesday, August 6, 2008

The problem of migrating virtual machines

The problem in cloud computing is the need to send virtual machines over a network connection. This basic requirement is illustrated in the following diagram.

Virtual machines need to have the ability to migrate from one physical machine to another. The problem is the huge latency involved, caused by the network bottleneck of transferring these machines over a network. The basic idea is illustrated in the following figure.

As you can see, the target physical machine has to wait until the entire machine has arrived before anything can be done with it. It would be nice if there were something the target machine could do with partial virtual machine data. In essence, a streaming virtual machine migration.
