
Monday, October 26, 2009

Python Named Tuples

Tuples in Python are similar to lists. The main difference is that tuples are immutable: once a tuple is instantiated, elements cannot be added to it, removed from it, or replaced, as they can be in a list. The benefit of using tuples in Python applications is that the interpreter can handle them more efficiently, because their length is fixed.
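The difference described above can be seen in a few lines. This is only a minimal sketch; the variable names are made up for illustration.

```python
# A list can be changed in place after it is created.
numbers_list = [1, 2, 3]
numbers_list.append(4)

# A tuple cannot: item assignment raises TypeError,
# and there is no append() method at all.
numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 10
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False
```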

The collections module provides efficient container structures that expand upon the built-in Python container types such as tuples, lists, and dictionaries. One of the container types offered by the collections module is the named tuple. This functionality is available in Python 2.6 or later. A named tuple is essentially a tuple whose elements can be referenced by field name rather than by integer index, although the index may still be used as well. Below is an example of how to create and use a named tuple.

#Example; Named tuple benchmark.

#Do imports.
from collections import namedtuple
from timeit import Timer

#Create a named tuple type along with fields.
MyTuple=namedtuple("MyTuple", "one two three")

#Instantiate a test named tuple and dictionary.
my_tuple=MyTuple(one=1, two=2, three=3)
my_dict={"one":1, "two":2, "three":3}

#Test function.  Read tuple values.
def run_tuple():
  one=my_tuple.one
  two=my_tuple.two
  three=my_tuple.three

#Test function.  Read dictionary values.
def run_dict():
  one=my_dict["one"]
  two=my_dict["two"]
  three=my_dict["three"]

#Main.
if __name__=="__main__":

  #Setup timers.
  tuple_timer=Timer("run_tuple()",
                    "from __main__ import run_tuple")
  dict_timer=Timer("run_dict()",
                   "from __main__ import run_dict")

  #Display results.  This print form works in Python 2 and 3.
  print("TUPLE: %s" % tuple_timer.timeit(10000000))
  print("DICT:  %s" % dict_timer.timeit(10000000))

Here, we create a new named tuple data type, MyTuple, by invoking namedtuple(), a factory function provided by the collections module. It is a factory function because it takes a type name and a set of field names as parameters and assembles a new named tuple class. Next, we create a MyTuple instance by supplying it the tuple data.

Now we have two instances; my_tuple is a named tuple while my_dict is an ordinary dictionary instance. Next, we have two functions that will read values from our two data structure instances, run_tuple() and run_dict().

When I run this example, run_tuple() takes significantly longer to execute than run_dict() does. So what does this mean? To me, it means that if you are already using dictionaries to read data in your program, keep using them, especially if the elements are referenced by key.

The power of named tuples comes into play when developers have no choice but to deal with tuples. These tuples may be returned from some other developer's code, or from code that can't be changed for whatever reason. Rather than having to stare at a meaningless integer index, developers can use named tuples to add meaning to the code, which can in turn have a huge impact on readability.
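As a small sketch of that last point, assume a hypothetical function get_user_row() that returns a plain tuple from code we cannot change. The function, the UserRow type, and its field names are all made up for illustration; _make() is the named tuple class method that builds an instance from an existing iterable.

```python
from collections import namedtuple

# Hypothetical stand-in for third-party code that returns a plain tuple.
def get_user_row():
    return (42, "alice", "alice@example.com")

# Hypothetical named tuple type that labels each position in that tuple.
UserRow = namedtuple("UserRow", "user_id name email")

# Wrap the plain tuple without changing the code that produced it.
raw_row = get_user_row()
row = UserRow._make(raw_row)

# Field access and index access read the same element.
same_value = row.name == row[1] == raw_row[1]
```

The wrapped value is still a tuple, so code that expects index access or unpacking should keep working unchanged.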

Thursday, September 17, 2009

Python Benchmarks

With new-style Python classes, one can cut down on the memory cost associated with creating new instances. This is done by using the __slots__ class attribute. When the __slots__ attribute is used in the declaration of a new-style class, the __dict__ attribute is not created with instances of the class. The creation of the __dict__ attribute can be expensive in terms of memory when creating large numbers of instances. So, the question that arises is: why not use the __slots__ attribute for all classes throughout an application? The simple answer is the lack of flexibility: instances can only hold the attributes named in __slots__, since that is all the pre-allocated storage has room for. The other problem with using the __slots__ attribute for every class is the burden involved in maintaining all the slots in all the classes. There could be several hundred of them or more, so this simply isn't feasible.

What is feasible, however, is to define __slots__ attributes for smaller classes with few attributes. Another factor to consider is instantiation density of these classes. That is, the __slots__ attribute is more beneficial with large numbers of instances because of the net memory savings involved. Consider the following example.

#Example; Using __slots__

import timeit

#A simple person class that defines slots.
class SlottedPerson(object):
  __slots__=("first_name", "last_name")

  def __init__(self, first_name="", last_name=""):
      self.first_name=first_name
      self.last_name=last_name
    
#A simple person class without slots.
class Person(object):
  def __init__(self, first_name="", last_name=""):
      self.first_name=first_name
      self.last_name=last_name

#Simple test for the slotted instances.
def time_slotted():
  person_obj=SlottedPerson(first_name="First Name", last_name="Last Name")
  first_name=person_obj.first_name
  last_name=person_obj.last_name

#Simple test for the non-slotted instances.
def time_non_slotted():
  person_obj=Person(first_name="First Name", last_name="Last Name")
  first_name=person_obj.first_name
  last_name=person_obj.last_name

#Main
if __name__=="__main__":
  #Initialize the timers.
  slotted_timer=timeit.Timer("time_slotted()",
                             "from __main__ import time_slotted")
  non_slotted_timer=timeit.Timer("time_non_slotted()",
                                 "from __main__ import time_non_slotted")

  #Display the results.  This print form works in Python 2 and 3.
  print("SLOTTED     %s" % slotted_timer.timeit())
  print("NON-SLOTTED %s" % non_slotted_timer.timeit())

In this example, we have two very simple classes. The SlottedPerson and the Person classes are identical except that SlottedPerson declares __slots__. In this benchmark, I would expect SlottedPerson to outperform Person, because the interpreter doesn't need to allocate a __dict__ for each new instance.
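The flexibility trade-off mentioned earlier can be sketched as follows. SlottedPoint and PlainPoint are hypothetical classes used only for illustration; the point is that a slotted instance has no __dict__ and rejects attributes that are not listed in __slots__.

```python
# Hypothetical class that declares slots.
class SlottedPoint(object):
    __slots__ = ("x", "y")

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

# Hypothetical class with an ordinary per-instance __dict__.
class PlainPoint(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

slotted = SlottedPoint(1, 2)
plain = PlainPoint(1, 2)

# Only the non-slotted instance carries a __dict__.
slotted_has_dict = hasattr(slotted, "__dict__")
plain_has_dict = hasattr(plain, "__dict__")

# A new attribute can be added to the plain instance at any time.
plain.z = 3

# The slotted instance refuses attributes that are not in __slots__.
try:
    slotted.z = 3
    slotted_accepts_new_attribute = True
except AttributeError:
    slotted_accepts_new_attribute = False
```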

Sunday, May 31, 2009

Think Again About Promoting Your Language

Are you one to promote your programming language of choice? I know I am. This entry may in fact force anyone who promotes a given language, and is not thoroughly in touch with the technical realities, to reconsider what they are promoting. There is lots of interesting data to be found here, and it is a very worthwhile read.

Tuesday, January 6, 2009

Python benchmarks

Just for fun, I decided to run some Python benchmarks that test the lookup time differences between list, tuple, and dictionary types. The list performance seems to come out on top every time. This is confusing to me because I hear that tuples are supposed to be faster because they are immutable. Here is the test I used:

from timeit import Timer

test_data1=[1,2,3,4,5]
test_data2=(1,2,3,4,5)
test_data3={0:1,1:2,2:3,3:4,4:5}

def test1():
  v=test_data1[2]

def test2():
  v=test_data2[2]

def test3():
  v=test_data3[2]

if __name__=='__main__':
  print('Test 1: %s' % Timer("test1()", "from __main__ import test1").timeit())
  print('Test 2: %s' % Timer("test2()", "from __main__ import test2").timeit())
  print('Test 3: %s' % Timer("test3()", "from __main__ import test3").timeit())

I'm running this on an Intel(R) Core(TM) 2 CPU T7200 @ 2.00GHz machine. I wonder if my test is flawed.
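One possible confound, offered only as a guess: each test above wraps the lookup in a function that also reads a module-level global, so the function call and the global lookup may account for much of the measured time. A minimal sketch that times only the subscript is shown here. The time_lookups() helper and the name data are made up for illustration, and taking the minimum of a few repeats is just one common way to reduce noise.

```python
from timeit import Timer

# Each setup string defines the name "data" used by the timed statement.
SETUPS = [
    ("list", "data = [1, 2, 3, 4, 5]"),
    ("tuple", "data = (1, 2, 3, 4, 5)"),
    ("dict", "data = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}"),
]

def time_lookups(number=1000000, repeat=3):
    """Time the bare subscript for each container; return best times in seconds."""
    results = {}
    for label, setup in SETUPS:
        timer = Timer("v = data[2]", setup)
        # Keep the fastest run, which is the one least disturbed by other load.
        results[label] = min(timer.repeat(repeat, number))
    return results

if __name__ == "__main__":
    for label, seconds in sorted(time_lookups().items()):
        print("%-5s %.4f" % (label, seconds))
```

The numbers will differ between machines and interpreter versions, so this is a way to compare the three lookups on one system rather than a definitive result.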