Friday, March 22, 2013

Python: Dict Comprehension Example

Like list comprehensions, dictionary comprehensions let the Python programmer quickly construct an alternative structure from something more permanent. The application we're working with probably has a schema, and that schema limits how much we can do as we implement new functionality over time. So rather than alter the data model that supports the application, why not create something transient? This is where comprehensions are powerful: they construct new data sources very efficiently. You can use this approach to create new lists and dictionaries that stick around for the duration of the program, but a more compelling use case is creating a temporary search space.
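As a minimal sketch of the temporary search space idea, a dict comprehension can turn a list of records into a lookup table keyed by id. The record layout and the names users, active_names, and find_active are illustrative assumptions, not part of any real application.

```python
# Hypothetical records, standing in for rows loaded from the application's data model.
users = [
    {"id": 1, "name": "alice", "active": True},
    {"id": 2, "name": "bob", "active": False},
    {"id": 3, "name": "carol", "active": True},
]

# Build a transient search space: id -> name, for active users only.
active_names = {u["id"]: u["name"] for u in users if u["active"]}

def find_active(user_id):
    """Return the name of an active user, or None if there is no match."""
    return active_names.get(user_id)
```

The underlying records are left untouched; the dictionary can be thrown away once the search is done.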

Flexible Python Conditionals

With dynamically-typed languages such as Python, we are able to add flexibility in places where it reduces the complexity of the problem we are trying to solve. Take a conditional if statement, for instance. We can pass any object to a function and use attributes of this object in the conditions of an if statement. Of course, you'll get an AttributeError exception if the attribute isn't part of the object, but that is easy to fix because the exception is self-explanatory.

Even dynamically-typed languages aren't immune to complex conditionals, such as large if statements with several segments. During development, these can grow quite large. This isn't necessarily a problem of careless development. It is really more of a time constraint. The if statement is a great construct for executing alternative behavior; it is fast and the syntax is straightforward.

In Python, an alternative conditional construct is a dictionary. A dictionary is a simple means of storing named data as key-value pairs. Since the key can be any hashable type, including most objects, and the values can be objects, which in turn carry behavior, dictionaries can also execute behavior conditionally.
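To illustrate how a dictionary can select behavior, here is a small sketch in which the values are functions. The role names and the greet_admin, greet_guest, and greet functions are hypothetical and exist only for this example.

```python
def greet_admin(name):
    return "Welcome back, administrator %s" % name

def greet_guest(name):
    return "Hello, guest %s" % name

# Map each role to the function that handles it.
GREETERS = {"admin": greet_admin, "guest": greet_guest}

def greet(role, name):
    # Fall back to the guest greeting when the role is unknown,
    # which plays the part of a final else branch.
    handler = GREETERS.get(role, greet_guest)
    return handler(name)
```

Adding a new case means adding one entry to the dictionary rather than another segment to an if statement.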

So what is the performance difference between using if statements and using dictionaries to control behavior? Below is a small example of the two approaches.

import timeit

class User(object):
 def __init__(self, name):
     self.name = name

def test_dict(user):
 user_dict = {"user1":User("user1"), "user2":User("user2")}
 return user_dict[user].name

def test_if(user):
 user1 = User("user1")
 user2 = User("user2")

 if user == "user1":
     return user1.name
 if user == "user2":
     return user2.name
  
if __name__ == "__main__":
 dict_timer = timeit.Timer('test_dict("user1")',\
                           'from __main__ import test_dict')

 print "Dict", dict_timer.timeit()

 if_timer = timeit.Timer('test_if("user1")',\
                         'from __main__ import test_if')

 print "If", if_timer.timeit()

In this example, we have two test functions, test_dict() and test_if(). The test_dict() function uses a dictionary as a conditional while the test_if() function uses an if statement. The test_if() function is faster than the test_dict() function. In the example above, test_if() evaluates to true in the first segment of the if statement. What happens if we change our test to look up "user2" instead of "user1"?

Well, the test_if() function is still faster than the test_dict() function, but not as fast as when searching for "user1". This is because both segments of the if statement are evaluated. What is worth noting here is that the test_dict() performance doesn't fluctuate based on what is being searched. That is, in this test the dictionary cost depends largely on the number of elements it contains, since the dictionary is rebuilt on every call, while the if statement performance depends on the value being tested as well as the number of conditions to evaluate.

The benefit to using dictionaries in this context is that they are flexible. Dictionaries can easily be computed and inserted into a conditional context like the example above. There is, however, one drawback test_dict() has. What happens if we search for a non-existent user? The test_if() function already handles this scenario by returning None. The test_dict() function will raise a KeyError exception. Below is the modified example that will handle this scenario.

import timeit

class User(object):
  def __init__(self, name):
      self.name = name

def test_dict(user):
  user_dict = {"user1":User("user1"), "user2":User("user2")}
  try:
      return user_dict[user].name
  except KeyError:
      return None

def test_if(user):
  user1 = User("user1")
  user2 = User("user2")

  if user == "user1":
      return user1.name
  if user == "user2":
      return user2.name
    
if __name__ == "__main__":
  dict_timer = timeit.Timer('test_dict("user1")',\
                            'from __main__ import test_dict')

  print "Dict", dict_timer.timeit()

  if_timer = timeit.Timer('test_if("user1")',\
                          'from __main__ import test_if')

  print "If", if_timer.timeit()

Tuesday, March 30, 2010

Python Dictionary get()

Python dictionary instances have a built-in get() method that returns the value of the specified key. The method also accepts a default value to return should the key not exist. Using this method to access dictionary values is handy when you want to set a default. The alternatives are to either manually test whether the key exists or to handle a KeyError exception. The three methods are outlined below.

def use_get():
   dict_obj = dict(a=1, b=2)
   return dict_obj.get("c", 3)

def use_has_key():
   dict_obj = dict(a=1, b=2)
   if dict_obj.has_key("c"):
       return dict_obj["c"]
   else:
       return 3
  
def use_key_error():
   dict_obj = dict(a=1, b=2)
   try:
       return dict_obj["c"]
   except KeyError:
       return 3
  
if __name__ == "__main__":
   print "USE GET:", use_get()
   print "HAS KEY:", use_has_key()
   print "KEY ERROR:", use_key_error()
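The has_key() method was removed in Python 3; the in operator performs the same membership test and also works in Python 2. The following is a sketch of the same three approaches written for Python 3. The function names and the parameterized signatures are my own and differ slightly from the example above.

```python
def use_get(dict_obj, key, default=3):
    # get() never raises KeyError and does not modify the dictionary.
    return dict_obj.get(key, default)

def use_in(dict_obj, key, default=3):
    # The in operator takes the place of has_key().
    if key in dict_obj:
        return dict_obj[key]
    return default

def use_key_error(dict_obj, key, default=3):
    try:
        return dict_obj[key]
    except KeyError:
        return default

if __name__ == "__main__":
    data = dict(a=1, b=2)
    print("USE GET:", use_get(data, "c"))
    print("USE IN:", use_in(data, "c"))
    print("KEY ERROR:", use_key_error(data, "c"))
```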

Thursday, March 11, 2010

Python Key Errors

When accessing elements from a Python dictionary, we have to make sure that the key actually exists as part of the dictionary object. Otherwise, a KeyError will be raised. This isn't a big deal: since we know that is the consequence of attempting to access a non-existent key, we can handle the error.

Using a try-except block is a common way to handle accessing dictionary elements. Another way to make sure you are not requesting an element that doesn't exist is to use the has_key() dictionary method (available in Python 2 only; the in operator is the equivalent test). This method returns True if the element in question exists. At that point, you are generally safe to access the element.

Which dictionary access method is better? Neither, really. It depends on your coding style. It is always better to be consistent.

From a performance perspective, we can see a minor difference. For instance, the following example will attempt to retrieve a non-existent dictionary element using both methods.

from timeit import Timer

def haskey():
   dictobj = dict(a=1)
  
   if dictobj.has_key("b"):
       result = dictobj["b"]
   else:
       result = None
      
def keyerror():
   dictobj = dict(a=1)
  
   try:
       result = dictobj["b"]
   except KeyError:
       result = None
      
if __name__ == "__main__":
   haskey_timer = Timer("haskey()", "from __main__ import haskey")
   keyerror_timer = Timer("keyerror()", "from __main__ import keyerror")
  
   print "HASKEY:", haskey_timer.timeit()
   print "KEYERROR:", keyerror_timer.timeit()

In this case, the has_key() method is noticeably faster than the KeyError method. Now, this example shows elements that don't exist. What if the element does exist? Well, the KeyError method is slightly faster.
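For readers on Python 3, where has_key() no longer exists, the comparison can be sketched with the in operator instead. This version is an assumption-laden variant of the benchmark above, not a reproduction of it: the dictionary is created once at module level so that only the lookup is timed, and the names DATA, with_in, with_key_error, and compare are my own. Results will vary by interpreter version and machine.

```python
from timeit import timeit

# Created once, so dictionary construction is not part of the measurement.
DATA = {"a": 1}

def with_in(key):
    if key in DATA:
        return DATA[key]
    return None

def with_key_error(key):
    try:
        return DATA[key]
    except KeyError:
        return None

def compare(key, number=100000):
    """Time both approaches for one key; returns seconds per approach."""
    return {
        "in": timeit(lambda: with_in(key), number=number),
        "keyerror": timeit(lambda: with_key_error(key), number=number),
    }

if __name__ == "__main__":
    print("MISSING KEY:", compare("b"))
    print("EXISTING KEY:", compare("a"))
```

Running compare() with both a missing and an existing key makes it possible to check whether the two cases described above still behave differently on a given setup.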

Monday, October 26, 2009

Python Named Tuples

Tuples in Python are similar to lists. The main difference, of course, is that tuples are immutable data structures. This means that once a tuple is instantiated, elements cannot be added to or removed from it as they can with list instances. The benefit of using tuples in Python applications is that the interpreter can handle them more efficiently because their length is fixed.

The collections module provides efficient container structures that expand upon the primitive Python container types such as tuples, lists, and dictionaries. One of the container types offered by the collections module is the named tuple. This functionality is available in Python 2.6 or later. A named tuple is essentially a tuple whose elements can be referenced by a field name rather than an integer index, although the index may still be used as well. Below is an example of how to create and use a named tuple.

#Example; Named tuple benchmark.

#Do imports.
from collections import namedtuple
from timeit import Timer

#Create a named tuple type along with fields.
MyTuple=namedtuple("MyTuple", "one two three")

#Instantiate a test named tuple and dictionary.
my_tuple=MyTuple(one=1, two=2, three=3)
my_dict={"one":1, "two":2, "three":3}

#Test function.  Read tuple values.
def run_tuple():
  one=my_tuple.one
  two=my_tuple.two
  three=my_tuple.three

#Test function.  Read dictionary values.
def run_dict():
  one=my_dict["one"]
  two=my_dict["two"]
  three=my_dict["three"]

#Main.
if __name__=="__main__":

  #Setup timers.
  tuple_timer=Timer("run_tuple()",\
                    "from __main__ import run_tuple")
  dict_timer=Timer("run_dict()",\
                   "from __main__ import run_dict")

  #Display results.
  print "TUPLE:", tuple_timer.timeit(10000000)
  print "DICT: ", dict_timer.timeit(10000000)

Here, we create a new named tuple data type, MyTuple, by invoking namedtuple(), a factory function provided by the collections module. It is a factory function because it takes a set of field names as a parameter and assembles a new named tuple class. Next, we create a MyTuple instance by supplying it the tuple data.

Now we have two instances; my_tuple is a named tuple while my_dict is an ordinary dictionary instance. Next, we have two functions that will read values from our two data structure instances, run_tuple() and run_dict().

When I run this example, run_tuple() takes significantly longer to execute than run_dict() does. So what does this mean? Well, what it means to me is that if you are already using dictionaries to read data in your program, keep using them, especially if the elements are referenced by key.

The power of named tuples comes into play when developers have no choice but to deal with tuples. These tuples may be returned from some other developer's code, or the code simply can't be changed for whatever reason. Rather than having to stare at a meaningless integer index, named tuples can add meaning to code, which can in turn have a huge impact on readability.
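As a sketch of that situation, suppose some code we cannot change hands back a plain tuple. The function legacy_fetch_user and the UserRow field names below are hypothetical, invented for this illustration; namedtuple and its _make() class method are part of the collections module.

```python
from collections import namedtuple

# Hypothetical: a plain tuple returned by code we cannot change.
def legacy_fetch_user():
    return ("jsmith", "John Smith", 42)

# Give each position a meaningful name.
UserRow = namedtuple("UserRow", "username full_name age")

# _make() builds a named tuple from any existing iterable.
row = UserRow._make(legacy_fetch_user())

# Field access and index access refer to the same data.
full_name_by_field = row.full_name
full_name_by_index = row[1]
```

Because a named tuple is still a tuple, code that expects the original positional layout keeps working.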

Friday, October 23, 2009

Python Dictionary Generators

The dictionary type in Python is what is referred to as an associative array in other programming languages. What this means is that rather than looking up values by position, values may be retrieved by a key. With either type of array, the values are indexed, meaning that each value may be referenced individually within the collection of values. Otherwise, we would have nothing but a collection of values that cannot be referenced in any meaningful way.

The dictionary type in Python offers higher-level functionality than most other associative array types found in other languages. This is done by providing an API on top of the primitive operators that exist for traditional style associative arrays. For instance, developers can retrieve all the keys in a given dictionary instance by invoking the keys() method. The value returned by this method is an array, or list in Python terminology, that may be iterated. The Python dictionary API also offers other iterative functionality.

With the introduction of Python 3, the dictionary API has seen some changes. Namely, the methods that returned lists in Python 2 (keys(), items(), and values()) now return view objects, which are iterable but are neither lists nor generators. This is quite different from invoking these methods and expecting a list. For one thing, the return value is not directly indexable, because view objects do not support indexing. The following example shows how the dictionary API behaves in Python 3.

#Example; Python 3 dictionary keys.

#Initialize dictionary object.
my_dict={"one":1, "two":2, "three":3}

#Main.
if __name__=="__main__":
  
   #Invoke the dictionary API to obtain view objects.
   my_keys=my_dict.keys()
   my_items=my_dict.items()
   my_values=my_dict.values()
  
   #Display the view objects.
   print("KEYS:   %s"%(my_keys))
   print("ITEMS:  %s"%(my_items))
   print("VALUES: %s"%(my_values))
  
   #This would work in Python 2.
   try:
       print(my_keys[0])
   except TypeError:
       print("my_keys does not support indexing...")
      
  
   #This would work in Python 2.
   try:
       print(my_items[0])
   except TypeError:
       print("my_items does not support indexing...")
      
   #This would work in Python 2.
   try:
       print(my_values[0])
   except TypeError:
       print("my_values does not support indexing...")           
  
   #Iterate over each view and display its contents.
   print("\nIterating keys...")
   for i in my_keys:
       print(i)
      
   print("\nIterating items...")
   for i in my_items:
       print(i)
      
   print("\nIterating values...")
   for i in my_values:
       print(i)

In this simple example, we start by creating a dictionary instance, my_dict. The dictionary is deliberately simple, as we aren't interested in the content. Next, we create three new variables, each of which stores some aspect of the my_dict dictionary. The my_keys variable stores all keys that reference the values in the dictionary. The my_items variable stores the key-value pairs that make up the dictionary, each item being a tuple. The my_values variable stores the actual values stored in my_dict with no regard for which key references them. The important thing to keep in mind here is that these variables derived from the my_dict dictionary were created using the dictionary API.

Up to this point, we have my_dict, the main dictionary, and three variables, my_keys, my_items, and my_values, all created using the dictionary API. Next, we purposefully invoke behavior that isn't supported in Python 3. We do this by acting as if the values returned by the dictionary API are lists when they are in fact view objects. This produces a TypeError each time we try it, because the view objects stored in my_keys, my_items, and my_values do not support indexing.

Finally, we simply iterate over each view object containing data derived from my_dict. This works just as expected and is in fact the main use of the data returned by the dictionary methods shown here. Sure, direct indexing doesn't work on the returned view objects, but is that really a common use of this data? I would certainly think not. The key aspect of this API change is that the API now returns a structure that is good at iteration, which happens to be the intended use. And if indexing the values returned from the dictionary API is absolutely necessary, these view objects can easily be turned into lists. It is just an extra step for the rare use of the data.
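That extra step can be sketched as follows for Python 3. The objects returned by keys(), items(), and values() are dictionary views, and passing one to list() produces an indexable snapshot. The variable names here are my own.

```python
my_dict = {"one": 1, "two": 2, "three": 3}

# A view object: iterable, but not indexable.
keys_view = my_dict.keys()

# list() copies the current contents of the view into an indexable list.
keys_list = list(keys_view)
first_key = keys_list[0]

# Views track later changes to the dictionary; the list snapshot does not.
my_dict["four"] = 4
view_sees_new_key = "four" in keys_view
list_sees_new_key = "four" in keys_list
```

One design consequence worth noting is that a view stays in sync with the dictionary it came from, whereas the list is a copy taken at one moment in time.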

Tuesday, January 6, 2009

Python benchmarks

Just for fun, I decided to run some Python benchmarks that test the lookup time differences between list, tuple, and dictionary types. The list performance seems to come out on top every time. This is confusing to me because I hear that tuples are supposed to be faster because they are immutable. Here is the test I used:

from timeit import Timer

test_data1=[1,2,3,4,5]
test_data2=(1,2,3,4,5)
test_data3={0:1,1:2,2:3,3:4,4:5}

def test1():
  v=test_data1[2]

def test2():
  v=test_data2[2]

def test3():
  v=test_data3[2]

if __name__=='__main__':
  print 'Test 1:', Timer("test1()", "from __main__ import test1").timeit()
  print 'Test 2:', Timer("test2()", "from __main__ import test2").timeit()
  print 'Test 3:', Timer("test3()", "from __main__ import test3").timeit()

I'm running this on an Intel(R) Core(TM) 2 CPU T7200 @ 2.00GHz machine. I wonder if my test is flawed.
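One possible explanation, offered as an assumption rather than a measured conclusion, is that each test function pays for a function call and a global name lookup, and that this overhead may swamp the indexing operation being compared. The sketch below, written for Python 3, times the bare indexing expression instead, creates the data in the setup string, and keeps the best of several rounds. The names SETUPS and run_benchmarks are my own.

```python
from timeit import repeat

# Setup statements run once per round and are not part of the timing.
SETUPS = {
    "list": "data = [1, 2, 3, 4, 5]",
    "tuple": "data = (1, 2, 3, 4, 5)",
    "dict": "data = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}",
}

def run_benchmarks(number=1000000, rounds=5):
    """Return the best time (in seconds) per container type."""
    results = {}
    for name, setup in SETUPS.items():
        times = repeat("data[2]", setup=setup, number=number, repeat=rounds)
        # The minimum is the least disturbed by other activity on the machine.
        results[name] = min(times)
    return results

if __name__ == "__main__":
    for name, seconds in run_benchmarks().items():
        print("%-5s %.4f" % (name, seconds))
```

Even with this setup, differences this small are easily within the noise of a single run, so any ranking should be treated with caution.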

Boduch's Blog