Thursday, March 5, 2009

Null references

In an earlier post, I described my distaste for null references. I thought I should elaborate a little with some examples. First, in Python, consider the following.

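# Example: null references.

if __name__ == "__main__":
    ref1 = None   # stands in for a null reference
    ref2 = True
    ref3 = False

    # ref1 and ref3 both compare unequal to True, so both lines print False.
    print("REF1", ref1 == True)
    print("REF2", ref2 == True)
    print("REF3", ref3 == True)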

Here we have defined three variables: ref1, ref2, and ref3. In this example, ref1 is supposed to represent a null reference, so it is defined as None. The variable exists, but it does not refer to any real value; it is only pseudo-defined. When running this example, there is no way to tell ref1 and ref3 apart: both comparisons against True print False. From a developer's perspective, this doesn't make things very easy. The two variables (ref1 and ref3) are defined differently, yet yield the same result. What gives?
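For comparison, Python can tell the two apart if you ask about identity instead of truth. A minimal sketch using the same three variables: the `is` operator checks whether a name refers to the `None` object itself, so ref1 and ref3 stop looking alike.

```python
ref1 = None   # the null reference
ref2 = True
ref3 = False

# Truth comparisons collapse None and False into the same answer.
print(ref1 == True, ref3 == True)    # False False

# Identity checks keep them apart: only ref1 is the None object.
print(ref1 is None, ref3 is None)    # True False
print(ref1 is False, ref3 is False)  # False True
```

The catch is that the caller has to remember to ask the identity question; the plain truth test gives no hint that ref1 is a null.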

Normally this wouldn't make a difference. Plenty of values have different content yet evaluate the same way in a truth test. For instance, consider the following.

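# Example: null references.

if __name__ == "__main__":
    ref1 = ""           # an empty string
    ref2 = "My String"  # a non-empty string
    ref3 = None         # a null reference

    # Each `if` performs a truth test on the variable.
    if ref1:
        print("REF1 TRUE")
    else:
        print("REF1 FALSE")
    if ref2:
        print("REF2 TRUE")
    else:
        print("REF2 FALSE")
    if ref3:
        print("REF3 TRUE")
    else:
        print("REF3 FALSE")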

Here, we have altered the definitions of ref1, ref2, and ref3. ref1 is now an empty string, ref2 is a non-empty string, and ref3 is a null reference. Running this example prints REF1 FALSE, REF2 TRUE, and REF3 FALSE, for the following reasons:

ref1 is False because it is empty, but it is still a string.
ref2 is True because it is a non-empty string.
ref3 is False because it is a null reference, and Python always treats None as false in a truth test.
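The distinction drawn above, false in a truth test but still a string, can be checked directly with the built-ins bool() and isinstance(). A small sketch over the example's three values:

```python
values = {"ref1": "", "ref2": "My String", "ref3": None}

for name, value in values.items():
    # bool() gives the truth-test result; isinstance() tells us whether
    # the value is still a string we can call string methods on.
    print(name, bool(value), isinstance(value, str))
```

ref1 and ref3 agree on the first column but not on the second: only ref3 is false *and* not a string, which is exactly what makes it dangerous to pass along.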

Finally, this last example shows how this null reference can really cause havoc.

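# Example: null references.

def print_length(str_obj):
    # len() works on any string, even an empty one, but not on None.
    print(len(str_obj))

if __name__ == "__main__":
    ref1 = ""
    ref2 = "My String"
    ref3 = None

    print_length(ref1)  # prints 0
    print_length(ref2)  # prints 9
    print_length(ref3)  # raises TypeError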

Here we have the exact same reference variables as in the last example. We also have a print_length() function that prints the length of the string it is given. The first call, on the empty string, works (it prints 0) because the empty string is still a string object. The last call fails with a TypeError, because ref3 is a null reference and None has no length.
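To see the failure without stopping the script, the hypothetical helper below (`try_length`, not part of the original example) catches the exception that len() raises for None and returns its message instead of a length.

```python
def try_length(str_obj):
    """Return len(str_obj), or the error message if len() fails."""
    try:
        return len(str_obj)
    except TypeError as exc:
        # len(None) raises TypeError because NoneType has no length.
        return "TypeError: " + str(exc)


for ref in ["", "My String", None]:
    print(try_length(ref))
```

The two strings report 0 and 9; the None value reports an error mentioning NoneType. Catching the exception only surfaces the problem; the null reference still has to be dealt with somewhere.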

If you want something to evaluate to False until it holds a value, you can still give it a type whose empty value evaluates to False, such as an empty string. If no such type fits, build your own Null type.
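One way to read "build your own Null type" is the Null Object pattern: a class that evaluates to False like None does, but still supports the operations callers expect. The `Null` class below is a hypothetical minimal sketch, not a standard library type. It reports a length of 0 and quietly absorbs ordinary method calls, so print_length() (repeated here so the sketch runs on its own) prints 0 instead of failing.

```python
class Null:
    """Hypothetical null object: false in truth tests, but safe to use."""

    def __bool__(self):
        return False  # behaves like None in an `if` test

    def __len__(self):
        return 0  # len() works instead of raising TypeError

    def __str__(self):
        return ""  # prints as an empty string

    def __repr__(self):
        return "Null()"

    def __getattr__(self, name):
        # Called only for attributes not found normally. Private and
        # dunder names still raise, so introspection tools are not fooled;
        # any other lookup (e.g. .upper) returns this same object...
        if name.startswith("_"):
            raise AttributeError(name)
        return self

    def __call__(self, *args, **kwargs):
        # ...and calling it does nothing, so NULL.upper() returns NULL.
        return self


def print_length(str_obj):
    print(len(str_obj))


NULL = Null()
print_length(NULL)           # prints 0
print(bool(NULL))            # prints False
print(NULL.upper() is NULL)  # prints True
```

The design choice is that a Null never crashes the caller; the trade-off is that mistakes pass silently, so it suits places where "do nothing" really is the right default.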

Tuesday, March 3, 2009

Null references

In an interesting abstract, Tony Hoare describes his invention of the null reference as a "billion dollar mistake". He goes on to describe the huge number of unforeseeable side-effects that have resulted from the null reference.

This raises an interesting question: why are null references needed at all? I don't think they should exist. There is simply no need for them; a well-structured program should never need to define a null reference. Null basically means "I don't know". How is that useful in any application? I suppose it is useful in situations where the developer's best guess is "undefined".

I spend the majority of my time writing code in dynamically-typed languages, so I'm not very concerned with null references. However, when I do write in compiled languages, I look for any alternative to using null references. Even if a given reference, in the context of some function, can be considered dynamic or polymorphic, it still isn't null.

Having said that, what if we find ourselves in a situation where null references are absolutely unavoidable? Why do null references cause such a massive headache? My guess is that the null reference is treated differently from the false value, where the false value is anything that would not evaluate to true in a compiled language. Doesn't null fall into the same category? Not necessarily. If we have a null reference and want to invoke some behaviour on it, we simply attempt the invocation, assuming we do not have a null. When the reference is null, that invocation fails.

What is needed in these situations is a wrapper that simply checks for "nullness" before invoking any behaviour. This way, nulls are treated like boolean false values. This extra checking at run time will no doubt slow things down, which is not a good thing, since performance is probably a big part of why we are using a compiled language to begin with. However, I think stability ranks slightly higher in the development food chain than raw performance does.
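A minimal sketch of such a wrapper, written in Python to match the earlier examples even though this post has compiled languages in mind. The helper name `call_if_not_null` and its `default` parameter are hypothetical: it checks for None first and, if it finds one, returns a false value instead of invoking the method, so the null is handled like a boolean rather than crashing the program.

```python
def call_if_not_null(obj, method_name, *args, default=False, **kwargs):
    """Invoke obj.method_name(*args, **kwargs) only when obj is not None.

    Hypothetical helper: when obj is None, the call is skipped and
    `default` (False unless told otherwise) is returned instead.
    """
    if obj is None:
        return default
    method = getattr(obj, method_name)
    return method(*args, **kwargs)


name = None

# Unchecked: invoking behaviour on None fails at run time.
try:
    name.upper()
except AttributeError as exc:
    print("unchecked call failed:", exc)

# Checked: the wrapper tests for nullness first.
print(call_if_not_null(name, "upper"))         # prints False
print(call_if_not_null("my string", "upper"))  # prints MY STRING
```

The `obj is None` test runs on every call, which is exactly the run-time overhead described above; the payoff is that a null turns into a predictable false value instead of an exception.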