An interesting entry raises the question of whether hacker-type developers have a place in the modern IT field. That is, do hacker-type developers add any real business value? If not, who does when it comes to producing software? Is the more standardized architect-type developer better suited to working on anything that isn't a hobby? The argument against hiring the hacker-type onto a formal IT development team is well-grounded in some respects. However, the hacker-type developer wasn't given that title for no reason; any given hacker-type developer most likely earned it through many years of trying things that were unheard of at the time. Believe it or not, organizations that produce anything but the most trivial software need the hacker-type developer as well as the architect-type developer.
The hacker-type developer tends to be experimental by nature. They are willing to go beyond the norm in order to realize a solution. More often than not, the hacker-type developer can work around extremely complex problems in a relatively short amount of time. I like to think that this is because they are able to step outside the standards imposed on the project at hand to find the necessary solution. It is this ability that enables development teams to determine the feasibility of solving a problem without spending a year developing only to realize it can't be solved within the constraints imposed on the project.
At the other end of the development team we have the architect-type developer, who is focused on standards-based development. But they aren't only concerned with standards; they also want to ensure that an elegant, loosely-coupled, component-based, interface-conforming system is produced. This is obviously necessary within any serious IT organization. The architect-type developer often has a strong view of the implications of the development team's actions and implementation decisions.
In the field of software development, these two stereotypes are closely related yet different enough that they should easily be able to collaborate with one another. Not only should it be easy, I think it should be a requirement. The hacker-type and the architect-type are codependent on one another even if neither wants to admit the fact. Hackers need discipline when it comes to sticking to standards and interface conformance. Architects can easily overlook subtle design flaws that will make a particular implementation strategy impossible; hackers, who have a keen eye for this type of thing, can help here. The codependence of hackers and architects is just a small slice of the software development ecosystem.
Thursday, March 5, 2009
Null references
In an earlier post, I described my distaste for null references. I thought I should elaborate a little more with an example. First, consider the following Python example.
#Example; Null references.
if __name__=="__main__":
    ref1=None
    ref2=True
    ref3=False
    print "REF1",ref1==True
    print "REF2",ref2==True
    print "REF3",ref3==True
Here we have defined three variables: ref1, ref2, and ref3. In this example, ref1 is supposed to represent a null reference, so it is defined as None. That makes it only pseudo-defined. There is really no way to differentiate between ref1 and ref3 when running this example; both comparisons evaluate to False. From a developer perspective, this doesn't make things very easy. The two variables (ref1 and ref3) are defined differently yet yield the same result. What gives?
Normally this wouldn't make a difference. Certain variables will have different content but will evaluate to True in truth tests. For instance, consider the following.
#Example; Null references.
if __name__=="__main__":
    ref1=""
    ref2="My String"
    ref3=None
    if ref1:
        print "REF1 TRUE"
    else:
        print "REF1 FALSE"
    if ref2:
        print "REF2 TRUE"
    else:
        print "REF2 FALSE"
    if ref3:
        print "REF3 TRUE"
    else:
        print "REF3 FALSE"
Here, we have altered the definitions of ref1, ref2, and ref3. ref1 is now an empty string, ref2 is a non-empty string, and ref3 is a null reference. Running this example will result in the following:
- ref1 is False because it is empty. But it is still a string.
- ref2 is True because it is a non-empty string.
- ref3 is False because it is a null reference and hence has nothing to test for truth.
Next, we have the exact same reference variables as the last example, plus a print_length() function that will print the length of the specified string.
#Example; Null references.
def print_length(str_obj):
    print len(str_obj)
if __name__=="__main__":
    ref1=""
    ref2="My String"
    ref3=None
    print_length(ref1)
    print_length(ref2)
    print_length(ref3)
The last print_length() invocation fails because ref3 is a null reference. The first invocation, on the empty string, works because ref1 is still a string object.
If you want something to evaluate to False until it is non-empty, you can still specify a type that will evaluate to False. If not, build your own Null type.
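For instance, here is one way such a custom Null type might look (a minimal sketch of my own, not from the original post): a class whose instances always evaluate to False in truth tests but still behave sanely when asked for a length.
#Example; a hypothetical custom Null type.
class Null(object):
    def __nonzero__(self):
        #Always evaluates to False in truth tests (__bool__ in Python 3).
        return False
    def __len__(self):
        #Unlike None, a length query works instead of raising TypeError.
        return 0
if __name__=="__main__":
    ref=Null()
    if ref:
        print "REF TRUE"
    else:
        print "REF FALSE"
    print len(ref)
This prints "REF FALSE" followed by 0, so the reference stays falsy until replaced with real content, without crashing code that inspects it.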
Labels: example, none, nullreference, opinion, python
Tuesday, March 3, 2009
Null references
In an interesting abstract, Tony Hoare describes his invention of the null reference as a "billion dollar mistake". He goes on to describe the huge number of unforeseeable side effects that result from the null reference.
This brings up the interesting question of why the null reference is needed at all. I don't think null references should exist; there is simply no need for them. A well-structured program should never need to define a null reference. Null basically means "I don't know". How is this useful in any application? I suppose it is useful in situations where the developer's best guess is "undefined".
I spend the majority of my time writing code in dynamically-typed languages, so I'm not very concerned with null references. However, when I do write in compiled languages, I look for any alternative to using null references. Even if a given reference in the context of some function can be considered dynamic, or polymorphic, it still isn't null.
Having said that, what if we find ourselves in a situation where null references are absolutely unavoidable? Why do null references cause such a massive headache? My guess would be that the null reference is treated differently than the false value, where the false value is anything that would not evaluate to true in a compiled language. Does that not fall into the same category as null? Not necessarily. If we have a null reference and we want to invoke some behaviour on that reference, we typically just attempt the invocation, assuming we do not have a null, and crash when we do.
What is needed in these situations is a wrapper that simply checks for "nullness" before invoking any type of behaviour. This way, nulls are treated as booleans. This new run-time checking overhead will no doubt slow things down, which is not good, since performance is probably a big part of why we are using a compiled language to begin with. However, I think stability sits slightly higher in the development food chain than raw performance does.
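To make the idea concrete, here is a rough Python sketch of such a wrapper (entirely my own illustration; the SafeRef name and behaviour are invented): it checks for nullness up front, so a null reference can be treated as a plain boolean False instead of blowing up on invocation.
#Example; a hypothetical null-checking wrapper.
class SafeRef(object):
    def __init__(self, obj):
        self._obj=obj
    def __nonzero__(self):
        #The null reference is treated as a boolean False.
        return self._obj is not None
    def __getattr__(self, name):
        if self._obj is None:
            #Swallow the invocation instead of crashing on a null.
            return lambda *args, **kw: None
        return getattr(self._obj, name)
if __name__=="__main__":
    ref=SafeRef(None)
    print bool(ref)
    ref.upper()
    print SafeRef("my string").upper()
The extra check on every invocation is exactly the run-time overhead described above; the trade is stability for speed.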
Labels: compiler, dynamic, hoare, nullreference, opinion
Monday, March 2, 2009
Writing bad pseudo code
Is it possible to entirely screw up your design by writing bad pseudo code? I like to think so. Pseudo code is a useful tool for quickly jotting down a proposed solution to a problem you just thought of. But don't base your actual implementation on this pseudo code, especially if you are developing in an object oriented language.
No matter what paradigm you use to develop software, pseudo code is an extremely powerful tool for quickly measuring the feasibility of a proposed implementation for a specific problem. These problems are generally in the context of functions or short algorithms. It would be very hard to comprehend even trivial object oriented relationships using only pseudo code. If you develop in a procedural language such as C, a series of flowcharts and pseudo code illustrations may suffice.
However, using an approach like this when the implementation involves an object oriented language such as C++ or Python, you will most assuredly run into problems. The UML is by far the superior approach to modeling object oriented systems. It covers the entire range of elements needed and is also extensible. However, it need not be the UML you use to visualize your system under construction. You can invent your own notation (as long as multiple readers wouldn't be an issue, and it usually will be). The important idea is that the elements you visualize are graphic shapes.
Here is where we run into problems with pseudo code. Pseudo code is generally tightly coupled to the implementation language. And when I say generally, I mean close to one hundred percent of the time. How often do developers write pseudo code without an implementation language in mind? Additionally, pseudo code can even be written in the implementation language itself!
Let's say we already know we are writing a C application. So, during our design/implementation/testing, we can write pseudo code that is biased toward the C language. No harm done, right? I would say that even in this scenario, you are likely to create bad pseudo code. Just because it is pseudo doesn't make it any less of a programming language; it only makes it less executable. I think that when using pseudo code exclusively as a design tool for "real code", you are bound to miss important design issues that are not visible at this level of thought.
The superior approach, I think, is to start out by visually modeling some elements. Not the entire system, just some important functions, classes, or methods. Then, where things aren't clear, get down to the pseudo level. But don't spend any real time on it; go back and forth. Visual modeling doesn't give developers any kind of immunity to design errors. It does, however, give a third-floor view of the problem.
Labels: architecture, code, design, opinion, pseudocode
Monday, February 23, 2009
An argument against XML
My argument against using XML as a data format in certain situations is that it is too verbose. In other situations, the verbosity provided by XML is needed, such as when the data is meant for human consumption. This is why XML exists: it is easy to use and read by both humans and computers.
The verbosity problem with XML stems from the use of tags. Every entity represented in XML needs to be enclosed in a tag: an opening tag indicating that a new entity definition has started, and a closing tag indicating the end of that definition. For example, consider the following XML.
<person>
    <name>adam</name>
</person>
This is a trivial example of a person entity with a single name attribute. Notice the duplication of the text "person" and "name" in the metadata. With XML, this is required. However, tags may also have attributes, so our person definition could be expressed as follows.
<person name="adam"/>
Here there is no metadata duplication. But I think this second example negates the readability philosophy behind XML. What exactly is the difference between attributes and child entities in XML? Semantically, there is none. A child entity is still an attribute of the parent entity.
With JSON, there is no duplication of metadata and no confusion about how an entity is defined. This is because the JSON format is focused on lightweight data, not readability. For instance, here is our person in JSON.
{"person":{"name":"adam"}}
Now, if a person were reading this, the chances of them getting the meaning right are reduced when compared to the XML equivalent. However, it is much less verbose in most cases, and verbosity counts when data is being transferred over a network. Another plus: the XML is not lost. JSON can easily be converted to XML and back, so if JSON-formatted data must be edited by humans as XML, this is not difficult to achieve.
Here is a simple Python demonstration of reducing the size of XML data with JSON.
#Example; XML string and JSON string
xml_string="""
<entry><title>mytitle</title><body>mybody</body></entry>
"""
json_string="""
{"entry":{"title":"mytitle","body":"mybody"}}
"""
if __name__=="__main__":
    print 'XML Length:',len(xml_string)
    print 'JSON Length:',len(json_string)
    pcent=float(len(json_string))/len(xml_string)*100
    print 'XML size as JSON:',pcent,'%'
Finally, since XML is based on tags, there is no compact way to represent sets of primitive types. For example, suppose some client says to the server, "give me a list of names and nothing else". The client will likely receive something along the lines of the following.
<list>
    <item name="name1"/>
    <item name="name2"/>
    <item name="name3"/>
</list>
Here is the JSON alternative.
["name1", "name2", "name3"]
Thursday, February 19, 2009
How to manage technical documentation for varying levels of competency?
An interesting question on slashdot asks exactly this. Two things spring immediately to mind for me:
- Trac
- Is this possible?
So, the basic problem re-stated: "how do I provide a simple and easy way for people with different levels of knowledge of a given subject to access that information?" Using Trac, you could start by getting all of the required content into a single page. This includes every possible detail imaginable.
Next, suppose we have written a Trac plugin that defines processors you can wrap around sections of text based on the required expertise. For instance, you could have the following processors defined (see the sketch at the end of this post):
- Expert-topic
- Intermediate-topic
- Moderate-topic
- New-topic
The question of whether this is possible comes not from the technical end but from the business side of things. My answer to this is another question: how accurately can users' knowledge of a given topic be rated? This problem is eliminated if we allow users to rank themselves with regard to topic competence.
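Here is the kind of thing one of those processors might look like (a hypothetical sketch built on Trac's WikiMacroBase extension point; the competence ranking and the session key are invented for illustration):
#Example; a hypothetical Trac processor for expert-level content.
from trac.wiki.macros import WikiMacroBase
class ExpertTopicMacro(WikiMacroBase):
    """Render the wrapped content only for self-ranked experts."""
    def expand_macro(self, formatter, name, content):
        #Assume users rank their own competence, stored in the session.
        rank=formatter.req.session.get('competence','new')
        if rank=='expert':
            return content
        return ''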
Friday, February 13, 2009
Relational databases are not going anywhere
I stumbled upon this entry, which argues against the existence of the RDBMS in distributed applications. I must say I disagree. The "key-value" database movement does attempt to solve some valid concerns. For instance, the complexity involved in deploying a clustered RDBMS can be daunting at best. The ease with which application developers can use these key-value databases is very powerful. I also agree with the assertion that a single schema distributed across n nodes will not be able to scale very easily.
However, at a lower level, there is still no substitute for the RDBMS when it comes to reading and writing persistent data. I often sense a stereotype among developers that RDBMSes are bloated overhead. Again, I disagree. There exist several open source, lightweight RDBMSes that require zero or very little additional effort.
On the schema end of things, schemas are only fixed at the data level. The application can easily abstract new and dynamic schemas during its lifetime.
I think several of the new "key-value" database concepts (such as the document abstraction) belong in the application or server layer, not the data layer. As far as nodes in the cloud go, each node will benefit from an RDBMS for the foreseeable future.
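As a small sketch of that schema point (my own illustration using Python's built-in sqlite3 module): the data-level schema below is fixed, yet the application layers a dynamic, key-value-style schema on top of it.
#Example; a hypothetical dynamic schema over a fixed relational one.
import sqlite3
if __name__=="__main__":
    db=sqlite3.connect(":memory:")
    #The fixed data-level schema: one entity-attribute-value table.
    db.execute("CREATE TABLE attrs (entity TEXT, name TEXT, value TEXT)")
    #The application invents new "columns" at run time.
    db.execute("INSERT INTO attrs VALUES ('user:1','name','adam')")
    db.execute("INSERT INTO attrs VALUES ('user:1','role','admin')")
    for name,value in db.execute(
            "SELECT name,value FROM attrs WHERE entity='user:1'"):
        print name,value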
Thursday, February 12, 2009
Community involvement is not open source
The NY Times has an entry on "open source game design". I'm not sure if the title of this entry is a simple misunderstanding of what open source means, but it is misleading. Open source, in the context of software development, means code. Thousands and thousands of lines of code. I could not find a single mention of source code in the article.
Open source software is really gaining a lot of traction. Maybe this is why several people who wouldn't otherwise care are talking about it. But titles like the mentioned NY times entry worry me when it comes to people who are new to open source and genuinely interested in what it is and how it works.
The community surrounding a project plays a huge role. The community, in several cases, could even be more important than the code itself. However, in open source the code is available in the open; in the case mentioned in the NY Times entry, it is not.
Labels: community, game, nytimes, opensource, opinion
Tuesday, February 3, 2009
Web-based applications a good idea?
An entry on InfoWorld doesn't think so. Apparently the web browser sucks for applications, and anyone using them is reverting back to the client-server model. Well, there is a client (the browser) and there is a server (whatever). The strong case for the web application in the browser is deployment: there is no easier way to distribute a GUI than the web browser. Unless, that is, everyone in the world used Windows.
Of course, the case against the browser is strong too. The same interoperability problems that operating systems experience are shared in the browser domain. In short, your HTML/CSS/AJAX GUI may not work across all platforms (platforms being web browsers in this case).
In terms of GUI interoperability and deployment, the same problems persist. There is no easy way to make everyone happy. And that will always be the case as long as people use different operating systems. There exists no simple unifying solution. Again, with web browsers, you have the interoperability problem. With interpreted languages, it is quite simple to achieve interoperability amongst various operating systems. I think the problem there is deployment. It is a lot tougher to deploy desktop applications that use any kind of data center.
The main problem I see with the suggested solution in the InfoWorld entry is that there is no mention of distributed data. Let's say I take the advice from the entry and all our customers now use our new Java desktop application. Where did the data go? Any modern requirements will scream distributed data, and that is a complexity there is no alternative for.
All of these challenges (distributed computing, distributed deployment, and interoperability) have been around for a while now. I agree with the author that, for the most part, web browsers suck for any kind of reliable user interactivity. However, I still don't see any solutions over the immediate horizon that will make the browser go away.
Labels: browser, client, deployment, distributedcomputing, opinion, server, web
Wednesday, January 28, 2009
C remains popular while Python is still low-key.
An interesting entry cites the C programming language as the most popular choice for new open source projects. Several other languages followed C as popular choices; Python wasn't one of them.
So what does this mean for Python? Absolutely nothing. There are already several existing and successful Python projects out there. Although I'm not a huge fan of some of the other languages mentioned, as a Python developer I do like C. Python and C interoperability at the system level isn't too difficult to achieve.
Of course, this isn't a requirement. But if some new killer open source application written in C becomes available, you can ideally create bindings between it and your Python application.
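As a quick illustration of how little effort basic Python-to-C interoperability can take (a minimal sketch using the standard ctypes module; it assumes a platform where the C library can be located by name):
#Example; calling a C function from Python via ctypes.
from ctypes import CDLL
from ctypes.util import find_library
if __name__=="__main__":
    #Load the platform's C library and call its abs() directly.
    libc=CDLL(find_library("c"))
    print libc.abs(-42)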
Labels: c, language, opensource, opinion, programming, python
Tuesday, January 13, 2009
The spreadsheet is 30 years old
It was 30 years ago that the spreadsheet application was invented. PC Magazine has an entry about how the spreadsheet has brought society nothing but trouble. For example, the article blames the spreadsheet for many past miscalculations, rather than the people responsible for building those spreadsheets.
I think the spreadsheet was an ingenious idea that has much broader applications than accounting tasks. The spreadsheet is simply a tool to visualize and manipulate data. In the end, it is the human interpretation of that data that leads to the undesirable consequences. Blaming the spreadsheet application for a financial crisis is like placing the responsibility for car accidents on the invention of the automobile rather than on the people who drive them.
Labels: birthday, blame, data, opinion, spreadsheet
Monday, January 5, 2009
Software predictions
Neil McAllister has an opinion on software development predictions for 2009. First is the usual Microsoft stuff, which I for one don't care about (it is one thing to use Windows for daily tasks, but good luck using it as a development environment).
What is interesting is the Java trend toward open source. Fans of the Java platform traditionally also like big proprietary systems written in Java. Or are these types just fans of the applications, without caring what they are written in?
My question; is it too late for Java to make this push toward open source? Or will simpler, more dynamic languages such as Python continue to gain the upper hand?
Labels: idea, javascript, opensource, opinion, python