Brendel Consulting

The Blog

Aug 7, 2009

Search engine comparison for software developers

For today's software developer it is difficult to imagine that there once was a time when we did not have instant access to information on the Internet to help us do our work: Answers to questions, how-to guides, reference material, documentation... whatever we need, it is usually just a quick search away.

I can't count how many times I simply searched for something like "Django ... such and such ..." to instantly find the relevant documentation or reports of other people's experience while working on a Django project, for example. The amount of time this saved me is immeasurable. I have a friend who - as a software developer - is not allowed to access the Internet while at work! The mind boggles. How much does this cost his employer in lost productivity?

Anyway, access to search engines while developing software should clearly be considered essential. If during a job interview I asked a candidate something like: "How do you define a C function pointer?" and the answer was: "I've done it before, but can't recall the exact syntax now... I'd just google it!" ... well, I would consider that to be a perfectly valid answer, to be honest. One more reason why these types of questions are quite meaningless in interviews. But that's a different topic.

So, the question then is: What is the best search engine for a software developer to use? Many people default to Google, obviously, but personally, I don't like to let one company have all the information about me, so I like to mix it up a little. The difference in search results can be quite startling, though. With Bing as the new arrival on the scene, let's perform a completely subjective and un-scientific experiment and ask four of the major search engines the same set of questions and see what they tell us.

For this test, I have chosen Bing, Google, Yahoo and Ask. Sadly, Yahoo search will soon cease to exist when they start using Bing as a search provider, but for sentimental reasons and while it is still available, I thought I'd include them. Especially since Yahoo has been my default search engine for quite some time now.

I have chosen four questions, which obviously cover only a tiny fraction of software development related areas:
  1. "Django IntegerField model": Here I would like to see links to the Django project documentation on how to define an IntegerField in a model. It should be noted that 'IntegerField' also exists as part of forms, so the search engine should not confuse these two.
  2. "Define C function pointer": Our 'interview question' from above. The top results should lead to pages that very clearly explain how to define this thing.
  3. "TCP checksum calculation": I would like to see links to pages that explain how the algorithm works that computes TCP checksums.
  4. ".Net class inheritance": I'm not an MS developer at all, so I'm not even quite sure whether this question makes much sense, but I thought I'd try this one, just on a hunch.
Ok, let's start...

The test

Question 1. "Django IntegerField model"

Here are the results from all four search engines, side by side. In all images here, you can see Bing in the first column, followed by Google, then Yahoo and finally Ask in the last column.

  • Bing: The first results are essentially useless, leading (ironically) to Google code pages, followed by some obscure changeset in the Django code itself. Sure, you can see an IntegerField used there. However, the link to the Django docs, which I was looking for, is only at the very bottom of the page.
  • Google: The results are highly relevant (my only peeve might be that they point to the docs for the SVN version, not the latest stable release, but that's really minor). A link to the older Django version's docs is in third place, which is useful. But as we can see, the ads are not quite as relevant.
  • Yahoo: The results are not bad. However, I didn't like the fact that it pointed me to docs for a much older version of Django in the top result. They could have done better here.
  • Ask: The top results are good. However, I am disturbed by those ads, which appear right under the top result. They could have shown them on the side. But on Ask the side space is taken up by "Related Searches", which in this particular case were completely irrelevant.
Ranking: Google wins hands down. Ask in second place, closely followed by Yahoo. Bing is dead last.

Question 2. "Define C function pointer"

Here are the results for this question.

  • Bing: Again disappointing. The top result is about a function pointer validation function... which would be relevant if I didn't first need to learn how to define a function pointer. Since I am only looking for how to define one, these results don't help me much. The third one is even about how to call C functions from Forth!?
  • Google: This is exactly what I was looking for: A tutorial! Top link, good job Google!
  • Yahoo: Not bad, with the tutorial in second place. For some reason, the Wikipedia entry comes first. Both Yahoo and Google generally give quite some weight to Wikipedia pages, which I am normally fine with. In this particular case, the tutorials are really what I'm looking for, though.
  • Ask: Again a very good top result, just like Google, marred by irrelevant annoying ads, followed by other good results (for example how to define a function-pointer parameter for a function, which is very nice).
Ranking: Ask wins this one based on the result quality, even though their ads bother me. Google is a very close second, Yahoo in third, but still close. Bing is way off the mark again.

Question 3. "TCP checksum calculation"

The results for this one, side by side.

  • Bing: Very disappointing results at the top, leading to small discussion threads and postings here and there. Not at all what I was looking for.
  • Google: Excellent results, especially the third link ("TCP Checksum code"). That page contains a textual description of the algorithm as well as actual sample code. Perfect. The further links on the page remain relevant as well. The PDF in fourth place is actually quite good.
  • Yahoo: The top three results are the same as Google, so that's very good. The relevancy of the remaining links trails off quickly, though.
  • Ask: The top three results are the same as Google's and Yahoo's. The remaining links stay more relevant than Yahoo's, though not quite as good as Google's.
Ranking: Google in first, followed by Ask and Yahoo. Bing again in last place.
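
For reference, here is a minimal sketch of the kind of algorithm this query is after: the 16-bit ones' complement "Internet checksum" that TCP uses, as described in RFC 1071. The function name `internet_checksum` is chosen here purely for illustration. The sketch also assumes that the caller has already assembled the bytes to be summed; for a real TCP segment that means the pseudo-header plus the segment, with the checksum field set to zero.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data (RFC 1071 style).

    Assumption: 'data' already holds every byte that must be covered.
    For TCP, the caller would prepend the pseudo-header itself.
    """
    # The algorithm works on 16-bit words, so pad odd-length input
    # with a single zero byte at the end.
    if len(data) % 2:
        data += b"\x00"

    # Add up all 16-bit big-endian words.
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]

    # Fold any carries above 16 bits back into the low 16 bits
    # (this is what makes it a ones' complement sum).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    # The checksum is the bitwise complement of that sum.
    return ~total & 0xFFFF


example = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(example)))  # prints 0x220d
```

A receiver can verify data by running the same function over the bytes with the checksum included: the result should then be zero.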

Question 4. ".Net class inheritance"

Here is a question for Microsoft's Bing to shine! It should know all about that, right? Well, here are the results...

  • Bing: Oh the humanity! What were they thinking? Again, the top link goes to some forum page, the second one is a blog. That's a recurring and unsuccessful theme with Bing. The third one contains some sample code, but is basically someone making the case for multiple-inheritance, which apparently is missing in .Net.
  • Google: Hits the mark again with the top link pointing straight to the relevant page of Microsoft's own online documentation. Why couldn't Bing come up with this? And the second link is great, too: A tutorial for class inheritance, which is always nice to see.
  • Yahoo: Good results in the top two spots, with a useful page about interfaces included, which Google didn't show. But the third result is disappointing: Apparently, Yahoo was confused by the domain name for that result, which ended in ".net".
  • Ask: The ads are disturbing as usual, but the top results are the same as Google's, which is good. Not visible in this screenshot, because too much valuable space was used by those ads, there is also a good tutorial link further down, which Google didn't show.
Ranking: Ask by a slim margin (if we manage to ignore those ads), Google in close second, followed by Yahoo a bit further back. Bing is again at the very end with a smattering of useless results.

I sure hope that Microsoft's own developers are free to use Google or Ask at work.

Conclusions

Let's start at the back.

Maybe Microsoft is still trying to sort out issues and Bing will improve over time. However, at the moment Bing is just completely hopeless when it comes to these kinds of queries. I guess they are much more consumer oriented. As it stands, though, for questions software developers need to ask for their work, Bing quite plainly ... sucks!

Yahoo still tends to be good and delivers useful results. But while the top results are usually ok, the relevance of further results quickly drops. Of course, that will all end once they switch to Bing. At that moment they will get much, much worse. What a sad loss for the Internet to see Yahoo search go.

Ask and Google actually rank equally well in this test. That may come as a surprise to many. The biggest annoyance about Ask is the placement of the ads right after the top result. And their "Related Searches" links usually don't provide any value at all for these types of searches. But the quality of the links is about as good as Google's. The ads on both Ask and Google are close to irrelevant.

A message to Ask: Please change the placement of your ads or at least distinguish them visually a bit more!

So, take your pick: Google or Ask. I will start using Ask, because as I said earlier, Google is on enough pages already by means of their ad network and they don't need to know everything about me. And with Yahoo ranking behind Ask and soon completely fading away, Ask remains the only real alternative for search results that are relevant to software developers.

You should follow me on twitter here.


Jun 14, 2009

Read-optimize your source code

When you design a software system, database or data structures, you take into consideration its most common use case. For example, you organize your data differently when it is read frequently, but only rarely written to. You read-optimize your data.

I am convinced that the same applies to source code. In almost all cases, source code will be read much more often than it is written. Who writes the source code? You. A function is developed over some limited time, and once it is done it changes rarely, unless being refactored or modified to accommodate some changed requirements.

But who has to read your source code?

  • Well, for starters, your colleagues who have to integrate with or use your code.
  • New hires who join your team and try to find their way around.
  • The maintenance (or continued engineering) programmers.
  • Those who come after you, or inherit your code as part of their responsibility.
  • You.
Your code is read around the time it is written and integrated, and possibly for many, many years afterwards, long after you have forgotten about it.

We can see that over the lifetime of some piece of code, it is very likely to be read much more often than written or modified. Consequently, source code needs to be read-optimized just like we might read-optimize a data structure or database if called for.

What does it take to read-optimize source code? Here are the key points:

  • Meaningful variable and function names.
    • The compiler doesn't care whether the function name is 3 characters or 30 characters long. However, a colleague reading the code will be very grateful if the function is called get_daily_rainfall_average(), rather than dra(). Who cares that it takes two seconds more to type it? Many modern IDEs are going to do that for you anyway.
  • Thoughtful source code documentation.
    • Explain in one sentence what some code does and in possibly many more sentences why it does it and why it's needed. The why is often much more important than the how: Explain the rationale behind your design or implementation decisions. Everyone can see that a variable is increased in a for-loop; that doesn't have to be documented. But why the for-loop is needed in the first place is much more interesting and illuminating to someone who is new to the code.
    • Comments like this should be there for each module, class, function and the more complex code blocks within a function. Keep the comments close to the code they refer to. If they are all in the function or module header, they are often forgotten when the internals of code are changed, 15 pages further down.
  • A legible and consistent coding style.
    • We can endlessly argue about which style of parenthesis is the right one, but what's more important is that you remain consistent throughout your project.
    • If you are new to a team, use the coding style they use, even if it's not your favourite one.
    • Unless absolutely needed for performance reasons, don't try to optimize your doubly-nested loop into a fancy single line statement, exploiting even the most esoteric features of the language. Instead, break it up into a more easily understandable set of spelled out loops.
    • In general, don't optimize your code until you know you have to! Not only does it waste time if it turns out that the code really isn't performance critical, it is often also much more difficult to read and maintain.
    • If you can do what you have to do with simple and often used language features then use those.
    • Use white space to make code less 'dense' and increase legibility. Many developers have pretty large screens these days, and white space doesn't cost money.
    • Align code. For example, if you have several variable assignments, align the '=' operators directly underneath each other. It's astonishing how much more legible that block of code becomes. If you have to line-break a long function call, indent the arguments of the second line to align with the start of the arguments in the first line. That's just a little example, but there are many cases like this in most programs, where a bit of thoughtful alignment can make a difference.
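
To make these points concrete, here is a small sketch that puts two versions of the same function side by side. Only the name get_daily_rainfall_average() comes from the text above. The data layout (a dict mapping a station name to a list of daily rainfall values in mm), the parameter names and the reason given in the comment are assumptions made up for this illustration.

```python
# Write-optimized: quick to type, slow to read.
def dra(d):
    return sum(x for s in d.values() for x in s) / sum(len(s) for s in d.values())


# Read-optimized: same result, spelled out.
def get_daily_rainfall_average(rainfall_mm_by_station):
    """Return the average daily rainfall in mm across all stations.

    Why (hypothetical reason, for illustration only): stations report a
    different number of days, so we average over individual readings
    rather than averaging the per-station averages.
    """
    total_rainfall_mm = 0.0
    reading_count     = 0

    for station_readings in rainfall_mm_by_station.values():
        for daily_rainfall_mm in station_readings:
            total_rainfall_mm += daily_rainfall_mm
            reading_count     += 1

    # Fail with a clear message instead of an obscure ZeroDivisionError.
    if reading_count == 0:
        raise ValueError("no rainfall readings available")

    return total_rainfall_mm / reading_count


readings = {"north": [1.0, 3.0], "south": [2.0]}
print(get_daily_rainfall_average(readings))  # prints 2.0
```

Both functions compute the same number for non-empty input, but only the second one tells a new reader what the data is, what the result means and why the loops are there. Note that aligning the '=' signs follows the advice above; some style guides (Python's PEP 8, for one) discourage it, so stick with whatever your team uses.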
It is often astonishing what amount of resistance some developers put up against even those few, simple rules. I have heard arguments, such as:

  • "I don't want to write comments, because then I will have to scroll more to see the code."
  • "Writing comments in the code takes time."
  • "Formatting the code nicely takes time, especially when I need to change a few things, in which case I then have to re-format the code."
  • "Code changes, and before you know it the comments are obsolete."
I have absolutely no patience for the first three arguments. There might be some extremely rare situations where an emergency fix needs to be rushed out and short-cuts need to be taken. But for the most part, those arguments are bogus. The only one even remotely credible is the last one about comments becoming obsolete, but this can be addressed in a straightforward manner: Keep the comments close to the code, and do code reviews where readability and correctness of the comments are stressed as well.

In fact, I claim that if you don't take those rules to heart in your own source code then you are either unprofessional, lazy, not a team-player, or all of the above. If you as a software developer take pride in your professionalism and quality of your work then you have to consider that it is not only the achieved functionality for which you are being paid: The code you produce in almost all cases becomes property of your employer. Therefore, the code itself also becomes a product you deliver, and actually your most important product.

How useful and usable your code is for the team who has to work with it is what really determines its value in the long run.
