Thursday, August 28, 2003

Today my number changes (didn't get it? It's my birthday!). Huh! I am getting old.

Tuesday, August 26, 2003

The Law of Leaky Abstractions

There's a key piece of magic in the engineering of the Internet which you rely on every single day. It happens in the TCP protocol, one of the fundamental building blocks of the Internet.

TCP is a way to transmit data that is reliable. By this I mean: if you send a message over a network using TCP, it will arrive, and it won't be garbled or corrupted.

We use TCP for many things like fetching web pages and sending email. The reliability of TCP is why every exciting email from embezzling East Africans arrives in letter-perfect condition. O joy.

By comparison, there is another method of transmitting data called IP which is unreliable. Nobody promises that your data will arrive, and it might get messed up before it arrives. If you send a bunch of messages with IP, don't be surprised if only half of them arrive, and some of those are in a different order than the order in which they were sent, and some of them have been replaced by alternate messages, perhaps containing pictures of adorable baby orangutans, or more likely just a lot of unreadable garbage that looks like the subject line of Taiwanese spam.

Here's the magic part: TCP is built on top of IP. In other words, TCP is obliged to somehow send data reliably using only an unreliable tool.

To illustrate why this is magic, consider the following morally equivalent, though somewhat ludicrous, scenario from the real world.

Imagine that we had a way of sending actors from Broadway to Hollywood that involved putting them in cars and driving them across the country. Some of these cars crashed, killing the poor actors. Sometimes the actors got drunk on the way and shaved their heads or got nasal tattoos, thus becoming too ugly to work in Hollywood, and frequently the actors arrived in a different order than they had set out, because they all took different routes. Now imagine a new service called Hollywood Express, which delivered actors to Hollywood, guaranteeing that they would (a) arrive (b) in order (c) in perfect condition. The magic part is that Hollywood Express doesn't have any method of delivering the actors, other than the unreliable method of putting them in cars and driving them across the country. Hollywood Express works by checking that each actor arrives in perfect condition, and, if he doesn't, calling up the home office and requesting that the actor's identical twin be sent instead. If the actors arrive in the wrong order Hollywood Express rearranges them. If a large UFO on its way to Area 51 crashes on the highway in Nevada, rendering it impassable, all the actors that went that way are rerouted via Arizona and Hollywood Express doesn't even tell the movie directors in California what happened. To them, it just looks like the actors are arriving a little bit more slowly than usual, and they never even hear about the UFO crash.

That is, approximately, the magic of TCP. It is what computer scientists like to call an abstraction: a simplification of something much more complicated that is going on under the covers. As it turns out, a lot of computer programming consists of building abstractions. What is a string library? It's a way to pretend that computers can manipulate strings just as easily as they can manipulate numbers. What is a file system? It's a way to pretend that a hard drive isn't really a bunch of spinning magnetic platters that can store bits at certain locations, but rather a hierarchical system of folders-within-folders containing individual files that in turn consist of one or more strings of bytes.

Back to TCP. Earlier for the sake of simplicity I told a little fib, and some of you have steam coming out of your ears by now because this fib is driving you crazy. I said that TCP guarantees that your message will arrive. It doesn't, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can't do anything about it and your message doesn't arrive. If you were curt with the system administrators in your company and they punished you by plugging you into an overloaded hub, only some of your IP packets will get through, and TCP will work, but everything will be really slow.

This is what I call a leaky abstraction. TCP attempts to provide a complete abstraction of an underlying unreliable network, but sometimes, the network leaks through the abstraction and you feel the things that the abstraction can't quite protect you from. This is but one example of what I've dubbed the Law of Leaky Abstractions:

All non-trivial abstractions, to some degree, are leaky.



Abstractions fail. Sometimes a little, sometimes a lot. There's leakage. Things go wrong. It happens all over the place when you have abstractions. Here are some examples.

Something as simple as iterating over a large two-dimensional array can have radically different performance if you do it horizontally rather than vertically, depending on the "grain of the wood" -- one direction may result in vastly more page faults than the other direction, and page faults are slow. Even assembly programmers are supposed to be allowed to pretend that they have a big flat address space, but virtual memory means it's really just an abstraction, which leaks when there's a page fault and certain memory fetches take way more nanoseconds than other memory fetches.
The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same. You're not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.
Even though network libraries like NFS and SMB let you treat files on remote machines "as if" they were local, sometimes the connection becomes very slow or goes down, and the file stops acting like it was local, and as a programmer you have to write code to deal with this. The abstraction of "remote file is the same as local file" leaks. Here's a concrete example for Unix sysadmins. If you put users' home directories on NFS-mounted drives (one abstraction), and your users create .forward files to forward all their email somewhere else (another abstraction), and the NFS server goes down while new email is arriving, the messages will not be forwarded because the .forward file will not be found. The leak in the abstraction actually caused a few messages to be dropped on the floor.
C++ string classes are supposed to let you pretend that strings are first-class data. They try to abstract away the fact that strings are hard and let you act as if they were as easy as integers. Almost all C++ string classes overload the + operator so you can write s + "bar" to concatenate. But you know what? No matter how hard they try, there is no C++ string class on Earth that will let you type "foo" + "bar", because string literals in C++ are always char*'s, never strings. The abstraction has sprung a leak that the language doesn't let you plug. (Amusingly, the history of the evolution of C++ over time can be described as a history of trying to plug the leaks in the string abstraction. Why they couldn't just add a native string class to the language itself eludes me at the moment.)
And you can't drive as fast when it's raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it's raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can't see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away, because of the law of leaky abstractions.
One reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. When I'm training someone to be a C++ programmer, it would be nice if I never had to teach them about char*'s and pointer arithmetic. It would be nice if I could go straight to STL strings. But one day they'll write the code "foo" + "bar", and truly bizarre things will happen, and then I'll have to stop and teach them all about char*'s anyway. Or one day they'll be trying to call a Windows API function that is documented as having an OUT LPTSTR argument and they won't be able to understand how to call it until they learn about char*'s, and pointers, and Unicode, and wchar_t's, and the TCHAR header files, and all that stuff that leaks up.

In teaching someone about COM programming, it would be nice if I could just teach them how to use the Visual Studio wizards and all the code generation features, but if anything goes wrong, they will not have the vaguest idea what happened or how to debug it and recover from it. I'm going to have to teach them all about IUnknown and CLSIDs and ProgIDS and ... oh, the humanity!

In teaching someone about ASP.NET programming, it would be nice if I could just teach them that they can double-click on things and then write code that runs on the server when the user clicks on those things. Indeed ASP.NET abstracts away the difference between writing the HTML code to handle clicking on a hyperlink and the code to handle clicking on a button. Problem: the ASP.NET designers needed to hide the fact that in HTML, there's no way to submit a form from a hyperlink. They do this by generating a few lines of JavaScript and attaching an onclick handler to the hyperlink. The abstraction leaks, though. If the end-user has JavaScript disabled, the ASP.NET application doesn't work correctly, and if the programmer doesn't understand what ASP.NET was abstracting away, they simply won't have any clue what is wrong.

The law of leaky abstractions means that whenever somebody comes up with a wizzy new code-generation tool that is supposed to make us all ever-so-efficient, you hear a lot of people saying "learn how to do it manually first, then use the wizzy tool to save time." Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don't save us time learning.

And all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder.

During my first Microsoft internship, I wrote string libraries to run on the Macintosh. A typical assignment: write a version of strcat that returns a pointer to the end of the new string. A few lines of C code. Everything I did was right from K&R -- one thin book about the C programming language.

Today, to work on CityDesk, I need to know Visual Basic, COM, ATL, C++, InnoSetup, Internet Explorer internals, regular expressions, DOM, HTML, CSS, and XML. All high level tools compared to the old K&R stuff, but I still have to know the K&R stuff or I'm toast.

Ten years ago, we might have imagined that new programming paradigms would have made programming easier by now. Indeed, the abstractions we've created over the years do allow us to deal with new orders of complexity in software development that we didn't have to deal with ten or fifteen years ago, like GUI programming and network programming. And while these great tools, like modern OO forms-based languages, let us get a lot of work done incredibly quickly, suddenly one day we need to figure out a problem where the abstraction leaked, and it takes 2 weeks. And when you need to hire a programmer to do mostly VB programming, it's not good enough to hire a VB programmer, because they will get completely stuck in tar every time the VB abstraction leaks.

The Law of Leaky Abstractions is dragging us down.


This post is from Alek's weblog. I am also trying to find such a writer, or else I will have to start something similar to what Chris Brumme does. Before that I have to read Chris's blog carefully to see to what extent he writes about the internals.

--------------------------------------------------------------------------------------------------------------------------------------------
Where Are Java Senior Engineers?

Reading a blog of such high quality as Chris Brumme's, where he dissects CLR internals in such depth (for example, asynchronous operations), brings joy to any engineer's heart, even that of a Java enthusiast.

I just cannot stop wondering where the Java blogs of such caliber are: blogs that go into such detail and are written not by users but by creators. Java.net seems to be under the control of "How-To" writers (exactly the opposite of what Chris is doing), evangelists, SUN enthusiasts, and marketing specialists. That is not exactly what I would call "The Source For Java(TM) Technology", and engineers seem to be sorely lacking from the "diverse group of engineers, researchers, technologists, and evangelists at Sun Microsystems" that was supposed to propel that site.

So where is SUN hiding all these Java engineers?

Please let me know if somebody knows where to find them ...
--------------------------------------------------------------------------------------------------------------------------------------------


Tuesday, August 19, 2003

Java interfaces are a great tool for providing clean designs. I almost hesitate to point out that they can carry a minor performance penalty: a method call on an object that is declared as an interface type is slower than a method call on a regular class type. I would only consider optimizing this in the most severely performance-sensitive code.

For those interested, here are some details. Regular Java virtual method calls are done by looking up the method stored at a constant offset in the method table of the object's class. A method call on an interface will be made on objects with different method table layouts, so the correct method cannot be found at a constant offset and a distinct mechanism for finding it must be used. There are techniques to make this operation fairly efficient, but it is fundamentally slower than a normal method call.
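To make the two kinds of call site concrete, here is a minimal sketch. The names Shape, Square, and area() are invented for illustration and are not from the original post. For the call through the interface-typed reference javac emits an invokeinterface instruction, and for the class-typed one an invokevirtual instruction. How much slower the former is depends on the JVM, so the sketch makes no timing claim.

```java
// Hypothetical names (Shape, Square); only the shape of the two call sites matters.
class InterfaceCallSketch {
    interface Shape {
        int area();
    }

    static final class Square implements Shape {
        private final int side;

        Square(int side) {
            this.side = side;
        }

        @Override
        public int area() {
            return side * side;
        }
    }

    // Call through an interface-typed reference (compiled to invokeinterface).
    static int viaInterface(Shape shape) {
        return shape.area();
    }

    // Call through a class-typed reference (compiled to invokevirtual).
    static int viaClass(Square square) {
        return square.area();
    }

    public static void main(String[] args) {
        Square square = new Square(3);
        // Same object, same method, same result; only the lookup mechanism differs.
        System.out.println(viaInterface(square) + " " + viaClass(square));
    }
}
```

Running javap -c on the compiled class is one way to see the two different instructions.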

Sunday, August 17, 2003

This website describes Jane's personal experience while doing SCEA, SCJP...: http://www.janeg.ca/

Check this website for Java tips: http://www.javafaq.nu. I have yet to download this eBook.

Wednesday, August 13, 2003

One of the first lessons you should learn about tuning is not to guess. Don't guess that there may be a performance problem. Implement the code simply, with good coding practices, then measure the performance and find the bottlenecks. If there is a bottleneck, use one of the hundreds of performance tuning techniques available (including reusing objects just like the example above) to improve the performance of the bottleneck. Then document clearly what you've done. (I like to keep the old code in comments.)
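As a rough illustration of "measure, don't guess", here is a small timing helper. It is only a sketch: bestOfNanos and buildString are made-up names, the workload is a stand-in for whatever code you suspect, and a wall-clock loop like this is easily distorted by JIT warm-up and garbage collection. For anything serious a profiler or a benchmark harness is the better tool.

```java
class MeasureSketch {
    // Runs the task several times and returns the smallest elapsed time in nanoseconds.
    static long bestOfNanos(Runnable task, int repetitions) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < repetitions; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    // Hypothetical workload standing in for the code suspected of being slow.
    static String buildString(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long nanos = bestOfNanos(() -> buildString(10_000), 20);
        System.out.println("best of 20 runs: " + nanos + " ns");
    }
}
```

Measure before and after each change, and keep a change only if the numbers actually improve.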

The Java News Brief
http://www.ociweb.com/jnb/index.html

This website has a good book, Mastering the Fundamentals of the Java Programming Language: http://www.javarules.com

About equals()

The equals() method can be misleading. Every class in Java inherits
this method from the Object class, whether it's a Java library class
or a class created by you.

If you think about it, equals can mean different things to different
classes. Imagine two people - you can think of them as equal if they
are the same age, same height, same shoe size OR all of the above. So
clearly, Object has to define SOME criterion for comparing the two.

If you don't write your own equals method, you get the DEFAULT equals
from Object, which basically says that one object is ONLY equal to
another if they are located in the same place in memory. So, even if
they have the same behavior and the same attributes, if they are
located in different places in memory - they are NOT equal.

To give you another example, take the String class. The String class
decided not to keep the DEFAULT implementation from the Object class,
but to override it and say that if two strings contain the same
characters, even if they are located in different places in memory,
it will still consider them equal.
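A small sketch of both cases described above. Person is a made-up class that does not override equals(), so it gets the default from Object. The String half uses new String(...) only to force two distinct objects with the same characters.

```java
class EqualsSketch {
    // Hypothetical class that does NOT override equals(): it inherits Object's identity comparison.
    static class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    public static void main(String[] args) {
        Person a = new Person("Jane", 30);
        Person b = new Person("Jane", 30);
        System.out.println(a.equals(b)); // false: same attributes, different objects in memory
        System.out.println(a.equals(a)); // true: same object

        String s1 = new String("hello");
        String s2 = new String("hello");
        System.out.println(s1 == s2);      // false: different places in memory
        System.out.println(s1.equals(s2)); // true: String overrides equals() to compare contents
    }
}
```

To make two Person objects with the same attributes compare as equal, Person would have to override equals(Object) (and hashCode() along with it).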

Tuesday, August 12, 2003

Overloading considered harmful


What is overloading, once again? Same method name for different methods - sounds harmless enough!

Sure, it's one of the first things Java programmers are confronted with when learning the language. You are told things like: Do not mix it up with overriding - remember, these things may look quite similar, but are altogether different concepts! Then your Java introduction goes on telling you about unique parameter lists, and after one and a half pages you get the impression that this is something not so terribly hard to understand. [HK: I can vouch for this argument. In my Java courses, students commonly make this mistake.]

What is the value proposition of this seemingly simple feature?

Shorter interfaces, not bogged down by artificial, tiresome discriminators, and a bit of lexical structuring of your class text: Overloading allows you to indicate the conceptual identity of different methods, letting you stress common semantics across methods so that similarities become apparent at first sight. It's supposed to make your code more readable, and as regards server code (the code where these method siblings are defined), it really does.

There are many who like it. There is a ton of code using what overloading has to offer. And of course, you cannot even escape it in Java, where you're simply forced to use it when you want to provide different initializers. It seems overloading rules - a feature not only popular, but tightly integrated into some important programming languages, an open commitment of venerable language designers that surely does not fail to impress the masses. And, what is more: no performance penalty whatsoever...

Now, should we fully embrace overloading in our own code then? Should we use it wherever possible? This discussion shall present an attempt to put the technical facts investigated in-depth by a former edition of this newsletter into a usage perspective - a bit similar in spirit to the popular harping on pointers which you can find in every Java introduction. The seminal idea that overloading clashes with dynamic binding is taken from a discussion of overloading to be found in "Object-Oriented Software Construction" by Bertrand Meyer.

There is no reason to question that naming conventions to indicate conceptual interrelatedness of different methods will benefit the class text where these methods are defined. To adopt the convention of reusing the same method name, however, has unfortunate consequences on client code which can become quite unintuitive, to say the least.

Overloading with parameter lists of different lengths poses no problem for client code interpretation, as the lengths openly disambiguate client calls at first sight. Things that could irritate you just will not compile. However, when overloaded methods with the same method name have parameter lists of the same length, and when the actual call arguments conform to more than one signature of these overloaded methods, it gets a little hard to tell which methods are actually executed just by looking at the client calls. In this situation, you experience the strange phenomenon that the methods being called are not independent of the reference types being used for the calls.

There are several problems related to this, but first let's take another look at the small code example presented in a former edition of this newsletter in order to really get a feel for what it's like when methods being called are not independent of the reference types being used for the calls.

A minimal modification allows us to focus on the ugly side of overloading: The program still tells us which method actually gets called, but on top of that also delivers rather strong comments when overloading is caught harming our ability to reason about the client code without knowing the server classes.

Basically, we have two fixed instances, which will always play the same roles: one serving as call target, the other serving as argument. Now we mix and match several calls, always executed on these same instances (always the same target object, always the same argument object), the only difference being the reference types through which these objects are accessed. And behold: Different methods are being called. If you are familiar with this simple setting, you may skip the program part and go directly to the following discussion.

public class OverloadingTest {
    public abstract static class Top {
        public String f(Object o) {
            String whoAmI = "Top.f(Object)";
            System.out.println(whoAmI);
            return whoAmI;
        }
    }

    public static class Sub extends Top {
        // Overloads, but does not override, Top.f(Object).
        public String f(String s) {
            String whoAmI = "Sub.f(String)";
            System.out.println(whoAmI);
            return whoAmI;
        }
    }

    public static void main(String[] args) {
        // One target object, seen through two reference types.
        Sub sub = new Sub();
        Top top = sub;

        // One argument object, seen through two reference types.
        String stringAsString = "someString";
        Object stringAsObject = stringAsString;

        // == is deliberate: each method returns its own string literal,
        // so the two references are identical exactly when the same method ran.
        if (top.f(stringAsObject) == sub.f(stringAsString))
        //if (top.f(stringAsObject) == sub.f(stringAsObject))
        //if (top.f(stringAsString) == sub.f(stringAsString))
        //if (top.f(stringAsString) == sub.f(stringAsObject))
        //if (sub.f(stringAsString) == sub.f(stringAsObject))
        //if (top.f(stringAsString) == top.f(stringAsObject))
        {
            System.out.println("Hey, life is great!");
        } else {
            System.out.println("Oh no!");
        }
    }
}

Can you tell what happens when each of the conditions is activated?

Let us carefully go through the code.

There are two overloaded methods spread across a class hierarchy (one class inheriting from another class). This is the server code to be called by the client.
The superclass defines: String f(Object o).
The subclass defines: String f(String s).
The signatures are chosen to make both methods eligible candidates to be executed in the context of calls on the subclass instance with a String argument.
The client provides two objects, reused for all calls and chosen in a way that both overloaded methods are potentially eligible candidates for executing the client calls.
Through polymorphic assignment, the client obtains references of different types for these two instances.
The client makes method calls that differ only in the different references used for making the call. In the given setup, there are 4 different call forms possible: Overloading has the method name fixed, so only the target reference type and the parameter reference type are variable. Every reference type for the target can be combined with every reference type for the argument. (Mathematically speaking, there are 4 binary strings of length 2.)
The comparisons then are really just for fun, eliminating detail. They shift the focus of attention from the question of what particular method gets called to the general insight that different methods get called, additionally allowing the program to be explicit about its likes and dislikes: Every case of seeming reference-independence of the calls is instantly interpreted as an example of how things should be, and welcomed with a happy, optimistic "Hey, life is great!" In those dark moments, however, when overloading casts its dark shadow upon the otherwise so object-oriented Java world, and just nothing seems right, our little program starts to complain... Combinatorics tells us that six 2-combinations of a 4-set (consisting of 4 call forms) exist, and so you find six comparisons (five of them showing up as comments), but of course, a single predicate returning false (different methods having been called) already suffices to get the point across.
And that's it.

Discussion
The program shows, once again, that one thing to be aware of in connection with overloading is that it's all about reference types. This is as true for target reference types as it is for parameter reference types. For instance, the predicate "sub.f(stringAsObject) == sub.f(stringAsString)" will resolve to false in our setup because two different methods are executed. This dependence on reference types in connection with overloading may or may not be what you expected, but the question remains if this is a clean approach to object-oriented programming.
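To see the dependence on reference types at a glance, the four call forms can be printed side by side. This is a condensed sketch of the program above rather than the newsletter's own code: the class name CallFormsSketch and the shortened variable names are mine, and the methods return their names instead of printing them.

```java
class CallFormsSketch {
    static class Top {
        String f(Object o) {
            return "Top.f(Object)";
        }
    }

    static class Sub extends Top {
        // Overloads (does not override) Top.f(Object).
        String f(String s) {
            return "Sub.f(String)";
        }
    }

    public static void main(String[] args) {
        Sub sub = new Sub();
        Top top = sub;              // same object, different reference type
        String asString = "someString";
        Object asObject = asString; // same object, different reference type

        System.out.println(top.f(asObject)); // Top.f(Object)
        System.out.println(top.f(asString)); // Top.f(Object): Top has no f(String)
        System.out.println(sub.f(asObject)); // Top.f(Object): the argument's static type is Object
        System.out.println(sub.f(asString)); // Sub.f(String): the only form that reaches the overload
    }
}
```

Only one of the four forms reaches Sub.f(String), although the target object and the argument object are the same in every call.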

No doubt, this may puzzle many a brave programmer, as it is a result absolutely exclusive to overloaded methods. And, as the use of overloaded methods does not identify itself as such in the method call, the intuitive, but unfortunately wrong expectation might be that the predicate returns true, as it would be the case with any gentle non-overloaded method.

Honestly, do we like this? No. Object-oriented programming, as we know it, is about objects, not about references. We expect objects to behave the way they are and not the way they are referenced. Objects do their thing regardless of the role the client assigns them. This is how it should be, and we call this thing dynamic binding. It is not cosmetics, it is not just a feature, it is THE feature. It shapes the architecture of our systems, decoupling clients from servers.

Now, with overloading a second rule, reference type dependence, takes over, breaking the fundamental polymorphic equivalence property described above (that polymorphic assignments do not change the results of method calls as long as the code can be compiled). The choice of references in the client, which should be based on considerations like grouping and low coupling, suddenly has to take the demands of overloading into account. Overloaded server objects affect the design of client code. Cosmetics beat structure. Unlike overriding, overloading cannot just be applied in a server method definition, end of story. It is a feature you have to stay aware of in your clients, whose specific referencing of server objects influences what functionality gets called in the end. While with dynamic binding alone the method to be executed is completely server-defined, overloading proves to be client-sensitive.

Now to the problems. An important issue closely connected to software quality is readability. Our ability to reason about the software text is essential for any kind of maintenance, and, as you might have guessed by the direction this discussion has taken by now, overloading affects readability of client code in a rather negative way. It is all very well to let the program run and after the surprise look at the server code and explain the strange things away (oh, of course, overloaded methods, you know...), but nevertheless it would be preferable by far to predict the behaviour, simple as it is, by simply (i.e. exclusively) examining the client. Show me the client class, tell me no overloading is involved, and I tell you: "Hey, life seems great!" I can reason about the result of the condition solely looking at the client class.

With overloading being introduced, or even with just the slightest chance of overloading being used (this includes all unknown Java code), this statement is impossible to make, because you cannot tell if the same server method gets called without examining the server sources. In our program, you would have to read three classes instead of one to know what's going on. So, use of overloading weakens the expressive power of client code as the polymorphic equivalence property cannot be relied upon.

Sometimes, of course, you are willing to dig into the server code because you want to find out the exact server method that gets called. But even then overloading significantly complicates things. Without overloading, you just work your way bottom up through the target's class hierarchy, and when you find a match, bingo, you're done! With unknown code or code known to use overloading, this can be only your second step. First you have to examine the class of the reference and find the matching method. Only then can you check the class hierarchy for overriding methods. The bad thing about this is probably not the additional step involved, but that you have to repeat this analysis for every different reference type, because results can vary. Thus, overloading complicates the analysis of client-server interaction.

There is also a psychological dimension to all this. The following will try to show that overloading is not a gentle, unobtrusive language feature, but, as it stands in conflict with other language features, late binding and inheritance, particularly prone to abuse. In other words, overloading is an open invitation for introducing conceptual errors. Think of novice programmers or programmers in a rush. Overloaded methods, coming with their own method selection rules, present an anomaly in the object-oriented landscape shaped by the presence of dynamic binding, and will surely go on to puzzle people, who will falsely think overloaded methods behave like "normal" methods, or mistake overloading for overriding just because the method signatures involved in overloading look so similar.

In fact, such a mistake may be seen as expressing justified desires regarding object-oriented design. Hell, we'd sure like to see the overloaded methods in our example being handled as an instance of overriding! The parameters of our methods are related through inheritance, so inspired by other programming languages, it does not take great imagination to see the derived class define a method that overrides the inherited method. Of course, this is an additional twist adding a bit of vision to our discussion, and of course, we know that Java does not support such covariant method redefinition (restricting the parameter domain of the method): Most of us have learnt by now that Java allows only specification inheritance (overriding being only defined for methods with the same return and parameter types). But still. Do we not think, deep in our heart, that the subclass method with the more specific parameter should, in a better world maybe, be the one in charge, overriding the superclass method? Think about an Integer class inheriting from Number while redefining addition for integers only. Not allowed in Java, but still desirable (and a real feature in other languages such as Eiffel). Sure, overloading is not to be blamed for an incorrect understanding of inheritance in Java, but it clearly invites such fantasies (and the corresponding errors) when used in a context such as the presented one. And even if such interpretations are wrong - shouldn't they be right?
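The Integer/Number wish can be sketched with two made-up classes. Num and Int are hypothetical stand-ins (the real java.lang.Number and java.lang.Integer have no add method), and the methods return their own names only to show which one ran.

```java
class CovariantParamSketch {
    // Hypothetical stand-ins for Number and Integer.
    static class Num {
        String add(Num other) {
            return "Num.add(Num)";
        }
    }

    static class Int extends Num {
        // Narrower parameter type: in Java this is a new overload, not an override.
        String add(Int other) {
            return "Int.add(Int)";
        }
    }

    public static void main(String[] args) {
        Int two = new Int();
        Num sameTwo = two; // same object, referenced as its supertype

        System.out.println(two.add(two));     // Int.add(Int)
        System.out.println(sameTwo.add(two)); // Num.add(Num): the integer-only version is not used
    }
}
```

Putting @Override on Int.add(Int) would make the compiler reject it, which is a cheap way to catch this mistake early.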

And then the poor integration of overloading and inheritance in Java, which is very misleading as well. Reference type dependence means that overloading is simply not developed to conceptual consistency in the context of inheritance. Guessing from experience with overloaded methods defined in one and the same class, we might expect the method with the best match in terms of formal parameter type and actual method argument to be called on the object. This does not happen, though. Java does not produce any kind of "flat form" for the object's class with all overloaded methods, inherited or not, appearing side by side in a list in order to allow the runtime to choose the most appropriate.

No, what technically happens is, in my understanding, that the compiler takes the method symbol plus the parameter reference types of some method call and calculates a position in the method table of the target's reference type. So, choosing between overloaded methods is done at compile time, and it is restricted to the overloaded methods of one class: the class of the reference type. Overloaded methods defined in subclasses of the reference type are never called: Java ignores the exiled siblings although the whole thing looks so very similar to overriding.
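That the choice is made at compile time shows up even without inheritance. In this sketch (Describer and describe are invented names) both overloads live in one class, and the same String object selects a different method depending only on the static type of the expression passed in.

```java
class StaticSelectionSketch {
    static class Describer {
        String describe(Object o) {
            return "describe(Object)";
        }

        String describe(String s) {
            return "describe(String)";
        }
    }

    public static void main(String[] args) {
        Describer d = new Describer();
        String text = "hello";
        Object sameText = text; // same object, static type Object

        System.out.println(d.describe(text));              // describe(String)
        System.out.println(d.describe(sameText));          // describe(Object): the runtime type is ignored
        System.out.println(d.describe((String) sameText)); // describe(String): the cast changes the static type
    }
}
```

The runtime class of the argument never enters the decision; only a cast, which changes the static type, changes the outcome.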

With overloaded methods being defined in superclasses of the reference type, Java exhibits quite strange behaviour: while the server code can still be compiled, client code will break. Try to make a method call where the compiler would have to choose between them, and you get a compilation error complaining that the call is ambiguous. Put the method into the reference type and all is well. Don't ask me why - just remember that selection of overloaded methods is limited to the reference type class. I personally believe this further anomaly might be more of a compiler issue than a language issue. If you find a logical explanation for this, other than that it helps to improve compilation performance, please let me know.

Once again (the last time): overloaded methods defined in subtypes of the target reference will not be taken into consideration as candidates for execution by the runtime. With the table position given in the bytecode, the runtime will only check if there are overriding methods (which will appear at the same position in the method tables of subclasses if they exist). So, the compiler cannot hunt them down, and the runtime does not want to.

A consequence of this is, disturbingly, that the place in the class hierarchy where non-overridden overloaded methods are defined is of essential importance for the selection of the method actually being called. To me, this sounds a little scary, or would you really want your class design to be influenced by the crippled demands of overloading? Summing up: overloading is a static compile-time feature which does not integrate well with our expectations, shaped as they are by the dynamic method lookup that comes with inheritance.

What else can we do to shoot the dead man? (Who is still alive enough to ruin our programs, of course.) Bertrand Meyer sees overloading as a violation of the "principle of non-deception: differences in semantics should be reflected by differences in the text of the software" (OOSC 94, Bertrand Meyer). But wait a second, isn't late binding another case where there is only one method symbol for different methods?

As I understand it, the difference between late binding and overloading can be pinned to this observation: late binding lets one method name point to one operation contract (which can then be fulfilled by several different methods whose differences are nevertheless absolutely transparent to the client code), whereas overloading lets one method name point to several method specifications whose differences can be experienced in the client code. In the scope of the client, there is no difference between polymorphic calls bound to different methods. The polymorphic call specification is all the client has to know about the call. Overloaded methods, on the other hand, need not share common semantics or, to be more precise, a common contract; their pre- and postconditions may vary wildly. This is something the client always has to take into account: overloaded methods cannot be used interchangeably. They are different methods that merely share a name, and they have to be treated according to their specific contracts. These contracts, however, are hidden behind the same name, which makes them hard to identify. The same method name does not point to a common denominator in this case, but only serves to disguise differences that have to be laboriously disambiguated later on. The client has to stay aware of the method contract being pointed to by a complicated three-component key for the method which, as we have seen, consists of target reference type, method name, and parameter reference types.
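One well-known case of overloads with different contracts, offered here as an illustration rather than as the article's own example, is java.util.List, which declares both remove(int index) and remove(Object o). The sketch below assumes a Java version with generics and autoboxing (newer than the Java 1.4 discussed here); the class name RemoveOverloadDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveOverloadDemo {
    static List<Integer> removeByIndex() {
        List<Integer> list = new ArrayList<>(Arrays.asList(10, 20, 30));
        list.remove(1);                   // remove(int index): drops the element at position 1
        return list;
    }

    static List<Integer> removeByValue() {
        List<Integer> list = new ArrayList<>(Arrays.asList(10, 20, 30));
        list.remove(Integer.valueOf(10)); // remove(Object o): drops the first element equal to 10
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeByIndex()); // [10, 30]
        System.out.println(removeByValue()); // [20, 30]
    }
}
```

Same name, two contracts: the client has to know which parameter type it is passing in order to know what the call will do.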

So what are my final words to the programmer who, after having read this article, wonders if he should try to use overloading now wherever possible or not? Keep going... And if you really, really want to use it, go on and do so, but only with different method names - this is a trick stolen from real experts that can improve your overloading a lot! :o)


James Gosling:

I've been inescapably tagged as "the Java guy", despite the fact that I haven't worked in the Java product organization for a couple of years (I do still kibitz). These days I work at Sun's research lab about two thirds of the time, and do customer visits and talks the other third.

SCEA material: http://www.leocrawford.org.uk/work/jcea

Tuesday, August 05, 2003

"I've always had this question: do we invent patterns or do we discover them? By definition patterns are sets of (problem-context-solution) and have value only if the problem occurs repeatedly. So it's clear that we invent neither the problem nor the context..."

- Tunisian

Monday, August 04, 2003

Patterns 101: The Factory Method
The source code listings for this article are available in our download section.

It looks as though there are still a lot of software developers out there that just don't get it when it comes to design patterns. When interviewing architects and senior developers, I always ask the following question: "Are you familiar with design patterns, and if so, how would you describe the value in applying them to your work?"

For the most part, I get either no answer or a textbook answer without much substance. I guess that shouldn't be too surprising. I personally picked up the book Design Patterns: Elements of Reusable Object-Oriented Software by the so-called Gang of Four (or GoF for short) several times before I understood patterns, each time saying "this time I'm going to get it". I never did; or at least not by reading the book anyway.

I finally got it while designing an application a couple of years ago. This was an application that I knew would eventually have to be extended to handle new (and changing) requirements. I started sketching out my design, and as I did so, I remembered some things I had read about factories, commands, and how they related to pattern basics. Suddenly those ideas made sense and guided me through a design that stood up to extension. Now that I understand pattern basics and how they apply to my design, I get a lot more out of reading about new patterns.

This article is intended to be very fundamental as it applies to patterns. Anyone who has a basic or strong understanding of the importance and appropriate use of patterns will probably not learn anything new. This article is directed at the developer who has heard about patterns or read about them and finds himself or herself saying "I just don't get it." If that describes you and you are willing to give this article careful thought, I'm confident that this will be a turning point for you. I believe we can make the light bulb go on by taking a simple problem, first doing it the wrong way, and then doing it the right way.

So if you think you have something to learn about design patterns, then on to the example...

The Factory Method Pattern
There are a slew of design patterns out there. Let's look at one of the basics: the factory method. A factory method commits to returning an object of a base class (or interface) type. The object actually returned will be an instance of a subclass of that type. The calling program will treat that object in the generic manner (as defined by the base class), and the object will demonstrate specific functionality (of the extended class). These statements will become clearer after we've stepped through the example. You may want to read this paragraph again, once you have a better understanding of the approach.

Example: My Tasks
Being a software architect, I can't remember what I'm supposed to do without writing it down, so I've composed a program to list my daily tasks based on what day of the week it is.

Here is my task list:
Monday: Move the trash to the curb
Tuesday: Put out the recyclables
Wednesday: Tape "South Park"
Thursday: Move the trash to the curb
Friday: Buy beer
See Listing 1: MainProcedural.java for the class I've written to display those tasks.
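Listing 1 itself is not reproduced here, so the following is only a guess at its shape: a switch over the day of the week that calls one show[Day]Tasks() method per weekday. The class name MainProceduralSketch and the dispatching method showTasks(int) are assumptions, as is the use of java.util.Calendar to find the current day.

```java
import java.util.Calendar;

// A sketch of the procedural version; the real Listing 1 may differ.
class MainProceduralSketch {
    static void showMondayTasks()    { System.out.println("Move the trash to the curb"); }
    static void showTuesdayTasks()   { System.out.println("Put out the recyclables"); }
    static void showWednesdayTasks() { System.out.println("Tape \"South Park\""); }
    static void showThursdayTasks()  { System.out.println("Move the trash to the curb"); }
    static void showFridayTasks()    { System.out.println("Buy beer"); }

    // Hypothetical dispatcher: decision-making and operational code live in the same class.
    static void showTasks(int dayOfWeek) {
        switch (dayOfWeek) {
            case Calendar.MONDAY:    showMondayTasks();    break;
            case Calendar.TUESDAY:   showTuesdayTasks();   break;
            case Calendar.WEDNESDAY: showWednesdayTasks(); break;
            case Calendar.THURSDAY:  showThursdayTasks();  break;
            case Calendar.FRIDAY:    showFridayTasks();    break;
            default:                 break; // weekends: nothing to show
        }
    }

    public static void main(String[] args) {
        showTasks(Calendar.getInstance().get(Calendar.DAY_OF_WEEK));
    }
}
```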

Note that this class doesn't give me any direction on the weekends. After some time, I decide to extend it to include my weekend tasks:

Saturday: Cut the lawn
Sunday: Call Mom.
The changes I have to make are pretty straightforward:

Change the switch statement to include cases for Saturday and Sunday.
Write new methods to handle the task displays: showSaturdayTasks() and showSundayTasks().
There's an object oriented guideline that I've found to be pretty consistently valid: if you're asking a lot of questions about the nature of your objects in the operational parts of your application, you probably aren't taking an object oriented approach to the problem. In other words, if you see a lot of if, switch, and instanceof statements when you're dealing with your objects, there's a good chance you have some room to improve your design.

So I decide I'm not going to modify my application as I've indicated above. First, I'm going to redesign it with what I consider a better approach:

I'm going to replace the switch statement with a static method in a class that will function as a factory.
I'm going to replace the show[Day]Tasks() methods with individual classes that extend a new base class called DayTasks. (A whole class for each day of the week? And it just prints out a task list? Yeah, probably overkill for this example, but bear with me; it makes sense.)
So I will replace MainProcedural.java with these classes:

The central class as shown in Listing 2: MainPatterns.java
The factory class, as shown in Listing 3: DayTaskFactory.java, that will return us a specific class depending on what day of the week it is.
An abstract base class as shown in Listing 4: DayTasks.java. The core class will deal with objects of this type. We could have just as easily made DayTasks.java an interface.
Five other classes, as shown in Listing 5, that each extend DayTasks.java and manage the specific message for the day of the week.

When I run this application, my output is identical to my procedural design. That's not a surprise. We don't employ a pattern-based approach for the application end-user's benefit (not directly, anyway).

So what can I accomplish with design patterns that can't be done with a more traditional procedural approach? If you're talking about application functionality, the answer is: probably nothing. Patterns deal with how you design your system and organize your code. The example in this article will demonstrate how the procedural approach builds a routine and the pattern approach builds a framework. This pattern approach lends itself to simpler extensibility because decision-making is handled in the factory, which itself hides (or encapsulates) the operational specifics that are handled in the DayTasks subclasses. With the procedural approach, the separation of functionality isn't organized (or separated) as well. The act of encapsulating variation is an important principle that runs throughout the object oriented paradigm and many software patterns.
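The listings are not reproduced here, so what follows is a rough sketch of how the pieces might fit together, not the author's actual code. The class names DayTasks, DayTaskFactory and MainPatterns come from the article; the method names getTasks(), showTasks() and getDayTasks(), the per-day class names, and the NoTasks fallback are all assumptions made for illustration.

```java
import java.util.Calendar;

// Base type the core class works with. It could just as well be an interface.
abstract class DayTasks {
    // Assumption: returning the text (instead of only printing it) keeps the sketch easy to check.
    abstract String getTasks();

    void showTasks() { System.out.println(getTasks()); }
}

class MondayTasks    extends DayTasks { String getTasks() { return "Move the trash to the curb"; } }
class TuesdayTasks   extends DayTasks { String getTasks() { return "Put out the recyclables"; } }
class WednesdayTasks extends DayTasks { String getTasks() { return "Tape \"South Park\""; } }
class ThursdayTasks  extends DayTasks { String getTasks() { return "Move the trash to the curb"; } }
class FridayTasks    extends DayTasks { String getTasks() { return "Buy beer"; } }

// Hypothetical fallback for days that have no class of their own yet.
class NoTasks extends DayTasks { String getTasks() { return "Nothing scheduled"; } }

// The factory is the only place that decides which subclass to create.
class DayTaskFactory {
    static DayTasks getDayTasks(int dayOfWeek) {
        switch (dayOfWeek) {
            case Calendar.MONDAY:    return new MondayTasks();
            case Calendar.TUESDAY:   return new TuesdayTasks();
            case Calendar.WEDNESDAY: return new WednesdayTasks();
            case Calendar.THURSDAY:  return new ThursdayTasks();
            case Calendar.FRIDAY:    return new FridayTasks();
            default:                 return new NoTasks();
        }
    }
}

// The core class knows only the base type, never the concrete subclasses.
class MainPatterns {
    public static void main(String[] args) {
        int today = Calendar.getInstance().get(Calendar.DAY_OF_WEEK);
        DayTasks tasks = DayTaskFactory.getDayTasks(today);
        tasks.showTasks();
    }
}
```

Under these assumptions, supporting a new day means writing one new subclass and adding one case to the factory; MainPatterns stays untouched.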

So my initial modification plans:

Change the switch statement to include cases for Saturday and Sunday.
Write new methods to handle the task displays: showSaturdayTasks() and showSundayTasks().
Are now replaced with these plans:

Create classes that print messages for Saturday and Sunday.
Modify the factory to also return these classes.
The modified factory is provided in Listing 6: DayTaskFactory.java, and new classes for Saturday and Sunday are provided in Listing 7.

The Important Part
The reader might think: "So basically you just moved the switch statement into the factory and handled the messages in separate classes instead of using specific methods. Is that such a big deal?"

Response: Yeah, it is. That's the whole idea. It may not seem like such a big deal with this trivial example (in fact it may seem like more work), but as the core application gets more complicated, and the scope of the application grows, the organization of application code is a very big deal. The second approach is more suitable for teams of developers: each developer could work on his or her DayTasks subclasses without worrying about dependencies, working on the same file concurrently, or duplicating effort. And imagine that you got your core classes to the point where they were stable, handling exceptions where necessary, logging, and performing other base functionality as well. Isn't it nice that you extended the functionality of the system at large without touching the core class? That's why this approach is considered a framework: the functionality was extended by adding a class that plugged in to the existing model - a class that wasn't even conceived by the time the core class stabilized.

What are the major points?

The approach we took here is founded on encapsulating variation and polymorphism. Each task (or set of tasks) for a day descends from a common interface (DayTasks). We only deal with this common interface (which the factory always returns), while the underlying implementation may provide varying behavior.
The purpose of a factory class is to isolate the decision-making code of your system. Of course you may need separate factories if you're dealing with a range of object types. When you are comfortable with the concepts presented in this article, you may want to read about a related pattern that builds on the factory method – the abstract factory.
We extended the functionality of our program without changing the core classes (class, in this case), and that's a good thing.
Conclusion
So that's it. I think by having a grasp of the why when it comes to patterns, you can understand what they are. As I mentioned earlier, you might have to implement an example similar to the one presented here to fully understand and appreciate the approach. Look for some application you've written that has a switch statement or a series of if/else conditions. Look at the statements that are called in each if/else block and think about the base classes and extended classes that can perform the necessary operations. You may never code an application the same old way again.

http://www.modelingstyle.info/ for UML Diagram Tips & Guidelines
http://www.itbookclub.net/ for books
http://www.bredemeyer.com/ for Architecture
http://www.softwarearchitect.biz/ also for Architecture

This is a good resource for Java 1.4 examples, organized by package: http://javaalmanac.com/