8/24/2006

Off to blog.davber.com!

I have moved my thoughts and spirit to davber does IT. So, change your links.

Thanks for visiting me here. You will encounter more "soft" content over at blog.davber.com. Just click on the proper category or tag to get to the hard core stuff.

Bye!

7/05/2006

Swig Swings Java

First I was afraid, I was petrified. Kept thinking I could never live without a proper JNI reference manual by my side. But I spent so many nights thinking how to implement those native methods right. I grew strong, I learned how to carry on without a proper wrapper generator.

But now Swig is back, from a quite silent place. I just Googled in and found that it had reached Version 1.3. I should have closed that stupid JNI reference book. If I had known for just one second that Swig would be back to help me.

After having looked at that version I have changed my mind. Swig does provide some value - or at least does not make the native bridging less dynamic than raw JNI. A simple example of Swig with Java follows.

For Java, you can specify an interface file, which could be the regular C header for the native code, such as

     // File: mathus.h
     
     extern int fac(int arg);

You then run Swig on that file via the following command

     swig -java -module Mathus mathus.h

which will create a few new files:

     
     1. Mathus.java - a class 'Mathus' with wrapper methods for all the functions specified in 'mathus.h' (i.e., only the 'fac' function...) These wrappers are very thin (and do not add much value, to be honest.)

     2. MathusJNI.java - the Java side of the native implementation, which is nothing but a signature of the function and the keyword 'native'. Note that Java client code should *not* use this class, but rather the aforementioned wrapper class.

     3. mathus_wrap.c - ah, the JNI implementation in all its g(l)ory. Read it and weep. The reason for all the obfuscation is to handle ALL imaginable compilers.

Let us create an implementation of that advanced numerical function:

     // mathus.c

     int fac(int arg)
     {
          return arg == 0 ? 1 : arg*fac(arg-1);     
     }
     
"But that prohibits those neat last call optimization tricks of the compiler, you should use an accumulating parameter!"

Sit down, you computer science freak! This is not class, this is real life, with real stacks! And do not mention negative numbers!

Ok, so now when we have these five files, whereof Swig created three, we just have to create a DLL. Since we do not want to spend money on Microsoft tools, but rather choose to give our money to the Gates Foundation in a more direct manner, we use GCC. The MinGW port, to be exact. Just install it - because all serious developers need a GCC implementation on their machine. Ok, a version of Emacs along with a decent LaTeX installation might qualify as well.

You have to type (or, better yet, put in a makefile...) the following, after setting the variable JAVA_INCLUDE to point to the include directory of your Java JDK installation (which is usually %JDK_HOME%\include):

     gcc -Wall -Wl,--kill-at -shared -o mathus.dll -I%JAVA_INCLUDE% -I%JAVA_INCLUDE%\win32 mathus_wrap.c mathus.c

The "-Wall" is for our S&M needs. Yes, Mr. Compiler, I have been a bad boy! Spank those unused variables out of me!

Now you have a nice JNI DLL. In order to use it, you just have to make sure to load it explicitly before the class with the native method (MathusJNI) is first used.

For instance:

     // File: MathusClient.java

     public class MathusClient {
          static {
               // Yep, let's load that DLL and get it over with...
               // This runs before main, so MathusJNI finds its native code.
               System.loadLibrary("mathus");
          }

          public static void main(String[] args) {
               // Let us do something really intricate here:
               int arg = Integer.parseInt(args[0]);
               int facRes = Mathus.fac(arg);
               System.out.println("fac(" + arg + ") = " + facRes);
          }
     }

Just compile it and run. And, more importantly, enjoy! Your first Java app with some cojones.
     
DISCLAIMER: it is late and I am typing all this from memory, so there might be typos, although I doubt it. There probably was one in that doubtful sentence...

        

10/18/2005

Casting Types In C++

These types do not match? What to do? We force one of the expressions to be of the same type as the other! Right?

Wrong.

One should avoid converting values between types, even if one knows – or assumes – that the conversion does not change the value per se.

More importantly, one should avoid C-style conversions, such as

     Foo foo = (Foo)bar;

since they do whatever it takes, including totally unsafe, and platform-specific transformations.

C++ provides four standard type casting operators. Leaving const_cast aside, which only adds or removes const and volatile qualifiers, there are three to choose from:

  1. Foo foo = static_cast<Foo>(bar);

  2. Foo foo = dynamic_cast<Foo>(bar);

  3. Foo foo = reinterpret_cast<Foo>(bar);

The first operator will behave similarly on all platforms and with all compilers, and is the preferable one. That operator only works if the types are compatible. Beware that an explicit cast usually silences the warning the compiler would otherwise issue when information might be lost, such as in

     unsigned int val = 20U;
     int n = static_cast<int>(val);
     int m = static_cast<int>(&val); // will not compile!

The second operator downcasts a generic type to a more specific type. This is often used to downcast a pointer:

     Employee *employee = GetSomeEmployee();
     Boss *bigBoss = dynamic_cast<Boss*>(employee);
     int who = dynamic_cast<int>(bigBoss); // will not compile!

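For a downcast, it will only compile if the source type is polymorphic, i.e., has at least one virtual function, and the destination is a pointer or reference to a class. Whether the object really is an instance of the subclass is checked at run time, and a failed pointer cast yields a null pointer. Needless to say, one should avoid downcasting altogether, and rather use polymorphic operations, such as the regular virtual functions of C++.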
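A minimal sketch of that run-time check, assuming C++11 or later. The two classes are hypothetical stand-ins (only the names Employee and Boss come from the snippet above), and countBosses is a helper made up for this illustration:

```cpp
#include <cassert>
#include <vector>

// Hypothetical classes; only the names Employee and Boss come from the post.
struct Employee {
    virtual ~Employee() = default;  // one virtual member makes the type polymorphic
};

struct Boss : Employee {};

// Counts the entries that really are Boss objects at run time.
int countBosses(const std::vector<Employee*>& staff) {
    int bosses = 0;
    for (Employee* employee : staff) {
        assert(employee != nullptr);  // assumed precondition: no null entries
        if (dynamic_cast<Boss*>(employee) != nullptr) {  // null pointer when the cast fails
            ++bosses;
        }
    }
    return bosses;
}
```

Without the virtual destructor in Employee, the dynamic_cast line would be rejected by the compiler, since the source type would no longer be polymorphic.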

The third operator is the ugly one, just trying to reinterpret the bits of the source as a value of the target type. Note especially that this kind of casting from a base class object to a subclass instance is not guaranteed to work, since such a downcast often requires some manipulation of the pointer. But

     int who = reinterpret_cast<int>(bigBoss); // will compile if int is wide enough to hold a pointer, but watch out!

Q: Hey, why are these casting operators so ugly? And why not just use the C-style conversion?

     Boss *bigBoss = (Boss*)employee;

A: Because the cast is potentially unsafe and unportable. It will actually try the C++-style casting operators in order, and potentially end up using reinterpret_cast, which is highly unsafe. The ugliness is there to remind you that you should not cast at all…

Never ever use C-style conversions.

Try to avoid casting types – use polymorphism instead.

Only use reinterpret_cast when you are writing compilers or abstract machines.
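To make the "use polymorphism instead" advice concrete, here is a small sketch. The classes, the bonus percentages and the bonusFor function are all invented for the illustration; the point is only that no cast appears anywhere:

```cpp
#include <cassert>

// Hypothetical hierarchy: the behavior that differs lives in a virtual function.
struct Employee {
    virtual ~Employee() = default;
    virtual int bonusPercent() const { return 5; }
};

struct Boss : Employee {
    int bonusPercent() const override { return 20; }
};

// No downcast needed: the virtual call picks the right implementation.
int bonusFor(const Employee& who, int salary) {
    assert(salary >= 0);  // assumed precondition for this sketch
    return salary * who.bonusPercent() / 100;
}
```

The caller never asks "are you a Boss?"; it just asks for the bonus and lets the object answer.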

10/08/2005

The Virtues of Laziness

One of the greatest virtues of programmers is that we are lazy. Usually this means that we do not want to spend a lot of time and/or energy on doing stuff that bores us.

We should be lazy in that common-sensical way, but we should also make our code lazy in two ways, whereof only the second one is analogous to the aforementioned human laziness, whereas the first one is usually considered its dual, “planning ahead”:

  1. Do not repeat labor.

  2. Do not do anything before you need to.

Even though these two statements seem simple and obvious, it is one thing to just read them and imagine oneself intrinsically following them by the nature of being a “programmer.” Actually, there are quite few “programmers” who follow these statements to their fullest, and reach that delicate balance of duals. A balance of expression.

Let us look further at these two seemingly simple statements.

Do not repeat labor

There are two aspects to this: logic and data.

As to not repeating logic, this means identifying a recurring pattern of labor, describing that pattern formally and, finally, using these magnificent machines we call computers to carry out the corresponding laborious tasks over and over again. This goes for all levels, from the simplistic level of actually automating manual work – this is what conventionally is called “programming” - to writing generic code, such as employing the notions and ideas of Generic Programming.

There is also a data aspect to not repeating labor and that is to not calculate the same data over and over again. This is called caching.
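As a hedged illustration of caching, here is a memoized Fibonacci function. The function and its cache are made up for this sketch and not taken from any library; each value is computed once and then looked up:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical example: each Fibonacci number is calculated only once.
std::uint64_t fib(unsigned n) {
    assert(n <= 93);  // fib(94) no longer fits in 64 bits
    static std::unordered_map<unsigned, std::uint64_t> cache;
    if (n < 2) {
        return n;
    }
    auto hit = cache.find(n);
    if (hit != cache.end()) {
        return hit->second;  // already calculated: no repeated labor
    }
    std::uint64_t result = fib(n - 1) + fib(n - 2);
    cache.emplace(n, result);
    return result;
}
```

Note that the static cache makes this sketch unsuitable for concurrent callers; a real implementation would have to decide who owns the cache.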

Do not do anything before you need to

This ranges from not trying to optimize code in advance to actually not calculating data we might not need.

Unfortunately, only non-common programming languages have intrinsic support for this kind of laziness. Those languages are almost always in the category of functional languages, such as Haskell. Those languages are called lazy – or non-strict if you are into language semantics.

Luckily, there are both libraries for more common languages that provide lazy evaluation, such as the eminent FC++ for C++, and frequently used notions that are intrinsically lazy. Consider TCP/IP streams for this latter category. The revival of asynchronous queuing is also a variant of that example. The problem is that these realizations of the “lazy pattern” are purely inter-modular and quite often even inter-process.
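Within a single module, one can get a taste of this in plain C++. The following is a minimal sketch, assuming C++11 and a default-constructible value type; Lazy is a hypothetical helper written for this post, not part of FC++ or the standard library:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: delays a computation until its result is first needed,
// and then remembers the result.
template <typename T>
class Lazy {
public:
    explicit Lazy(std::function<T()> compute) : compute_(std::move(compute)) {
        assert(compute_ != nullptr);  // an empty std::function cannot be called
    }

    const T& get() {
        if (!evaluated_) {
            value_ = compute_();  // first use: do the work now
            evaluated_ = true;
        }
        return value_;
    }

    bool evaluated() const { return evaluated_; }

private:
    std::function<T()> compute_;
    T value_{};
    bool evaluated_ = false;
};
```

A value wrapped as Lazy<int> answer([] { return 6 * 7; }) costs nothing until answer.get() is called, and the multiplication happens at most once. That covers both kinds of laziness at the same time.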


What am I saying?

Be lazy, but never ever sloppy ;-)

I will provide examples of how one can use laziness in C++ later.

10/07/2005

Always Deal With Compiler Warnings

There are many rules of proper C++ programming, ranging from style to use of constructs. Some of these rules are so “obvious” that masters in the field no longer bother to mention them.

One of these “obvious” and old rules is one of the most important ones, and, luckily, easiest to follow: to always deal with compiler warnings. Always. No exception.

What this means is to simply not have any warnings. There are two ways to get rid of warnings:

  1. Comply with the compiler, i.e., write better code.

  2. Hide the warnings – via compiler flags or pragmas.

I hope it is obvious that the first one is the best option, and one should strongly work towards that goal. There are very few exceptions, where one might consider the second option. One of them is where the compiler does not comply with the language used, such as Microsoft’s C++ compiler’s constant complaining about not caring about “throw declarations.” Another one is when you are doing intrinsically tricky stuff with bits and nibbles, such as writing a compiler.
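A small sketch of the first option. The comment shows a loop that GCC and Clang typically flag with a signed/unsigned comparison warning when run with -Wall -Wextra; the sum function itself is a made-up example of complying instead of hiding:

```cpp
#include <cassert>
#include <vector>

// Provokes a sign-compare warning, since size() returns an unsigned type:
//     for (int i = 0; i < values.size(); ++i) total += values[i];
//
// Complying with the compiler: drop the index, and the comparison with it.
int sum(const std::vector<int>& values) {
    int total = 0;
    for (int value : values) {
        total += value;
    }
    return total;
}
```

The warning-free version is also the shorter one, which is quite often how it turns out.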

In other words, you should almost always get rid of warnings by not provoking the compiler. And, you should always get rid of them.

Always.

If only life outside our software confines was this easy.
    

8/18/2005

Exceptions are part of the interaction

This blog entry discusses exceptions and their relationship to other interaction points. I here argue that the throwing and catching of exceptions should be regarded more or less as any other interaction between modules and thus conform to the chosen abstractions of those modules.

There is a constant debate about the virtues of checked exceptions. Checked exceptions mean that the potential thrower, or propagator, of the exception has to declare that potential act of throwing in advance, and that any user of the code has to either catch such exceptions or in turn declare itself as a propagator.

In the naïve use of this model, a ripple effect is seen, where either:

  1. A chain of client modules all have to declare the potential throwing of that same exception. Pass thru.

  2. One of the client modules has to handle that exception, potentially thrown from deep down the call chain.

The first alternative seems redundant and tedious. The second one is scary, since the upper-level client module might have little knowledge about the lower-level exception thrown. The typical example of the second scenario is a mapping abstraction throwing a NoSuchKeyException due to a non-existing key. That exception does not make much sense higher-up the call chain, where the module IncreaseEmployeeSalary is trying to deal with the fortunate event of an employee getting a raise.

It is quite understandable that developers, faced with these sad scenarios, tend to either silently ignore telling clients about that potential exception, where the language allows for such laziness, or use unchecked exceptions – a class of exceptions being so unpredictable that not even the implementer himself knows that it will be thrown. Unchecked exceptions often deal with resource availability issues, such as OutOfMemoryException.

In the face of having to deal with checked exceptions – after all, the serving code could have been implemented by someone else, not as lazy as the client developer – those developers would simply catch those, often lower-level, exceptions and do nothing, but perhaps log it. There is not much knowledge about the exception at that more abstract client level, so what else can one do?

Answer: One should treat exceptions with the same respect as any other interaction between modules, i.e., make it part of the API, at the same abstraction level as function calls.

This implies that the IncreaseEmployeeSalary module should not throw a NoSuchKeyException. Whether it informs its surrounding of that potential event – by declaring the throwing of that exception - or not is irrelevant to this discussion. That exception is simply at a completely wrong level of abstraction.

What should happen is that IncreaseEmployeeSalary is informed of an underlying problem, perhaps directly from that mapping abstraction, if the former happens to be a direct client of the latter. That event should in turn result in another event, possibly throwing an exception at the proper abstraction level of the former module, e.g., EmployeeNotFoundException. Note that there should not be an automatic “translation” of the lower-level to the upper-level exception, since the contract of the higher-level module might stipulate some other, non-exceptional, handling of that event.
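A sketch of what this could look like in C++. The exception names come from the discussion above, while SalaryTable and the signature of increaseEmployeeSalary are assumptions made up for the example. The translation is a deliberate decision by this one function, not an automatic mechanism:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Low-level exception of the (hypothetical) mapping abstraction.
struct NoSuchKeyException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Exception at the abstraction level of the salary module.
struct EmployeeNotFoundException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical mapping abstraction, here simply wrapping std::map.
class SalaryTable {
public:
    void set(const std::string& name, int salary) { salaries_[name] = salary; }

    int& lookup(const std::string& name) {
        auto it = salaries_.find(name);
        if (it == salaries_.end()) {
            throw NoSuchKeyException("no such key: " + name);
        }
        return it->second;
    }

private:
    std::map<std::string, int> salaries_;
};

// Clients of this function only ever see EmployeeNotFoundException.
int increaseEmployeeSalary(SalaryTable& table, const std::string& name, int raise) {
    assert(raise >= 0);  // assumed contract: a raise is never negative
    try {
        int& salary = table.lookup(name);
        salary += raise;
        return salary;
    } catch (const NoSuchKeyException&) {
        // This module chose to report the event in its own vocabulary.
        throw EmployeeNotFoundException("no employee named " + name);
    }
}
```

Another contract could just as well stipulate that an unknown employee is silently skipped; the important part is that the decision is made at this level.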

Potential Problems
  1. Frameworks have no idea what the domain-specific exceptions would be. Yes, that is true, although one could in most cases wrap such framework components in more domain-specific code, which could turn the exceptions to more adequate ones. This is harder with template-based generic constructs, which are even mixed in via inheritance in some cases. This is why most hard-core C++ developers have given up on declaring exception throwing, but simply either wait for that unhandled exception handler, which often terminates the thread, or catch anything at a high level.

  2. How can you know what exceptions are potentially thrown from deep down? In environments forcing the declaration of such acts, there are ways. The important point here is that the implementer should know, either way, following the principle of treating exceptions as any other API element. Exceptions from deep down should be considered bugs.

  3. During debugging, at least, one needs to know what INITIALLY triggered this exception. Most language environments allow for nested exceptions, where a higher-level exception can carry the initiator as a sub object.
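For the nested exceptions of the third point, C++11 and later offer std::throw_with_nested and std::rethrow_if_nested in the header <exception>. A minimal sketch follows; the functions lookupKey, findEmployee and describe are hypothetical names for this illustration:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Low-level operation that fails (always, to keep the sketch short).
void lookupKey(const std::string& key) {
    throw std::out_of_range("no such key: " + key);
}

// Higher-level operation: throws at its own abstraction level,
// but carries the initial exception along as a nested one.
void findEmployee(const std::string& name) {
    try {
        lookupKey(name);
    } catch (...) {
        std::throw_with_nested(std::runtime_error("employee not found: " + name));
    }
}

// Collects the messages from the outermost exception down to the initial cause.
std::string describe(const std::exception& e) {
    std::string text = e.what();
    try {
        std::rethrow_if_nested(e);  // does nothing if e carries no nested exception
    } catch (const std::exception& inner) {
        text += " <- " + describe(inner);
    }
    return text;
}
```

Catching the exception from findEmployee("Bob") and passing it to describe yields both messages, so the client speaks the higher-level language while the debugger still gets the whole story.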

I am not stating that one should always convert each exception at each level. It might be the case that the low-level exception is fully adequate at the higher-level interface.

The main point is that the client should always understand the meaning of the exception thrown from the serving code, talking the “language” appropriate between the two modules. The same modularity aspects as with class methods, in other words.

8/17/2005

10 Steps to Software Solution Nirvana

These steps are applicable to any kind of programming task, and in most languages. Admittedly, Steps #6 and #7 are hard to achieve in most programming languages, but we can at least try…
I know that most of us have heard of these steps in various contexts, but the point is to actually follow them, strictly. It is possible and it is my firm conviction that they will make you a better problem solver. Steps #1 to #5 are pretty easily digested for most developers, whereas Steps #6 to #10 might not be.

Furthermore, the single most important step is Step #10. So, if you cannot follow the other steps, for any, political or not, reason, that is the step you should really contemplate and then take.

1. Never Jump
Do not use goto’s - not even disguised in the form of break’s or continue’s. The only exceptions to this rule are in switch statements and if you happen to be lucky enough to implement a threaded interpretation of an abstract machine.
Instead, if you want to affect the flow of control, please use those nifty built-in flow control constructs for conditional and iterative execution. If you are lost and really want to change the flow radically, raise some kind of error or exception signal. Refrain from extra-linguistic facilities such as longjmp.

2. Only One Exit Point
Do not use multiple (normal, i.e., non-erroneous) exit points from a modularized piece of code (function, method, whatever...) This obviously means never having more than one return statement.

3. Avoid Massive If-Else
Do not use massive if/else’s to emulate a mapping. Represent that mapping instead, hashing or not. In some cases, the built-in core facilities such as switch are warranted, given the performance and small footprint, but those are exceptional cases.
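A sketch of representing the mapping instead of emulating it, using a hypothetical lookup from weekday abbreviations to numbers (the function name and the "0 means unknown" convention are assumptions for this example):

```cpp
#include <cassert>
#include <map>
#include <string>

// The mapping is data, not a cascade of if/else branches.
int dayNumber(const std::string& abbreviation) {
    static const std::map<std::string, int> days = {
        {"mon", 1}, {"tue", 2}, {"wed", 3}, {"thu", 4},
        {"fri", 5}, {"sat", 6}, {"sun", 7}
    };
    auto it = days.find(abbreviation);
    return it != days.end() ? it->second : 0;  // 0 stands for "unknown"
}
```

Adding an eighth entry means adding one line of data, and the function keeps its single exit point.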

4. Use Recursion
Yes, use recursion, rather than explicit iterations with managed state variables, i.e., watch out for excessive use of while and for. Recursion did not cease to be useful just because you left school. Do not be afraid of that stack biting you, unless you work in a very limited environment. After all, Conquer & Divide is not only a viable approach for heterogeneous (sub) problems…
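As a small, hedged example of recursion replacing a loop with managed state variables, here is exponentiation by repeated squaring. The function is made up for this post; overflow is simply ignored in this sketch:

```cpp
#include <cassert>
#include <cstdint>

// Halves the problem whenever the exponent is even, so the recursion depth
// grows with the logarithm of the exponent, not with the exponent itself.
std::uint64_t power(std::uint64_t base, unsigned exponent) {
    return exponent == 0     ? 1
         : exponent % 2 == 0 ? power(base * base, exponent / 2)
                             : base * power(base, exponent - 1);
}
```

There is no loop counter, no accumulator variable, and only one exit point.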

5. Use Generic Operations
Abstract not only “things” but also “general operations.” These operations often represent crosscutting concerns in the domain and, in lack of proper AOP tools, should be implemented via higher-order programming. Those 50 years of Structural Programming should not be completely thrown away, in other words.

6. Abstract Functions
Use higher-order functions and functions as first-class “objects.” Could be viewed as a corollary to Step #5, but is really not. This is more focused on Functional Programming thinking. Functional Programming is definitely achievable in most popular languages, such as Python, Java and C++.
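In C++11 and later this could look like the following sketch, where twice is a hypothetical higher-order function: it takes a function as its argument and returns a new function as its result.

```cpp
#include <cassert>
#include <functional>

// Takes a function and returns a new function that applies it two times.
std::function<int(int)> twice(std::function<int(int)> f) {
    return [f](int x) { return f(f(x)); };
}
```

Given a lambda such as [](int x) { return x + 3; }, the expression twice(addThree)(10) evaluates to 16. The function being passed around is treated like any other value.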

7. Avoid Variables
“What, is he crazy?” Perhaps, but it is my view that any explicit state-holding object is bad.
So, what should we use instead? Use higher-order composers. These composers are often manifested as function adapters.
Combine, or compose, constructs instead of passing information between constructs via variables.
For pure manifestations of no-variable languages, look at IFP (Illinois Functional Programming) or Super Combinators.
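A modest C++14 sketch of such a composer. The names compose, increment, square and squareOfSuccessor are invented for this example; the new function is built by combining two others, with no variable carrying an intermediate result between them:

```cpp
#include <cassert>

// Composer: builds the function "f after g" out of f and g.
template <typename F, typename G>
auto compose(F f, G g) {
    return [f, g](auto x) { return f(g(x)); };
}

int increment(int x) { return x + 1; }
int square(int x) { return x * x; }

// squareOfSuccessor(x) equals square(increment(x)).
const auto squareOfSuccessor = compose(square, increment);
```

Admittedly the parameters are still variables of a sort; C++ only lets us get so close to the pure combinator style.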

8. Avoid Index Variables
Do not increase index variables just for the fun of it. I.e., avoid simple for loops. Instead, use generic traversing constructs (foreach and such.) This is definitely a corollary of Step #7.
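For instance, summing up the lengths of some words needs no index at all. The sketch below uses std::accumulate from the standard header <numeric>; the function totalLength is a made-up example:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// The traversal is delegated to a generic construct; no index in sight.
std::size_t totalLength(const std::vector<std::string>& words) {
    return std::accumulate(
        words.begin(), words.end(), std::size_t{0},
        [](std::size_t sum, const std::string& word) { return sum + word.size(); });
}
```

A range-based for loop would do as well, but then a state variable sneaks back in.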

9. Avoid Inheritance
Let things cooperate instead of absorbing things by improper inheritance. The cooperation should preferably be set up at compile time, for efficiency and proper type checking. This avoidance of inheritance is the latest fad in the object-oriented community. Nevertheless, it is a valid step to take…
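One way to arrange such compile-time cooperation in C++ is a template, as in this sketch. Duck, Robot and announce are hypothetical; the two types share no base class, yet both cooperate with announce:

```cpp
#include <cassert>
#include <string>

// Two unrelated types; neither inherits from a common base.
struct Duck {
    std::string sound() const { return "quack"; }
};

struct Robot {
    std::string sound() const { return "beep"; }
};

// Any type with a suitable sound() member fits; a type without one
// is rejected by the compiler, not at run time.
template <typename Speaker>
std::string announce(const Speaker& speaker) {
    return speaker.sound() + "!";
}
```

The calls are resolved statically, so there is no virtual dispatch involved, and the type checking happens where it belongs.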

10. Code As You Think
Use the same words when you code as when you describe, or contemplate, the problem and its solution. If those words do not exist in your programming environment, introduce them by extending the vocabulary (often by defining…) This might sound simple, but most of us give in to an urge of steering the computer, instead of – declaratively – describing the solution. This requires the use of top-down definitions at times, where words (such as functions or types) that are not yet defined are used. Scary, initially, but powerful.

I will later supply examples of code complying and breaking these steps.