Java — A fractal of bad experiments

The title of this post is clearly a reference to the classic article “PHP: a fractal of bad design”. I’m not saying Java is as bad as that, just that it has its own problems.

Do note that this post is mostly opinion.

And I’m not saying any language is perfect, so I’m not inviting “but what about C++’s so-and-so?”.

What I mean by “bad experiments” is not that the creators of Java made bad decisions given the information they had at the time, but that with the benefit of hindsight those ideas and experiments have turned out to be bad.

Ok, one more disclaimer: in some parts here I’m not being precise. I feel like I have to say that I know that, to try to reduce the anger of Java fans upset about me critiquing their language.

Don’t identify with a language. You are not your tool.

Too much OOP

A lot of Java’s problems come from the fact that it’s too object oriented. It behaves as if everything is axiomatically an object.

No free-standing functions are allowed, so code is full of public static functions in classes with no non-static methods at all.
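A sketch of the resulting pattern, using a hypothetical utility class (the name `TextHelpers` and its method are made up for illustration). The JDK’s own `java.lang.Math`, `java.util.Arrays` and `java.util.Collections` have the same shape: a private constructor and nothing but static methods.

```java
// Hypothetical utility class: it exists only to hold a static function,
// because Java has no free-standing functions to put it in.
final class TextHelpers {
    // Private constructor so nobody instantiates a class that has no state.
    private TextHelpers() {}

    // True for null, empty, or whitespace-only strings.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Callers then write `TextHelpers.isBlank(input)`. A static import can hide the class name at the call site, but the wrapper class still has to exist.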

Even Object.class is an object, so a class can be passed around as a value, which makes for the ugliest source of runtime type-error crashes I’ve ever seen.

Nothing like waiting three hours for a pipeline to finish running, only for it to fail at a final step because of a type error, in what was supposed to be a statically typed language.
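One way this plays out, sketched with a hypothetical untyped settings store (`Settings`, `put` and `get` are invented for illustration): the `Class` token is an ordinary runtime value, so the compiler accepts any class you hand it, and a mismatch only surfaces when `Class.cast` runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical settings store where values are kept as plain Objects.
class Settings {
    private final Map<String, Object> values = new HashMap<>();

    void put(String key, Object value) {
        values.put(key, value);
    }

    // Compiles for any Class<T>. Whether the stored value really is a T is
    // only checked at runtime: Class.cast throws ClassCastException if not.
    <T> T get(String key, Class<T> type) {
        return type.cast(values.get(key));
    }
}
```

With `s.put("timeout", "30")`, the call `s.get("timeout", Integer.class)` type-checks fine and then throws `ClassCastException` at runtime, while `s.get("timeout", Object.class)` accepts anything and just pushes the problem further downstream.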

Too much heap use

The language doesn’t let you allocate objects outside the heap. Everything is just an object, and the programmer is not supposed to care where it lives.

Not only is this a problem for readers of the code, but it also makes writing garbage collectors much harder.

Java’s designers may have expected that a “sufficiently smart garbage collector” would solve this. It turned out that the garbage collector needs help from the language to do a good job.

Go does this much better. It does escape analysis on local variables, thus reducing heap use. It also lays out objects inline in its structs, so that one object with 10 (non-pointer) subobjects is just one object in memory, not 11.
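To make the difference concrete, here is a sketch with hypothetical `Point`, `Segment` and `FlatSegment` types. In Go, `struct Segment { A, B Point }` is a single allocation with both points inline. In Java each `Point` field is a reference to a separately allocated object with its own header, so one `Segment` is three objects as far as the language is concerned (the JIT may sometimes optimize a non-escaping allocation away, but you can’t ask for it).

```java
// Hypothetical types, for illustration only.
final class Point {
    final double x, y;

    Point(double x, double y) { this.x = x; this.y = y; }
}

// Two references to two more heap objects, not two inline values.
final class Segment {
    final Point a, b;

    Segment(Point a, Point b) { this.a = a; this.b = b; }

    double length() {
        return Math.hypot(b.x - a.x, b.y - a.y);
    }
}

// The usual manual workaround: flatten the sub-objects into primitive
// fields. One object, no indirection, but the Point abstraction is gone.
final class FlatSegment {
    final double ax, ay, bx, by;

    FlatSegment(double ax, double ay, double bx, double by) {
        this.ax = ax; this.ay = ay; this.bx = bx; this.by = by;
    }

    double length() {
        return Math.hypot(bx - ax, by - ay);
    }
}
```

Both compute the same thing; the difference is purely in how many objects and pointer hops it takes to get there.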

Anyone who’s ever needed to run a production service written in Java can attest to how much care and feeding the GC needs. These problems are not inherent to a GC, but ultimately come from the design of the Java language.

So it’s not that Go doesn’t have as advanced a GC as Java; it’s that it doesn’t even need one.

Another way this is a problem, as this talk gets into, is that Java optimized for completely the wrong thing. Back then, compute was expensive and memory access comparatively cheap, so why not have indirections and pointers everywhere? Now computation is super cheap and memory access is slow, and Java is trapped by its bad choices.

The speaker is not even against Java, and still he calls these memory decisions Java’s “original sin”.

When is the file opened?

A long time ago now I made a small tool in Java that would take a file, and upload it to a server.

The part that would be easiest, or so I thought, would be to simply read the file. Something like:

```java
import java.io.File;
import java.io.FileInputStream;
import java.util.Arrays;

File file = new File(filePath);
FileInputStream fileToUpload = new FileInputStream(file);
byte[] buffer = new byte[size];
int read = fileToUpload.read(buffer);
byte[] bytesRead = Arrays.copyOf(buffer, read);
```

Now, clearly this will throw an exception if the file doesn’t exist (oh, I’ll get to that, believe me). But where?

Which line throws an exception?

Honestly I don’t remember anymore, but I do remember that it wasn’t the one I first thought.

And I remember at the time showing this code to more experienced Java programmers, and they all got it wrong too.

You could call me a terrible Java programmer, and everyone I asked too. But you can’t deny that this is about as simple a question about error handling as you can get, and it says something about the language if this many people get it wrong.
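For what it’s worth, the documented behavior is that `new File(filePath)` only builds a path object and never touches the filesystem; it is the `FileInputStream` constructor that opens the file and throws `FileNotFoundException`. The snippet also hides a second trap: `read` returns -1 at end of file, and `Arrays.copyOf(buffer, -1)` throws `NegativeArraySizeException`. A minimal sketch of both points (the class and method names are hypothetical):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;

final class OpenProbe {
    // Reports which step fails when the path does not exist.
    static String probe(String filePath) throws IOException {
        File file = new File(filePath); // just a path; the disk is not touched
        try (FileInputStream in = new FileInputStream(file)) { // the open happens here
            return "opened";
        } catch (FileNotFoundException e) {
            return "FileInputStream constructor threw";
        }
    }

    // The read without the -1 bug. Note that a single read() may also
    // return fewer bytes than the file contains, hence "up to".
    static byte[] readUpTo(File file, int size) throws IOException {
        try (FileInputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[size];
            int read = in.read(buffer);
            return read < 0 ? new byte[0] : Arrays.copyOf(buffer, read);
        }
    }
}
```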

Terrible error messages

Once upon a time this issue affected C++, but GCC has gotten much better at it over the years. If Java was ever good at this, it sure isn’t now.

As with the other problems, I can see where the good intentions came from.

Someone looked at C++ error messages, specifically those involving std::string with huge basic_string<…> expansions everywhere, and thought: wouldn’t it be nice if the template expansion were relegated to an appendix?

Does it really help, though? I’ve had single-character errors produce 20 to 30 lines of this:

```
C#3 extends Foo<Pair<K#3,V#3>> declared in method <K#3,V#3,C#3>create(Multimap<K#3,V#3>,FooFactory<Pair<K#3,V#3>,C#3>)
T#2 extends Object declared in method <T#2,C#4>create(TObject<? extends Fubar<T#2>>,FooFactory<T#2,C#4>)
C#4 extends Foo<T#2> declared in method <T#2,C#4>create(TObject<? extends Fubar<T#2>>,FooFactory<T#2,C#4>)
```

How is that helpful? How did it manage to be less readable than C++ error messages from 20 years ago?

Virtual machine bytecode

In an interview, Gosling said that the great idea for a Java bytecode VM came from creating an interpreter for some Pascal p-code.

Basically they had some Pascal code that needed to run on another machine, so they decided to interpret the intermediate format, instead of recompiling.

I know that recompiling isn’t as easy as it should be. C and C++ code must not depend on the size of pointers, endianness, and various other architecture-specific things in order to be source-code portable.

C++ needs to be source code portable to be actually portable.

Java assumed that being binary portable matters. It does not.

The property of “write once, run anywhere” (WORA) does not require bytecode as the deliverable, and in any case “write once, run anywhere” does not mean “compile once, run anywhere”.

For a simple example of this, see Go. It has a cross compiler built in, so writing once and running anywhere just means writing a for-loop to build all the binaries.

WORA would have made sense if Java had won the browser extension war, but it didn’t. JavaScript clearly won. It’s over. And if JavaScript does get replaced, it won’t be by Java, but maybe by something like WebAssembly.

WORA isn’t even true. I have jar files from 20 years ago that just don’t run anymore. Others do. But it seems about as hit-and-miss as my 20-year-old C++ code, and at least the C++ code that needed a fix to work was always broken; the compiler just became pickier (e.g. the code was missing an include).

Java saw the pipeline of source→IR→machine code and decided to make the public interface not the IR, but the machine code.

This doesn’t make sense to me. Under what circumstances is it inconvenient to port a compiler backend to a platform, but not to port the JRE?

Why waste transistors in a SIM card to run Java bytecode, when it could run anything you compile for it?

Java bytecode is pretty much IR. Fine, but why are you executing your IR? Or if you’re not, why is your runtime environment including a compiler backend?

This decision doesn’t make any sense on the machines we actually ended up executing code on.

So you could do everything Gosling mentions in the interview without the drawback of not having an actual executable binary.

UTF-16

Born too late to not handle Unicode at all. Born too early to know that UTF-8 is the obviously right choice.

UTF-16 is terrible. For mostly-ASCII text it takes up twice as much space as UTF-8, yet it is not fixed-width, so it doesn’t get the benefit of constant-time counting of code points that UTF-32 has either.
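Both halves of that complaint can be seen from Java itself. This small sketch (the class name is hypothetical) measures encoded byte sizes with `getBytes`; note that since Java 9 the JVM may store Latin-1-only strings compactly in memory (JEP 254, “Compact Strings”), so this shows encoding size, not heap usage. The second half shows that `String.length()` counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Demo {
    public static void main(String[] args) {
        String ascii = "hello";
        // One byte per character in UTF-8, two in UTF-16 (big-endian, no BOM).
        System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);    // 5
        System.out.println(ascii.getBytes(StandardCharsets.UTF_16BE).length); // 10

        // U+1F600 needs a surrogate pair in UTF-16, so the "length" of a
        // one-character string is 2; counting code points means scanning.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());                             // 2
        System.out.println(emoji.codePointCount(0, emoji.length()));    // 1
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
    }
}
```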

RAM has gotten bigger, but taking up twice as much CPU cache will never not have downsides.

And of course UTF-16 means having to deal with the hell of byte order marks (BOMs).
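The BOM handling is visible in the JDK’s charsets: per the `java.nio.charset.Charset` documentation, the plain `UTF-16` charset writes a big-endian BOM when encoding and honors one when decoding, while `UTF-16BE`/`UTF-16LE` write none and decode a leading BOM as an ordinary U+FEFF character. A small sketch (class name hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BomDemo {
    public static void main(String[] args) {
        // "UTF-16" prepends FE FF (printed as signed bytes -2, -1).
        byte[] withBom = "hi".getBytes(StandardCharsets.UTF_16);
        System.out.println(Arrays.toString(withBom)); // [-2, -1, 0, 104, 0, 105]

        // Explicit-endianness variants write no BOM, so the byte order has
        // to be communicated some other way.
        byte[] le = "hi".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(Arrays.toString(le)); // [104, 0, 105, 0]

        // Decode the same bytes two ways: "UTF-16" consumes the BOM, while
        // "UTF-16BE" keeps it as an invisible U+FEFF at the start.
        System.out.println(new String(withBom, StandardCharsets.UTF_16).length());   // 2
        System.out.println(new String(withBom, StandardCharsets.UTF_16BE).length()); // 3
    }
}
```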

Fixed memory pool size

Java isn’t the only language that does this. I remember some Lisp implementation that just allocated a few gigs of RAM to work in, and relied on page faulting to death if there was no physical memory to back it.

Incidentally, this doesn’t work on OpenBSD, or on Linux if overcommit is turned off. You just can’t run any program written in that language on Linux with vm.overcommit_memory=2.

Because Java runs on a virtual machine, it needs a certain amount of memory. The JVM simply grabs this as one huge chunk and tells the OS to stay out of it.

Sure, on that level it’s similar to sbrk or mmap-ing anonymous pages and having libc allocate objects in there. But libc does that on demand. You don’t have to tell libc how big you want your heap to be. Why would you? That would be madness. That’s the computer’s job.

But if you’ve not had to deal with -Xms, -Xmx and other options in Java, then you’ve not run a real Java service.
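A quick way to see the limits the JVM fixed at startup is `java.lang.Runtime`. This sketch (class name hypothetical) prints them; running it once plainly and once as, say, `java -Xms256m -Xmx1g HeapLimits` shows the numbers following the flags rather than the program’s needs.

```java
final class HeapLimits {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        // The most the heap will ever grow to: -Xmx, or a JVM-chosen default.
        // Long.MAX_VALUE here would mean "no inherent limit".
        System.out.println("max:   " + rt.maxMemory() / mib + " MiB");
        // Memory currently committed to the heap; typically starts at -Xms.
        System.out.println("total: " + rt.totalMemory() / mib + " MiB");
        // The unused part of that committed heap.
        System.out.println("free:  " + rt.freeMemory() / mib + " MiB");
    }
}
```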

So you have to tweak the GC and the memory allocator. It plays very poorly with the OS’s memory management. Great.

And even then (per the talk referenced earlier), the compaction that taking over this much memory enables is basically just a patch for a fundamental flaw in the language: the fact that it creates fragmented memory in the first place.

Exceptions for non-exceptional cases

Java throws more exceptions than I can count. Real production environments actually graph exceptions per second.

Exceptions are expensive, and should not be thrown for flow control. Yet they are.

This is not a problem with the language, per se, but with its standard library. And the standard library sets the style for the whole language.
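The textbook case is number parsing: `Integer` offers no non-throwing parse method, so the idiomatic way to ask “is this an int?” is to call `Integer.parseInt` and catch `NumberFormatException`. A minimal sketch (the wrapper class and method name are hypothetical):

```java
import java.util.OptionalInt;

final class ParseCheck {
    // Turns the standard library's exception into an ordinary return value.
    static OptionalInt tryParseInt(String s) {
        try {
            return OptionalInt.of(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            // An exception, stack trace and all, for a completely ordinary
            // outcome: the input just wasn't a number (or was null, or overflowed).
            return OptionalInt.empty();
        }
    }
}
```

Every caller validating user input goes through the throw-and-catch path, which is exactly the kind of thing that shows up on an exceptions-per-second graph.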

C++ has exceptions. But its standard library doesn’t use them for simple errors. This has led to code generally not using exceptions for normal errors.

I say generally, because it’s not rare for C++ code to overuse exceptions. This is a case of C++ making it too easy to shoot yourself in the foot.

Go has exceptions too, in the form of panic and recover. But not only does the standard library barely use them, they’re also very crippled, so nobody else wants to use them either. Go discourages this feature by making it bad.

This has led to even less use of exceptions in Go, although the Go standard library sometimes swallows them (net/http, for example, recovers panics raised in handlers), something the C++ standard library would never do.

C++, in a way, also discourages overuse of exceptions by not having a finally keyword. That’s because finally is a code smell in a language.

C++ has RAII, so there is a natural and MUCH safer method of cleaning up, using shared code for the normal and exception case.
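For comparison, the closest Java gets is try-with-resources (added in Java 7): `close()` on an `AutoCloseable` runs on both the normal and the exceptional path. Unlike a C++ destructor, though, it only happens if the caller remembers to declare the resource in the `try (...)` header. A sketch with a hypothetical `TrackedResource` that logs its lifecycle:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource, for illustration only.
final class TrackedResource implements AutoCloseable {
    static final List<String> log = new ArrayList<>();
    private final String name;

    TrackedResource(String name) {
        this.name = name;
        log.add("open " + name);
    }

    @Override
    public void close() {
        log.add("close " + name);
    }

    static void use(boolean fail) {
        // close() runs whether the block finishes normally or throws,
        // but only because the resource is declared in the try header.
        try (TrackedResource r = new TrackedResource(fail ? "b" : "a")) {
            if (fail) throw new IllegalStateException("boom");
            log.add("work " + r.name);
        }
    }
}
```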

Go has defer, which is a poor man’s RAII. (A very poor man’s: it runs not at the end of the scope but at the end of the function, which makes no sense and causes endless ugly code and bugs.)

In all three of these languages you need to write exception-safe code (yes, even in Go, and code WILL be buggy if you don’t), but finally is just the worst possible way to handle both code paths.
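Two classic ways finally goes wrong, sketched below (class and method names are hypothetical): a `return` in finally silently discards the exception from the try block, and an exception thrown during cleanup in finally replaces the original one entirely. javac accepts both; with `-Xlint:finally` it only warns that the finally clause cannot complete normally.

```java
final class FinallyPitfall {
    // The IllegalStateException is thrown, then thrown away by the return.
    @SuppressWarnings("finally")
    static int swallow() {
        try {
            throw new IllegalStateException("lost");
        } finally {
            return 1;
        }
    }

    // The caller only ever sees "from cleanup"; "original" is gone.
    @SuppressWarnings("finally")
    static void mask() {
        try {
            throw new IllegalStateException("original");
        } finally {
            throw new RuntimeException("from cleanup");
        }
    }
}
```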

Conclusion

These are the concrete reasons I can think of right now. But I’m sure I’ll think of more eventually.

I have learned many languages over 30 years: BASIC, C, C++, Erlang, Fortran, Go, JavaScript, Pascal, Perl, PHP, Prolog, Python, a couple of assembly variants, and more. Naturally there are languages that I like less or like more. But Java is the only language that I hate.