Interesting Links – 2017/May/1

If you have a teammate who suffers from mental illness, I’d encourage you to champion (heart and balance) within your team. Be an ally. Help create a safe and inclusive space. Not only because it’s the right thing to do, but because being around people different from you broadens your horizons and builds empathy. And empathy for others makes you better at just about every job.

LinkedList vs. ArrayList

Recently, The Java Programmer published “Difference between ArrayList and LinkedList in Java“, asserting different cases of when to prefer one List implementation over another. It’s a link full of conventional wisdom, and while it has some good information, it’s also wrong.
Here’s the channel’s factoid on LinkedList, as of 2017/Apr/25 (prior to it having been changed to point to the page you’re reading):

LinkedList is almost always slower than ArrayList/ArrayDeque. It might be best to use ArrayList/ArrayDeque to start. But if you have an actual performance problem, you can compare the two for your specific case. Otherwise, just use ArrayDeque. (It too has O(1) performance for insert-at-beginning, and unlike LinkedList is actually fast).

Implementation

The “Difference” page says that ArrayList uses an internal array to represent stored objects, while a LinkedList uses a doubly-linked list internally. In this, it’s correct.

Searching

The next point of comparison is entitled “Searching.” Here’s what it says about ArrayList:

An elements can be retrieved or accessed in O(1) time. ArrayList is faster because it uses array data structure and hence index based system is used to access elements.

And about LinkedList:

An element can be accessed in O(N) time. LinkedList is slower because it uses doubly linked list and for accessing an element it iterates from start or end (whichever is closer).

First off, in the interest of honesty, the last bit of information – that it iterates from either forward or backward, depending on which is closer – was a surprise to me (although it’s perfectly logical.) It’s also accurate.
However, this isn’t searching – it’s element access. Access and searching are different things; it’s get(42) vs. list.contains(matchingObject). This is important; contains() will have O(N) time for either List implementation, whereas ArrayList will have O(1) for get(), and LinkedList will have O(N) for get().
The thing is: LinkedList is not only O(N) for get() but the nature of the internal implementation makes it slower than one might think, because of cache locality. In an ArrayList, the element references are stored sequentially in memory; a LinkedList potentially splatters the references all over the heap. (This is not entirely likely, but it’s possible.) Thus, a LinkedList not only has O(N) performance for indexed access (where the ArrayList has O(1)), but the O(N) is worse than one might expect.
For accessing elements, there’s no situation apart from accessing the first or last element where a LinkedList competes with an ArrayList‘s speed.

Note that programming hates absolutes. It’s fully possible to create situations in which that last claim is not entirely true… but they’re rare and unlikely.

Insertion

About ArrayList:

Normally the insertion time is O(1). But in case array is full then array is resized by copying elements to new array, which makes insertion time O(N).

And about LinkedList:

Insertion of an element in LinkedList is faster, it takes O(1) time.

Incomplete data is provided here. For one thing, the author conflates mutation with “insertion.” There are two different kinds of additive list mutations (where data is added to a List): insertion (meaning that elements are prepended to other elements) and addition (where an element “follows” the last element in the list prior to mutation.)
Where the elements go as part of the additive mutation is really important, and it’s also important to note that the claim of LinkedList‘s O(1) time is very much a single type of insertion.
For an ArrayList, a list append (i.e., calling add(), which adds the provided element at the end of the list) has O(N) performance, but typically has O(1) performance. The O(N) complexity is because the internal array size might be exceeded by the addition of the element, in which case a new internal array is allocated, and the array references are copied over to the new internal array, and then the new element is appended. Thus, in the worst case, it is O(N), but this is misleading because:

  1. It’s fairly rare (the list resizes with a formula of currentSize*1.5 (the actual line is int newCapacity = oldCapacity + (oldCapacity >> 1);, if you’re interested.)
  2. Java’s blits are very, very, very fast; Java uses the blit mechanism constantly, and let’s be real, modern CPUs are really good at this anyway.

So is it actually O(N), then? … given what conventional Big O notation means, yes, it is; it’s just that O(N) is a lot less expensive than one might think in this case. And typically, the add() will be O(1) in actuality.
But what about LinkedList?
Insertion at the beginning or the end of the list is, in fact, O(1). This is even the common case (add(Object) calls linkLast(Object) by default.) However, if positional notation is used at all (i.e., add(int, Object) you degrade to O(N), because the LinkedList has to find the position at which the new element has to be inserted – and finding that position is O(N).
So what’s the conclusion? ArrayList does in fact have O(N) performance for adds, but it’s close enough to O(1) in real world conditions that we can colloquially speak of it being likely as fast as LinkedList‘s similar case… and if we add any kind of positional insertion, ArrayList‘s degradation under the worst case is far better than LinkedList‘s degradation.
As usual, we can create situations under which this is not the case (for example, if we always insert after the first position in a LinkedList, which will “search” one element and only that element) but these are fairly rare.

Deletion

Let’s see about ArrayList:

Deletion of an element is slower as all the elements have to be shifted to fill the space created by deleted element.

And LinkedList:

Deletion of an element is faster in LinkedList because no elements shifting are required.

Well, um, no.
Here, the author is confusing complexity – big O notation – with speed, for ArrayList. (He or she is wrong about LinkedList in any event.)
For an ArrayList, deleting an element by index is indeed O(N) because the JVM does have to blit the elements after the removed element (it has to shift them all in the internal array.)
However, for a LinkedList, it also has to find the elements around the element to remove – and if you recall, accessing a specific element for a LinkedList is an O(N) complexity itself. Even though the removal is simple (it’s simply relinking the nodes before and after the removed element), actually finding those nodes is O(N), thus deletion is O(N) for LinkedList – and slower than ArrayList, to boot.

Memory and Initial Capacity

The author says that an ArrayList has a default internal capacity of 10. This is incorrect for Java 8. See this code.
For the rest of it, the author is correct; a LinkedList is “empty” (there are no linked nodes internally); an ArrayList has capacity references in the internal list, while a LinkedList has a chain of objects, which will have higher memory consumption, but that’s not likely to matter unless your lists are huge.

The Comparison’s Conclusion

  • As search or get operation is faster in ArrayList so it is used in the scenario where more search or get operations are required.
  • As insertion or deletion of an element is faster in LinkedList so it is used in the scenario where more insertion and deletion operations are required.

The thing that the conclusion is missing is that “insertion and deletion” operations in LinkedList typically involve searches and gets – which are terrible for LinkedList – and therefore for even the operations that LinkedList is potentially better, ArrayList will typically perform better.
The factoid stands. If you need a List, use ArrayList first, then ArrayDeque if your access patterns demand… use LinkedList only after exhausting other possibilities.

Interesting Links, 31 Oct 2016 (and 25 Dec 1038)

  • kumuluzEE is “A lightweight framework for developing microservices using standard Java EE technologies and migrating existing Java EE applications to microservices.” It’s not immediately clear from the web page what makes it ideal for migrating to microservices, but there it is.
  • Don’t Even use COUNT(*) For Primary Key Existence Checks” has some valuable performance advice, and considers multiple DB implementations while offering it.

Now for some Scala stuff:

  • Quo Vadis, Scala?” has some considerations of the value of Scala on the larger JVM ecosystem. Turns out that Scala has more impact than some think it does, and less impact than others think. Go figure. 🙂
  • Sbt makes me want to give up Scala” has some well-considered criticisms of the Simple Build Tool, SBT, Scala’s primary delivery mechanism. Well-written and justified; if a build tool is the primary contact point someone has with a language, and that build tool is slow and uncomfortable to use, well… that impression is going to stick with people. SBT is really powerful… and it sucks. This post walks through a lot of why.
  • From user tetero, apparently inspired by the SBT link, “Scala Collections: Why Not?” is a video presentation by Paul Phillips that feels like it’s a rant about why he’s leaving Scala: it doesn’t do things the way he wants. Sure, what he wants would be good (simplicity, accuracy, honesty, clarity) and it highlights what people see as the worst in Scala: the community’s hardheadedness about how good Scala really is. It turns out to be an anti-Scala rant, in the end, but the truth is that Scala is better than he makes it sound and just as bad as he makes it sound. Use Scala if it works for you, and don’t, if it doesn’t. (The user in question said that this video was part of what led him to Clojure. There’s nothing wrong with Clojure, but like most things, it’s not perfect and Scala’s type system oddities seem like a poor sole motivator.)