Jackson-databind and Default Typing Vulnerabilities

Today, GitHub sent out security notices to owners of projects using old jackson-databind versions (older than 2.8.11.1 and 2.9.5). These notices pertain to this issue. I have talked about its relevance before on IRC, but since it is getting more attention now, I will describe it here again.

The Problem

The “bug” comes from using the so-called default typing. This feature allows a user to deserialize subclasses (or even Object) without specifying the full possible type hierarchy. Consider this model:

interface I {
}
@Data
class A implements I {
    private int i;
}
@Data
class B implements I {
    private boolean b;
}

Now, if we wish to serialize the interface I, you usually need to specify some sort of type info. This is typically done through an annotation on the interface:

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME)
@JsonSubTypes({@JsonSubTypes.Type(value = A.class, name = "A"), @JsonSubTypes.Type(value = B.class, name = "B")})
interface I {
}

If we now serialize an object of type I, we get a result like this:

new ObjectMapper().writerFor(I.class).writeValueAsString(new A())
{"@type":"A","i":0}

Deserializing this JSON works as expected.

Default typing

This annotation-based registration is fine for small use cases, but can get cumbersome if the types are in different modules, or there are just a lot of them, or it would be bad style to reference them from the parent class. Normal Java serialization does not have this problem (it just carries the actual dynamic class name with it), so this could be a barrier for adoption of Jackson for previous Java serialization users.

The problem here is that we want people to migrate from Java serialization. There are lots of reasons, most of them compelling.

Enter default typing. With default typing, we don’t need the type info annotations at all:

new ObjectMapper().enableDefaultTyping().writerFor(I.class).writeValueAsString(new A())
["net.hawo.tv.tvui17.A",{"i":0}]

(Ignore the fact that this is now suddenly an array – this is one of the ways jackson may include type info)
The idea is simple: Include the full class name in the json, and you can simply get the proper class at runtime! Sounds good, right?

The Vulnerability

Well, turns out this is not that good of an idea. Java serialization does a very similar thing (though it still requires the named class to be serializable, which jackson doesn’t) and this has lead to what feels like a third of all serious security vulnerabilities in Java applications, ever. The problem lies with the fact that Java classpaths are often massive, and allowing any class on that classpath to be deserialized at will can be disastrous since it exposes a huge attack surface. If you can get any class on the classpath to execute code when deserialized with jackson, you have successfully achieved remote code execution.
This is exactly what happened with Jackson. Some classes that were common on user classpaths could be deserialized to execute arbitrary code. The fix for this issue is basically a blacklist of a few of these classes that could be exploited. Blacklists are not a solution, though, and since this first list, the list has been amended several times. The maintainers are playing whack-a-mole here, and in my opinion it is a waste of everyone’s time to be adding all exploitable classes to this list.

The Solution

Our experience with this same issue in the Java serialization world tells us not to deserialize untrusted data. Luckily, Jackson is much more secure than Java serialization – if you don’t use default typing. The only acceptable solution to this issue in the long run is: do not use default typing to deserialize untrusted data. Default typing is rarely necessary or even a good idea.
Unfortunately, online resources saying this are sparse. Default typing is an “easy” solution, and many people simply do not have the security awareness to see the issue with it – they will stumble over stackoverflow answers such as this one and simply enable default typing to easily serialize Object fields. The documentation of Jackson also doesn’t highlight this as much as it should.
Two alternatives to default typing exist in Jackson:

  • Normal, annotation-based typing as shown above. This allows you either to use the full name as with default typing, or even specify your own name for greater compatibility (you can later change the class name without affecting the serialized representation). This is the “standard” solution, and the appropriate one for most use cases.
  • Should you not know the possible subtypes of a class you wish to deserialize in advance, you can use the rich registerSubtypes API to dynamically add the types you desire. These types could be detected through an existing module system you are already using, or using something like SPI.

All in all, I am a bit dissatisfied with the attention this issue has gotten. The issue is a security vulnerability by design, and anyone using default typing should have been aware of it. Luckily, default typing is not on by default. I do not have statistics but I would be surprised if many people used it or knew of its existence in the first place, and so I find the attention GitHub has given this a bit over the top – the biggest thing these notifications will spread is uncertainty about Jackson, so this article was an attempt at clearing up what it’s actually all about.

OpenJDK 11 on Fedora

Interesting in running the current version of Java (and its compiler) on your Fedora instance? It’s easy:

sudo dnf install java-11-openjdk-devel
sudo alternatives --config java # select Java 11
sudo alternatives --config javac # if necessary, select java 11

Note that this is still labeled as the “early access” version as of this writing – but the early access version is virtually identical to the released version.

The JavaChannel Podcast, Episode XVII (17)

The XVIIth podcast is live! … one wonders why the Roman Empire didn’t last much longer, doesn’t one. One also wonders how long one can refer to oneself in the whatever-person this is. (It’s… second-person? No, folks, it’s “third person.” English grammer fumbling and lesson over.)
In this podcast, we chase a few things: the discussion topic centers back around on Python vs. Java, and why one might encounter one over the other in certain situations.
We also jump down the gullets of a lot of varied topics, only some of which are Java, but most are at least sort-of relevant:

  • Expanded switch statements in java! … Stuff we can look forward to in Java 12. Java’s looking more and more like Kotlin all the time.
  • Hibernate with Kotlin – powered by Spring Boot describes, of all things, using Kotlin, Hibernate, and Spring Boot together. (The summary has a good laundry list of things to pay attention to.)
  • Another Java Enhancement Proposal: the Java Thread Sanitizer. It would provide a dynamic data race detector implementation for Java and native JNI code utilizing ThreadSanitizer in OpenJDK with interpreter and C1 support… and while it’s great that tools like this might exist, it’s fantastic that they’re apparently rarely needed if they show up twenty years after Java’s release.
  • The Left Hand of Equals is not strictly java related, but still interesting. It mostly centers around what it means for one object to be equal to another. I wonder if the channel blog should have recommended LINKS as well as books….
  • TomEE: Running with Systemd is a pretty fair walkthrough, and should be applicable to other app servers as well… that is, if appservers are still relevant in the world where microservice containers do be existin, mon.
  • https://brilliant.org/ is a kinda neat site for math and logic skills. If you use it, don’t forget to take the question surveys, because that’s how it’ll improve.
  • How to Be a Person Who Can Make a Damn Decision is the first of at least two annoying links for Andrew – this one says it’s “how to be a person who can” but actually mostly documents people being those kinds of people. We also have the resurgence of the podcast drinking game; take a shot whenever game theory is mentioned. However, the article doesn’t really have a lot of takeaways, apart from pointing out that the ability to make a decision quickly is probably a worthwhile skill to have.
  • OpenPDF 1.2.1 has been released. Joe didn’t even know about this library. No idea how useful it is; this release doesn’t look like a big one from the surface, but still: the more libraries out there, the merrier, right? (Unless they’re logging libraries.)
  • 7 Scientific Benefits of Reading Printed Books is the second annoying link for Andrew. It goes over some, uh, tenuous reasons print books are worth reading, some of which were taken exception to. Joe thought it was worth thinking about when e-books are ALL the rage for programming topics…
  • Other tangential topics:
    • https://hmijailblog.blogspot.com/2018/08/you-could-have-invented-lmax-disruptor.html I hated reading this, so I stopped.
    • https://perens.com/2018/08/22/new-intel-microcode-license-restriction-is-not-acceptable/ Apparently Intel was saying not to publish benchmarks, which is kinda gross. However, worth noting is that after the initial scraping of this article, Intel backed down and changed the police. Way to go, Bruce Perens!

The JavaChannel Podcast, vol 16

It’s only been six months, so it’s finally time for a new podcast. This one doesn’t even pretend to go over the mountains of killer content from ##java since the last podcast – it focuses on some of the more recent links, and that’s it. Well, apart from talking about the Java ecosystem a bit, especially in contrast with Python, an upstart language that’s making a lot of headway lately thanks to a giant upsurge in data science applications.


(A bit of irony: the very first paragraph in the podcast says it’s only been “four months” when it’s actually been six. Yikes.)


But there are some interesting links, and here are the ones the podcast focused on!



This was written with the new editor plugin for WordPress, called “Gutenberg.” It’s a lot like Medium.com’s editor. It’s effective for writing… unless you have any actual features you want in the text.


Gradle Properties

Gradle supports properties in builds just like Maven does, but the actual documentation and examples are harder to come by. This article will show you something that actually works.
In Maven, you typically have a dependencyManagement section that declares the dependencies with versions; I typically put the versions in a properties block so that they’re all located in a nice, handy, easy-to-manage place. See https://github.com/jottinger/ml/blob/master/pom.xml for a simple example; it’s not consistent with the property usage, but the idea should be clear. I have my versions all coalesced into an easy place to update them, and the changes propagate through the entire project.
Gradle examples rarely do anything like this. Therefore, for your edification, here’s a simple example:
gradle.properties:

aws_sdk_version=1.11.377
kotlin_version=1.2.50

build.gradle:

buildscript {
  repositories {
    mavenCentral()
  }
  dependencies {
    classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version"
  }
}
apply plugin: 'java'
apply plugin: 'kotlin'
repositories {
    mavenCentral()
    jcenter()
}
dependencies {
    compile "org.jetbrains.kotlin:kotlin-stdlib-jdk8"
    compile "com.amazonaws:aws-java-sdk:$aws_sdk_version"
}
tasks.withType(org.jetbrains.kotlin.gradle.tasks.KotlinCompile).all {
    kotlinOptions {
        jvmTarget = "1.8"
        javaParameters = true
    }
}

You can use a constraints block in the dependencies to do the same thing as dependencyManagement in Maven; that would look like this:

dependencies {
    constraints {
        implementation "com.amazonaws:aws-java-sdk:$aws_sdk_version"
    }
}

This would allow submodules (or this module) to use a simpler dependency declaration, leaving off the version, just like Maven can do:

dependencies {
    compile "org.jetbrains.kotlin:kotlin-stdlib-jdk8"
    compile "com.amazonaws:aws-java-sdk"
}

If you don’t want to use gradle.properties, that’s doable too (as pointed out by channel member matsurago) – you can put it in the buildscript block, as so:

buildscript {
  ext.aws_sdk_version="1.11.377"
}

Using multiple cores, the basics

The problem with multicore code

Having an application use more than one CPU during its execution introduces an explosion of execution paths. Instead of a simple “first Java executes the first line of this method, then, it executes the second line of this method, all the way to the end,” it becomes: “An arbitrarily chosen thread chooses an arbitrary number of instructions to execute, then the VM will make an arbitrary choice as to whether or not to make any changes to fields this thread performed visible to other threads unless you’ve done a good job on reading the Java memory model to ensure propagation of these changes.”
The former is easily understood. The latter is (largely) untestable and unfollowable, and therefore, it’s easy to write code that looks good, passes all tests, runs great on your machine… and bombs in production five days later, in a way that is utterly mystifying; no exception pointing at the problem. The problem won’t be reproducible (do the same thing, click the same buttons, feed in the same data… and now it works fine. And then tomorrow when that important customer logs in, it’ll fail again on you). Finding a bug like this is literally on the order of 100 to 1000 times harder to find, so avoiding even one such bug is ‘worth’ having 100 others.
Nevertheless, using all the cores in your CPU is:

  • Practical, in that just running it all on one can be unacceptably wasteful for performance, but more importantly
  • Inevitable, because you don’t write every line of code in your app yourself (you use libraries; the core java.* libraries at the very least), and THOSE will sometimes use threads.

So, what is a programmer to do?
You may feel like the only right answer is for the programmer to completely understand threading and learn how to write bug free code (given that tests can’t really find these bugs anyway, and even if you so happen to run into one, they are hard to reproduce and its hard to figure out which line(s) of code are faulty when you do find them). This means reading up on the entire Thread API and reading the Java Memory Model front-to-back (If thread 1 writes some data to some field, and sometime later thread 2 reads this field, it may or may not see the change. The JMM defines when Java is, and more importantly is not,  required to have one thread ‘see’ updates of another.
But that’s a fool’s errand. You cannot guarantee writing bug free code.
Instead, anytime you have a need to have your code run on multiple cores, try to push towards going big or going small; in both cases, you NEVER call ANY method on Thread (possibly you do some Thread.sleep in the ‘going big’ style), you have no need whatsoever of synchronized, no state is shared between threads, and you don’t create new threads (some framework / library does this for you).

Going Big

Write or use a framework that runs your code for you and which takes care of multithreading. For example, a web framework is such a thing: You start it, and then it runs your code (and it takes care of creating whatever threads are necessary). Generally in these cases you do not write public static void main, or you do, but all that your main does is fire up the framework while you configure it or pass to it a bunch of ‘handlers’, and then the framework does the hard work for you.
Such frameworks are really good at setting up threading. This way each handler can be in its own little world; if a thread shares no data with any other, reasoning about threads is much, much simpler. Take webservers: Any given web handler simply does not get to interact with other handlers. It’s not like you can ask the web framework for a list of WebHandler objects that you can then call methods on.
There is often still a need to have 2 handlers interact. However, you should do this interaction by using tightly controlled communication channels where the issue of how concurrent operations interact is either irrelevant, or very well specified. There are 2 usual ways to go:

  • Databases. Databases define how concurrently running operations go via transactions. So, use transactions, make your database calls, and the database takes care of it. Communications-via-database is very common in web frameworks. You can use JDBI if you want to write SQL directly, or use Hibernate if you just want to store objects persistently.
  • Message queues, such as RabbitMQ. The idea behind these is that one handler will tell the message queue framework that it is interesting in all ‘foo’ events, and the other will tell the message queue framework: Here’s a ‘foo’ event, please deliver it to all handlers that said they’d like to know about it. All communication between handlers goes via the message queue.

Editor’s note: the projects mentioned here are very far from being your only choices. JDBI and Hibernate are good, but note that there are projects like myBatis, jOOQ, and others for SQL-ish libraries, and Hibernate is only one JPA implementation among many. Likewise for messaging: RabbitMQ is an AMQP 0.9 server, but you don’t use the JMS standard to talk to RabbitMQ; you do use JMS when using ActiveMQ, HornetQ, or any other of … quite a few messaging platforms. This is not saying that RabbitMQ or Hibernate, et al, are bad – your Editor uses them daily – but remember that your choices are not limited.

Going Small

Reduce the code that needs to run multithreaded to the smallest possible thing it can be, and then write a single stream/collection based operation that uses threading to apply this operation to a great many inputs concurrently. The fork/join framework is the usual go-to here.
Imagine you are writing a bitcoin miner. This operation can be described as follows:
GIVEN: A list of a few million randomly generated codes to try, and exactly 1 input block.
TASK: Write the hash into that 1 input block, hash it, and see if the hash ends up having the appropriate amount of 0s at the very end.
You can do this by going small: Write a trivial method (it won’t be larger than half a screen’s worth, common in this ‘go small’ model) which injects the code into the block, hashes it, and returns an empty string if the hash is not suitable, and if you hit the jackpot, returns the block. Then tell the framework you only want the non-empty returned values and voila, you’ve written a parallel bitcoin miner without ever having to touch java.lang.Thread.

Libraries

In practice, especially for the ‘going big’ route, you may need to have a cache or some such that should be shared between whatever threads your web framework is making for you, but try to find libraries for this, too.1 There are some tricks to interacting with such libraries. Generally, you have to go ‘atomic’. For example, you can use the various collections in the java.util.concurrent package, but, the only guarantees it can make is that a single method call does the right thing. It cannot guarantee that a series of calls does the right thing. So, don’t do this:

if (concurrentMap.containsKey(myKey)) {
    String v = concurrentMap.get(myKey);
    operateOn(v);
}

Instead, do this:

String v = concurrentMap.get(myKey);
if (v != null) {
    operateOn(v);
}

The bad example can cause a problem: What if, in between the call concurrentMap.containsKey and the call to concurrentMap.get, some other thread removes the entry from the map? You’ll now call operateOn on a null value, not what was intended. Problematically, if this can happen, it will happen, but only very very rarely. Tests are unlikely to catch this problem.
Don’t do this:

String v = myCache.get(myKey);
if (v == null) {
    v = doExpensiveCalculationOfValue(myKey);
    myCache.put(myKey, v);
}

Instead, do this:

myCache.computeIfAbsent(myKey, k -> doExpensiveCalculationOfValue(k));

The reason to use computeIfAbsent here is because the first snippet can lead to running doExpensiveCalculationOfValue more than once. In the first snippet, imagine two (or more) threads get to the code with the same key about the same time. They’ll both find that there is no associated value (yet), so they both call doExpensibeCalculationOfValue and then they both set it. With computeIfAbsent, provided you use a properly concurrent map, such as java.util.concurrent.ConcurrentHashMap for example, doExpensiveCalculationOfValue is only ever called once, guaranteed.

Lessons

  • Use frameworks, either large: Web frameworks, or small: fork/join.
  • Thread-to-thread communication uses abstractions like message queues or databases. Don’t share state, don’t touch any fields from more than one thread.
  • If you must have direct thread-to-thread interop, use libraries with collections types that are designed for this, such as java.util.concurrent and guava‘s CacheBuilder stuff. When interacting with these collections, know that consecutive method calls to them have no guarantees of internal consistency, so reduce it to one call. Be aware of methods like java.util.Map‘s computeIfAbsent to enable this.

Footnotes

1 Editor’s note: caches are nontrivial in and of themselves. Go for “transactional caches” if you can – or just trade the speed that a cache would give you for the reliability that using a system of record gives you. Figure out what you need and do what fulfills that, before thinking that a cache is magic performance sauce – and note that the whole point of this article is that “magic performance sauce” usually isn’t what it says it is.

Article on java.time

Channel denizen (yes, that’s what we call people in the ##java channel, “denizens”) yawkat has published “An introduction to java.time,” an article to try to clear up some of the confusion around the time API in Java: things like LocalTime, OffsetTime, and the like. If you’re still using Date, this article is for you. Includes handy ways to convert between the time types, and is likely to grow further at need.

JavaChannel’s Interesting Links podcast, episode 15

Welcome to the fifteenth ##java podcast. Your hosts are Joseph Ottinger, dreamreal on the IRC channel, and Andrew Lombardi (kinabalu on IRC) from Mystic Coders.
As always, this podcast is basically interesting content pulled from various sources, and funneled through the ##java IRC channel on freenode. You can find the show notes at the channel’s website, at javachannel.org; you can find all of the podcasts using the tag (or even “category”) “podcast”, and each podcast is tagged with its own identifier, too, so you can find this one by searching for the tag “podcast-15”.
Discussion today centered around specifications: any good ones out there? What would good specifications look like?

  1. A few podcasts ago (podcast 11) we mentioned “resilience4j” – well, there’s also retry4j on github. It’s another library that offers retry semantics. retry4j’s semantics are really clean; maybe it’s me (dreamreal), but it seems more in line with what I’d expect a retry library to look like.

  2. From Youtube, we have a video showing how to use Graal in the JVM. Graal exposes JVMTI internals – which means that theoretically you could write a replacement for the just-in-time compiler, and change some fundamental ways of how Java works. Twitter is using Graal today, and the presenter says that it works fantastically; it’s also slotted to become standard in the next few revisions of Java, so we’ll have it probably next month or so.

  3. From the channel blog itself, user yawkat posted a way to cache HTTP results regardless of response with OkHttp. Why is this important? Well, if you’re testing a crawler, it’s handy to not slam the target server – even on results that should give things like “not found.” It’s written in Kotlin; the Java code would be trivial (and maybe we should post it as well just for the sake of example) but it’s still pretty handy.

  4. There’s also a Java Enhancement Proposal to allow launching single-source-code files like “java File.java”, and it automatically invokes javac; also with support for shebang, so we could be looking at Java scripting before too long. Some people already do this with scripts; they have something that pipes Java source into a temporary file, compiles it, and runs it, but this would be more standardized.

  5. We’ve seen a few posts about Hibernate on the channel lately, including some about projections (where an object isn’t represented as such in the data model but is constructed on-the-fly). Vlad Mihalcea actually had an appropriate post about caching such projections.

  6. We also have seen references lately to Project Sumatra – GPU enablement for the JVM, which seems abandoned – and then there’s this from the valhalla developer’s list: there IS a GPU exploitation thing being worked on. Who knows, we might see highly-GPU-enabled JVMs or libraries innate for Java before long.

  7. DZone has an article called “Java Phasers Made Simple” – basically a set of limited-scope locks. It’s really well done although quite long; hard for it to go into detail without being fairly long, though, so that’s okay. Worthwhile read.

Caching All OkHttp Responses

When (integration) testing scrapers you need to strike a balance between “get as much input data as possible to cover parsing edge-cases” and “don’t DoS the backend, please”. Caching can help with that, and gives you a nice performance boost during testing on top.
The OkHttp http client library contains a cache built-in but by default it follows HTTP server caching headers. It also allows you to set a particular request to always use the cache, and error if the request isn’t present yet, but that is too harsh (we do want to fetch the first time after all). Turns out you can do both:

val cachedHttpClient = OkHttpClient.Builder()
        .cache(Cache(File(".download-cache/okhttp"), 512 * 1024 * 1024))
        .addInterceptor { chain ->
            chain.proceed(
                    chain.request().newBuilder()
                            .cacheControl(CacheControl.Builder()
                                    .maxStale(Integer.MAX_VALUE, TimeUnit.SECONDS)
                                    .build())
                            .build()
            )
        }
        .addNetworkInterceptor { chain ->
            log.info("Fetching {}", chain.request().toString())
            chain.proceed(chain.request())
        }
        .build()!!

This is kotlin code, but you get the point.

JavaChannel’s Interesting Links podcast, episode 14

Welcome to the fourteenth ##java podcast. Your hosts, as usual, are Joseph Ottinger, dreamreal on the IRC channel, and Andrew Lombardi from Mystic Coders (kinabalu on the channel), and it’s Wednesday, February 7, 2018.
As always, this podcast is basically interesting content pulled from various sources, and funneled through the ##java IRC channel on freenode. You can find the show notes at the channel’s website, at javachannel.org; you can find all of the podcasts using the tag (or even “category”) “podcast”, and each podcast is tagged with its own identifier, too, so you can find this one by searching for the tag “podcast-14”.

  1. javan-warty-pig is a fuzzer for Java. Basically, a fuzzer generates lots of potential inputs for a test; for example, if you were going to write a test to parse a number, well, you’d generate inputs like empty text, or “this is a test” or various numbers, and you’d expect that your tests would validate errors or demonstrate compliance to number conversion (this is a way of saying “it would parse the numbers.”) A Fuzzer generates this sort of thing largely randomly, and is a good way of really stressing the inputs for the methods; the fuzzer has no regard for boundary conditions, so it’s usually a good way of making sure you’ve covered cases. The question, therefore, becomes: would you use a fuzzer, or HAVE you used a fuzzer? Do you even see the applicability of such a tool? There’s no doubt that it can be useful, but potential doesn’t mean that the potential will be leveraged.

  2. Simon Levermann, mentioned last week for having released pwhash, wrote up an article for the channel blog detailing its use and reason for existing. Thank you, Simon!

  3. Scala is in a complex fight to overthrow Java, from DZone. Is the author willing to share the drugs they’re on? Scala’s been getting a ton of public notice lately – it’s like the Scala advocates finally figured out that everything Scala brought to the table, Kotlin does better, and with far less toxicity. If kotlin wanted to take aim at Scala, there’d be no contest – Kotlin would win immediately, unless “used in Spark and Kafka” were among the criteria for deciding a winner. It’s a fair criterion, though, honestly; Spark and Kafka are in fairly wide deployment. But Scala is incidental for them, and chances are that their developers would really rather have used something a lot more kind to them, like Kotlin, rather than Scala.

  4. More of the joys of having a super-rapid release cycle in Java: according to a post on the openjdk mailing list, bugs marked as critical are basically being ignored because the java 9 project is being shuttered. It’s apparently on to Java 10. This is going to take some getting used to. It’s good to have the new features, I guess, after wondering for years if Java would get things like lambdas, multiline strings, and so forth, but the rapid abandonment of releases before we even have a chance to see widespread adoption of the runtimes is… strange.

  5. James Ward posted Open Sourcing “Get You a License”, about a tool that allows you to pull up licenses for an entire github organization – and issue pull requests that automatically add a license for the various projects that need one. Brilliant idea. Laziness is the brother of invention, that’s what Uncle Grandpa always said.