Jackson-databind and Default Typing Vulnerabilities

Today, GitHub sent out security notices to owners of projects using old jackson-databind versions (older than 2.8.11.1 and 2.9.5). These notices pertain to this issue. I have talked about its relevance before on IRC, but since it is getting more attention now, I will describe it here again.

The Problem

The “bug” comes from using the so-called default typing. This feature allows a user to deserialize subclasses (or even Object) without specifying the full possible type hierarchy. Consider this model:

interface I {
}
@Data
class A implements I {
    private int i;
}
@Data
class B implements I {
    private boolean b;
}

Now, if we wish to serialize the interface I, you usually need to specify some sort of type info. This is typically done through an annotation on the interface:

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME)
@JsonSubTypes({@JsonSubTypes.Type(value = A.class, name = "A"), @JsonSubTypes.Type(value = B.class, name = "B")})
interface I {
}

If we now serialize an object of type I, we get a result like this:

new ObjectMapper().writerFor(I.class).writeValueAsString(new A())
{"@type":"A","i":0}

Deserializing this JSON works as expected.

Default typing

This annotation-based registration is fine for small use cases, but can get cumbersome if the types are in different modules, or there are just a lot of them, or it would be bad style to reference them from the parent class. Normal Java serialization does not have this problem (it just carries the actual dynamic class name with it), so this could be a barrier for adoption of Jackson for previous Java serialization users.

The problem here is that we want people to migrate from Java serialization. There are lots of reasons, most of them compelling.

Enter default typing. With default typing, we don’t need the type info annotations at all:

new ObjectMapper().enableDefaultTyping().writerFor(I.class).writeValueAsString(new A())
["net.hawo.tv.tvui17.A",{"i":0}]

(Ignore the fact that this is now suddenly an array – this is one of the ways jackson may include type info)
The idea is simple: Include the full class name in the json, and you can simply get the proper class at runtime! Sounds good, right?

The Vulnerability

Well, turns out this is not that good of an idea. Java serialization does a very similar thing (though it still requires the named class to be serializable, which jackson doesn’t) and this has lead to what feels like a third of all serious security vulnerabilities in Java applications, ever. The problem lies with the fact that Java classpaths are often massive, and allowing any class on that classpath to be deserialized at will can be disastrous since it exposes a huge attack surface. If you can get any class on the classpath to execute code when deserialized with jackson, you have successfully achieved remote code execution.
This is exactly what happened with Jackson. Some classes that were common on user classpaths could be deserialized to execute arbitrary code. The fix for this issue is basically a blacklist of a few of these classes that could be exploited. Blacklists are not a solution, though, and since this first list, the list has been amended several times. The maintainers are playing whack-a-mole here, and in my opinion it is a waste of everyone’s time to be adding all exploitable classes to this list.

The Solution

Our experience with this same issue in the Java serialization world tells us not to deserialize untrusted data. Luckily, Jackson is much more secure than Java serialization – if you don’t use default typing. The only acceptable solution to this issue in the long run is: do not use default typing to deserialize untrusted data. Default typing is rarely necessary or even a good idea.
Unfortunately, online resources saying this are sparse. Default typing is an “easy” solution, and many people simply do not have the security awareness to see the issue with it – they will stumble over stackoverflow answers such as this one and simply enable default typing to easily serialize Object fields. The documentation of Jackson also doesn’t highlight this as much as it should.
Two alternatives to default typing exist in Jackson:

  • Normal, annotation-based typing as shown above. This allows you either to use the full name as with default typing, or even specify your own name for greater compatibility (you can later change the class name without affecting the serialized representation). This is the “standard” solution, and the appropriate one for most use cases.
  • Should you not know the possible subtypes of a class you wish to deserialize in advance, you can use the rich registerSubtypes API to dynamically add the types you desire. These types could be detected through an existing module system you are already using, or using something like SPI.

All in all, I am a bit dissatisfied with the attention this issue has gotten. The issue is a security vulnerability by design, and anyone using default typing should have been aware of it. Luckily, default typing is not on by default. I do not have statistics but I would be surprised if many people used it or knew of its existence in the first place, and so I find the attention GitHub has given this a bit over the top – the biggest thing these notifications will spread is uncertainty about Jackson, so this article was an attempt at clearing up what it’s actually all about.

The JavaChannel Podcast, vol 16

It’s only been six months, so it’s finally time for a new podcast. This one doesn’t even pretend to go over the mountains of killer content from ##java since the last podcast – it focuses on some of the more recent links, and that’s it. Well, apart from talking about the Java ecosystem a bit, especially in contrast with Python, an upstart language that’s making a lot of headway lately thanks to a giant upsurge in data science applications.

(A bit of irony: the very first paragraph in the podcast says it’s only been “four months” when it’s actually been six. Yikes.)

But there are some interesting links, and here are the ones the podcast focused on!

This was written with the new editor plugin for WordPress, called “Gutenberg.” It’s a lot like Medium.com’s editor. It’s effective for writing… unless you have any actual features you want in the text.

Javachannel’s interesting links podcast, episode 7

Welcome to the seventh ##java podcast. I’m Joseph Ottinger, dreamreal on the IRC channel, and it’s Monday, 2017 November 6. Today feels slightly less anonymous than yesterday.
This week we have a co-host, Andrew Lombardi – kinabalu on ##java – and we also offer our humblest apologies to Ms. Debbie Gibson.
This podcast covers news and interesting things from the ##java IRC channel on Freenode; if you see something interesting that’s related to Java, feel free to submit it to the channel bot, with ~submit and a URL to the interesting thing, or you can also write an article for the channel blog as well; I’m pretty sure that if it’s interesting enough to write about and post on the channel blog, it’s interesting enough to include in the podcast.

  1. Increment Development posted “Center stage: Best practices for staging environments,” an article by Alice Goldfuss that defends and describes the use of the staging environment. “Staging is where you gain confidence in your systems by consensus,” she writes – pointing out that development and testing are for testing known things (“when I do this, does that happen?”), and staging is for testing those things that you think might happen in production but can’t necessarily anticipate as part of development or explicit testing. The author points out that there’s an ongoing debate about this, with some well-known people saying “just test better!” but I’m on Alice’ side personally – staging is where you validate that all that testing didn’t let something get through before deployment to production.

  2. Chase Roberts has written “How to unit test machine learning code.” It’s an interesting article – in that it focuses on expected results for a long pipeline of operations for stuff that’s really hard to test well. Machine learning libraries tend to be black-box tested – throw an input at it, pray a bit, hope you get the expected output, suffer for a while if you don’t – and he’s trying to show a way to avoid this cycle. Short summary: testing is hard. Long summary: know what your algorithms are doing, and test every step along the way.

  3. One of the changes for Java 9’s release was unlimited strength cryptography. Well, all of you laggards on older JVMs might be getting it as well, assuming you update and/or patch – which might be questionable, depending on how far back in the revision cycle you are. If you’re still running Java 6, chances are you don’t update, ever, and this might be a scary process for you because it’s so rare. Do I sound like I’m filled with scorn? I don’t mean to be – pity, maybe, and confusion, but not scorn. (Seriously, folks: update to 8. Or 9. Something moderately current. The pain is coming; putting it off will only make it hurt worse when you run out of time.)

  4. The first of multiple DZone articles for this edition of the podcast: “Switching Java Versions on MacOS” shows you how to use the java_home command on OSX to switch between your multiple JVM installations on OSX easily. This is apparently not a perfect process according to some on ##java, but it’s always worked for me when I’ve tried it – but that’s a very small sample set, so try it yourself and see. (The context of the failure was apparently Apache Ant, and my “success” was really just kicking the tires of Java 9. I’m not saying that the failure is incorrect or user error, by any means.)

  5. Another DZone article: “The JSON-P API: A JSON Processing Primer” shows you a high level overview of JSON-P, with both an object model and a streaming model. The streaming model is more interesting; the object model is a lot like org.json, which… no. Just no. In the end, though, Jackson is probably still your best bet for JSON processing in Java.

  6. Kafka has gone to version 1.0, according to Apache. Kafka is a distributed streaming platform – one way of thinking of it is that it’s a distributed event log, where you can write events that are processed at very high volume by various clients. Every client can have its own offset into the event log, so there’s a lot of flexibility in how you use it. Like most such types of data stores, it’s not a magic bullet for … anything, really, but leveraged properly it really can provide amazing throughput. Administration is great fun; I think I’d rather chew off my own neck than manage a Kafka cluster, but … again, if you need the features, it’s a great product.

  7. Yet another DZone article: “https://dzone.com/articles/an-introduction-to-http2-support-in-java-9” shows us the new HTTP/2 client that’s being incubated in Java 9 – which means it probably won’t be fully realized until the next release of Java (which is itself the subject of another news item.) There are already HTTP/2 libraries for Java: Jetty, Netty, vert.x, OkHttp, and Firefly (among others, probably) – but this one will be part of the Java runtime itself. It looks pretty similar to some of the others already mentioned, but that’s not a bad thing; idiom is good and there probably are only so many ways you can think of building a request and issuing it.

  8. Our third entry from DZone this week: “Machine Learning Algorithms: Which One to Choose for Your Problem” Tries to provide an overview of some of the core factors involved in choosing a machine learning algorithm: supervised vs. unsupervised (or semi-supervised, or reinforced) models, along with some of the models themselves and their applications. There are some examples of problems and math, but it’s got no code whatsoever (and if it did, would probably use Python) – still, it does a good job of going over some of the models and capabilities.

  9. Oh, this seems relevant: Java 10 is coming! … maybe. In an email to the OpenJDK mailing list, Mark Reinhold has revised the version string for Java again – so we might actually get Java 10 instead of Java 18.3 for the next release. Versions are hard to get right – but I think going to a major release version scheme like this (or, rather, staying with a major version scheme) is a good idea, even if the release frequency is boosted. Of course, I also think a major version every year or two is a good thing, so now I’ll probably be complaining about the frequency, but … first world problems, I guess.

  10. Speaking of Java 10, early access builds are available. I don’t know if they have variable inference – “var“, in other words – because I haven’t even truly migrated to Java 9 yet, so I’m far from being ready to test an early access build of 10! But if that’s your stimulant of choice, the builds are there for OSX, Linux, Windows, and even Solaris on SPARC, since even that guy Mike needs to play with Java 10 every now and then.

  11. The next two entries are from DZone, too; they’re on fire. The first one is “Null Safety: Calling Java From Kotlin,” which shows the use of annotations in Java code such that Kotlin doesn’t have to pretend the Java method call can return null. It still can, of course, but in Kotlin, nullable types look and act differently than non-nullable types, so this annotation (@Nonnull(when = When.ALWAYS), if you’re interested) is really a way to suggest to Kotlin that the result is never expected to be null. Really pretty neat stuff.

  12. Lastly, DZone came through with an article about the human mind, applied to programming: “Transcending the Limitations of the Human Mind.” It’s about cognitive capacity, a subject I’ve written about myself in the past; when I wrote about it, I referred to it as “chunking” (we manage only so many chunks of information at a time) and here, it’s the same concept with different terminology. The author – Robert Brautigam – walks through some of the tricks we developers use to manage incredibly detailed deployments with limited cognitive capacity through decomposition, generalization, abstract concept management; he also discusses ways in which our development processes work against our cognitive capacity (where our processes make a given mechanism more expensive cognitively than it otherwise should be, perhaps.) Good article.

Interesting Links – 14-Feb-2017

  • From DZone: Distributed Systems Done Right: Embracing the Actor Model is a reference to a webinar (ugh, “webinar”) from Lightbend on, well, the Actor Model, a way of representing distributed services. Powerful model, even if you don’t use Actors as described.
  • TwelveMonkeys is a set of additional plug-ins and extensions for Java’s ImageIO. It includes BMP, TIFF, JPEG, PNM, and a few others.
  • From user u1dzer0: Awesome Asciidoctor: Include Partial Parts from Code Samples describes how AsciiDoctor can extract, well, partial code samples from a block of code. Given Java’s verbosity (not a bad thing, but a still thing that many people don’t like), having a way to ignore code can help focus on relevance. (Also see: Dexy.)
  • JSON is the new Data Transfer Object is a short reference to JSON – Javascript Object Notation – in Java EE. JSON is becoming one of the more popular serialization formats (and I use “becoming” sarcastically – it’s very popular already). For better or for worse, it’s time. This isn’t a long article, nor is it very deep – but it touches on something that’s become more and more important over time.
  • The Deadly Diamond Of Death In Java 9’s Module System discusses a problem Java 9 has with automatic modules, when a dependency has two names but one resolution path. (Confused? Read the article – it describes it better than I do, but at much more length.)

An Aside About Scala

A mailing list that was pretty popular back in the day recently had some activity asking about Java 8. The discussion itself was a little bit interesting, including a reference here and there to other languages… like Scala. Kirk Pepperdine wrote a post that absolutely riveted Your Humble Author, and may have actually convinced him to stop bothering with Scala except where absolutely necessary. I’d like to quote it here, since Yahoo!’s web page formats things poorly sometimes:

I personally don’t feel that Scala offers any advantages over Java. If you take away the opinions on style, I’d argue that moving to Scala actually leaves you at a slight disadvantage. First, point, the JVM support for languages other than Java was quite poor until Nashorn. Nashorn was actually 2 separate projects. One to get JS running in the JVM and the second was to refactor the JVM to isolate the APIs needed to support alternate languages. Scala hasn’t been able to take advantage of those changes as of yet and I’m not sure they are interested in doing so. Next is tool support. The Java tooling chain is and remains quite broken in 8. I fear that this situation will get slightly better in 9 but then changes there will batter the tooling china quite badly. Unless the Scala people step up to the plate (and there is no evidence of them doing so to date), the dismal state of the tooling chain in Scala will continue to get worse. I currently see very little motivation for companies to invest in the work needed to improve the tooling in Scala. Though I see that this might change, every time I’ve previously thought it might get better, something in the market changes and Scala takes a hit and things don’t get better so my track record on predicting this is admittedly very poor. But then I’m predicting improvements in this area and they’ve simply not happened.
What makes me more hopeful is that I know of a couple of very big projects running in stealth that are based on Scala. These projects are big enough that they’ve really been hurt by the weak tooling story in Scala and I imagine or can only hope that they will start to look at fixing the problems they are facing. Many of these problems are baked into the JVM so to fix them, we need to further fix the JVM’s support for alternate languages. The feeling I get coming from some JVM developers is that it’s the JVM that is past EOL. Maybe Graal will be the thing that replaced HotSpot… not sure… all I know is that the JVM is past due for a huge refactoring and simply don’t see anyone.. certainly not Oracle funding that effort.

I’ve “played”with the new features and to be honest find the Scala implementations cleaner and easier to write and understand.

To say that Scala is less verbose than Java maybe true in the small but I’ve found it not to be so true in the large. Reachability in a language is very important and deserved or not, Scala as the reputation of not being so reachable and hence not so readable. IME, almost everyone one I know struggles with the readability of Scala even those that have been using it for quite some time. This is not to say that Scala code can’t be read, it’s simply that Scala code not naturally easily readable and hence people struggle with it.
I think we had the same types of problems with Smalltalk. People resisted Smalltalk for some of the same reasons and in cases where C/C++ wasn’t so desirable we saw teams jumping to Java. In this era I think the current alternative to Java is Go. The buzz around Go is more marketing and hubris than reality but it has a buzz and level of activity that feels familiar to the buzz that we say with Java in it’s early days. Just like people kicked the tires and sniff around Smalltalk and then moved on, I get the same sense here with Scala.

Java is starting to give under its own weight of patched-on features, IMHO.

On the contrary, I think that Brian has done a wonderful job integrating Streams and Lambda’s into Java. The only thing I can complain about that makes things a wee bit difficult is this aversion to mutable state. Fine if you don’t like mutable state but that streams make the mutable/immutable decision is an overreach on Brian’s part. It disallows a valid model that we need to work in. It’s different than how everything else works in the language. Streams are inherently single threaded and thus concurrent modification of state is a concern that lies outside the internals of a Lambda expression. IOW, Java != Scala in that Scala had immutability baked into (just about) everything right from the beginning. Brian did consider these points when he designed Lambdas and he opted for his bias towards immutability.. so be it.. All I know if that Lambda’s do not feel like they are patched on to me. They feel like a natural part of the language… much better than the earlier proposals that were tabled. The work on ValueTypes and type inference is also focusing on making sure that the added features do not feel like bolted on bits and that they fit naturally into the current language. Brian’s idea was to create small changes in the language that have a huge impact on your code. While I don’t agree with all of his choices I do think he’s done a brilliant job.
Regards, Kirk