External Program Invocation in Java

Users who wish to shell out from a Java program may be tempted to use Runtime.exec(), which yields a Process. They should probably use zt-exec instead. However, for those who think that using a separate library for something so “simple” is overkill, please read on.

General Notes

Java does not invoke a shell – Java uses execve(). This means that variables like ~, %HOME%, and “$JAVA_HOME” will not be expanded, nor will you be able to use shell built-ins such as cd or while. However, Java can invoke a bash or cmd shell, which can then be fed input and output. For more information on shelling out to an actual shell, please see “Invoking A Shell.”

Invocation with Runtime

Runtime is the most accessible way to execute an external process. Processes are started with Runtime.exec(String), Runtime.exec(String[]), and variants of each that accept an environment variable array and an optional directory. The only method that should be used from Runtime.exec() is the String[] variant. Each element of the String[] is a separate argument, with the 0th being the command to be run. For the functionality of an environment variable array and a directory, please see “Invocation with ProcessBuilder”.
No variant of Runtime.exec(String) should be used, because Runtime.exec(String) splits the incoming string on whitespace and knows nothing about quoting. This may not sound bad, as the shell also splits on whitespace. Take the following:

sed 's/a b/c d/gi' "My Documents/foo.txt"

A shell would interpret this as ["sed", "s/a b/c d/gi", "My Documents/foo.txt"]
Runtime.exec(String) would interpret this as: ["sed", "'s/a", "b/c", "d/gi'", "\"My", "Documents/foo.txt\""]
Always use the Runtime.exec(String[]) variant.
After performing the invocation with Runtime, please see “Using Process”.
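To see the splitting problem without launching anything, here is a small sketch. The Runtime.exec(String) javadoc says the command is broken into tokens with new StringTokenizer(command), so the sketch does the same thing by hand; the class and method names (ExecTokenizing, naiveSplit) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ExecTokenizing {
    // Splits a command line the way Runtime.exec(String) is documented to:
    // plain whitespace tokenizing, with no understanding of quotes.
    static List<String> naiveSplit(String command) {
        List<String> parts = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(command);
        while (tokenizer.hasMoreTokens()) {
            parts.add(tokenizer.nextToken());
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "sed 's/a b/c d/gi' \"My Documents/foo.txt\"";
        // prints [sed, 's/a, b/c, d/gi', "My, Documents/foo.txt"]
        System.out.println(naiveSplit(line));

        // What you actually want to hand to Runtime.exec(String[]):
        String[] explicit = {"sed", "s/a b/c d/gi", "My Documents/foo.txt"};
        System.out.println(explicit.length + " arguments, quoting not needed");
    }
}
```

Six tokens come out where the shell would have produced three, and the quote characters are passed through to sed literally.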

Invocation with ProcessBuilder

ProcessBuilder takes a String... or a List<String> as its command argument. Each element is a separate argument, with the 0th being the command to be run. To manipulate the environment, call ProcessBuilder.environment(), which returns a mutable Map<String,String> of environment variables. This map is meant to be modified in place (it is not a read-only view, whatever the javadoc summary’s “string map view” wording might suggest). Setting the working directory can be done with a .directory() call.
Other common operations are sending standard output to a file, with redirectOutput(File), and merging standard error into standard output, with redirectErrorStream(true). Starting the process can be done by calling .start().
After performing this invocation, please see “Using Process”.
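As a rough sketch of those calls together: the class name BuilderConfig and the EXAMPLE_MODE variable are invented for the example, and the command simply re-runs the current JVM with -version so that no particular external program is assumed.

```java
import java.io.File;
import java.util.Map;

class BuilderConfig {
    // Builds (but does not start) a ProcessBuilder using the calls described above.
    static ProcessBuilder configure(File workingDirectory) {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        ProcessBuilder pb = new ProcessBuilder(java, "-version");

        // environment() is the builder's live map: changes here reach the child
        Map<String, String> env = pb.environment();
        env.put("EXAMPLE_MODE", "demo"); // hypothetical variable, for illustration only

        pb.directory(workingDirectory);  // working directory of the child
        pb.redirectErrorStream(true);    // stderr is merged into stdout
        return pb;
    }

    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = configure(new File(System.getProperty("user.dir")));
        // send the merged output straight to this JVM's stdout
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        int exitCode = pb.start().waitFor();
        System.out.println("exit code: " + exitCode);
    }
}
```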

Invoking a Shell

This must be done by actually starting the shell process; the shell will then interpret variables as normal. With ProcessBuilder, the command list starts like this:

List<String> parameters = new ArrayList<>();
if(System.getProperty("os.name").toLowerCase().contains("windows")) {
    parameters.add("cmd");
    parameters.add("/C");
} else {
    parameters.add("/bin/bash");
    parameters.add("-c");
}

This will start the OS-specific shell: cmd.exe on Windows, bash on other systems. Other shells may be substituted on Linux by changing “bash” to the appropriate shell in the above. Those familiar with cmd may want to switch /C to /X – this is improper unless streams are redirected, with ProcessBuilder.inheritIO().
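The snippet above stops before the command itself is added. A minimal sketch of the remaining steps might look like this; ShellCommand and forOs are names made up here, and the echo command is only a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ShellCommand {
    // Wraps a script line in the OS-specific shell invocation.
    static List<String> forOs(String osName, String script) {
        List<String> parameters = new ArrayList<>();
        if (osName.toLowerCase(Locale.ROOT).contains("windows")) {
            parameters.add("cmd");
            parameters.add("/C");
        } else {
            parameters.add("/bin/bash");
            parameters.add("-c");
        }
        // the whole script is ONE argument; the shell does the splitting and expansion
        parameters.add(script);
        return parameters;
    }

    public static void main(String[] args) throws Exception {
        List<String> command = forOs(System.getProperty("os.name"), "echo Hello from the shell");
        Process p = new ProcessBuilder(command)
                .inheritIO() // the child shares this JVM's stdin, stdout and stderr
                .start();
        System.out.println("exit code: " + p.waitFor());
    }
}
```

Passing the script as a single list element matters: splitting it yourself would bring back the quoting problems described under “Invocation with Runtime”.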

Note that Windows’ PowerShell apparently introduces some difficulties. Apparently how it is invoked is different enough that these instructions aren’t enough.

Using Process

After a Process is obtained with either of the above methods, the process has been started and will run until its natural death. However, there are several common pitfalls.
You must cull the process with Process.waitFor(), even if you do not want all input and output from the process. Otherwise, the process will exist as a zombie until the parent process (the JVM) exits.
You do not have to read the output from the process, but if you read either stdout or stderr, you must read both. On some systems, if there is output in stderr, stdout is blocked until this output is consumed. This must be done in a multithreaded fashion if these streams are not joined.
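If you want to terminate the process before it finishes, you can call Process.destroy(). Whether this ends the process gracefully or forcibly is implementation-dependent (typical Unix JVMs send TERM, while Windows terminates the process outright), so use it cautiously; since Java 8, Process.destroyForcibly() explicitly requests a KILL-style termination.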
Putting it all together, the following is a sample of shelling out to an external process:

import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Callable;
public class Example {
    public static void main(String[] args) throws Exception {
        ExecutorService service = Executors.newSingleThreadExecutor();
        Process p = Runtime.getRuntime().exec(new String[]{"echo", "Hello world"});
        new Thread(new ErrorConsumer(p.getErrorStream())).start();
        //java is the center of the universe: getInputStream() is the child's stdout
        Future<String> output = service.submit(new OutputConsumer(p.getInputStream()));
        p.waitFor();
        System.out.println(output.get());
        //without this, the executor's non-daemon worker thread keeps the JVM alive
        service.shutdown();
    }
    private static class ErrorConsumer implements Runnable {
        private final InputStream toDiscard;
        public ErrorConsumer(final InputStream toDiscard) {
            this.toDiscard = toDiscard;
        }
        @Override
        public void run() {
            byte[] buf = new byte[1024];
            try {
                while (toDiscard.read(buf) != -1) ;
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    private static class OutputConsumer implements Callable<String> {
        private final InputStream toRead;
        public OutputConsumer(final InputStream toRead) {
            this.toRead = toRead;
        }
        @Override
        public String call() throws Exception {
            StringBuilder sb = new StringBuilder();
            byte[] buf = new byte[1024];
            int read;
            while ((read = toRead.read(buf)) != -1) {
                sb.append(new String(buf, 0, read));
            }
            return sb.toString();
        }
    }
}

Editor’s note: this code may or may not work as expected. It runs, but may block on some operations – consider yourself warned, prepare to hit ^C, and use it as a point of emphasis on why you should be using zt-exec, shown immediately below…

Or, with zt-exec:

import org.zeroturnaround.exec.ProcessExecutor;
import java.io.IOException;
import java.util.concurrent.TimeoutException;
public class Example2 {
    public static void main(String[] args)
        throws InterruptedException, TimeoutException, IOException {
        String output = new ProcessExecutor()
                .command("echo", "Hello World")
                .readOutput(true)
                .execute()
                .outputUTF8();
        System.out.println(output);
    }
}
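For those staying with the JDK alone, here is a sketch of a simpler shape than the two-consumer example: merge stderr into stdout so that there is only one stream to drain, and bound the final wait with the Java 8 waitFor(long, TimeUnit) overload. MergedOutput and run are made-up names, and the sketch re-runs the current JVM with -version purely so that it does not depend on echo being available.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.concurrent.TimeUnit;

class MergedOutput {
    // Runs a command with stderr merged into stdout and returns everything it printed.
    static String run(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true) // one stream to drain, so no second thread
                    .start();
            p.getOutputStream().close();       // the child sees EOF on its stdin
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            try (InputStream in = p.getInputStream()) {
                byte[] buf = new byte[1024];
                int read;
                // blocks until the child closes its output; this loop has no timeout
                while ((read = in.read(buf)) != -1) {
                    collected.write(buf, 0, read);
                }
            }
            // bounded wait, then a forced kill if the child still has not exited
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly();
            }
            // assumes the child writes in the platform default charset
            return new String(collected.toByteArray(), Charset.defaultCharset());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the child", e);
        }
    }

    public static void main(String[] args) {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        System.out.print(run(java, "-version"));
    }
}
```

This is still far less than zt-exec gives you: a child that never closes its output will hang the read loop, which is exactly the kind of edge a library handles for you.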

Interesting Links, 24 Feb 2016

It’s been a while, and I’m pretty sure I missed some fun stuff, but here goes with a few things:

  • Blogger Sam Atkinson has a few here, some good, some bad. I admire how prolific he is.
    • “Don’t Rewrite Your Old Application; Refactor!” has some advice for people migrating to new products. It’s got some good thinking in it (rewriting is going to miss stuff, it’s going to take longer than you think) but not a lot of deep reasoning (and misses some possible points, like the resentment from the original architects, which has happened to me when I tried to rewrite rather than refactor). Good post.
    • Then there’s “Kill Your Dependencies: Java/Maven Edition”, which says not to introduce dependencies until you have no other choice. That’s not terrible advice on the surface, but … it’s terrible advice. Use what you need. If wasting 3MB of disk space gets you one method from a library that saves you time to write or think or test, well, that 3MB of space is cheaper than you are. YMMV, but it’s not good advice – worth reading, though.
  • jOOQ’s “The Mute Design Pattern” shows how you can use Java 8’s lambdas to hide checked exceptions for situations where you Just Don’t Care, leading to code like mute( () -> { doStuff(); } ) – which is actually pretty neat. Very handy to have in your coding toolbox, much like Binkley’s “Java 8 AutoCloseable trick”.
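For the curious, the mute idea from the last link can be sketched in a few lines. This is not jOOQ’s code, just one plausible shape for it; Mute and CheckedRunnable are names invented here.

```java
import java.io.IOException;

class Mute {
    // Like Runnable, but the body is allowed to throw checked exceptions.
    @FunctionalInterface
    interface CheckedRunnable {
        void run() throws Exception;
    }

    // Runs the action and deliberately swallows anything it throws.
    static void mute(CheckedRunnable action) {
        try {
            action.run();
        } catch (Exception ignored) {
            // intentionally empty: the caller has said it does not care
        }
    }

    public static void main(String[] args) {
        mute(() -> { throw new IOException("nobody will see this"); });
        System.out.println("still running"); // prints "still running"
    }
}
```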

By the way, feel free to send in stuff you think belongs here!

Programmatic Reload of Logback Configurations

Logback has the capability to programmatically and explicitly load various configurations. This can be useful when you need to adjust logging levels at runtime, and it’s actually pretty easy to do, as well.
You’d want to use something like this for a long-running application, or one that has an extensive load process: imagine a production environment, where you want to see details that would be hidden by convention.
For example, imagine you track a given method invocation, but your production logs don’t include the tracking, because it’s too verbose. But if a problem occurs, you want to be able to see the invocation. Changing the logging configuration and redeploying (or restarting) is an option, but it’s expensive and embarrassing, when all you really need to do is see more information.
The core operative code looks like this:

LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
context.reset();
JoranConfigurator configurator = new JoranConfigurator();
configurator.setContext(context);
configurator.doConfigure(this.getClass().getResourceAsStream("/logback.xml"));

Note that doConfigure can throw a JoranException if the configuration is invalid somehow.
I built a project (called logback-reloader) to demonstrate this. The project has a LogThing interface, which provides a simple doSomething() method along with an accessor for a Logger; the doSomething() method simply issues a series of calls to generate log entries at different levels.

public interface LogThing {
    Logger getLog();
    default void doSomething() {
        getLog().trace("trace message");
        getLog().debug("debug message");
        getLog().warn("warn message");
        getLog().info("info message");
        getLog().error("error message");
    }
}

I then created two different implementations – ‘FineLogThing’ and ‘CoarseLogThing’ which are identical except that they’re named differently (so that I can easily tune the logging levels).
It would have been easy to use a single implementation and declare two components with Spring, but then I’d be deriving the logger from the Spring name and not the package of the classes. This was just a short path to completion, not necessarily a great design.

Why Spring? Because I’m using Spring at work, and I wanted my test code to be reusable.

Then I created a custom Appender (InMemoryAppender) to provide easy access to logged information. I wanted to do this because I wanted to programmatically check that the logging levels were being changed; the idea is that my custom appender actually maintains a list of logged entries internally so I can query it later. The reason the logged entries are held in a static List is that Spring doesn’t maintain the Appenders – logback does – so again, this was a short path to completion, not necessarily a “great design.”
So to put it together, I created a TestNG test that had two tests in it. The only difference in the tests is that one uses “logback.xml” – the default configuration, loaded by default but explicitly included here to remove dependency on order of execution in the tests – and the other uses “logback-other.xml”. (I could have parameterized the tests as well – again, shortest path, not “great design.”)
Our default logback configuration is pretty simple, albeit slightly longer than I’d like:

<configuration>
    <appender name="MEMORY" class="com.autumncode.logger.InMemoryAppender"></appender>
    <logger name="com.autumncode.components.fine" level="TRACE"></logger>
    <logger name="com.autumncode.components.coarse" level="INFO"></logger>
    <root level="WARN">
        <appender-ref ref="MEMORY"/>
    </root>
</configuration>

Note that the element is appender-ref, with no space before the dash.

The “other” logback configuration is almost identical. The only difference is in the level for the coarse package:

<configuration>
    <appender name="MEMORY" class="com.autumncode.logger.InMemoryAppender"></appender>
    <logger name="com.autumncode.components.fine" level="TRACE"></logger>
    <logger name="com.autumncode.components.coarse" level="TRACE"></logger>
    <root level="WARN">
        <appender-ref ref="MEMORY"/>
    </root>
</configuration>

Here’s the first test:

@Test
public void testBaseConfiguration() throws JoranException {
    LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
    context.reset();
    JoranConfigurator configurator = new JoranConfigurator();
    configurator.setContext(context);
    configurator.doConfigure(this.getClass().getResourceAsStream("/logback.xml"));
    appender.reset();
    fineLogThing.doSomething();
    assertEquals(appender.getLogMessages().size(), 5);
    appender.reset();
    coarseLogThing.doSomething();
    assertEquals(appender.getLogMessages().size(), 3);
}

This verifies that the coarse logger doesn’t include as many elements as the fine logger, because the default logback configuration has a more coarse logging granularity set for its package.
The other test is almost identical, as stated: the only differences are in the logback configuration file and the number of messages the coarse logger is expected to have created.
So there you have it: a simple example of reloading logback configuration at runtime.

It’s worth noting that this isn’t “new information.” It’s actually shown pretty well at sites like “Obscured by Clarity,” for example. The only contribution here is the building of a project with running code, as well as loading the configuration from the classpath as opposed to from a filesystem.

Interesting Links, 15 Feb 2016

  • A great quote from ##java: < surial> maven/gradle are to ant, as svn is to cvs.
  • JavaCPP is a new project that attempts to bridge a gap between C++ and Java, entering the muddy waters along with JNI and JNA (as well as a few other such projects). It actually looks pretty well done – and targets Android as well as the JVM, which seems like a neat trick.
  • First in a couple from DZone: “Reactive Programming by Example: Hands-On with RxJava and Reactor” is a presentation (thus a video) of a use of RxJava. Reactive programming is one way to introduce a scalable processing model to your code, although it’s hardly the only one (and it’s not flawless, either, so if you’re one of the anti-reactive people, cool your jets, it’s okay). If you’ve been wondering what this whole reactive thing is, here’s another chance to learn.
  • Speaking of learning: “Monads: It’s OK to Feel Stupid” punts on the idea of describing what a monad is, saying that it’s okay if you don’t understand them – you can use them anyway. (Java’s streams provide a lot of access to functionality through monads, which present “computations represented as sequences of steps.”)
  • “The 5 Golden Rules of Giving Awesome Customer Support” goes through some basic things to think about for, of all things, customer support. (Surprise!) The things are topics, not good headings, but one thing they didn’t point out was that people who use your open source software library are customers, too. You’ll want to read the article to get more relevance out of the headings. The points are:
    • All users are customers
    • Your customer took the time
    • Your customer is abrasive and demanding
    • Your customer doesn’t understand your product
    • Your customer doesn’t care about your workflows

Interesting Links, 9 Feb 2016

  • From Parks Computing, a short word of advice in “On Recruiting” for the movers and shakers (and those who want to be movers and shakers): “The quality of your company’s software will never exceed the quality of your company’s software developers.”
  • DZone is back with a few interesting posts: “OpenJDK – Is Now the Time?” starts by wondering if OpenJDK is reaching critical mass to the point where it should be considered instead of the standard Oracle JDK. It’s an odd post.
    • It points out that if Google had used OpenJDK instead of Oracle’s libraries, the lawsuit might not have happened (Editor’s note: it might have!). This is a good point.
    • It says that the deployment options might open up, with standard package management instead of a custom update process specific to Java. This is also a good point.
    • It points out that OpenJDK’s performance and scalability is the same as the Oracle JDK. This is… not a good point. The codebases are the same (they’re routinely synchronized: code in one will be in the other eventually.) Oracle’s JDK is effectively OpenJDK with some closed-source libraries, so Oracle’s JVM can write JPEGs natively (and some other features like that.)
    • It also points out community improvements to OpenJDK – “As open source developer’s continue to provide insight into the source code, it is likely that OpenJDK could begin to outperform the version released by Oracle.” Um… since the codebases are the same, that’s not likely to happen much at all.
  • From ##java, cheeser had a beautiful expression of reference equivalence. Someone was asking about how two references (A and B) pointing to the same object work – cheeser said, “If B is your *name*, A would be a nickname. Both of them mean you so anything said to either name or nickname both go to you.”
  • “Fix PATH environment variable for IntelliJ IDEA on Mac OS X” describes a way for OSX users to provide the OS’s PATH to the popular IDE. It turns out that programs installed via brew aren’t necessarily available to IDEA unless you start IDEA from the shell – which few do. It’s easy to fix; this post shows you how.
  • Another from DZone – they’re on fire! – Per-Åke Minborg posted “Overview of Java’s Escape Analysis”, which discusses what escape analysis is (it’s a way of determining the visibility of an object) and what it means for performance. (If an object isn’t used outside of a method or a block, it can be allocated on the stack rather than on the JVM heap – and as fast as the heap can be in Java, the stack is much faster.)
  • Pippo is a new, very small microframework based on services. The example looks … easy enough; take a look, see what you think.
  • Yet one more from DZone: Exceptions in Java: You’re (Probably) Doing It Wrong advocates the use of RuntimeException to get rid of those pesky throws clauses and forced try/catch blocks in your Java code. It’s an argument Spring advocates, and checked exceptions aren’t part of languages like Scala… but I personally find the over-reliance on unchecked exceptions to be terrible. The core argument against checked exceptions from the article: “The old argument is that (the use of checked exceptions) “forces” developers to handle exceptions properly. Anyone on a real code base knows that this does not happen, and Exceptions are routinely ignored, or printed out and then ignored. This isn’t some sign of a terrible developer; it is such a common occurrence that it is a sign that checked Exceptions are broken.” Except no, it’s such a common occurrence that it’s a sign that developers are terrible. This article was so terrible that I’ll probably write up a better response as soon as I get some time.

Interesting Links, 5 Feb 2016

  • “O Java EE 7 Application Servers, Where Art Thou?” is a humorously-titled summary of the state of Java EE 7 deployment options, covering the full and web profiles for Java EE 7. It’s the sort of thing one wants to know, honestly: great job, Antonio.
  • From Stack Overflow, “How to get started with Akka streams?” is a Scala question, not a Java one, but Akka has a Java implementation as well. The first answer (accepted, upvoted) is a fantastic explanation. I may port it to pure Java just for example’s sake…
  • From our friends at DZone, Orson Charts 1.5 is Open Source announces that Orson Charts 1.5 has been released, and it’s available under the GPLv3 (a commercial license is available for people who don’t want the restrictions of the GPL). It’s a 3D charting library, not a 2D charting library, and they say if you need 2D charts, you should use JFreeChart – Orson Charts looks great on first impressions, though. (It’s worth noting that apparently both Orson Charts and JFreeChart were from the same author.)
  • More from DZone: Application Security for Java Developers is a summary of security concerns. It’s really more of a short “have you thought of this?” post – useful, but not very deep.

The case of EnumSet

A few days ago ##java happened to discuss sets, bit patterns, and things like that, and I happened to mention EnumSet and that I find it useful. The rest of the gang wanted to know how it actually measures up, so this is a short evaluation of how EnumSet stacks up for some operations. We are going to look at a few different things.

EnumSet classes

There are two different versions of EnumSet:
* RegularEnumSet, used when the enum has 64 or fewer values
* JumboEnumSet, used when the enum has more than 64 values
Looking at the code, it is easy to see that RegularEnumSet stores the bit pattern in one long and that JumboEnumSet uses a long[]. This of course means that JumboEnumSets are quite a lot more expensive, both in memory usage and cpu usage (at least one extra level of memory access).
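You can watch the JDK make that choice with a small sketch like the one below. The enums Small and Large are made up for the illustration (they are not the SmallEnum and Token used elsewhere in this post), and the class names printed are JDK implementation details rather than public API.

```java
import java.util.EnumSet;

class EnumSetKinds {
    enum Small { A, B, C }

    // 65 constants: one more than fits in the bits of a single long
    enum Large {
        V00, V01, V02, V03, V04, V05, V06, V07, V08, V09,
        V10, V11, V12, V13, V14, V15, V16, V17, V18, V19,
        V20, V21, V22, V23, V24, V25, V26, V27, V28, V29,
        V30, V31, V32, V33, V34, V35, V36, V37, V38, V39,
        V40, V41, V42, V43, V44, V45, V46, V47, V48, V49,
        V50, V51, V52, V53, V54, V55, V56, V57, V58, V59,
        V60, V61, V62, V63, V64
    }

    // Reports which EnumSet implementation the factory method picked.
    static <E extends Enum<E>> String implementationOf(Class<E> type) {
        return EnumSet.noneOf(type).getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(implementationOf(Small.class)); // prints RegularEnumSet
        System.out.println(implementationOf(Large.class)); // prints JumboEnumSet
    }
}
```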

Memory usage

I created a little program to just hold one million Sets with a few values in each of them.

Note: the enumproject.zip was built by your editor, not your author – any problems with it are the fault of dreamreal and not ernimril. Note that the project is mostly for source reference and not actually running the benchmark.

    List<Set<Token>> tokens = new ArrayList<> ();
    for (int i = 0; i < 1_000_000; i++) {
        Set<Token> s = new HashSet<> ();
        s.add (Token.LF);
        s.add (Token.CR);
        s.add (Token.CRLF);
        tokens.add (s);
    }

Heap memory usage for this program was about 250 MB according to JVisualVM.
Changing the new HashSet<> (); into EnumSet.noneOf (Token.class); we instead get 70 MB of heap memory usage.
Using the SmallEnum instead causes the HashSet to still use about 250MB, but drops the EnumSet usage down to 39 MB. I find it quite nice to save that much memory.

CPU performance

I constructed two simple tests, shown below, that call a few methods on a Set that is either an EnumSet or a HashSet, depending on the run. The enums have a few Sets that contain different subsets of the enum’s constants, and the isX methods only do return xSet.contains(this);

    @Benchmark
    public void testRegular() throws InterruptedException {
        SmallEnum s = SmallEnum.A;
        boolean isA = s.isA ();
        boolean isB = s.isB ();
        boolean isC = s.isC ();
        boolean res = isA | isB | isC;
    }
    @Benchmark
    public void testJumbo() throws InterruptedException {
        Token t = Token.WHITESPACE;
        boolean isWhitespace = t.isWhitespace ();
        boolean isIdentifier = t.isIdentifier ();
        boolean isKeyword = t.isKeyword ();
        boolean isLiteral = t.isLiteral ();
        boolean isSeparator = t.isSeparator ();
        boolean isOperator = t.isOperator ();
        boolean isBitOrShiftOperator = t.isBitOrShiftOperator ();
        boolean res =
            isWhitespace | isIdentifier | isKeyword | isLiteral |
            isSeparator | isOperator | isBitOrShiftOperator;
    }

I did the benchmarking using jmh in order to find out how fast this is.

Using HashSet:

Benchmark                      Mode  Cnt          Score         Error  Units
EnumSetBenchmark.testJumbo    thrpt   20   46787074.985 ± 2373288.078  ops/s
EnumSetBenchmark.testRegular  thrpt   20  124474882.016 ± 2165015.166  ops/s

Using EnumSet:

Benchmark                      Mode  Cnt          Score        Error  Units
EnumSetBenchmark.testJumbo    thrpt   20  112456096.790 ± 320582.588  ops/s
EnumSetBenchmark.testRegular  thrpt   20  563668720.636 ± 594323.541  ops/s

This is of course quite a silly test and one can argue that it does not do much that is useful, but it still gives us quite a good indication that performance gains are there. Using EnumSet is 2.4 times faster for jumbo enums and 4.5 times faster for small (regular) enums for this kind of operation.
I do not claim that your usage will notice the same speedup, but it might be worth checking out.
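The enums themselves are not shown in this post, so here is a guess at their general shape rather than the real source: a hypothetical SketchToken enum with a few static EnumSets and isX methods that only call contains(this). Swapping EnumSet.of(...) for new HashSet<>(Arrays.asList(...)) would give the HashSet flavour of the same thing.

```java
import java.util.EnumSet;
import java.util.Set;

class TokenSets {
    // Hypothetical stand-in for the benchmarked enums; constants are illustrative.
    enum SketchToken {
        SPACE, TAB, LF, IDENTIFIER, IF, WHILE, PLUS, SHIFT_LEFT;

        // Static fields are initialized after the constants, so they may refer to them.
        private static final Set<SketchToken> WHITESPACE = EnumSet.of(SPACE, TAB, LF);
        private static final Set<SketchToken> KEYWORDS = EnumSet.of(IF, WHILE);
        private static final Set<SketchToken> OPERATORS = EnumSet.of(PLUS, SHIFT_LEFT);

        boolean isWhitespace() { return WHITESPACE.contains(this); }
        boolean isKeyword()    { return KEYWORDS.contains(this); }
        boolean isOperator()   { return OPERATORS.contains(this); }
    }

    public static void main(String[] args) {
        SketchToken t = SketchToken.TAB;
        // prints true false false
        System.out.println(t.isWhitespace() + " " + t.isKeyword() + " " + t.isOperator());
    }
}
```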

Final thoughts

Does it really matter if you use EnumSet or Set? In most cases: no, the enum will only be one field and not part of memory usage or cpu consumption, but depending on your use case it can be a nice memory saver while also being faster. I recommend that you use it.

ZeroTurnaround's Developer Productivity Report

ZeroTurnaround has issued their latest developer productivity survey – if enough people take it, they’re giving money to Devoxx4Kids.
It’s an interesting survey, done very well – probably the best execution of such a survey as I’ve ever seen (and I’ve seen a lot). Check it out, let’s get Devoxx4Kids that donation (and/or join ourselves).

Interesting Links, 1 Feb 2016

Editor’s note before we get to the links: I’ve been trying to keep the links’ length down to a manageable four or five, so the frequency’s been higher than I might otherwise have desired (roughly every two or three days). I’m going to try a longer list of interesting links, and move the frequency down – we’ll see if that’s useful and palatable. The problem is that ##java (and the Java community overall) just has a lot of interesting, relevant content that’s worth keeping! (Keep it up, guys. “Too much interesting stuff” is a good problem to have.)

  • WildFly 10 has been released. This is the latest version of the open source (community-supported) JBoss Application Server; it’s fully Java EE 7, and requires Java 8. Very cool stuff, congratulations to the WildFly team.
  • Neo4J has released milestone 1 of their object/graph mapping library’s second version. (Read it with me: “Neo4J has released OGM 2.0 m1.” Much simpler that way.) It sounds promising, especially since they seem to have straightened out the connection process to Neo4J instances such that both embedded and remote instances have similar capabilities.
  • “Implied Readability” uses readability as a term to address transitive dependencies in Java 9, more or less, and shows how a module can export visibility of a dependency to other modules. As stated: Java 9, not Java 8, so it’s a new feature – but it looks a little like how OSGi exports visibility rules. It might be really relevant as time goes on and Java 9 gets closer. (It’s based off of information in “Programming with Modularity and Project Jigsaw. A Tutorial Using the Latest Early Access Build” published on InfoQ, so there may be more interesting stuff in that article.)
  • “How we accidentally doubled our JDBC traffic with Hibernate” discusses an obvious issue (doubling JDBC traffic!), found when Hibernate logging was set to WARN – because Hibernate then re-executes every query in order to show the warnings associated with the query in question. The warnings can be useful, to be sure, but be wary!
  • PNG encoding from Java’s ImageIO can be slow, according to one ##java op. He said that he used ObjectPlanet‘s PngEncoder and got much better performance.
  • As an update to the process by which one can examine request headers (mentioned in “Interesting Links, 22 Jan 2016”) a ##java user mentioned RequestBin, which allows you to build a URL and issue requests against it, to examine the actual traffic data.
  • Moving to a Plugin-Free Web – by Oracle – says it point blank: “Oracle plans to deprecate the Java browser plugin in JDK 9. This technology will be removed from the Oracle JDK and JRE in a future Java SE release.” This makes sense – browsers are ignoring the java plugin (and should). If you’re still doing applets, stop. Oracle has spoken, and the technology is going away. (Now if we could get Oracle to get rid of Vector somehow…)
  • ZeroTurnaround – who gave the world JRebel – posted Java 8 Streams cheat sheet, which offers a one-page example of a lot of useful, relevant information around streams for handy, quick reference. I looked for a DZone Refcard on streams for a point of comparison, but they didn’t have one that I saw on first scan – which is surprising, since the Refcardz are actually done really well, in general.