Jan 02, 2018

Clojure's slow start — what's inside?

It is not surprising that the question of Clojure's long startup time is raised time and time again on forums and social media. This topic concerns users who want to use Clojure as a scripting language, or who have a particular workflow which requires restarting the Clojure program often. Compared to languages like Python, Ruby, and JavaScript all of which have less than 100 millisecond startup time, Clojure's seconds or even tenth of seconds seem incredibly slow and wasteful.

The nature of Clojure loading is not exactly a secret. A few years ago Nicholas Kariniemi profiled Clojure startup process and described his findings in the post "Why is Clojure bootstrapping so slow?". One of the points Nick brings forward is that JVM is not to blame for Clojure's slowness — after all, a HelloWorld in Java executes in 50 ms. This is only partly true, JVM's startup time is not a problem, but the JVM as a platform makes Clojure starting time worse. JVM is optimized towards running long-lived programs and invests a lot of time upfront when initializing the classes. That's why Clojure on JVM boots much slower than ClojureScript on NodeJS.

The silver lining in this situation, and the main thing that keeps Clojure alive even with its ludicrously long bootstrap, is the REPL, of course. So what that you have to wait seconds until the project starts — you'll get your revenge later, when the results in the REPL come back instantly while others are busy recompiling and reloading. With some dexterity, you might not restart your REPL for days. E.g., Leiningen and Boot both allow to dynamically inject new dependencies into a running REPL (in Leiningen you'll need clj-refactor or Alembic, Boot can do that by itself). If you are worried about polluting your REPL state, you can adopt a workflow like Reloaded. Even when writing new scripts, I prefer starting a REPL, connecting to it from Emacs, and then interactively produce pieces of the script as I go.

So, there aren't many reasons to restart Clojure programs as you would do with Python, for example. Yet the questions stand — what exactly is behind Clojure's slow start and can we do anything about it? Today we'll try to answer the first one in an exhaustive and reproducible way.

Experiment description

We are going to profile Clojure's "time to REPL" for three common use cases: bare Clojure using Clojure CLI, Leiningen, and Boot. In all three cases, we'll be using the minimum possible number of dependencies (nothing besides what comes with or is required by the build tool). Clojure version is 1.9.0 in all cases.

For profiling, we'll take async-profiler which has the ability to begin profiling from the very start of JVM process. In the previous post, I announced a Clojure wrapper around async-profiler — we won't need it here because it is not suited for startup profiling. The default async-profiler is precisely what we need here.

We will not add any "accelerating" JVM flags like -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xverify:none. While I use those day-to-day, the choice to enable them is opinionated and has certain tradeoffs.

This post is about profiling, not benchmarking, so the parameters of my testing rig are less critical. For what it's worth, I ran these experiments on MacBook Pro 2015 with 3.1 GHz Core i7.

Setup

To reproduce the experiments in this post do the following steps.

Get async-profiler. First, clone it from Github. Then you'll have to compile the native agent with make. Note that JAVA_HOME environment variable must be set when running make. On MacOS you can do:

$ JAVA_HOME=$(/usr/libexec/java_home) make

If all goes well, you'll find a directory build, with two files: jattach and libasyncProfiler.so. We'll only be needing the last one. You may move this file into a more convenient place or keep it here; just keep in mind that whenever in the post we refer to libasyncProfiler.so, it means the valid path to the file.

Clone FlameGraph repository. We'll need it for generating flamegraphs from the profiling data. We'll invoke specifically flamegraph.pl, again, substitute it with a valid path when running this command.

This guide describes how to install Clojure CLI. Boot and Leiningen can be downloaded from their respective websites.

Clojure

Check if you have the correct Clojure version:

$ clj                                                                                                                                                   ~
Clojure 1.9.0
user=>

Now let's measure the time from start to instant exit:

$ time clj -e '(System/exit 0)'
clj -e '(System/exit 1)'  1.80s user 0.12s system 181% cpu 1.057 total

The time is around 1 second. It is much less than dozen-of-seconds load times you might have experienced with real-world projects. So, Clojure is not the primary time hog after all. But it is still too long for something so seemingly simple. Let's find out what is happening exactly.

$ cd /tmp
$ clj -J-agentpath:/path/to/agent/libasyncProfiler.so=start,event=cpu,file=raw-clojure.txt,collapsed -e '(System/exit 0)'
$ /path/to/FlameGraph/flamegraph.pl --colors=java --minwidth 2 raw-clojure.txt > raw-clojure.svg

Now, you can view the resulting flamegraph in the browser by visiting file:///tmp/raw-clojure.svg. Or just see my image below.

Raw Clojure startup flamegraph (View in full)

Note: I trimmed my flamegraphs a little so that they fit easier on the screen. Frames like (require ...) are manually collapsed.

If you haven't learned yet how to read flamegraphs, now is the excellent opportunity to do so. The vertical axis shows stacktraces going from bottom to top. On the horizontal axis, the width corresponds to how often the frame was seen on the stack during the execution (≈ time spent in the method). Relative position of frames on X-axis is arbitrary and doesn't mean anything. You can click the frames to zoom in/out on particular stacktraces.

You can explore the flamegraph on your own; meanwhile, I will list the main points of interest, together with some explanations. If you can't find a stackframe I'm referring to, use Search button in the top right corner and enter the sought frame.

We see that the most significant contributor to Clojure load time is the size of its base library. There are several potential ways to help that:

Leiningen

We are using Leiningen 2.8.1.

$ lein version
Leiningen 2.8.1 on Java 1.8.0_102 Java HotSpot(TM) 64-Bit Server VM

To profile Leiningen startup, we need a more complicated setup. First, we'll have to create a new Lein project; otherwise, there is no way to specify the exact Clojure version for lein repl.

$ lein new profile-lein
$ cd profile-lein
$ sed -i 's/1\.8\.0/1.9.0/g' project.clj
$ lein repl
...
Clojure 1.9.0

Now we can measure the startup time with Lein:

$ echo '(System/exit 0)' | time lein repl
...
lein repl  6.88s user 0.61s system 124% cpu 6.003 total

The elapsed time of 6 seconds is much more than Clojure's original second. Let's find out what exactly happens in there. There are some changes to be made to project.clj, here's what it should look like:

(defproject profile-lein "0.1.0-SNAPSHOT"
  :jvm-opts ["-agentpath:/path/to/agent/libasyncProfiler.so=start,event=cpu,file=lein-child.txt,collapsed"]
  :dependencies [[org.clojure/clojure "1.9.0"]])

And then launch the REPL like this:

$ echo '(System/exit 0)' | LEIN_JVM_OPTS='-agentpath:/path/to/agent/libasyncProfiler.so=start,event=cpu,file=lein-host.txt,collapsed' lein repl

You've probably noticed that we specified the async-profiler agent string twice. It is not accidental. Leiningen launches two Clojure processes: one for itself, and one for the actual project. We need to profile both of them, so we do, and we save the output to different files, lein-host.txt and lein-child.txt. Then, thanks to the beautiful textual nature of collapsed stacktraces, we can straightforwardly merge them[1]:

$ cat lein-host.txt lein-child.txt > lein.txt
$ /path/to/FlameGraph/flamegraph.pl --colors=java --minwidth 2 lein.txt > lein.svg
Leiningen startup flamegraph (View in full)

I have heavily redacted this flamegraph to make it easier to navigate. Your flames would be higher.

With this knowledge we can approximate that Leiningen's bootstrap process consists of:

I think there is room for improvement here, especially on the REPL side. More thorough investigation of tools.nrepl and Reply might reveal potential candidates for optimization.

Boot

Check Boot version. If Clojure is not 1.9.0, change it in ~/.boot/boot.properties.

$ boot -V
...
BOOT_CLOJURE_VERSION=1.9.0
BOOT_VERSION=2.7.2

Now measure the start time:

$ time boot repl -s
boot repl -s  14.71s user 0.60s system 288% cpu 5.303 total

5.3 seconds, almost the same as Leiningen.

$ BOOT_JVM_OPTIONS='-agentpath:/path/to/agent/libasyncProfiler.so=start,event=cpu,file=boot.txt,collapsed' boot repl -s
$ /path/to/FlameGraph/flamegraph.pl --colors=java --minwidth 2 boot.txt > boot.svg
Boot startup flamegraph (View in full)

To summarize, Boot spends those 5.3 seconds on:

I think the easiest way to reduce Boot loading time would be to trim the number of functions that need to be loaded from the start. From there, there are no immediately apparent optimizations.

Conclusions

As it turns out, Clojure start time is a complicated and multi-dimensional topic. Clojure projects are slow to start not only because of JVM — JVM itself starts in ~50 ms — but because of JVM specifics the classes are loaded slowly. Clojure projects are slow to start not only because of Clojure — Clojure itself starts in ~1 second — but because of Clojure specifics, the namespaces, especially not AOT-compiled one, are loaded slowly. And so on.

In this post, we have observed three common ways to use Clojure and discovered where the initial lag comes from. We haven't even started on long loading times of big projects with huge dependency trees. But now you know how to profile the startup of your application regardless of the build tool you are using. And meanwhile your REPL is loading, go make yourself a cup of tea. You earned it.

References

Footnotes
  1. Notice a Useful use of cat(1)!