Dec 11, 2017

Profiling tool: async-profiler

In the previous installment, we learned how to use VisualVM to profile your Clojure applications. VisualVM is an excellent profiler: it is beginner-friendly and has enough features without being bloated; all in all, a good one to dip your toes into the waters of profiling. However, VisualVM (like many other JVM profilers) suffers from safepoint bias. Long story short, most sampling profilers can take a snapshot of the stack only at so-called safepoints: points of execution where the runtime can stop and look around. The placement of those safepoints is quite arbitrary, which hampers the accuracy of profilers that have this problem.


Fortunately, other profilers exist that don't rely on safepoints to do the sampling. One such profiler is the open-source async-profiler. It is a low-overhead sampling profiler that operates by attaching a native agent to a running JVM process and collecting stack trace samples using HotSpot-specific APIs. async-profiler works only on GNU/Linux and macOS, and on the former, it additionally utilizes perf_events to dig into native code as well (more on that in future blog posts).

You can build async-profiler yourself by cloning the repository and following the README instructions. It ships with a convenient binary that can attach to any JVM process and start profiling.
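As a rough sketch of that workflow (the repository URL, output paths, and `<pid>` placeholder are illustrative, and the exact flags may differ between async-profiler versions), building and attaching from the command line looks something like this:

```shell
# Clone and build async-profiler (requires a JDK and make).
git clone https://github.com/jvm-profiling-tools/async-profiler.git
cd async-profiler
make

# Find the PID of a running JVM with jps, then sample it for
# 30 seconds, writing collapsed stack traces to a file.
jps
./profiler.sh -d 30 -o collapsed -f /tmp/collapsed.txt <pid>

# The collapsed output can then be fed to Brendan Gregg's
# FlameGraph scripts to render an SVG (see the next section).
flamegraph.pl /tmp/collapsed.txt > /tmp/flamegraph.svg
```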

Unlike VisualVM, async-profiler doesn't have a GUI. One way to see its results is to print them to the console, which, while simple, is not particularly informative. The preferred modus operandi of async-profiler is to collect collapsed stack traces that can later be transformed into flame graphs.
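For a sense of what that intermediate data looks like: the collapsed format is plain text where each line is a semicolon-separated stack, deepest frame last, followed by the number of samples in which that exact stack was observed. The frame names and counts below are hypothetical:

```
main;user$burn_cpu.invoke;user$test_div.invoke;clojure.lang.Numbers.divide 984
main;user$burn_cpu.invoke;user$test_sum.invoke;clojure.lang.Numbers.add 12
```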

Flame graphs

Flame graphs are a relatively recent fashion in the world of profiling. They are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately.

The most popular way of creating flame graphs is Brendan Gregg's FlameGraph library. This set of scripts takes the profiling data and generates an SVG image from it. The resulting image shows which methods appear on the stack statistically more often than others. Compared to conventional ways of representing profiled data (hot method lists, call trees), flame graphs are more compact, preserve the full stack context of every sample, and make the hottest code paths stand out at a glance.

Flame graphs also ship with some basic interactivity, such as zooming and searching, which adds clarity and convenience to deciphering the profiling data. You can learn more about flame graphs by following the links at the end of this post.


The two tools described above work perfectly well together, or each on its own. They don't strive to be a complete profiling solution but rather building blocks for whatever flexible, customizable profiling workflow you might need. But that doesn't mean there shouldn't be an easy way to consume these two projects together. With that in mind, I created clj-async-profiler, a tool for on-demand creation of flame graphs from statistical profiling data sampled in the current or an external JVM process.

Let's learn it by example. First, add the following dependency to your project.clj/build.boot/deps.edn:

[com.clojure-goes-fast/clj-async-profiler "0.1.0"]

Now we'll conceive a toy profiling example.

(defn test-sum []
  (reduce + (map inc (range 1000))))

(defn test-div []
  (reduce / (map inc (range 1000))))

(defn burn-cpu [secs]
  (let [start (System/nanoTime)]
    (while (< (/ (- (System/nanoTime) start) 1e9) secs)
      (test-sum)
      (test-div))))

Our preparatory steps are done, time to run the profiler!

user=> (require '[clj-async-profiler.core :as prof])

user=> (prof/start {})
"Starting [cpu] profiling\n"

user=> (burn-cpu 10)

user=> (prof/stop {})
#object[java.io.File 0x5c9cd2f7 "/tmp/clj-async-profiler/results/flamegraph-2017-12-11-01-37-05.svg"]

The result of the stop function points to a generated SVG flame graph. You can open it directly in the browser, like file:///tmp/clj-async-profiler/results/flamegraph-....svg.


The above picture shows us that our function burn-cpu appeared in almost 100% of samples (no surprise), and that test-div accounts for 98.8% of the execution time. That's how much slower ratio division is compared to addition (but again, this is a toy example, so don't take these numbers as hard evidence).
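A quick REPL check (pure Clojure, no profiler needed) hints at why the gap is so large: reducing with / over longs that don't divide evenly produces a clojure.lang.Ratio, which is backed by BigInteger arithmetic, while + stays on plain longs.

```clojure
;; Reducing with / accumulates a Ratio (here 1/1000!), so every step
;; involves BigInteger math; reducing with + stays a cheap Long.
(class (reduce / (map inc (range 1000)))) ;; => clojure.lang.Ratio
(class (reduce + (map inc (range 1000)))) ;; => java.lang.Long
```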

The image of a flamegraph is interactive. Try playing with it! You can highlight stack frames to get the profiling numbers, you can click on them to zoom in and out. You can also click Search to highlight all of the frames that match the provided regexp and get their total time percentage. For example, you can enter ^java.*, and it will tell you that Java-side code takes 95.8% of the total time in our test.

Bonus features

The function (profile-for <duration> <options>) starts the profiling, waits for the specified duration in seconds, and then generates a flamegraph. It returns a future that will deliver the path to the generated flamegraph; thus it doesn't block the execution thread.
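As a sketch of that behavior (the duration is arbitrary and the options map is left empty), based on the description above:

```clojure
(require '[clj-async-profiler.core :as prof])

;; Starts profiling the current JVM for 10 seconds and returns
;; immediately with a future; the calling thread is not blocked.
(def result (prof/profile-for 10 {}))

;; Dereferencing the future blocks until the flame graph is
;; generated, then yields the path to the resulting SVG file.
@result
```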

Functions start, stop, and profile-for optionally accept a process ID as their first argument. If a PID is provided, clj-async-profiler will profile the external JVM process specified by that PID. By default, the JVM in which the profiler is started is profiled.

clj-async-profiler includes a basic webserver that allows viewing the flame graphs remotely. (serve-files <port>) starts the webserver, similar to Python's SimpleHTTPServer, on the specified port.

All features and options of clj-async-profiler are listed in the README.

Conclusion: VisualVM or async-profiler?

First, let's outline the advantages of each. Async-profiler:

- is free of safepoint bias, so its results are more accurate;
- has low overhead;
- produces flame graphs, which are compact and quick to read;
- can attach on demand to an already running JVM process.

Now, VisualVM:

- is beginner-friendly and has a graphical interface;
- monitors other performance aspects of the application besides CPU, such as memory and threads;
- works on all platforms where the JVM runs, not just GNU/Linux and macOS.

To summarize, VisualVM is a high-level, holistic platform for monitoring different performance-related aspects of your application. It has an actual user interface which, although somewhat obscure, gives more clues to a beginner.

UPDATE: Since the time of this post, Java Mission Control has become free. JMC has most of the benefits of VisualVM and is also not influenced by safepoints.

Async-profiler, on the other hand, is more of a low-level "hacker" tool. It doesn't lock the profiling data into a walled garden of incompatible binary formats. It can be started rapidly from within the application. And once you familiarize yourself with flame graphs, reading the profiles will be much faster for you than navigating VisualVM's UI.

For me, async-profiler has become a go-to ad hoc profiler. But I continue to use VisualVM for reasons mentioned above, and as a second pair of "eyes" to look at the performance picture. I hope this post has expanded your knowledge about profiling and introduced you to another useful tool in your arsenal.

Let it burn!