Jun 09, 2023

Profiling tool: async-profiler

Updated on 2023-06-09: bumped version numbers and usage examples.
More comprehensive clj-async-profiler documentation is available in the Knowledge Base.
Runnable code for this post can be found here.

In the previous installment, we learned how to use VisualVM to profile your Clojure applications. VisualVM is an excellent profiler: it is beginner-friendly and has enough features without being bloated; all in all, a good way to dip your toes into the waters of profiling. However, VisualVM (like many other JVM profilers) suffers from safepoint bias. Long story short, most sampling profilers can take a snapshot of the stack only at the so-called safepoints — points of execution where the runtime can stop and look around. The placement of those safepoints is quite arbitrary, and it hampers the accuracy of profilers that have this problem.

async-profiler

Fortunately, other profilers exist that don't rely on safepoints to do the sampling. One such profiler is the open-source async-profiler. It is a low-overhead sampling profiler that operates by attaching a native Java agent to a running JVM process and collecting stack trace samples using HotSpot-specific APIs. async-profiler works only on GNU/Linux and macOS, and on the former it additionally utilizes perf_events to dig into native code as well.

The output of async-profiler is commonly rendered with flame graphs. Flame graphs are a relatively recent fashion in the world of profiling. They are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately.

The most popular way of creating flame graphs on your own is Brendan Gregg's FlameGraph library. This set of scripts takes the profiling data and generates an SVG image from it. The resulting image shows which methods appear on the stack statistically more often than others. Compared to conventional ways of representing profiled data (hot method lists, trees), flame graphs are more compact and make the hottest code paths much easier to spot.

A flame graph also ships with some basic interactivity, such as zooming and searching, which adds clarity and convenience when deciphering the profiling data. You can learn more about flame graphs by following the links at the end of this post.

clj-async-profiler

Clojure users have convenient access to async-profiler in the form of clj-async-profiler. Unlike VisualVM, clj-async-profiler is not a standalone application but a library that you include in your application as a dependency.

Let's learn it by example. First, add the following dependency to your project.clj or deps.edn:

[com.clojure-goes-fast/clj-async-profiler "1.0.4"]
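The vector above is the Leiningen (project.clj) coordinate. If you use deps.edn instead, the same dependency would look like this:

com.clojure-goes-fast/clj-async-profiler {:mvn/version "1.0.4"}

On modern JDKs (9+), you may also need to start your REPL with the -Djdk.attach.allowAttachSelf JVM option so the profiler agent can attach to its own process; see the README for details.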

Now we'll conceive a toy profiling example.

(defn test-sum []
  (reduce + (map inc (range 1000))))

(defn test-div []
  (reduce / (map inc (range 1000))))

(defn burn-cpu [secs]
  (let [start (System/nanoTime)]
    (while (< (/ (- (System/nanoTime) start) 1e9) secs)
      (test-sum)
      (test-div))))

Our preparatory steps are done, time to run the profiler!

user=> (require '[clj-async-profiler.core :as prof])
user=> (prof/profile (burn-cpu 10))

After the execution finishes, the flamegraph will be generated in the /tmp/clj-async-profiler/results/ directory. You can open it directly in the browser, like file:///tmp/clj-async-profiler/results/01-cpu-flamegraph-.....html, or run (prof/serve-ui 8080) to start a local web server with the profiler UI.
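The profile call is a convenience wrapper; the same result can be achieved with the lower-level start and stop functions mentioned below. A minimal sketch of such a REPL session (return values omitted) could look like this:

user=> (prof/start)   ;; begin collecting samples in the current JVM
user=> (burn-cpu 10)  ;; run the code we want to measure
user=> (prof/stop)    ;; stop sampling and generate the flamegraph file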

Click here to open in full.

The above picture shows us that our function burn-cpu appeared in 92% of the samples, and that test-div accounts for 91.32% of the execution time. That's how much slower ratio division is compared to addition (but again, this is a toy example, so don't take these numbers as hard evidence).

The flamegraph image is interactive. Try playing with it! You can highlight stack frames to get the profiling numbers, and you can click on them to zoom in and out. You can also use the Highlight section on the right to highlight all frames that match the provided regexp and get their total time percentage. For example, if you enter /java.*/, it will tell you that Java-side code takes 92.76% of the total time in our test.

Bonus features

The function (profile-for <duration> <options>) starts the profiling, waits for the specified duration in seconds, and then generates a flamegraph. It returns a future that will deliver the path to the generated flamegraph; thus it doesn't block the execution thread.
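For example, a quick sketch of using it from the REPL to profile the current process for 10 seconds:

user=> (def result (prof/profile-for 10 {}))  ;; returns a future immediately
user=> @result                                ;; blocks until done, then yields the flamegraph path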

Functions start, stop, profile, and profile-for optionally accept a process ID as the :pid argument in the options map. If a PID is provided, clj-async-profiler will profile the external JVM process with that PID. By default, the JVM that the profiler is started from is profiled.
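For instance, a sketch of profiling another JVM for ten seconds, assuming its process ID is 12345 (the PID here is made up for illustration):

user=> (prof/profile-for 10 {:pid 12345})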

All features and options of clj-async-profiler are listed in the README and Knowledge Base.

Conclusion: VisualVM or clj-async-profiler?

First, let's outline the advantages of each. clj-async-profiler:

- doesn't suffer from safepoint bias, so its samples are more accurate;
- has low overhead;
- can be started right from the REPL, without launching a separate application;
- renders its results as flame graphs, which are compact and quick to read.

Now, VisualVM:

- has an actual graphical user interface and is friendlier to beginners;
- monitors many other performance-related aspects of the application, not just CPU usage.

To summarize, VisualVM is a high-level, holistic platform for monitoring different performance-related aspects of your application. It has an actual user interface which, although somewhat obscure in places, still gives more clues to a beginner.

UPDATE: Since the time of this post, Java Mission Control has become free. JMC has most of the benefits of VisualVM and is also not influenced by safepoints.

clj-async-profiler, on the other hand, is more of a low-level "hacker" tool. It doesn't lock the profiling data into a walled garden of incompatible binary formats. It can be started rapidly from within the application. And once you familiarize yourself with flame graphs, reading the profiles will be much faster for you than navigating VisualVM's UI.

For me, clj-async-profiler has become a go-to ad hoc profiler. But I continue to use VisualVM for the reasons mentioned above, and as a second pair of "eyes" on the performance picture. I hope this post has expanded your knowledge about profiling and introduced you to another useful tool for your arsenal.

Let it burn!

References