In the previous installment, we learned how to use VisualVM to profile your Clojure applications. VisualVM is an excellent profiler: beginner-friendly, has enough features but not bloated, all-in-all a good one to dip your toes into the waters of profiling. However, VisualVM (and many other JVM profilers) suffer from safepoint bias. Long story short, most sampling profilers can get a snapshot of the stack only at the so-called safepoints — points of execution where the runtime can stop and look around. The placement of those safepoints is quite arbitrary, and it hampers the accuracy of profilers that have this problem.
Fortunately, other profilers exist that don't rely on safepoints to do the
sampling. One such profiler is an
open-source
async-profiler. It is a
low overhead sampling profiler that operates by attaching a native Java agent to
a running JVM process and collecting the stack traces samples by using
HotSpot-specific APIs. async-profiler works only on GNU/Linux and MacOS, and on
the former, it additionally utilizes perf_events
to dig into native code as
well (more on that in some future blog posts).
You can build async-profiler yourself by cloning the repository and following the README instructions. It ships with a convenient binary that can attach to any JVM process and start profiling.
Unlike VisualVM, async-profiler doesn't have a GUI. One way to see its results is printing them to a console, which is while being simple is not particularly informative. But the modus operandi of async-profiler is to collect collapsed stacktraces that can be later transformed into flame graphs.
Flame graphs are a relatively recent fashion in the world of profiling. They are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately.
The most popular way of creating flame graphs is Brendan Gregg's FlameGraph library. The set of scripts can take the profiling data and generate an SVG image from it. The resulting image shows which methods appear on stack statistically more often than others. Compared to conventional ways of representing profiled data (hot method lists, trees), flame graphs are:
A flame graph also ships some basic interactivity, such as zooming and searching, which adds clarity and convenience to deciphering the profiling data. You can learn more about flame graphs by following the links at the end of this post.
The two tools described above are perfectly fine to be used together, and each on its own. They don't strive to be a complete profiling solution but rather building blocks for a flexible and customizable profiling workflow that you might need. But it doesn't mean there shouldn't be an easy way to consume these two projects. With that in mind, I created clj-async-profiler, a tool for on-demand creation of flame graphs from the statistical profiling data sampled in the current or external JVM process.
Let's learn it by example. First, add the following dependency to your project.clj/build.boot/deps.edn:
[com.clojure-goes-fast/clj-async-profiler "0.1.0"]
Now we'll conceive a toy profiling example.
(defn test-sum []
(reduce + (map inc (range 1000))))
(defn test-div []
(reduce / (map inc (range 1000))))
(defn burn-cpu [secs]
(let [start (System/nanoTime)]
(while (< (/ (- (System/nanoTime) start) 1e9) secs)
(test-sum)
(test-div))))
Our preparatory steps are done, time to run the profiler!
user=> (require '[clj-async-profiler.core :as prof])
nil
user=> (prof/start {})
"Starting [cpu] profiling\n"
user=> (burn-cpu 10)
nil
user=> (prof/stop {})
#object[java.io.File 0x5c9cd2f7 "/tmp/clj-async-profiler/results/flamegraph-2017-12-11-01-37-05.svg"]
The result of the stop
function points to a generated SVG flame graph. You can
open it directly in the browser, like
file:///tmp/clj-async-profiler/results/flamegraph-....svg
.
The above picture shows us that our function burn-cpu
appeared in almost 100%
of samples (no surprise), and that test-div
accounts for 98.8% of the
execution time. That's how much slower the ratio division is when compared to
addition (but again, this is a toy example, don't take these numbers as a hard
evidence).
The image of a flamegraph is interactive. Try playing with it! You can highlight
stack frames to get the profiling numbers, you can click on them to zoom in and
out. You can also click Search to highlight all of the frames that match the
provided regexp and get their total time percentage. For example, you can enter
^java.*
, and it will tell you that Java-side code takes 95.8% of the total
time in our test.
The function (profile-for <duration> <options>)
starts the profiling, waits
for the specified duration in seconds, and then generates a flamegraph. It
returns a future that will deliver the path to the generated flamegraph; thus it
doesn't block the execution thread.
Functions start
, stop
, and profile-for
optionally accept a process ID as
their first argument. If PID is provided, clj-async-profiler will profile the
external JVM process specified by the PID. By default, the JVM where profiler is
started from is profiled.
clj-async-profiler includes a basic webserver that allows viewing the
flamegraphs remotely. (serve-files <port>)
will start the webserver, similar
to Python's SimpleHTTPServer
on the specified port.
All features and options of clj-async-profiler are listed in the README.
First, let's outline the advantages of each. Async-profiler:
Now, VisualVM:
To summarize, VisualVM is a high-level holistic platform for monitoring different performance-related aspects of your application. It has an actual user interface, somewhat obscure, but still gives more clues to a beginner.
UPDATE: Since the time of this post, Java Mission Control had become free. JMC has most of the benefits of VisualVM and also isn't influenced by safepoints.
Async-profiler, on the other hand, is more of a low-level "hacker" tool. It doesn't put the profiling data into a walled garden of incompatible binary formats. It can be started rapidly from within the application. And once you familiarize yourself with flame graphs, reading the profiles for you will be much faster than VisualVM's UI.
For me, async-profiler has become a go-to ad hoc profiler. But I continue to use VisualVM for reasons mentioned above, and as a second pair of "eyes" to look at the performance picture. I hope this post has expanded your knowledge about profiling and introduced you to another useful tool in your arsenal.
Let it burn!