The first stable release of clj-async-profiler marks a major milestone for this project. Almost five years have passed since its creation, and during that time I used clj-async-profiler almost daily to make the programs I write faster and better. New challenges inspired new ideas for improving the profiler, which in turn materialized as features such as diffgraphs, stack transforms, demunging, etc. Today, I released clj-async-profiler 1.0.0, which comes with more goodies that will hopefully improve your experience of using the profiler. In this post, I want to highlight these changes and show you how to make use of them in your workflow.
Here's a brief list of changes before we dive deeper into each of them:
Let's work through a non-trivial example to see the changes in action and learn how to use the new features. I've generated a new HTML flamegraph for a sample client-server application. Here is what you will see when you open it:
The first thing you'll notice is how fast it opens. That comes from the reduced file size and much snappier rendering. Click around the graph to see how responsive it has become (if you want to feel the improvement, compare it with the SVG version of the same flamegraph, which doesn't even contain all the frames!).
The second big difference is the addition of that barebones panel on the right. It is collapsible, so you don't have to stare at it all the time. This panel (its looks might improve later) is the vessel for new and exciting ways to interact with the flamegraph. First, at the top, is the Highlight section, which is where the existing Search functionality has moved. You can write the query as a plain string or as a /regex/. Next is the field for setting the minimal frame width. It is an optimization for very large flamegraphs: hairbreadth frames (those with very few samples) are not very helpful, and skipping them improves rendering speed even further.
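To make the notion of frame width concrete, here is a minimal Python sketch (not the profiler's code) that computes each frame's width as the total samples of all stacks passing through it and hides frames below a threshold. It assumes the threshold is a fraction of all samples; the unit the real UI field uses may differ, and the stack names are made up.

```python
from collections import Counter


def frame_widths(stacks):
    """Map every frame path (root..frame) to the samples of all stacks passing through it."""
    widths = Counter()
    for stack, count in stacks.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            widths[tuple(frames[:depth])] += count
    return widths


def visible_frames(stacks, min_width):
    """Keep frame paths whose share of all samples is at least min_width (assumed 0..1)."""
    total = sum(stacks.values())
    return {path for path, width in frame_widths(stacks).items()
            if width / total >= min_width}


stacks = {"main;work;parse": 90, "main;work;log": 9, "main;gc": 1}
print(sorted(visible_frames(stacks, 0.05)))
```

Here main;gc holds only 1% of the samples, so it is skipped at a threshold of 0.05. Because a child frame is never wider than its parent, hiding a frame also hides its whole subtree.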
Further down is the Reversed flag. Gone are the days when you had to regenerate the whole flamegraph to see the so-called "icicles" graph! Right below is a toggle between sorting the frames by their name or width. You can experiment with these options by changing them and clicking Apply.
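Both options work on stacks that are already in the file, which fits with them being plain toggles now. As a rough Python sketch (the helper names are hypothetical, not the profiler's implementation), reversing amounts to flipping the frame order of every stack, while sorting only changes the order of sibling frames (shown here for the root level only):

```python
from collections import Counter


def reverse_stacks(stacks):
    """Flip every stack so the leaf frame becomes the root (an "icicle" view).
    Sample counts are unchanged; only the frame order is reversed."""
    return {";".join(reversed(stack.split(";"))): count
            for stack, count in stacks.items()}


def root_frames(stacks, sort_by="width"):
    """List root frames sorted by name, or by width (widest first, ties by name)."""
    widths = Counter()
    for stack, count in stacks.items():
        widths[stack.split(";")[0]] += count
    if sort_by == "name":
        return sorted(widths)
    return sorted(widths, key=lambda frame: (-widths[frame], frame))


stacks = {"main;a;leaf": 3, "main;b;leaf": 2, "main;b;other": 6}
icicle = reverse_stacks(stacks)
print(root_frames(icicle), root_frames(icicle, sort_by="name"))
```

After reversal, leaf (5 samples) and other (6 samples) become the roots, so sorting by width puts other first and sorting by name puts leaf first.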
The final section is the meat of the new flamegraph functionality: it lets you transform the stacks in the flamegraph. You can add any number of transforms, each of which is one of three types: Filter, Remove, or Replace, with the last being the most useful. To understand transforms, it is important to remember how stacks in clj-async-profiler are represented:
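In the collapsed-stacks format that clj-async-profiler works with, each stack is a single string of frames from the root to the leaf, joined by semicolons, followed by a space and its sample count. A minimal Python sketch that parses such text into a stack-to-samples map (the frame names are made up for illustration):

```python
def parse_collapsed(text):
    """Parse collapsed stacks: one 'frame1;frame2;...;frameN samples' entry per line."""
    stacks = {}
    for line in text.strip().splitlines():
        stack, _, count = line.rpartition(" ")  # the count follows the last space
        stacks[stack] = stacks.get(stack, 0) + int(count)
    return stacks


sample = """
java.lang.Thread.run;example.server/handle;cheshire.core/generate-string 40
java.lang.Thread.run;example.server/handle;example.db/query 25
"""
stacks = parse_collapsed(sample)
for stack, count in stacks.items():
    print(count, stack.split(";"))
```

Because a whole stack is one string, a transform is simply a string or regex operation applied to each stack.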
A regex-based Replace transform can do more than rename individual frames: it can also elide frames or even introduce new ones (if that's ever needed). But regardless of what replacement you make, the total number of samples stays the same. You don't lose any data; you just redistribute it to other stacks. Let's start with an easier use case. Say that, in our flamegraph, we don't really care about what Cheshire is doing under the hood, since we won't be able to optimize that anytime soon. So we can add a Replace transform that replaces /cheshire.core.+/ (the slashes signify that it is a regular expression and not just a string) with cheshire/... . It means: in any stack that contains cheshire.core, replace it and everything after it with cheshire/..., so the whole Cheshire subtree collapses into a single frame.
If you try this exercise on your own, you will notice that the total number of samples in the Cheshire subtree stays the same after the transform.
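Here is a small Python sketch of why that holds (a hypothetical helper, not the profiler's code, with made-up frame names): a Replace rewrites each stack string, and stacks that become identical simply have their sample counts added together.

```python
import re
from collections import Counter


def replace_transform(stacks, pattern, replacement):
    """Apply a regex Replace to every stack, merging stacks that become identical.
    Samples are moved between stacks, never dropped, so the total is preserved."""
    result = Counter()
    for stack, count in stacks.items():
        result[re.sub(pattern, replacement, stack)] += count
    return dict(result)


stacks = {
    "example.server/handle;cheshire.core/generate-string;cheshire.generate/generate": 30,
    "example.server/handle;cheshire.core/generate-string;cheshire.generate/generate-basic-map": 12,
    "example.server/handle;example.db/query": 25,
}
# Same pattern as in the text; the unescaped dot matches any character.
collapsed = replace_transform(stacks, r"cheshire.core.+", "cheshire/...")
print(collapsed)
```

The two Cheshire stacks merge into a single example.server/handle;cheshire/... stack with 42 samples, and the total of 67 samples is unchanged.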
Here is another use case, a more complicated one this time. On the right side, you should see a tall subgraph of calls to two example.client/rand-json-… functions. These functions are recursive, even mutually recursive, which makes interpreting their profiling output very cumbersome. With a transform, we can collapse all recursive calls into just one frame so that the real work performed within those functions sticks together and becomes more prominent. It is achievable with the following Replace:
In the regex to replace, we say that we are looking for a stack frame that looks like example.client/rand-json- followed by some word. Any number of other frames can follow this frame, but at some point another frame with example.client/rand-json- must appear. We capture that second frame in a match group (with parentheses) and replace the whole matched substack with just that last frame. As a result, the complicated recursive tree flattens down, and it is now much easier to see what is most expensive within those functions.
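Below is a Python sketch of a pattern that follows this description. The pattern and the frame names rand-json-map and rand-json-array are hypothetical, and Python's \1 group reference may not be the syntax the UI expects:

```python
import re

# Hypothetical pattern following the description: a rand-json-* frame, any
# frames after it, then a later rand-json-* frame captured in group 1.
# The greedy .* makes group 1 the *last* such frame in the stack.
RECURSION = r"example\.client/rand-json-\w+;.*(example\.client/rand-json-\w+)"


def collapse_recursion(stack):
    """Replace the whole recursive substack with its last rand-json-* frame."""
    return re.sub(RECURSION, r"\1", stack)


stack = ("java.lang.Thread.run;example.client/rand-json-map;clojure.core/mapv;"
         "example.client/rand-json-array;clojure.core/repeatedly;"
         "example.client/rand-json-map;clojure.core/rand-int")
print(collapse_recursion(stack))
```

Frames below the last rand-json-* frame (clojure.core/rand-int here) are kept. The work done at every recursion level therefore ends up under a single frame, and a stack with only one rand-json-* frame is left unchanged.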
The other two transforms, Filter and Remove, are mirror images of each other. Filter retains only those stacks that match the given substring or regex, and Remove does the opposite: it removes the stacks that match. For example, you might want to remove the leftmost part of the graph, which starts with the frame [unknown_Java]. Such stacks appear when the profiler cannot accurately map the perf events to what's going on within the JVM. To do that, add a Remove transform with the string [unknown_Java].
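A minimal Python sketch of the two transforms (hypothetical helpers and frame names). It assumes a plain string matches as a substring and a regex matches anywhere in the stack. Unlike Replace, these two change the total, because whole stacks are dropped:

```python
import re


def matches(stack, what):
    """Plain strings match as substrings; compiled regexes match anywhere via search()."""
    if isinstance(what, re.Pattern):
        return what.search(stack) is not None
    return what in stack


def filter_stacks(stacks, what):
    """Filter: keep only the stacks that match."""
    return {s: c for s, c in stacks.items() if matches(s, what)}


def remove_stacks(stacks, what):
    """Remove: drop the stacks that match."""
    return {s: c for s, c in stacks.items() if not matches(s, what)}


stacks = {
    "[unknown_Java];Interpreter": 7,
    "java.lang.Thread.run;example.server/handle": 20,
    "java.lang.Thread.run;cheshire.core/generate-string": 5,
}
print(remove_stacks(stacks, "[unknown_Java]"))
print(filter_stacks(stacks, re.compile(r"example\.server/")))
```

Passing [unknown_Java] as a plain string matters here: as a regex, the square brackets would form a character class rather than match literally.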
By adding a few more transforms (for example, collapsing consecutive manifold frames), our flamegraph could end up looking like this:
It might not be immediately obvious how these transforms make it easier to read the profiler results. Writing such transforms is a learned skill, and it may take a while to start appreciating what they bring you. Good command of regular expressions also helps.
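As an exercise in that skill, here is a Python sketch of one of the extra transforms mentioned above: collapsing consecutive manifold frames. The pattern and frame names are illustrative assumptions, not the exact transform used in the example flamegraph:

```python
import re

# Hypothetical pattern: a run of two or more consecutive frames starting with
# "manifold." becomes one "manifold/..." frame. Group 1 keeps the preceding ";"
# (or the empty start of the stack) so frame boundaries stay intact.
MANIFOLD_RUN = r"(^|;)manifold\.[^;]*(?:;manifold\.[^;]*)+"


def collapse_manifold(stack):
    return re.sub(MANIFOLD_RUN, r"\1manifold/...", stack)


stack = "a;manifold.deferred/chain;manifold.deferred/success!;manifold.stream/put;b;manifold.x/y;c"
print(collapse_manifold(stack))
```

Using [^;]* instead of .* keeps each match inside a single frame. As a result, a lone manifold frame such as manifold.x/y is left alone, and only runs of two or more are merged.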
Diffgraphs are now rendered in HTML as well, so everything above applies to them too. In diffgraphs, Filter and Remove transforms can be even more beneficial than in regular flamegraphs. You can also dynamically switch between normalized and non-normalized diffing within the flamegraph[1].
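To illustrate what the normalization switch changes, here is one common way to define the two modes, sketched in Python over collapsed stacks. Whether clj-async-profiler scales the second profile exactly this way is an assumption:

```python
def diff(before, after, normalize=True):
    """Per-stack change in samples from `before` to `after`.
    Assumption: normalizing rescales `after` to the total of `before`, so the
    diff shows changes in each stack's share of the profile, not raw counts."""
    scale = sum(before.values()) / sum(after.values()) if normalize else 1.0
    return {stack: after.get(stack, 0) * scale - before.get(stack, 0)
            for stack in set(before) | set(after)}


before = {"main;parse": 60, "main;render": 40}
after = {"main;parse": 60, "main;render": 140}
print(diff(before, after, normalize=False))
print(diff(before, after))
```

In this made-up example, the raw diff says parse did not change and render grew by 100 samples. The normalized diff instead shows parse's share of the profile shrinking while render's share grew.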
The main breaking change of the new release is that SVG generation is no longer supported. I assume it won't matter much to most users, unless you specifically relied on the profiler's output being an SVG file[2]. In that case, you'll have to stay on the previous version of clj-async-profiler; SVG generation may come back as an option in the future if there is enough user interest.
Since transforms are now dynamic, the old :transform option that could be passed to the profiler's façade functions is no longer essential. It is still available and supported, though: it accepts arbitrary code and can potentially do much more than a simple filter, remove, or replace, and somebody might still need that. Another option was added called
accepts a sequence of maps that look like this:
This option allows you to bake common transforms directly into the HTML. Once baked in, those transforms can still be modified or disabled. This is convenient for applications that you profile often and to which you repeatedly apply the same transforms; predefining them saves you a bit of time. As an example, here is a flamegraph with all the transforms mentioned above already defined.
The HTTP server that powers the serve-files UI has been rewritten. It is still embedded: clj-async-profiler continues to be a proud member of the Zero Dependencies movement. The rewrite brings some basic features, such as cache headers for the served files. Otherwise, the UI has stayed the same for now, but I have more ideas for it.
As always, you can find the exhaustive list of changes in the CHANGELOG.
And that's pretty much it. Personally, I'm very excited to finally deliver this release to you. I hope you find it as useful as I do, and feel free to leave any comments, requests, and bug reports on GitHub or here.