clj-async-profiler is capable of profiling not only time spent on the CPU but other events as well. Here we'll learn how to use it for allocation profiling. This means finding out which parts of your program allocate the most objects (measured in total size).
The way allocation profiling is implemented in the underlying library — async-profiler — is very efficient and non-intrusive. It might not be as accurate as heavyweight memory profiling mechanisms available in other profilers, but in practice, it is good enough at pointing at the biggest allocation hogs in your code with almost zero costs for the runtime. Similar to regular CPU profiling, allocation profiling with clj-async-profiler is safe even on highly loaded production servers.
Let's see allocation profiling in action. First, download this file twitter.json, and we'll try to parse it with Cheshire and understand where the allocations happen.
user=> (require '[cheshire.core :as json]
                '[clj-async-profiler.core :as prof])
user=> (prof/profile
        {:event :alloc}
        (dotimes [_ 500]
          (json/decode (slurp "twitter.json"))))
Visually, an allocation flamegraph is almost identical to a regular CPU
flamegraph. What is different is that at the top of every stack, there is a
special frame (or multiple frames) that denotes class of objects that were
allocated there[1]. You can see in the flamegraph
above that Cheshire is responsible for 54% of the allocation volume; meanwhile,
slurp makes 46% of the allocations. Let's hoist slurp outside the loop and
profile just the parsing.
user=> (let [s (slurp "twitter.json")]
         (prof/profile
          {:event :alloc}
          (dotimes [_ 1000]
            (json/decode s))))
You'll also notice that the function parse* is recursive, which makes the
result less clear. Fortunately, we've already learned how to simplify
flamegraphs that contain recursive functions with live
transforms.
Let's add one of those to this flamegraph:
Now Cheshire is a sole allocator, as expected, and its allocations consist of creating internal Jackson objects and modifying transient maps.
Note that just like with CPU profiling, allocation profiling cannot measure the absolute numbers of your program's allocation pressure and whether it's too much. It only tells you which parts of the code allocate more than the others in comparison. To get the exact allocation rate, you should use separate tools, e.g., jvm-alloc-rate-meter.
[i] for inlined) or native code ([k]
for kernel). This may not be important to you initially, so feel free to
ignore it.↑