Updated on 2023-06-09: revised numbers, added section about time+.
Runnable code for this post can be found here.
The most basic question a young performance acolyte may ask is "How long does it take to run my code?" Clojure is capable of answering such questions out of the box with the time macro. time takes a single expression and executes it. When the expression finishes running, it prints the time it took in milliseconds.
Let's see how quickly Clojure can calculate the mean value of a sequence (using a terribly inefficient algorithm). A wrong way to measure this would be:
user=> (time (reduce + (map #(/ % 100.0) (range 100))))
"Elapsed time: 0.252375 msecs"
49.5
Looks like a hundred sums and divisions take roughly 250 microseconds. But this is very far from the truth. Unless the complete execution takes a few seconds, the results of time will include many things besides your actual expression.
First of all, it includes the overall execution overhead. If the running time of the expression you measure is small, that overhead will make up the majority of the time spent. For example, on my machine, just (time nil) finishes in approximately 15 microseconds.
The second inaccuracy comes from the JVM itself, specifically from its JIT compilation. Since you've just entered the code in the REPL, this is the first time the JVM has seen it. The JVM doesn't know yet whether this code will be executed again, so the code gets interpreted. Interpretation is slower than running compiled code, but JIT-compiling an expression means paying some cost upfront, and the JVM wants to be sure that the investment will pay off. Only after the same code has run a certain number of times does the JVM decide it has seen enough, JIT-compile it, and run the compiled version from there on.
The factors mentioned above lead us to the conclusion that we should run the expression inside time multiple times to get a more meaningful result:
user=> (time (dotimes [_ 1e6] (reduce + (map #(/ % 100.0) (range 100)))))
"Elapsed time: 1133.6695 msecs"
nil
If we divide the resulting time by 10⁶, we get a rough per-iteration time of 1.1 microseconds. Two hundred times less than in the previous measurement!
Using this simple combo of time and dotimes, you can quite accurately estimate how long your functions take to run. However, you have to divide the final time by the number of iterations, remember to do a warm-up run for the JIT to trigger, and do a few more runs to account for the deviation.
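For illustration, here is a sketch of that manual routine (the helper name mean-of-range is made up for this example): warm the code up first, then take several timed runs and do the division yourself.
(defn mean-of-range []
  (reduce + (map #(/ % 100.0) (range 100))))
;; Warm-up runs: give the JIT a chance to compile the hot code.
(dotimes [_ 3]
  (dotimes [_ 1e6] (mean-of-range)))
;; Measured runs: repeat a few times to gauge the deviation, and divide
;; each reported time by 1e6 to get the per-call cost.
(dotimes [_ 3]
  (time (dotimes [_ 1e6] (mean-of-range))))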
As a remedy to the two inconveniences of time/dotimes (guessing an adequate number of iterations and having to divide the final time by it), you can use the macro I called time+. It is a simple paste-and-go macro that you can include in your system-wide Clojure profile and then use in any project that you start locally.
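The full macro is in the linked profile; below is a simplified sketch of the same idea, deliberately named poor-mans-time+ because it is not the real implementation (it omits, among other things, the allocation tracking that time+ reports). It keeps re-running the body until the time budget is spent and then prints the average time per call.
;; A simplified sketch of the idea behind time+, not the actual macro.
(defmacro poor-mans-time+
  ([expr] `(poor-mans-time+ 2000 ~expr))
  ([duration-ms expr]
   `(let [start#    (System/nanoTime)
          deadline# (+ start# (* ~duration-ms 1000000))]
      (loop [n# 0]
        (if (< (System/nanoTime) deadline#)
          (do ~expr (recur (inc n#)))
          (println (format "Time per call: %.2f us  Iterations: %d"
                           (/ (- (System/nanoTime) start#) n# 1000.0)
                           n#)))))))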
Using time+ is identical to using time, except that the repetition is performed automatically:
user=> (time+ (reduce + (map #(/ % 100.0) (range 100))))
Time per call: 1.78 us Alloc per call: 6,320b Iterations: 1122541
nil
After running the code for 2 seconds, time+ prints the time and allocated bytes per iteration, and the number of iterations. You can specify the total time in milliseconds as the first argument to the macro:
user=> (time+ 5000 (Thread/sleep 10))
Time per call: 12.26 ms Alloc per call: 1b Iterations: 430
nil
Criterium is an advanced but easy-to-use benchmarking tool for Clojure. It is designed as a robust replacement for time that takes into account some common benchmarking pitfalls. To use Criterium, add this to your dependencies:
[criterium "0.4.6"] ; As of the time of writing
And require it in the namespace where you want to measure:
user=> (require '[criterium.core :as criterium])
If you find yourself using Criterium often, you ought to add it to your system-wide development profile in Leiningen or tools.deps.
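For example, with Leiningen that means the :user profile, and with tools.deps an alias in the user-level deps.edn (the :bench alias name below is arbitrary):
;; ~/.lein/profiles.clj
{:user {:dependencies [[criterium "0.4.6"]]}}
;; ~/.clojure/deps.edn (launch with `clj -A:bench`)
{:aliases {:bench {:extra-deps {criterium/criterium {:mvn/version "0.4.6"}}}}}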
quick-bench is the fastest way to benchmark an expression:
user=> (criterium/quick-bench (reduce + (map #(/ % 100.0) (range 100))))
Evaluation count : 450582 in 6 samples of 75097 calls.
Execution time mean : 1.469145 µs
Execution time std-deviation : 332.754413 ns
Execution time lower quantile : 1.317431 µs ( 2.5%)
Execution time upper quantile : 2.044992 µs (97.5%)
Overhead used : 1.891462 ns
That "fastest way" surely took a while. Criterium spent some time warming up the JIT, and then ran our expression 450582 times, and the average execution time was 1.32 microseconds. That's approximately what we figured with our time+dotimes approach.
Let's check how precise the JVM is when going to sleep for short timespans:
user=> (criterium/quick-bench (Thread/sleep 10))
Evaluation count : 48 in 6 samples of 8 calls.
Execution time mean : 12.353675 ms
Execution time std-deviation : 258.636937 µs
Execution time lower quantile : 11.857332 ms ( 2.5%)
Execution time upper quantile : 12.521358 ms (97.5%)
Overhead used : 1.891462 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
By now you may have noticed yet another nice feature of Criterium — it automatically adjusts the benchmark run time according to the execution time of the measured expression. In other words, if the expression is fast, Criterium will run it plenty of times, but if a single iteration is quite slow, it will be executed fewer times so that you don't get too bored waiting.
The bench macro is claimed to be more accurate than quick-bench, but in practice, it runs for much longer and doesn't yield significantly different results most of the time. You should consider using it only when you need to evaluate something tiny and volatile. Criterium also has many other facilities for controlling the output and customizing the benchmarking process; you should check the API documentation to learn what else is in there.
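As a small taste, here are two such facilities (both with-progress-reporting and the :verbose option exist in criterium.core, though the exact output is version-dependent):
;; Log the warm-up and measurement phases while benchmarking, and
;; print extended statistics at the end.
(criterium/with-progress-reporting
  (criterium/quick-bench
   (reduce + (map #(/ % 100.0) (range 100)))
   :verbose))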
You will get wrong benchmarking results if the evaluated expression contains unrealized laziness. Criterium is also susceptible to this:
user=> (criterium/quick-bench (map #(Math/tan %) (range 1000)))
Evaluation count : 48000000 in 6 samples of 8000000 calls.
Execution time mean : 9.986185 ns
Execution time std-deviation : 1.408634 ns
Execution time lower quantile : 8.914700 ns ( 2.5%)
Execution time upper quantile : 11.779724 ns (97.5%)
Overhead used : 1.891462 ns
user=> (criterium/quick-bench (doall (map #(Math/tan %) (range 1000))))
Evaluation count : 5844 in 6 samples of 974 calls.
Execution time mean : 102.158922 µs
Execution time std-deviation : 1.315074 µs
Execution time lower quantile : 100.751104 µs ( 2.5%)
Execution time upper quantile : 104.011188 µs (97.5%)
Overhead used : 1.891462 ns
10 ns versus 102 µs (the latter is correct): quite a difference! Also, keep in mind that doall/dorun only realize the top level of a lazy sequence and do not traverse nested data structures. Let's look at this slightly convoluted example (remember that repeatedly returns a lazy sequence):
(defn full-lazy []
  (repeatedly 10 (fn [] (map #(Math/tan %) (range 1000)))))

(defn outer-doall []
  (doall
   (repeatedly 10 (fn [] (map #(Math/tan %) (range 1000))))))

(defn full-eager []
  (doall
   (repeatedly 10 (fn [] (doall
                           (map #(Math/tan %) (range 1000)))))))
(doseq [f [#'full-lazy #'outer-doall #'full-eager]]
  (let [result (criterium/quick-benchmark* f {})]
    (criterium/report-point-estimate (str f) (:mean result))))
; #'demo/full-lazy : 9.938479 ns
; #'demo/outer-doall : 590.588240 ns
; #'demo/full-eager : 1.083767 ms
As you can see, it is not enough to force only the top-level evaluation; you must make sure that no lazy sequences hide inside.
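One way to sidestep this class of mistakes entirely, as a sketch: build the result from eager constructs in the first place, such as mapv and vec, so there is nothing lazy left to forget about.
;; Eager from the start: vectors are realized on construction, so no
;; hidden lazy seqs can distort the benchmark.
(defn full-eager-vec []
  (vec (repeatedly 10 (fn [] (mapv #(Math/tan %) (range 1000))))))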
In the benchmarking example above, we demonstrated one of Clojure's main powers: its hackability. We are not only able to benchmark Clojure code but also to use other Clojure code to facilitate benchmarking. Instead of running all three benchmarks manually and then visually tracking the relevant output, we used Criterium's quick-benchmark* function (which returns the result as data) and molded the output into what we needed.
Criterium is a convenient tool to approximate the performance of your code. It might seem that it doesn't offer much over time, but Criterium actually produces more reliable results. Still, it shouldn't be the only tool in your performance arsenal; there are many others that we will discuss very soon.