07 August 2014
I was watching a video by Aaron Patterson, where he highlighted a few Ruby performance tuning tools. I thought it would be useful to write a quick post to gather them all together in the same place.
The classic tool for testing whether one block of code is faster than another in Ruby is the Benchmark module. It comes built in, and is pretty simple to use:
require 'benchmark'
h = { foo: 1, bar: 1, baz: 1, foobar: 1 }
a = [:foo, :bar, :baz, :foobar]
n = 50000
Benchmark.bm(6) do |x|
x.report('hash:') { n.times do; h.include?(:foobar); end }
x.report('array:') { n.times do; a.include?(:foobar); end }
end
$ ruby benchmark.rb
user system total real
hash: 0.460000 0.000000 0.460000 ( 0.459387)
array: 1.040000 0.010000 1.050000 ( 1.049521)
Benchmark does an OK job, but the problem with it is that you need to guess how many iterations to run the code block so that it accrues meaningful time, and doesn't report the result as zero.
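To see the problem, time too few iterations and every column rounds to (near) zero. A minimal sketch:

```ruby
require 'benchmark'

# With only a handful of iterations the work finishes in well under
# a millisecond, so the reported times are effectively zero and tell
# you nothing about which approach is faster.
h = { foo: 1, bar: 1, baz: 1, foobar: 1 }
t = Benchmark.measure { 10.times { h.include?(:foobar) } }
puts t
```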
Often, a more interesting metric is iterations per second, rather than the wall clock time of one method versus another. For this the benchmark-ips gem is useful. It figures out how many iterations to run the code so that the results are meaningful, and also calculates the standard deviation of the results.
require 'benchmark/ips'
h = { foo: 1, bar: 1, baz: 1, foobar: 1 }
a = [:foo, :bar, :baz, :foobar]
Benchmark.ips do |x|
x.report("hash") { h.include?(:foobar) }
x.report("array") { a.include?(:foobar) }
x.compare!
end
$ ruby benchmark_ips.rb
Calculating -------------------------------------
hash 24983 i/100ms
array 24084 i/100ms
-------------------------------------------------
hash 6430567.4 (±8.8%) i/s - 31378648 in 4.937493s
array 3636150.6 (±4.0%) i/s - 18014832 in 4.963605s
Comparison:
hash: 6430567.4 i/s
array: 3636150.6 i/s - 1.77x slower
Stackprof is a sampling call-stack profiler for Ruby 2.1 and above. It samples the call stack at regular, tunable intervals, and on each sample records the method executing at that moment.
By doing this, a slow method will be encountered by the sampler more often than a fast one, and hence will be reported as consuming a bigger percentage of the time.
This technique is not perfect, as it could miss some very fast method calls completely, but generally when profiling you are only concerned with slow methods anyway.
require 'stackprof'
def slow_method
sleep(0.1)
end
def fast_method
end
StackProf.run(mode: :cpu, out: 'stackprof-output.dump') do
100.times do
slow_method
fast_method
end
end
The captured call data is written to the stackprof-output.dump file, which can be analyzed with the stackprof command line tool that is installed with the gem:
$ stackprof stackprof-output.dump
==================================
Mode: cpu(1000)
Samples: 14 (0.00% miss rate)
GC: 0 (0.00%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
14 (100.0%) 14 (100.0%) Object#slow_method
14 (100.0%) 0 (0.0%) block in <main>
14 (100.0%) 0 (0.0%) block (2 levels) in <main>
14 (100.0%) 0 (0.0%) <main>
This example is a bit contrived - something more useful would be to profile some requests in a Rails application to see how things look.
To use Stackprof in a Rails application, you simply insert it as an extra piece of middleware, by adding the following to application.rb:
config.middleware.insert_before(Rack::Sendfile, StackProf::Middleware, enabled: true, mode: :cpu, interval: 1000, save_every: 5)
The save_every parameter tells Stackprof to save the collected samples to disk after every save_every requests.
When you run the Rails server, it writes Stackprof trace files into your application's tmp directory, which can be analyzed as before:
$ stackprof stackprof-cpu-27132-1407514856.dump
==================================
Mode: cpu(1000)
Samples: 26 (0.00% miss rate)
GC: 1 (3.85%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
3 (11.5%) 3 (11.5%) ActiveSupport::Subscriber#start
3 (11.5%) 3 (11.5%) block in ActionView::PathResolver#find_template_paths
2 (7.7%) 2 (7.7%) block in Rack::Utils::KeySpaceConstrainedParams#to_params_hash
1 (3.8%) 1 (3.8%) block (4 levels) in Class#class_attribute
1 (3.8%) 1 (3.8%) block in ActiveRecord::ConnectionAdapters::AbstractAdapter#lease
1 (3.8%) 1 (3.8%) ThreadSafe::NonConcurrentCacheBackend#[]
1 (3.8%) 1 (3.8%) ActiveSupport::TaggedLogging::Formatter#pop_tags
1 (3.8%) 1 (3.8%) Rack::BodyProxy#initialize
1 (3.8%) 1 (3.8%) ActionView::PathResolver#extract_handler_and_format_and_variant
1 (3.8%) 1 (3.8%) ActionView::Resolver::Path#to_str
1 (3.8%) 1 (3.8%) ActionDispatch::Journey::Route#matches?
1 (3.8%) 1 (3.8%) ActiveSupport::Duration.===
1 (3.8%) 1 (3.8%) Logger#format_severity
1 (3.8%) 1 (3.8%) ActiveSupport::Configurable::ClassMethods#config
1 (3.8%) 1 (3.8%) ActiveSupport::Inflector#underscore
1 (3.8%) 1 (3.8%) ActiveSupport::FileUpdateChecker#max_mtime
1 (3.8%) 1 (3.8%) Hash#extractable_options?
9 (34.6%) 1 (3.8%) Benchmark#realtime
1 (3.8%) 1 (3.8%) ActiveSupport::TaggedLogging::Formatter#current_tags
25 (96.2%) 1 (3.8%) Rack::Runtime#call
2 (7.7%) 0 (0.0%) Logger#add
18 (69.2%) 0 (0.0%) ActionDispatch::Journey::Router#call
18 (69.2%) 0 (0.0%) ActionDispatch::Routing::RouteSet#call
18 (69.2%) 0 (0.0%) Rack::ETag#call
18 (69.2%) 0 (0.0%) Rack::ConditionalGet#call
18 (69.2%) 0 (0.0%) Rack::Head#call
18 (69.2%) 0 (0.0%) ActionDispatch::ParamsParser#call
18 (69.2%) 0 (0.0%) ActionDispatch::Flash#call
18 (69.2%) 0 (0.0%) Rack::Session::Abstract::ID#context
18 (69.2%) 0 (0.0%) Rack::Session::Abstract::ID#call
Often, a piece of code that allocates fewer objects is more efficient: less memory needs to be allocated, and the garbage collector runs less often. The allocation_tracer gem lets you see how many objects of each type your application creates, and also the file and line number where they were created.
require 'allocation_tracer'
require 'pp'
ObjectSpace::AllocationTracer.setup(%i{path line type})
result = ObjectSpace::AllocationTracer.trace do
str = 'hello '
10_000.times{|i|
str << 'hello '
}
str = 'hello '
10_000.times{|i|
str = str + 'hello '
}
end
pp result
The results show that using << to concatenate a string produces far fewer objects than the + method: << appends to the string in place, while + allocates a new object each time:
{["allocation.rb", 13, :T_STRING]=>[20000, 30, 19463, 0, 3, 282839776],
["allocation.rb", 9, :T_STRING]=>[10000, 0, 10000, 1, 1, 0],
["allocation.rb", 7, :T_STRING]=>[1, 0, 1, 1, 1, 102399],
["allocation.rb", 11, :T_STRING]=>[1, 1, 10, 10, 10, 0]}
If you have a large application, you can aggregate the counts at a higher level, by grouping by object type and ignoring the file and line number. Out of interest, I created a simple piece of Rails middleware to see how many objects Rails allocates on each request:
require 'pp'
class Allocation
def initialize(app)
ObjectSpace::AllocationTracer.setup(%i{type})
@app = app
end
def call(env)
res = nil
result = ObjectSpace::AllocationTracer.trace do
res = @app.call(env)
end
pp result
res
end
end
Notice that I asked AllocationTracer to capture only the type of the object - adding the file or line number for a Rails application produced way too much output. This produced the following output for a single Rails request:
Started GET "/about" for 127.0.0.1 at 2014-08-10 13:23:33 +0100
Processing by AboutController#index as HTML
Rendered about/index.html.erb within layouts/application (0.1ms)
Completed 200 OK in 16ms (Views: 15.5ms)
{[:T_STRING]=>[2728, 0, 0, 0, 0, 0],
[:T_NODE]=>[425, 0, 0, 0, 0, 0],
[:T_ARRAY]=>[989, 0, 0, 0, 0, 0],
[:T_OBJECT]=>[53, 0, 0, 0, 0, 0],
[:T_HASH]=>[367, 0, 0, 0, 0, 0],
[:T_DATA]=>[416, 0, 0, 0, 0, 0],
[:T_MATCH]=>[67, 0, 0, 0, 0, 0],
[:T_REGEXP]=>[6, 0, 0, 0, 0, 0],
[:T_STRUCT]=>[2, 0, 0, 0, 0, 0],
[:T_FILE]=>[2, 0, 0, 0, 0, 0]}
TracePoint is an interesting built-in library that allows you to trace many kinds of events as they occur in your Ruby application. This has many potential uses. For instance, in a large application it can be difficult to find all the places a certain class is instantiated. With TracePoint, you can capture the file and line number of every reference to a given class, by watching for events against the class you are interested in and then dumping the call stack. For instance this code (taken from Aaron's talk):
require 'active_support/all'
trace = TracePoint.new(:c_call, :call) { |tp|
if tp.defined_class == ActiveSupport::SafeBuffer &&
tp.method_id == :initialize
puts "#" * 90
puts tp.binding.eval "caller"
end
}
trace.enable
"balblablalbal".html_safe
ActiveSupport::SafeBuffer.new "omgee"
This produces the following output:
##########################################################################################
tracer.rb:7:in `eval'
tracer.rb:7:in `block in <main>'
/Users/sodonnel/.rvm/gems/ruby-2.1.1/gems/activesupport-4.1.4/lib/active_support/core_ext/string/output_safety.rb:158:in `initialize'
/Users/sodonnel/.rvm/gems/ruby-2.1.1/gems/activesupport-4.1.4/lib/active_support/core_ext/string/output_safety.rb:237:in `new'
/Users/sodonnel/.rvm/gems/ruby-2.1.1/gems/activesupport-4.1.4/lib/active_support/core_ext/string/output_safety.rb:237:in `html_safe'
tracer.rb:11:in `<main>'
##########################################################################################
tracer.rb:7:in `eval'
tracer.rb:7:in `block in <main>'
/Users/sodonnel/.rvm/gems/ruby-2.1.1/gems/activesupport-4.1.4/lib/active_support/core_ext/string/output_safety.rb:158:in `initialize'
tracer.rb:12:in `new'
tracer.rb:12:in `<main>'
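The same pattern works outside Rails too. A minimal sketch against a plain class (Widget is a made-up name for illustration); note it uses Kernel#caller inside the block rather than tp.binding, which is not available for :c_call events on newer Rubies:

```ruby
# Watch for instantiations of one specific class and record where
# each one came from. Class#new fires :c_call (or :call if it is
# implemented in Ruby), so we subscribe to both event types.
class Widget; end

locations = []
trace = TracePoint.new(:c_call, :call) do |tp|
  if tp.method_id == :new && tp.self == Widget
    locations << caller.first # the call site of Widget.new
  end
end

trace.enable
Widget.new
trace.disable

puts locations
```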