07 August 2014

Ruby Performance Analysis Tools

I was watching a video by Aaron Patterson, where he highlighted a few Ruby performance tuning tools. I thought it would be useful to write a quick post to gather them all together in the same place.

Benchmark

The classic tool for testing whether one block of code is faster than another in Ruby is the Benchmark module. It comes built in, and is pretty simple to use:

require 'benchmark'

h = { foo: 1, bar: 1, baz: 1, foobar: 1 }
a = [:foo, :bar, :baz, :foobar]

n = 50000
Benchmark.bm(6) do |x|
  x.report('hash:')  { n.times do; h.include?(:foobar); end }
  x.report('array:') { n.times do; a.include?(:foobar); end }
end
$ ruby benchmark.rb
             user     system      total        real
hash:    0.460000   0.000000   0.460000 (  0.459387)
array:   1.040000   0.010000   1.050000 (  1.049521)

Benchmark does an OK job, but the problem with it is that you need to guess how many iterations to run the code block for, so that it accrues a meaningful amount of time and the result isn't reported as zero.
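
For example, with only a handful of iterations the block completes faster than the timer can meaningfully measure, so the CPU time columns just show 0.000000 and there is nothing to compare. A small sketch of the problem, using the same hash as above:

require 'benchmark'

h = { foo: 1, bar: 1, baz: 1, foobar: 1 }

# Far too few iterations - the block completes in a few microseconds,
# so the reported user/system times round down to zero.
Benchmark.bm(6) do |x|
  x.report('hash:') { 100.times { h.include?(:foobar) } }
end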

Benchmark/IPS

Often, a more interesting metric is Iterations Per Second, rather than the wall clock time of one method versus another. For this, the benchmark-ips gem is useful. It figures out how many iterations to run the code for so that the results are meaningful, and also calculates the standard deviation of the results.

require 'benchmark/ips'

h = { foo: 1, bar: 1, baz: 1, foobar: 1 }
a = [:foo, :bar, :baz, :foobar]

Benchmark.ips do |x|
  x.report("hash")  { h.include?(:foobar) }
  x.report("array") { a.include?(:foobar) }

  x.compare!
end
$ ruby benchmark_ips.rb
Calculating -------------------------------------
                hash     24983 i/100ms
               array     24084 i/100ms
-------------------------------------------------
                hash  6430567.4 (±8.8%) i/s -   31378648 in   4.937493s
               array  3636150.6 (±4.0%) i/s -   18014832 in   4.963605s

Comparison:
                hash:  6430567.4 i/s
               array:  3636150.6 i/s - 1.77x slower
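
By default the gem spends roughly two seconds warming up and five seconds measuring each report (which is why the runs above took just under five seconds); if that doesn't suit, the timings can be tuned. A sketch, assuming the gem's config method:

require 'benchmark/ips'

h = { foo: 1, bar: 1, baz: 1, foobar: 1 }

Benchmark.ips do |x|
  # Warm up for 1 second and measure for 3 seconds per report,
  # rather than using the gem's defaults.
  x.config(warmup: 1, time: 3)

  x.report("hash") { h.include?(:foobar) }
end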

Stackprof

Stackprof is a sampling call-stack profiler that works with Ruby 2.1 and above. It works by sampling the call stack at regular, tunable intervals; on each sample, it captures the method that is executing at that moment.

By doing this, a slow method will be encountered by the sampler more often than a fast one, and hence will be reported as consuming a bigger percentage of the time.

This technique is not perfect, as it could miss some very fast method calls completely, but generally when profiling you are only concerned with slow methods anyway.

require 'stackprof'

def slow_method
  sleep(0.1)
end

def fast_method
end

StackProf.run(mode: :cpu, out: 'stackprof-output.dump') do
  100.times do
    slow_method
    fast_method
  end
end

The captured call data is written to the stackprof-output.dump file, which can be analyzed by the stackprof command line tool that is installed with the gem:

$ stackprof stackprof-output.dump
==================================
  Mode: cpu(1000)
  Samples: 14 (0.00% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
        14 (100.0%)          14 (100.0%)     Object#slow_method
        14 (100.0%)           0   (0.0%)     block in <main>
        14 (100.0%)           0   (0.0%)     block (2 levels) in <main>
        14 (100.0%)           0   (0.0%)     <main>
        14 (100.0%)           0   (0.0%)     <main>

This example is a bit contrived - something more useful would be to profile some requests in a Rails application to see how things look.

To use Stackprof in a Rails application, you simply insert it as an extra piece of middleware by adding the following to application.rb:

config.middleware.insert_before(Rack::Sendfile, StackProf::Middleware, enabled: true, mode: :cpu, interval: 1000, save_every: 5)

The save_every parameter tells Stackprof to write its data to disk after every save_every requests.
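
You probably only want this running while you are investigating something, so one option (a sketch - the environment guard is my own addition, not something from the talk) is to only insert the middleware in development, with the parameters spelled out:

# config/application.rb
if Rails.env.development?
  config.middleware.insert_before(Rack::Sendfile, StackProf::Middleware,
                                  enabled:    true,
                                  mode:       :cpu,   # sample CPU time
                                  interval:   1000,   # sample every 1000 microseconds
                                  save_every: 5)      # write a dump file every 5 requests
end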

When you run the Rails server, it will write Stackprof trace files into your application's tmp directory, which can be analyzed as normal:

$ stackprof stackprof-cpu-27132-1407514856.dump
==================================
  Mode: cpu(1000)
  Samples: 26 (0.00% miss rate)
  GC: 1 (3.85%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
         3  (11.5%)           3  (11.5%)     ActiveSupport::Subscriber#start
         3  (11.5%)           3  (11.5%)     block in ActionView::PathResolver#find_template_paths
         2   (7.7%)           2   (7.7%)     block in Rack::Utils::KeySpaceConstrainedParams#to_params_hash
         1   (3.8%)           1   (3.8%)     block (4 levels) in Class#class_attribute
         1   (3.8%)           1   (3.8%)     block in ActiveRecord::ConnectionAdapters::AbstractAdapter#lease
         1   (3.8%)           1   (3.8%)     ThreadSafe::NonConcurrentCacheBackend#[]
         1   (3.8%)           1   (3.8%)     ActiveSupport::TaggedLogging::Formatter#pop_tags
         1   (3.8%)           1   (3.8%)     Rack::BodyProxy#initialize
         1   (3.8%)           1   (3.8%)     ActionView::PathResolver#extract_handler_and_format_and_variant
         1   (3.8%)           1   (3.8%)     ActionView::Resolver::Path#to_str
         1   (3.8%)           1   (3.8%)     ActionDispatch::Journey::Route#matches?
         1   (3.8%)           1   (3.8%)     ActiveSupport::Duration.===
         1   (3.8%)           1   (3.8%)     Logger#format_severity
         1   (3.8%)           1   (3.8%)     ActiveSupport::Configurable::ClassMethods#config
         1   (3.8%)           1   (3.8%)     ActiveSupport::Inflector#underscore
         1   (3.8%)           1   (3.8%)     ActiveSupport::FileUpdateChecker#max_mtime
         1   (3.8%)           1   (3.8%)     Hash#extractable_options?
         9  (34.6%)           1   (3.8%)     Benchmark#realtime
         1   (3.8%)           1   (3.8%)     ActiveSupport::TaggedLogging::Formatter#current_tags
        25  (96.2%)           1   (3.8%)     Rack::Runtime#call
         2   (7.7%)           0   (0.0%)     Logger#add
        18  (69.2%)           0   (0.0%)     ActionDispatch::Journey::Router#call
        18  (69.2%)           0   (0.0%)     ActionDispatch::Routing::RouteSet#call
        18  (69.2%)           0   (0.0%)     Rack::ETag#call
        18  (69.2%)           0   (0.0%)     Rack::ConditionalGet#call
        18  (69.2%)           0   (0.0%)     Rack::Head#call
        18  (69.2%)           0   (0.0%)     ActionDispatch::ParamsParser#call
        18  (69.2%)           0   (0.0%)     ActionDispatch::Flash#call
        18  (69.2%)           0   (0.0%)     Rack::Session::Abstract::ID#context
        18  (69.2%)           0   (0.0%)     Rack::Session::Abstract::ID#call
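
Once a frame looks interesting, the same command line tool can drill into it with the --method flag, which shows where that method was called from and which of its lines the samples landed on:

$ stackprof stackprof-cpu-27132-1407514856.dump --method 'ActiveSupport::Subscriber#start'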

Allocation Tracer

Often, a piece of code that allocates fewer objects is more efficient: less memory needs to be allocated, and the garbage collector needs to run less frequently. The allocation_tracer gem allows you to see how many objects of each type your application creates, and also the file and line number where they were created.

require 'allocation_tracer'
require 'pp'

ObjectSpace::AllocationTracer.setup(%i{path line type})

result = ObjectSpace::AllocationTracer.trace do
  str = 'hello '
  10_000.times{|i|
    str << 'hello '
  }
  str = 'hello '
  10_000.times{|i|
    str = str + 'hello '
  }
end

pp result

The results show that using << to concatenate a string produces far fewer objects than the + method - this is because << appends to the string in place, while + creates a new object each time. The first number in each result array is the allocation count:

{["allocation.rb", 13, :T_STRING]=>[20000, 30, 19463, 0, 3, 282839776],
 ["allocation.rb", 9, :T_STRING]=>[10000, 0, 10000, 1, 1, 0],
 ["allocation.rb", 7, :T_STRING]=>[1, 0, 1, 1, 1, 102399],
 ["allocation.rb", 11, :T_STRING]=>[1, 1, 10, 10, 10, 0]}

If you have a large application, you can aggregate the counts at a higher level, by grouping by object type and ignoring the file and line number. Out of interest, I created a simple piece of Rails middleware to see how many objects Rails allocates on each request:

require 'allocation_tracer'
require 'pp'

class Allocation

  def initialize(app)
    ObjectSpace::AllocationTracer.setup(%i{type})
    @app = app
  end

  def call(env)
    res = nil
    result = ObjectSpace::AllocationTracer.trace do      
      res = @app.call(env)
    end
    pp result
    res
  end

end

Notice that I asked allocation tracer to capture only the type of the object - adding the file or line number for a Rails application produced way too much output. This produced the following output for a single Rails request:

Started GET "/about" for 127.0.0.1 at 2014-08-10 13:23:33 +0100
Processing by AboutController#index as HTML
  Rendered about/index.html.erb within layouts/application (0.1ms)
Completed 200 OK in 16ms (Views: 15.5ms)
{[:T_STRING]=>[2728, 0, 0, 0, 0, 0],
 [:T_NODE]=>[425, 0, 0, 0, 0, 0],
 [:T_ARRAY]=>[989, 0, 0, 0, 0, 0],
 [:T_OBJECT]=>[53, 0, 0, 0, 0, 0],
 [:T_HASH]=>[367, 0, 0, 0, 0, 0],
 [:T_DATA]=>[416, 0, 0, 0, 0, 0],
 [:T_MATCH]=>[67, 0, 0, 0, 0, 0],
 [:T_REGEXP]=>[6, 0, 0, 0, 0, 0],
 [:T_STRUCT]=>[2, 0, 0, 0, 0, 0],
 [:T_FILE]=>[2, 0, 0, 0, 0, 0]}
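
If the output is still noisy, it is easy enough to sort it before printing - for example, inside the middleware's call method the pp line could become something like this (the allocation count is the first element of each value array):

pp result.sort_by { |_type, counts| -counts.first }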

Tracepoint

TracePoint is an interesting built-in library. It allows you to trace many events that occur in your Ruby application. This has many potential uses. For instance, in a large application it can be difficult to find all the places a certain class is instantiated. With TracePoint, you can capture the file and line number where any given class is instantiated by watching for events against the class you are interested in, and then dumping the call stack. For instance, this code (taken from Aaron's talk):

require 'active_support/all'

trace = TracePoint.new(:c_call, :call) { |tp|
  if tp.defined_class == ActiveSupport::SafeBuffer &&
    tp.method_id == :initialize
    puts "#" * 90
    puts tp.binding.eval "caller"
  end
}
trace.enable
"balblablalbal".html_safe
ActiveSupport::SafeBuffer.new "omgee"

This produces the following output:

##########################################################################################
tracer.rb:7:in `eval'
tracer.rb:7:in `block in <main>'
/Users/sodonnel/.rvm/gems/ruby-2.1.1/gems/activesupport-4.1.4/lib/active_support/core_ext/string/output_safety.rb:158:in `initialize'
/Users/sodonnel/.rvm/gems/ruby-2.1.1/gems/activesupport-4.1.4/lib/active_support/core_ext/string/output_safety.rb:237:in `new'
/Users/sodonnel/.rvm/gems/ruby-2.1.1/gems/activesupport-4.1.4/lib/active_support/core_ext/string/output_safety.rb:237:in `html_safe'
tracer.rb:11:in `<main>'
##########################################################################################
tracer.rb:7:in `eval'
tracer.rb:7:in `block in <main>'
/Users/sodonnel/.rvm/gems/ruby-2.1.1/gems/activesupport-4.1.4/lib/active_support/core_ext/string/output_safety.rb:158:in `initialize'
tracer.rb:12:in `new'
tracer.rb:12:in `<main>'
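
One small refinement: rather than leaving the trace switched on for the rest of the program, TracePoint#enable also accepts a block and automatically disables the trace when the block returns:

trace.enable do
  "balblablalbal".html_safe
  ActiveSupport::SafeBuffer.new "omgee"
end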