15 October 2013
In my day job, I do a lot of work with Oracle databases. One of the things I really like about Oracle is that the database is exceptionally well instrumented. If you are not sure which of two table designs or query approaches is better, you can turn on this instrumentation, run each one and gather all sorts of statistics about the work performed.
When tuning a query, I tend to focus on the number of logical reads performed to generate the results. Fewer is pretty much always better, as the count gives a good approximation of how much work the database did to run the query.
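In SQL*Plus, for example, autotrace will run a statement and report its statistics, including the logical reads - a quick sketch, where some_table and some_col are placeholder names:
SQL> REM some_table / some_col below are placeholders
SQL> set autotrace traceonly statistics
SQL> select count(*) from some_table where some_col = 100;
The "consistent gets" line in the resulting statistics section is the logical read count I watch.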
On a recent project that used MySQL, I wanted to do something similar, assuming it would be easy.
It turns out it is pretty easy, but not immediately obvious how, at least in MySQL 5.5.
After some googling, I thought I had found the MySQL equivalent of Oracle's autotrace:
mysql> set profiling = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from id_tab where user_id = 100;
+----------+
| count(*) |
+----------+
|      998 |
+----------+
1 row in set (0.02 sec)
mysql> show profile;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.014555 |
| checking permissions | 0.000038 |
| Opening tables       | 0.000066 |
| System lock          | 0.000195 |
| init                 | 0.000070 |
| optimizing           | 0.000059 |
| statistics           | 0.000146 |
| preparing            | 0.000053 |
| executing            | 0.000014 |
| Sending data         | 0.000455 |
| end                  | 0.000022 |
| query end            | 0.000012 |
| closing tables       | 0.000016 |
| freeing items        | 0.000065 |
| logging slow query   | 0.000009 |
| cleaning up          | 0.000010 |
+----------------------+----------+
16 rows in set (0.00 sec)
This timing information is great, but it is not really what I am after, which is how many database block / page reads were required to answer the query.
It turns out you can get more information from the profile, such as block IO and page faults, but neither of these tells me what I want to know.
The block IO option is potentially interesting, but it doesn't give any information about cached reads, which are just as important.
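For reference, those extra profile types are requested by name, with the query number taken from the output of show profiles - query 1 below is just an example ID:
mysql> show profiles;
mysql> show profile block io for query 1; -- 1 is whatever Query_ID show profiles reported
mysql> show profile page faults for query 1;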
Assuming you are using InnoDB tables, there is another set of counters to consider.
mysql> show status like 'Inno%';
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_pages_data         | 692      |
| Innodb_buffer_pool_bytes_data         | 11337728 |
| Innodb_buffer_pool_pages_dirty        | 0        |
| Innodb_buffer_pool_bytes_dirty        | 0        |
| Innodb_buffer_pool_pages_flushed      | 1        |
| Innodb_buffer_pool_pages_free         | 7500     |
| Innodb_buffer_pool_pages_misc         | 0        |
| Innodb_buffer_pool_pages_total        | 8192     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 63       |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 42158    |
| Innodb_buffer_pool_reads              | 630      |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 1        |
| Innodb_data_fsyncs                    | 7        |
| Innodb_data_pending_fsyncs            | 0        |
<snip>
Some of these look a bit more useful, especially Innodb_buffer_pool_read_requests, the number of logical read requests made to the buffer pool, and Innodb_buffer_pool_reads, the number of reads that could not be satisfied from the buffer pool and had to hit the disk.
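If you only want to watch the read counters, the same show status command takes a narrower pattern:
mysql> show status like 'Innodb_buffer_pool_read%';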
There is one very important thing to note about these counters - they are global across all sessions on the database. To reliably compare two queries, you need to ensure nothing else is running on the server.
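A quick sanity check before each test run is to look for other connections doing work:
mysql> show processlist;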
Another frustration is that MySQL doesn't provide an out-of-the-box way to get the difference between the counters before and after running a query. You need to write some code to grab the values of the counters, run your test, grab the values again and finally calculate the differences.
I am sure it would be possible to write a stored procedure to do this, but I put together a small Ruby class to do what I needed:
require 'mysql2'

module MySQL
  class Profile
    def initialize(connection_hash)
      @client = Mysql2::Client.new(connection_hash)
      @profile = Hash.new
    end

    # Snapshot the counters, run the query (reading every row) and snapshot again
    def run_test(query)
      snapshot_stats
      results = @client.query(query)
      read_results(results)
      snapshot_stats
    end

    # Print each counter that increased between the last two snapshots
    def print_profile
      @profile.keys.each do |k|
        diff = @profile[k][-1] - (@profile[k][-2] || 0)
        if diff > 0
          puts "#{k.ljust(50, ' ')} #{diff}"
        end
      end
    end

    def read_results(results)
      results.each do |row|
        # do nothing, just want to read them from the db
      end
    end

    # Append the current value of every Inno% status counter to its history
    def snapshot_stats
      results = @client.query("show status like 'Inno%'")
      results.each do |row|
        (@profile[row['Variable_name']] ||= []) << row['Value'].to_i
      end
    end
  end
end
Putting this code into action:
test = MySQL::Profile.new(:host => "localhost", :username => "root", :database => 'test')

puts "User tab ID"
test.run_test("select * from user_tab where user_id = 50")
test.print_profile

puts ""
puts "ID tab ID"
test.run_test("select * from id_tab where user_id = 50")
test.print_profile
User tab ID
Innodb_buffer_pool_read_requests                   141
Innodb_rows_read                                   998

ID tab ID
Innodb_buffer_pool_read_requests                   3007
Innodb_rows_read                                   998
In this case we can see that the first query performed about 20 times fewer reads from the buffer pool than the second, so we can conclude the first query is much more efficient.
In this post, I didn't explain what data is in my tables or what I am testing here - that will be the topic of another post. I just wanted to illustrate how to compare one approach with another.