05 May 2015

Map Reduce Counters

When you run a map reduce job, at the end it prints out a bunch of statistics about the job, for example:

File System Counters
        FILE: Number of bytes read=215
        FILE: Number of bytes written=243859
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
Map-Reduce Framework
    Map input records=2
    Map output records=0
    Input split bytes=119
    Spilled Records=0
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=0
    Total committed heap usage (bytes)=192937984
File Input Format Counters 
    Bytes Read=42
File Output Format Counters 
        Bytes Written=8

These statistics are tracked by Counters. Notice that we have 4 counter groups (File System Counters, Map-Reduce Framework, ...) and each group has one or more counters within it. These counters are all created and maintained automatically by the map reduce framework, but you can easily add your own custom counters, allowing you to track all sorts of details about your job.

Creating a Custom Counter

All you need to do is define an enum type to hold your counters. When you define an enum in Java is behaves somewhat like a class, so you can create it in its own file:

package com.sodonnel.Hadoop.Fan;

public enum OtherCounters {
    GOOD_RECORDS,
    BAD_RECORDS
}

In this case, this will create a new counter group called OtherCounters, which contains two counters - GOOD_RECORDS and BAD_RECORDS.

To use the counters all you have to do is increment them in your mapper or reducer:

context.getCounter(OtherCounters.GOOD_RECORDS).increment(1);

That's really all there is to it. Now if you run your job, the custom counter will be displayed in the output.

Getting The Counts

Aside from the framework displaying the value of the counter, you can get the values in the driver after the job has completed. For instance, you can access a counter by its name:

System.out.println("Good Records: "+counters.findCounter("com.sodonnel.Hadoop.Fan.OtherCounters", "GOOD_RECORDS").getValue());

Or, you can even loop over all the counter groups and their counters:

for (CounterGroup group : counters) {
    System.out.println("Counter Group: " + group.getDisplayName() + " (" + group.getName() + ")");
    System.out.println("  number of counters in this group: " + group.size());
    for (Counter counter : group) {
        System.out.println("   -> " + counter.getDisplayName() + ": " + counter.getName() + ": "+counter.getValue());
    }
}
blog comments powered by Disqus