05 May 2015
When you run a MapReduce job, it prints a set of statistics about the job when it finishes, for example:
File System Counters
    FILE: Number of bytes read=215
    FILE: Number of bytes written=243859
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
Map-Reduce Framework
    Map input records=2
    Map output records=0
    Input split bytes=119
    Spilled Records=0
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=0
    Total committed heap usage (bytes)=192937984
File Input Format Counters
    Bytes Read=42
File Output Format Counters
    Bytes Written=8
These statistics are tracked by Counters. Notice that we have four counter groups (File System Counters, Map-Reduce Framework, ...) and each group has one or more counters within it. These counters are all created and maintained automatically by the MapReduce framework, but you can easily add your own custom counters to track whatever details about your job you care about.
All you need to do is define an enum type to hold your counters. When you define an enum in Java, it behaves somewhat like a class, so you can create it in its own file:
package com.sodonnel.Hadoop.Fan;

public enum OtherCounters {
    GOOD_RECORDS,
    BAD_RECORDS
}
In this case, this creates a new counter group named after the enum, OtherCounters (identified in full by its class name, com.sodonnel.Hadoop.Fan.OtherCounters), which contains two counters: GOOD_RECORDS and BAD_RECORDS.
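To see where the group and counter names come from, here is a small plain-Java sketch. It assumes, as I understand it, that Hadoop takes the group name from the enum's class name and the counter name from the constant's name. No Hadoop classes are used, and groupName and counterName are hypothetical helpers written for this illustration, not Hadoop methods.

```java
// Stand-alone illustration only: nothing here depends on Hadoop.
enum OtherCounters {
    GOOD_RECORDS,
    BAD_RECORDS
}

class CounterNameDemo {
    // Hypothetical helper: the group name is assumed to be the enum's class name.
    static String groupName(Enum<?> key) {
        return key.getDeclaringClass().getName();
    }

    // Hypothetical helper: the counter name is the name of the enum constant.
    static String counterName(Enum<?> key) {
        return key.name();
    }

    public static void main(String[] args) {
        // Print one "group -> counter" line per enum constant.
        for (OtherCounters c : OtherCounters.values()) {
            System.out.println(groupName(c) + " -> " + counterName(c));
        }
    }
}
```

If the enum lived in the com.sodonnel.Hadoop.Fan package, the class name would carry that package prefix, which is why the group is looked up by its full name in the findCounter call later in this post.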
To use the counters, all you have to do is increment them in your mapper or reducer:
context.getCounter(OtherCounters.GOOD_RECORDS).increment(1);
That's really all there is to it. Now if you run your job, the custom counter will be displayed in the output.
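As a rough sketch of how the good/bad decision might look around that increment call, here is a plain-Java stand-in that runs without Hadoop. RecordTally, its increment method, and the "exactly two comma-separated fields" rule are all assumptions made up for this example; in a real mapper the increment would be the context.getCounter(...).increment(1) call shown above, and the framework would keep the totals for you.

```java
import java.util.EnumMap;
import java.util.Map;

enum OtherCounters {
    GOOD_RECORDS,
    BAD_RECORDS
}

// Hypothetical stand-in for the counters a mapper would update through its context.
class RecordTally {
    private final Map<OtherCounters, Long> counts = new EnumMap<>(OtherCounters.class);

    // Plays the role of context.getCounter(counter).increment(1).
    void increment(OtherCounters counter) {
        counts.merge(counter, 1L, Long::sum);
    }

    // Current value of a counter, 0 if it was never incremented.
    long get(OtherCounters counter) {
        return counts.getOrDefault(counter, 0L);
    }

    // Assumed validity rule: a record is good when it has exactly two
    // comma-separated fields; anything else is counted as bad.
    void process(String line) {
        if (line.split(",", -1).length == 2) {
            increment(OtherCounters.GOOD_RECORDS);
        } else {
            increment(OtherCounters.BAD_RECORDS);
        }
    }

    public static void main(String[] args) {
        RecordTally tally = new RecordTally();
        for (String line : new String[] {"1,apple", "2,pear", "broken"}) {
            tally.process(line);
        }
        System.out.println("Good Records: " + tally.get(OtherCounters.GOOD_RECORDS));
        System.out.println("Bad Records: " + tally.get(OtherCounters.BAD_RECORDS));
    }
}
```

The point of the pattern is that the mapper never stores the totals itself; it only reports each event, and the counts are aggregated elsewhere.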
Aside from the framework displaying the counter values, you can read them in the driver after the job has completed. Assuming counters holds the Counters object returned by job.getCounters(), you can look up a counter by its group name and counter name:
System.out.println("Good Records: "+counters.findCounter("com.sodonnel.Hadoop.Fan.OtherCounters", "GOOD_RECORDS").getValue());
Or, you can even loop over all the counter groups and their counters:
for (CounterGroup group : counters) {
    System.out.println("Counter Group: " + group.getDisplayName() + " (" + group.getName() + ")");
    System.out.println(" number of counters in this group: " + group.size());
    for (Counter counter : group) {
        System.out.println(" -> " + counter.getDisplayName() + ": " + counter.getName() + ": " + counter.getValue());
    }
}