19 January 2021
I have an application that uses ByteBuffers to buffer and process data read from various sources. I wanted to answer the question: when you need a clean (zeroed) ByteBuffer, which is faster, allocating a new buffer or zeroing an existing one?
It's obviously better to avoid both of these. Just clear the buffer (reset the position to zero and the limit to the capacity), fill it with the available bytes, and then use only the bytes from zero up to the filled position. That allows a buffer to be reused without any expensive operations. There may be times when a new buffer is easier, or when an existing buffer needs to be zero-padded up to some limit, so it's useful to know the fastest way to do this.
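A minimal sketch of that reuse pattern, assuming the data arrives through a java.nio ReadableByteChannel (an in-memory stream stands in for a real source here): each pass calls clear(), reads whatever is available, calls flip() so the limit marks the filled position, and then processes only those bytes. The class name ReuseBufferExample and the byte-summing "processing" are illustrative, not taken from the original application.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ReuseBufferExample {

    // Reads everything from the channel through one reused buffer and returns
    // the sum of all bytes seen (a stand-in for real processing).
    public static long processAll(ReadableByteChannel channel, ByteBuffer buf) throws IOException {
        long sum = 0;
        while (true) {
            buf.clear();                 // position = 0, limit = capacity; old bytes are NOT zeroed
            int n = channel.read(buf);   // fill with whatever is available
            if (n == -1) {
                break;                   // end of stream
            }
            buf.flip();                  // limit = filled position, position = 0
            while (buf.hasRemaining()) { // only touch freshly read bytes, never stale data
                sum += buf.get() & 0xFF;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 7);
        }
        ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(data));
        System.out.println(processAll(ch, ByteBuffer.allocate(1024)));
    }
}
```

Because stale bytes beyond the limit are never read, the buffer never needs zeroing between reads.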
In these tests, I use the java.nio.ByteBuffer class, allocating six 1MB buffers in an array as follows:
public static ByteBuffer[] allocateBuffers(int count, int ofSize) {
    ByteBuffer[] buf = new ByteBuffer[count];
    for (int i = 0; i < count; i++) {
        buf[i] = ByteBuffer.allocate(ofSize);
    }
    return buf;
}
Then I tried four different ways of zeroing the buffers:
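// Note: if you are zeroing only a single buffer, you may as well allocate
// a new one, because this method has to allocate one new (already zeroed)
// buffer and copy it into each of the others. It assumes every buffer has
// the same capacity as buf[0] and a limit equal to its capacity.
public static void zeroBuffers(ByteBuffer[] buf) {
    ByteBuffer newBuf = ByteBuffer.allocate(buf[0].capacity());
    for (ByteBuffer b : buf) {
        b.position(0);
        newBuf.position(0);
        b.put(newBuf);   // bulk copy of zeros from newBuf into b
        b.position(0);
    }
}

// Writes one zero byte per put() call until the buffer is full.
public static void zeroBuffersByte(ByteBuffer[] buf) {
    for (ByteBuffer b : buf) {
        b.position(0);
        while (b.hasRemaining()) {
            b.put((byte) 0);
        }
        b.position(0);
    }
}

// Note: this will not work correctly if the buffer capacity is not an exact
// multiple of 1024 (the final put() throws BufferOverflowException), but it
// is good enough for a benchmark test.
public static void zeroBuffersByteArray(ByteBuffer[] buf) {
    byte[] bytes = new byte[1024];
    for (ByteBuffer b : buf) {
        b.position(0);
        while (b.hasRemaining()) {
            b.put(bytes);   // bulk copy of 1KB of zeros
        }
        b.position(0);
    }
}

// Fills the backing array directly (requires import java.util.Arrays).
// Only works for heap buffers: array() throws UnsupportedOperationException
// for direct buffers.
public static void zeroBuffersArray(ByteBuffer[] buf) {
    for (ByteBuffer b : buf) {
        Arrays.fill(b.array(), (byte) 0);
        b.position(0);
    }
}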
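zeroBuffersArray relies on array(), so it only works for heap buffers (a buffer from ByteBuffer.allocateDirect throws UnsupportedOperationException), and it fills the entire backing array even when the buffer is a slice of a larger array. A hedged sketch that handles both cases, as well as capacities that are not a multiple of 1024, might look like this. ZeroAnyBuffer and zeroFully are hypothetical names, and this variant was not part of the benchmark.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ZeroAnyBuffer {

    // Read-only source of zeros; never written to, so it can be shared.
    private static final byte[] ZEROS = new byte[1024];

    // Zeroes bytes 0..capacity of a heap or direct buffer of any size,
    // then leaves position = 0 and limit = capacity.
    public static void zeroFully(ByteBuffer b) {
        if (b.hasArray()) {
            // Heap buffer: fill only this buffer's window of the backing array,
            // which matters for slices that share a larger array.
            int start = b.arrayOffset();
            Arrays.fill(b.array(), start, start + b.capacity(), (byte) 0);
        } else {
            // No accessible array (e.g. a direct buffer): copy zeros in chunks,
            // shortening the last chunk so it never overflows the buffer.
            b.clear();
            while (b.hasRemaining()) {
                b.put(ZEROS, 0, Math.min(ZEROS.length, b.remaining()));
            }
        }
        b.clear();
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(3000); // not a multiple of 1024
        while (direct.hasRemaining()) {
            direct.put((byte) 1);
        }
        zeroFully(direct);
        boolean allZero = true;
        for (int i = 0; i < direct.capacity(); i++) {
            allZero &= direct.get(i) == 0;
        }
        System.out.println("all zero: " + allZero);
    }
}
```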
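Finally, I ran a JMH benchmark that performs the following tests: allocating new buffers (allocateNewBuffers), zeroing with a pre-zeroed buffer (zeroBufferWithBuffer), zeroing one byte at a time (zeroBufferWithByte), zeroing with a 1KB byte array (zeroBufferWithByteArray), zeroing with Arrays.fill() (zeroBuffersArray), and simply resetting the position without zeroing (resetPosition).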
The results look like:
Benchmark                                        Mode  Cnt          Score         Error  Units
BenchmarkBufferAllocate.allocateNewBuffers      thrpt    5       2306.443 ±     465.750  ops/s
BenchmarkBufferAllocate.zeroBufferWithBuffer    thrpt    5       2156.215 ±     436.713  ops/s
BenchmarkBufferAllocate.zeroBufferWithByte      thrpt    5        459.383 ±      77.800  ops/s
BenchmarkBufferAllocate.zeroBufferWithByteArray thrpt    5       4170.109 ±     401.827  ops/s
BenchmarkBufferAllocate.zeroBuffersArray        thrpt    5       4985.363 ±     597.730  ops/s
BenchmarkBufferAllocate.resetPosition           thrpt    5  137490972.804 ± 2829717.621  ops/s
Of the approaches that actually zero the buffers, the Arrays.fill() method (zeroBuffersArray) is the fastest, and so it is the preferred approach. An additional advantage is that it would be equally efficient for a single ByteBuffer, as it does not allocate any new objects.
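Arrays.fill() also has an overload that takes a from/to index range, so the zero-padding case mentioned at the start can be handled without zeroing the whole buffer. A minimal sketch, assuming a heap buffer whose valid data runs from zero to the current position; zeroPad and ZeroPadExample are hypothetical names, not part of the benchmarked code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ZeroPadExample {

    // Hypothetical helper: zeroes a heap buffer from its current position up to
    // padTo (exclusive), leaves earlier bytes untouched, then sets position = padTo.
    public static void zeroPad(ByteBuffer b, int padTo) {
        if (!b.hasArray()) {
            throw new IllegalArgumentException("this sketch only handles heap buffers");
        }
        if (padTo < b.position() || padTo > b.limit()) {
            throw new IndexOutOfBoundsException("padTo " + padTo + " is outside position..limit");
        }
        int base = b.arrayOffset();
        Arrays.fill(b.array(), base + b.position(), base + padTo, (byte) 0);
        b.position(padTo);
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(16);
        Arrays.fill(b.array(), (byte) 7); // simulate stale data from a previous use
        b.position(5);                    // pretend 5 valid bytes were just written
        zeroPad(b, 12);
        System.out.println(Arrays.toString(b.array()));
    }
}
```

Only the padded range is written, so the cost scales with the padding length rather than the buffer capacity.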
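Simply resetting the position without zeroing (resetPosition) is more than 27,000 times faster than the best zeroing approach, because it only moves each buffer's position rather than writing 6MB of memory, and hence it should be preferred whenever possible.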
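The post does not show ECValidateUtil.resetBufferPosition, which the resetPosition benchmark calls. A plausible sketch, assuming it just sets each buffer's position to the given value, is below, wrapped in a hypothetical ResetPositionExample class so it runs standalone. Its cost does not depend on the buffer size, but the old bytes stay in place, so the application has to track how many bytes are valid.

```java
import java.nio.ByteBuffer;

public class ResetPositionExample {

    // Hypothetical reconstruction of the helper used by the resetPosition benchmark:
    // it only moves each buffer's position and leaves the contents untouched.
    public static void resetBufferPosition(ByteBuffer[] bufs, int position) {
        for (ByteBuffer b : bufs) {
            b.position(position);
        }
    }

    public static void main(String[] args) {
        ByteBuffer[] bufs = { ByteBuffer.allocate(8), ByteBuffer.allocate(8) };
        bufs[0].put((byte) 5);
        bufs[1].put((byte) 6);
        resetBufferPosition(bufs, 0);
        // The previously written bytes are still present after the reset.
        System.out.println(bufs[0].position() + " " + bufs[0].get(0));
    }
}
```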
It is also interesting to run the benchmarks with the JMH flag "-prof gc" to see the memory allocation rate. Unsurprisingly, the approaches that allocate new buffers on every call have by far the highest allocation rates, while the byte-array approach shows a small rate from its 1KB scratch array:
BenchmarkBufferAllocate.allocateNewBuffers:·gc.alloc.rate      thrpt    5   9335.400 ± 2415.879  MB/sec
BenchmarkBufferAllocate.zeroBufferWithBuffer:·gc.alloc.rate    thrpt    5   1399.931 ±  319.943  MB/sec
BenchmarkBufferAllocate.zeroBufferWithByte:·gc.alloc.rate      thrpt    5     ≈ 10⁻⁴             MB/sec
BenchmarkBufferAllocate.zeroBufferWithByteArray:·gc.alloc.rate thrpt    5      2.373 ±    1.312  MB/sec
BenchmarkBufferAllocate.zeroBuffersArray:·gc.alloc.rate        thrpt    5     ≈ 10⁻⁴             MB/sec
BenchmarkBufferAllocate.resetPosition:·gc.alloc.rate           thrpt    5     ≈ 10⁻⁴             MB/sec
For completeness, here is the benchmark code:
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import java.nio.ByteBuffer;
import static java.util.concurrent.TimeUnit.MILLISECONDS;

// ECValidateUtil is the utility class holding the static helper methods shown above.
public class BenchmarkBufferAllocate {

    @State(Scope.Benchmark)
    public static class BenchmarkState {
        // Six 1MB heap buffers, shared by the zeroing and reset benchmarks.
        public ByteBuffer[] buffers = ECValidateUtil.allocateBuffers(6, 1024*1024);

        @Setup(Level.Trial)
        public void setUp() {
        }
    }

    // Runs the benchmarks with JMH's GC profiler enabled,
    // equivalent to passing "-prof gc" on the command line.
    public static void main(String[] args) throws Exception {
        String[] opts = new String[2];
        opts[0] = "-prof";
        opts[1] = "gc";
        org.openjdk.jmh.Main.main(opts);
    }

    @Benchmark
    @Threads(1)
    @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @Fork(value = 1, warmups = 0)
    @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void allocateNewBuffers(Blackhole blackhole) throws Exception {
        ByteBuffer[] buffers = ECValidateUtil.allocateBuffers(6, 1024*1024);
        blackhole.consume(buffers);
    }

    @Benchmark
    @Threads(1)
    @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @Fork(value = 1, warmups = 0)
    @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void zeroBufferWithBuffer(Blackhole blackhole, BenchmarkState state) throws Exception {
        ECValidateUtil.zeroBuffers(state.buffers);
        blackhole.consume(state.buffers);
    }

    @Benchmark
    @Threads(1)
    @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @Fork(value = 1, warmups = 0)
    @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void zeroBufferWithByte(Blackhole blackhole, BenchmarkState state) throws Exception {
        ECValidateUtil.zeroBuffersByte(state.buffers);
        blackhole.consume(state.buffers);
    }

    @Benchmark
    @Threads(1)
    @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @Fork(value = 1, warmups = 0)
    @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void zeroBufferWithByteArray(Blackhole blackhole, BenchmarkState state) throws Exception {
        ECValidateUtil.zeroBuffersByteArray(state.buffers);
        blackhole.consume(state.buffers);
    }

    @Benchmark
    @Threads(1)
    @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @Fork(value = 1, warmups = 0)
    @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void zeroBuffersArray(Blackhole blackhole, BenchmarkState state) throws Exception {
        ECValidateUtil.zeroBuffersArray(state.buffers);
        blackhole.consume(state.buffers);
    }

    @Benchmark
    @Threads(1)
    @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @Fork(value = 1, warmups = 0)
    @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void resetPosition(Blackhole blackhole, BenchmarkState state) throws Exception {
        ECValidateUtil.resetBufferPosition(state.buffers, 0);
        blackhole.consume(state.buffers);
    }
}