19 January 2021

Resetting ByteBuffers to zero in Java

I have an application that uses ByteBuffers to buffer and process data read from various sources. I wanted to answer the question: when you need a clean ByteBuffer, which is faster?

  • Allocate a new buffer, and allow the existing one to be garbage collected
  • Reset the position on the existing buffer and zero out the contents

It's obviously better to avoid both of the above where possible. Just clear the buffer (reset the position to zero and the limit to the capacity), fill it with the available bytes, and then only use the range from zero to the filled position. That allows a buffer to be reused without any expensive operations. There may be times when a new buffer is easier, or an existing buffer needs to be zero padded to some limit, so it's useful to know the fastest way to do this.
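The clear-and-reuse pattern described above can be sketched roughly as follows. This is only an illustration, not code from the application: the class name ReuseBuffer, the refill helper and the in-memory source are all made up for the example, and it assumes a blocking channel (a non-blocking one could return 0 and spin in the loop).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

class ReuseBuffer {

  // Hypothetical helper: refill buf from source without zeroing or reallocating.
  // Returns the number of valid bytes, which sit between position 0 and the limit.
  public static int refill(ReadableByteChannel source, ByteBuffer buf) throws IOException {
    buf.clear();                      // position = 0, limit = capacity; old bytes stay in place
    while (buf.hasRemaining()) {
      if (source.read(buf) < 0) {     // -1 signals end of stream
        break;
      }
    }
    buf.flip();                       // limit = bytes just read, position = 0
    return buf.remaining();
  }

  public static void main(String[] args) throws IOException {
    // An 11 byte in-memory source, read through an 8 byte buffer.
    ReadableByteChannel source =
        Channels.newChannel(new ByteArrayInputStream(new byte[11]));
    ByteBuffer buf = ByteBuffer.allocate(8);
    int n;
    while ((n = refill(source, buf)) > 0) {
      System.out.println("read " + n + " bytes");
    }
  }
}
```

Stale bytes from the previous fill are still in the buffer after the limit, which is harmless as long as readers never look past the limit.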

Zeroing Methods

In these tests, I am using the java.nio.ByteBuffer class, allocating 6x1MB buffers in an array as follows:

  public static ByteBuffer[] allocateBuffers(int count, int ofSize) {
    ByteBuffer[] buf = new ByteBuffer[count];
    for (int i=0; i<count; i++) {
      buf[i] = ByteBuffer.allocate(ofSize);
    }
    return buf;
  }

Then I have a few different ways of zeroing the buffer:

  // Note: if zeroing a single buffer, you may as well allocate a new one,
  // as this method needs to allocate one new buffer to zero all the others.
  public static void zeroBuffers(ByteBuffer[] buf) {
    ByteBuffer newBuf = ByteBuffer.allocate(buf[0].capacity());
    for (ByteBuffer b : buf) {
      b.position(0);
      newBuf.position(0);
      b.put(newBuf);
      b.position(0);
    }
  }

  public static void zeroBuffersByte(ByteBuffer[] buf) {
    for (ByteBuffer b : buf) {
      b.position(0);
      while (b.hasRemaining()) {
        b.put((byte)0);
      }
      b.position(0);
    }
  }

  // Note: throws BufferOverflowException if the buffer size is not an exact
  // multiple of 1024, but it's good enough for a benchmark test
  public static void zeroBuffersByteArray(ByteBuffer[] buf) {
    byte[] bytes = new byte[1024];
    for (ByteBuffer b : buf) {
      b.position(0);
      while (b.hasRemaining()) {
        b.put(bytes);
      }
      b.position(0);
    }
  }

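  // Needs java.util.Arrays. Only works for buffers with an accessible backing
  // array, such as those from ByteBuffer.allocate(); array() throws otherwise.
  public static void zeroBuffersArray(ByteBuffer[] buf) {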
    for (ByteBuffer b : buf) {
      Arrays.fill(b.array(), (byte)0);
      b.position(0);
    }
  }
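For use outside a benchmark, the Arrays.fill() and byte[] approaches above could be combined into one helper that also copes with direct buffers, slices and sizes that are not a multiple of 1024. The following is a minimal sketch under those assumptions; the class name ZeroBuffer and the method zero are invented for this example and were not part of the benchmark.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class ZeroBuffer {

  // Shared block of zeros, only ever read from.
  private static final byte[] ZEROS = new byte[1024];

  // Hypothetical helper: zero the whole buffer (0 to capacity) and leave
  // position = 0 and limit = capacity.
  public static void zero(ByteBuffer b) {
    b.clear();
    if (b.hasArray()) {
      // Heap buffer: fill only this buffer's region of the backing array,
      // which matters for slices that share a larger array.
      int from = b.arrayOffset();
      Arrays.fill(b.array(), from, from + b.capacity(), (byte) 0);
    } else {
      // Direct buffer: write zeros in chunks, trimming the last chunk so a
      // size that is not a multiple of 1024 does not overflow.
      while (b.hasRemaining()) {
        b.put(ZEROS, 0, Math.min(ZEROS.length, b.remaining()));
      }
      b.clear();
    }
  }

  public static void main(String[] args) {
    ByteBuffer direct = ByteBuffer.allocateDirect(1500);
    while (direct.hasRemaining()) {
      direct.put((byte) 1);
    }
    zero(direct);
    System.out.println(direct.get(1499) + " at position " + direct.position());
  }
}
```

Read-only buffers report hasArray() as false, so they would fall through to the put() branch and fail with ReadOnlyBufferException, which seems a reasonable outcome for a buffer that cannot be zeroed.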

Finally, I ran some benchmark code, which performs the following tests:

  • Allocate a new array of 6x1MB buffers, rather than zeroing the existing ones
  • Reset an existing set of buffers by allocating one new ByteBuffer and using it to zero all others
  • Write one zero byte at a time to the buffer from 0 to its capacity
  • Allocate a single byte[] of 1024 and write it until the buffer is filled
  • Obtain the internal array from the byte buffer and use Arrays.fill() to fill it with zeros
  • Just reset the buffer position to zero, to compare how much faster that is

The results look like:

Benchmark                                         Mode  Cnt          Score         Error  Units
BenchmarkBufferAllocate.allocateNewBuffers       thrpt    5       2306.443 ±     465.750  ops/s
BenchmarkBufferAllocate.zeroBufferWithBuffer     thrpt    5       2156.215 ±     436.713  ops/s
BenchmarkBufferAllocate.zeroBufferWithByte       thrpt    5        459.383 ±      77.800  ops/s
BenchmarkBufferAllocate.zeroBufferWithByteArray  thrpt    5       4170.109 ±     401.827  ops/s
BenchmarkBufferAllocate.zeroBuffersArray         thrpt    5       4985.363 ±     597.730  ops/s
BenchmarkBufferAllocate.resetPosition            thrpt    5  137490972.804 ± 2829717.621  ops/s

Of the zeroing approaches, the Arrays.fill() method (zeroBuffersArray) scored the highest throughput, although its error range overlaps slightly with that of the 1024 byte array approach, and so it is the preferred approach. An additional advantage is that it would be equally efficient for a single ByteBuffer, as it does not allocate any new objects.

Simply resetting the position with no zeroing (resetPosition) is much faster than any other approach, and hence should be preferred if possible.
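The resetBufferPosition method called by the benchmark is not listed in this post. Going only by how it is called (an array of buffers and a position), it presumably looks something like the sketch below. Treat this as a guess at its shape rather than the actual code; the class name ResetPosition is made up for the example.

```java
import java.nio.ByteBuffer;

class ResetPosition {

  // Assumed shape of the helper: move every buffer to the given position
  // without touching the contents.
  public static void resetBufferPosition(ByteBuffer[] buf, int position) {
    for (ByteBuffer b : buf) {
      b.position(position);
    }
  }

  public static void main(String[] args) {
    ByteBuffer[] bufs = { ByteBuffer.allocate(16), ByteBuffer.allocate(16) };
    bufs[0].put((byte) 42);           // position advances to 1
    resetBufferPosition(bufs, 0);
    // The position is back at 0 but the old byte is still there.
    System.out.println(bufs[0].position() + " " + bufs[0].get(0)); // prints "0 42"
  }
}
```

The speed comes from doing no work per byte, so the old data remains readable until it is overwritten.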

It is also interesting to add the flags "-prof gc" when running the benchmarks to see the memory allocation rate. Unsurprisingly, the options which allocate more objects show a much higher allocation rate:

BenchmarkBufferAllocate.allocateNewBuffers:·gc.alloc.rate                      thrpt    5       9335.400 ±    2415.879  MB/sec
BenchmarkBufferAllocate.zeroBufferWithBuffer:·gc.alloc.rate                    thrpt    5       1399.931 ±     319.943  MB/sec
BenchmarkBufferAllocate.zeroBufferWithByte:·gc.alloc.rate                      thrpt    5         ≈ 10⁻⁴                MB/sec
BenchmarkBufferAllocate.zeroBufferWithByteArray:·gc.alloc.rate                 thrpt    5          2.373 ±       1.312  MB/sec
BenchmarkBufferAllocate.zeroBuffersArray:·gc.alloc.rate                        thrpt    5         ≈ 10⁻⁴                MB/sec
BenchmarkBufferAllocate.resetPosition:·gc.alloc.rate                           thrpt    5         ≈ 10⁻⁴                MB/sec

For completeness, here is the benchmark code:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

import java.nio.ByteBuffer;

import static java.util.concurrent.TimeUnit.MILLISECONDS;

public class BenchmarkBufferAllocate {

  @State(Scope.Benchmark)
  public static class BenchmarkState {
    public ByteBuffer[] buffers = ECValidateUtil.allocateBuffers(6, 1024*1024);

    @Setup(Level.Trial)
    public void setUp() {
    }

  }

  public static void main(String[] args) throws Exception {
    String[] opts = new String[2];
    opts[0] = "-prof";
    opts[1] = "gc";
    org.openjdk.jmh.Main.main(opts);
  }

  @Benchmark
  @Threads(1)
  @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @Fork(value = 1, warmups = 0)
  @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @BenchmarkMode(Mode.Throughput)
  public void allocateNewBuffers(Blackhole blackhole) throws Exception {
    ByteBuffer[] buffers = ECValidateUtil.allocateBuffers(6, 1024*1024);
    blackhole.consume(buffers);
  }

  @Benchmark
  @Threads(1)
  @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @Fork(value = 1, warmups = 0)
  @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @BenchmarkMode(Mode.Throughput)
  public void zeroBufferWithBuffer(Blackhole blackhole, BenchmarkState state) throws Exception {
    ECValidateUtil.zeroBuffers(state.buffers);
    blackhole.consume(state.buffers);
  }

  @Benchmark
  @Threads(1)
  @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @Fork(value = 1, warmups = 0)
  @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @BenchmarkMode(Mode.Throughput)
  public void zeroBufferWithByte(Blackhole blackhole, BenchmarkState state) throws Exception {
    ECValidateUtil.zeroBuffersByte(state.buffers);
    blackhole.consume(state.buffers);
  }

  @Benchmark
  @Threads(1)
  @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @Fork(value = 1, warmups = 0)
  @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @BenchmarkMode(Mode.Throughput)
  public void zeroBufferWithByteArray(Blackhole blackhole, BenchmarkState state) throws Exception {
    ECValidateUtil.zeroBuffersByteArray(state.buffers);
    blackhole.consume(state.buffers);
  }

  @Benchmark
  @Threads(1)
  @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @Fork(value = 1, warmups = 0)
  @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @BenchmarkMode(Mode.Throughput)
  public void zeroBuffersArray(Blackhole blackhole, BenchmarkState state) throws Exception {
    ECValidateUtil.zeroBuffersArray(state.buffers);
    blackhole.consume(state.buffers);
  }

  @Benchmark
  @Threads(1)
  @Warmup(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @Fork(value = 1, warmups = 0)
  @Measurement(iterations = 5, time = 1000, timeUnit = MILLISECONDS)
  @BenchmarkMode(Mode.Throughput)
  public void resetPosition(Blackhole blackhole, BenchmarkState state) throws Exception {
    ECValidateUtil.resetBufferPosition(state.buffers, 0);
    blackhole.consume(state.buffers);
  }

}