I created a Java program to emit events at a specific frequency. I am using System.nanoTime() instead of Thread.sleep() because the former gives higher precision on the interval, according to many references here and here. However, when I set it to emit at a rate of 1M records/second it does not achieve that goal. This is my code:
long delayInNanoSeconds = 1000000;

private void generateTaxiRideEvent(SourceContext<TaxiRide> sourceContext) throws Exception {
    gzipStream = new GZIPInputStream(new FileInputStream(dataFilePath));
    reader = new BufferedReader(new InputStreamReader(gzipStream, StandardCharsets.UTF_8));
    String line;
    TaxiRide taxiRide;
    while (reader.ready() && (line = reader.readLine()) != null) {
        taxiRide = TaxiRide.fromString(line);
        sourceContext.collectWithTimestamp(taxiRide, getEventTime(taxiRide));
        // sleep in nanoseconds to have a reproducible data rate for the data source
        this.dataRateListener.busySleep();
    }
}

public void busySleep() {
    final long startTime = System.nanoTime();
    while ((System.nanoTime() - startTime) < this.delayInNanoSeconds) ;
}
So, when I wait 10,000 nanoseconds in the delayInNanoSeconds variable I should get a workload of 100K rec/sec (1_000_000_000 / 10_000 = 100K r/s). When I wait 2,000 nanoseconds I should get a workload of 500K rec/sec (1_000_000_000 / 2_000 = 500K r/s). For 1,000 nanoseconds I should get a workload of 1M rec/sec (1_000_000_000 / 1_000 = 1M r/s), and for 500 nanoseconds a workload of 2M rec/sec (1_000_000_000 / 500 = 2M r/s).
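In other words, the delay is just the inverse of the target rate, roughly like this (targetRecordsPerSecond is a made-up variable for illustration, not something in my code):

long targetRecordsPerSecond = 1_000_000;                            // desired output rate
long delayInNanoSeconds = 1_000_000_000L / targetRecordsPerSecond;  // 1_000 ns per record for 1M rec/sec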
I saw here that it could be better to use double instead of long to increase the precision. Is that somehow related? Or is the problem just an OS limitation (I am using Linux Ubuntu 18)? Or maybe it is because I am using the readLine() method and there is a faster way to emit these events? I think that when I use the GZIPInputStream class I load the whole file into memory, so readLine() does not access the disk anymore. How can I increase the data rate of my application?
@TobiasGeiselmann makes a good point: your delay calculation doesn't take into account the time spent between calls to busySleep(). You should be calculating a deadline relative to the last deadline, not the current time after logging. Don't use the result from the previous System.nanoTime() either; that will be some time >= the actual deadline (because nanoTime itself takes time, at least a few nanoseconds, so it unavoidably over-sleeps). You'd accumulate error that way.
Before the first iteration, find the current time and set long deadline = System.nanoTime(). At the end of every iteration, do deadline += 1000 and use your busy-wait loop to spin until now >= deadline.
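An untested sketch of that loop structure in Java; hasMoreRecords() and emitOneRecord() are placeholders for your readLine()/collectWithTimestamp() logic, and delayInNanoSeconds is the per-record delay:

long deadline = System.nanoTime();              // set once, before the first iteration
while (hasMoreRecords()) {                      // placeholder for the readLine() loop condition
    emitOneRecord();                            // placeholder for collectWithTimestamp(...)
    deadline += delayInNanoSeconds;             // next deadline is relative to the previous one
    while (System.nanoTime() - deadline < 0) {  // overflow-safe "now < deadline" check
        // busy-wait until the deadline; see Thread.onSpinWait() below
    }
}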
If deadline - now is large enough, use something that yields the CPU to other threads until close to the wakeup deadline. According to comments, LockSupport.parkNanos(...) is a good choice for modern Java, and may actually busy-wait for short enough sleeps(?). I don't really know Java. If so, you should just check the current time, calculate the time until the deadline, and call it once.
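A rough sketch of that hybrid idea, assuming a spin threshold of 100 microseconds (the threshold is a guess to tune, not a measured value):

import java.util.concurrent.locks.LockSupport;

static void waitUntil(long deadline) {
    long remaining = deadline - System.nanoTime();
    if (remaining > 100_000) {                       // far from the deadline: yield the CPU
        LockSupport.parkNanos(remaining - 100_000);  // may wake early or late; re-checked below
    }
    while (System.nanoTime() - deadline < 0) {       // spin for the final stretch
        // busy-wait
    }
}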
(For future CPUs like Intel Tremont (next-gen Goldmont), LockSupport.parkNanos could portably expose functionality like tpause to idle the CPU core until a given TSC deadline. Not via the OS, just a hyperthreading-friendly deadline pause, good for short sleeps on SMT CPUs.)
Busy-waiting is generally bad but is appropriate for high-precision very short delays. 1 microsecond is not long enough to usefully let the OS context switch to something else and back, on current hardware with current OSes. But longer sleep intervals (when you've chosen a lower frequency) should sleep to let the OS do something useful on this core, instead of just busy waiting for so long.
Ideally, when you are spinning on a time-check, you'd be executing an instruction like x86's pause in the delay loop, to be friendlier to other logical cores sharing the same physical core (hyperthreading / SMT). Java 9's Thread.onSpinWait() should be called in spin-wait loops (especially when waiting on memory), which lets the JVM expose this concept in a portable way. (I assume that's what it's for.)
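So the spin part of the wait could look something like this; on mainstream x86 JVMs the hint typically compiles to a pause instruction, but that is implementation-dependent:

while (System.nanoTime() - deadline < 0) {
    Thread.onSpinWait();   // spin hint; friendlier to the sibling hyperthread than an empty loop
}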
This will work if your system is fast enough to keep up while running that time-getting function once per iteration. If not, then you could maybe check a deadline every 4 iterations (loop unrolling) to amortize the cost of nanoTime(), so you log in bursts of 4 or something.
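A sketch of that amortization, with the burst size of 4 from the text (hasMoreRecords() and emitOneRecord() are again placeholders):

long deadline = System.nanoTime();
while (hasMoreRecords()) {
    for (int i = 0; i < 4 && hasMoreRecords(); i++) {
        emitOneRecord();                 // emit a short burst of records
    }
    deadline += 4 * delayInNanoSeconds;  // one clock check and one deadline per burst
    while (System.nanoTime() - deadline < 0) {
        Thread.onSpinWait();
    }
}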
Of course if your system isn't fast enough even with no delay call at all, you'll need to optimize something to fix that. You can't delay for a negative amount of time, and checking the clock itself takes time.