The Java keyword volatile often confuses students and junior developers, leading them to believe it solves concurrency problems such as race conditions. This is partly due to sources that thoughtlessly explain volatile as a shortcut for wrapping every usage in a synchronized (this) block, which is misleading to say the least.
In reality, the volatile modifier suits only a narrow range of tasks, and it's important to understand what it does and what it doesn't do. Let's have a look at a simple class.
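The original listing isn't preserved here, but based on the description it was something along these lines (class name and loop bound are my assumptions):

```java
// Hypothetical reconstruction of the example class: two threads share one
// instance and both increment the same field without synchronization.
public class VolatileDemo implements Runnable {
    int value = 0; // shared field, no synchronization

    @Override
    public void run() {
        for (int i = 0; i < 100000; i++) {
            value++; // read-modify-write, not atomic
        }
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileDemo test = new VolatileDemo();
        new Thread(test).start();
        new Thread(test).start();
        Thread.sleep(200); // crude "barrier"; see the note on it below
        System.out.println(test.value);
    }
}
```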
The instance test is accessed and modified by two threads, and the final result is printed to the output stream. Thread.sleep(200) of course doesn't ensure both threads are done, but I don't want to focus on barrier issues in this post. Just believe me: 200 ms is more than enough for 100000 iterations on any modern hardware.
The point is that there is a clear race condition on the value field: several threads access and modify it without proper synchronization. As a result, any number between 100000 and 200000 can be printed out.
The underlying cause of the race is that the ++ operator is not atomic. (By the way, C/C++ increment is not atomic by default either; there is an x86 instruction lock xadd, but none of the compilers I tried produced it unless it was explicitly requested with an __asm__ block.)
But let's not digress; here's the bytecode of the run() method:
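The listing isn't reproduced here, but for a plain int field the increment compiles to roughly the following (ASM Textifier style, which produces the L-labels mentioned below; the class name is assumed):

```
   L3
    LINENUMBER 16 L3
    ALOAD 0
    DUP
    GETFIELD VolatileDemo.value : I
    ICONST_1
    IADD
    PUTFIELD VolatileDemo.value : I
```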
The interesting part is in the L3 section (line 16 is value++;), which effectively pushes the this reference onto the stack twice (see ALOAD 0 and DUP), loads the value field of this and pushes it too, then pushes the constant 1, adds the two, and stores the sum back into this.value. So there are two bytecode instructions between GETFIELD and PUTFIELD, and if the thread is preempted anywhere between them, the put operation may overwrite a value that another thread has already updated in memory. I won't go into details, but JIT-compiled code isn't any safer.
Well, now that the issue is clear, how can we fix it? Naive programmers are tempted to make value volatile and expect the code to be thread-safe. Run this code yourself and see that it's not. What's more, the run() bytecode didn't change, not by a single instruction! Whatever the JVM does with a volatile field, the increment still takes four instructions, and there are no locks around them. There's no way it can be atomic!
It's time to read the spec (the Java Language Specification, aka JLS) and find out what volatile actually does:
A field may be declared volatile, in which case the Java memory model (§17) ensures that all threads see a consistent value for the variable.
Threads see a consistent value? What the heck does that mean?
Well, it turns out that our first example is even more buggy than it appeared: the GETFIELD instruction isn't guaranteed to return the very latest value of the field, because it may take it from an L1 or L2 cache, which are local to a CPU core. Imagine another thread running on a separate core: what it is likely to see is the value from its own caches.
The issue is in fact deeper, because within each thread instructions can be reordered, either by the compiler or at runtime by the CPU, and volatile additionally provides sequential consistency; but that's a different topic. What matters here is that the initial code had two concurrency problems, not one.
Yes, the volatile modifier solved one of them: it forced all reads and writes to go straight to main memory, so both threads began to see the true most recent value (you may think of it this way, though strictly speaking the JVM doesn't guarantee writes to main memory). Another name for this guarantee is visibility of changes, or happens-before in the JVM.
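Visibility is precisely the job volatile is designed for. A classic illustration (my example, not from the original code) is a stop flag:

```java
// Without volatile, the reader thread may never observe the writer's update
// (the JIT is free to hoist the read out of the loop) and spin forever.
public class StopFlag {
    volatile boolean stopped = false;

    public static void main(String[] args) throws InterruptedException {
        final StopFlag flag = new StopFlag();
        Thread reader = new Thread(() -> {
            while (!flag.stopped) {
                // busy-wait until the writer's change becomes visible
            }
        });
        reader.start();
        Thread.sleep(100);
        flag.stopped = true; // this write happens-before the reader's next read
        reader.join();       // terminates promptly thanks to volatile
        System.out.println("reader stopped");
    }
}
```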
But we hardly noticed the improvement, because the property we actually needed is atomicity. And that is what volatile doesn't guarantee, as we saw from reading the bytecode. Take a moment to appreciate the difference between visibility of changes and atomicity.
Synchronized block
Let's try something else: get rid of volatile and wrap the increment in a synchronized block. What do you think will happen?
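A sketch of this variant (again, the class name and loop bound are assumptions; I join the threads here instead of sleeping, to make the output deterministic):

```java
// Both threads lock the same instance, so the read-modify-write of value
// becomes an atomic unit with respect to the other thread.
public class SyncDemo implements Runnable {
    int value = 0;

    @Override
    public void run() {
        for (int i = 0; i < 100000; i++) {
            synchronized (this) { // MONITORENTER ... MONITOREXIT in bytecode
                value++;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncDemo test = new SyncDemo();
        Thread t1 = new Thread(test);
        Thread t2 = new Thread(test);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(test.value); // always 200000
    }
}
```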
The program consistently prints 200000, i.e. the race condition is resolved. Below you can examine the bytecode of the new run() method and notice the MONITORENTER and MONITOREXIT instructions around the same GETFIELD-PUTFIELD sequence. That's where atomicity comes from; but what about visibility of changes?
JLS again provides a rather sophisticated description in
section 17.4.5:
If an action x synchronizes-with a following action y, then we also have hb(x, y).
But in simple terms it says that visibility of changes is guaranteed between two synchronized sections, which is exactly what we wanted.
It should be noted that synchronized is not the only solution. Another way would be to use AtomicInteger, as follows.
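A sketch of the AtomicInteger variant (class shape assumed, as before):

```java
import java.util.concurrent.atomic.AtomicInteger;

// AtomicInteger performs the increment as a single atomic read-modify-write,
// without an explicit lock.
public class AtomicDemo implements Runnable {
    final AtomicInteger value = new AtomicInteger(0);

    @Override
    public void run() {
        for (int i = 0; i < 100000; i++) {
            value.incrementAndGet();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicDemo test = new AtomicDemo();
        Thread t1 = new Thread(test);
        Thread t2 = new Thread(test);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(test.value.get()); // always 200000
    }
}
```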
And here's the bytecode; note the INVOKEVIRTUAL call.
Benchmark
For the benchmark I used the following template class, whose run method performs lots of both reads and writes. The last increment by (temp > 0 ? 0 : 0) is there only to make sure temp is not optimized away by the JIT.
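The template isn't shown here; a plausible reconstruction looks like this (all names and the loop bound are my assumptions):

```java
// Hypothetical benchmark template. The field declaration is what varies
// between runs: plain int, volatile int, synchronized access, or AtomicInteger.
public class Benchmark implements Runnable {
    int value = 0;

    @Override
    public void run() {
        int temp = 0;
        for (int i = 0; i < 1000000; i++) {
            temp += value; // read
            value++;       // write
        }
        value += (temp > 0 ? 0 : 0); // keeps temp from being optimized away by the JIT
    }

    public static void main(String[] args) {
        Benchmark b = new Benchmark();
        long start = System.nanoTime();
        b.run(); // single-threaded here; the measurements below used multiple threads
        System.out.printf("%d ms%n", (System.nanoTime() - start) / 1000000);
    }
}
```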
I made all four modifications discussed above, two thread-safe and two unsafe, and measured the time on my machine (Java 1.6.0_45, Intel Core i7-2600 CPU, 4 cores / 8 hardware threads). The absolute numbers are not very important, but the relative difference is more or less clear.
- int: 7-10 ms
- volatile int: 480-680 ms
- synchronized (this): 650-700 ms
- AtomicInteger: 1200-1400 ms
Update for Java 1.8.0_45 on the same machine. Quite a remarkable change:
- int: 20 ms
- volatile int: 500-650 ms
- synchronized (this): 2700-2800 ms
- AtomicInteger: 660-690 ms