[concurrency-interest] j.u.c/backport performance on Windows

Peter Kovacs peter.kovacs.1.0rc at gmail.com
Tue Apr 3 02:44:46 EDT 2007

Thank you, Szabolcs, for your comments!

I basically agree with your observations -- except, perhaps, for the
last paragraph. The fact that a single consumer cannot keep up with
multiple (two) producers doesn't mean that there is no room for
concurrent processing. Creating the result takes at least 30% more
time than consuming it, so there is room for concurrency not only
between the consumer and a producer, but also between producers.
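To make the arithmetic concrete, here is a small sketch with assumed
unit costs (1.3 for producing, 1.0 for consuming; only the 30% ratio
comes from my measurements, everything else is illustrative):

```java
// Back-of-the-envelope throughput check with assumed unit costs.
public class PipelineMath {
    public static void main(String[] args) {
        double produce = 1.3; // producing one result (~30% more than consuming)
        double consume = 1.0; // consuming one result

        // Strictly serial execution: each item costs produce + consume.
        double serialPerItem = produce + consume;             // 2.3

        // One producer overlapped with one consumer: the slower stage wins.
        double pipelinedPerItem = Math.max(produce, consume); // 1.3

        // Two producers feeding one consumer: an item arrives every
        // produce / 2 = 0.65 time units, faster than the consumer's 1.0,
        // so the consumer becomes the bottleneck -- but throughput is
        // still well above the serial case.
        double arrivalInterval = produce / 2.0;

        System.out.println(serialPerItem + " " + pipelinedPerItem + " "
                + (arrivalInterval < consume));
    }
}
```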

I think you made a good point in observing that high CPU utilization
may stem from threads spin-locking on busy resources. But the fact is
that I observe a very low CPU utilization (50-55% on a two-way Windows
system), and that is what prompts me to talk about "serial execution"
(apart from the comparatively very long execution time). Note that I
observe the same problem on Windows (with Java 5) in the other branch,
where "offer" is used on the LBQ and workers whose "offer" fails go
into a wait. An implementation using similar logic, but built directly
on synchronization primitives and wait/notify, completes twice as
fast. That is where my question about j.u.c/backport performance on
Windows comes from.
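For clarity, the shape of that other branch is roughly the following
(a minimal sketch only; the class and method names are invented, and
the real code differs):

```java
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: a worker tries a non-blocking offer and, on failure, waits on
// a monitor until the consumer signals that space has been freed. The
// short timeout guards against a notification arriving between the
// failed offer and the wait.
class OfferingWorker<T> {
    private final LinkedBlockingQueue<T> queue;
    private final Object full = new Object();

    OfferingWorker(LinkedBlockingQueue<T> queue) {
        this.queue = queue;
    }

    void submit(T item) throws InterruptedException {
        while (!queue.offer(item)) {      // non-blocking attempt
            synchronized (full) {
                full.wait(100);           // woken by spaceFreed(), or retry
            }
        }
    }

    // Called by the consumer after taking an item off the queue.
    void spaceFreed() {
        synchronized (full) {
            full.notifyAll();
        }
    }
}
```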

Looking at the test results with CLBQ in another thread on this list,
I am starting to suspect that my application may perform significantly
differently with Java 6 than with Java 5.


On 4/2/07, Szabolcs Ferenczi <szabolcs.ferenczi at gmail.com> wrote:
> On 02/04/07, Peter Kovacs <peter.kovacs.1.0rc at gmail.com> wrote:
> > getNextInputImmediate uses LinkedBlockingQueue.put while
> We are talking about this fragment:
>         synchronized (inputProducerLock) {
>             if (inputProducer.hasNext()) {
>                 input = inputProducer.getNext();
>             }
>             scheduledWorkUnitData = new ScheduledWorkUnitData(input);
>             outputQueue.put(scheduledWorkUnitData);
>         }
> Basically it is OK that you use the put method on the LBQ, which
> provides the long-term scheduling of the threads. However, you wrap
> it in a higher-level critical section using the extra lock
> inputProducerLock. Due to the inner put method, a worker thread might
> stay in the inner critical section for an indefinitely long time.
> Consequently, the other threads might hang on the lock
> inputProducerLock, waiting to enter the upper critical region.
> Critical sections are intended for short-term scheduling, and in this
> case threads wait a long time to enter. (You mention this situation
> in your 19 March message.)
> Threads waiting to enter the critical section for a long time might
> unnecessarily consume processing power, depending on how the waiting
> is implemented. Usually it is implemented with some kind of spin
> lock; that is, threads are scheduled on the assumption that they will
> gain access to the resource shortly. Waiting a long time to enter a
> critical section might be the cause of the performance loss, and
> there might be significant differences between platforms here.
> On top of all that, you mention (in your 19 March message) that the
> consumer cannot keep up with the producers. That means that as soon
> as the buffer gets full, the work is necessarily serialized and the
> consumer determines the speed of the processing. The
> producer-consumer pattern is a solution for the case where the speed
> of producing the pieces of data and the speed of processing them
> vary.
> Best Regards,
> Szabolcs
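
Coming back to the quoted fragment: one way to shorten the critical
section (a sketch only, with invented names, simplified to skip the
ScheduledWorkUnitData wrapper, and assuming queue ordering across
workers need not strictly follow the producer's order) would be to
fetch the input under the lock but perform the blocking put after
releasing it:

```java
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: keep the critical section short by doing only the producer
// access under the lock; the potentially long blocking put happens
// with no lock held, so a full queue cannot stall the other workers.
class ShortLockScheduler<T> {
    private final Object inputProducerLock = new Object();
    private final Iterator<T> inputProducer;
    private final LinkedBlockingQueue<T> outputQueue;

    ShortLockScheduler(Iterator<T> producer, LinkedBlockingQueue<T> queue) {
        this.inputProducer = producer;
        this.outputQueue = queue;
    }

    void scheduleNext() throws InterruptedException {
        T input = null;
        synchronized (inputProducerLock) { // short critical section
            if (inputProducer.hasNext()) {
                input = inputProducer.next();
            }
        }
        if (input != null) {
            outputQueue.put(input); // may block, but no lock is held
        }
    }
}
```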

More information about the Concurrency-interest mailing list