Discussion: adaptive spinning: fall back to sleep / block?
Andriy Gapon
2021-04-09 09:56:26 UTC
Until I recently looked at the actual code, I was under the impression that
adaptive spinning is bounded and that, after some time / number of spins, a
thread would go to a sleep queue or a turnstile. But it turns out that the
spinning is actually unbounded for as long as its conditions hold (some other
thread owns the lock and that thread is running; the owner can be changing too).

In my opinion, it does not make sense to spin for "too long".
If there was no opportunity to take the lock quickly, then it is better to
block waiting for it rather than keep occupying a processor. For instance, the
spinning can prevent another runnable thread from running.

I think that if a lock is heavily contended or its hold times are on the longer
side (or both), then adaptive spinning can make system behavior
(performance, responsiveness) worse.

Finally, I was under the impression that 'adaptive' implied some heuristic for
whether and when to spin. _The lock owner is running_ seems too simple a
condition to qualify as 'adaptive'.

As an example, this looks like a relatively sophisticated implementation of the
"adaptiveness":
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/objectMonitor.cpp#l1919
But, just IMHO, even simply having a hard limit on the spin count would be
better than what we have now.

P.S.
This is not an area that I know well, so my confusion and ideas may be absurd.
On the other hand, I could not find any justification for the current unbounded
"adaptive" spinning.
--
Andriy Gapon
Andriy Gapon
2021-05-17 10:01:52 UTC
Post by Andriy Gapon
Until I recently looked at the actual code, I was under the impression that
adaptive spinning is bounded and that, after some time / number of spins, a
thread would go to a sleep queue or a turnstile. But it turns out that the
spinning is actually unbounded for as long as its conditions hold (some other
thread owns the lock and that thread is running; the owner can be changing too).
In my opinion, it does not make sense to spin for "too long".
If there was no opportunity to take the lock quickly, then it is better to
block waiting for it rather than keep occupying a processor. For instance, the
spinning can prevent another runnable thread from running.
Any opinions about this?
Thank you.
--
Andriy Gapon
Mateusz Guzik
2021-05-17 10:23:20 UTC
Andriy Gapon
2021-05-17 11:44:34 UTC
Post by Mateusz Guzik
There is no clear-cut answer to this that I'm aware of. Ultimately, all
behavior in the face of contention is about damage control.
1. going off CPU is also a serializing operation; that is, threads can
contend on doing so, albeit less than on the original lock
2. the existence of blocked waiters makes it more expensive to release the
lock, making contention worse, and even then it is unclear what wake-up
policy you would like to enact (e.g., do you hand off the lock to the
oldest sleeper? do you wake everyone and let them fight it out? all
choices suffer problems)
3. consider a thread which holds foo_lock and now contends on bar_lock
and decides to go off CPU. If there is a thread which waits on
foo_lock, it will also go off CPU. This kind of behavior easily leads
to a dramatic collapse in throughput, even on top of whatever problems
are present due to the contention to begin with.
Now, locking primitives in the FreeBSD kernel are problematic in at
least the following ways:
1. there are no fairness guarantees; for example, a constant stream of
threads can end up excluding some threads for a long time
2. there is no support for locking under KVM (as in, the kernel does
not take advantage of it)
3. rw locking uses CAS loops instead of atomic add
IMO, if doing anything serious with locks, the 3 problems above need
to be solved, in that order.
I can't stress enough the lack of fairness, which arbitrarily going to
sleep will only exacerbate. As noted earlier, I don't know if timeout
sleep is an inherently good or bad idea, but I'm confident that playing
with it in this situation is on the bad side.
I agree with you. And it looks like we do not disagree in general.
I just want to make a clarification, maybe redundant: you seem to be mostly
concerned about effects on threads that contend on the same locks or related
lock chains. I am more concerned about effects on completely unrelated threads.

Yes, sleeping and waking introduce overhead (and not only that) on the threads
doing them and on threads depending on those threads. But spinning, when it
happens on a large share of CPUs (e.g., ~100% of them), penalizes the whole
system.

At work, where a substantial part of our application lives in the kernel, I
regularly debug issues that seem to happen because of CPU starvation. A
common theme among them is that many CPUs are tied up in lock spinning.
FreeBSD as a general-purpose OS does not seem to suffer from the same issues
(at least, none that I know of), but we have quite a lot of additional kernel
threads and kernel locks.

So, maybe instead of trying to experiment with FreeBSD I should try to
experiment with our derivative product first.

Thank you for the feedback and additional information.
--
Andriy Gapon
Mateusz Guzik
2021-05-17 12:49:08 UTC
Konstantin Belousov
2021-05-18 16:48:15 UTC
Post by Andriy Gapon
I agree with you. And it looks like we do not disagree in general.
I just want to make a clarification, maybe redundant: you seem to be mostly
concerned about effects on threads that contend on the same locks or related
lock chains. I am more concerned about effects on completely unrelated
threads.
Yes, sleeping and waking introduce overhead (and not only that) on the threads
doing them and on threads depending on those threads. But spinning, when it
happens on a large share of CPUs (e.g., ~100% of them), penalizes the whole
system.
At work, where a substantial part of our application lives in the kernel, I
regularly debug issues that seem to happen because of CPU starvation. A
common theme among them is that many CPUs are tied up in lock spinning.
FreeBSD as a general-purpose OS does not seem to suffer from the same issues
(at least, none that I know of), but we have quite a lot of additional kernel
threads and kernel locks.
So, maybe instead of trying to experiment with FreeBSD I should try to
experiment with our derivative product first.
Thank you for the feedback and additional information.
BTW, the unfairness and scheduling randomness are quite fundamental
for locking to work. Minor scheduling inconsistencies and timing drifts,
due to external events and internal machine operations, help avoid
lock convoys, which would otherwise plague us.

The priority mechanism is a partial compensation for the lack of fairness.
Critical sections effectively give you the highest-priority execution.
They will probably become more common as more lock-less algorithms start
to appear in the kernel.

I saw some references indicating that Windows tried to use completely fair
queued spinlocks, and either limited their use to very specific places or got
rid of them. They were never advertised as a generic facility.

Also, I know that Solaris experimented with things like direct handoff of
the lock from the unlocking thread to (some) lock contender. They were very
much dissatisfied with the results.
