Discussion:
psm(4) & atkbdc(4) locking
Vladimir Kondratyev
2017-05-08 18:33:34 UTC
Permalink
Hi All

In order to implement evdev support in psm(4) driver I`m going to add
mutexes to psm and atkbdc drivers and mark psm interrupt and cdev
handlers MPSAFE.
Atkbd(4) is depending on Giant like before.

Locking of these drivers seems to be low-hanging fruit as spl() calls
are still in place but it has not occurred.

Does someone know why it is not done yet? For reasons other than "Nobody
took care of it"?
Any hidden traps?

Patches can be found here:
https://reviews.freebsd.org/D10263 (atkbdc, serialize hw registers
access)
https://reviews.freebsd.org/D10264 (psm, serialize softc access, mark
interrupt and cdev handlers MPSAFE)
--
WBR
Vladimir Kondratyev
Warner Losh
2017-05-08 18:39:02 UTC
Permalink
On Mon, May 8, 2017 at 12:33 PM, Vladimir Kondratyev
Post by Vladimir Kondratyev
Hi All
In order to implement evdev support in psm(4) driver I`m going to add
mutexes to psm and atkbdc drivers and mark psm interrupt and cdev handlers
MPSAFE.
Atkbd(4) is depending on Giant like before.
Locking of these drivers seems to be low-hanging fruit as spl() calls are
still in place but it has not occurred.
Does someone know why it is not done yet? For reasons other than "Nobody
took care of it"?
Any hidden traps?
Working rock-solid reliably in ddb(4) is what tripped me up when I
started down this path 10 years ago. Understanding all the rules for
that was a bridge too far for me given the time I had for the project
then. If you have that covered (I haven't looked at the diffs yet),
you're golden.

Warner
Post by Vladimir Kondratyev
https://reviews.freebsd.org/D10263 (atkbdc, serialize hw registers access)
https://reviews.freebsd.org/D10264 (psm, serialize softc access, mark
interrupt and cdev handlers MPSAFE)
--
WBR
Vladimir Kondratyev
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Vladimir Kondratyev
2017-05-08 18:58:27 UTC
Permalink
Post by Warner Losh
On Mon, May 8, 2017 at 12:33 PM, Vladimir Kondratyev
Post by Vladimir Kondratyev
Hi All
In order to implement evdev support in psm(4) driver I`m going to add
mutexes to psm and atkbdc drivers and mark psm interrupt and cdev handlers
MPSAFE.
Atkbd(4) is depending on Giant like before.
Locking of these drivers seems to be low-hanging fruit as spl() calls are
still in place but it has not occurred.
Does someone know why it is not done yet? For reasons other than "Nobody
took care of it"?
Any hidden traps?
Working rock-solid reliably in ddb(4) is what tripped me up when I
started down this path 10 years ago.
I tried to avoid atkbd changes as much as possible. Patch adds atkbdc
mutex acquisition
before each hardware access and nothing more. I think mutexes should be
ignored in ddb mode.
Post by Warner Losh
Understanding all the rules for
that was a bridge too far for me given the time I had for the project
then. If you have that covered (I haven't looked at the diffs yet),
you're golden.
I`m not familiar with ddb internals, that is why i wrote first mail.
Post by Warner Losh
Warner
Post by Vladimir Kondratyev
https://reviews.freebsd.org/D10263 (atkbdc, serialize hw registers access)
https://reviews.freebsd.org/D10264 (psm, serialize softc access, mark
interrupt and cdev handlers MPSAFE)
--
WBR
Vladimir Kondratyev
Konstantin Belousov
2017-05-08 19:39:35 UTC
Permalink
Post by Vladimir Kondratyev
Post by Warner Losh
On Mon, May 8, 2017 at 12:33 PM, Vladimir Kondratyev
Post by Vladimir Kondratyev
Hi All
In order to implement evdev support in psm(4) driver I`m going to add
mutexes to psm and atkbdc drivers and mark psm interrupt and cdev handlers
MPSAFE.
Atkbd(4) is depending on Giant like before.
Locking of these drivers seems to be low-hanging fruit as spl() calls are
still in place but it has not occurred.
Does someone know why it is not done yet? For reasons other than "Nobody
took care of it"?
Any hidden traps?
Working rock-solid reliably in ddb(4) is what tripped me up when I
started down this path 10 years ago.
I tried to avoid atkbd changes as much as possible. Patch adds atkbdc
mutex acquisition
before each hardware access and nothing more. I think mutexes should be
ignored in ddb mode.
Locks are not ignored in kdb_active mode, and this is reasonable.
Instead, kdb should not acquire locks at all.

Consider a situation where you need to send a composite command to
the hardware, which consists of several registers accesses, and the
whole sequence of accesses is covered by a lock to ensure atomicity.
Kdb may be entered at arbitrary moment, including the middle of the
lock-protected section. This means that kdb might be entered with the
lock owned by some thread, not neccessary the thread which was executing
when entrance occured. More, the hardware state is some intermediate
state of being commanded the composite sequence, not yet finished.

I do not think that atkbd code correctly handles this situation now,
and simply adding a lock there probably make things worse.
Post by Vladimir Kondratyev
Post by Warner Losh
Understanding all the rules for
that was a bridge too far for me given the time I had for the project
then. If you have that covered (I haven't looked at the diffs yet),
you're golden.
I`m not familiar with ddb internals, that is why i wrote first mail.
Post by Warner Losh
Warner
Post by Vladimir Kondratyev
https://reviews.freebsd.org/D10263 (atkbdc, serialize hw registers access)
https://reviews.freebsd.org/D10264 (psm, serialize softc access, mark
interrupt and cdev handlers MPSAFE)
--
WBR
Vladimir Kondratyev
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Warner Losh
2017-05-08 19:48:42 UTC
Permalink
Post by Konstantin Belousov
Post by Vladimir Kondratyev
Post by Warner Losh
On Mon, May 8, 2017 at 12:33 PM, Vladimir Kondratyev
Post by Vladimir Kondratyev
Hi All
In order to implement evdev support in psm(4) driver I`m going to add
mutexes to psm and atkbdc drivers and mark psm interrupt and cdev handlers
MPSAFE.
Atkbd(4) is depending on Giant like before.
Locking of these drivers seems to be low-hanging fruit as spl() calls are
still in place but it has not occurred.
Does someone know why it is not done yet? For reasons other than "Nobody
took care of it"?
Any hidden traps?
Working rock-solid reliably in ddb(4) is what tripped me up when I
started down this path 10 years ago.
I tried to avoid atkbd changes as much as possible. Patch adds atkbdc
mutex acquisition
before each hardware access and nothing more. I think mutexes should be
ignored in ddb mode.
Locks are not ignored in kdb_active mode, and this is reasonable.
Instead, kdb should not acquire locks at all.
Consider a situation where you need to send a composite command to
the hardware, which consists of several registers accesses, and the
whole sequence of accesses is covered by a lock to ensure atomicity.
Kdb may be entered at arbitrary moment, including the middle of the
lock-protected section. This means that kdb might be entered with the
lock owned by some thread, not neccessary the thread which was executing
when entrance occured. More, the hardware state is some intermediate
state of being commanded the composite sequence, not yet finished.
Yes. This is the snag I ran into...
Post by Konstantin Belousov
I do not think that atkbd code correctly handles this situation now,
and simply adding a lock there probably make things worse.
It did for me, since breaking to keyboard was totally broken by the
changes I made to try to lock things since it was quite likely to
interrupt things when locks were held... I had a very naive
implementation, which wasn't up to the task, so some care is needed.
But this was in the 5.x or 6.x time frame, and the locking situation
wrt GIANT is much better today than then...

I don't think it's a huge problem, but just one that the implementor
needs to be aware of... Since I was unaware of all the details up
front, I came up with a solution that couldn't possibly work so I
abandoned it.

Warner
Post by Konstantin Belousov
Post by Vladimir Kondratyev
Post by Warner Losh
Understanding all the rules for
that was a bridge too far for me given the time I had for the project
then. If you have that covered (I haven't looked at the diffs yet),
you're golden.
I`m not familiar with ddb internals, that is why i wrote first mail.
Post by Warner Losh
Warner
Post by Vladimir Kondratyev
https://reviews.freebsd.org/D10263 (atkbdc, serialize hw registers access)
https://reviews.freebsd.org/D10264 (psm, serialize softc access, mark
interrupt and cdev handlers MPSAFE)
--
WBR
Vladimir Kondratyev
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Bruce Evans
2017-05-08 21:02:35 UTC
Permalink
Post by Warner Losh
Post by Konstantin Belousov
Post by Vladimir Kondratyev
Post by Warner Losh
On Mon, May 8, 2017 at 12:33 PM, Vladimir Kondratyev
Post by Vladimir Kondratyev
Hi All
In order to implement evdev support in psm(4) driver I`m going to add
mutexes to psm and atkbdc drivers and mark psm interrupt and cdev handlers
MPSAFE.
Atkbd(4) is depending on Giant like before.
Locking of these drivers seems to be low-hanging fruit as spl() calls are
still in place but it has not occurred.
High-hanging fruit. it is a lot of work to replace the buggy locking by
something that works, without causing deadlocks instead of relatively
harmless races.
Post by Warner Losh
Post by Konstantin Belousov
Post by Vladimir Kondratyev
Post by Warner Losh
Post by Vladimir Kondratyev
Does someone know why it is not done yet? For reasons other than "Nobody
took care of it"?
Any hidden traps?
There is ddb (discussed below), and also syscons calling the keyboard
driver. syscons is still Giant-locked, partly because keyboard drivers
are. The common Giant lock works more simply than separate non Giant
ones. The same non-Giant lock for both would would have some of the
complexities of Giant without the support already being there.
Post by Warner Losh
Post by Konstantin Belousov
Post by Vladimir Kondratyev
Post by Warner Losh
Working rock-solid reliably in ddb(4) is what tripped me up when I
started down this path 10 years ago.
I tried to avoid atkbd changes as much as possible. Patch adds atkbdc
mutex acquisition
before each hardware access and nothing more. I think mutexes should be
ignored in ddb mode.
They are only ignored if you do it explicitly, but this is fragile. If
the hardware works after ignoring locks while in kdb mode, then it must
work after ignoring locks in all modes.
Post by Warner Losh
Post by Konstantin Belousov
Locks are not ignored in kdb_active mode, and this is reasonable.
Instead, kdb should not acquire locks at all.
(This should be ddb_active mode. Only ddb uses the keyboard. But there
is only a flag for kdb.)

See my recent fixes for syscons. These basically lock console output
correctly, but for the keyboard they depend on keyboard drivers having
no locking and not needing any (really, both syscons and keyboard
drivers are supposed to be locked by a common lock which is Giant, but
this doesn't work for i/o in kdb mode and is fragile for low-level
console output not in kdb mode. My tests arrange for console output
from almost to worst places where it shouldn't be done (from IPI STOP
handlers, where the CPU doing the output must succeed despite other
CPUs that have already been stopped holding console driver locks).
Keyboard drivers are harder to test in this context, and are sure to
fail. so I don't try hard to test them. However, kdb and panics in
this context sometime reach the keyboard driver. Also, on syscons
entry in such context, it tries to not use keyboard drivers if not
necessary.
Post by Warner Losh
Post by Konstantin Belousov
Consider a situation where you need to send a composite command to
the hardware, which consists of several registers accesses, and the
whole sequence of accesses is covered by a lock to ensure atomicity.
Kdb may be entered at arbitrary moment, including the middle of the
lock-protected section. This means that kdb might be entered with the
lock owned by some thread, not neccessary the thread which was executing
when entrance occured. More, the hardware state is some intermediate
state of being commanded the composite sequence, not yet finished.
AT keyboards don't seem to mind some corruption of the hardware state.
In general, provided a bad state doesn't destroy the device, then at
worst the driver can probably recover by resetting everything after a
timeout.

My fixes are simpler and more complete for sio. They are simpler because
the driver only has 1 lock for input and output (but this lock is also
shared between sio devices, which gives problems related to those for
Giant -- console output is supposed to be immediate and must be synchronous,
but you are forced to wait for other devices). sio always did an almost
complete state switch for console entry and exit. This is possible since
the device registers are read-write and restoring them works as well
as a reset for getting back to a usable state. My recent fixes mainly
add some control of the nesting for the stacked states, and extra care
for some !kdb cases.
Post by Warner Losh
Yes. This is the snag I ran into...
Post by Konstantin Belousov
I do not think that atkbd code correctly handles this situation now,
and simply adding a lock there probably make things worse.
Yes, it would just give deadlock. See my recent fixes. atkbd currently
requires Giant locking, but this is obviously impossible for low-level
console i/o (even without ddb). Syscons used to do null locking in this
case, without really trying. Initially it used non-null spls, but spls
don't work at all for this use (since spl only locks out interrupts).
Then when syscons was converted to Giant, Giant works too well for hiding
itself, so syscons has almost no explicit references to Giant, and
certainly none for low-level-console i/o where they wouldn't work. Giant
also works well for locking out the interrupt handlers, but converting
the spls to null ones made them do even less than before to accidentally
limit console i/o.

If you sprinkle locks that actually, work, then deadlock is certain in
some cases. If you use recursive locks like syscons still does, then
they will often avoid deadlock but won't actually work. They reduce
to null spls if they are already held by the same thread.

Typical examples of deadlocks are:
- thread holding lock is stopped in IPI STOP handler
- another thread calls the console driver, perhaps to report a problem
in its IPI STOP handler

Typical examples of recursive (or otherwise ignored) locks not actually
working when the are needed are:
- thread is in super-critical region of console driver and aquires lock.
This should be a spinlock, and interrupts should be disabled (this
is a side effect of spinlocks)
- NMIs and traps cannot be disabled. They are often caused by NMI IPI
STOPs and debugger traps. Since the region is super-critical, i/o
should not be done, at least blindly. But recursive locks are useless
for stopping the i/o, since the same thread owns the lock. Ignoring
the lock in kdb mode is equally dangerous.
Post by Warner Losh
It did for me, since breaking to keyboard was totally broken by the
changes I made to try to lock things since it was quite likely to
interrupt things when locks were held... I had a very naive
implementation, which wasn't up to the task, so some care is needed.
But this was in the 5.x or 6.x time frame, and the locking situation
wrt GIANT is much better today than then...
Not much better. There are just more deadlocks and less races now.
Lots of deadlocks were added even at the printf() and msgbuf level.
One deadlock in cnputs() was there for many years. Output is now
droppped there instead.
Post by Warner Losh
I don't think it's a huge problem, but just one that the implementor
needs to be aware of... Since I was unaware of all the details up
front, I came up with a solution that couldn't possibly work so I
abandoned it.
The difficulties are mostly in complex hardware and drivers. The
only method that is sure to work is complete reinitialization/reset
whenever needed (including before the normal driver is probed), without
losing too much i/o. To take shortcuts in this, so that it doesn't
take as much code as the normal driver, you have to understand the
hardware better than is needed for writing the normal driver.

Bruce
Vladimir Kondratyev
2017-05-08 21:03:10 UTC
Permalink
Post by Konstantin Belousov
Post by Vladimir Kondratyev
Post by Warner Losh
On Mon, May 8, 2017 at 12:33 PM, Vladimir Kondratyev
Post by Vladimir Kondratyev
Hi All
In order to implement evdev support in psm(4) driver I`m going to add
mutexes to psm and atkbdc drivers and mark psm interrupt and cdev handlers
MPSAFE.
Atkbd(4) is depending on Giant like before.
Locking of these drivers seems to be low-hanging fruit as spl() calls are
still in place but it has not occurred.
Does someone know why it is not done yet? For reasons other than "Nobody
took care of it"?
Any hidden traps?
Working rock-solid reliably in ddb(4) is what tripped me up when I
started down this path 10 years ago.
I tried to avoid atkbd changes as much as possible. Patch adds atkbdc
mutex acquisition
before each hardware access and nothing more. I think mutexes should be
ignored in ddb mode.
Locks are not ignored in kdb_active mode, and this is reasonable.
Instead, kdb should not acquire locks at all.
Consider a situation where you need to send a composite command to
the hardware, which consists of several registers accesses, and the
whole sequence of accesses is covered by a lock to ensure atomicity.
Kdb may be entered at arbitrary moment, including the middle of the
lock-protected section. This means that kdb might be entered with the
lock owned by some thread, not neccessary the thread which was
executing
when entrance occured. More, the hardware state is some intermediate
state of being commanded the composite sequence, not yet finished.
Sending a composite command does not happen on regular operation. It
can happen on initialization, errors, led state changing and some
ioctls only, so i can change patch to protect them with giant as before.

But receive path is looking more complex as there are 2 data queues
that can be filled in both interrupt and polling ways

Nevertheless, there is simple fallback way: just add giant lock support
to evdev
Post by Konstantin Belousov
I do not think that atkbd code correctly handles this situation now,
and simply adding a lock there probably make things worse.
--
WBR
Vladimir Kondratyev
Loading...