Discussion:
making SW_WATCHDOG dynamic
Mike Karels
2017-12-26 14:25:22 UTC
Permalink
There is a kernel option, SW_WATCHDOG, which adds a low-level software
watchdog in hardclock. By default, the kernel and watchdogd support
only hardware-based watchdogs. There is also a callout-based software
watchdog that can be enabled by watchdogd with an ioctl if --softwatchdog
is specified, but watchdogd doesn't switch on its own. The SW_WATCHDOG
option adds a lower-level software watchdog to the hardware-based mechanism,
but it adds it unconditionally. I propose to include the SW_WATCHDOG
facility by default, but enable it only if there is no hardware watchdog.
I'm interested in any comments, suggestions, or background; feel free to
mail me off the list. If there are multiple people interested, I'll
forward messages to that group.

I want to make the change because I have found SW_WATCHDOG quite useful
at $JOB, and it's annoying to have to build a custom kernel just for this
(not just once, but every time there is a kernel patch).

Also, I'm curious why we have two software watchdog facilities. The
--softwatchdog facility has various options on expiration, such as
printf/log/panic; I don't know why anything other than panic/reboot
would be desirable, though. I already contacted some of the people who
have left fingerprints on watchdog. Also, if anyone wants to review
the code, let me know.

Thanks,
Mike
Andriy Gapon
2017-12-27 13:18:53 UTC
Permalink
Post by Mike Karels
There is a kernel option, SW_WATCHDOG, which adds a low-level software
watchdog in hardclock. By default, the kernel and watchdogd support
only hardware-based watchdogs. There is also a callout-based software
watchdog that can be enabled by watchdogd with an ioctl if --softwatchdog
is specified, but watchdogd doesn't switch on its own. The SW_WATCHDOG
option adds a lower-level software watchdog to the hardware-based mechanism,
but it adds it unconditionally. I propose to include the SW_WATCHDOG
facility by default, but enable it only if there is no hardware watchdog.
I think that this is a good idea. Although, I would not necessarily tie the
software watchdog to not having any hardware watchdog. This is probably a good
default policy, but I would allow to enable / disable the software watchdog
explicitly (e.g. via a sysctl).

I also think that we should support enabling several watchdog timers with
different timeouts. Each of them can serve a different purpose. E.g., a
software or hardware NMI-sending watchdog can be used to get diagnostic data out
of a hung system while a resetting watchdog can be used to ensure fail-safe
operation.
Post by Mike Karels
I'm interested in any comments, suggestions, or background; feel free to
mail me off the list. If there are multiple people interested, I'll
forward messages to that group.
I want to make the change because I have found SW_WATCHDOG quite useful
at $JOB, and it's annoying to have to build a custom kernel just for this
(not just once, but every time there is a kernel patch).
Makes sense.
Post by Mike Karels
Also, I'm curious why we have two software watchdog facilities. The
--softwatchdog facility has various options on expiration, such as
printf/log/panic; I don't know why anything other than panic/reboot
would be desirable, though. I already contacted some of the people who
have left fingerprints on watchdog. Also, if anyone wants to review
the code, let me know.
I guess that the second software watchdog was added to achieve what I suggested
above. Of course, it would have been nicer to re-use SW_WATCHDOG for that
purpose and to add a more generic support for configuring multiple watchdog
timers with different timeouts. But I guess that adding a new single-purpose
software watchdog was much easier to do.

P.S.
And maybe just using the second software watchdog would be good enough for what
you are doing?
--
Andriy Gapon
Rodney W. Grimes
2017-12-27 18:11:52 UTC
Permalink
Post by Mike Karels
There is a kernel option, SW_WATCHDOG, which adds a low-level software
watchdog in hardclock. By default, the kernel and watchdogd support
only hardware-based watchdogs. There is also a callout-based software
watchdog that can be enabled by watchdogd with an ioctl if --softwatchdog
is specified, but watchdogd doesn't switch on its own. The SW_WATCHDOG
option adds a lower-level software watchdog to the hardware-based mechanism,
but it adds it unconditionally. I propose to include the SW_WATCHDOG
facility by default, but enable it only if there is no hardware watchdog.
I'm interested in any comments, suggestions, or background; feel free to
mail me off the list. If there are multiple people interested, I'll
forward messages to that group.
I want to make the change because I have found SW_WATCHDOG quite useful
at $JOB, and it's annoying to have to build a custom kernel just for this
(not just once, but every time there is a kernel patch).
This is not a good reason to include this code in GENERIC. Should I
add all the things I find handy at $WORK too?

Further I think this is going in the opposite direction of
what we should be doing, less and less included in GENERIC, more and
more done as loadable options. I think Warner (imp@) is going
down this path with his devmatch code.

Now if you can recode this functionality so that it is a
boot time loadable module I am sure many would be very happy
to have that! It would satisfy your need of not having to
recompile a kernel, and others need of not needing this code
at all.

I think we have lost some light as to what the GENERIC kernel
is really for, to get you up and running good enough that you
can infact build a custom kernel. I do not think it was ever
intended that people run this long term, though over the years
that has become the defacto standard. IMHO, a bad one at that.
Post by Mike Karels
Also, I'm curious why we have two software watchdog facilities. The
--softwatchdog facility has various options on expiration, such as
printf/log/panic; I don't know why anything other than panic/reboot
would be desirable, though. I already contacted some of the people who
have left fingerprints on watchdog. Also, if anyone wants to review
the code, let me know.
I have no idea why we have 2, but can hypothosize that 2 different
people did the work, and both got included. As far as only
action on a timeuot being panic/reboot, not sure I agree with
that as we should provide a method (timeout has occured, what
would you like to do) not a policy (reboot/panic).
Post by Mike Karels
Thanks,
Mike
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
--
Rod Grimes ***@freebsd.org
Mark Linimon
2017-12-28 02:23:01 UTC
Permalink
Post by Rodney W. Grimes
I think we have lost some light as to what the GENERIC kernel
is really for, to get you up and running good enough that you
can infact build a custom kernel. I do not think it was ever
intended that people run this long term, though over the years
that has become the defacto standard. IMHO, a bad one at that.
By including more things in GENERIC, we expanded our audience past
"kernel developers + system administrators".

In particular, including "people using desktops and laptops who just
want to get work done".

Expecting everyone to recompile their kernels would be a shocking
step backwards for the project IMVHO.

mcl
Willem Jan Withagen
2017-12-29 00:53:59 UTC
Permalink
Post by Mark Linimon
Post by Rodney W. Grimes
I think we have lost some light as to what the GENERIC kernel
is really for, to get you up and running good enough that you
can infact build a custom kernel. I do not think it was ever
intended that people run this long term, though over the years
that has become the defacto standard. IMHO, a bad one at that.
By including more things in GENERIC, we expanded our audience past
"kernel developers + system administrators".
In particular, including "people using desktops and laptops who just
want to get work done".
Expecting everyone to recompile their kernels would be a shocking
step backwards for the project IMVHO.
I've been building custom kernels for as long as I can remember. And it
used to be like you describe, get the box running and give it its own
build-config.
But as the number of systems under my fingers grew, maintenance became
more and more a pain/time-consuming. And since about a year I've started
using freebsd-update on those systems that need proper maintenance.

And I cannot avoid the feeling that this is sort of perpendicular to
your suggestion of building custom kernels, since freebsd-update frowns
on that... And I've got the feeling that's the tool quite a lot of
people use to deal with basic Freebsd maintenance stuff.

--WjW
Mike Karels
2017-12-28 01:17:47 UTC
Permalink
Post by Rodney W. Grimes
Post by Mike Karels
There is a kernel option, SW_WATCHDOG, which adds a low-level software
watchdog in hardclock. By default, the kernel and watchdogd support
only hardware-based watchdogs. There is also a callout-based software
watchdog that can be enabled by watchdogd with an ioctl if --softwatchdog
is specified, but watchdogd doesn't switch on its own. The SW_WATCHDOG
option adds a lower-level software watchdog to the hardware-based mechanism,
but it adds it unconditionally. I propose to include the SW_WATCHDOG
facility by default, but enable it only if there is no hardware watchdog.
I'm interested in any comments, suggestions, or background; feel free to
mail me off the list. If there are multiple people interested, I'll
forward messages to that group.
I want to make the change because I have found SW_WATCHDOG quite useful
at $JOB, and it's annoying to have to build a custom kernel just for this
(not just once, but every time there is a kernel patch).
This is not a good reason to include this code in GENERIC. Should I
add all the things I find handy at $WORK too?
I understand what you are saying, but let me add some context. The
watchdog driver and framework are part of the base system, and
SW_WATCHDOG is about a dozen lines of code embedded in hardclock.
So about 90% of the facility is already in the base system, and I
propose to make it 100%.
Post by Rodney W. Grimes
Further I think this is going in the opposite direction of
what we should be doing, less and less included in GENERIC, more and
down this path with his devmatch code.
I agree with using loadable modules more if they can be selected
automatically, e.g. Warner's work. Keeping dozens of drivers out of
the base, and not having to edit loader.conf to include all the
desired facilities, is a big improvement.
Post by Rodney W. Grimes
Now if you can recode this functionality so that it is a
boot time loadable module I am sure many would be very happy
to have that! It would satisfy your need of not having to
recompile a kernel, and others need of not needing this code
at all.
As noted, this is tiny, and the hooks to hardclock to connect to
the module would be about the same size as the current code. The
advantage of the SW_WATCHDOG facility is that it is deeply embedded,
requiring only that hardclock be running.
Post by Rodney W. Grimes
I think we have lost some light as to what the GENERIC kernel
is really for, to get you up and running good enough that you
can infact build a custom kernel. I do not think it was ever
intended that people run this long term, though over the years
that has become the defacto standard. IMHO, a bad one at that.
I'm not sure I agree with this goal, although I used to run custom
kernels quite often to save space. That is less of a goal on x86
than in the past, and device drivers are the real bloat. I don't
think that a large fraction of FreeBSD users build custom kernels,
although I don't have actual data. But even in 4.xBSD days, many
systems ran the generic kernel.

Mike
Warner Losh
2017-12-28 01:48:18 UTC
Permalink
Post by Mike Karels
Post by Rodney W. Grimes
Post by Mike Karels
There is a kernel option, SW_WATCHDOG, which adds a low-level software
watchdog in hardclock. By default, the kernel and watchdogd support
only hardware-based watchdogs. There is also a callout-based software
watchdog that can be enabled by watchdogd with an ioctl if
--softwatchdog
Post by Rodney W. Grimes
Post by Mike Karels
is specified, but watchdogd doesn't switch on its own. The SW_WATCHDOG
option adds a lower-level software watchdog to the hardware-based
mechanism,
Post by Rodney W. Grimes
Post by Mike Karels
but it adds it unconditionally. I propose to include the SW_WATCHDOG
facility by default, but enable it only if there is no hardware
watchdog.
Post by Rodney W. Grimes
Post by Mike Karels
I'm interested in any comments, suggestions, or background; feel free
to
Post by Rodney W. Grimes
Post by Mike Karels
mail me off the list. If there are multiple people interested, I'll
forward messages to that group.
I want to make the change because I have found SW_WATCHDOG quite useful
at $JOB, and it's annoying to have to build a custom kernel just for
this
Post by Rodney W. Grimes
Post by Mike Karels
(not just once, but every time there is a kernel patch).
This is not a good reason to include this code in GENERIC. Should I
add all the things I find handy at $WORK too?
I understand what you are saying, but let me add some context. The
watchdog driver and framework are part of the base system, and
SW_WATCHDOG is about a dozen lines of code embedded in hardclock.
So about 90% of the facility is already in the base system, and I
propose to make it 100%.
Post by Rodney W. Grimes
Further I think this is going in the opposite direction of
what we should be doing, less and less included in GENERIC, more and
down this path with his devmatch code.
I agree with using loadable modules more if they can be selected
automatically, e.g. Warner's work. Keeping dozens of drivers out of
the base, and not having to edit loader.conf to include all the
desired facilities, is a big improvement.
Thanks for the vote of confidence :)

I don't view this as going the opposite way because of the very special
nature of SW_WATCHDOG. It doesn't fit into a nice, generic framework of
anything else.
Post by Mike Karels
Now if you can recode this functionality so that it is a
Post by Rodney W. Grimes
boot time loadable module I am sure many would be very happy
to have that! It would satisfy your need of not having to
recompile a kernel, and others need of not needing this code
at all.
As noted, this is tiny, and the hooks to hardclock to connect to
the module would be about the same size as the current code. The
advantage of the SW_WATCHDOG facility is that it is deeply embedded,
requiring only that hardclock be running.
I agree here: It would be hard to recode as a loadable module w/o there
being an unacceptable hit to performance. Turning on the option wouldn't
have those issues.
Post by Mike Karels
I think we have lost some light as to what the GENERIC kernel
Post by Rodney W. Grimes
is really for, to get you up and running good enough that you
can infact build a custom kernel. I do not think it was ever
intended that people run this long term, though over the years
that has become the defacto standard. IMHO, a bad one at that.
I'm not sure I agree with this goal, although I used to run custom
kernels quite often to save space. That is less of a goal on x86
than in the past, and device drivers are the real bloat. I don't
think that a large fraction of FreeBSD users build custom kernels,
although I don't have actual data. But even in 4.xBSD days, many
systems ran the generic kernel.
The goal of my GENERIC weight reduction program is to reduce it by as much
as makes sense, but not beyond. There's several issues that will keep it
from being completely minimal. Consoles cannot be loadable modules, even
loaded from loader.conf. Storage devices that we're booting off of need to
be loaded before my code will run. And there's no good way to know we need
to load a certain driver in /boot/loader because the name space there
differs from FreeBSD's name space. We have to rely on heuristics that I've
not convinced myself will be sufficient, so there will be several storage
drivers in GENERIC until that problem gets solved. Plus root filesystems
need to be loaded, but at least that's knowable. There's also some geom
issues to sort out to make them loadable, especially in complicated
situations.

So all this is by way of saying that we have quite a bit of ground to cover
before even my limited vision is in place.

Warner
Mike Karels
2017-12-31 15:03:13 UTC
Permalink
My proposed change is in Phabricator, https://reviews.freebsd.org/D13713.

A couple of notes:

- I retained the SW_WATCHDOG option; it now just enables the software
watchdog even if a hardware watchdog is attached, as it would have done
in the past. I updated NOTES and the watchdog(4) information accordingly.
I did not provide for any other combination of watchdogs or actions; that
would require a complete rework of the kernel API. Note that the hardware
watchdogs provide no choice of action; they simply reset the box.

- I have tested this with and without the SW_WATCHDOG option, but do not
have a hardware watchdog to test with. If someone could test this, it would
be much apprectiated. I tested by enabling watchdogd with default parameters,
doing b"killall -STOP watchdogd", and then observing that the system dropped
into the debugger after about 128 sec. It drops into the debugger if KDB
is defined and KDB_UNATTENDED is not; otherise it panics (as before).
Post by Andriy Gapon
P.S.
And maybe just using the second software watchdog would be good enough for what
you are doing?
I think the hardclock-based SW_WATCHDOG is better than the --softtimeout
wactchdog because it runs at a lower level (hardclock vs callout/softclock).
In particular, I have found it to work in situations where a locking
deadly embrace started in the network stack, and then a callout routine
got stuck as well when a function it called blocked on the offending lock.
That caused watchdogd to fail to be rescheduled, and the watchdog fired as
a result. The callout-based facility could have been blocked out as well.

Mike
Rodney W. Grimes
2017-12-31 18:01:40 UTC
Permalink
Post by Mike Karels
My proposed change is in Phabricator, https://reviews.freebsd.org/D13713.
- I retained the SW_WATCHDOG option; it now just enables the software
watchdog even if a hardware watchdog is attached, as it would have done
in the past.
The SW_WATCHDOG option has changed in function, it use to control
if this code was compiled in at all, it now does something different.
This code is compile in now no matter what with no option to remove
it, which means there is one more rarely used option in the GENERIC
kernel that can not be removed. 1 byte or 100 does not matter, it
is that policy has been shoved down the users throat. "You shall have
a software watchdog timer in your kernel weither you want it there
or not, and it is turned off by default."

We have way to many of those types of things already,
adding to this list is going in the wrong direction.
Post by Mike Karels
I updated NOTES and the watchdog(4) information accordingly.
This change as it stands requires a RELNOTES flag.
Post by Mike Karels
I did not provide for any other combination of watchdogs or actions; that
would require a complete rework of the kernel API. Note that the hardware
watchdogs provide no choice of action; they simply reset the box.
- I have tested this with and without the SW_WATCHDOG option, but do not
have a hardware watchdog to test with. If someone could test this, it would
be much apprectiated. I tested by enabling watchdogd with default parameters,
doing b"killall -STOP watchdogd", and then observing that the system dropped
into the debugger after about 128 sec. It drops into the debugger if KDB
is defined and KDB_UNATTENDED is not; otherise it panics (as before).
Post by Andriy Gapon
P.S.
And maybe just using the second software watchdog would be good enough for what
you are doing?
I think the hardclock-based SW_WATCHDOG is better than the --softtimeout
wactchdog because it runs at a lower level (hardclock vs callout/softclock).
In particular, I have found it to work in situations where a locking
deadly embrace started in the network stack, and then a callout routine
got stuck as well when a function it called blocked on the offending lock.
That caused watchdogd to fail to be rescheduled, and the watchdog fired as
a result. The callout-based facility could have been blocked out as well.
Mike
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
--
Rod Grimes ***@freebsd.org
Warner Losh
2017-12-31 22:52:20 UTC
Permalink
On Sun, Dec 31, 2017 at 11:01 AM, Rodney W. Grimes <
Post by Rodney W. Grimes
Post by Mike Karels
My proposed change is in Phabricator, https://reviews.freebsd.org/D13713
.
Post by Mike Karels
- I retained the SW_WATCHDOG option; it now just enables the software
watchdog even if a hardware watchdog is attached, as it would have done
in the past.
The SW_WATCHDOG option has changed in function, it use to control
if this code was compiled in at all, it now does something different.
This code is compile in now no matter what with no option to remove
it, which means there is one more rarely used option in the GENERIC
kernel that can not be removed. 1 byte or 100 does not matter, it
is that policy has been shoved down the users throat. "You shall have
a software watchdog timer in your kernel weither you want it there
or not, and it is turned off by default."
We have way to many of those types of things already,
adding to this list is going in the wrong direction.
Except the code does nothing of the sort. All it does is allow one to
enable software watchdogs if one wants as an always-available (but off by
default) feature. I don't see what the big deal is. If you don't enable
watchdogd, things are exactly as before. If you do enable it, you don't
have to rebuild a kernel to get it going.

This isn't worth a fight.

Warner
Post by Rodney W. Grimes
Post by Mike Karels
I updated NOTES and the watchdog(4) information accordingly.
This change as it stands requires a RELNOTES flag.
Post by Mike Karels
I did not provide for any other combination of watchdogs or actions; that
would require a complete rework of the kernel API. Note that the
hardware
Post by Mike Karels
watchdogs provide no choice of action; they simply reset the box.
- I have tested this with and without the SW_WATCHDOG option, but do not
have a hardware watchdog to test with. If someone could test this, it
would
Post by Mike Karels
be much apprectiated. I tested by enabling watchdogd with default
parameters,
Post by Mike Karels
doing b"killall -STOP watchdogd", and then observing that the system
dropped
Post by Mike Karels
into the debugger after about 128 sec. It drops into the debugger if KDB
is defined and KDB_UNATTENDED is not; otherise it panics (as before).
Post by Andriy Gapon
P.S.
And maybe just using the second software watchdog would be good enough
for what
Post by Mike Karels
Post by Andriy Gapon
you are doing?
I think the hardclock-based SW_WATCHDOG is better than the --softtimeout
wactchdog because it runs at a lower level (hardclock vs
callout/softclock).
Post by Mike Karels
In particular, I have found it to work in situations where a locking
deadly embrace started in the network stack, and then a callout routine
got stuck as well when a function it called blocked on the offending
lock.
Post by Mike Karels
That caused watchdogd to fail to be rescheduled, and the watchdog fired
as
Post by Mike Karels
a result. The callout-based facility could have been blocked out as
well.
Post by Mike Karels
Mike
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
--
Rod Grimes
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Loading...