Discussion:
Order of device suspend/resume
Roger Pau Monné
2016-12-15 11:40:33 UTC
Hello,

I'm currently dealing with a bug in the Xen suspend/resume sequence, and I've
found that lacking a way to order devices by priority during suspend/resume is
proving quite harmful for Xen (and maybe for other systems too). The current
suspend/resume code simply scans the root bus and suspends/resumes every device
in the order in which it was attached to its parent. The problem here is that
there's no way to say that some devices should be resumed before others. For
example, the event timers, time counters and uarts should definitely be resumed
before other devices, but that seems to happen mostly by chance.
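
To make the ordering problem concrete, here is a minimal sketch of an attach-order resume walk. It is a toy model in plain C, not the FreeBSD new-bus code: the `toy_*` names, the fixed-size arrays, and the example tree (including the `attimer` and `xen_timer` device names) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CHILDREN 4
#define TOY_MAX_LOG 16

/* Toy device node; children are kept in the order they were attached. */
struct toy_dev {
    const char *name;
    struct toy_dev *children[TOY_MAX_CHILDREN];
    size_t nchildren;
};

/* Records the order in which devices were resumed. */
struct toy_log {
    const char *names[TOY_MAX_LOG];
    size_t len;
};

static void toy_attach(struct toy_dev *parent, struct toy_dev *child)
{
    assert(parent->nchildren < TOY_MAX_CHILDREN);
    parent->children[parent->nchildren++] = child;
}

/* Resume a device, then its children strictly in attach order. */
static void toy_resume(const struct toy_dev *dev, struct toy_log *log)
{
    assert(log->len < TOY_MAX_LOG);
    log->names[log->len++] = dev->name;
    for (size_t i = 0; i < dev->nchildren; i++)
        toy_resume(dev->children[i], log);
}

/* Position of a device in the resume log, or -1 if it is absent. */
static int toy_log_index(const struct toy_log *log, const char *name)
{
    for (size_t i = 0; i < log->len; i++)
        if (strcmp(log->names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Hypothetical tree: a timer on the nexus, a uart behind pci, and the
 * Xen PV bus (with its timer) attached last. */
static struct toy_log toy_example_resume_order(void)
{
    struct toy_dev nexus = { .name = "nexus" };
    struct toy_dev attimer = { .name = "attimer" };
    struct toy_dev pci = { .name = "pci" };
    struct toy_dev uart = { .name = "uart" };
    struct toy_dev xenpv = { .name = "xenpv" };
    struct toy_dev xen_timer = { .name = "xen_timer" };
    struct toy_log log = { .len = 0 };

    toy_attach(&nexus, &attimer);
    toy_attach(&nexus, &pci);
    toy_attach(&pci, &uart);
    toy_attach(&nexus, &xenpv);
    toy_attach(&xenpv, &xen_timer);

    toy_resume(&nexus, &log);
    return log;
}
```

In this model the timer that hangs directly off the nexus is resumed early only because of where it happens to be attached, while the timer on the last-attached bus is resumed after everything else.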

Currently most time-related devices are attached directly to the nexus, which
means they get resumed first, but the uart, for example, is currently
attached to the pci bus IIRC, which means it gets resumed quite late. On Xen
systems this is even worse. The Xen PV bus (which contains all Xen-related
devices) is attached last (because it tends to pick up unused memory
regions for its own use), and this bus also contains the PV timecounter, which
should be resumed _before_ other devices. Otherwise timecounting will be
completely screwed and things can get stuck in indefinitely long loops (because
the timecounter is based on the uptime of the host, and that changes from
host to host).
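
One way such a stall could arise is sketched below. This is a hypothetical illustration, not a description of the actual FreeBSD or Xen timecounter code: it assumes a caller that computed a deadline from the counter before migration and then compares the counter against that deadline after the counter has been re-based on the new host's uptime. The function name and parameters are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: a waiter computed "deadline = counter + delay" while
 * the counter tracked the old host's uptime. After migration the counter
 * tracks the new host's uptime instead. Returns how many ticks the waiter
 * would still spin in a "while (counter < deadline)" loop.
 */
static uint64_t toy_remaining_wait(uint64_t old_host_uptime,
                                   uint64_t new_host_uptime,
                                   uint64_t delay)
{
    /* Guard against overflow in this toy calculation. */
    assert(delay <= UINT64_MAX - old_host_uptime);

    uint64_t deadline = old_host_uptime + delay;

    if (new_host_uptime >= deadline)
        return 0;               /* deadline already passed, no wait */
    return deadline - new_host_uptime;
}
```

Under these assumptions, a 10-tick delay requested on a host with an uptime of 1,000,000 ticks turns into a wait of 999,510 ticks if the guest lands on a host whose uptime is only 500 ticks and nothing has re-synchronised the counter first.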

To solve this I could add a hack to the Xen resume process (which is
already different from the ACPI one), but that looks gross. I could also attach
the Xen PV timer directly to the nexus (as was done before), but I would
prefer to keep all Xen-related devices on the same bus for coherency. The last
option would be to add some kind of suspend/resume priorities to devices
and do more than one suspend/resume pass. This is more complex and requires more
changes, so I would like to know whether it would be helpful for other systems,
or whether someone has already attempted it.

Thanks, Roger.
John Baldwin
2016-12-15 21:38:11 UTC
I think Justin Hibbits had some patches that use the boot-time new-bus
passes for suspend and resume, which I think would help with this. You suspend
things in the reverse order of boot, and resume operates in the same order as
boot.
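
The ordering rule described here can be sketched as follows. This is a toy model under stated assumptions, not the code from the patches: the `TOY_PASS_*` levels, the device table, and the function names are all invented for illustration, and the real new-bus pass levels are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pass levels; lower values are handled earlier at boot. */
enum toy_pass {
    TOY_PASS_BUS = 0,
    TOY_PASS_EARLY = 1,     /* timers, console */
    TOY_PASS_DEFAULT = 2,
    TOY_PASS_COUNT = 3
};

struct toy_dev {
    const char *name;
    enum toy_pass pass;
};

/* Listed in attach order; the Xen PV bus and its timer come last. */
static const struct toy_dev toy_devs[] = {
    { "nexus", TOY_PASS_BUS },
    { "pci", TOY_PASS_BUS },
    { "uart", TOY_PASS_EARLY },
    { "nic", TOY_PASS_DEFAULT },
    { "xenpv", TOY_PASS_BUS },
    { "xen_timer", TOY_PASS_EARLY },
};
#define TOY_NDEVS (sizeof(toy_devs) / sizeof(toy_devs[0]))

/* Resume in boot order: lowest pass first, attach order within a pass. */
static size_t toy_resume_order(const char *out[], size_t cap)
{
    size_t n = 0;

    for (int pass = 0; pass < TOY_PASS_COUNT; pass++) {
        for (size_t i = 0; i < TOY_NDEVS; i++) {
            if ((int)toy_devs[i].pass != pass)
                continue;
            assert(n < cap);
            out[n++] = toy_devs[i].name;
        }
    }
    return n;
}

/* Suspend in the exact reverse of the resume order. */
static size_t toy_suspend_order(const char *out[], size_t cap)
{
    size_t n = 0;

    for (int pass = TOY_PASS_COUNT - 1; pass >= 0; pass--) {
        for (size_t i = TOY_NDEVS; i-- > 0; ) {
            if ((int)toy_devs[i].pass != pass)
                continue;
            assert(n < cap);
            out[n++] = toy_devs[i].name;
        }
    }
    return n;
}
```

With this table the resume order is nexus, pci, xenpv, uart, xen_timer, nic, so the timer and uart come back before ordinary devices regardless of which bus they sit on, and the suspend order is the mirror image.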
--
John Baldwin
Justin Hibbits
2016-12-16 04:34:51 UTC
John is right. I have a (somewhat abandoned due to time and focus)
branch, https://svnweb.freebsd.org/base/projects/pmac_pmu/ which has
the necessary code working mostly on PowerPC. The diff can be found
at https://reviews.freebsd.org/D203 too.

- Justin
Warner Losh
2016-12-16 05:25:05 UTC
Cool. Does it have a mechanism similar to the attach code that lets
you run again at each pass?

Warner
Justin Hibbits
2016-12-22 19:37:04 UTC
Not exactly. The code calls BUS_SUSPEND_CHILD() as it rolls back the pass
levels, and stops on errors. The meat is in a rewrite of
bus_generic_suspend() in that review.
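
A rough sketch of that control flow, written as a standalone toy rather than taken from the review: `toy_suspend_child()` stands in for BUS_SUSPEND_CHILD(), and the structure, field names, and pass numbering are assumptions. What the real code does with already-suspended children after an error is not stated in this thread, so the sketch simply returns and leaves them as they are.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_child {
    const char *name;
    int pass;       /* higher value = attached in a later boot pass */
    int fail_with;  /* errno the toy suspend returns, 0 for success */
    int suspended;  /* set once the child has been suspended */
};

/* Stand-in for BUS_SUSPEND_CHILD(); fails if the child is set up to fail. */
static int toy_suspend_child(struct toy_child *c)
{
    if (c->fail_with != 0)
        return c->fail_with;
    c->suspended = 1;
    return 0;
}

/*
 * Walk the pass levels from the highest down to 0, suspending the children
 * of each level in reverse attach order, and stop at the first error.
 */
static int toy_suspend_by_pass(struct toy_child *kids, size_t n, int max_pass)
{
    assert(max_pass >= 0);

    for (int pass = max_pass; pass >= 0; pass--) {
        for (size_t i = n; i-- > 0; ) {
            if (kids[i].pass != pass || kids[i].suspended)
                continue;
            int error = toy_suspend_child(&kids[i]);
            if (error != 0)
                return error;   /* stop on errors */
        }
    }
    return 0;
}
```

For example, with a timer at pass 0, a uart at pass 1 that refuses to suspend with EBUSY, and a NIC at pass 2, the NIC is suspended, the walk stops at the uart, and the timer is never touched.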

- Justin
