Roger Pau Monné
2016-12-15 11:40:33 UTC
Hello,
I'm currently dealing with a bug in the Xen suspend/resume sequence, and I've
found that the lack of a way to order devices by priority during suspend/resume
is proving quite harmful for Xen (and maybe for other systems too). The current
suspend/resume code simply scans the root bus and suspends/resumes every device
in the order in which it was attached to its parent. The problem is that
there's no way to say that some devices should be resumed before others. For
example, the event timers, time counters and uarts should definitely be resumed
before other devices, but that seems to happen mostly by chance.
Currently most time-related devices are attached directly to the nexus, which
means they get resumed first, but the uart, for example, is attached to the pci
bus IIRC, which means it gets resumed quite late. On Xen systems this is even
worse. The Xen PV bus (which contains all Xen-related devices) is attached last
(because it tends to pick up unused memory regions for its own use), and this
bus also contains the PV timecounter, which should be resumed _before_ other
devices. Otherwise timecounting will be completely screwed and things can get
stuck in indefinitely long loops (because the timecounter is implemented on top
of the uptime of the host, which changes from host to host).
In order to solve this I could add a hack to the Xen resume process (which is
already different from the ACPI one), but this looks gross. I could also attach
the Xen PV timer to the nexus directly (as was done before), but I'd prefer to
keep all Xen-related devices on the same bus for coherency. The last option
would be to add some kind of suspend/resume priorities to the devices and do
more than one suspend/resume pass. This is more complex and requires more
changes, so I would like to know whether it would be helpful for other systems,
or whether someone has already attempted it.
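
To make the last option more concrete, here is a rough userland sketch of what
a multi-pass resume could look like. It is not based on the existing newbus
code: struct dev, enum resume_prio, resume_pass() and resume_all() are all
made-up names for illustration, and the sketch deliberately ignores the
question of whether a child can be resumed before its parent bus.

```c
#include <stddef.h>

/* Hypothetical priorities: lower values are resumed in earlier passes. */
enum resume_prio {
    PRIO_TIMER = 0,     /* event timers, time counters */
    PRIO_CONSOLE = 1,   /* uarts */
    PRIO_DEFAULT = 2,   /* everything else */
    PRIO_COUNT = 3
};

/* Hypothetical device node; children are kept in attach order. */
struct dev {
    const char *name;
    enum resume_prio prio;
    struct dev **children;
    size_t nchildren;
    int resumed;
};

/*
 * Walk the subtree in attach order and resume only the devices whose
 * priority matches the current pass. The name of each resumed device
 * is appended to log so the resulting order can be inspected.
 */
static void
resume_pass(struct dev *d, enum resume_prio pass, const char **log, size_t *n)
{
    if (d->prio == pass && !d->resumed) {
        d->resumed = 1;
        log[(*n)++] = d->name;
    }
    for (size_t i = 0; i < d->nchildren; i++)
        resume_pass(d->children[i], pass, log, n);
}

/*
 * One full tree scan per priority level, highest priority first.
 * log must have room for every device in the tree.
 * Returns the number of devices resumed.
 */
static size_t
resume_all(struct dev *root, const char **log)
{
    size_t n = 0;

    for (int pass = 0; pass < PRIO_COUNT; pass++)
        resume_pass(root, (enum resume_prio)pass, log, &n);
    return (n);
}
```

With a tree like nexus -> { pci -> uart, xenpv -> xen_timer }, where the timer
and the uart carry the higher priorities, this would resume the Xen PV timer
first and the uart second, even though the Xen PV bus is attached last.
Suspend would presumably run the same passes in reverse.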
Thanks, Roger.