Nathan Whitehorn
2016-07-24 22:29:38 UTC
There is a very long thread on the SVN list about this that has lasted
much too long and should be over here. So, I'd like to start with a
clean slate.
The discussion is related to r301453, which adds a function
BUS_MAP_INTR() and a function bus_extend_resource() that allow a parent
bus to decode SYS_RES_IRQ-type resources during bus_alloc_resource().
This is designed to allow the parent bus to read in flags from the
device tree regarding IRQ setup (polarity, trigger mode) and pass that
along to the interrupt controller with the resource allocation request.
By specifying an enum intr_map_data_type, you can also specify what kind
of interrupt specifier is being added to the decoration (ACPI, FDT, GPIO).
This is a departure from the existing code used on device tree systems
(OFW_BUS_MAP_INTR()), which allocates a virtual IRQ corresponding to an
interrupt-parent key and one of the device tree's opaque
arbitrary-length interrupt specifiers at resource assignment time. The
information that virtual IRQ maps to is cached and then applied by the
PIC when the PIC's interrupts are configured, which may be after
BUS_SETUP_INTR() if the call is made before interrupts are on and the
PIC hasn't attached at that point.
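
To make the deferral concrete, here is a minimal userland sketch of that behaviour, not the kernel code. All names (sketch_virq_alloc(), sketch_setup_intr(), sketch_pic_attach(), the table sizes) are hypothetical and chosen for illustration only. A setup request against a PIC that has not attached is recorded, then applied when the PIC shows up.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_VIRQ 32	/* assumed table size */
#define SKETCH_MAX_PIC  4	/* assumed number of PICs */

/* One virtual IRQ: which PIC it belongs to and its setup state. */
struct sketch_virq {
	uint32_t pic_key;	/* stand-in for the interrupt-parent key */
	int	 wants_setup;	/* setup requested before the PIC attached */
	int	 configured;	/* the PIC has applied the configuration */
};

static struct sketch_virq sketch_virqs[SKETCH_MAX_VIRQ];
static int sketch_nvirq;
static uint32_t sketch_pics[SKETCH_MAX_PIC];
static int sketch_npic;

static int
sketch_pic_is_attached(uint32_t pic_key)
{
	int i;

	for (i = 0; i < sketch_npic; i++)
		if (sketch_pics[i] == pic_key)
			return (1);
	return (0);
}

/* Allocate a virtual IRQ at resource assignment time; -1 if full. */
int
sketch_virq_alloc(uint32_t pic_key)
{
	if (sketch_nvirq == SKETCH_MAX_VIRQ)
		return (-1);
	sketch_virqs[sketch_nvirq].pic_key = pic_key;
	sketch_virqs[sketch_nvirq].wants_setup = 0;
	sketch_virqs[sketch_nvirq].configured = 0;
	return (sketch_nvirq++);
}

/* Returns 1 if configured now, 0 if deferred, -1 on a bad IRQ. */
int
sketch_setup_intr(int virq)
{
	if (virq < 0 || virq >= sketch_nvirq)
		return (-1);
	if (sketch_pic_is_attached(sketch_virqs[virq].pic_key)) {
		sketch_virqs[virq].configured = 1;
		return (1);
	}
	sketch_virqs[virq].wants_setup = 1;	/* remember for later */
	return (0);
}

/* PIC attaches: apply every deferred request; returns how many. */
int
sketch_pic_attach(uint32_t pic_key)
{
	int i, applied = 0;

	assert(sketch_npic < SKETCH_MAX_PIC);
	sketch_pics[sketch_npic++] = pic_key;
	for (i = 0; i < sketch_nvirq; i++) {
		if (sketch_virqs[i].pic_key == pic_key &&
		    sketch_virqs[i].wants_setup) {
			sketch_virqs[i].wants_setup = 0;
			sketch_virqs[i].configured = 1;
			applied++;
		}
	}
	return (applied);
}

/* Accessor so callers need not touch the table directly. */
int
sketch_virq_configured(int virq)
{
	assert(virq >= 0 && virq < sketch_nvirq);
	return (sketch_virqs[virq].configured);
}
```

The point of the sketch is only the ordering: the bus code can hand out and set up IRQs without caring whether the PIC exists yet.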
I am making a wiki page at
https://wiki.freebsd.org/Complicated_Interrupts to describe the current
implementation and the rationale for its implementation, which should
exist by the time most read this. Some of that content is copied below
for ease of replying.
My concern is that the new API (r301453) parallels the existing one in a
way that will require both to be maintained indefinitely, while
providing less functionality, in particular in three ways:
1. Breaking the opacity of the device tree's interrupt specifiers (which
only the PIC driver knows what to do with)
2. Requiring the bus parent to know exactly how to map an IRQ number
(poorly determined, as above) to a device tree entry (which may not
include it -- see below) and know how to interpret it (above, which only
the PIC driver knows)
3. It also requires that the PIC driver already be attached, which
cannot be guaranteed on some systems where the PIC is a bus child of
devices with interrupts on that same PIC.
What I would like to establish here, rather than just being cranky, is
that this new API both (a) does something on real hardware that the
existing API cannot do, either currently or with trivial modifications,
and (b) is capable of expressing the things the current API can express.
If the answer to either of those is "no", we're going to have to support
both in perpetuity, with different paths on different platforms and the
whole thing is going to be a huge mess. We're in a situation right now
where that will be baked into FreeBSD 11 for the duration of the branch,
which is quite unfortunate.
-Nathan
---- Excerpt of wiki text ----
---- Part 1: Overview of mechanism ----
The core part of this system is a registry in machine-dependent code
that maps some description of an interrupt to an IRQ number used by the
rest of the kernel. This number is arbitrary; on systems in which a
useful human-readable number can be extracted in a general way from the
description, it is helpful for users for the IRQ number to be related to
something about the system (e.g. the interrupt pin on single-controller
systems) but it can be just a monotonically increasing integer.
Currently one interrupt mapping strategy is implemented: Open Firmware
(or FDT) interrupt-parent / interrupt specifier tuples to IRQ. Bus code
maps the Open Firmware interrupt specifier using the ofw_bus_map_intr()
function, which is cascaded through the bus hierarchy and is usually
resolved by nexus.
int ofw_bus_map_intr(device_t dev, phandle_t iparent, int icells,
pcell_t *intr);
This takes the requesting device, the xref phandle of the interrupt
parent (e.g. from the "interrupt-parent" property, or the equivalent
entry in an interrupt-map) and the byte string describing the interrupt
(e.g. the contents of the "interrupts" property, or the equivalent entry
in an interrupt-map) and returns a unique IRQ number that can be added
to a resource list and used with bus_alloc_resource(), bus_setup_intr(),
etc.
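
As an illustration of the registry idea, here is a rough userland sketch, not the actual nexus implementation. The function sketch_map_intr(), the table sizes, and the typedefs standing in for phandle_t and pcell_t are all assumptions made for the example. It treats the specifier as an opaque run of cells and hands out a monotonically increasing IRQ number per distinct (interrupt parent, specifier) pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_CELLS 8	/* assumed upper bound on specifier length */
#define SKETCH_MAX_IRQS  64	/* assumed registry size */

typedef uint32_t phandle_t;	/* userland stand-ins for the kernel types */
typedef uint32_t pcell_t;

/* One registry entry: the tuple that a given IRQ number stands for. */
struct sketch_intr_entry {
	phandle_t iparent;
	int	  icells;
	pcell_t	  spec[SKETCH_MAX_CELLS];
};

static struct sketch_intr_entry sketch_intr_map[SKETCH_MAX_IRQS];
static int sketch_intr_count;

/*
 * Return the IRQ number for (iparent, specifier), allocating a new one
 * the first time the pair is seen. The specifier is never interpreted
 * here, only compared byte for byte. Returns -1 if the specifier length
 * is out of range or the registry is full.
 */
int
sketch_map_intr(phandle_t iparent, int icells, const pcell_t *intr)
{
	struct sketch_intr_entry *e;
	int i;

	assert(intr != NULL);
	if (icells < 1 || icells > SKETCH_MAX_CELLS)
		return (-1);
	for (i = 0; i < sketch_intr_count; i++) {
		e = &sketch_intr_map[i];
		if (e->iparent == iparent && e->icells == icells &&
		    memcmp(e->spec, intr, icells * sizeof(pcell_t)) == 0)
			return (i);
	}
	if (sketch_intr_count == SKETCH_MAX_IRQS)
		return (-1);
	e = &sketch_intr_map[sketch_intr_count];
	e->iparent = iparent;
	e->icells = icells;
	memcpy(e->spec, intr, icells * sizeof(pcell_t));
	return (sketch_intr_count++);
}
```

Mapping the same pair twice yields the same number, while the same cells under a different interrupt parent yield a new one. Only the PIC driver would ever look inside spec[].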
In the event you needed more than OF-type mappings (e.g. for ACPI) you
could add an equivalent acpi_bus_map_intr() method to nexus that
tabulates mappings in parallel based on different data.
---- Part 2: Rationale ----
As a separate issue, it would be great if you could comment on a way to
implement the following scenarios with this API, which I think are
currently impossible and would need to be solved to avoid bifurcation.
These kinds of things are what drove the current API.
--- Case 1: the G5 Powermac ---
I have hardware with two PICs, one cascaded from the other. PIC 1 lives
in the northbridge, and PIC 2 lives on a device on the PCI bus behind a
couple of bridges. Depending on the era of the hardware, these are
cascaded in different directions. They have different interrupt
specifier formats.
A. How do I represent interrupts on the PCI bus parent of PIC 2 that are
handled by PIC 2? PIC 2 obviously can't attach before its bus parent,
but the bus parent can't complete initialization without the ability to
set up its interrupts.
B. Devices on the PCI bus have interrupts handled by a mixture of PIC 1
and PIC 2, sometimes on the same device and not always expressible
through the bus hierarchy. For example, one of the two storage
controllers has an interrupt on PIC 2 run through a wire that doesn't go
through the PCI connection and so isn't in the interrupt-map of the PCI
bus, which is wired (mostly) to PIC 1, and about which the parent bus
can and should know nothing.
--- Case 2: IBM OPAL firmware ---
The /ibm,opal device on IBM PowerNV systems has a non-standard
interrupts property ("opal-interrupts") that contains the list of IRQs
that should be forwarded to the firmware. These are not interrupts
belonging to a single physical device at /ibm,opal (which is a virtual
device anyway) and so are not in the interrupts property; nor do they
necessarily share an interrupt parent. How do I represent this?
--- Case 3: IBM XICS interrupts ---
On virtualized (and most non-virtualized) IBM hardware, the interrupts
are one cell and that single cell encodes the interrupt parent, the line
sense, and the IRQ.
--- Case 4: MSIs ---
MSIs are assigned purely by the PCI bus and the PCI bus parent can't
know about them from the device tree. How does the bus parent sensibly
decorate resources like this? The PCI MSI API assumes that these all
exist purely as 32-bit integers and they are not assigned through
resource lists in the conventional fashion.