Discussion:
INTRNG (Was: svn commit: r301453....)
Michal Meloun
2016-07-25 13:34:01 UTC
Permalink
[snip]
For this reason we makes domain key composite, in our model, the
domain key consist of "domain key type"
and "domain key value". This composite key guarantees uniqueness
and it also allows to select proper parser for byte string.
Yes, but this solves what is a nonexistant problem by making the
system substantially less flexible and more invasive. Which is not
a good tradeoff.
I think that existence of problem is confirmed in the above example .
"We could solve this in a number of ways, ... , or adding another
value (domain type, for example)."
What can I say more ...
Except that the example you gave *is not an example* of the problem
1) You had interrupts in a GPIO property rather than in an
interrupts property (or equivalent).
2) *And* you had interrupts on GPIOs in an interrupts property (or
equivalent)
3) *and* those are encoded differently
4) *and* the different encodings use the same number of cells
5) *and* are not otherwise distinguishable
Does that ever actually happen, in the real world, on any device
tree? You could imagine any kind of messed up thing you want, but we
shouldn't structure APIs around them, especially given that the
current alternative you are proposing has real, concrete problems on
real hardware that actually exists.
Allow me to respond to this issue only. I think that this main
issue, everything else looks resolvable for me (or, in worst case,
can be converted to MD code).
from linux/arch/arm/boot/dts/am335x-evm.dts (it's first file that I'm
searched)
----------------------------------------------------------------------------------------------------------------------------
&gpmc {
status = "okay";
pinctrl-names = "default";
pinctrl-0 = <&nandflash_pins_s0>;
ranges = <0 0 0x08000000 0x1000000>; /* CS0: 16MB for NAND */
compatible = "ti,omap2-nand";
reg = <0 0 4>; /* CS0, offset 0, IO size 4 */
interrupt-parent = <&gpmc>;
interrupts = <0 IRQ_TYPE_NONE>, /* fifoevent */
<1 IRQ_TYPE_NONE>; /* termcount */
rb-gpios = <&gpmc 0 GPIO_ACTIVE_HIGH>; /* gpmc_wait0 */
----------------------------------------------------------------------------------------------------------------------------
so we have
rb-gpios = <&gpmc 0 GPIO_ACTIVE_HIGH>; /* gpmc_wait0 */
and
interrupt-parent = <&gpmc>;
interrupts = <0 IRQ_TYPE_NONE>, /* fifoevent */
--- or ---
from linux/arch/arm/boot/dts/exynos5420-peach-pit.dts
compatible = "maxim,max98090";
reg = <0x10>;
interrupts = <2 0>;
interrupt-parent = <&gpx0>;
pinctrl-names = "default";
and
sleep-gpios = <&gpx3 5 GPIO_ACTIVE_HIGH>;
also, in this case, first cell in 'interrupts' property have
different meaning that second cell in 'sleep-gpio'.
(on exynos, not all gpios supports interrrupts)
In general, binding for 'interrupts ' and '<foo>-gpios ' are
different, but can have same size .
Thanks so much for the concrete example! This is very helpful. Three
1. Are the drivers for these devices actually taking interrupts on the
lines defined in the *-gpios property? It seems weird to me that there
would be interrupts on GPIOs and, simultaneously, implicit interrupts
on other GPIOs. It looks like the things on the *-gpios properties are
things you set rather than things you would request an interrupt on
(sleep for the second case, a wait pin in the first).
Yes, interrupts on GPIO pins are relatively widely used. For example,
SDHCI driver uses this for handling of card detect pin. Moreover, the
SDHCI driver have zero knowledge about given GPIO pin
interrupt capabilities. He simply tries to allocate interruption, and
when it fails, it switches to a pooling mode.
Other usage is HDMI cable/monitor detect input, USB VBUS pin or so...
2. What is the different meaning? I assumed the second cell in both
cases is the GPIO pin number, because anything else would be crazy.
No, on exynos (if memory serves me), not all GPIOs have interrupt
capability, and they are numbered independently.
But again, please note that both *-gpios and interrupts properties are
defined independently, they can be formatted differently even within a
single controller.
3. What on earth is IRQ_TYPE_NONE? That's an OpenPIC flag that just
disables the interrupt entirely. Why would you want to configure
things that way?
In Linux DT, the IRQ_TYPE_NONE is equivalent of our
<INTR_POLARITY_CONFORM, INTR_TRIGGER_CONFORM> pair -> don't change
actual interrupt configuration, i want to reuse default or already
preset configuration (from uboot for example).
(#1 is the only one that really matters here; the rest are personal
curiosity)
If the answer to (1) is "yes", there would be a good case for adding
something like a ofw_bus_map_gpio_as_intr() or the like that takes the
same arguments as ofw_bus_map_intr() and propagates them to the
control in some way but with a flag. It's an easy extension and this
is the reason the function has "ofw" in the name. My one qualm about
that is that there is no guarantee that a GPIO controller can actually
provide interrupts on a random GPIO (nor is it clear that that is
actually what is wanted here). That qualm applies to both APIs, of
course, and I still don't see a rationale for throwing away the
existing code even if (1) is true.
But current INTRNG is no longer OFW centric, it can easily handle any
other 'configuration source'. We removed it (from INTRNG core) at
r297539, at request from MIPS folks.
But again, all what I want in this area is (for me) simple change of
your ofw_bus_map_intr() method to something like:

enum intr_map_data_type {
INTR_MAP_DATA_OFW,
INTR_MAP_DATA_GPIO,
...
};

int
bus_map_intr(device_t dev, enum intr_map_data_type type, uintptr_t
pic_id, void *config_data, int config_size)

Please, don't take current implementation of bus_map_intr() in account.
From my current point of view, the method is badly named and his name
doesn't reflect its functionality.

Michal
I've changed the mailing list here since the SVN list really isn't the
best place. For people who want context and are seeing this for the
first time, please see the other email I just sent.
-Nathan
Nathan Whitehorn
2016-07-25 14:05:02 UTC
Permalink
Post by Michal Meloun
[snip]
For this reason we makes domain key composite, in our model, the
domain key consist of "domain key type"
and "domain key value". This composite key guarantees uniqueness
and it also allows to select proper parser for byte string.
Yes, but this solves what is a nonexistant problem by making the
system substantially less flexible and more invasive. Which is not
a good tradeoff.
I think that existence of problem is confirmed in the above example .
"We could solve this in a number of ways, ... , or adding another
value (domain type, for example)."
What can I say more ...
Except that the example you gave *is not an example* of the problem
1) You had interrupts in a GPIO property rather than in an
interrupts property (or equivalent).
2) *And* you had interrupts on GPIOs in an interrupts property (or
equivalent)
3) *and* those are encoded differently
4) *and* the different encodings use the same number of cells
5) *and* are not otherwise distinguishable
Does that ever actually happen, in the real world, on any device
tree? You could imagine any kind of messed up thing you want, but we
shouldn't structure APIs around them, especially given that the
current alternative you are proposing has real, concrete problems on
real hardware that actually exists.
Allow me to respond to this issue only. I think that this main
issue, everything else looks resolvable for me (or, in worst case,
can be converted to MD code).
from linux/arch/arm/boot/dts/am335x-evm.dts (it's first file that I'm
searched)
----------------------------------------------------------------------------------------------------------------------------
&gpmc {
status = "okay";
pinctrl-names = "default";
pinctrl-0 = <&nandflash_pins_s0>;
ranges = <0 0 0x08000000 0x1000000>; /* CS0: 16MB for NAND */
compatible = "ti,omap2-nand";
reg = <0 0 4>; /* CS0, offset 0, IO size 4 */
interrupt-parent = <&gpmc>;
interrupts = <0 IRQ_TYPE_NONE>, /* fifoevent */
<1 IRQ_TYPE_NONE>; /* termcount */
rb-gpios = <&gpmc 0 GPIO_ACTIVE_HIGH>; /* gpmc_wait0 */
----------------------------------------------------------------------------------------------------------------------------
so we have
rb-gpios = <&gpmc 0 GPIO_ACTIVE_HIGH>; /* gpmc_wait0 */
and
interrupt-parent = <&gpmc>;
interrupts = <0 IRQ_TYPE_NONE>, /* fifoevent */
--- or ---
from linux/arch/arm/boot/dts/exynos5420-peach-pit.dts
compatible = "maxim,max98090";
reg = <0x10>;
interrupts = <2 0>;
interrupt-parent = <&gpx0>;
pinctrl-names = "default";
and
sleep-gpios = <&gpx3 5 GPIO_ACTIVE_HIGH>;
also, in this case, first cell in 'interrupts' property have
different meaning that second cell in 'sleep-gpio'.
(on exynos, not all gpios supports interrrupts)
In general, binding for 'interrupts ' and '<foo>-gpios ' are
different, but can have same size .
Thanks so much for the concrete example! This is very helpful. Three
1. Are the drivers for these devices actually taking interrupts on the
lines defined in the *-gpios property? It seems weird to me that there
would be interrupts on GPIOs and, simultaneously, implicit interrupts
on other GPIOs. It looks like the things on the *-gpios properties are
things you set rather than things you would request an interrupt on
(sleep for the second case, a wait pin in the first).
Yes, interrupts on GPIO pins are relatively widely used. For example,
SDHCI driver uses this for handling of card detect pin. Moreover, the
SDHCI driver have zero knowledge about given GPIO pin
interrupt capabilities. He simply tries to allocate interruption, and
when it fails, it switches to a pooling mode.
Other usage is HDMI cable/monitor detect input, USB VBUS pin or so...
That wasn't my question. Are these particular device drivers allocating
interrupts both on the GPIOs in their "interrupts" property (which are
entirely GPIOs in this example) *and* on the GPIOs listed as resources
but not listed as interrupts? If they are, then you need a switching
mechanism, but that seems pretty unlikely given the names of the
non-interrupt GPIOs (they look like outputs). It would also be a
somewhat deranged way to set up a device tree -- not that that rules it
out or anything.
Post by Michal Meloun
2. What is the different meaning? I assumed the second cell in both
cases is the GPIO pin number, because anything else would be crazy.
No, on exynos (if memory serves me), not all GPIOs have interrupt
capability, and they are numbered independently.
But again, please note that both *-gpios and interrupts properties are
defined independently, they can be formatted differently even within a
single controller.
Which is totally fine. The question is if you need to set up interrupts
on the GPIOs both the "interrupts" and non-interrupts properties, which
is easy enough to handle, but kind of weird.
Post by Michal Meloun
3. What on earth is IRQ_TYPE_NONE? That's an OpenPIC flag that just
disables the interrupt entirely. Why would you want to configure
things that way?
In Linux DT, the IRQ_TYPE_NONE is equivalent of our
<INTR_POLARITY_CONFORM, INTR_TRIGGER_CONFORM> pair -> don't change
actual interrupt configuration, i want to reuse default or already
preset configuration (from uboot for example).
It's not a Linux thing, it's from the OpenPIC spec. Probably it just
means the interrupt isn't configurable, no that it matters.
Post by Michal Meloun
(#1 is the only one that really matters here; the rest are personal
curiosity)
If the answer to (1) is "yes", there would be a good case for adding
something like a ofw_bus_map_gpio_as_intr() or the like that takes the
same arguments as ofw_bus_map_intr() and propagates them to the
control in some way but with a flag. It's an easy extension and this
is the reason the function has "ofw" in the name. My one qualm about
that is that there is no guarantee that a GPIO controller can actually
provide interrupts on a random GPIO (nor is it clear that that is
actually what is wanted here). That qualm applies to both APIs, of
course, and I still don't see a rationale for throwing away the
existing code even if (1) is true.
But current INTRNG is no longer OFW centric, it can easily handle any
other 'configuration source'. We removed it (from INTRNG core) at
r297539, at request from MIPS folks.
But again, all what I want in this area is (for me) simple change of
enum intr_map_data_type {
INTR_MAP_DATA_OFW,
INTR_MAP_DATA_GPIO,
...
};
int
bus_map_intr(device_t dev, enum intr_map_data_type type, uintptr_t
pic_id, void *config_data, int config_size)
Please, don't take current implementation of bus_map_intr() in account.
From my current point of view, the method is badly named and his name
doesn't reflect its functionality.
Fair enough. My core issue is basically that this code connects
interrupts to the newbus hierarchy, which simply doesn't work in a lot
of cases when the newbus hierarchy doesn't match the interrupt one,
which is quite common (see my other mail). Unfortunately, that's a
fundamental architectural issue with r301453 and so I'm not sure how to
proceed here.

The current mechanism I was discussing is not OFW-centric either,
although it is true that the only current mapping method
(ofw_bus_map_intr()) is (though it can be abused for non-OFW data
sources successfully), but you could easily add an acpi_bus_map_intr()
or whatever that takes different data and connects in the MD interrupt
code. I have no objections to that at all. I'm still confused about how
your GPIO interrupts work (above), but that's a detail. Is that
something that you think would be workable? I'd be more than happy to
write some prototype code.
-Nathan
Post by Michal Meloun
Michal
I've changed the mailing list here since the SVN list really isn't the
best place. For people who want context and are seeing this for the
first time, please see the other email I just sent.
-Nathan
Warner Losh
2016-07-25 16:32:24 UTC
Permalink
On Mon, Jul 25, 2016 at 8:05 AM, Nathan Whitehorn
Post by Nathan Whitehorn
That wasn't my question. Are these particular device drivers allocating
interrupts both on the GPIOs in their "interrupts" property (which are
entirely GPIOs in this example) *and* on the GPIOs listed as resources but
not listed as interrupts? If they are, then you need a switching mechanism,
but that seems pretty unlikely given the names of the non-interrupt GPIOs
(they look like outputs). It would also be a somewhat deranged way to set up
a device tree -- not that that rules it out or anything.
On Atmel, there's a situation that this covers, I think.

The MCI device has an interrupt in the core:

mmc0: ***@fffa8000 {
compatible = "atmel,hsmci";
reg = <0xfffa8000 0x600>;
interrupts = <9 IRQ_TYPE_LEVEL_HIGH 0>;
#address-cells = <1>;
#size-cells = <0>;
pinctrl-names = "default";
clocks = <&mci0_clk>;
clock-names = "mci_clk";
status = "disabled";
};

and in other places wires in GPIO interrupts for things like card
eject / insertion.

mmc0: ***@f0008000 {
pinctrl-0 = <
&pinctrl_board_mmc0
&pinctrl_mmc0_slot0_clk_cmd_dat0
&pinctrl_mmc0_slot0_dat1_3>;
status = "okay";
***@0 {
reg = <0>;
bus-width = <4>;
cd-gpios = <&pioD 15 GPIO_ACTIVE_HIGH>;
};
};

an interrupt is registered on the cd-gpios pin for when the card changes. At
least in linux, FreeBSD doesn't (yet) implement this, but will someday if I get
back to the armv6 atmel work I started (see at91-cosino.dts for example, there's
others).

I think this is an example of what you are asking about, or did I get
lost in the
twisty maze of conversation zigs and zags...

Warner
Nathan Whitehorn
2016-07-25 16:48:24 UTC
Permalink
Post by Warner Losh
On Mon, Jul 25, 2016 at 8:05 AM, Nathan Whitehorn
Post by Nathan Whitehorn
That wasn't my question. Are these particular device drivers allocating
interrupts both on the GPIOs in their "interrupts" property (which are
entirely GPIOs in this example) *and* on the GPIOs listed as resources but
not listed as interrupts? If they are, then you need a switching mechanism,
but that seems pretty unlikely given the names of the non-interrupt GPIOs
(they look like outputs). It would also be a somewhat deranged way to set up
a device tree -- not that that rules it out or anything.
On Atmel, there's a situation that this covers, I think.
compatible = "atmel,hsmci";
reg = <0xfffa8000 0x600>;
interrupts = <9 IRQ_TYPE_LEVEL_HIGH 0>;
#address-cells = <1>;
#size-cells = <0>;
pinctrl-names = "default";
clocks = <&mci0_clk>;
clock-names = "mci_clk";
status = "disabled";
};
and in other places wires in GPIO interrupts for things like card
eject / insertion.
pinctrl-0 = <
&pinctrl_board_mmc0
&pinctrl_mmc0_slot0_clk_cmd_dat0
&pinctrl_mmc0_slot0_dat1_3>;
status = "okay";
reg = <0>;
bus-width = <4>;
cd-gpios = <&pioD 15 GPIO_ACTIVE_HIGH>;
};
};
an interrupt is registered on the cd-gpios pin for when the card changes. At
least in linux, FreeBSD doesn't (yet) implement this, but will someday if I get
back to the armv6 atmel work I started (see at91-cosino.dts for example, there's
others).
I think this is an example of what you are asking about, or did I get
lost in the
twisty maze of conversation zigs and zags...
Warner
Where we would run into (minor) problems is if the interrupt parent for
the first mmc0 is the GPIO controller. More generally, if &pioD has
interrupt children specified in some way that is not a <pin, active
high/whatever> tuple somewhere else in the tree then you would have to
have methods to parse both interrupt specifiers
as-obtained-from-interrupts-properties (or equivalent) and specifiers
as-obtained-from-gpio-properties. If the tree picks one format and
sticks with it, you can get away with just the one. Glancing through the
DTS source for this board, that doesn't appear to be the case and the
property formatting is uniform, but I might have missed something in one
of the many #includes.

As a general point, GPIO weirdness would be easy enough case to handle
if it did come up (add some mapping method, as above) that I think we
shouldn't worry too much about it from an architectural point of view.
If a board appears that is set up this way, we can roll with the punches
at that point and add whatever small amount of shim code that is
required. It would be annoyance, sure, but not a real complication.
-Nathan
Warner Losh
2016-07-26 04:24:53 UTC
Permalink
On Mon, Jul 25, 2016 at 10:48 AM, Nathan Whitehorn
Post by Warner Losh
On Mon, Jul 25, 2016 at 8:05 AM, Nathan Whitehorn
Post by Nathan Whitehorn
That wasn't my question. Are these particular device drivers allocating
interrupts both on the GPIOs in their "interrupts" property (which are
entirely GPIOs in this example) *and* on the GPIOs listed as resources but
not listed as interrupts? If they are, then you need a switching mechanism,
but that seems pretty unlikely given the names of the non-interrupt GPIOs
(they look like outputs). It would also be a somewhat deranged way to set up
a device tree -- not that that rules it out or anything.
On Atmel, there's a situation that this covers, I think.
compatible = "atmel,hsmci";
reg = <0xfffa8000 0x600>;
interrupts = <9 IRQ_TYPE_LEVEL_HIGH 0>;
#address-cells = <1>;
#size-cells = <0>;
pinctrl-names = "default";
clocks = <&mci0_clk>;
clock-names = "mci_clk";
status = "disabled";
};
and in other places wires in GPIO interrupts for things like card
eject / insertion.
pinctrl-0 = <
&pinctrl_board_mmc0
&pinctrl_mmc0_slot0_clk_cmd_dat0
&pinctrl_mmc0_slot0_dat1_3>;
status = "okay";
reg = <0>;
bus-width = <4>;
cd-gpios = <&pioD 15
GPIO_ACTIVE_HIGH>;
};
};
an interrupt is registered on the cd-gpios pin for when the card changes. At
least in linux, FreeBSD doesn't (yet) implement this, but will someday if I get
back to the armv6 atmel work I started (see at91-cosino.dts for example, there's
others).
I think this is an example of what you are asking about, or did I get
lost in the
twisty maze of conversation zigs and zags...
Warner
Where we would run into (minor) problems is if the interrupt parent for the
first mmc0 is the GPIO controller. More generally, if &pioD has interrupt
children specified in some way that is not a <pin, active high/whatever>
tuple somewhere else in the tree then you would have to have methods to
parse both interrupt specifiers as-obtained-from-interrupts-properties (or
equivalent) and specifiers as-obtained-from-gpio-properties. If the tree
picks one format and sticks with it, you can get away with just the one.
Glancing through the DTS source for this board, that doesn't appear to be
the case and the property formatting is uniform, but I might have missed
something in one of the many #includes.
Interrupts and GPIO specifiers are different in subtle ways. The interrupt
parent for mmc0 is an AIC, which is also the ultimate parent of the GPIO
controller. But the properties for the GPIO pins that act as interrupts and
the interrupt specifiers are different.
As a general point, GPIO weirdness would be easy enough case to handle if it
did come up (add some mapping method, as above) that I think we shouldn't
worry too much about it from an architectural point of view. If a board
appears that is set up this way, we can roll with the punches at that point
and add whatever small amount of shim code that is required. It would be
annoyance, sure, but not a real complication.
I suspect that either I don't understand the issue, or we'll have such boards
very quickly. The Atmel design is fairly clean in comparison to other
franken-horrors
I've seen...

Warner
Nathan Whitehorn
2016-07-26 05:41:40 UTC
Permalink
Post by Warner Losh
On Mon, Jul 25, 2016 at 10:48 AM, Nathan Whitehorn
Post by Warner Losh
On Mon, Jul 25, 2016 at 8:05 AM, Nathan Whitehorn
Post by Nathan Whitehorn
That wasn't my question. Are these particular device drivers allocating
interrupts both on the GPIOs in their "interrupts" property (which are
entirely GPIOs in this example) *and* on the GPIOs listed as resources but
not listed as interrupts? If they are, then you need a switching mechanism,
but that seems pretty unlikely given the names of the non-interrupt GPIOs
(they look like outputs). It would also be a somewhat deranged way to set up
a device tree -- not that that rules it out or anything.
On Atmel, there's a situation that this covers, I think.
compatible = "atmel,hsmci";
reg = <0xfffa8000 0x600>;
interrupts = <9 IRQ_TYPE_LEVEL_HIGH 0>;
#address-cells = <1>;
#size-cells = <0>;
pinctrl-names = "default";
clocks = <&mci0_clk>;
clock-names = "mci_clk";
status = "disabled";
};
and in other places wires in GPIO interrupts for things like card
eject / insertion.
pinctrl-0 = <
&pinctrl_board_mmc0
&pinctrl_mmc0_slot0_clk_cmd_dat0
&pinctrl_mmc0_slot0_dat1_3>;
status = "okay";
reg = <0>;
bus-width = <4>;
cd-gpios = <&pioD 15 GPIO_ACTIVE_HIGH>;
};
};
an interrupt is registered on the cd-gpios pin for when the card changes. At
least in linux, FreeBSD doesn't (yet) implement this, but will someday if I get
back to the armv6 atmel work I started (see at91-cosino.dts for example, there's
others).
I think this is an example of what you are asking about, or did I get
lost in the
twisty maze of conversation zigs and zags...
Warner
Where we would run into (minor) problems is if the interrupt parent for the
first mmc0 is the GPIO controller. More generally, if &pioD has interrupt
children specified in some way that is not a <pin, active high/whatever>
tuple somewhere else in the tree then you would have to have methods to
parse both interrupt specifiers as-obtained-from-interrupts-properties (or
equivalent) and specifiers as-obtained-from-gpio-properties. If the tree
picks one format and sticks with it, you can get away with just the one.
Glancing through the DTS source for this board, that doesn't appear to be
the case and the property formatting is uniform, but I might have missed
something in one of the many #includes.
Interrupts and GPIO specifiers are different in subtle ways. The interrupt
parent for mmc0 is an AIC, which is also the ultimate parent of the GPIO
controller.
That is what it looked like.
Post by Warner Losh
But the properties for the GPIO pins that act as interrupts and
the interrupt specifiers are different.
So there are devices with both interrupts = <foo bar>, #interrupt-parent
= <&gpio> and gpios = <&gpio bleh baz> where "Bleh baz" is formatted
different than "foo bar" and both are meant to be treated as interrupts?

It's fine if there are, but I haven't seen any such device trees yet.
Post by Warner Losh
As a general point, GPIO weirdness would be easy enough case to handle if it
did come up (add some mapping method, as above) that I think we shouldn't
worry too much about it from an architectural point of view. If a board
appears that is set up this way, we can roll with the punches at that point
and add whatever small amount of shim code that is required. It would be
annoyance, sure, but not a real complication.
I suspect that either I don't understand the issue, or we'll have such boards
very quickly. The Atmel design is fairly clean in comparison to other
franken-horrors
I've seen...
People do weird things for sure. My point is just that the details of
how (implicit) GPIO interrupts are formatted just isn't that important.
It's easy enough to add special code for devices that are set up in
bizarre ways as they are spotted in the wild and that special code is so
minor that it doesn't matter for the design of the API. This is a point
I was just curious about, since I had never actually seen device trees
set up that way.

The issue with this patch is, at its core, whether you have an
architecture that relies on the newbus hierarchy (like r301453) or that
allows links outside of that hierarchy that can cross branches or go the
"wrong" way, like the previously existing code. There are some other
differences, of varying degrees of importance, but that's the really
fundamental one. I haven't seen any cases where r301453 provides
functionality absent in the already existing system, but there seem to
be a large number of cases (see the first email I sent to freebsd-arch
or https://wiki.freebsd.org/Complicated_Interrupts) that the API in
r301453 cannot accommodate and that are needed to support a variety of
hardware. Since having two APIs will be a significant maintenance
burden, if there are cases the existing code can't handle, I would like
to know about them; otherwise, I think we should back out r301453 and
stick to the standard device tree interrupt mapping mechanism on ARM
instead. I'm happy to help implement any necessary enhancements there
(for example, dealing with weirdly-encoded GPIOs).
-Nathan
Post by Warner Losh
Warner
Michal Meloun
2016-07-26 13:40:55 UTC
Permalink
Post by Nathan Whitehorn
Post by Warner Losh
On Mon, Jul 25, 2016 at 10:48 AM, Nathan Whitehorn
Post by Warner Losh
On Mon, Jul 25, 2016 at 8:05 AM, Nathan Whitehorn
Post by Nathan Whitehorn
That wasn't my question. Are these particular device drivers allocating
interrupts both on the GPIOs in their "interrupts" property (which are
entirely GPIOs in this example) *and* on the GPIOs listed as resources
but
not listed as interrupts? If they are, then you need a switching mechanism,
but that seems pretty unlikely given the names of the
non-interrupt GPIOs
(they look like outputs). It would also be a somewhat deranged way
to set
up
a device tree -- not that that rules it out or anything.
On Atmel, there's a situation that this covers, I think.
compatible = "atmel,hsmci";
reg = <0xfffa8000 0x600>;
interrupts = <9
IRQ_TYPE_LEVEL_HIGH 0>;
#address-cells = <1>;
#size-cells = <0>;
pinctrl-names = "default";
clocks = <&mci0_clk>;
clock-names = "mci_clk";
status = "disabled";
};
and in other places wires in GPIO interrupts for things like card
eject / insertion.
pinctrl-0 = <
&pinctrl_board_mmc0
&pinctrl_mmc0_slot0_clk_cmd_dat0
&pinctrl_mmc0_slot0_dat1_3>;
status = "okay";
reg = <0>;
bus-width = <4>;
cd-gpios = <&pioD 15 GPIO_ACTIVE_HIGH>;
};
};
an interrupt is registered on the cd-gpios pin for when the card
changes.
At
least in linux, FreeBSD doesn't (yet) implement this, but will
someday if
I get
back to the armv6 atmel work I started (see at91-cosino.dts for
example,
there's
others).
I think this is an example of what you are asking about, or did I get
lost in the
twisty maze of conversation zigs and zags...
Warner
Where we would run into (minor) problems is if the interrupt parent for the
first mmc0 is the GPIO controller. More generally, if &pioD has interrupt
children specified in some way that is not a <pin, active
high/whatever>
tuple somewhere else in the tree then you would have to have methods to
parse both interrupt specifiers
as-obtained-from-interrupts-properties (or
equivalent) and specifiers as-obtained-from-gpio-properties. If the tree
picks one format and sticks with it, you can get away with just the one.
Glancing through the DTS source for this board, that doesn't appear to be
the case and the property formatting is uniform, but I might have missed
something in one of the many #includes.
Interrupts and GPIO specifiers are different in subtle ways. The interrupt
parent for mmc0 is an AIC, which is also the ultimate parent of the GPIO
controller.
That is what it looked like.
Post by Warner Losh
But the properties for the GPIO pins that act as interrupts and
the interrupt specifiers are different.
So there are devices with both interrupts = <foo bar>,
#interrupt-parent = <&gpio> and gpios = <&gpio bleh baz> where "Bleh
baz" is formatted different than "foo bar" and both are meant to be
treated as interrupts?
It's fine if there are, but I haven't seen any such device trees yet.
I think that yes, but I'm not ready to search all 1372 Linux DT files
only for explicit example.
But your question is invalid from beginning.
1) Format of each single 'interrupts' and '*-gpios' property is
different because each single one are defined differently.
2) The question should not be 'Are there device' but 'Is this
combination valid'

Moreover, I'm curious why you want OFW property for gpio interrupts at all,
given gpio can be instantiated by other entity (PCIe, USB).

Please see:
https://svnweb.freebsd.org/base/head/sys/dev/gpio/gpiobus.c?revision=301539&view=markup#l92
and, for example this one controller
https://svnweb.freebsd.org/base/head/sys/arm/nvidia/tegra_gpio.c?revision=300149&view=markup#l575
Post by Nathan Whitehorn
Post by Warner Losh
As a general point, GPIO weirdness would be easy enough case to handle if it
did come up (add some mapping method, as above) that I think we shouldn't
worry too much about it from an architectural point of view. If a board
appears that is set up this way, we can roll with the punches at that point
and add whatever small amount of shim code that is required. It would be
annoyance, sure, but not a real complication.
I suspect that either I don't understand the issue, or we'll have such boards
very quickly. The Atmel design is fairly clean in comparison to other
franken-horrors
I've seen...
People do weird things for sure. My point is just that the details of
how (implicit) GPIO interrupts are formatted just isn't that
important. It's easy enough to add special code for devices that are
set up in bizarre ways as they are spotted in the wild and that
special code is so minor that it doesn't matter for the design of the
API. This is a point I was just curious about, since I had never
actually seen device trees set up that way.
The issue with this patch is, at its core, whether you have an
architecture that relies on the newbus hierarchy (like r301453) or
that allows links outside of that hierarchy that can cross branches or
go the "wrong" way, like the previously existing code.
OK. The r301453 :
- it removes unnecessary idea of virtual IRQs. I gave you some examples
where virtual IRQ concept fails.
Also, both variants needs attached PIC at bus_alloc_resource() time,
so timing wasn't been changed.

- it implements new BUS_MAP_INTR(). As I understand it, this is
problematic for you, and I'm ready to change it. But I need more details
than "it's fundamentally broken".
Post by Nathan Whitehorn
There are some other differences, of varying degrees of importance,
but that's the really fundamental one. I haven't seen any cases where
r301453 provides functionality absent in the already existing system,
but there seem to be a large number of cases (see the first email I
sent to freebsd-arch or
https://wiki.freebsd.org/Complicated_Interrupts) that the API in
r301453 cannot accommodate and that are needed to support a variety of
hardware.
Which single feature has been removed by r301453?
Michal
Post by Nathan Whitehorn
Since having two APIs will be a significant maintenance burden, if
there are cases the existing code can't handle, I would like to know
about them; otherwise, I think we should back out r301453 and stick to
the standard device tree interrupt mapping mechanism on ARM instead.
I'm happy to help implement any necessary enhancements there (for
example, dealing with weirdly-encoded GPIOs).
-Nathan
Post by Warner Losh
Warner
Nathan Whitehorn
2016-07-26 14:56:09 UTC
Permalink
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Warner Losh
On Mon, Jul 25, 2016 at 10:48 AM, Nathan Whitehorn
Post by Warner Losh
On Mon, Jul 25, 2016 at 8:05 AM, Nathan Whitehorn
Post by Nathan Whitehorn
That wasn't my question. Are these particular device drivers allocating
interrupts both on the GPIOs in their "interrupts" property (which are
entirely GPIOs in this example) *and* on the GPIOs listed as resources
but
not listed as interrupts? If they are, then you need a switching mechanism,
but that seems pretty unlikely given the names of the
non-interrupt GPIOs
(they look like outputs). It would also be a somewhat deranged way
to set
up
a device tree -- not that that rules it out or anything.
On Atmel, there's a situation that this covers, I think.
compatible = "atmel,hsmci";
reg = <0xfffa8000 0x600>;
interrupts = <9
IRQ_TYPE_LEVEL_HIGH 0>;
#address-cells = <1>;
#size-cells = <0>;
pinctrl-names = "default";
clocks = <&mci0_clk>;
clock-names = "mci_clk";
status = "disabled";
};
and in other places wires in GPIO interrupts for things like card
eject / insertion.
pinctrl-0 = <
&pinctrl_board_mmc0
&pinctrl_mmc0_slot0_clk_cmd_dat0
&pinctrl_mmc0_slot0_dat1_3>;
status = "okay";
reg = <0>;
bus-width = <4>;
cd-gpios = <&pioD 15 GPIO_ACTIVE_HIGH>;
};
};
an interrupt is registered on the cd-gpios pin for when the card
changes.
At
least in linux, FreeBSD doesn't (yet) implement this, but will
someday if
I get
back to the armv6 atmel work I started (see at91-cosino.dts for
example,
there's
others).
I think this is an example of what you are asking about, or did I get
lost in the
twisty maze of conversation zigs and zags...
Warner
Where we would run into (minor) problems is if the interrupt parent for the
first mmc0 is the GPIO controller. More generally, if &pioD has interrupt
children specified in some way that is not a <pin, active
high/whatever>
tuple somewhere else in the tree then you would have to have methods to
parse both interrupt specifiers
as-obtained-from-interrupts-properties (or
equivalent) and specifiers as-obtained-from-gpio-properties. If the tree
picks one format and sticks with it, you can get away with just the one.
Glancing through the DTS source for this board, that doesn't appear to be
the case and the property formatting is uniform, but I might have missed
something in one of the many #includes.
Interrupts and GPIO specifiers are different in subtle ways. The interrupt
parent for mmc0 is an AIC, which is also the ultimate parent of the GPIO
controller.
That is what it looked like.
Post by Warner Losh
But the properties for the GPIO pins that act as interrupts and
the interrupt specifiers are different.
So there are devices with both interrupts = <foo bar>,
#interrupt-parent = <&gpio> and gpios = <&gpio bleh baz> where "Bleh
baz" is formatted different than "foo bar" and both are meant to be
treated as interrupts?
It's fine if there are, but I haven't seen any such device trees yet.
I think that yes, but I'm not ready to search all 1372 Linux DT files
only for explicit example.
But your question is invalid from beginning.
1) Format of each single 'interrupts' and '*-gpios' property is
different because each single one are defined differently.
2) The question should not be 'Are there device' but 'Is this
combination valid'
Moreover, I'm curious why you want OFW property for gpio interrupts at all,
given gpio can be instantiated by other entity (PCIe, USB).
https://svnweb.freebsd.org/base/head/sys/dev/gpio/gpiobus.c?revision=301539&view=markup#l92
and, for example this one controller
https://svnweb.freebsd.org/base/head/sys/arm/nvidia/tegra_gpio.c?revision=300149&view=markup#l575
Well, it's part of the device tree standard, so dealing with GPIO
properties seems reasonable where they occur. You could, of course, have
systems without device trees and GPIOs. I'm not sure where you are going
with this.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Warner Losh
As a general point, GPIO weirdness would be easy enough case to handle if it
did come up (add some mapping method, as above) that I think we shouldn't
worry too much about it from an architectural point of view. If a board
appears that is set up this way, we can roll with the punches at that point
and add whatever small amount of shim code that is required. It would be
annoyance, sure, but not a real complication.
I suspect that either I don't understand the issue, or we'll have such boards
very quickly. The Atmel design is fairly clean in comparison to other
franken-horrors
I've seen...
People do weird things for sure. My point is just that the details of
how (implicit) GPIO interrupts are formatted just isn't that
important. It's easy enough to add special code for devices that are
set up in bizarre ways as they are spotted in the wild and that
special code is so minor that it doesn't matter for the design of the
API. This is a point I was just curious about, since I had never
actually seen device trees set up that way.
The issue with this patch is, at its core, whether you have an
architecture that relies on the newbus hierarchy (like r301453) or
that allows links outside of that hierarchy that can cross branches or
go the "wrong" way, like the previously existing code.
- it removes unnecessary idea of virtual IRQs. I gave you some examples
where virtual IRQ concept fails.
I have never seen any of these examples.

The concept is *extremely* necessary, which is why both Linux and
FreeBSD decided to use it independently There is no way to handle parent
buses with a single rman and devices on multiple PICs with overlapping
interrupt ranges without them; neither is there a way to decode
arbitrary-length interrupt specifiers or to handle things like MSIs.
Please see the list of cases at
https://wiki.freebsd.org/Complicated_Interrupts and in my earlier email
for some examples of things that you just can't represent with this new
system, as far as I can tell.
Post by Michal Meloun
Also, both variants needs attached PIC at bus_alloc_resource() time,
so timing wasn't been changed.
They absolutely do not, as I have explained repeatedly. Due to parent
devices with interrupts handled by their bus children, this is a hard
requirement of any workable system. This is not a theoretical issue; I
have lots of hardware like this.
Post by Michal Meloun
- it implements new BUS_MAP_INTR(). As I understand it, this is
problematic for you, and I'm ready to change it. But I need more details
than "it's fundamentally broken".
Please explain (a) what cases it handles that the existing code and
does, and (b) how you would resolve each of the cases on the wiki page I
sent.

The general issue is that it traces the newbus hierarchy, when
interrupts often do not, and so breaks when you have links to
as-yet-unattached parts of the hierarchy. It also relies on the assigned
"IRQ" numbers being nonoverlapping to avoid rman errors, but that's a
relatively minor thing.
Post by Michal Meloun
Post by Nathan Whitehorn
There are some other differences, of varying degrees of importance,
but that's the really fundamental one. I haven't seen any cases where
r301453 provides functionality absent in the already existing system,
but there seem to be a large number of cases (see the first email I
sent to freebsd-arch or
https://wiki.freebsd.org/Complicated_Interrupts) that the API in
r301453 cannot accommodate and that are needed to support a variety of
hardware.
Which single feature has been removed by r301453?
Nothing has been removed by it, because the normal code is intact.
However, the new code is less functional than the old code, so cannot
replace it. In particular, it is architecturally incapable of working
with the kinds of device trees found on PowerPC systems (see the wiki
page). That means we would have to keep both indefinitely, which is a
significant maintenance burden to no gain whatsoever.
-Nathan
Post by Michal Meloun
Michal
Post by Nathan Whitehorn
Since having two APIs will be a significant maintenance burden, if
there are cases the existing code can't handle, I would like to know
about them; otherwise, I think we should back out r301453 and stick to
the standard device tree interrupt mapping mechanism on ARM instead.
I'm happy to help implement any necessary enhancements there (for
example, dealing with weirdly-encoded GPIOs).
-Nathan
Post by Warner Losh
Warner
Michal Meloun
2016-07-27 16:27:48 UTC
Permalink
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Warner Losh
On Mon, Jul 25, 2016 at 10:48 AM, Nathan Whitehorn
Post by Warner Losh
On Mon, Jul 25, 2016 at 8:05 AM, Nathan Whitehorn
Post by Nathan Whitehorn
That wasn't my question. Are these particular device drivers allocating
interrupts both on the GPIOs in their "interrupts" property (which are
entirely GPIOs in this example) *and* on the GPIOs listed as resources
but
not listed as interrupts? If they are, then you need a switching mechanism,
but that seems pretty unlikely given the names of the
non-interrupt GPIOs
(they look like outputs). It would also be a somewhat deranged way
to set
up
a device tree -- not that that rules it out or anything.
On Atmel, there's a situation that this covers, I think.
compatible = "atmel,hsmci";
reg = <0xfffa8000 0x600>;
interrupts = <9
IRQ_TYPE_LEVEL_HIGH 0>;
#address-cells = <1>;
#size-cells = <0>;
pinctrl-names = "default";
clocks = <&mci0_clk>;
clock-names = "mci_clk";
status = "disabled";
};
and in other places wires in GPIO interrupts for things like card
eject / insertion.
pinctrl-0 = <
&pinctrl_board_mmc0
&pinctrl_mmc0_slot0_clk_cmd_dat0
&pinctrl_mmc0_slot0_dat1_3>;
status = "okay";
reg = <0>;
bus-width = <4>;
cd-gpios = <&pioD 15
GPIO_ACTIVE_HIGH>;
};
};
an interrupt is registered on the cd-gpios pin for when the card
changes.
At
least in linux, FreeBSD doesn't (yet) implement this, but will
someday if
I get
back to the armv6 atmel work I started (see at91-cosino.dts for
example,
there's
others).
I think this is an example of what you are asking about, or did I get
lost in the
twisty maze of conversation zigs and zags...
Warner
Where we would run into (minor) problems is if the interrupt parent for the
first mmc0 is the GPIO controller. More generally, if &pioD has interrupt
children specified in some way that is not a <pin, active
high/whatever>
tuple somewhere else in the tree then you would have to have methods to
parse both interrupt specifiers
as-obtained-from-interrupts-properties (or
equivalent) and specifiers as-obtained-from-gpio-properties. If the tree
picks one format and sticks with it, you can get away with just the one.
Glancing through the DTS source for this board, that doesn't appear to be
the case and the property formatting is uniform, but I might have missed
something in one of the many #includes.
Interrupts and GPIO specifiers are different in subtle ways. The interrupt
parent for mmc0 is an AIC, which is also the ultimate parent of the GPIO
controller.
That is what it looked like.
Post by Warner Losh
But the properties for the GPIO pins that act as interrupts and
the interrupt specifiers are different.
So there are devices with both interrupts = <foo bar>,
#interrupt-parent = <&gpio> and gpios = <&gpio bleh baz> where "Bleh
baz" is formatted different than "foo bar" and both are meant to be
treated as interrupts?
It's fine if there are, but I haven't seen any such device trees yet.
I think that yes, but I'm not ready to search all 1372 Linux DT files
only for explicit example.
But your question is invalid from beginning.
1) Format of each single 'interrupts' and '*-gpios' property is
different because each single one are defined differently.
2) The question should not be 'Are there device' but 'Is this
combination valid'
Moreover, I'm curious why you want OFW property for gpio interrupts at all,
given gpio can be instantiated by other entity (PCIe, USB).
https://svnweb.freebsd.org/base/head/sys/dev/gpio/gpiobus.c?revision=301539&view=markup#l92
and, for example this one controller
https://svnweb.freebsd.org/base/head/sys/arm/nvidia/tegra_gpio.c?revision=300149&view=markup#l575
Well, it's part of the device tree standard, so dealing with GPIO
properties seems reasonable where they occur. You could, of course,
have systems without device trees and GPIOs. I'm not sure where you
are going with this.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Warner Losh
As a general point, GPIO weirdness would be easy enough case to handle if it
did come up (add some mapping method, as above) that I think we shouldn't
worry too much about it from an architectural point of view. If a board
appears that is set up this way, we can roll with the punches at that point
and add whatever small amount of shim code that is required. It would be
annoyance, sure, but not a real complication.
I suspect that either I don't understand the issue, or we'll have such boards
very quickly. The Atmel design is fairly clean in comparison to other
franken-horrors
I've seen...
People do weird things for sure. My point is just that the details of
how (implicit) GPIO interrupts are formatted just isn't that
important. It's easy enough to add special code for devices that are
set up in bizarre ways as they are spotted in the wild and that
special code is so minor that it doesn't matter for the design of the
API. This is a point I was just curious about, since I had never
actually seen device trees set up that way.
The issue with this patch is, at its core, whether you have an
architecture that relies on the newbus hierarchy (like r301453) or
that allows links outside of that hierarchy that can cross branches or
go the "wrong" way, like the previously existing code.
- it removes unnecessary idea of virtual IRQs. I gave you some examples
where virtual IRQ concept fails.
I have never seen any of these examples.
See [1].
Post by Nathan Whitehorn
The concept is *extremely* necessary, which is why both Linux and
FreeBSD decided to use it independently There is no way to handle
parent buses with a single rman and devices on multiple PICs with
overlapping interrupt ranges without them; neither is there a way to
decode arbitrary-length interrupt specifiers or to handle things like
MSIs. Please see the list of cases at
https://wiki.freebsd.org/Complicated_Interrupts and in my earlier
email for some examples of things that you just can't represent with
this new system, as far as I can tell.
Post by Michal Meloun
Also, both variants needs attached PIC at bus_alloc_resource() time,
so timing wasn't been changed.
They absolutely do not, as I have explained repeatedly. Due to parent
devices with interrupts handled by their bus children, this is a hard
requirement of any workable system. This is not a theoretical issue; I
have lots of hardware like this.
Nathan, I'm talking about pre /post r301453 state. Current INTRNG it
has never supported. But yes, I'm fully understand why you want it. See
[2].
Post by Nathan Whitehorn
Post by Michal Meloun
- it implements new BUS_MAP_INTR(). As I understand it, this is
problematic for you, and I'm ready to change it. But I need more details
than "it's fundamentally broken".
Please explain (a) what cases it handles that the existing code and
does, and (b) how you would resolve each of the cases on the wiki page
I sent.
The general issue is that it traces the newbus hierarchy, when
interrupts often do not, and so breaks when you have links to
as-yet-unattached parts of the hierarchy. It also relies on the
assigned "IRQ" numbers being nonoverlapping to avoid rman errors, but
that's a relatively minor thing.
Well, from my point of view, only "get interrupt data from
OFW/GPIO/.." part of bus_alloc_resource() follows bus hierarchy, real
execution not changed.
In any case, I see many possible variants how we can modify current
implementation. For rest, see [2].
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
There are some other differences, of varying degrees of importance,
but that's the really fundamental one. I haven't seen any cases where
r301453 provides functionality absent in the already existing system,
but there seem to be a large number of cases (see the first email I
sent to freebsd-arch or
https://wiki.freebsd.org/Complicated_Interrupts) that the API in
r301453 cannot accommodate and that are needed to support a variety of
hardware.
Which single feature has been removed by r301453?
Nothing has been removed by it, because the normal code is intact.
However, the new code is less functional than the old code, so cannot
replace it. In particular, it is architecturally incapable of working
with the kinds of device trees found on PowerPC systems (see the wiki
page). That means we would have to keep both indefinitely, which is a
significant maintenance burden to no gain whatsoever.
-Nathan
Firstly, please, ignore dependency problem. It will be explained later
in [2].

[1]
The original (your) INTRNG assume several things:

- value of interrupt parent xref together with byte contents of
'interrupts' property forms some sort of 'key' (aka virtual IRQ)
- byte contents of 'interrupts' property cannot be parsed in any way
inside INTRNG core.
- data forming this key are sufficient for subsequent mapping (and/or
configuring) to real IRQ.
- there must exist bidirectional unique relation between virtual and
real IRQs. So one key (virtual IRQ) can be mappable to one and just one
real IRQ. And only one virtual IRQ can be mapped to any given real
IRQ.
- we have cache that stores all observed 'keys' and associated real IRQ
) the they already mapped.

Unfortunately, necessity of unique mapping is very problematic.
We cannot handle situation, where shared interrupt is declared by
different but compatible 'interrupts' properties.

For example:
-------------------------
foo1: ***@12345678 {
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
}

foo2: ***@1234567C {
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_NONE>;
}

Is this valid (from DT point of view) configuration? I think that yes.
Is this frequent configuration? I'm sure, isn't.
Is this possible configuration? I'm afraid that yes.
--------------------------

Next example are GPIO interrupts. These are encoded differently, so two
'keys' may point to to same real IRQ.
After that we decided to change our approach and utilize standard
resources and resource list entries to deliveryconfiguration data from
various sources (OFW/GPIO/...) to consumers.

[2]
Each OFW device has a lot of dependencies. It must enable clocks, power
regulators..., it must be taken from reset. All these action must be
done before we can access single register.
Most of these action can be done only with attached clock/power/reset
controller.
Within this, interrupts are not special. Moreover, 'cross type' circular
dependencies are not that rare.
I want to say:
Resource dependencies are must solved (and resolved) at different level
than is INTRNG.
Blind resource allocation is not universal solution because given
resource may/must be accessible immediately after allocation (in many
cases).

Unfortunately, and at time time, I known only one really working solution:
staggered driver initialization combined with multipass bus initialization.

I'm sorry, I'm not able to express all this accurately and clearly, but
I've really tried...
Michal
Nathan Whitehorn
2016-07-27 17:19:41 UTC
Permalink
On 07/27/16 09:27, Michal Meloun wrote:


[snip]
Post by Michal Meloun
Post by Nathan Whitehorn
The concept is *extremely* necessary, which is why both Linux and
FreeBSD decided to use it independently There is no way to handle
parent buses with a single rman and devices on multiple PICs with
overlapping interrupt ranges without them; neither is there a way to
decode arbitrary-length interrupt specifiers or to handle things like
MSIs. Please see the list of cases at
https://wiki.freebsd.org/Complicated_Interrupts and in my earlier
email for some examples of things that you just can't represent with
this new system, as far as I can tell.
Post by Michal Meloun
Also, both variants needs attached PIC at bus_alloc_resource() time,
so timing wasn't been changed.
They absolutely do not, as I have explained repeatedly. Due to parent
devices with interrupts handled by their bus children, this is a hard
requirement of any workable system. This is not a theoretical issue; I
have lots of hardware like this.
Nathan, I'm talking about pre /post r301453 state. Current INTRNG it
has never supported. But yes, I'm fully understand why you want it. See
[2].
Current "INTRNG", at least on PowerPC, has supported this for 10 years.
If it doesn't on ARM, it's because of some trivial implementation bug.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
- it implements new BUS_MAP_INTR(). As I understand it, this is
problematic for you, and I'm ready to change it. But I need more details
than "it's fundamentally broken".
Please explain (a) what cases it handles that the existing code and
does, and (b) how you would resolve each of the cases on the wiki page
I sent.
The general issue is that it traces the newbus hierarchy, when
interrupts often do not, and so breaks when you have links to
as-yet-unattached parts of the hierarchy. It also relies on the
assigned "IRQ" numbers being nonoverlapping to avoid rman errors, but
that's a relatively minor thing.
Well, from my point of view, only "get interrupt data from
OFW/GPIO/.." part of bus_alloc_resource() follows bus hierarchy, real
execution not changed.
In any case, I see many possible variants how we can modify current
implementation. For rest, see [2].
But that's an important part, for three reasons:
1. The PIC may not exist when bus_alloc_resource() is called
2. The bus parents have no useful information to contribute to this
since the hierarchy doesn't go that way
3. If you try to use the interrupt pin as the resource ID, which r301453
does, you end up with rman conflicts if, for example, a bus has two
children with interrupts on identically numbered pins on two different
interrupt controllers.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
There are some other differences, of varying degrees of importance,
but that's the really fundamental one. I haven't seen any cases where
r301453 provides functionality absent in the already existing system,
but there seem to be a large number of cases (see the first email I
sent to freebsd-arch or
https://wiki.freebsd.org/Complicated_Interrupts) that the API in
r301453 cannot accommodate and that are needed to support a variety of
hardware.
Which single feature has been removed by r301453?
Nothing has been removed by it, because the normal code is intact.
However, the new code is less functional than the old code, so cannot
replace it. In particular, it is architecturally incapable of working
with the kinds of device trees found on PowerPC systems (see the wiki
page). That means we would have to keep both indefinitely, which is a
significant maintenance burden to no gain whatsoever.
-Nathan
Firstly, please, ignore dependency problem. It will be explained later
in [2].
Except that that isn't a workable solution (see the end)
Post by Michal Meloun
[1]
- value of interrupt parent xref together with byte contents of
'interrupts' property forms some sort of 'key' (aka virtual IRQ)
- byte contents of 'interrupts' property cannot be parsed in any way
inside INTRNG core.
- data forming this key are sufficient for subsequent mapping (and/or
configuring) to real IRQ.
- there must exist bidirectional unique relation between virtual and
real IRQs. So one key (virtual IRQ) can be mappable to one and just one
real IRQ. And only one virtual IRQ can be mapped to any given real
IRQ.
- we have cache that stores all observed 'keys' and associated real IRQ
) the they already mapped.
Unfortunately, necessity of unique mapping is very problematic.
We cannot handle situation, where shared interrupt is declared by
different but compatible 'interrupts' properties.
There isn't actually a requirement of bidirectional uniqueness, as I
explained the last time you brought this up. One "virtual" IRQ does need
to map to exactly one interrupt specifier, but the reverse is not the
case. The PIC driver is absolutely free to map and dispatch multiple
virtual IRQs from the same shared interrupt pin, with no more overhead
than you usually get from shared interrupt lines.
Post by Michal Meloun
-------------------------
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
}
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_NONE>;
}
Is this valid (from DT point of view) configuration? I think that yes.
No, it's malformed and a violation of the standard.
Post by Michal Meloun
Is this frequent configuration? I'm sure, isn't.
Is this possible configuration? I'm afraid that yes.
People do in fact do deranged non-conformant things with device trees
sometimes, unfortunately, so it is indeed worth worrying about.
Post by Michal Meloun
--------------------------
Next example are GPIO interrupts. These are encoded differently, so two
'keys' may point to to same real IRQ.
After that we decided to change our approach and utilize standard
resources and resource list entries to deliveryconfiguration data from
various sources (OFW/GPIO/...) to consumers.
This is a non-issue. You make the GPIO controller a PIC, it handles
interrupts however you like. If you got a weird device tree that uses
two encodings (one for "interrupts" properties with GPIOs and one for
"GPIO" properties on which you need to take interrupts but that weren't
added to the "interrupts" list for mysterious reasons), you introduce a
helper function for the GPIO case.

Are there any actual problems with the pre-existing interrupt mapping
code? I have not seen any so far.
Post by Michal Meloun
[2]
Each OFW device has a lot of dependencies. It must enable clocks, power
regulators..., it must be taken from reset. All these action must be
done before we can access single register.
Most of these action can be done only with attached clock/power/reset
controller.
Within this, interrupts are not special.
They are actually quite special in that the kernel enables them late and
so you can defer setup. They are also special in that, for that reason,
it is possible to design systems with circular dependency graphs with
interrupts. It is not possible to do this with, for example, clocks: if
I need to apply a clock to the clock controller with itself, the system
is just not bootable and such a system will not be built or shipped.
Interrupts need only happen later, after device setup, and so such
systems *are* designed and shipped. The same is true for power.
Post by Michal Meloun
Moreover, 'cross type' circular
dependencies are not that rare.
Resource dependencies are must solved (and resolved) at different level
than is INTRNG.
Blind resource allocation is not universal solution because given
resource may/must be accessible immediately after allocation (in many
cases).
Absolutely: for GPIOs, clocks, power, etc. you want a system that looks
different than the interrupt virtualization system. Probably with extra
resource types and maybe with some API similar to r301453. But
interrupts have different, and more complex, requirements and cannot be
mapped this way.
Post by Michal Meloun
staggered driver initialization combined with multipass bus initialization.
That does not solve the interrupt problem. If devices have interrupts on
their own children, no amount of multipass initialization can possibly
break the loop. Multipass would work for other kinds of resources
(clocks, power, etc.) and is a perfect match for those resource types. A
dynamic multipass (where a driver can return EAGAIN or something from
attach if its resources don't exist yet) would work great.

*But* in the specific, special case of interrupts, it is not a workable
model. You could imagine some change to initialization that gives
devices a late attach and an early attach method and does interrupts in
the late part, but that is hugely invasive change that would need to
touch every single driver in the tree -- to solve a problem that is
already solved by interrupt virtualization.
-Nathan
Post by Michal Meloun
I'm sorry, I'm not able to express all this accurately and clearly, but
I've really tried...
Michal
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Adrian Chadd
2016-07-27 19:38:11 UTC
Permalink
[snip]

can the ARM specific bits be added to Nathan's wiki page describing
interrupt stuff? I'd like to see some of this complication documented
so people see the full variety of hilarity.

Thanks!


-adrian
Michal Meloun
2016-07-28 15:33:15 UTC
Permalink
[snip]
Post by Michal Meloun
Post by Nathan Whitehorn
The concept is *extremely* necessary, which is why both Linux and
FreeBSD decided to use it independently There is no way to handle
parent buses with a single rman and devices on multiple PICs with
overlapping interrupt ranges without them; neither is there a way to
decode arbitrary-length interrupt specifiers or to handle things like
MSIs. Please see the list of cases at
https://wiki.freebsd.org/Complicated_Interrupts and in my earlier
email for some examples of things that you just can't represent with
this new system, as far as I can tell.
Post by Michal Meloun
Also, both variants needs attached PIC at bus_alloc_resource() time,
so timing wasn't been changed.
They absolutely do not, as I have explained repeatedly. Due to parent
devices with interrupts handled by their bus children, this is a hard
requirement of any workable system. This is not a theoretical issue; I
have lots of hardware like this.
Nathan, I'm talking about pre /post r301453 state. Current INTRNG it
has never supported. But yes, I'm fully understand why you want it. See
[2].
Current "INTRNG", at least on PowerPC, has supported this for 10
years. If it doesn't on ARM, it's because of some trivial
implementation bug.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
- it implements new BUS_MAP_INTR(). As I understand it, this is
problematic for you, and I'm ready to change it. But I need more details
than "it's fundamentally broken".
Please explain (a) what cases it handles that the existing code and
does, and (b) how you would resolve each of the cases on the wiki page
I sent.
The general issue is that it traces the newbus hierarchy, when
interrupts often do not, and so breaks when you have links to
as-yet-unattached parts of the hierarchy. It also relies on the
assigned "IRQ" numbers being nonoverlapping to avoid rman errors, but
that's a relatively minor thing.
Well, from my point of view, only "get interrupt data from
OFW/GPIO/.." part of bus_alloc_resource() follows bus hierarchy, real
execution not changed.
In any case, I see many possible variants how we can modify current
implementation. For rest, see [2].
1. The PIC may not exist when bus_alloc_resource() is called
2. The bus parents have no useful information to contribute to this
since the hierarchy doesn't go that way
See [2]
3. If you try to use the interrupt pin as the resource ID, which
r301453 does, you end up with rman conflicts if, for example, a bus
has two children with interrupts on identically numbered pins on two
different interrupt controllers.
No, r301453 doesn't do this.
Index to global interrupt source table (irq_sources) is used as resource
ID (and r301453 not changed it).
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
There are some other differences, of varying degrees of importance,
but that's the really fundamental one. I haven't seen any cases where
r301453 provides functionality absent in the already existing system,
but there seem to be a large number of cases (see the first email I
sent to freebsd-arch or
https://wiki.freebsd.org/Complicated_Interrupts) that the API in
r301453 cannot accommodate and that are needed to support a variety of
hardware.
Which single feature has been removed by r301453?
Nothing has been removed by it, because the normal code is intact.
However, the new code is less functional than the old code, so cannot
replace it. In particular, it is architecturally incapable of working
with the kinds of device trees found on PowerPC systems (see the wiki
page). That means we would have to keep both indefinitely, which is a
significant maintenance burden to no gain whatsoever.
-Nathan
Firstly, please, ignore dependency problem. It will be explained later
in [2].
Except that that isn't a workable solution (see the end)
Post by Michal Meloun
[1]
- value of interrupt parent xref together with byte contents of
'interrupts' property forms some sort of 'key' (aka virtual IRQ)
- byte contents of 'interrupts' property cannot be parsed in any way
inside INTRNG core.
- data forming this key are sufficient for subsequent mapping (and/or
configuring) to real IRQ.
- there must exist bidirectional unique relation between virtual and
real IRQs. So one key (virtual IRQ) can be mappable to one and just one
real IRQ. And only one virtual IRQ can be mapped to any given real
IRQ.
- we have cache that stores all observed 'keys' and associated real IRQ
) the they already mapped.
Unfortunately, necessity of unique mapping is very problematic.
We cannot handle situation, where shared interrupt is declared by
different but compatible 'interrupts' properties.
There isn't actually a requirement of bidirectional uniqueness, as I
explained the last time you brought this up. One "virtual" IRQ does
need to map to exactly one interrupt specifier, but the reverse is not
the case. The PIC driver is absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin, with no more
overhead than you usually get from shared interrupt lines.
Oki, lets me to expand this a little bit more.

The pre r301453 INTRNG[ARM] does this:
- ofw_bus_intr_to_rl() reads and pre-parses 'interrupts' property
into 'key'.
- cache is searched for duplicates and new entry is created.
- position of this entry in cache is used as resource ID.
- given resource ID is stored to RL and is later used in
bus_alloc_resource()
I'm right ?

In the above model, individual cache entries may be shared are
referenced by multiple entities (PIC, RL, ..). Due to this, life cycle
of cache entries is not well defined and we probably must use some sort
of reference counting to control it.
But, as you wrote, INTRNG code is "absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin" (allow me to
expand this also to INTRNG code itself) .
In attempt to simply situation abound cache, in first step, we removed
duplicates check from above model. This also removes unpleasantly
sharing of entries.
This step converts cache to some sort of duplicate of resource list
entries. Each existing RLE have exactly one corresponding entry in cache.
Also, the cache is not needed for INTRNG core. At this point, its only
transfers data from ofw_bus_intr_to_rl() to consumers PIC.

So, if we are able to attach 'key' to RLE, then we can remove the whole
cache machinery from INTRNG.
Do you agree?
Post by Michal Meloun
-------------------------
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
}
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_NONE>;
}
Is this valid (from DT point of view) configuration? I think that yes.
No, it's malformed and a violation of the standard.
Can you give me pointer, please? I'm not able to find anything about
this...
Post by Michal Meloun
Is this frequent configuration? I'm sure, isn't.
Is this possible configuration? I'm afraid that yes.
People do in fact do deranged non-conformant things with device trees
sometimes, unfortunately, so it is indeed worth worrying about.
Post by Michal Meloun
--------------------------
Next example are GPIO interrupts. These are encoded differently, so two
'keys' may point to to same real IRQ.
After that we decided to change our approach and utilize standard
resources and resource list entries to deliveryconfiguration data from
various sources (OFW/GPIO/...) to consumers.
This is a non-issue. You make the GPIO controller a PIC, it handles
interrupts however you like. If you got a weird device tree that uses
two encodings (one for "interrupts" properties with GPIOs and one for
"GPIO" properties on which you need to take interrupts but that
weren't added to the "interrupts" list for mysterious reasons), you
introduce a helper function for the GPIO case.
Are there any actual problems with the pre-existing interrupt mapping
code? I have not seen any so far.
Post by Michal Meloun
[2]
Each OFW device has a lot of dependencies. It must enable clocks, power
regulators..., it must be taken from reset. All these action must be
done before we can access single register.
Most of these action can be done only with attached clock/power/reset
controller.
Within this, interrupts are not special.
They are actually quite special in that the kernel enables them late
and so you can defer setup. They are also special in that, for that
reason, it is possible to design systems with circular dependency
graphs with interrupts. It is not possible to do this with, for
example, clocks: if I need to apply a clock to the clock controller
with itself, the system is just not bootable and such a system will
not be built or shipped. Interrupts need only happen later, after
device setup, and so such systems *are* designed and shipped. The same
is true for power.
The G5 problem is standard 'cross type' circular dependency. You must
(at BUS_PASS_BUS) initialize memory address window of PCI bridge and
start bus enumeration.
Rest of PCI bridge initialization (including all IRQ stuff) must be
postponed to any later pass. Of course, all interrupt controllers must
be attached at BUS_PASS_INTERRUPT.
Or I missed something?
Post by Michal Meloun
Moreover, 'cross type' circular
dependencies are not that rare.
Resource dependencies are must solved (and resolved) at different level
than is INTRNG.
Blind resource allocation is not universal solution because given
resource may/must be accessible immediately after allocation (in many
cases).
Absolutely: for GPIOs, clocks, power, etc. you want a system that
looks different than the interrupt virtualization system. Probably
with extra resource types and maybe with some API similar to r301453.
But interrupts have different, and more complex, requirements and
cannot be mapped this way.
Post by Michal Meloun
staggered driver initialization combined with multipass bus
initialization.
That does not solve the interrupt problem. If devices have interrupts
on their own children, no amount of multipass initialization can
possibly break the loop. Multipass would work for other kinds of
resources (clocks, power, etc.) and is a perfect match for those
resource types. A dynamic multipass (where a driver can return EAGAIN
or something from attach if its resources don't exist yet) would work
great.
*But* in the specific, special case of interrupts, it is not a
workable model. You could imagine some change to initialization that
gives devices a late attach and an early attach method and does
interrupts in the late part, but that is hugely invasive change that
would need to touch every single driver in the tree -- to solve a
problem that is already solved by interrupt virtualization.
-Nathan
Post by Michal Meloun
I'm sorry, I'm not able to express all this accurately and clearly, but
I've really tried...
Michal
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Nathan Whitehorn
2016-07-28 17:23:30 UTC
Permalink
Post by Michal Meloun
[snip]
Post by Michal Meloun
Post by Nathan Whitehorn
The concept is *extremely* necessary, which is why both Linux and
FreeBSD decided to use it independently There is no way to handle
parent buses with a single rman and devices on multiple PICs with
overlapping interrupt ranges without them; neither is there a way to
decode arbitrary-length interrupt specifiers or to handle things like
MSIs. Please see the list of cases at
https://wiki.freebsd.org/Complicated_Interrupts and in my earlier
email for some examples of things that you just can't represent with
this new system, as far as I can tell.
Post by Michal Meloun
Also, both variants needs attached PIC at bus_alloc_resource() time,
so timing wasn't been changed.
They absolutely do not, as I have explained repeatedly. Due to parent
devices with interrupts handled by their bus children, this is a hard
requirement of any workable system. This is not a theoretical issue; I
have lots of hardware like this.
Nathan, I'm talking about pre /post r301453 state. Current INTRNG it
has never supported. But yes, I'm fully understand why you want it. See
[2].
Current "INTRNG", at least on PowerPC, has supported this for 10
years. If it doesn't on ARM, it's because of some trivial
implementation bug.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
- it implements new BUS_MAP_INTR(). As I understand it, this is
problematic for you, and I'm ready to change it. But I need more details
than "it's fundamentally broken".
Please explain (a) what cases it handles that the existing code and
does, and (b) how you would resolve each of the cases on the wiki page
I sent.
The general issue is that it traces the newbus hierarchy, when
interrupts often do not, and so breaks when you have links to
as-yet-unattached parts of the hierarchy. It also relies on the
assigned "IRQ" numbers being nonoverlapping to avoid rman errors, but
that's a relatively minor thing.
Well, from my point of view, only "get interrupt data from
OFW/GPIO/.." part of bus_alloc_resource() follows bus hierarchy, real
execution not changed.
In any case, I see many possible variants how we can modify current
implementation. For rest, see [2].
1. The PIC may not exist when bus_alloc_resource() is called
2. The bus parents have no useful information to contribute to this
since the hierarchy doesn't go that way
See [2]
3. If you try to use the interrupt pin as the resource ID, which
r301453 does, you end up with rman conflicts if, for example, a bus
has two children with interrupts on identically numbered pins on two
different interrupt controllers.
No, r301453 doesn't do this.
Index to global interrupt source table (irq_sources) is used as resource
ID (and r301453 not changed it).
Ah, I see that now. Apologies for the confusion.

So, to understand fully: your major complaint about the existing code is
that it uses arbitrary numbers reflecting some table lookup value as IRQ
numbers, which r301453 *also* does?
Post by Michal Meloun
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
There are some other differences, of varying degrees of importance,
but that's the really fundamental one. I haven't seen any cases where
r301453 provides functionality absent in the already existing system,
but there seem to be a large number of cases (see the first email I
sent to freebsd-arch or
https://wiki.freebsd.org/Complicated_Interrupts) that the API in
r301453 cannot accommodate and that are needed to support a variety of
hardware.
Which single feature has been removed by r301453?
Nothing has been removed by it, because the normal code is intact.
However, the new code is less functional than the old code, so cannot
replace it. In particular, it is architecturally incapable of working
with the kinds of device trees found on PowerPC systems (see the wiki
page). That means we would have to keep both indefinitely, which is a
significant maintenance burden to no gain whatsoever.
-Nathan
Firstly, please, ignore dependency problem. It will be explained later
in [2].
Except that that isn't a workable solution (see the end)
Post by Michal Meloun
[1]
- value of interrupt parent xref together with byte contents of
'interrupts' property forms some sort of 'key' (aka virtual IRQ)
- byte contents of 'interrupts' property cannot be parsed in any way
inside INTRNG core.
- data forming this key are sufficient for subsequent mapping (and/or
configuring) to real IRQ.
- there must exist bidirectional unique relation between virtual and
real IRQs. So one key (virtual IRQ) can be mappable to one and just one
real IRQ. And only one virtual IRQ can be mapped to any given real
IRQ.
- we have cache that stores all observed 'keys' and associated real IRQ
) the they already mapped.
Unfortunately, necessity of unique mapping is very problematic.
We cannot handle situation, where shared interrupt is declared by
different but compatible 'interrupts' properties.
There isn't actually a requirement of bidirectional uniqueness, as I
explained the last time you brought this up. One "virtual" IRQ does
need to map to exactly one interrupt specifier, but the reverse is not
the case. The PIC driver is absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin, with no more
overhead than you usually get from shared interrupt lines.
Oki, lets me to expand this a little bit more.
- ofw_bus_intr_to_rl() reads and pre-parses 'interrupts' property
into 'key'.
Sometimes. For some buses. ofw_bus_intr_to_rl() is a function of very
limited application that should really only be used by simplebus; how
that works depends on the bus bindings. There are many cases (e.g. MSIs,
PCI buses generally) where interrupts are set up by a different route.
Post by Michal Meloun
- cache is searched for duplicates and new entry is created.
- position of this entry in cache is used as resource ID.
Resource ID is arbitrary, but it could be this.
Post by Michal Meloun
- given resource ID is stored to RL and is later used in
bus_alloc_resource()
Yes.
Post by Michal Meloun
I'm right ?
In the above model, individual cache entries may be shared are
referenced by multiple entities (PIC, RL, ..). Due to this, life cycle
of cache entries is not well defined and we probably must use some sort
of reference counting to control it.
Why? Interrupts are not unmapped and remapped with unique values often
enough (or, really, at all) to warrant even thinking about life-cycle
management. The maximum number is limited by the maximum of the number
of interrupt specifiers in the device tree, which are all static values.
Post by Michal Meloun
But, as you wrote, INTRNG code is "absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin" (allow me to
expand this also to INTRNG code itself) .
In attempt to simply situation abound cache, in first step, we removed
duplicates check from above model. This also removes unpleasantly
sharing of entries.
This step converts cache to some sort of duplicate of resource list
entries. Each existing RLE have exactly one corresponding entry in cache.
Also, the cache is not needed for INTRNG core. At this point, its only
transfers data from ofw_bus_intr_to_rl() to consumers PIC.
So, if we are able to attach 'key' to RLE, then we can remove the whole
cache machinery from INTRNG.
Do you agree?
No. Because you have to support two things that this still can't allow:
1. Devices attaching before their interrupt controllers.
2. Interrupts encoded entirely in a 32-bit integer rather than a struct
resource * in a shared rman. This is required to support PCI (both LSI
and MSI).

#2 could of course be fixed by completely retooling the API of the PCI
bus code, but, since it fixes something that currently is not a problem,
why would we do that?
Post by Michal Meloun
Post by Michal Meloun
-------------------------
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
}
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_NONE>;
}
Is this valid (from DT point of view) configuration? I think that yes.
No, it's malformed and a violation of the standard.
Can you give me pointer, please? I'm not able to find anything about
this...
I will try to find the reference -- unfortunately these things are
scattered about and not all of the documents are public. You can see
that this must be invalid immediately, however, because it would make
the interrupt configuration dependent on which devices attach and in
which order. I've attached the main interrupt mapping spec if you would
like to look at it.
Post by Michal Meloun
Post by Michal Meloun
Is this frequent configuration? I'm sure, isn't.
Is this possible configuration? I'm afraid that yes.
People do in fact do deranged non-conformant things with device trees
sometimes, unfortunately, so it is indeed worth worrying about.
Post by Michal Meloun
--------------------------
Next example are GPIO interrupts. These are encoded differently, so two
'keys' may point to to same real IRQ.
After that we decided to change our approach and utilize standard
resources and resource list entries to deliveryconfiguration data from
various sources (OFW/GPIO/...) to consumers.
This is a non-issue. You make the GPIO controller a PIC, it handles
interrupts however you like. If you got a weird device tree that uses
two encodings (one for "interrupts" properties with GPIOs and one for
"GPIO" properties on which you need to take interrupts but that
weren't added to the "interrupts" list for mysterious reasons), you
introduce a helper function for the GPIO case.
Are there any actual problems with the pre-existing interrupt mapping
code? I have not seen any so far.
Post by Michal Meloun
[2]
Each OFW device has a lot of dependencies. It must enable clocks, power
regulators..., it must be taken from reset. All these action must be
done before we can access single register.
Most of these action can be done only with attached clock/power/reset
controller.
Within this, interrupts are not special.
They are actually quite special in that the kernel enables them late
and so you can defer setup. They are also special in that, for that
reason, it is possible to design systems with circular dependency
graphs with interrupts. It is not possible to do this with, for
example, clocks: if I need to apply a clock to the clock controller
with itself, the system is just not bootable and such a system will
not be built or shipped. Interrupts need only happen later, after
device setup, and so such systems *are* designed and shipped. The same
is true for power.
The G5 problem is standard 'cross type' circular dependency. You must
(at BUS_PASS_BUS) initialize memory address window of PCI bridge and
start bus enumeration.
Rest of PCI bridge initialization (including all IRQ stuff) must be
postponed to any later pass. Of course, all interrupt controllers must
be attached at BUS_PASS_INTERRUPT.
Or I missed something?
Yes. As I said, you could solve it that way. It would, of course,
require a massive retooling of newbus (to allow partial delayed attach),
everything in dev/pci (which doesn't expect that), and (almost) every
PCI driver in and out of the tree.

Or: you could skip all of that and use the fully functional mechanism
that already exists.

This is my point in a nutshell, basically. This code reinvents a
perfectly functional wheel in a way that could, in the future, with a
huge amount of work, accomplish the same things. For now, it is less
functional and offers no visible advantages or new capabilities of any
kind -- nor are there any such capabilities it could obviously provide
in the future. Between now and that future, which will likely never
arrive, we will have to maintain both in parallel and cause the device
tree support on ARM and PowerPC to diverge.
-Nathan
Post by Michal Meloun
Post by Michal Meloun
Moreover, 'cross type' circular
dependencies are not that rare.
Resource dependencies are must solved (and resolved) at different level
than is INTRNG.
Blind resource allocation is not universal solution because given
resource may/must be accessible immediately after allocation (in many
cases).
Absolutely: for GPIOs, clocks, power, etc. you want a system that
looks different than the interrupt virtualization system. Probably
with extra resource types and maybe with some API similar to r301453.
But interrupts have different, and more complex, requirements and
cannot be mapped this way.
Post by Michal Meloun
staggered driver initialization combined with multipass bus
initialization.
That does not solve the interrupt problem. If devices have interrupts
on their own children, no amount of multipass initialization can
possibly break the loop. Multipass would work for other kinds of
resources (clocks, power, etc.) and is a perfect match for those
resource types. A dynamic multipass (where a driver can return EAGAIN
or something from attach if its resources don't exist yet) would work
great.
*But* in the specific, special case of interrupts, it is not a
workable model. You could imagine some change to initialization that
gives devices a late attach and an early attach method and does
interrupts in the late part, but that is hugely invasive change that
would need to touch every single driver in the tree -- to solve a
problem that is already solved by interrupt virtualization.
-Nathan
Post by Michal Meloun
I'm sorry, I'm not able to express all this accurately and clearly, but
I've really tried...
Michal
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Michal Meloun
2016-07-29 07:03:33 UTC
Permalink
Post by Nathan Whitehorn
Post by Michal Meloun
[snip]
Post by Michal Meloun
Post by Nathan Whitehorn
The concept is *extremely* necessary, which is why both Linux and
FreeBSD decided to use it independently There is no way to handle
parent buses with a single rman and devices on multiple PICs with
overlapping interrupt ranges without them; neither is there a way to
decode arbitrary-length interrupt specifiers or to handle things like
MSIs. Please see the list of cases at
https://wiki.freebsd.org/Complicated_Interrupts and in my earlier
email for some examples of things that you just can't represent with
this new system, as far as I can tell.
Post by Michal Meloun
Also, both variants needs attached PIC at bus_alloc_resource() time,
so timing wasn't been changed.
They absolutely do not, as I have explained repeatedly. Due to parent
devices with interrupts handled by their bus children, this is a hard
requirement of any workable system. This is not a theoretical issue; I
have lots of hardware like this.
Nathan, I'm talking about pre /post r301453 state. Current INTRNG it
has never supported. But yes, I'm fully understand why you want it. See
[2].
Current "INTRNG", at least on PowerPC, has supported this for 10
years. If it doesn't on ARM, it's because of some trivial
implementation bug.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
- it implements new BUS_MAP_INTR(). As I understand it, this is
problematic for you, and I'm ready to change it. But I need more details
than "it's fundamentally broken".
Please explain (a) what cases it handles that the existing code and
does, and (b) how you would resolve each of the cases on the wiki page
I sent.
The general issue is that it traces the newbus hierarchy, when
interrupts often do not, and so breaks when you have links to
as-yet-unattached parts of the hierarchy. It also relies on the
assigned "IRQ" numbers being nonoverlapping to avoid rman errors, but
that's a relatively minor thing.
Well, from my point of view, only "get interrupt data from
OFW/GPIO/.." part of bus_alloc_resource() follows bus hierarchy,
real
execution not changed.
In any case, I see many possible variants how we can modify current
implementation. For rest, see [2].
1. The PIC may not exist when bus_alloc_resource() is called
2. The bus parents have no useful information to contribute to this
since the hierarchy doesn't go that way
See [2]
3. If you try to use the interrupt pin as the resource ID, which
r301453 does, you end up with rman conflicts if, for example, a bus
has two children with interrupts on identically numbered pins on two
different interrupt controllers.
No, r301453 doesn't do this.
Index to global interrupt source table (irq_sources) is used as resource
ID (and r301453 not changed it).
Ah, I see that now. Apologies for the confusion.
So, to understand fully: your major complaint about the existing code
is that it uses arbitrary numbers reflecting some table lookup value
as IRQ numbers, which r301453 *also* does?
No, my major complaint is that MIPS and pre-r301453 INTRNG mixes data
from independent sources in one table (irq_sources) . Which r301453
eliminates.
All what I want is to have data associated to RLE separated from data
associated to already allocated (by bus_alloc_resouce()) interrupts.
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
There are some other differences, of varying degrees of importance,
but that's the really fundamental one. I haven't seen any cases where
r301453 provides functionality absent in the already existing system,
but there seem to be a large number of cases (see the first email I
sent to freebsd-arch or
https://wiki.freebsd.org/Complicated_Interrupts) that the API in
r301453 cannot accommodate and that are needed to support a variety of
hardware.
Which single feature has been removed by r301453?
Nothing has been removed by it, because the normal code is intact.
However, the new code is less functional than the old code, so cannot
replace it. In particular, it is architecturally incapable of working
with the kinds of device trees found on PowerPC systems (see the wiki
page). That means we would have to keep both indefinitely, which is a
significant maintenance burden to no gain whatsoever.
-Nathan
Firstly, please, ignore dependency problem. It will be explained later
in [2].
Except that that isn't a workable solution (see the end)
Post by Michal Meloun
[1]
- value of interrupt parent xref together with byte contents of
'interrupts' property forms some sort of 'key' (aka virtual IRQ)
- byte contents of 'interrupts' property cannot be parsed in any way
inside INTRNG core.
- data forming this key are sufficient for subsequent mapping (and/or
configuring) to real IRQ.
- there must exist bidirectional unique relation between virtual and
real IRQs. So one key (virtual IRQ) can be mappable to one and just one
real IRQ. And only one virtual IRQ can be mapped to any given real
IRQ.
- we have cache that stores all observed 'keys' and associated real IRQ
) the they already mapped.
Unfortunately, necessity of unique mapping is very problematic.
We cannot handle situation, where shared interrupt is declared by
different but compatible 'interrupts' properties.
There isn't actually a requirement of bidirectional uniqueness, as I
explained the last time you brought this up. One "virtual" IRQ does
need to map to exactly one interrupt specifier, but the reverse is not
the case. The PIC driver is absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin, with no more
overhead than you usually get from shared interrupt lines.
Oki, lets me to expand this a little bit more.
- ofw_bus_intr_to_rl() reads and pre-parses 'interrupts' property
into 'key'.
Sometimes. For some buses. ofw_bus_intr_to_rl() is a function of very
limited application that should really only be used by simplebus; how
that works depends on the bus bindings. There are many cases (e.g.
MSIs, PCI buses generally) where interrupts are set up by a different
route.
Yes, I agree.
Post by Nathan Whitehorn
Post by Michal Meloun
- cache is searched for duplicates and new entry is created.
- position of this entry in cache is used as resource ID.
Resource ID is arbitrary, but it could be this.
Post by Michal Meloun
- given resource ID is stored to RL and is later used in
bus_alloc_resource()
Yes.
Post by Michal Meloun
I'm right ?
In the above model, individual cache entries may be shared are
referenced by multiple entities (PIC, RL, ..). Due to this, life cycle
of cache entries is not well defined and we probably must use some sort
of reference counting to control it.
Why? Interrupts are not unmapped and remapped with unique values often
enough (or, really, at all) to warrant even thinking about life-cycle
management. The maximum number is limited by the maximum of the number
of interrupt specifiers in the device tree, which are all static values.
No, we must also support attachable/removable devices. GPIO board
attached to hot-pug capable interfaces(USB, ExpressCard,...) for example.
Due to this we must be able to attach/detach PIC at run-time.
Post by Nathan Whitehorn
Post by Michal Meloun
But, as you wrote, INTRNG code is "absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin" (allow me to
expand this also to INTRNG code itself) .
In attempt to simply situation abound cache, in first step, we removed
duplicates check from above model. This also removes unpleasantly
sharing of entries.
This step converts cache to some sort of duplicate of resource list
entries. Each existing RLE have exactly one corresponding entry in cache.
Also, the cache is not needed for INTRNG core. At this point, its only
transfers data from ofw_bus_intr_to_rl() to consumers PIC.
So, if we are able to attach 'key' to RLE, then we can remove the whole
cache machinery from INTRNG.
Do you agree?
1. Devices attaching before their interrupt controllers.
You still expect that data readed from OFW must be parsed at RLE
creation time. I don't see single reason why we cannot postpone
parsing to bus_alloc_resource() time.
At this point, its irrelevant if bus_alloc_resource() supports
detached PIC or not.
Post by Nathan Whitehorn
2. Interrupts encoded entirely in a 32-bit integer rather than a
struct resource * in a shared rman. This is required to support PCI
(both LSI and MSI).
#2 could of course be fixed by completely retooling the API of the PCI
bus code, but, since it fixes something that currently is not a
problem, why would we do that?
All relevant bus functions have struct resource * as an argument. And,
in post r301453 kernel, all necessary data are attached to it.
I don't see any reason for API change.
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Michal Meloun
-------------------------
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
}
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_NONE>;
}
Is this valid (from DT point of view) configuration? I think that yes.
No, it's malformed and a violation of the standard.
Can you give me pointer, please? I'm not able to find anything about
this...
I will try to find the reference -- unfortunately these things are
scattered about and not all of the documents are public.
Don't worry, I trust you :)
Post by Nathan Whitehorn
You can see that this must be invalid immediately, however, because it
would make the interrupt configuration dependent on which devices
attach and in which order. I've attached the main interrupt mapping
spec if you would like to look at it.
Well, if IRQ_TYPE_LEVEL_HIGH is default interrupt configuration then
not. And with current hell with various shield boards and its overlays...
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Michal Meloun
Is this frequent configuration? I'm sure, isn't.
Is this possible configuration? I'm afraid that yes.
People do in fact do deranged non-conformant things with device trees
sometimes, unfortunately, so it is indeed worth worrying about.
Post by Michal Meloun
--------------------------
Next example are GPIO interrupts. These are encoded differently,
so two
'keys' may point to to same real IRQ.
After that we decided to change our approach and utilize standard
resources and resource list entries to deliveryconfiguration data from
various sources (OFW/GPIO/...) to consumers.
This is a non-issue. You make the GPIO controller a PIC, it handles
interrupts however you like. If you got a weird device tree that uses
two encodings (one for "interrupts" properties with GPIOs and one for
"GPIO" properties on which you need to take interrupts but that
weren't added to the "interrupts" list for mysterious reasons), you
introduce a helper function for the GPIO case.
Are there any actual problems with the pre-existing interrupt mapping
code? I have not seen any so far.
Post by Michal Meloun
[2]
Each OFW device has a lot of dependencies. It must enable clocks, power
regulators..., it must be taken from reset. All these action must be
done before we can access single register.
Most of these action can be done only with attached clock/power/reset
controller.
Within this, interrupts are not special.
They are actually quite special in that the kernel enables them late
and so you can defer setup. They are also special in that, for that
reason, it is possible to design systems with circular dependency
graphs with interrupts. It is not possible to do this with, for
example, clocks: if I need to apply a clock to the clock controller
with itself, the system is just not bootable and such a system will
not be built or shipped. Interrupts need only happen later, after
device setup, and so such systems *are* designed and shipped. The same
is true for power.
The G5 problem is standard 'cross type' circular dependency. You must
(at BUS_PASS_BUS) initialize memory address window of PCI bridge and
start bus enumeration.
Rest of PCI bridge initialization (including all IRQ stuff) must be
postponed to any later pass. Of course, all interrupt controllers must
be attached at BUS_PASS_INTERRUPT.
Or I missed something?
Yes. As I said, you could solve it that way. It would, of course,
require a massive retooling of newbus (to allow partial delayed
attach), everything in dev/pci (which doesn't expect that), and
(almost) every PCI driver in and out of the tree.
I still think that only host to PCI bridge must me modified, and PCI-PCI
bridge must be run at BUS_PASS_BUS (which is single line modification).
Post by Nathan Whitehorn
Or: you could skip all of that and use the fully functional mechanism
that already exists.
The problem is, at least from my point of view, that we don't have it.
The actual MIPS code doesn't work on ARM for numerous reasons. This is
first one ->

https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186
The pre-r301453 ARM INTRNG doesn't support detached interrupts.
And r301453 doesn't changed nothing about this.

Michal
Post by Nathan Whitehorn
This is my point in a nutshell, basically. This code reinvents a
perfectly functional wheel in a way that could, in the future, with a
huge amount of work, accomplish the same things. For now, it is less
functional and offers no visible advantages or new capabilities of any
kind -- nor are there any such capabilities it could obviously provide
in the future. Between now and that future, which will likely never
arrive, we will have to maintain both in parallel and cause the device
tree support on ARM and PowerPC to diverge.
-Nathan
Post by Michal Meloun
Post by Michal Meloun
Moreover, 'cross type' circular
dependencies are not that rare.
Resource dependencies are must solved (and resolved) at different level
than is INTRNG.
Blind resource allocation is not universal solution because given
resource may/must be accessible immediately after allocation (in many
cases).
Absolutely: for GPIOs, clocks, power, etc. you want a system that
looks different than the interrupt virtualization system. Probably
with extra resource types and maybe with some API similar to r301453.
But interrupts have different, and more complex, requirements and
cannot be mapped this way.
Post by Michal Meloun
staggered driver initialization combined with multipass bus
initialization.
That does not solve the interrupt problem. If devices have interrupts
on their own children, no amount of multipass initialization can
possibly break the loop. Multipass would work for other kinds of
resources (clocks, power, etc.) and is a perfect match for those
resource types. A dynamic multipass (where a driver can return EAGAIN
or something from attach if its resources don't exist yet) would work
great.
*But* in the specific, special case of interrupts, it is not a
workable model. You could imagine some change to initialization that
gives devices a late attach and an early attach method and does
interrupts in the late part, but that is hugely invasive change that
would need to touch every single driver in the tree -- to solve a
problem that is already solved by interrupt virtualization.
-Nathan
Post by Michal Meloun
I'm sorry, I'm not able to express all this accurately and clearly, but
I've really tried...
Michal
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Nathan Whitehorn
2016-07-29 14:41:22 UTC
Permalink
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
[snip]
Post by Michal Meloun
Post by Nathan Whitehorn
The concept is *extremely* necessary, which is why both Linux and
FreeBSD decided to use it independently There is no way to handle
parent buses with a single rman and devices on multiple PICs with
overlapping interrupt ranges without them; neither is there a way to
decode arbitrary-length interrupt specifiers or to handle things like
MSIs. Please see the list of cases at
https://wiki.freebsd.org/Complicated_Interrupts and in my earlier
email for some examples of things that you just can't represent with
this new system, as far as I can tell.
Post by Michal Meloun
Also, both variants needs attached PIC at bus_alloc_resource() time,
so timing wasn't been changed.
They absolutely do not, as I have explained repeatedly. Due to parent
devices with interrupts handled by their bus children, this is a hard
requirement of any workable system. This is not a theoretical issue; I
have lots of hardware like this.
Nathan, I'm talking about pre /post r301453 state. Current INTRNG it
has never supported. But yes, I'm fully understand why you want it. See
[2].
Current "INTRNG", at least on PowerPC, has supported this for 10
years. If it doesn't on ARM, it's because of some trivial
implementation bug.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
- it implements new BUS_MAP_INTR(). As I understand it, this is
problematic for you, and I'm ready to change it. But I need more details
than "it's fundamentally broken".
Please explain (a) what cases it handles that the existing code and
does, and (b) how you would resolve each of the cases on the wiki page
I sent.
The general issue is that it traces the newbus hierarchy, when
interrupts often do not, and so breaks when you have links to
as-yet-unattached parts of the hierarchy. It also relies on the
assigned "IRQ" numbers being nonoverlapping to avoid rman errors, but
that's a relatively minor thing.
Well, from my point of view, only "get interrupt data from
OFW/GPIO/.." part of bus_alloc_resource() follows bus hierarchy,
real
execution not changed.
In any case, I see many possible variants how we can modify current
implementation. For rest, see [2].
1. The PIC may not exist when bus_alloc_resource() is called
2. The bus parents have no useful information to contribute to this
since the hierarchy doesn't go that way
See [2]
3. If you try to use the interrupt pin as the resource ID, which
r301453 does, you end up with rman conflicts if, for example, a bus
has two children with interrupts on identically numbered pins on two
different interrupt controllers.
No, r301453 doesn't do this.
Index to global interrupt source table (irq_sources) is used as resource
ID (and r301453 not changed it).
Ah, I see that now. Apologies for the confusion.
So, to understand fully: your major complaint about the existing code
is that it uses arbitrary numbers reflecting some table lookup value
as IRQ numbers, which r301453 *also* does?
No, my major complaint is that MIPS and pre-r301453 INTRNG mixes data
from independent sources in one table (irq_sources) . Which r301453
eliminates.
All what I want is to have data associated to RLE separated from data
associated to already allocated (by bus_alloc_resouce()) interrupts.
So the complaint is that pre-r301453 stores the raw interrupt specifier
in the table and post-r301453 stores a decoded version of the interrupt
specifier in the table?
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
There are some other differences, of varying degrees of importance,
but that's the really fundamental one. I haven't seen any cases where
r301453 provides functionality absent in the already existing system,
but there seem to be a large number of cases (see the first email I
sent to freebsd-arch or
https://wiki.freebsd.org/Complicated_Interrupts) that the API in
r301453 cannot accommodate and that are needed to support a variety of
hardware.
Which single feature has been removed by r301453?
Nothing has been removed by it, because the normal code is intact.
However, the new code is less functional than the old code, so cannot
replace it. In particular, it is architecturally incapable of working
with the kinds of device trees found on PowerPC systems (see the wiki
page). That means we would have to keep both indefinitely, which is a
significant maintenance burden to no gain whatsoever.
-Nathan
Firstly, please, ignore dependency problem. It will be explained later
in [2].
Except that that isn't a workable solution (see the end)
Post by Michal Meloun
[1]
- value of interrupt parent xref together with byte contents of
'interrupts' property forms some sort of 'key' (aka virtual IRQ)
- byte contents of 'interrupts' property cannot be parsed in any way
inside INTRNG core.
- data forming this key are sufficient for subsequent mapping (and/or
configuring) to real IRQ.
- there must exist bidirectional unique relation between virtual and
real IRQs. So one key (virtual IRQ) can be mappable to one and just one
real IRQ. And only one virtual IRQ can be mapped to any given real
IRQ.
- we have cache that stores all observed 'keys' and associated real IRQ
) the they already mapped.
Unfortunately, necessity of unique mapping is very problematic.
We cannot handle situation, where shared interrupt is declared by
different but compatible 'interrupts' properties.
There isn't actually a requirement of bidirectional uniqueness, as I
explained the last time you brought this up. One "virtual" IRQ does
need to map to exactly one interrupt specifier, but the reverse is not
the case. The PIC driver is absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin, with no more
overhead than you usually get from shared interrupt lines.
Oki, lets me to expand this a little bit more.
- ofw_bus_intr_to_rl() reads and pre-parses 'interrupts' property
into 'key'.
Sometimes. For some buses. ofw_bus_intr_to_rl() is a function of very
limited application that should really only be used by simplebus; how
that works depends on the bus bindings. There are many cases (e.g.
MSIs, PCI buses generally) where interrupts are set up by a different
route.
Yes, I agree.
Post by Nathan Whitehorn
Post by Michal Meloun
- cache is searched for duplicates and new entry is created.
- position of this entry in cache is used as resource ID.
Resource ID is arbitrary, but it could be this.
Post by Michal Meloun
- given resource ID is stored to RL and is later used in
bus_alloc_resource()
Yes.
Post by Michal Meloun
I'm right ?
In the above model, individual cache entries may be shared are
referenced by multiple entities (PIC, RL, ..). Due to this, life cycle
of cache entries is not well defined and we probably must use some sort
of reference counting to control it.
Why? Interrupts are not unmapped and remapped with unique values often
enough (or, really, at all) to warrant even thinking about life-cycle
management. The maximum number is limited by the maximum of the number
of interrupt specifiers in the device tree, which are all static values.
No, we must also support attachable/removable devices. GPIO board
attached to hot-pug capable interfaces(USB, ExpressCard,...) for example.
Due to this we must be able to attach/detach PIC at run-time.
OK... There just really aren't that many cases of this, but fine. If you
find yourself needing hot-pluggable PICs, neither API is any more or
less complicated than the other. You can't use FDT entries for this,
since new devices aren't in the FDT, and you presumably end up in a much
more packaged situation where only the things on the hotplug card depend
on the PIC on the hotplug device, so that's nice. In either case, you
just have to clean up mappings to the associated PIC when the PIC
detaches (and would, in both cases, have some assert that all IRQs
handled by that PIC have been deallocated first before cleaning up). But
it's just not that bad.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
But, as you wrote, INTRNG code is "absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin" (allow me to
expand this also to INTRNG code itself) .
In attempt to simply situation abound cache, in first step, we removed
duplicates check from above model. This also removes unpleasantly
sharing of entries.
This step converts cache to some sort of duplicate of resource list
entries. Each existing RLE have exactly one corresponding entry in cache.
Also, the cache is not needed for INTRNG core. At this point, its only
transfers data from ofw_bus_intr_to_rl() to consumers PIC.
So, if we are able to attach 'key' to RLE, then we can remove the whole
cache machinery from INTRNG.
Do you agree?
1. Devices attaching before their interrupt controllers.
You still expect that data readed from OFW must be parsed at RLE
creation time. I don't see single reason why we cannot postpone
parsing to bus_alloc_resource() time.
At this point, its irrelevant if bus_alloc_resource() supports
detached PIC or not.
Because there are *lots* of cases in the kernel where interrupts are
represented as a number and not a struct resource *. I've listed many of
them; I'm sure there are more. PCI is the most notable example (both LSI
and MSI). For both of these, the route_interrupt (LSI) and
alloc_msi[x]() (MSI) functions give the bus driver a chance to look at
the device tree and then they return an IRQ number. There doesn't seem
to be any way to map that back to a struct resource * or something
besides an internal table.

Moreover, there are cases (self-added interrupts by children, for
instance) in which the RL assigned by the parent does not reflect the
final RL used by the child and there is no way for the parent bus to map
an RID to a device tree entry.

When we moved to doing this at interrupt assignment time rather than
resource allocation time or another occasion on PowerPC, it was for
these kinds of reasons. There were ways to do it later, but they were
hugely invasive and involved changing large parts of the kernel. Even
then, it would still be fragile: Trying to guess what device tree entry
was meant at bus_alloc_resource() time is much more error-prone than
doing the mapping when reading that device tree to assign the resources
in the first place, when you can make guarantees.
Post by Michal Meloun
Post by Nathan Whitehorn
2. Interrupts encoded entirely in a 32-bit integer rather than a
struct resource * in a shared rman. This is required to support PCI
(both LSI and MSI).
#2 could of course be fixed by completely retooling the API of the PCI
bus code, but, since it fixes something that currently is not a
problem, why would we do that?
All relevant bus functions have struct resource * as an argument. And,
in post r301453 kernel, all necessary data are attached to it.
I don't see any reason for API change.
There is:
int PCIB_ROUTE_INTERRUPT(device_t, device_t, pin)

Which takes a PIN (A, B, C, D) and returns the IRQ to add to the RL.
This is the only real opportunity to parse an interrupt-map and does not
deal with struct resource.

There are also PCIB_ALLOC_MSI[X], which take a device_t and count and
return a vector of IRQ numbers.

How would you do resource allocation given these APIs? There seem to be
some dubious-looking attempts at generic MSI mapping code in the new
kern/subr_intr.c that don't allow the specification of more than 32-bit
interrupt specifiers and otherwise exactly duplicate the pre-301453 flow
for MSIs, but in a more complicated and less-functional way.

(While investigating this, I also just found dev/pci/pci_host_generic.c,
which mostly does the wrong things, is actually specific to a couple of
ARM boards, and duplicates code in dev/ofw/ofwpci.c. But that's another
discussion...)
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Michal Meloun
-------------------------
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
}
interrupt-parent = <&pic1>;
interrupts = <1 IRQ_TYPE_NONE>;
}
Is this valid (from DT point of view) configuration? I think that yes.
No, it's malformed and a violation of the standard.
Can you give me pointer, please? I'm not able to find anything about
this...
I will try to find the reference -- unfortunately these things are
scattered about and not all of the documents are public.
Don't worry, I trust you :)
Post by Nathan Whitehorn
You can see that this must be invalid immediately, however, because it
would make the interrupt configuration dependent on which devices
attach and in which order. I've attached the main interrupt mapping
spec if you would like to look at it.
Well, if IRQ_TYPE_LEVEL_HIGH is default interrupt configuration then
not. And with current hell with various shield boards and its overlays...
As I said, people do stupid things. This is such a minor problem that it
is best worried about when seen in the wild.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Michal Meloun
Is this frequent configuration? I'm sure, isn't.
Is this possible configuration? I'm afraid that yes.
People do in fact do deranged non-conformant things with device trees
sometimes, unfortunately, so it is indeed worth worrying about.
Post by Michal Meloun
--------------------------
Next example are GPIO interrupts. These are encoded differently,
so two
'keys' may point to to same real IRQ.
After that we decided to change our approach and utilize standard
resources and resource list entries to deliveryconfiguration data from
various sources (OFW/GPIO/...) to consumers.
This is a non-issue. You make the GPIO controller a PIC, it handles
interrupts however you like. If you got a weird device tree that uses
two encodings (one for "interrupts" properties with GPIOs and one for
"GPIO" properties on which you need to take interrupts but that
weren't added to the "interrupts" list for mysterious reasons), you
introduce a helper function for the GPIO case.
Are there any actual problems with the pre-existing interrupt mapping
code? I have not seen any so far.
Post by Michal Meloun
[2]
Each OFW device has a lot of dependencies. It must enable clocks, power
regulators..., it must be taken from reset. All these action must be
done before we can access single register.
Most of these action can be done only with attached clock/power/reset
controller.
Within this, interrupts are not special.
They are actually quite special in that the kernel enables them late
and so you can defer setup. They are also special in that, for that
reason, it is possible to design systems with circular dependency
graphs with interrupts. It is not possible to do this with, for
example, clocks: if I need to apply a clock to the clock controller
with itself, the system is just not bootable and such a system will
not be built or shipped. Interrupts need only happen later, after
device setup, and so such systems *are* designed and shipped. The same
is true for power.
The G5 problem is standard 'cross type' circular dependency. You must
(at BUS_PASS_BUS) initialize memory address window of PCI bridge and
start bus enumeration.
Rest of PCI bridge initialization (including all IRQ stuff) must be
postponed to any later pass. Of course, all interrupt controllers must
be attached at BUS_PASS_INTERRUPT.
Or I missed something?
Yes. As I said, you could solve it that way. It would, of course,
require a massive retooling of newbus (to allow partial delayed
attach), everything in dev/pci (which doesn't expect that), and
(almost) every PCI driver in and out of the tree.
I still think that only host to PCI bridge must me modified, and PCI-PCI
bridge must be run at BUS_PASS_BUS (which is single line modification).
On Powermacs, one PIC is 5 levels deep in the bus hierarchy, under 3
PCI<->PCI bridges and a device that also has an ATA controller and a
bunch of other things. Some of those can have their own interrupts
(which is fairly common for error reporting or PCI-E hotplug). It
wouldn't be impossible, but it is decidedly nontrivial.
Post by Michal Meloun
Post by Nathan Whitehorn
Or: you could skip all of that and use the fully functional mechanism
that already exists.
The problem is, at least from my point of view, that we don't have it.
The actual MIPS code doesn't work on ARM for numerous reasons. This is
first one ->
https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186
The pre-r301453 ARM INTRNG doesn't support detached interrupts.
And r301453 doesn't changed nothing about this.
I'm happy to write whatever code is missing. The ARM implementation of
ofw_bus_map_intr() does seem fairly braindead and should be replaced.

So here is where I am right now:
- The perceived advantage of the new API seems to be that the mapping
information (interrupt parent, etc.) ends up in a struct resource
instead of in a centralized mapping table
- Additionally, it gives you a better shot at having the PIC online
before the PIC's interrupts are parsed (which is not required, but nice).
- Beyond these aesthetic points, there are no concrete examples of new
functionality added by this API, aside from some minor implementation
bugs of the old one on ARM that are easily fixed.
- There are, conversely, a number of concrete cases where this new API
would not be able to do the right thing. Some of these can be worked
around or fixed with significant restructuring in the MI parts of the
kernel.
- If we have both, the interrupt handling code in the MI parts of the
kernel will bifurcate. This patch alone has added a bunch of #ifdef to
kern/subr_bus.c, a new file to kern/, and lots of #ifdef to dev/ofw.
This is going to be really hard to maintain if we need both.

Is this all correct?

If so, can we please back this out until this discussion is complete?
I'm asking this formally at this point, under the Committer's Guide
section about reversion requests.
-Nathan
Post by Michal Meloun
Michal
Post by Nathan Whitehorn
This is my point in a nutshell, basically. This code reinvents a
perfectly functional wheel in a way that could, in the future, with a
huge amount of work, accomplish the same things. For now, it is less
functional and offers no visible advantages or new capabilities of any
kind -- nor are there any such capabilities it could obviously provide
in the future. Between now and that future, which will likely never
arrive, we will have to maintain both in parallel and cause the device
tree support on ARM and PowerPC to diverge.
-Nathan
Post by Michal Meloun
Post by Michal Meloun
Moreover, 'cross type' circular
dependencies are not that rare.
Resource dependencies are must solved (and resolved) at different level
than is INTRNG.
Blind resource allocation is not universal solution because given
resource may/must be accessible immediately after allocation (in many
cases).
Absolutely: for GPIOs, clocks, power, etc. you want a system that
looks different than the interrupt virtualization system. Probably
with extra resource types and maybe with some API similar to r301453.
But interrupts have different, and more complex, requirements and
cannot be mapped this way.
Post by Michal Meloun
staggered driver initialization combined with multipass bus initialization.
That does not solve the interrupt problem. If devices have interrupts
on their own children, no amount of multipass initialization can
possibly break the loop. Multipass would work for other kinds of
resources (clocks, power, etc.) and is a perfect match for those
resource types. A dynamic multipass (where a driver can return EAGAIN
or something from attach if its resources don't exist yet) would work
great.
*But* in the specific, special case of interrupts, it is not a
workable model. You could imagine some change to initialization that
gives devices a late attach and an early attach method and does
interrupts in the late part, but that is hugely invasive change that
would need to touch every single driver in the tree -- to solve a
problem that is already solved by interrupt virtualization.
-Nathan
Post by Michal Meloun
I'm sorry, I'm not able to express all this accurately and clearly, but
I've really tried...
Michal
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Svatopluk Kraus
2016-07-29 22:16:28 UTC
Permalink
Well, I'm online again, unfortanately, I'm quite busy, so just quick
response for now to the formal ask for the reversion.

On Fri, Jul 29, 2016 at 4:41 PM, Nathan Whitehorn
<***@freebsd.org> wrote:
[snip]
Post by Nathan Whitehorn
- The perceived advantage of the new API seems to be that the mapping
information (interrupt parent, etc.) ends up in a struct resource instead of
in a centralized mapping table
It's more than that. The change has made INTRNG framework not
dependent on OFW(FDT) stuff. So after next r301543, the framework is
clean of all additions related to various buses. It was something I
wanted to do before FreeBSD-11 release, once when adrian@ moved INTRNG
to sys/kern. The framework is used by arm, arm64 and mips now, so from
my point of view, it was quite important. The way how it was done is
not perfect and may be changed in the future. However, it does not
break anything, does not change old functionality, and only few bus
drivers were modified slightly. It was also only way which I and
Michal have agreed on considering current kernel code.
Post by Nathan Whitehorn
- Additionally, it gives you a better shot at having the PIC online before
the PIC's interrupts are parsed (which is not required, but nice).
For me, it's just correct design. Please, not all world is about OFW.
Post by Nathan Whitehorn
- Beyond these aesthetic points, there are no concrete examples of new
functionality added by this API, aside from some minor implementation bugs
of the old one on ARM that are easily fixed.
Please, don't use subjective attributes like "aesthetic".
Post by Nathan Whitehorn
- There are, conversely, a number of concrete cases where this new API would
not be able to do the right thing. Some of these can be worked around or
fixed with significant restructuring in the MI parts of the kernel.
I have not looked yet at these concrete cases, but which MI parts of
kernel do you think of? It may be supposed that some bus drivers would
be modified, but not MI parts of kernel!
Post by Nathan Whitehorn
- If we have both, the interrupt handling code in the MI parts of the kernel
will bifurcate. This patch alone has added a bunch of #ifdef to
kern/subr_bus.c, a new file to kern/, and lots of #ifdef to dev/ofw. This is
going to be really hard to maintain if we need both.
IMO, this point is bogus totally. The #ifdefs in kern/subr_bus.c were
added just to be polite to not-INTRNG kernels. They can be removed. No
one new file was added. Lots of #ifdef in dev/ofw were added because
ofw_bus_map_intr() return value is not checked in
ofw_bus_intr_to_rl(), so resource list entry is always added. They are
there also to clearly manifest INTRNG needs.
Post by Nathan Whitehorn
Is this all correct?
I don't think so. Further, considering the reversion, I don't think
that it can be done simply now. I suppose that more files were changed
afterwards when no bus header is polluted by the framework now.
However, as I have already wrote, this part of INTRNG may be changed
to serve well for everyone. On the other hand, IMO, a centralized
global interrupt mapping table is not good design, and if established
after all, it should not be a part of the framework. That said, I
suggest to continue with work on INTRNG.

Svata
Post by Nathan Whitehorn
If so, can we please back this out until this discussion is complete? I'm
asking this formally at this point, under the Committer's Guide section
about reversion requests.
-Nathan
Nathan Whitehorn
2016-07-30 01:35:30 UTC
Permalink
Post by Svatopluk Kraus
Well, I'm online again, unfortanately, I'm quite busy, so just quick
response for now to the formal ask for the reversion.
On Fri, Jul 29, 2016 at 4:41 PM, Nathan Whitehorn
[snip]
Post by Nathan Whitehorn
- The perceived advantage of the new API seems to be that the mapping
information (interrupt parent, etc.) ends up in a struct resource instead of
in a centralized mapping table
It's more than that. The change has made INTRNG framework not
dependent on OFW(FDT) stuff. So after next r301543, the framework is
clean of all additions related to various buses. It was something I
to sys/kern. The framework is used by arm, arm64 and mips now, so from
my point of view, it was quite important. The way how it was done is
not perfect and may be changed in the future. However, it does not
break anything, does not change old functionality, and only few bus
drivers were modified slightly. It was also only way which I and
Michal have agreed on considering current kernel code.
Except that the API cannot support all the existing things, so now we
have two APIs to support for the lifetime of the 11.x branch. This
needed extensive review from platform maintainers and from arch@, which
it did not get.
Post by Svatopluk Kraus
Post by Nathan Whitehorn
- Additionally, it gives you a better shot at having the PIC online before
the PIC's interrupts are parsed (which is not required, but nice).
For me, it's just correct design. Please, not all world is about OFW.
Of course not! But it does affect OFW platforms, which worked fine
before. I also disagree strongly about the "correct design" in this
case. This API is fragile since it requires coordination between
resource list enumeration and allocation and it provides fewer features
than the old one (which was not OFW-only either).
Post by Svatopluk Kraus
Post by Nathan Whitehorn
- Beyond these aesthetic points, there are no concrete examples of new
functionality added by this API, aside from some minor implementation bugs
of the old one on ARM that are easily fixed.
Please, don't use subjective attributes like "aesthetic".
If the issues are not aesthetic, then what concrete, technical problems
does this solve?
Post by Svatopluk Kraus
Post by Nathan Whitehorn
- There are, conversely, a number of concrete cases where this new API would
not be able to do the right thing. Some of these can be worked around or
fixed with significant restructuring in the MI parts of the kernel.
I have not looked yet at these concrete cases, but which MI parts of
kernel do you think of? It may be supposed that some bus drivers would
be modified, but not MI parts of kernel!
If you don't want a global IRQ lookup table, the PCI interrupt routing
APIs need to change, for one thing. You would also need a partial
delayed attach mechanism to handle bus bridges that themselves have
interrupts (hot-plug bridges, for example) and might have interrupt
controllers as children athat does not currently exist in newbus.
Post by Svatopluk Kraus
Post by Nathan Whitehorn
- If we have both, the interrupt handling code in the MI parts of the kernel
will bifurcate. This patch alone has added a bunch of #ifdef to
kern/subr_bus.c, a new file to kern/, and lots of #ifdef to dev/ofw. This is
going to be really hard to maintain if we need both.
IMO, this point is bogus totally. The #ifdefs in kern/subr_bus.c were
added just to be polite to not-INTRNG kernels. They can be removed. No
one new file was added. Lots of #ifdef in dev/ofw were added because
ofw_bus_map_intr() return value is not checked in
ofw_bus_intr_to_rl(), so resource list entry is always added. They are
there also to clearly manifest INTRNG needs.
#ifdef or not, there are now two unrelated mechanisms and code paths for
dealing with device tree interrupts. One of them cannot possibly be used
on PowerPC (this one) and so there will have to be two parallel code
paths in perpetuity.
Post by Svatopluk Kraus
Post by Nathan Whitehorn
Is this all correct?
I don't think so. Further, considering the reversion, I don't think
that it can be done simply now. I suppose that more files were changed
afterwards when no bus header is polluted by the framework now.
However, as I have already wrote, this part of INTRNG may be changed
to serve well for everyone. On the other hand, IMO, a centralized
global interrupt mapping table is not good design, and if established
after all, it should not be a part of the framework. That said, I
suggest to continue with work on INTRNG.
It cannot be done simply because the code was checked in without review
or warning during a code slush with a cryptic commit message and now we
are stuck with it for 11.x. In HEAD, there are only a few commits based
on this that seem to provide no functional changes, so I really don't
think reverting would be that bad.

My concerns here are technical and I will stop complaining about this
instantly if:
1. Anyone can point me to a single concrete example of a problem solved
by this API that could not be solved with the existing code. You spent
time writing this code, so there must be such a case!
2. Anyone can tell me how to implement the cases in my email to arch and
on the wiki at https://wiki.freebsd.org/Complicated_Interrupts using
this API. These work perfectly fine with the normal newbus APIs but
appear to break with this one.

This affects me directly since I maintain, or try to maintain, the
content of dev/ofw and a platform (PowerPC) that relies on this code.
I'm happy to switch the interrupt architecture, but it needs to be at
least as functional as the old code to do that. Ideally, it would also
be *more* functional so that it wasn't just churn -- but I would be
willing to do the work regardless.

As it is, however, having this new API that seemingly can't be supported
on one of our platforms imposes a significant burden on doing those
things. That absolutely needs to be resolved before we can continue
here. The normal procedure would be to revert, have a discussion, and
then recommit. If you refuse to do that, or to have any meaningful
discussion about the design of this patch, I am not sure what the path
forward is.
-Nathan
Post by Svatopluk Kraus
Svata
Post by Nathan Whitehorn
If so, can we please back this out until this discussion is complete? I'm
asking this formally at this point, under the Committer's Guide section
about reversion requests.
-Nathan
Svatopluk Kraus
2016-07-31 14:22:13 UTC
Permalink
Unfortunatelly, I have no time now to read back all this message
thread and related ones. Nevertheless, I found time to summarize the
main ideas about INTRNG design related to this discussion.

Svata

-----------------------------------------------------------

INTRNG is designed with three main ideas:

(1) An interrupt is identified by one and only unique number.
(2) This unique number is assigned only to registered interrupt.
(3) The framework does not know any interrupt mapping encoding semantics.

A general intr_map_irq() serves to get this unique number for an
interrupt mapping. As INTRNG itself does not know any interrupt
mapping encoding semantics, it pushes the request to a PIC. Of course,
a PIC specification must be provided as arguments to this function. A
PIC is specified by either device_t or/and xref (intptr_t). It means
that such PIC should be already registered in INTRNG.

The idea that INTRNG does not know any interrupt mapping encoding
semantics is crucial. It makes INTRNG true general framework. Any
interrupt mapping is fully transparent for INTRNG. Any new interrupt
mapping semantics can be added without INTRNG concern.

That said, centralized interrupt mapping table is no concern for
INTRNG as INTRNG works with its interrupt numbers, not with numbers
associated with something else.

In general, an interrupt mapping table can be established anywhere and
clients of this table may use indexes to this table as interrupt
specifiers. But if INTRNG bus interface functions are used, a
translation to INTRNG interrupt numbers must be done (see paragraph
above about intr_map_irq()). In fact, this is a natural case of
bridges where interrupts must be remapped.

The main reasons why INTRNG does not use a centralized interrupt
mapping table instead of an interrupt table are the following:

(1) As not only one interrupt mappings can exist for an interrupt in
the very same time, it's far more simple, sane, and framework like.
One interrupt, one interrupt number, one data associated, no need to
know any interrupt mapping semantics.

(2) An interrupt mapping is associated with consumer driver and
belongs to this driver or some of its parent driver. It's not both
sane and newbus like to store an interrupt mapping in some third party
- in INTRNG in this case.
-----------------------------------------------------------
Nathan Whitehorn
2016-07-31 14:54:36 UTC
Permalink
Post by Svatopluk Kraus
Unfortunatelly, I have no time now to read back all this message
thread and related ones. Nevertheless, I found time to summarize the
main ideas about INTRNG design related to this discussion.
Svata
Thank you for the summary. It would be great if you could look at the
wiki page at https://wiki.freebsd.org/Complicated_Interrupts at some
point and see how it corresponds to this. In particular, there are five
cases at the end that I cannot figure out to implement with your API and
that are essential to support PowerPC. I have a few comments inline below.
-Nathan
Post by Svatopluk Kraus
-----------------------------------------------------------
(1) An interrupt is identified by one and only unique number.
(2) This unique number is assigned only to registered interrupt.
(3) The framework does not know any interrupt mapping encoding semantics.
This is also true of the previous system; these are good design principles.
Post by Svatopluk Kraus
A general intr_map_irq() serves to get this unique number for an
interrupt mapping. As INTRNG itself does not know any interrupt
mapping encoding semantics, it pushes the request to a PIC. Of course,
a PIC specification must be provided as arguments to this function. A
PIC is specified by either device_t or/and xref (intptr_t). It means
that such PIC should be already registered in INTRNG.
The idea that INTRNG does not know any interrupt mapping encoding
semantics is crucial. It makes INTRNG true general framework. Any
interrupt mapping is fully transparent for INTRNG. Any new interrupt
mapping semantics can be added without INTRNG concern.
That said, centralized interrupt mapping table is no concern for
INTRNG as INTRNG works with its interrupt numbers, not with numbers
associated with something else.
All of this, again, is true of the old code. You provide a PIC
specification and data meaningful to that PIC, you get back an abstract
IRQ. The core interrupt code neither knows nor cares about what that
information means.
Post by Svatopluk Kraus
In general, an interrupt mapping table can be established anywhere and
clients of this table may use indexes to this table as interrupt
specifiers. But if INTRNG bus interface functions are used, a
translation to INTRNG interrupt numbers must be done (see paragraph
above about intr_map_irq()). In fact, this is a natural case of
bridges where interrupts must be remapped.
Yes, of course.
Post by Svatopluk Kraus
The main reasons why INTRNG does not use a centralized interrupt
(1) As not only one interrupt mappings can exist for an interrupt in
the very same time, it's far more simple, sane, and framework like.
One interrupt, one interrupt number, one data associated, no need to
know any interrupt mapping semantics.
(2) An interrupt mapping is associated with consumer driver and
belongs to this driver or some of its parent driver. It's not both
sane and newbus like to store an interrupt mapping in some third party
- in INTRNG in this case.
I agree with this completely. The sticking point is whether you
associate PIC-specific interrupt data with the interrupt when the bus
parent enumerates the bus or when resources are allocated with
bus_alloc_resource().

Doing it at enumeration time means that there is a robust relationship
between what the interrupt specifier the bus parent meant to assign and
what actually *is* assigned and fixes a large number of issues where
interrupts are enumerated by code over which the bus parent does not
have full control: its children, the MI PCI code, etc.

Doing it when the resources are allocated, as r301453 does, makes it
much more fragile (as well as a little confusing, since what interrupts
refer to is only set long after they are allocated). For example, I have
no idea how to implement PCIB_ROUTE_INTERRUPT(), and so non-MSI PCI
interrupts, with this framework and MSI interrupts work only in simple
cases. This is a critical thing: It means I can't support PCI. I could
hack around the issue with a mapping table and API that records which
abstract OFW IRQ meant what specifier and iparent and put that in, say,
nexus, using it to do the right thing when bus_alloc_resource() is later
called on those specifiers -- but we are immediately back to the
pre-310453 API at that point. There are a number of examples (not
complete, but illustrative) of other such cases on the referenced wiki page.
-Nathan
Post by Svatopluk Kraus
-----------------------------------------------------------
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Adrian Chadd
2016-07-30 06:55:08 UTC
Permalink
Post by Svatopluk Kraus
Well, I'm online again, unfortanately, I'm quite busy, so just quick
response for now to the formal ask for the reversion.
On Fri, Jul 29, 2016 at 4:41 PM, Nathan Whitehorn
[snip]
Post by Nathan Whitehorn
- The perceived advantage of the new API seems to be that the mapping
information (interrupt parent, etc.) ends up in a struct resource instead of
in a centralized mapping table
It's more than that. The change has made INTRNG framework not
dependent on OFW(FDT) stuff. So after next r301543, the framework is
clean of all additions related to various buses. It was something I
to sys/kern. The framework is used by arm, arm64 and mips now, so from
my point of view, it was quite important. The way how it was done is
not perfect and may be changed in the future. However, it does not
break anything, does not change old functionality, and only few bus
drivers were modified slightly. It was also only way which I and
Michal have agreed on considering current kernel code.
Post by Nathan Whitehorn
- Additionally, it gives you a better shot at having the PIC online before
the PIC's interrupts are parsed (which is not required, but nice).
For me, it's just correct design. Please, not all world is about OFW.
Hi,

Right, but the world does include open firmware, and Nathan/Justin
have a lot of experience over on the powerpc side with all the crazy
variants on interrupt enumeration and handling there. I think you and
Nathan are sort of on the same page, but sort of not on the same page,
and Nathan is trying to ensure things don't become more complicated
moving forward.

I really do suggest this stuff get written down and thought about a
little more, and not just in the context of the ARM style GIC/PIC+GPIO
interrupt setups that people are writing code for. Don't get me wrong,
I'm glad it's happening, but there are more consumers involved than
just ARM. :)

So, are you/Michal willing to describe some of the ARM
interrupt/platform hilarity on his wiki page
(https://wiki.freebsd.org/Complicated_Interrupts ) so everyone can be
on the same page (heh!) ? Right now there's a lot of information
floating around in emails but it's hard to pull out exactly what's
going on unless you're knee deep in it all working on that platform.
I'm trying to figure it out too for the MIPS + GPIO interrupt stuff
that people are now doing with the QCA MIPS hardware and I'd like to
make sure I know what's required for the QCA ARM (Dakota/IPQ8019)
hardware.

Thanks!



-adrian



-adrian
Michal Meloun
2016-07-30 16:18:29 UTC
Permalink
Dne 29.07.2016 v 16:41 Nathan Whitehorn napsal(a):
[snip]
Svata is online now, so i think that he is more competent for responding
to all questions about INTRNG internals.
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
But, as you wrote, INTRNG code is "absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin" (allow me to
expand this also to INTRNG code itself) .
In attempt to simply situation abound cache, in first step, we removed
duplicates check from above model. This also removes unpleasantly
sharing of entries.
This step converts cache to some sort of duplicate of resource list
entries. Each existing RLE have exactly one corresponding entry in cache.
Also, the cache is not needed for INTRNG core. At this point, its only
transfers data from ofw_bus_intr_to_rl() to consumers PIC.
So, if we are able to attach 'key' to RLE, then we can remove the whole
cache machinery from INTRNG.
Do you agree?
1. Devices attaching before their interrupt controllers.
You still expect that data readed from OFW must be parsed at RLE
creation time. I don't see single reason why we cannot postpone
parsing to bus_alloc_resource() time.
At this point, its irrelevant if bus_alloc_resource() supports
detached PIC or not.
Because there are *lots* of cases in the kernel where interrupts are
represented as a number and not a struct resource *. I've listed many
of them; I'm sure there are more. PCI is the most notable example
(both LSI and MSI). For both of these, the route_interrupt (LSI) and
alloc_msi[x]() (MSI) functions give the bus driver a chance to look at
the device tree and then they return an IRQ number. There doesn't seem
to be any way to map that back to a struct resource * or something
besides an internal table.
Moreover, there are cases (self-added interrupts by children, for
instance) in which the RL assigned by the parent does not reflect the
final RL used by the child and there is no way for the parent bus to
map an RID to a device tree entry.
When we moved to doing this at interrupt assignment time rather than
resource allocation time or another occasion on PowerPC, it was for
these kinds of reasons. There were ways to do it later, but they were
hugely invasive and involved changing large parts of the kernel. Even
then, it would still be fragile: Trying to guess what device tree
entry was meant at bus_alloc_resource() time is much more error-prone
than doing the mapping when reading that device tree to assign the
resources in the first place, when you can make guarantees.
Post by Michal Meloun
Post by Nathan Whitehorn
2. Interrupts encoded entirely in a 32-bit integer rather than a
struct resource * in a shared rman. This is required to support PCI
(both LSI and MSI).
#2 could of course be fixed by completely retooling the API of the PCI
bus code, but, since it fixes something that currently is not a
problem, why would we do that?
All relevant bus functions have struct resource * as an argument. And,
in post r301453 kernel, all necessary data are attached to it.
I don't see any reason for API change.
int PCIB_ROUTE_INTERRUPT(device_t, device_t, pin)
Which takes a PIN (A, B, C, D) and returns the IRQ to add to the RL.
This is the only real opportunity to parse an interrupt-map and does
not deal with struct resource.
There are also PCIB_ALLOC_MSI[X], which take a device_t and count and
return a vector of IRQ numbers.
How would you do resource allocation given these APIs? There seem to
be some dubious-looking attempts at generic MSI mapping code in the
new kern/subr_intr.c that don't allow the specification of more than
32-bit interrupt specifiers and otherwise exactly duplicate the
pre-301453 flow for MSIs, but in a more complicated and
less-functional way.
(While investigating this, I also just found
dev/pci/pci_host_generic.c, which mostly does the wrong things, is
actually specific to a couple of ARM boards, and duplicates code in
dev/ofw/ofwpci.c. But that's another discussion...)
[snip][2]
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Each OFW device has a lot of dependencies. It must enable clocks, power
regulators..., it must be taken from reset. All these action must be
done before we can access single register.
Most of these action can be done only with attached
clock/power/reset
controller.
Within this, interrupts are not special.
They are actually quite special in that the kernel enables them late
and so you can defer setup. They are also special in that, for that
reason, it is possible to design systems with circular dependency
graphs with interrupts. It is not possible to do this with, for
example, clocks: if I need to apply a clock to the clock controller
with itself, the system is just not bootable and such a system will
not be built or shipped. Interrupts need only happen later, after
device setup, and so such systems *are* designed and shipped. The same
is true for power.
The G5 problem is standard 'cross type' circular dependency. You must
(at BUS_PASS_BUS) initialize memory address window of PCI bridge and
start bus enumeration.
Rest of PCI bridge initialization (including all IRQ stuff) must be
postponed to any later pass. Of course, all interrupt controllers must
be attached at BUS_PASS_INTERRUPT.
Or I missed something?
Yes. As I said, you could solve it that way. It would, of course,
require a massive retooling of newbus (to allow partial delayed
attach), everything in dev/pci (which doesn't expect that), and
(almost) every PCI driver in and out of the tree.
I still think that only host to PCI bridge must me modified, and PCI-PCI
bridge must be run at BUS_PASS_BUS (which is single line modification).
On Powermacs, one PIC is 5 levels deep in the bus hierarchy, under 3
PCI<->PCI bridges and a device that also has an ATA controller and a
bunch of other things.
Yes, I see. And, as a said before, it's pretty common situation in OFW
based world.
Imho, all whats is needed is:
https://github.com/strejda/freebsd/commit/96ab64e04dcebb131643c2ae1cf373194d23e8e5
(of course, it's not tested, I havn't access to powermac HW)

It's hard for me to understand why you say that this change requires
"massive retooling of newbus, everything in dev/pci, every PCI driver).
Post by Nathan Whitehorn
Some of those can have their own interrupts (which is fairly common
for error reporting or PCI-E hotplug). It wouldn't be impossible, but
it is decidedly nontrivial.
That's true. Any 'bus' driver, in multipass environment, *must* export
provided resources, make bus accessible and enumerate it at
'BUS_PASS_BUS', and
consume resources at some later bus pass.
At this time, only pcie-hotplug needs this change. But I don't see why
a few lines change is "decidedly nontrivial".
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Or: you could skip all of that and use the fully functional mechanism
that already exists.
The problem is, at least from my point of view, that we don't have it.
The actual MIPS code doesn't work on ARM for numerous reasons. This is
first one ->
https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186
The pre-r301453 ARM INTRNG doesn't support detached interrupts.
And r301453 doesn't changed nothing about this.
I'm happy to write whatever code is missing. The ARM implementation of
ofw_bus_map_intr() does seem fairly braindead and should be replaced.
- The perceived advantage of the new API seems to be that the mapping
information (interrupt parent, etc.) ends up in a struct resource
instead of in a centralized mapping table
True. And, in optimal case, also in RLE. See [1].
Post by Nathan Whitehorn
- Additionally, it gives you a better shot at having the PIC online
before the PIC's interrupts are parsed (which is not required, but nice).
Nope, we simply fount that detached pics is very dangerous and not
needed. But, if you love it, it can be added as MD extension.
Post by Nathan Whitehorn
- Beyond these aesthetic points, there are no concrete examples of new
functionality added by this API, aside from some minor implementation
bugs of the old one on ARM that are easily fixed.
Really? Or you just ignore it?
Post by Nathan Whitehorn
- There are, conversely, a number of concrete cases where this new API
would not be able to do the right thing. Some of these can be worked
around or fixed with significant restructuring in the MI parts of the
kernel.
And i offer you a fix, without "significant restructuring in the MI
parts of the kernel". Unfortunately, you did not want to accept anything
other than the old API.
Post by Nathan Whitehorn
- If we have both, the interrupt handling code in the MI parts of the
kernel will bifurcate. This patch alone has added a bunch of #ifdef to
kern/subr_bus.c, a new file to kern/, and lots of #ifdef to dev/ofw.
This is going to be really hard to maintain if we need both.
All those #ifdefs has been added as safeguards for 11/stable and they
can be removed immediately.
Michal
Post by Nathan Whitehorn
Is this all correct?
If so, can we please back this out until this discussion is complete?
I'm asking this formally at this point, under the Committer's Guide
section about reversion requests.
-Nathan
Nathan Whitehorn
2016-07-30 16:59:03 UTC
Permalink
[snip]
Svata is online now, so i think that he is more competent for responding
to all questions about INTRNG internals.
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
But, as you wrote, INTRNG code is "absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin" (allow me to
expand this also to INTRNG code itself) .
In attempt to simply situation abound cache, in first step, we removed
duplicates check from above model. This also removes unpleasantly
sharing of entries.
This step converts cache to some sort of duplicate of resource list
entries. Each existing RLE have exactly one corresponding entry in cache.
Also, the cache is not needed for INTRNG core. At this point, its only
transfers data from ofw_bus_intr_to_rl() to consumers PIC.
So, if we are able to attach 'key' to RLE, then we can remove the whole
cache machinery from INTRNG.
Do you agree?
1. Devices attaching before their interrupt controllers.
You still expect that data readed from OFW must be parsed at RLE
creation time. I don't see single reason why we cannot postpone
parsing to bus_alloc_resource() time.
At this point, its irrelevant if bus_alloc_resource() supports
detached PIC or not.
Because there are *lots* of cases in the kernel where interrupts are
represented as a number and not a struct resource *. I've listed many
of them; I'm sure there are more. PCI is the most notable example
(both LSI and MSI). For both of these, the route_interrupt (LSI) and
alloc_msi[x]() (MSI) functions give the bus driver a chance to look at
the device tree and then they return an IRQ number. There doesn't seem
to be any way to map that back to a struct resource * or something
besides an internal table.
Moreover, there are cases (self-added interrupts by children, for
instance) in which the RL assigned by the parent does not reflect the
final RL used by the child and there is no way for the parent bus to
map an RID to a device tree entry.
When we moved to doing this at interrupt assignment time rather than
resource allocation time or another occasion on PowerPC, it was for
these kinds of reasons. There were ways to do it later, but they were
hugely invasive and involved changing large parts of the kernel. Even
then, it would still be fragile: Trying to guess what device tree
entry was meant at bus_alloc_resource() time is much more error-prone
than doing the mapping when reading that device tree to assign the
resources in the first place, when you can make guarantees.
Post by Michal Meloun
Post by Nathan Whitehorn
2. Interrupts encoded entirely in a 32-bit integer rather than a
struct resource * in a shared rman. This is required to support PCI
(both LSI and MSI).
#2 could of course be fixed by completely retooling the API of the PCI
bus code, but, since it fixes something that currently is not a
problem, why would we do that?
All relevant bus functions have struct resource * as an argument. And,
in post r301453 kernel, all necessary data are attached to it.
I don't see any reason for API change.
int PCIB_ROUTE_INTERRUPT(device_t, device_t, pin)
Which takes a PIN (A, B, C, D) and returns the IRQ to add to the RL.
This is the only real opportunity to parse an interrupt-map and does
not deal with struct resource.
There are also PCIB_ALLOC_MSI[X], which take a device_t and count and
return a vector of IRQ numbers.
How would you do resource allocation given these APIs? There seem to
be some dubious-looking attempts at generic MSI mapping code in the
new kern/subr_intr.c that don't allow the specification of more than
32-bit interrupt specifiers and otherwise exactly duplicate the
pre-301453 flow for MSIs, but in a more complicated and
less-functional way.
(While investigating this, I also just found
dev/pci/pci_host_generic.c, which mostly does the wrong things, is
actually specific to a couple of ARM boards, and duplicates code in
dev/ofw/ofwpci.c. But that's another discussion...)
[snip][2]
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Each OFW device has a lot of dependencies. It must enable clocks, power
regulators..., it must be taken from reset. All these action must be
done before we can access single register.
Most of these action can be done only with attached
clock/power/reset
controller.
Within this, interrupts are not special.
They are actually quite special in that the kernel enables them late
and so you can defer setup. They are also special in that, for that
reason, it is possible to design systems with circular dependency
graphs with interrupts. It is not possible to do this with, for
example, clocks: if I need to apply a clock to the clock controller
with itself, the system is just not bootable and such a system will
not be built or shipped. Interrupts need only happen later, after
device setup, and so such systems *are* designed and shipped. The same
is true for power.
The G5 problem is standard 'cross type' circular dependency. You must
(at BUS_PASS_BUS) initialize memory address window of PCI bridge and
start bus enumeration.
Rest of PCI bridge initialization (including all IRQ stuff) must be
postponed to any later pass. Of course, all interrupt controllers must
be attached at BUS_PASS_INTERRUPT.
Or I missed something?
Yes. As I said, you could solve it that way. It would, of course,
require a massive retooling of newbus (to allow partial delayed
attach), everything in dev/pci (which doesn't expect that), and
(almost) every PCI driver in and out of the tree.
I still think that only host to PCI bridge must me modified, and PCI-PCI
bridge must be run at BUS_PASS_BUS (which is single line modification).
On Powermacs, one PIC is 5 levels deep in the bus hierarchy, under 3
PCI<->PCI bridges and a device that also has an ATA controller and a
bunch of other things.
Yes, I see. And, as a said before, it's pretty common situation in OFW
based world.
https://github.com/strejda/freebsd/commit/96ab64e04dcebb131643c2ae1cf373194d23e8e5
(of course, it's not tested, I havn't access to powermac HW)
That would probably work on at least some of the hardware. Some of those
parents can have their own interrupts, however, which would break. I'll
investigate how often this happens.
It's hard for me to understand why you say that this change requires
"massive retooling of newbus, everything in dev/pci, every PCI driver).
Post by Nathan Whitehorn
Some of those can have their own interrupts (which is fairly common
for error reporting or PCI-E hotplug). It wouldn't be impossible, but
it is decidedly nontrivial.
That's true. Any 'bus' driver, in multipass environment, *must* export
provided resources, make bus accessible and enumerate it at
'BUS_PASS_BUS', and
consume resources at some later bus pass.
At this time, only pcie-hotplug needs this change. But I don't see why
a few lines change is "decidedly nontrivial".
It's not just a few lines change. Newbus provides no mechanism for a bus
to attach at two different bus passes.
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Or: you could skip all of that and use the fully functional mechanism
that already exists.
The problem is, at least from my point of view, that we don't have it.
The actual MIPS code doesn't work on ARM for numerous reasons. This is
first one ->
https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186
The pre-r301453 ARM INTRNG doesn't support detached interrupts.
And r301453 doesn't changed nothing about this.
I'm happy to write whatever code is missing. The ARM implementation of
ofw_bus_map_intr() does seem fairly braindead and should be replaced.
- The perceived advantage of the new API seems to be that the mapping
information (interrupt parent, etc.) ends up in a struct resource
instead of in a centralized mapping table
True. And, in optimal case, also in RLE. See [1].
Your code also has a centralized table (static struct intr_irqsrc
*irq_sources[NIRQ];), basically the same one, so I'm actually really
confused on this point. The only actual difference seems to be that the
firmware interrupt descriptor is stored in the RLE at
bus_alloc_resource() time instead of in the table at bus enumeration
time and that the new code requires some extra bus methods.
Post by Nathan Whitehorn
- Additionally, it gives you a better shot at having the PIC online
before the PIC's interrupts are parsed (which is not required, but nice).
Nope, we simply fount that detached pics is very dangerous and not
needed. But, if you love it, it can be added as MD extension.
Dangerous how? I don't "love it"; it's an unfortunate necessity.
Post by Nathan Whitehorn
- Beyond these aesthetic points, there are no concrete examples of new
functionality added by this API, aside from some minor implementation
bugs of the old one on ARM that are easily fixed.
Really? Or you just ignore it?
Which ones? I've been asking for a week and a half now and you have
responded with some hypothetical examples, all of which either (a) do
not seem to occur in the real world and (b) were trivially implementable
with the existing code.
Post by Nathan Whitehorn
- There are, conversely, a number of concrete cases where this new API
would not be able to do the right thing. Some of these can be worked
around or fixed with significant restructuring in the MI parts of the
kernel.
And i offer you a fix, without "significant restructuring in the MI
parts of the kernel". Unfortunately, you did not want to accept anything
other than the old API.
For one of them, partially. For the others, not so much. For example, I
have no idea how to implement PCI interrupts (PCIB_ROUTE_INTERRUPT) in
this framework. I am perfectly happy to accept a new API. What I am not
perfectly happy to accept is an API that makes it impossible for me to
support the platforms that I need to support and that, simultaneously,
doesn't even provide any new capabilities on other platforms.

The current API is certainly a bit of a hack and I would be pleased to
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over to
this architecture.

The ideal API would be one in which the resource enumeration code could
assign, instead of numbers, some structured information that contains
the full firmware specifier (string parent + interrupt pin for ACPI,
interrupt parent phandle + specifier for device tree, bare numbers for
simple systems). That seemed, and seems, too invasive. I think we agree
on this and have chosen to approximate that API in two different ways.

When we designed the interrupt mapping code, we evaluated a large number
of possible approaches. Something very much like this was one of them.
It was rejected as being too fragile (it requires resource allocation
and enumeration to be coupled) and to have too many corner cases (the
PCI MSI and LSI APIs, for instance). The interrupt mapping stuff seemed
at the time, and so far still seems, like the least-bad approximation to
the ideal case: it is essentially that ideal API, in that it happens at
bus enumeration and involves just assigning the firmware specifiers, but
with lookup keys to that information stuffed into the 32-bit numbers
that the rest of the system uses.
Post by Nathan Whitehorn
- If we have both, the interrupt handling code in the MI parts of the
kernel will bifurcate. This patch alone has added a bunch of #ifdef to
kern/subr_bus.c, a new file to kern/, and lots of #ifdef to dev/ofw.
This is going to be really hard to maintain if we need both.
All those #ifdefs has been added as safeguards for 11/stable and they
can be removed immediately.
They should not be and I will request immediate reversion and appeal to
core@ if they are before this discussion is complete.
-Nathan
Michal
Post by Nathan Whitehorn
Is this all correct?
If so, can we please back this out until this discussion is complete?
I'm asking this formally at this point, under the Committer's Guide
section about reversion requests.
-Nathan
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Warner Losh
2016-07-30 18:06:30 UTC
Permalink
On Sat, Jul 30, 2016 at 10:59 AM, Nathan Whitehorn
It's not just a few lines change. Newbus provides no mechanism for a bus to
attach at two different bus passes.
Sure it does. That's used in the Atmel code (which might not be in the tree) for
some things. Newbus lets you know for each new pass that something is happening.

Perhaps you could be clearer as to what exactly it doesn't provide.

Warner
Nathan Whitehorn
2016-07-30 19:26:02 UTC
Permalink
Post by Warner Losh
On Sat, Jul 30, 2016 at 10:59 AM, Nathan Whitehorn
It's not just a few lines change. Newbus provides no mechanism for a bus to
attach at two different bus passes.
Sure it does. That's used in the Atmel code (which might not be in the tree) for
some things. Newbus lets you know for each new pass that something is happening.
Perhaps you could be clearer as to what exactly it doesn't provide.
Warner
interesting. I hadn't realized that. We'd need to find all the bus
drivers in the tree and make them attach this way, I guess.

There are still some ordering dilemmas here coming from the static
nature of the ordering. For example, if you have two PICs supported by
the same driver, one of which is attached to the other, how do you order
their attachments? I have a whole list of these at
https://wiki.freebsd.org/Complicated_Interrupts. The list is basically
the full set of weird cases we support with the current code. It would
be great to get some responses to how you would implement those specific
things with this new API.

As usual, there are two things I would like to see:
1. A case of something that wasn't supported by the old API, but is
supported by the new API.
2. How I would support all of the complicated cases the existing code
supports, listed on the wiki, with the new API.
-Nathan
Michal Meloun
2016-07-30 18:54:00 UTC
Permalink
Post by Nathan Whitehorn
[snip]
Svata is online now, so i think that he is more competent for responding
to all questions about INTRNG internals.
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
But, as you wrote, INTRNG code is "absolutely free to map and dispatch
multiple virtual IRQs from the same shared interrupt pin" (allow me to
expand this also to INTRNG code itself) .
In attempt to simply situation abound cache, in first step, we removed
duplicates check from above model. This also removes unpleasantly
sharing of entries.
This step converts cache to some sort of duplicate of resource list
entries. Each existing RLE have exactly one corresponding entry in cache.
Also, the cache is not needed for INTRNG core. At this point, its only
transfers data from ofw_bus_intr_to_rl() to consumers PIC.
So, if we are able to attach 'key' to RLE, then we can remove the whole
cache machinery from INTRNG.
Do you agree?
1. Devices attaching before their interrupt controllers.
You still expect that data readed from OFW must be parsed at RLE
creation time. I don't see single reason why we cannot postpone
parsing to bus_alloc_resource() time.
At this point, its irrelevant if bus_alloc_resource() supports
detached PIC or not.
Because there are *lots* of cases in the kernel where interrupts are
represented as a number and not a struct resource *. I've listed many
of them; I'm sure there are more. PCI is the most notable example
(both LSI and MSI). For both of these, the route_interrupt (LSI) and
alloc_msi[x]() (MSI) functions give the bus driver a chance to look at
the device tree and then they return an IRQ number. There doesn't seem
to be any way to map that back to a struct resource * or something
besides an internal table.
Moreover, there are cases (self-added interrupts by children, for
instance) in which the RL assigned by the parent does not reflect the
final RL used by the child and there is no way for the parent bus to
map an RID to a device tree entry.
When we moved to doing this at interrupt assignment time rather than
resource allocation time or another occasion on PowerPC, it was for
these kinds of reasons. There were ways to do it later, but they were
hugely invasive and involved changing large parts of the kernel. Even
then, it would still be fragile: Trying to guess what device tree
entry was meant at bus_alloc_resource() time is much more error-prone
than doing the mapping when reading that device tree to assign the
resources in the first place, when you can make guarantees.
Post by Michal Meloun
Post by Nathan Whitehorn
2. Interrupts encoded entirely in a 32-bit integer rather than a
struct resource * in a shared rman. This is required to support PCI
(both LSI and MSI).
#2 could of course be fixed by completely retooling the API of the PCI
bus code, but, since it fixes something that currently is not a
problem, why would we do that?
All relevant bus functions have struct resource * as an argument. And,
in post r301453 kernel, all necessary data are attached to it.
I don't see any reason for API change.
int PCIB_ROUTE_INTERRUPT(device_t, device_t, pin)
Which takes a PIN (A, B, C, D) and returns the IRQ to add to the RL.
This is the only real opportunity to parse an interrupt-map and does
not deal with struct resource.
There are also PCIB_ALLOC_MSI[X], which take a device_t and count and
return a vector of IRQ numbers.
How would you do resource allocation given these APIs? There seem to
be some dubious-looking attempts at generic MSI mapping code in the
new kern/subr_intr.c that don't allow the specification of more than
32-bit interrupt specifiers and otherwise exactly duplicate the
pre-301453 flow for MSIs, but in a more complicated and
less-functional way.
(While investigating this, I also just found
dev/pci/pci_host_generic.c, which mostly does the wrong things, is
actually specific to a couple of ARM boards, and duplicates code in
dev/ofw/ofwpci.c. But that's another discussion...)
[snip][2]
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Each OFW device has a lot of dependencies. It must enable clocks, power
regulators..., it must be taken from reset. All these action must be
done before we can access single register.
Most of these action can be done only with attached
clock/power/reset
controller.
Within this, interrupts are not special.
They are actually quite special in that the kernel enables them late
and so you can defer setup. They are also special in that, for that
reason, it is possible to design systems with circular dependency
graphs with interrupts. It is not possible to do this with, for
example, clocks: if I need to apply a clock to the clock controller
with itself, the system is just not bootable and such a system will
not be built or shipped. Interrupts need only happen later, after
device setup, and so such systems *are* designed and shipped. The same
is true for power.
The G5 problem is standard 'cross type' circular dependency. You must
(at BUS_PASS_BUS) initialize memory address window of PCI bridge and
start bus enumeration.
Rest of PCI bridge initialization (including all IRQ stuff) must be
postponed to any later pass. Of course, all interrupt controllers must
be attached at BUS_PASS_INTERRUPT.
Or I missed something?
Yes. As I said, you could solve it that way. It would, of course,
require a massive retooling of newbus (to allow partial delayed
attach), everything in dev/pci (which doesn't expect that), and
(almost) every PCI driver in and out of the tree.
I still think that only host to PCI bridge must me modified, and PCI-PCI
bridge must be run at BUS_PASS_BUS (which is single line
modification).
On Powermacs, one PIC is 5 levels deep in the bus hierarchy, under 3
PCI<->PCI bridges and a device that also has an ATA controller and a
bunch of other things.
Yes, I see. And, as a said before, it's pretty common situation in OFW
based world.
https://github.com/strejda/freebsd/commit/96ab64e04dcebb131643c2ae1cf373194d23e8e5
(of course, it's not tested, I havn't access to powermac HW)
That would probably work on at least some of the hardware. Some of
those parents can have their own interrupts, however, which would
break. I'll investigate how often this happens.
It's hard for me to understand why you say that this change requires
"massive retooling of newbus, everything in dev/pci, every PCI driver).
Post by Nathan Whitehorn
Some of those can have their own interrupts (which is fairly common
for error reporting or PCI-E hotplug). It wouldn't be impossible, but
it is decidedly nontrivial.
That's true. Any 'bus' driver, in multipass environment, *must* export
provided resources, make bus accessible and enumerate it at
'BUS_PASS_BUS', and
consume resources at some later bus pass.
At this time, only pcie-hotplug needs this change. But I don't see why
a few lines change is "decidedly nontrivial".
It's not just a few lines change. Newbus provides no mechanism for a
bus to attach at two different bus passes.
??
Please see:
https://github.com/strejda/tegra/blob/master/sys/arm/nvidia/drm2/tegra_host1x.c#L639

and
https://github.com/strejda/tegra/blob/master/sys/arm/nvidia/drm2/tegra_host1x.c#L451
Post by Nathan Whitehorn
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Or: you could skip all of that and use the fully functional mechanism
that already exists.
The problem is, at least from my point of view, that we don't have it.
The actual MIPS code doesn't work on ARM for numerous reasons.
This is
first one ->
https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186
The pre-r301453 ARM INTRNG doesn't support detached interrupts.
And r301453 doesn't changed nothing about this.
I'm happy to write whatever code is missing. The ARM implementation of
ofw_bus_map_intr() does seem fairly braindead and should be replaced.
- The perceived advantage of the new API seems to be that the mapping
information (interrupt parent, etc.) ends up in a struct resource
instead of in a centralized mapping table
True. And, in optimal case, also in RLE. See [1].
Your code also has a centralized table (static struct intr_irqsrc
*irq_sources[NIRQ];), basically the same one, so I'm actually really
confused on this point. The only actual difference seems to be that
the firmware interrupt descriptor is stored in the RLE at
bus_alloc_resource() time instead of in the table at bus enumeration
time and that the new code requires some extra bus methods.
The irq_sources holds only real interrupts (all interrupts pin from all
interrupts controllers). Your effort is move all 'interrupt description
data' out of INTRNG code -
to RLE (at bus enumeration time) (see [1]) and to struct resource (at
bus_alloc_resource() time). The new bus method is only temporary
shortcut to bypass 11 release.

But, to be fair, I was hoping it might be necessary.

[1] This is last missing step in whole patch series and, in optimal
case, it need new filed(s) in RLE structure. So we chose to postpone
discussion about this to time after 11 release.
Post by Nathan Whitehorn
Post by Nathan Whitehorn
- Additionally, it gives you a better shot at having the PIC online
before the PIC's interrupts are parsed (which is not required, but nice).
Nope, we simply fount that detached pics is very dangerous and not
needed. But, if you love it, it can be added as MD extension.
Dangerous how? I don't "love it"; it's an unfortunate necessity.
Imho, some drivers may rely on correct return code from
bus_alloc_resource() or from bus_setup_intr().
For example to detect if interrupt exist, or if given interrupt
configuration is supported.
Post by Nathan Whitehorn
Post by Nathan Whitehorn
- Beyond these aesthetic points, there are no concrete examples of new
functionality added by this API, aside from some minor implementation
bugs of the old one on ARM that are easily fixed.
Really? Or you just ignore it?
Which ones? I've been asking for a week and a half now and you have
responded with some hypothetical examples, all of which either (a) do
not seem to occur in the real world and (b) were trivially
implementable with the existing code.
Post by Nathan Whitehorn
- There are, conversely, a number of concrete cases where this new API
would not be able to do the right thing. Some of these can be worked
around or fixed with significant restructuring in the MI parts of the
kernel.
And i offer you a fix, without "significant restructuring in the MI
parts of the kernel". Unfortunately, you did not want to accept anything
other than the old API.
For one of them, partially. For the others, not so much. For example,
I have no idea how to implement PCI interrupts (PCIB_ROUTE_INTERRUPT)
in this framework. I am perfectly happy to accept a new API. What I am
not perfectly happy to accept is an API that makes it impossible for
me to support the platforms that I need to support and that,
simultaneously, doesn't even provide any new capabilities on other
platforms.
From one of previous mail:
<quote>
But again, all what I want in this area is (for me) simple change of
your ofw_bus_map_intr() method to something like:

enum intr_map_data_type {
INTR_MAP_DATA_OFW,
INTR_MAP_DATA_GPIO,
...
};

int
bus_map_intr(device_t dev, enum intr_map_data_type type, uintptr_t
pic_id, void *config_data, int config_size)
</quote>
Post by Nathan Whitehorn
The current API is certainly a bit of a hack and I would be pleased to
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over
to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
Post by Nathan Whitehorn
The ideal API would be one in which the resource enumeration code
could assign, instead of numbers, some structured information that
contains the full firmware specifier (string parent + interrupt pin
for ACPI, interrupt parent phandle + specifier for device tree, bare
numbers for simple systems).
Yes, exactly. I am happy that at least we agree on something. The only
difference is that we want triplet <OFW, interrupt parent phandle,
specifier> or <ACPI, string parent, interrupt pin>
Post by Nathan Whitehorn
That seemed, and seems, too invasive. I think we agree on this and
have chosen to approximate that API in two different ways.
I still think that we found way how we can realize the above idea in
non-invasive way.
From my point of view the solution is:
Put/attach the above triplet to RLE and then copy it (within
bus_resource_attach() to struct resource.
But yes, I understand that I am not able to explain clearly enough.
Post by Nathan Whitehorn
When we designed the interrupt mapping code, we evaluated a large
number of possible approaches. Something very much like this was one
of them. It was rejected as being too fragile (it requires resource
allocation and enumeration to be coupled) and to have too many corner
cases (the PCI MSI and LSI APIs, for instance). The interrupt mapping
stuff seemed at the time, and so far still seems, like the least-bad
approximation to the ideal case: it is essentially that ideal API, in
that it happens at bus enumeration and involves just assigning the
firmware specifiers, but with lookup keys to that information stuffed
into the 32-bit numbers that the rest of the system uses.
The raw PCI (or MSIX) case is relatively simple. The PCI domain uses
raw numbers and it's host-to-pci bridge job to translate raw numbers
from PCI domain to (for example) OFW based resource.
At this point, I'm still not sure about MSI...

Michal
Post by Nathan Whitehorn
Post by Nathan Whitehorn
- If we have both, the interrupt handling code in the MI parts of the
kernel will bifurcate. This patch alone has added a bunch of #ifdef to
kern/subr_bus.c, a new file to kern/, and lots of #ifdef to dev/ofw.
This is going to be really hard to maintain if we need both.
All those #ifdefs has been added as safeguards for 11/stable and they
can be removed immediately.
They should not be and I will request immediate reversion and appeal
-Nathan
Michal
Post by Nathan Whitehorn
Is this all correct?
If so, can we please back this out until this discussion is complete?
I'm asking this formally at this point, under the Committer's Guide
section about reversion requests.
-Nathan
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Nathan Whitehorn
2016-07-31 06:11:29 UTC
Permalink
On 07/30/16 11:54, Michal Meloun wrote:


[snip]
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Or: you could skip all of that and use the fully functional mechanism
that already exists.
The problem is, at least from my point of view, that we don't have it.
The actual MIPS code doesn't work on ARM for numerous reasons.
This is
first one ->
https://svnweb.freebsd.org/base/head/sys/powerpc/powerpc/nexus.c?revision=297000&view=markup#l186
The pre-r301453 ARM INTRNG doesn't support detached interrupts.
And r301453 doesn't changed nothing about this.
I'm happy to write whatever code is missing. The ARM implementation of
ofw_bus_map_intr() does seem fairly braindead and should be replaced.
- The perceived advantage of the new API seems to be that the mapping
information (interrupt parent, etc.) ends up in a struct resource
instead of in a centralized mapping table
True. And, in optimal case, also in RLE. See [1].
Your code also has a centralized table (static struct intr_irqsrc
*irq_sources[NIRQ];), basically the same one, so I'm actually really
confused on this point. The only actual difference seems to be that
the firmware interrupt descriptor is stored in the RLE at
bus_alloc_resource() time instead of in the table at bus enumeration
time and that the new code requires some extra bus methods.
The irq_sources holds only real interrupts (all interrupts pin from all
interrupts controllers). Your effort is move all 'interrupt description
data' out of INTRNG code -
to RLE (at bus enumeration time) (see [1]) and to struct resource (at
bus_alloc_resource() time). The new bus method is only temporary
shortcut to bypass 11 release.
But, to be fair, I was hoping it might be necessary.
[1] This is last missing step in whole patch series and, in optimal
case, it need new filed(s) in RLE structure. So we chose to postpone
discussion about this to time after 11 release.
That's a nice idea. I think it will require a big amount of API
retooling, though, since we don't use RLEs consistently everywhere in
the tree (the PCI code, for example).
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
- Additionally, it gives you a better shot at having the PIC online
before the PIC's interrupts are parsed (which is not required, but nice).
Nope, we simply fount that detached pics is very dangerous and not
needed. But, if you love it, it can be added as MD extension.
Dangerous how? I don't "love it"; it's an unfortunate necessity.
Imho, some drivers may rely on correct return code from
bus_alloc_resource() or from bus_setup_intr().
For example to detect if interrupt exist, or if given interrupt
configuration is supported.
These can be decoupled. The thing you need for sure is to be able to
assign resources before the PIC attaches. I think you've convinced me
that with some care in bus passes, it is possible to postpone the need
for bus_alloc_resource() and bus_setup_intr() to succeed until after the
PIC[s] have attached in all cases. That will require work, but it's in
principle possible.
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
- Beyond these aesthetic points, there are no concrete examples of new
functionality added by this API, aside from some minor implementation
bugs of the old one on ARM that are easily fixed.
Really? Or you just ignore it?
Which ones? I've been asking for a week and a half now and you have
responded with some hypothetical examples, all of which either (a) do
not seem to occur in the real world and (b) were trivially
implementable with the existing code.
Post by Michal Meloun
Post by Nathan Whitehorn
- There are, conversely, a number of concrete cases where this new API
would not be able to do the right thing. Some of these can be worked
around or fixed with significant restructuring in the MI parts of the
kernel.
And i offer you a fix, without "significant restructuring in the MI
parts of the kernel". Unfortunately, you did not want to accept anything
other than the old API.
For one of them, partially. For the others, not so much. For example,
I have no idea how to implement PCI interrupts (PCIB_ROUTE_INTERRUPT)
in this framework. I am perfectly happy to accept a new API. What I am
not perfectly happy to accept is an API that makes it impossible for
me to support the platforms that I need to support and that,
simultaneously, doesn't even provide any new capabilities on other
platforms.
<quote>
But again, all what I want in this area is (for me) simple change of
enum intr_map_data_type {
INTR_MAP_DATA_OFW,
INTR_MAP_DATA_GPIO,
...
};
int
bus_map_intr(device_t dev, enum intr_map_data_type type, uintptr_t
pic_id, void *config_data, int config_size)
</quote>
Post by Nathan Whitehorn
The current API is certainly a bit of a hack and I would be pleased to
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over
to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
OK, sure. I wrote some code this afternoon (attached) to sketch this. It
does not compile or run or anything useful, but it should illustrate the
important parts of what I think is an API that should work for everyone.
I'm happy to finish it if it looks reasonable -- I may in any case just
to replace PowerPC's increasingly teetering intr_machdep.c.

The OF part is essentially unchanged from how it currently works:
there's a method that you pass the interrupt specifier and interrupt
parent to, and it spits back an IRQ that means that combination of
things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current
API, can be implemented in terms of it in one line. Whenever the
relevant PIC attaches, it is told to use the given IRQ to refer to that
opaque bytestring.

I've added a set of equivalent functions for ACPI that take the contents
of an ACPI interrupt specifier and do the same things. It tries to have
the IRQ be human-meaningful, if possible, by matching the ACPI interrupt
number. Having separate functions seemed a little cleaner than exposing
the enums and unions as part of the KPI, but I'm not wedded to it --
this is just an example.

PICs register (and deregister) themselves with either an OF phandle or
an ACPI resource string or (god help us) both as needed. The PICs have
corresponding methods for interpreting various kinds of interrupt
specifiers.

Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to
succeed before the PICs are attached is gated on an #ifdef. That can
probably be off by default on ARM. A PowerPC version of this code would
have to keep it on until various bus drivers are fixed.

Here's the general flow:
- Parent bus enumerates children, maps IRQs as needed, adds to resource
list. The struct resources involved aren't special (just a single
number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to implement
(and already are implemented, in general, for OF/FDT drivers).
- bus_alloc_resource() does nothing special
- bus_setup_intr() calls into the PIC and does what is needed, as in
r301453 (to within that #ifdef)

This should keep all the best pieces of everything:
- ACPI and OF are supported together, and easy to extend for other types
of mapping technologies without breaking either KBI or KPI (just add new
functions)
- No APIs change for existing Open Firmware code
- The support code can live entirely in machine-dependent directories,
with no code changes at all required in kern/ or dev/ofw/
- bus_setup_intr() and friends behave as in r301453 and succeed or fail
for real
- PCI code is not an issue
- No disconnected interrupt state maintained in mapping table (unless
the transitional #ifdef is on) and the mapping table content is only
larger than r301453 by having a copy of the original interrupt specifier.

If and when we manage to fix the kernel APIs to allow non-scalar
interrupt specifiers, this should also be easy to gradually transmogrify
to support that case since there exists a 1:1 mapping of scalar IRQs to
rich data and the control flow would be identical: interrupt specifiers
are read at enumeration time and a resulting handle -- struct resource
or scalar IRQ -- used afterward to refer to it.

Some more comments below.
Post by Michal Meloun
Post by Nathan Whitehorn
The ideal API would be one in which the resource enumeration code
could assign, instead of numbers, some structured information that
contains the full firmware specifier (string parent + interrupt pin
for ACPI, interrupt parent phandle + specifier for device tree, bare
numbers for simple systems).
Yes, exactly. I am happy that at least we agree on something. The only
difference is that we want triplet <OFW, interrupt parent phandle,
specifier> or <ACPI, string parent, interrupt pin>
Which makes perfect sense and is a good idea.
Post by Michal Meloun
Post by Nathan Whitehorn
That seemed, and seems, too invasive. I think we agree on this and
have chosen to approximate that API in two different ways.
I still think that we found way how we can realize the above idea in
non-invasive way.
Put/attach the above triplet to RLE and then copy it (within
bus_resource_attach() to struct resource.
But yes, I understand that I am not able to explain clearly enough.
I think we were just talking past eachother somehow.

Anyway, my concern about putting this in struct resource * at
bus_alloc_resource() time is basically that there are a whole bunch of
cases in which (a) it's hard to reverse engineer which interrupt from
the device tree (or whatever) was meant to correspond to a particular
arbitrary IRQ in an allocation request unless you keep logs and (b)
there are a bunch of cases, mostly involving PCI, where interrupt
resource allocation doesn't proceed through struct resource * at all, so
you pretty much have to keep logs. Once you start keeping logs of which
IRQ corresponds to which specifier at the device level, you might as
well just do it once in a centralized place.
Post by Michal Meloun
Post by Nathan Whitehorn
When we designed the interrupt mapping code, we evaluated a large
number of possible approaches. Something very much like this was one
of them. It was rejected as being too fragile (it requires resource
allocation and enumeration to be coupled) and to have too many corner
cases (the PCI MSI and LSI APIs, for instance). The interrupt mapping
stuff seemed at the time, and so far still seems, like the least-bad
approximation to the ideal case: it is essentially that ideal API, in
that it happens at bus enumeration and involves just assigning the
firmware specifiers, but with lookup keys to that information stuffed
into the 32-bit numbers that the rest of the system uses.
The raw PCI (or MSIX) case is relatively simple. The PCI domain uses
raw numbers and it's host-to-pci bridge job to translate raw numbers
from PCI domain to (for example) OFW based resource.
At this point, I'm still not sure about MSI...
Right, you can keep a big table in the PCI driver that maps whatever
IRQs you are handing out to some richer interrupt specifiers and consult
that when you get a bus_alloc_resource() request. Which is basically the
approach I'm advocating, except that the table is centralized, which
reduces code duplication and fixes a number of weird corner cases where
children assign extra interrupts to themselves, etc. and the bus parent
has no ability to do something sensible.

Please take a look at the attached interface and see if it (or something
like it) meets your needs. It meets all of mine, at least, and I think
fixes all the things you were concerned about.
-Nathan
Michal Meloun
2016-07-31 15:40:18 UTC
Permalink
Dne 31.07.2016 v 8:11 Nathan Whitehorn napsal(a):
[snip]
Post by Nathan Whitehorn
Post by Nathan Whitehorn
The current API is certainly a bit of a hack and I would be pleased to
Post by Nathan Whitehorn
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over
to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
OK, sure. I wrote some code this afternoon (attached) to sketch this.
It does not compile or run or anything useful, but it should
illustrate the important parts of what I think is an API that should
work for everyone. I'm happy to finish it if it looks reasonable -- I
may in any case just to replace PowerPC's increasingly teetering
intr_machdep.c.
there's a method that you pass the interrupt specifier and interrupt
parent to, and it spits back an IRQ that means that combination of
things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current
API, can be implemented in terms of it in one line. Whenever the
relevant PIC attaches, it is told to use the given IRQ to refer to
that opaque bytestring.
I've added a set of equivalent functions for ACPI that take the
contents of an ACPI interrupt specifier and do the same things. It
tries to have the IRQ be human-meaningful, if possible, by matching
the ACPI interrupt number. Having separate functions seemed a little
cleaner than exposing the enums and unions as part of the KPI, but I'm
not wedded to it -- this is just an example.
PICs register (and deregister) themselves with either an OF phandle or
an ACPI resource string or (god help us) both as needed. The PICs have
corresponding methods for interpreting various kinds of interrupt
specifiers.
Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to
succeed before the PICs are attached is gated on an #ifdef. That can
probably be off by default on ARM. A PowerPC version of this code
would have to keep it on until various bus drivers are fixed.
- Parent bus enumerates children, maps IRQs as needed, adds to
resource list. The struct resources involved aren't special (just a
single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to
implement (and already are implemented, in general, for OF/FDT drivers).
- bus_alloc_resource() does nothing special
- bus_setup_intr() calls into the PIC and does what is needed, as in
r301453 (to within that #ifdef)
- ACPI and OF are supported together, and easy to extend for other
types of mapping technologies without breaking either KBI or KPI (just
add new functions)
- No APIs change for existing Open Firmware code
- The support code can live entirely in machine-dependent directories,
with no code changes at all required in kern/ or dev/ofw/
- bus_setup_intr() and friends behave as in r301453 and succeed or
fail for real
- PCI code is not an issue
- No disconnected interrupt state maintained in mapping table (unless
the transitional #ifdef is on) and the mapping table content is only
larger than r301453 by having a copy of the original interrupt specifier.
If and when we manage to fix the kernel APIs to allow non-scalar
interrupt specifiers, this should also be easy to gradually
transmogrify to support that case since there exists a 1:1 mapping of
interrupt specifiers are read at enumeration time and a resulting
handle -- struct resource or scalar IRQ -- used afterward to refer to it.
Nice, nice, i think that we are very close at this point.
The problem with the above workflow is that maps IRQ function is called
at parent bus enumeration time, generally at BUS_PASS_BUS pass. And at
this time, PICs are not exist.
Also, 'static struct intr_irqsrc *irq_sources[NIRQ];' is not a mapping
table, it doesn't provide this functionality.
But yes, i understand that mapping table is important and necessary for you.

So, if i can extend our concept a little bit, then:
We can add new table (map_data_tbl for example) that holds a copy of
the original interrupt specifier, index to irq_sources table and
probably some flags.
And we can modify the above workflow to:
- Parent bus enumerates children, allocates entries from map_data_tbl,
adds to resource list. The struct resources involved aren't special
(just a single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are
trivial to implement (and already are implemented, in general, for
OF/FDT drivers). Index to map_data_tbl is used as resource ID.
- bus_alloc_resource() sets 'ALLOCATED' in map_data_tbl flags field,
*maps IRQs* and stores result in 'index' field.
- bus_setup_intr() sets 'SETUP_DONE' in map_data_tbl flags field, calls
into the PIC and does what is needed, as in r301453. (to within that
#ifdef)

And, in PPC case, newly attached PIC scans whole map_data_tbl table,
finds his entries and makes what needs.

Does that make sense?
I hope that postponing of map IRQ call is not a problem for PPC,
everything else looks easy.
Moreover, this additional level of indirection also solves all my
'hypothetical corner cases' with N:1 mappings (between original
interrupt specifier and real interrupt number).
Post by Nathan Whitehorn
Some more comments below.
Post by Nathan Whitehorn
Post by Nathan Whitehorn
The ideal API would be one in which the resource enumeration code
could assign, instead of numbers, some structured information that
contains the full firmware specifier (string parent + interrupt pin
for ACPI, interrupt parent phandle + specifier for device tree, bare
numbers for simple systems).
Yes, exactly. I am happy that at least we agree on something. The only
difference is that we want triplet <OFW, interrupt parent phandle,
specifier> or <ACPI, string parent, interrupt pin>
Which makes perfect sense and is a good idea.
Post by Nathan Whitehorn
Post by Nathan Whitehorn
That seemed, and seems, too invasive. I think we agree on this and
have chosen to approximate that API in two different ways.
I still think that we found way how we can realize the above idea in
non-invasive way.
Put/attach the above triplet to RLE and then copy it (within
bus_resource_attach() to struct resource.
But yes, I understand that I am not able to explain clearly enough.
I think we were just talking past eachother somehow.
Anyway, my concern about putting this in struct resource * at
bus_alloc_resource() time is basically that there are a whole bunch of
cases in which (a) it's hard to reverse engineer which interrupt from
the device tree (or whatever) was meant to correspond to a particular
arbitrary IRQ in an allocation request unless you keep logs and (b)
there are a bunch of cases, mostly involving PCI, where interrupt
resource allocation doesn't proceed through struct resource * at all,
so you pretty much have to keep logs. Once you start keeping logs of
which IRQ corresponds to which specifier at the device level, you
might as well just do it once in a centralized place.
Oki, i agree.
Post by Nathan Whitehorn
Post by Nathan Whitehorn
Post by Nathan Whitehorn
When we designed the interrupt mapping code, we evaluated a large
number of possible approaches. Something very much like this was one
of them. It was rejected as being too fragile (it requires resource
allocation and enumeration to be coupled) and to have too many corner
cases (the PCI MSI and LSI APIs, for instance). The interrupt mapping
stuff seemed at the time, and so far still seems, like the least-bad
approximation to the ideal case: it is essentially that ideal API, in
that it happens at bus enumeration and involves just assigning the
firmware specifiers, but with lookup keys to that information stuffed
into the 32-bit numbers that the rest of the system uses.
The raw PCI (or MSIX) case is relatively simple. The PCI domain uses
raw numbers and it's host-to-pci bridge job to translate raw numbers
from PCI domain to (for example) OFW based resource.
At this point, I'm still not sure about MSI...
Right, you can keep a big table in the PCI driver that maps whatever
IRQs you are handing out to some richer interrupt specifiers and
consult that when you get a bus_alloc_resource() request. Which is
basically the approach I'm advocating, except that the table is
centralized, which reduces code duplication and fixes a number of
weird corner cases where children assign extra interrupts to
themselves, etc. and the bus parent has no ability to do something
sensible.
Please take a look at the attached interface and see if it (or
something like it) meets your needs. It meets all of mine, at least,
and I think fixes all the things you were concerned about.
Can you, please, give me also example for GPIO (these are identified by
<device_t, pin number> pair).
Or, in other words, GPIO needs his own sets of functions?

And thanks for your effort and patience,
Michal
Nathan Whitehorn
2016-07-31 18:43:23 UTC
Permalink
[snip]
Post by Nathan Whitehorn
Post by Nathan Whitehorn
The current API is certainly a bit of a hack and I would be pleased to
Post by Nathan Whitehorn
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over
to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
OK, sure. I wrote some code this afternoon (attached) to sketch this.
It does not compile or run or anything useful, but it should
illustrate the important parts of what I think is an API that should
work for everyone. I'm happy to finish it if it looks reasonable -- I
may in any case just to replace PowerPC's increasingly teetering
intr_machdep.c.
there's a method that you pass the interrupt specifier and interrupt
parent to, and it spits back an IRQ that means that combination of
things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current
API, can be implemented in terms of it in one line. Whenever the
relevant PIC attaches, it is told to use the given IRQ to refer to
that opaque bytestring.
I've added a set of equivalent functions for ACPI that take the
contents of an ACPI interrupt specifier and do the same things. It
tries to have the IRQ be human-meaningful, if possible, by matching
the ACPI interrupt number. Having separate functions seemed a little
cleaner than exposing the enums and unions as part of the KPI, but I'm
not wedded to it -- this is just an example.
PICs register (and deregister) themselves with either an OF phandle or
an ACPI resource string or (god help us) both as needed. The PICs have
corresponding methods for interpreting various kinds of interrupt
specifiers.
Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to
succeed before the PICs are attached is gated on an #ifdef. That can
probably be off by default on ARM. A PowerPC version of this code
would have to keep it on until various bus drivers are fixed.
- Parent bus enumerates children, maps IRQs as needed, adds to
resource list. The struct resources involved aren't special (just a
single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to
implement (and already are implemented, in general, for OF/FDT drivers).
- bus_alloc_resource() does nothing special
- bus_setup_intr() calls into the PIC and does what is needed, as in
r301453 (to within that #ifdef)
- ACPI and OF are supported together, and easy to extend for other
types of mapping technologies without breaking either KBI or KPI (just
add new functions)
- No APIs change for existing Open Firmware code
- The support code can live entirely in machine-dependent directories,
with no code changes at all required in kern/ or dev/ofw/
- bus_setup_intr() and friends behave as in r301453 and succeed or
fail for real
- PCI code is not an issue
- No disconnected interrupt state maintained in mapping table (unless
the transitional #ifdef is on) and the mapping table content is only
larger than r301453 by having a copy of the original interrupt specifier.
If and when we manage to fix the kernel APIs to allow non-scalar
interrupt specifiers, this should also be easy to gradually
transmogrify to support that case since there exists a 1:1 mapping of
interrupt specifiers are read at enumeration time and a resulting
handle -- struct resource or scalar IRQ -- used afterward to refer to it.
Nice, nice, i think that we are very close at this point.
The problem with the above workflow is that maps IRQ function is called
at parent bus enumeration time, generally at BUS_PASS_BUS pass. And at
this time, PICs are not exist.
Also, 'static struct intr_irqsrc *irq_sources[NIRQ];' is not a mapping
table, it doesn't provide this functionality.
But yes, i understand that mapping table is important and necessary for you.
We can add new table (map_data_tbl for example) that holds a copy of
the original interrupt specifier, index to irq_sources table and
probably some flags.
- Parent bus enumerates children, allocates entries from map_data_tbl,
adds to resource list. The struct resources involved aren't special
(just a single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are
trivial to implement (and already are implemented, in general, for
OF/FDT drivers). Index to map_data_tbl is used as resource ID.
- bus_alloc_resource() sets 'ALLOCATED' in map_data_tbl flags field,
*maps IRQs* and stores result in 'index' field.
- bus_setup_intr() sets 'SETUP_DONE' in map_data_tbl flags field, calls
into the PIC and does what is needed, as in r301453. (to within that
#ifdef)
And, in PPC case, newly attached PIC scans whole map_data_tbl table,
finds his entries and makes what needs.
Does that make sense?
I hope that postponing of map IRQ call is not a problem for PPC,
everything else looks easy.
Moreover, this additional level of indirection also solves all my
'hypothetical corner cases' with N:1 mappings (between original
interrupt specifier and real interrupt number).
Yes, I think we are converging.

My qualm about bus_alloc_resource() is twofold:
1. Traditionally, with interrupts, bus_alloc_resource() has no side
effects and I'm not sure it propagates cleanly down the tree in all cases.
2. T. , x n nn b
n /here are a few bus APIs (bus_config_intr() comes to mind) that use
raw IRQ IDs and, so far as I know, can be, and sometimes are called
before bus_alloc_resource(). If the PIC doesn't know about the IRQ yet
when bus_config_intr() (etc.) is called, things will just break.

So you would need to make sure that any bus method handling a resource
ID causes it to be mapped on the PIC at that time. It's OK if that
happens in bus_alloc_resource() so long as bus_config_intr(),
bus_setup_intr(), etc. cause that to happen if it hasn't happened yet --
I don't care when these calls are made to the PIC driver so long as
*what* calls will be made is set at enumeration time.

I am also totally fine with having, on ARM, bus_config_intr(),
bus_setup_intr() etc. fail if the PIC hasn't attached yet (On PowerPC,
we can't do that yet, but after this conversation, I regard that as a
bug and would fix that later), as well as delaying setup on the PIC to
the first time any bus driver actually tries to *use* the interrupt
(alloc_resource(), setup_intr(), config_intr(), whatever) rather than
when the interrupt is originally allocated.

------------ The following is a large parenthesis -------------------

One other possible route here that would also work well would be to make
ofwbus.c MD (it's a trivial piece of code anyway, so we don't gain a lot
by sharing it) and implement ofw_bus_map_intr() locally as an ofwbus bus
method. Then you could have the mapping table stored in ofwbus's softc
-- the API was designed for this initially. You would need MD extensions
for doing PIC registration there (which is fine), but that segregates
all the OFW-specific information into OFW-specific code and would let
the bus methods and the OFW interrupt mapping table interact cleanly in
the same place. This still preserves the pre-r301453 API, makes PowerPC
work, and maybe address's skra@'s concern about extensibility and
letting core interrupt code know about FDT (or ACPI). I'd be happy to
mock this up as well if you think it's a good approach.

So you would have something in arm/arm/ofwbus.c like (pseudo-code,
missing unlocks, etc.):

struct ofwbus_softc {
(list of PICs)
(list of IRQ mapping from # to specifier, iparent, and device_t)
}

static int ofw_bus_map_if_unmapped(struct ofwbus_softc *sc, int irq)
{
struct ofw_bus_irq_mapping *i;
struct ofw_bus_pic *j;
int err = 0;
mtx_lock(sc->ofw_bus_irqmap_lock);
LIST_FOREACH(i, sc->irq_mappings) {
if (i->irq == irq)
break;
}

if (i == NULL) return (ENXIO); /* PANIC? */

if (i->dev != NULL) return (0); /* Mapped already */

LIST_FOREACH(j, sc->ofw_bus_registered_pics) {
if (j->phandle == i->phandle) {
arm_intr_set_pic(irq, j->dev);
err = PIC_SET_OFW_ISPEC(j->dev, i->ispec, i->icells);
return (err);
}
}

return (ENXIO); /* No matching PIC. Panic? */
}

static int ofw_bus_setup_intr(device_t dev, device_t child, int irq,
blah...)
{

int err;

/* This section at the beginning of anything
allocating/using/modifying interrupts */
err = ofw_bus_map_if_unmapped(sc, irq);
if (err != 0) /* Explode if the PIC isn't online yet
return (err);

bus_setup_intr(dev, child, irq, blah...); /* onward to nexus */
}

This has the following features:
- Existing OFW API and semantics unchanged
- As such, PowerPC, PCI, etc. work fine with no changes
- Details encapsulated in MD code, so individual platforms can implement
this however they like
- arm/arm/intr.c (or whatever) only needs a method to allocate a fresh
interrupt, with no state, and anoter to set the device_t for an
interrupt sometime later.
- The internal table in the platform interrupt code has no knowledge of
any mappings whatsoever except having the appropriate device_t for the
PIC stored with the interrupt vector.
- Device tree bits handled purely in device tree code
- No action need be taken on any mapping until the interrupt is actually
allocated/set up, like r301453
- Easy to add more mapping mechanisms (e.g. ACPI) by having similar
enumeration-mechanism-specific code in the root bus for that mapping
mechanism.

-------------- End parenthesis -------------------------------
`/
Post by Nathan Whitehorn
Some more comments below.
Post by Nathan Whitehorn
Post by Nathan Whitehorn
The ideal API would be one in which the resource enumeration code
could assign, instead of numbers, some structured information that
contains the full firmware specifier (string parent + interrupt pin
for ACPI, interrupt parent phandle + specifier for device tree, bare
numbers for simple systems).
Yes, exactly. I am happy that at least we agree on something. The only
difference is that we want triplet <OFW, interrupt parent phandle,
specifier> or <ACPI, string parent, interrupt pin>
Which makes perfect sense and is a good idea.
Post by Nathan Whitehorn
Post by Nathan Whitehorn
That seemed, and seems, too invasive. I think we agree on this and
have chosen to approximate that API in two different ways.
I still think that we found way how we can realize the above idea in
non-invasive way.
Put/attach the above triplet to RLE and then copy it (within
bus_resource_attach() to struct resource.
But yes, I understand that I am not able to explain clearly enough.
I think we were just talking past eachother somehow.
Anyway, my concern about putting this in struct resource * at
bus_alloc_resource() time is basically that there are a whole bunch of
cases in which (a) it's hard to reverse engineer which interrupt from
the device tree (or whatever) was meant to correspond to a particular
arbitrary IRQ in an allocation request unless you keep logs and (b)
there are a bunch of cases, mostly involving PCI, where interrupt
resource allocation doesn't proceed through struct resource * at all,
so you pretty much have to keep logs. Once you start keeping logs of
which IRQ corresponds to which specifier at the device level, you
might as well just do it once in a centralized place.
Oki, i agree. o
Great, I'm glad we're on the same page.
Post by Nathan Whitehorn
Post by Nathan Whitehorn
gd `
Post by Nathan Whitehorn
When we designed the interrupt mapping code, we evaluated a large
number of possible approaches. Something very much like this was one
of them. It was rejected as being too fragile (it requires resource
allocation and enumeration to be coupled) and to have too many corner
cases (the PCI MSI and LSI APIs, for instance). The interrupt mapping
stuff seemed at the time, and so far still seems, like the least-bad
approximation to the ideal case: it is essentially that ideal API, in
that it happens at bus enumeration and involves just assigning the
firmware specifiers, but with lookup keys to that information stuffed
into the 32-bit numbers that the rest of the system uses.
The raw PCI (or MSIX) case is relatively simple. The PCI domain uses
raw numbers and it's host-to-pci bridge job to translate raw numbers
from PCI domain to (for example) OFW based resource.
At this point, I'm still not sure about MSI...
Right, you can keep a big table in the PCI driver that maps whatever
IRQs you are handing out to some richer interrupt specifiers and
consult that when you get a bus_alloc_resource() request. Which is
basically the approach I'm advocating, except that the table is
centralized, which reduces code duplication and fixes a number of
weird corner cases where children assign extra interrupts to
themselves, etc. and the bus parent has no ability to do something
sensible.
Please take a look at the attached interface and see if it (or
something like it) meets your needs. It meets all of mine, at least,
and I think fixes all the things you were concerned about.
Can you, please, give me also example for GPIO (these are identified by
<device_t, pin number> pair).
Or, in other words, GPIO needs his own sets of functions?
You have more experience with GPIOs than I do, but I could imagine a
couple different ways of doing this for GPIO interrupts not in standard
interrupts properties:
1. You have an extra interrupt class (like r301453) called "GPIO" or
something that either takes a OFW/ACPI handle or a device_t and a GPIO
specifier.

2. You have some global translator function that makes an OFW/ACPI
interrupt specifier out of a GPIO property. I don't like this one.

3. You add a method to the GPIO controller (GPIO_GET_INTERRUPT_FOR_PIN
or something) that does something driver specific (e.g. checks if you
can have such an interrupt, then fabricates an OF- or ACPI-compatible
specifier and maps it) to return a usable IRQ or an error. This also
abstracts out how you want to describe the GPIO interrupt domain.

Aside from not liking #2, I don't have strong opinions. #1 and #3 aren't
mutually exclusive (#3 could be implemented in terms of #1), and #3
seems a little cleaner, so I would have a weak preference for #3 as the
canonical mechanism. But anything is OK, really.
And thanks for your effort and patience,
Of course -- apologies for being somewhat strident. Thanks for taking
the time to discuss this!
-Nathan
Michal
Nathan Whitehorn
2016-08-02 04:00:37 UTC
Permalink
Post by Nathan Whitehorn
[snip]
Post by Nathan Whitehorn
Post by Nathan Whitehorn
The current API is certainly a bit of a hack and I would be pleased to
Post by Nathan Whitehorn
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over
to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
OK, sure. I wrote some code this afternoon (attached) to sketch this.
It does not compile or run or anything useful, but it should
illustrate the important parts of what I think is an API that should
work for everyone. I'm happy to finish it if it looks reasonable -- I
may in any case just to replace PowerPC's increasingly teetering
intr_machdep.c.
there's a method that you pass the interrupt specifier and interrupt
parent to, and it spits back an IRQ that means that combination of
things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current
API, can be implemented in terms of it in one line. Whenever the
relevant PIC attaches, it is told to use the given IRQ to refer to
that opaque bytestring.
I've added a set of equivalent functions for ACPI that take the
contents of an ACPI interrupt specifier and do the same things. It
tries to have the IRQ be human-meaningful, if possible, by matching
the ACPI interrupt number. Having separate functions seemed a little
cleaner than exposing the enums and unions as part of the KPI, but I'm
not wedded to it -- this is just an example.
PICs register (and deregister) themselves with either an OF phandle or
an ACPI resource string or (god help us) both as needed. The PICs have
corresponding methods for interpreting various kinds of interrupt
specifiers.
Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to
succeed before the PICs are attached is gated on an #ifdef. That can
probably be off by default on ARM. A PowerPC version of this code
would have to keep it on until various bus drivers are fixed.
- Parent bus enumerates children, maps IRQs as needed, adds to
resource list. The struct resources involved aren't special (just a
single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to
implement (and already are implemented, in general, for OF/FDT drivers).
- bus_alloc_resource() does nothing special
- bus_setup_intr() calls into the PIC and does what is needed, as in
r301453 (to within that #ifdef)
- ACPI and OF are supported together, and easy to extend for other
types of mapping technologies without breaking either KBI or KPI (just
add new functions)
- No APIs change for existing Open Firmware code
- The support code can live entirely in machine-dependent directories,
with no code changes at all required in kern/ or dev/ofw/
- bus_setup_intr() and friends behave as in r301453 and succeed or
fail for real
- PCI code is not an issue
- No disconnected interrupt state maintained in mapping table (unless
the transitional #ifdef is on) and the mapping table content is only
larger than r301453 by having a copy of the original interrupt specifier.
If and when we manage to fix the kernel APIs to allow non-scalar
interrupt specifiers, this should also be easy to gradually
transmogrify to support that case since there exists a 1:1 mapping of
interrupt specifiers are read at enumeration time and a resulting
handle -- struct resource or scalar IRQ -- used afterward to refer to it.
Nice, nice, i think that we are very close at this point.
The problem with the above workflow is that maps IRQ function is called
at parent bus enumeration time, generally at BUS_PASS_BUS pass. And at
this time, PICs are not exist.
Also, 'static struct intr_irqsrc *irq_sources[NIRQ];' is not a mapping
table, it doesn't provide this functionality.
But yes, i understand that mapping table is important and necessary for you.
We can add new table (map_data_tbl for example) that holds a copy of
the original interrupt specifier, index to irq_sources table and
probably some flags.
- Parent bus enumerates children, allocates entries from map_data_tbl,
adds to resource list. The struct resources involved aren't special
(just a single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are
trivial to implement (and already are implemented, in general, for
OF/FDT drivers). Index to map_data_tbl is used as resource ID.
- bus_alloc_resource() sets 'ALLOCATED' in map_data_tbl flags field,
*maps IRQs* and stores result in 'index' field.
- bus_setup_intr() sets 'SETUP_DONE' in map_data_tbl flags field, calls
into the PIC and does what is needed, as in r301453. (to within that
#ifdef)
And, in PPC case, newly attached PIC scans whole map_data_tbl table,
finds his entries and makes what needs.
Does that make sense?
I hope that postponing of map IRQ call is not a problem for PPC,
everything else looks easy.
Moreover, this additional level of indirection also solves all my
'hypothetical corner cases' with N:1 mappings (between original
interrupt specifier and real interrupt number).
Yes, I think we are converging.
1. Traditionally, with interrupts, bus_alloc_resource() has no side
effects and I'm not sure it propagates cleanly down the tree in all cases.
2. T. , x n nn b
n /here are a few bus APIs (bus_config_intr() comes to mind) that
use raw IRQ IDs and, so far as I know, can be, and sometimes are
called before bus_alloc_resource(). If the PIC doesn't know about the
IRQ yet when bus_config_intr() (etc.) is called, things will just break.
So you would need to make sure that any bus method handling a resource
ID causes it to be mapped on the PIC at that time. It's OK if that
happens in bus_alloc_resource() so long as bus_config_intr(),
bus_setup_intr(), etc. cause that to happen if it hasn't happened yet
-- I don't care when these calls are made to the PIC driver so long as
*what* calls will be made is set at enumeration time.
I am also totally fine with having, on ARM, bus_config_intr(),
bus_setup_intr() etc. fail if the PIC hasn't attached yet (On PowerPC,
we can't do that yet, but after this conversation, I regard that as a
bug and would fix that later), as well as delaying setup on the PIC to
the first time any bus driver actually tries to *use* the interrupt
(alloc_resource(), setup_intr(), config_intr(), whatever) rather than
when the interrupt is originally allocated.
------------ The following is a large parenthesis -------------------
One other possible route here that would also work well would be to
make ofwbus.c MD (it's a trivial piece of code anyway, so we don't
gain a lot by sharing it) and implement ofw_bus_map_intr() locally as
an ofwbus bus method. Then you could have the mapping table stored in
ofwbus's softc -- the API was designed for this initially. You would
need MD extensions for doing PIC registration there (which is fine),
but that segregates all the OFW-specific information into OFW-specific
code and would let the bus methods and the OFW interrupt mapping table
interact cleanly in the same place. This still preserves the
concern about extensibility and letting core interrupt code know about
FDT (or ACPI). I'd be happy to mock this up as well if you think it's
a good approach.
[snip]
Post by Nathan Whitehorn
- Existing OFW API and semantics unchanged
- As such, PowerPC, PCI, etc. work fine with no changes
- Details encapsulated in MD code, so individual platforms can
implement this however they like
- arm/arm/intr.c (or whatever) only needs a method to allocate a fresh
interrupt, with no state, and anoter to set the device_t for an
interrupt sometime later.
- The internal table in the platform interrupt code has no knowledge
of any mappings whatsoever except having the appropriate device_t for
the PIC stored with the interrupt vector.
- Device tree bits handled purely in device tree code
- No action need be taken on any mapping until the interrupt is
actually allocated/set up, like r301453
- Easy to add more mapping mechanisms (e.g. ACPI) by having similar
enumeration-mechanism-specific code in the root bus for that mapping
mechanism.
-------------- End parenthesis -------------------------------
Here's an implementation of the parenthesis I wrote on an airplane this
afternoon. It should be complete, though has not been tested. The code
is short and simple (+70 lines in ofwbus.c). This preserves the
pre-r301453 API and semantics relative to drivers, which means PowerPC
and PCI work out of the box, while keeping the semantics relative to the
interrupt layer of r301453 (PIC methods only called on resource
allocation, no allocatable IRQs on unattached PICs, encapsulation of
OFW-specific code in OFW-specific bits of the tree). It turns out those
two things are compatible, somewhat to my surprise, and that makes the
result very clean. I like this approach and would be happy to move
forward with it. There are five functions of interest:

1. OFW_BUS_MAP_INTR(). This has the semantics and API it has now: you
pass an interrupt specifier and parent, you get back an IRQ. No changes.
This is the core of the normal OFW interrupt API.

2. OFW_BUS_REGISTER_PIC(device_t pic, phandle_t phandle). This is a new
function that PIC drivers are supposed to use to register control of an
interrupt domain. This replaces machine-specific code like
powerpc_register_pic() to allow the PIC table to be in a bus parent
rather than in the interrupt core.

3. PIC_MAP_OFW_INTR(device_t pic, int irq, interrupt specifier). This is
a new function that PIC drivers that know how to handle device-tree
interrupt descriptors implement (analogous to various existing ones that
vary by platform). It tells the PIC that the given abstract IRQ means
the given opaque interrupt specifier.

4. arm_allocate_irq(int suggested). This allocates a new IRQ number not
(yet) attached to a PIC, as in r301453. I've added a parameter that lets
you pass a suggested number to try in case it is possible to make it
match an interrupt pin or something for human-readability.

5. arm_set_pic_for_irq(int irq, device_t). This tells the MD interrupt
layer to associate a given PIC device_t with an IRQ. That is all the
information the MD layer ever has about the IRQ mapping.

Functions #1 and #2 are now implemented completely in ofwbus.c: there
are no callouts anywhere else and the interrupt mapping table is
maintained entirely internally to ofwbus (in its softc). In order to
implement ACPI, or some other strategy, you would implement analogs to
functions #1 and #2 that live somewhere in the bus hierarchy that is
guaranteed to be above all devices that might want that type of
interrupt (e.g. in acpi0), and some analog to #3 that PIC drivers
implementing the mapping scheme would provide.

Since the system interrupt code has no knowledge at all of interrupt
mapping, of any type, in this scheme, adding new mapping types is
trivial and can be done on a driver-by-driver basis if necessary without
changing KPI and without any other part of the system even being aware.
For example, GPIOs can use a completely different mechanism if they want
and can do setup purely in the GPIO controller driver. You could have a
method GPIO_GET_INTERRUPT_FOR_PIN() on a GPIO controller in which the
GPIO controller allocates a generic IRQ, assigns through some internal
table just in the GPIO driver, and returns to it to a consumer in some
other device driver -- without a GPIO mapping type, new bus functions,
or modifications to the platform interrupt code.

The control flow goes like this:
- Bus driver enumerates children, parses interrupts properties, calls
OFW_BUS_MAP_INTR() to get IRQs for them (as pre-r301453), adds to
interrupt list.
- ofwbus receives the OFW_BUS_MAP_INTR() call, allocates a blank
disconnected IRQ from the MD interrupt layer, and stores the mapping
from the new IRQ to the given interrupt specifier and phandle in an
internal table in ofwbus's softc.
NB: Nothing else happens here, like post-r301453. Changing this does
not change any semantics of the API pre-r301453, which means it remains
fully compatible with PCI and PowerPC. Also, like post-r301453, there is
no involvement of nexus.
- PICs attach and call OFW_BUS_REGISTER_PIC(). ofwbus receives these
messages and adds a (device_t, phandle_t) mapping to a second internal
table. Note that the interrupt layer does not need to handle PIC
registration anymore at all (except for the root PIC).
- Bus child eventually calls a function that tries to set the interrupt
up (e.g. bus_setup_intr()). That propagates up the bus hierarchy,
eventually getting to ofwbus. ofwbus notes the IRQ number, looks it up
in the table, looks up the appropriate PIC from the PIC table, then:
A) calls arm_set_pic_for_irq(irq, pic_device_t) -- this is the
interrupt layer's only interaction with the mapping code. All it deals
with is device_ts and abstract IRQ numbers.
B) calls PIC_MAP_OFW_INTR(pic, irq_number, interrupt-cells,
interrupt-specifier) to tell the PIC that the interrupt layer's IRQ
irq_number means the given specifier
C) finally, passes the call onto nexus, which will do whatever would
normally happen (unmasking the interrupt, setting handlers, etc.) in
terms only of the abstract IRQ and the device_t assigned by ofwbus.

You would implement ACPI just by doing a s/OFW/ACPI/g search-and-replace
above -- since the interrupt layer doesn't know about OFW or ACPI or
anything else, there is no need to touch it. This seems clean, simple,
compartmentalized, preserves the existing API, and should work on all of
our various hardware. PowerPC can't quite work with it yet without some
multipass foo, but, since the API is preserved, that transition can
happen gradually without KPI changes. For the same reason that it is
API-preserving, I think this code is also MFC-able.
-Nathan
Michal Meloun
2016-08-02 13:25:08 UTC
Permalink
I'm sorry for delayed response. We have serious trouble @work so I
cannot spend to much time on FreeBSB for next 2 or 3 days.
Post by Nathan Whitehorn
[snip]
Post by Nathan Whitehorn
Post by Nathan Whitehorn
The current API is certainly a bit of a hack and I would be pleased to
Post by Nathan Whitehorn
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over
to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
OK, sure. I wrote some code this afternoon (attached) to sketch this.
It does not compile or run or anything useful, but it should
illustrate the important parts of what I think is an API that should
work for everyone. I'm happy to finish it if it looks reasonable -- I
may in any case just to replace PowerPC's increasingly teetering
intr_machdep.c.
there's a method that you pass the interrupt specifier and interrupt
parent to, and it spits back an IRQ that means that combination of
things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current
API, can be implemented in terms of it in one line. Whenever the
relevant PIC attaches, it is told to use the given IRQ to refer to
that opaque bytestring.
I've added a set of equivalent functions for ACPI that take the
contents of an ACPI interrupt specifier and do the same things. It
tries to have the IRQ be human-meaningful, if possible, by matching
the ACPI interrupt number. Having separate functions seemed a little
cleaner than exposing the enums and unions as part of the KPI, but I'm
not wedded to it -- this is just an example.
PICs register (and deregister) themselves with either an OF phandle or
an ACPI resource string or (god help us) both as needed. The PICs have
corresponding methods for interpreting various kinds of interrupt
specifiers.
Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to
succeed before the PICs are attached is gated on an #ifdef. That can
probably be off by default on ARM. A PowerPC version of this code
would have to keep it on until various bus drivers are fixed.
- Parent bus enumerates children, maps IRQs as needed, adds to
resource list. The struct resources involved aren't special (just a
single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to
implement (and already are implemented, in general, for OF/FDT drivers).
- bus_alloc_resource() does nothing special
- bus_setup_intr() calls into the PIC and does what is needed, as in
r301453 (to within that #ifdef)
- ACPI and OF are supported together, and easy to extend for other
types of mapping technologies without breaking either KBI or KPI (just
add new functions)
- No APIs change for existing Open Firmware code
- The support code can live entirely in machine-dependent directories,
with no code changes at all required in kern/ or dev/ofw/
- bus_setup_intr() and friends behave as in r301453 and succeed or
fail for real
- PCI code is not an issue
- No disconnected interrupt state maintained in mapping table (unless
the transitional #ifdef is on) and the mapping table content is only
larger than r301453 by having a copy of the original interrupt specifier.
If and when we manage to fix the kernel APIs to allow non-scalar
interrupt specifiers, this should also be easy to gradually
transmogrify to support that case since there exists a 1:1 mapping of
interrupt specifiers are read at enumeration time and a resulting
handle -- struct resource or scalar IRQ -- used afterward to refer to it.
Nice, nice, i think that we are very close at this point.
The problem with the above workflow is that maps IRQ function is called
at parent bus enumeration time, generally at BUS_PASS_BUS pass. And at
this time, PICs are not exist.
Also, 'static struct intr_irqsrc *irq_sources[NIRQ];' is not a mapping
table, it doesn't provide this functionality.
But yes, i understand that mapping table is important and necessary for you.
We can add new table (map_data_tbl for example) that holds a copy of
the original interrupt specifier, index to irq_sources table and
probably some flags.
- Parent bus enumerates children, allocates entries from map_data_tbl,
adds to resource list. The struct resources involved aren't special
(just a single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are
trivial to implement (and already are implemented, in general, for
OF/FDT drivers). Index to map_data_tbl is used as resource ID.
- bus_alloc_resource() sets 'ALLOCATED' in map_data_tbl flags field,
*maps IRQs* and stores result in 'index' field.
- bus_setup_intr() sets 'SETUP_DONE' in map_data_tbl flags field, calls
into the PIC and does what is needed, as in r301453. (to within that
#ifdef)
And, in PPC case, newly attached PIC scans whole map_data_tbl table,
finds his entries and makes what needs.
Does that make sense?
I hope that postponing of map IRQ call is not a problem for PPC,
everything else looks easy.
Moreover, this additional level of indirection also solves all my
'hypothetical corner cases' with N:1 mappings (between original
interrupt specifier and real interrupt number).
Yes, I think we are converging.
1. Traditionally, with interrupts, bus_alloc_resource() has no side
effects and I'm not sure it propagates cleanly down the tree in all cases.
2. There are a few bus APIs (bus_config_intr() comes to mind) that
use raw IRQ IDs and, so far as I know, can be, and sometimes are
called before bus_alloc_resource(). If the PIC doesn't know about the
IRQ yet when bus_config_intr() (etc.) is called, things will just break.
So you would need to make sure that any bus method handling a
resource ID causes it to be mapped on the PIC at that time. It's OK
if that happens in bus_alloc_resource() so long as bus_config_intr(),
bus_setup_intr(), etc. cause that to happen if it hasn't happened yet
-- I don't care when these calls are made to the PIC driver so long
as *what* calls will be made is set at enumeration time.
I am also totally fine with having, on ARM, bus_config_intr(),
bus_setup_intr() etc. fail if the PIC hasn't attached yet (On
PowerPC, we can't do that yet, but after this conversation, I regard
that as a bug and would fix that later), as well as delaying setup on
the PIC to the first time any bus driver actually tries to *use* the
interrupt (alloc_resource(), setup_intr(), config_intr(), whatever)
rather than when the interrupt is originally allocated.
I understand. And yes, bus_alloc_resource() isn't propagated cleanly
down the tree in all cases. That's why we have 'INTRNG' hook in
subr_bus.c, which is suboptimal.

About bus_config_intr(). The only consumer of bus_config_intr() is ACPI
code, in little hackish manner, as workaround for problem which is
fully solved by INTRNG.
Also, the semantic of bus_config_intr() is not clear, from INTRNG point
of view.
So i have tendency to ignore it :)
Post by Nathan Whitehorn
------------ The following is a large parenthesis -------------------
One other possible route here that would also work well would be to
make ofwbus.c MD (it's a trivial piece of code anyway, so we don't
gain a lot by sharing it) and implement ofw_bus_map_intr() locally as
an ofwbus bus method. Then you could have the mapping table stored in
ofwbus's softc -- the API was designed for this initially. You would
need MD extensions for doing PIC registration there (which is fine),
but that segregates all the OFW-specific information into
OFW-specific code and would let the bus methods and the OFW interrupt
mapping table interact cleanly in the same place. This still
preserves the pre-r301453 API, makes PowerPC work, and maybe
interrupt code know about FDT (or ACPI). I'd be happy to mock this up
as well if you think it's a good approach.
[snip]
Post by Nathan Whitehorn
- Existing OFW API and semantics unchanged
- As such, PowerPC, PCI, etc. work fine with no changes
- Details encapsulated in MD code, so individual platforms can
implement this however they like
- arm/arm/intr.c (or whatever) only needs a method to allocate a
fresh interrupt, with no state, and anoter to set the device_t for an
interrupt sometime later.
- The internal table in the platform interrupt code has no knowledge
of any mappings whatsoever except having the appropriate device_t for
the PIC stored with the interrupt vector.
- Device tree bits handled purely in device tree code
- No action need be taken on any mapping until the interrupt is
actually allocated/set up, like r301453
- Easy to add more mapping mechanisms (e.g. ACPI) by having similar
enumeration-mechanism-specific code in the root bus for that mapping
mechanism.
-------------- End parenthesis -------------------------------
Here's an implementation of the parenthesis I wrote on an airplane
this afternoon. It should be complete, though has not been tested. The
code is short and simple (+70 lines in ofwbus.c). This preserves the
pre-r301453 API and semantics relative to drivers, which means PowerPC
and PCI work out of the box, while keeping the semantics relative to
the interrupt layer of r301453 (PIC methods only called on resource
allocation, no allocatable IRQs on unattached PICs, encapsulation of
OFW-specific code in OFW-specific bits of the tree). It turns out
those two things are compatible, somewhat to my surprise, and that
makes the result very clean. I like this approach and would be happy
1. OFW_BUS_MAP_INTR(). This has the semantics and API it has now: you
pass an interrupt specifier and parent, you get back an IRQ. No
changes. This is the core of the normal OFW interrupt API.
2. OFW_BUS_REGISTER_PIC(device_t pic, phandle_t phandle). This is a
new function that PIC drivers are supposed to use to register control
of an interrupt domain. This replaces machine-specific code like
powerpc_register_pic() to allow the PIC table to be in a bus parent
rather than in the interrupt core.
3. PIC_MAP_OFW_INTR(device_t pic, int irq, interrupt specifier). This
is a new function that PIC drivers that know how to handle device-tree
interrupt descriptors implement (analogous to various existing ones
that vary by platform). It tells the PIC that the given abstract IRQ
means the given opaque interrupt specifier.
4. arm_allocate_irq(int suggested). This allocates a new IRQ number
not (yet) attached to a PIC, as in r301453. I've added a parameter
that lets you pass a suggested number to try in case it is possible to
make it match an interrupt pin or something for human-readability.
5. arm_set_pic_for_irq(int irq, device_t). This tells the MD interrupt
layer to associate a given PIC device_t with an IRQ. That is all the
information the MD layer ever has about the IRQ mapping.
Functions #1 and #2 are now implemented completely in ofwbus.c: there
are no callouts anywhere else and the interrupt mapping table is
maintained entirely internally to ofwbus (in its softc). In order to
implement ACPI, or some other strategy, you would implement analogs to
functions #1 and #2 that live somewhere in the bus hierarchy that is
guaranteed to be above all devices that might want that type of
interrupt (e.g. in acpi0), and some analog to #3 that PIC drivers
implementing the mapping scheme would provide.
Since the system interrupt code has no knowledge at all of interrupt
mapping, of any type, in this scheme, adding new mapping types is
trivial and can be done on a driver-by-driver basis if necessary
without changing KPI and without any other part of the system even
being aware. For example, GPIOs can use a completely different
mechanism if they want and can do setup purely in the GPIO controller
driver. You could have a method GPIO_GET_INTERRUPT_FOR_PIN() on a GPIO
controller in which the GPIO controller allocates a generic IRQ,
assigns through some internal table just in the GPIO driver, and
returns to it to a consumer in some other device driver -- without a
GPIO mapping type, new bus functions, or modifications to the platform
interrupt code.
- Bus driver enumerates children, parses interrupts properties, calls
OFW_BUS_MAP_INTR() to get IRQs for them (as pre-r301453), adds to
interrupt list.
- ofwbus receives the OFW_BUS_MAP_INTR() call, allocates a blank
disconnected IRQ from the MD interrupt layer, and stores the mapping
from the new IRQ to the given interrupt specifier and phandle in an
internal table in ofwbus's softc.
NB: Nothing else happens here, like post-r301453. Changing this does
not change any semantics of the API pre-r301453, which means it
remains fully compatible with PCI and PowerPC. Also, like
post-r301453, there is no involvement of nexus.
- PICs attach and call OFW_BUS_REGISTER_PIC(). ofwbus receives these
messages and adds a (device_t, phandle_t) mapping to a second internal
table. Note that the interrupt layer does not need to handle PIC
registration anymore at all (except for the root PIC).
- Bus child eventually calls a function that tries to set the
interrupt up (e.g. bus_setup_intr()). That propagates up the bus
hierarchy, eventually getting to ofwbus. ofwbus notes the IRQ number,
looks it up in the table, looks up the appropriate PIC from the PIC
A) calls arm_set_pic_for_irq(irq, pic_device_t) -- this is the
interrupt layer's only interaction with the mapping code. All it deals
with is device_ts and abstract IRQ numbers.
B) calls PIC_MAP_OFW_INTR(pic, irq_number, interrupt-cells,
interrupt-specifier) to tell the PIC that the interrupt layer's IRQ
irq_number means the given specifier
C) finally, passes the call onto nexus, which will do whatever would
normally happen (unmasking the interrupt, setting handlers, etc.) in
terms only of the abstract IRQ and the device_t assigned by ofwbus.
You would implement ACPI just by doing a s/OFW/ACPI/g
search-and-replace above -- since the interrupt layer doesn't know
about OFW or ACPI or anything else, there is no need to touch it. This
seems clean, simple, compartmentalized, preserves the existing API,
and should work on all of our various hardware. PowerPC can't quite
work with it yet without some multipass foo, but, since the API is
preserved, that transition can happen gradually without KPI changes.
For the same reason that it is API-preserving, I think this code is
also MFC-able.
-Nathan
I think that we are slightly diverges in this place:
- please note that PIC API (in kern/pic_if.m) is different from the one
that PPC uses.
- we must be able to store configuration data (original interrupt
specifier) in all cases, not only for OFW. That's reason why I have
proposed
to create 'mapping table'.

Anyway, i think that we both talking about +/- same solution, only i
want to move it from OFW specific level implemented at OFW bus level to
bus/interrupt specifier format independent level implemented in
subr_intr.c.
Most of your control flow can be implemented at general level as is, or
already exist [intr_map_irq(), intr_pic_register()] .
Also, impact for current PPC code is, by me, minimal.
I can sketch my idea in more details, if you found it acceptable.

Again, I'm sorry for delayed and very brief response.
Michal
Nathan Whitehorn
2016-08-02 15:31:39 UTC
Permalink
Post by Michal Meloun
cannot spend to much time on FreeBSB for next 2 or 3 days.
Understood.
Post by Michal Meloun
Post by Nathan Whitehorn
[snip]
Post by Nathan Whitehorn
Post by Nathan Whitehorn
The current API is certainly a bit of a hack and I would be pleased to
Post by Nathan Whitehorn
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over
to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
OK, sure. I wrote some code this afternoon (attached) to sketch this.
It does not compile or run or anything useful, but it should
illustrate the important parts of what I think is an API that should
work for everyone. I'm happy to finish it if it looks reasonable -- I
may in any case just to replace PowerPC's increasingly teetering
intr_machdep.c.
there's a method that you pass the interrupt specifier and interrupt
parent to, and it spits back an IRQ that means that combination of
things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current
API, can be implemented in terms of it in one line. Whenever the
relevant PIC attaches, it is told to use the given IRQ to refer to
that opaque bytestring.
I've added a set of equivalent functions for ACPI that take the
contents of an ACPI interrupt specifier and do the same things. It
tries to have the IRQ be human-meaningful, if possible, by matching
the ACPI interrupt number. Having separate functions seemed a little
cleaner than exposing the enums and unions as part of the KPI, but I'm
not wedded to it -- this is just an example.
PICs register (and deregister) themselves with either an OF phandle or
an ACPI resource string or (god help us) both as needed. The PICs have
corresponding methods for interpreting various kinds of interrupt
specifiers.
Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to
succeed before the PICs are attached is gated on an #ifdef. That can
probably be off by default on ARM. A PowerPC version of this code
would have to keep it on until various bus drivers are fixed.
- Parent bus enumerates children, maps IRQs as needed, adds to
resource list. The struct resources involved aren't special (just a
single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to
implement (and already are implemented, in general, for OF/FDT drivers).
- bus_alloc_resource() does nothing special
- bus_setup_intr() calls into the PIC and does what is needed, as in
r301453 (to within that #ifdef)
- ACPI and OF are supported together, and easy to extend for other
types of mapping technologies without breaking either KBI or KPI (just
add new functions)
- No APIs change for existing Open Firmware code
- The support code can live entirely in machine-dependent directories,
with no code changes at all required in kern/ or dev/ofw/
- bus_setup_intr() and friends behave as in r301453 and succeed or
fail for real
- PCI code is not an issue
- No disconnected interrupt state maintained in mapping table (unless
the transitional #ifdef is on) and the mapping table content is only
larger than r301453 by having a copy of the original interrupt specifier.
If and when we manage to fix the kernel APIs to allow non-scalar
interrupt specifiers, this should also be easy to gradually
transmogrify to support that case since there exists a 1:1 mapping of
interrupt specifiers are read at enumeration time and a resulting
handle -- struct resource or scalar IRQ -- used afterward to refer to it.
Nice, nice, i think that we are very close at this point.
The problem with the above workflow is that maps IRQ function is called
at parent bus enumeration time, generally at BUS_PASS_BUS pass. And at
this time, PICs are not exist.
Also, 'static struct intr_irqsrc *irq_sources[NIRQ];' is not a mapping
table, it doesn't provide this functionality.
But yes, i understand that mapping table is important and necessary for you.
We can add new table (map_data_tbl for example) that holds a copy of
the original interrupt specifier, index to irq_sources table and
probably some flags.
- Parent bus enumerates children, allocates entries from map_data_tbl,
adds to resource list. The struct resources involved aren't special
(just a single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are
trivial to implement (and already are implemented, in general, for
OF/FDT drivers). Index to map_data_tbl is used as resource ID.
- bus_alloc_resource() sets 'ALLOCATED' in map_data_tbl flags field,
*maps IRQs* and stores result in 'index' field.
- bus_setup_intr() sets 'SETUP_DONE' in map_data_tbl flags field, calls
into the PIC and does what is needed, as in r301453. (to within that
#ifdef)
And, in PPC case, newly attached PIC scans whole map_data_tbl table,
finds his entries and makes what needs.
Does that make sense?
I hope that postponing of map IRQ call is not a problem for PPC,
everything else looks easy.
Moreover, this additional level of indirection also solves all my
'hypothetical corner cases' with N:1 mappings (between original
interrupt specifier and real interrupt number).
Yes, I think we are converging.
1. Traditionally, with interrupts, bus_alloc_resource() has no side
effects and I'm not sure it propagates cleanly down the tree in all cases.
2. There are a few bus APIs (bus_config_intr() comes to mind) that
use raw IRQ IDs and, so far as I know, can be, and sometimes are
called before bus_alloc_resource(). If the PIC doesn't know about the
IRQ yet when bus_config_intr() (etc.) is called, things will just break.
So you would need to make sure that any bus method handling a
resource ID causes it to be mapped on the PIC at that time. It's OK
if that happens in bus_alloc_resource() so long as bus_config_intr(),
bus_setup_intr(), etc. cause that to happen if it hasn't happened yet
-- I don't care when these calls are made to the PIC driver so long
as *what* calls will be made is set at enumeration time.
I am also totally fine with having, on ARM, bus_config_intr(),
bus_setup_intr() etc. fail if the PIC hasn't attached yet (On
PowerPC, we can't do that yet, but after this conversation, I regard
that as a bug and would fix that later), as well as delaying setup on
the PIC to the first time any bus driver actually tries to *use* the
interrupt (alloc_resource(), setup_intr(), config_intr(), whatever)
rather than when the interrupt is originally allocated.
I understand. And yes, bus_alloc_resource() isn't propagated cleanly
down the tree in all cases. That's why we have 'INTRNG' hook in
subr_bus.c, which is suboptimal.
About bus_config_intr(). The only consumer of bus_config_intr() is ACPI
code, in little hackish manner, as workaround for problem which is
fully solved by INTRNG.
Also, the semantic of bus_config_intr() is not clear, from INTRNG point
of view.
So i have tendency to ignore it :)
Heh. Fair enough. I hadn't noticed that -- though I can see legitimate
uses for it in the context of GPIOs. In any case, I'm a little wary of
bus_alloc_resource() having side effects. Usually
bus_activate_resource() would do that kind of thing.
Post by Michal Meloun
Post by Nathan Whitehorn
------------ The following is a large parenthesis -------------------
One other possible route here that would also work well would be to
make ofwbus.c MD (it's a trivial piece of code anyway, so we don't
gain a lot by sharing it) and implement ofw_bus_map_intr() locally as
an ofwbus bus method. Then you could have the mapping table stored in
ofwbus's softc -- the API was designed for this initially. You would
need MD extensions for doing PIC registration there (which is fine),
but that segregates all the OFW-specific information into
OFW-specific code and would let the bus methods and the OFW interrupt
mapping table interact cleanly in the same place. This still
preserves the pre-r301453 API, makes PowerPC work, and maybe
interrupt code know about FDT (or ACPI). I'd be happy to mock this up
as well if you think it's a good approach.
[snip]
Post by Nathan Whitehorn
- Existing OFW API and semantics unchanged
- As such, PowerPC, PCI, etc. work fine with no changes
- Details encapsulated in MD code, so individual platforms can
implement this however they like
- arm/arm/intr.c (or whatever) only needs a method to allocate a
fresh interrupt, with no state, and anoter to set the device_t for an
interrupt sometime later.
- The internal table in the platform interrupt code has no knowledge
of any mappings whatsoever except having the appropriate device_t for
the PIC stored with the interrupt vector.
- Device tree bits handled purely in device tree code
- No action need be taken on any mapping until the interrupt is
actually allocated/set up, like r301453
- Easy to add more mapping mechanisms (e.g. ACPI) by having similar
enumeration-mechanism-specific code in the root bus for that mapping
mechanism.
-------------- End parenthesis -------------------------------
Here's an implementation of the parenthesis I wrote on an airplane
this afternoon. It should be complete, though has not been tested. The
code is short and simple (+70 lines in ofwbus.c). This preserves the
pre-r301453 API and semantics relative to drivers, which means PowerPC
and PCI work out of the box, while keeping the semantics relative to
the interrupt layer of r301453 (PIC methods only called on resource
allocation, no allocatable IRQs on unattached PICs, encapsulation of
OFW-specific code in OFW-specific bits of the tree). It turns out
those two things are compatible, somewhat to my surprise, and that
makes the result very clean. I like this approach and would be happy
1. OFW_BUS_MAP_INTR(). This has the semantics and API it has now: you
pass an interrupt specifier and parent, you get back an IRQ. No
changes. This is the core of the normal OFW interrupt API.
2. OFW_BUS_REGISTER_PIC(device_t pic, phandle_t phandle). This is a
new function that PIC drivers are supposed to use to register control
of an interrupt domain. This replaces machine-specific code like
powerpc_register_pic() to allow the PIC table to be in a bus parent
rather than in the interrupt core.
3. PIC_MAP_OFW_INTR(device_t pic, int irq, interrupt specifier). This
is a new function that PIC drivers that know how to handle device-tree
interrupt descriptors implement (analogous to various existing ones
that vary by platform). It tells the PIC that the given abstract IRQ
means the given opaque interrupt specifier.
4. arm_allocate_irq(int suggested). This allocates a new IRQ number
not (yet) attached to a PIC, as in r301453. I've added a parameter
that lets you pass a suggested number to try in case it is possible to
make it match an interrupt pin or something for human-readability.
5. arm_set_pic_for_irq(int irq, device_t). This tells the MD interrupt
layer to associate a given PIC device_t with an IRQ. That is all the
information the MD layer ever has about the IRQ mapping.
Functions #1 and #2 are now implemented completely in ofwbus.c: there
are no callouts anywhere else and the interrupt mapping table is
maintained entirely internally to ofwbus (in its softc). In order to
implement ACPI, or some other strategy, you would implement analogs to
functions #1 and #2 that live somewhere in the bus hierarchy that is
guaranteed to be above all devices that might want that type of
interrupt (e.g. in acpi0), and some analog to #3 that PIC drivers
implementing the mapping scheme would provide.
Since the system interrupt code has no knowledge at all of interrupt
mapping, of any type, in this scheme, adding new mapping types is
trivial and can be done on a driver-by-driver basis if necessary
without changing KPI and without any other part of the system even
being aware. For example, GPIOs can use a completely different
mechanism if they want and can do setup purely in the GPIO controller
driver. You could have a method GPIO_GET_INTERRUPT_FOR_PIN() on a GPIO
controller in which the GPIO controller allocates a generic IRQ,
assigns through some internal table just in the GPIO driver, and
returns to it to a consumer in some other device driver -- without a
GPIO mapping type, new bus functions, or modifications to the platform
interrupt code.
- Bus driver enumerates children, parses interrupts properties, calls
OFW_BUS_MAP_INTR() to get IRQs for them (as pre-r301453), adds to
interrupt list.
- ofwbus receives the OFW_BUS_MAP_INTR() call, allocates a blank
disconnected IRQ from the MD interrupt layer, and stores the mapping
from the new IRQ to the given interrupt specifier and phandle in an
internal table in ofwbus's softc.
NB: Nothing else happens here, like post-r301453. Changing this does
not change any semantics of the API pre-r301453, which means it
remains fully compatible with PCI and PowerPC. Also, like
post-r301453, there is no involvement of nexus.
- PICs attach and call OFW_BUS_REGISTER_PIC(). ofwbus receives these
messages and adds a (device_t, phandle_t) mapping to a second internal
table. Note that the interrupt layer does not need to handle PIC
registration anymore at all (except for the root PIC).
- Bus child eventually calls a function that tries to set the
interrupt up (e.g. bus_setup_intr()). That propagates up the bus
hierarchy, eventually getting to ofwbus. ofwbus notes the IRQ number,
looks it up in the table, looks up the appropriate PIC from the PIC
A) calls arm_set_pic_for_irq(irq, pic_device_t) -- this is the
interrupt layer's only interaction with the mapping code. All it deals
with is device_ts and abstract IRQ numbers.
B) calls PIC_MAP_OFW_INTR(pic, irq_number, interrupt-cells,
interrupt-specifier) to tell the PIC that the interrupt layer's IRQ
irq_number means the given specifier
C) finally, passes the call onto nexus, which will do whatever would
normally happen (unmasking the interrupt, setting handlers, etc.) in
terms only of the abstract IRQ and the device_t assigned by ofwbus.
You would implement ACPI just by doing a s/OFW/ACPI/g
search-and-replace above -- since the interrupt layer doesn't know
about OFW or ACPI or anything else, there is no need to touch it. This
seems clean, simple, compartmentalized, preserves the existing API,
and should work on all of our various hardware. PowerPC can't quite
work with it yet without some multipass foo, but, since the API is
preserved, that transition can happen gradually without KPI changes.
For the same reason that it is API-preserving, I think this code is
also MFC-able.
-Nathan
- please note that PIC API (in kern/pic_if.m) is different from the one
that PPC uses.
Yes, which is fine (this is machine-dependent code anyway). On
consideration, the PIC_MAP_OFW_INTR() function should probably be called
OFW_BUS_PIC_MAP_INTR() and live in ofw_bus_if.m anyway, then be
implemented by PICs.
Post by Michal Meloun
- we must be able to store configuration data (original interrupt
specifier) in all cases, not only for OFW. That's reason why I have
proposed
to create 'mapping table'.
It is stored in all cases, just not in the core interrupt code. I've
only mocked this up for OFW here. For ACPI, you would have the
equivalent table in acpi.c, along with ACPI_MAP_INTR(),
ACPI_REGISTER_PIC(), etc. in acpi_if.m and implemented in dev/acpica/acpi.c.

For GPIOs, you would have a mechanism that just traces however you are
representing GPIOs anyway.
Post by Michal Meloun
Anyway, i think that we both talking about +/- same solution, only i
want to move it from OFW specific level implemented at OFW bus level to
bus/interrupt specifier format independent level implemented in
subr_intr.c.
This implements the same API in any case (identical to that
pre-r301453), so the implementation doesn't really matter at all in
terms of my needs.

That said, having it in the root bus for the mapping domain (ofwbus0,
acpi0) etc. seems cleaner to me, with no loss of functionality. The core
interrupt code (subr_intr.c or whatever) doesn't have any obvious need
to know anything at all about the mapping information so long as it
knows the PIC device_t that corresponds to every abstract IRQ so that it
can mask it or do other operations. Since, presumably, nothing in an
ACPI device tree references an interrupt in the OFW device tree (if you
even had both -- and how would you specify that, anyway?), implementing
the relevant bits one step up the bus hierarchy doesn't change any behavior.

And nothing can possibly be more flexible in terms of the mapping table
in subr_intr.c than not having a mapping table: you don't need enums for
mapping types, or unions that need to be expanded, breaking KBI, or
anything. You can delete all the PIC registration (and MSI) code, all
the switches, and all the unions from subr_intr.c and have a totally
mapping-mechanism-agnostic KPI.
Post by Michal Meloun
Most of your control flow can be implemented at general level as is, or
already exist [intr_map_irq(), intr_pic_register()] .
Also, impact for current PPC code is, by me, minimal.
I can sketch my idea in more details, if you found it acceptable.
All ideas are fine -- and the solution does not need to apply to all
platforms anyway, so long as the OFW_BUS_MAP_INTR() semantics (and,
ideally, API) are preserved.
Post by Michal Meloun
Again, I'm sorry for delayed and very brief response.
No worries; good luck with the work emergencies.
-Nathan
Post by Michal Meloun
Michal
Michal Meloun
2016-08-04 09:31:30 UTC
Permalink
Post by Nathan Whitehorn
Post by Michal Meloun
cannot spend to much time on FreeBSB for next 2 or 3 days.
Understood.
Post by Michal Meloun
Post by Nathan Whitehorn
[snip]
Post by Nathan Whitehorn
Post by Nathan Whitehorn
The current API is certainly a bit of a hack and I would be pleased to
Post by Nathan Whitehorn
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over
to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
OK, sure. I wrote some code this afternoon (attached) to sketch this.
It does not compile or run or anything useful, but it should
illustrate the important parts of what I think is an API that should
work for everyone. I'm happy to finish it if it looks reasonable -- I
may in any case just to replace PowerPC's increasingly teetering
intr_machdep.c.
there's a method that you pass the interrupt specifier and interrupt
parent to, and it spits back an IRQ that means that combination of
things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current
API, can be implemented in terms of it in one line. Whenever the
relevant PIC attaches, it is told to use the given IRQ to refer to
that opaque bytestring.
I've added a set of equivalent functions for ACPI that take the
contents of an ACPI interrupt specifier and do the same things. It
tries to have the IRQ be human-meaningful, if possible, by matching
the ACPI interrupt number. Having separate functions seemed a little
cleaner than exposing the enums and unions as part of the KPI, but I'm
not wedded to it -- this is just an example.
PICs register (and deregister) themselves with either an OF phandle or
an ACPI resource string or (god help us) both as needed. The PICs have
corresponding methods for interpreting various kinds of interrupt
specifiers.
Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to
succeed before the PICs are attached is gated on an #ifdef. That can
probably be off by default on ARM. A PowerPC version of this code
would have to keep it on until various bus drivers are fixed.
- Parent bus enumerates children, maps IRQs as needed, adds to
resource list. The struct resources involved aren't special (just a
single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to
implement (and already are implemented, in general, for OF/FDT drivers).
- bus_alloc_resource() does nothing special
- bus_setup_intr() calls into the PIC and does what is needed, as in
r301453 (to within that #ifdef)
- ACPI and OF are supported together, and easy to extend for other
types of mapping technologies without breaking either KBI or KPI (just
add new functions)
- No APIs change for existing Open Firmware code
- The support code can live entirely in machine-dependent
directories,
with no code changes at all required in kern/ or dev/ofw/
- bus_setup_intr() and friends behave as in r301453 and succeed or
fail for real
- PCI code is not an issue
- No disconnected interrupt state maintained in mapping table (unless
the transitional #ifdef is on) and the mapping table content is only
larger than r301453 by having a copy of the original interrupt specifier.
If and when we manage to fix the kernel APIs to allow non-scalar
interrupt specifiers, this should also be easy to gradually
transmogrify to support that case since there exists a 1:1 mapping of
interrupt specifiers are read at enumeration time and a resulting
handle -- struct resource or scalar IRQ -- used afterward to refer to it.
Nice, nice, i think that we are very close at this point.
The problem with the above workflow is that maps IRQ function is called
at parent bus enumeration time, generally at BUS_PASS_BUS pass. And at
this time, PICs are not exist.
Also, 'static struct intr_irqsrc *irq_sources[NIRQ];' is not a mapping
table, it doesn't provide this functionality.
But yes, i understand that mapping table is important and necessary for you.
We can add new table (map_data_tbl for example) that holds a copy of
the original interrupt specifier, index to irq_sources table and
probably some flags.
- Parent bus enumerates children, allocates entries from
map_data_tbl,
adds to resource list. The struct resources involved aren't special
(just a single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are
trivial to implement (and already are implemented, in general, for
OF/FDT drivers). Index to map_data_tbl is used as resource ID.
- bus_alloc_resource() sets 'ALLOCATED' in map_data_tbl flags field,
*maps IRQs* and stores result in 'index' field.
- bus_setup_intr() sets 'SETUP_DONE' in map_data_tbl flags field, calls
into the PIC and does what is needed, as in r301453. (to within that
#ifdef)
And, in PPC case, newly attached PIC scans whole map_data_tbl table,
finds his entries and makes what needs.
Does that make sense?
I hope that postponing of map IRQ call is not a problem for PPC,
everything else looks easy.
Moreover, this additional level of indirection also solves all my
'hypothetical corner cases' with N:1 mappings (between original
interrupt specifier and real interrupt number).
Yes, I think we are converging.
1. Traditionally, with interrupts, bus_alloc_resource() has no side
effects and I'm not sure it propagates cleanly down the tree in all cases.
2. There are a few bus APIs (bus_config_intr() comes to mind) that
use raw IRQ IDs and, so far as I know, can be, and sometimes are
called before bus_alloc_resource(). If the PIC doesn't know about the
IRQ yet when bus_config_intr() (etc.) is called, things will just break.
So you would need to make sure that any bus method handling a
resource ID causes it to be mapped on the PIC at that time. It's OK
if that happens in bus_alloc_resource() so long as bus_config_intr(),
bus_setup_intr(), etc. cause that to happen if it hasn't happened yet
-- I don't care when these calls are made to the PIC driver so long
as *what* calls will be made is set at enumeration time.
I am also totally fine with having, on ARM, bus_config_intr(),
bus_setup_intr() etc. fail if the PIC hasn't attached yet (On
PowerPC, we can't do that yet, but after this conversation, I regard
that as a bug and would fix that later), as well as delaying setup on
the PIC to the first time any bus driver actually tries to *use* the
interrupt (alloc_resource(), setup_intr(), config_intr(), whatever)
rather than when the interrupt is originally allocated.
I understand. And yes, bus_alloc_resource() isn't propagated cleanly
down the tree in all cases. That's why we have 'INTRNG' hook in
subr_bus.c, which is suboptimal.
About bus_config_intr(). The only consumer of bus_config_intr() is ACPI
code, in little hackish manner, as workaround for problem which is
fully solved by INTRNG.
Also, the semantic of bus_config_intr() is not clear, from INTRNG point
of view.
So i have tendency to ignore it :)
Heh. Fair enough. I hadn't noticed that -- though I can see legitimate
uses for it in the context of GPIOs. In any case, I'm a little wary of
bus_alloc_resource() having side effects. Usually
bus_activate_resource() would do that kind of thing.
Yes, good catch!! bus_activate_resource() is definitively best method.
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
------------ The following is a large parenthesis -------------------
One other possible route here that would also work well would be to
make ofwbus.c MD (it's a trivial piece of code anyway, so we don't
gain a lot by sharing it) and implement ofw_bus_map_intr() locally as
an ofwbus bus method. Then you could have the mapping table stored in
ofwbus's softc -- the API was designed for this initially. You would
need MD extensions for doing PIC registration there (which is fine),
but that segregates all the OFW-specific information into
OFW-specific code and would let the bus methods and the OFW interrupt
mapping table interact cleanly in the same place. This still
preserves the pre-r301453 API, makes PowerPC work, and maybe
interrupt code know about FDT (or ACPI). I'd be happy to mock this up
as well if you think it's a good approach.
[snip]
Post by Nathan Whitehorn
- Existing OFW API and semantics unchanged
- As such, PowerPC, PCI, etc. work fine with no changes
- Details encapsulated in MD code, so individual platforms can
implement this however they like
- arm/arm/intr.c (or whatever) only needs a method to allocate a
fresh interrupt, with no state, and anoter to set the device_t for an
interrupt sometime later.
- The internal table in the platform interrupt code has no knowledge
of any mappings whatsoever except having the appropriate device_t for
the PIC stored with the interrupt vector.
- Device tree bits handled purely in device tree code
- No action need be taken on any mapping until the interrupt is
actually allocated/set up, like r301453
- Easy to add more mapping mechanisms (e.g. ACPI) by having similar
enumeration-mechanism-specific code in the root bus for that mapping
mechanism.
-------------- End parenthesis -------------------------------
Here's an implementation of the parenthesis I wrote on an airplane
this afternoon. It should be complete, though has not been tested. The
code is short and simple (+70 lines in ofwbus.c). This preserves the
pre-r301453 API and semantics relative to drivers, which means PowerPC
and PCI work out of the box, while keeping the semantics relative to
the interrupt layer of r301453 (PIC methods only called on resource
allocation, no allocatable IRQs on unattached PICs, encapsulation of
OFW-specific code in OFW-specific bits of the tree). It turns out
those two things are compatible, somewhat to my surprise, and that
makes the result very clean. I like this approach and would be happy
1. OFW_BUS_MAP_INTR(). This has the semantics and API it has now: you
pass an interrupt specifier and parent, you get back an IRQ. No
changes. This is the core of the normal OFW interrupt API.
2. OFW_BUS_REGISTER_PIC(device_t pic, phandle_t phandle). This is a
new function that PIC drivers are supposed to use to register control
of an interrupt domain. This replaces machine-specific code like
powerpc_register_pic() to allow the PIC table to be in a bus parent
rather than in the interrupt core.
3. PIC_MAP_OFW_INTR(device_t pic, int irq, interrupt specifier). This
is a new function that PIC drivers that know how to handle device-tree
interrupt descriptors implement (analogous to various existing ones
that vary by platform). It tells the PIC that the given abstract IRQ
means the given opaque interrupt specifier.
4. arm_allocate_irq(int suggested). This allocates a new IRQ number
not (yet) attached to a PIC, as in r301453. I've added a parameter
that lets you pass a suggested number to try in case it is possible to
make it match an interrupt pin or something for human-readability.
5. arm_set_pic_for_irq(int irq, device_t). This tells the MD interrupt
layer to associate a given PIC device_t with an IRQ. That is all the
information the MD layer ever has about the IRQ mapping.
Functions #1 and #2 are now implemented completely in ofwbus.c: there
are no callouts anywhere else and the interrupt mapping table is
maintained entirely internally to ofwbus (in its softc). In order to
implement ACPI, or some other strategy, you would implement analogs to
functions #1 and #2 that live somewhere in the bus hierarchy that is
guaranteed to be above all devices that might want that type of
interrupt (e.g. in acpi0), and some analog to #3 that PIC drivers
implementing the mapping scheme would provide.
Since the system interrupt code has no knowledge at all of interrupt
mapping, of any type, in this scheme, adding new mapping types is
trivial and can be done on a driver-by-driver basis if necessary
without changing KPI and without any other part of the system even
being aware. For example, GPIOs can use a completely different
mechanism if they want and can do setup purely in the GPIO controller
driver. You could have a method GPIO_GET_INTERRUPT_FOR_PIN() on a GPIO
controller in which the GPIO controller allocates a generic IRQ,
assigns through some internal table just in the GPIO driver, and
returns to it to a consumer in some other device driver -- without a
GPIO mapping type, new bus functions, or modifications to the platform
interrupt code.
- Bus driver enumerates children, parses interrupts properties, calls
OFW_BUS_MAP_INTR() to get IRQs for them (as pre-r301453), adds to
interrupt list.
- ofwbus receives the OFW_BUS_MAP_INTR() call, allocates a blank
disconnected IRQ from the MD interrupt layer, and stores the mapping
from the new IRQ to the given interrupt specifier and phandle in an
internal table in ofwbus's softc.
NB: Nothing else happens here, like post-r301453. Changing this does
not change any semantics of the API pre-r301453, which means it
remains fully compatible with PCI and PowerPC. Also, like
post-r301453, there is no involvement of nexus.
- PICs attach and call OFW_BUS_REGISTER_PIC(). ofwbus receives these
messages and adds a (device_t, phandle_t) mapping to a second internal
table. Note that the interrupt layer does not need to handle PIC
registration anymore at all (except for the root PIC).
- Bus child eventually calls a function that tries to set the
interrupt up (e.g. bus_setup_intr()). That propagates up the bus
hierarchy, eventually getting to ofwbus. ofwbus notes the IRQ number,
looks it up in the table, looks up the appropriate PIC from the PIC
A) calls arm_set_pic_for_irq(irq, pic_device_t) -- this is the
interrupt layer's only interaction with the mapping code. All it deals
with is device_ts and abstract IRQ numbers.
B) calls PIC_MAP_OFW_INTR(pic, irq_number, interrupt-cells,
interrupt-specifier) to tell the PIC that the interrupt layer's IRQ
irq_number means the given specifier
C) finally, passes the call onto nexus, which will do whatever would
normally happen (unmasking the interrupt, setting handlers, etc.) in
terms only of the abstract IRQ and the device_t assigned by ofwbus.
You would implement ACPI just by doing a s/OFW/ACPI/g
search-and-replace above -- since the interrupt layer doesn't know
about OFW or ACPI or anything else, there is no need to touch it. This
seems clean, simple, compartmentalized, preserves the existing API,
and should work on all of our various hardware. PowerPC can't quite
work with it yet without some multipass foo, but, since the API is
preserved, that transition can happen gradually without KPI changes.
For the same reason that it is API-preserving, I think this code is
also MFC-able.
-Nathan
- please note that PIC API (in kern/pic_if.m) is different from the one
that PPC uses.
Yes, which is fine (this is machine-dependent code anyway). On
consideration, the PIC_MAP_OFW_INTR() function should probably be
called OFW_BUS_PIC_MAP_INTR() and live in ofw_bus_if.m anyway, then be
implemented by PICs.
Post by Michal Meloun
- we must be able to store configuration data (original interrupt
specifier) in all cases, not only for OFW. That's reason why I have
proposed
to create 'mapping table'.
It is stored in all cases, just not in the core interrupt code. I've
only mocked this up for OFW here. For ACPI, you would have the
equivalent table in acpi.c, along with ACPI_MAP_INTR(),
ACPI_REGISTER_PIC(), etc. in acpi_if.m and implemented in
dev/acpica/acpi.c.
For GPIOs, you would have a mechanism that just traces however you are
representing GPIOs anyway.
Post by Michal Meloun
Anyway, i think that we both talking about +/- same solution, only i
want to move it from OFW specific level implemented at OFW bus level to
bus/interrupt specifier format independent level implemented in
subr_intr.c.
This implements the same API in any case (identical to that
pre-r301453), so the implementation doesn't really matter at all in
terms of my needs.
Perfect.
Post by Nathan Whitehorn
That said, having it in the root bus for the mapping domain (ofwbus0,
acpi0) etc. seems cleaner to me, with no loss of functionality. The
core interrupt code (subr_intr.c or whatever) doesn't have any obvious
need to know anything at all about the mapping information so long as
it knows the PIC device_t that corresponds to every abstract IRQ so
that it can mask it or do other operations. Since, presumably, nothing
in an ACPI device tree references an interrupt in the OFW device tree
(if you even had both -- and how would you specify that, anyway?),
implementing the relevant bits one step up the bus hierarchy doesn't
change any behavior.
And nothing can possibly be more flexible in terms of the mapping
table in subr_intr.c than not having a mapping table: you don't need
enums for mapping types, or unions that need to be expanded, breaking
KBI, or anything. You can delete all the PIC registration (and MSI)
code, all the switches, and all the unions from subr_intr.c and have a
totally mapping-mechanism-agnostic KPI.
Where you see "all the switches, and all the unions" in subr_intr.c?
subr_intr.c uses nested structures approach, in exactly same way as is
used in OFW bus. Moreover, subr_intr.c *IS* currently totally
mapping-mechanism-agnostic KPI, and any form of mapping table must
follow this rule.
But allow me to make short summary:
- we must support 3 'main' cases - OFW, ACPI and HINTS (for older arm
and mips boards) based systems
- single kernel must support all above cases, but only one can by
'active' (kernel must be able to select right one at runtime).
- we must support 'direct' (interrupt specifier is defined by one of
above methods) or 'indirect' (where interrupt is associated with other
function - GPIO pin, but don't have direct description).
- we must be able to access single physical pin by both methods,
'direct' and/or 'indirect'.
- we must be able to add new interrupt type as simple as possible

For interrupt controllers:
- single controller must be able to parse multiple formats. 'direct' or
'indirect'. OFW or ACPI. ARM GIC must accept OFW or ACPI arguments (in
single kernel), GPIO based interrupt controller must accept 'direct' or
'indirect' formats.

For interrupt consumers:
- 'direct' interrupts are easy
- driver must be able to consume 'indirect' interrupt in 'root
bus'/'mapping-mechanism' agnostic manner.
For example, MMC driver must be able to get interrupt from CD gpio
pin in all possible cases/combinations .
Are we still connected?

I don't think that all this is possible without single universal format
of interrupt mapping specifier.
And, if we must have universal format then why we needs different
mapping databases?
Post by Nathan Whitehorn
Post by Michal Meloun
Most of your control flow can be implemented at general level as is, or
already exist [intr_map_irq(), intr_pic_register()] .
Also, impact for current PPC code is, by me, minimal.
I can sketch my idea in more details, if you found it acceptable.
All ideas are fine -- and the solution does not need to apply to all
platforms anyway, so long as the OFW_BUS_MAP_INTR() semantics (and,
ideally, API) are preserved.
Post by Michal Meloun
Again, I'm sorry for delayed and very brief response.
No worries; good luck with the work emergencies.
-Nathan
All problems have been solved, sleep deficit remains :)
Michal
Nathan Whitehorn
2016-08-04 12:34:50 UTC
Permalink
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
cannot spend to much time on FreeBSB for next 2 or 3 days.
Understood.
Post by Michal Meloun
Post by Nathan Whitehorn
[snip]
Post by Nathan Whitehorn
Post by Nathan Whitehorn
The current API is certainly a bit of a hack and I would be pleased to
Post by Nathan Whitehorn
see it go; but the replacement needs to be more functional and more
robust. As far as I can tell, I literally *cannot* move PowerPC over
to this architecture.
Yes, at this time, I agree. If you can accept enhancement of
(owf_)bus_map_intr() then we can move to next, significantly less
hackish, state.
OK, sure. I wrote some code this afternoon (attached) to sketch this.
It does not compile or run or anything useful, but it should
illustrate the important parts of what I think is an API that should
work for everyone. I'm happy to finish it if it looks reasonable -- I
may in any case just to replace PowerPC's increasingly teetering
intr_machdep.c.
there's a method that you pass the interrupt specifier and interrupt
parent to, and it spits back an IRQ that means that combination of
things in perpetuity. OFW_BUS_MAP_INTR() in nexus.c, with its current
API, can be implemented in terms of it in one line. Whenever the
relevant PIC attaches, it is told to use the given IRQ to refer to
that opaque bytestring.
I've added a set of equivalent functions for ACPI that take the
contents of an ACPI interrupt specifier and do the same things. It
tries to have the IRQ be human-meaningful, if possible, by matching
the ACPI interrupt number. Having separate functions seemed a little
cleaner than exposing the enums and unions as part of the KPI, but I'm
not wedded to it -- this is just an example.
PICs register (and deregister) themselves with either an OF phandle or
an ACPI resource string or (god help us) both as needed. The PICs have
corresponding methods for interpreting various kinds of interrupt
specifiers.
Whether it allows bus_alloc_resource(), bus_setup_intr(), etc. to
succeed before the PICs are attached is gated on an #ifdef. That can
probably be off by default on ARM. A PowerPC version of this code
would have to keep it on until various bus drivers are fixed.
- Parent bus enumerates children, maps IRQs as needed, adds to
resource list. The struct resources involved aren't special (just a
single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are trivial to
implement (and already are implemented, in general, for OF/FDT drivers).
- bus_alloc_resource() does nothing special
- bus_setup_intr() calls into the PIC and does what is needed, as in
r301453 (to within that #ifdef)
- ACPI and OF are supported together, and easy to extend for other
types of mapping technologies without breaking either KBI or KPI (just
add new functions)
- No APIs change for existing Open Firmware code
- The support code can live entirely in machine-dependent directories,
with no code changes at all required in kern/ or dev/ofw/
- bus_setup_intr() and friends behave as in r301453 and succeed or
fail for real
- PCI code is not an issue
- No disconnected interrupt state maintained in mapping table (unless
the transitional #ifdef is on) and the mapping table content is only
larger than r301453 by having a copy of the original interrupt specifier.
If and when we manage to fix the kernel APIs to allow non-scalar
interrupt specifiers, this should also be easy to gradually
transmogrify to support that case since there exists a 1:1 mapping of
interrupt specifiers are read at enumeration time and a resulting
handle -- struct resource or scalar IRQ -- used afterward to refer to it.
Nice, nice, i think that we are very close at this point.
The problem with the above workflow is that maps IRQ function is called
at parent bus enumeration time, generally at BUS_PASS_BUS pass. And at
this time, PICs are not exist.
Also, 'static struct intr_irqsrc *irq_sources[NIRQ];' is not a mapping
table, it doesn't provide this functionality.
But yes, i understand that mapping table is important and necessary for you.
We can add new table (map_data_tbl for example) that holds a copy of
the original interrupt specifier, index to irq_sources table and
probably some flags.
- Parent bus enumerates children, allocates entries from
map_data_tbl,
adds to resource list. The struct resources involved aren't special
(just a single number), so PCIB_ROUTE_INTERRUPT(), MSIs, etc. are
trivial to implement (and already are implemented, in general, for
OF/FDT drivers). Index to map_data_tbl is used as resource ID.
- bus_alloc_resource() sets 'ALLOCATED' in map_data_tbl flags field,
*maps IRQs* and stores result in 'index' field.
- bus_setup_intr() sets 'SETUP_DONE' in map_data_tbl flags field, calls
into the PIC and does what is needed, as in r301453. (to within that
#ifdef)
And, in PPC case, newly attached PIC scans whole map_data_tbl table,
finds his entries and makes what needs.
Does that make sense?
I hope that postponing of map IRQ call is not a problem for PPC,
everything else looks easy.
Moreover, this additional level of indirection also solves all my
'hypothetical corner cases' with N:1 mappings (between original
interrupt specifier and real interrupt number).
Yes, I think we are converging.
1. Traditionally, with interrupts, bus_alloc_resource() has no side
effects and I'm not sure it propagates cleanly down the tree in all cases.
2. There are a few bus APIs (bus_config_intr() comes to mind) that
use raw IRQ IDs and, so far as I know, can be, and sometimes are
called before bus_alloc_resource(). If the PIC doesn't know about the
IRQ yet when bus_config_intr() (etc.) is called, things will just break.
So you would need to make sure that any bus method handling a
resource ID causes it to be mapped on the PIC at that time. It's OK
if that happens in bus_alloc_resource() so long as bus_config_intr(),
bus_setup_intr(), etc. cause that to happen if it hasn't happened yet
-- I don't care when these calls are made to the PIC driver so long
as *what* calls will be made is set at enumeration time.
I am also totally fine with having, on ARM, bus_config_intr(),
bus_setup_intr() etc. fail if the PIC hasn't attached yet (On
PowerPC, we can't do that yet, but after this conversation, I regard
that as a bug and would fix that later), as well as delaying setup on
the PIC to the first time any bus driver actually tries to *use* the
interrupt (alloc_resource(), setup_intr(), config_intr(), whatever)
rather than when the interrupt is originally allocated.
I understand. And yes, bus_alloc_resource() isn't propagated cleanly
down the tree in all cases. That's why we have 'INTRNG' hook in
subr_bus.c, which is suboptimal.
About bus_config_intr(). The only consumer of bus_config_intr() is ACPI
code, in little hackish manner, as workaround for problem which is
fully solved by INTRNG.
Also, the semantic of bus_config_intr() is not clear, from INTRNG point
of view.
So i have tendency to ignore it :)
Heh. Fair enough. I hadn't noticed that -- though I can see legitimate
uses for it in the context of GPIOs. In any case, I'm a little wary of
bus_alloc_resource() having side effects. Usually
bus_activate_resource() would do that kind of thing.
Yes, good catch!! bus_activate_resource() is definitively best method.
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
------------ The following is a large parenthesis -------------------
One other possible route here that would also work well would be to
make ofwbus.c MD (it's a trivial piece of code anyway, so we don't
gain a lot by sharing it) and implement ofw_bus_map_intr() locally as
an ofwbus bus method. Then you could have the mapping table stored in
ofwbus's softc -- the API was designed for this initially. You would
need MD extensions for doing PIC registration there (which is fine),
but that segregates all the OFW-specific information into
OFW-specific code and would let the bus methods and the OFW interrupt
mapping table interact cleanly in the same place. This still
preserves the pre-r301453 API, makes PowerPC work, and maybe
interrupt code know about FDT (or ACPI). I'd be happy to mock this up
as well if you think it's a good approach.
[snip]
Post by Nathan Whitehorn
- Existing OFW API and semantics unchanged
- As such, PowerPC, PCI, etc. work fine with no changes
- Details encapsulated in MD code, so individual platforms can
implement this however they like
- arm/arm/intr.c (or whatever) only needs a method to allocate a
fresh interrupt, with no state, and anoter to set the device_t for an
interrupt sometime later.
- The internal table in the platform interrupt code has no knowledge
of any mappings whatsoever except having the appropriate device_t for
the PIC stored with the interrupt vector.
- Device tree bits handled purely in device tree code
- No action need be taken on any mapping until the interrupt is
actually allocated/set up, like r301453
- Easy to add more mapping mechanisms (e.g. ACPI) by having similar
enumeration-mechanism-specific code in the root bus for that mapping
mechanism.
-------------- End parenthesis -------------------------------
Here's an implementation of the parenthesis I wrote on an airplane
this afternoon. It should be complete, though has not been tested. The
code is short and simple (+70 lines in ofwbus.c). This preserves the
pre-r301453 API and semantics relative to drivers, which means PowerPC
and PCI work out of the box, while keeping the semantics relative to
the interrupt layer of r301453 (PIC methods only called on resource
allocation, no allocatable IRQs on unattached PICs, encapsulation of
OFW-specific code in OFW-specific bits of the tree). It turns out
those two things are compatible, somewhat to my surprise, and that
makes the result very clean. I like this approach and would be happy
1. OFW_BUS_MAP_INTR(). This has the semantics and API it has now: you
pass an interrupt specifier and parent, you get back an IRQ. No
changes. This is the core of the normal OFW interrupt API.
2. OFW_BUS_REGISTER_PIC(device_t pic, phandle_t phandle). This is a
new function that PIC drivers are supposed to use to register control
of an interrupt domain. This replaces machine-specific code like
powerpc_register_pic() to allow the PIC table to be in a bus parent
rather than in the interrupt core.
3. PIC_MAP_OFW_INTR(device_t pic, int irq, interrupt specifier). This
is a new function that PIC drivers that know how to handle device-tree
interrupt descriptors implement (analogous to various existing ones
that vary by platform). It tells the PIC that the given abstract IRQ
means the given opaque interrupt specifier.
4. arm_allocate_irq(int suggested). This allocates a new IRQ number
not (yet) attached to a PIC, as in r301453. I've added a parameter
that lets you pass a suggested number to try in case it is possible to
make it match an interrupt pin or something for human-readability.
5. arm_set_pic_for_irq(int irq, device_t). This tells the MD interrupt
layer to associate a given PIC device_t with an IRQ. That is all the
information the MD layer ever has about the IRQ mapping.
Functions #1 and #2 are now implemented completely in ofwbus.c: there
are no callouts anywhere else and the interrupt mapping table is
maintained entirely internally to ofwbus (in its softc). In order to
implement ACPI, or some other strategy, you would implement analogs to
functions #1 and #2 that live somewhere in the bus hierarchy that is
guaranteed to be above all devices that might want that type of
interrupt (e.g. in acpi0), and some analog to #3 that PIC drivers
implementing the mapping scheme would provide.
Since the system interrupt code has no knowledge at all of interrupt
mapping, of any type, in this scheme, adding new mapping types is
trivial and can be done on a driver-by-driver basis if necessary
without changing KPI and without any other part of the system even
being aware. For example, GPIOs can use a completely different
mechanism if they want and can do setup purely in the GPIO controller
driver. You could have a method GPIO_GET_INTERRUPT_FOR_PIN() on a GPIO
controller in which the GPIO controller allocates a generic IRQ,
assigns through some internal table just in the GPIO driver, and
returns to it to a consumer in some other device driver -- without a
GPIO mapping type, new bus functions, or modifications to the platform
interrupt code.
- Bus driver enumerates children, parses interrupts properties, calls
OFW_BUS_MAP_INTR() to get IRQs for them (as pre-r301453), adds to
interrupt list.
- ofwbus receives the OFW_BUS_MAP_INTR() call, allocates a blank
disconnected IRQ from the MD interrupt layer, and stores the mapping
from the new IRQ to the given interrupt specifier and phandle in an
internal table in ofwbus's softc.
NB: Nothing else happens here, like post-r301453. Changing this does
not change any semantics of the API pre-r301453, which means it
remains fully compatible with PCI and PowerPC. Also, like
post-r301453, there is no involvement of nexus.
- PICs attach and call OFW_BUS_REGISTER_PIC(). ofwbus receives these
messages and adds a (device_t, phandle_t) mapping to a second internal
table. Note that the interrupt layer does not need to handle PIC
registration anymore at all (except for the root PIC).
- Bus child eventually calls a function that tries to set the
interrupt up (e.g. bus_setup_intr()). That propagates up the bus
hierarchy, eventually getting to ofwbus. ofwbus notes the IRQ number,
looks it up in the table, looks up the appropriate PIC from the PIC
A) calls arm_set_pic_for_irq(irq, pic_device_t) -- this is the
interrupt layer's only interaction with the mapping code. All it deals
with is device_ts and abstract IRQ numbers.
B) calls PIC_MAP_OFW_INTR(pic, irq_number, interrupt-cells,
interrupt-specifier) to tell the PIC that the interrupt layer's IRQ
irq_number means the given specifier
C) finally, passes the call onto nexus, which will do whatever would
normally happen (unmasking the interrupt, setting handlers, etc.) in
terms only of the abstract IRQ and the device_t assigned by ofwbus.
You would implement ACPI just by doing a s/OFW/ACPI/g
search-and-replace above -- since the interrupt layer doesn't know
about OFW or ACPI or anything else, there is no need to touch it. This
seems clean, simple, compartmentalized, preserves the existing API,
and should work on all of our various hardware. PowerPC can't quite
work with it yet without some multipass foo, but, since the API is
preserved, that transition can happen gradually without KPI changes.
For the same reason that it is API-preserving, I think this code is
also MFC-able.
-Nathan
- please note that PIC API (in kern/pic_if.m) is different from the one
that PPC uses.
Yes, which is fine (this is machine-dependent code anyway). On
consideration, the PIC_MAP_OFW_INTR() function should probably be
called OFW_BUS_PIC_MAP_INTR() and live in ofw_bus_if.m anyway, then be
implemented by PICs.
Post by Michal Meloun
- we must be able to store configuration data (original interrupt
specifier) in all cases, not only for OFW. That's reason why I have
proposed
to create 'mapping table'.
It is stored in all cases, just not in the core interrupt code. I've
only mocked this up for OFW here. For ACPI, you would have the
equivalent table in acpi.c, along with ACPI_MAP_INTR(),
ACPI_REGISTER_PIC(), etc. in acpi_if.m and implemented in
dev/acpica/acpi.c.
For GPIOs, you would have a mechanism that just traces however you are
representing GPIOs anyway.
Post by Michal Meloun
Anyway, i think that we both talking about +/- same solution, only i
want to move it from OFW specific level implemented at OFW bus level to
bus/interrupt specifier format independent level implemented in
subr_intr.c.
This implements the same API in any case (identical to that
pre-r301453), so the implementation doesn't really matter at all in
terms of my needs.
Perfect.
Post by Nathan Whitehorn
That said, having it in the root bus for the mapping domain (ofwbus0,
acpi0) etc. seems cleaner to me, with no loss of functionality. The
core interrupt code (subr_intr.c or whatever) doesn't have any obvious
need to know anything at all about the mapping information so long as
it knows the PIC device_t that corresponds to every abstract IRQ so
that it can mask it or do other operations. Since, presumably, nothing
in an ACPI device tree references an interrupt in the OFW device tree
(if you even had both -- and how would you specify that, anyway?),
implementing the relevant bits one step up the bus hierarchy doesn't
change any behavior.
And nothing can possibly be more flexible in terms of the mapping
table in subr_intr.c than not having a mapping table: you don't need
enums for mapping types, or unions that need to be expanded, breaking
KBI, or anything. You can delete all the PIC registration (and MSI)
code, all the switches, and all the unions from subr_intr.c and have a
totally mapping-mechanism-agnostic KPI.
Where you see "all the switches, and all the unions" in subr_intr.c?
subr_intr.c uses nested structures approach, in exactly same way as is
used in OFW bus. Moreover, subr_intr.c *IS* currently totally
mapping-mechanism-agnostic KPI, and any form of mapping table must
follow this rule.
That was hyperbolic; my apologies.

The point was that you don't need intr_map_data, or the
intr_map_data_type enum, this way. One of the nice things about that is
that you don't need to add more entries to the enum to add new one-off
mapping types. For example, the PS3 platform on PPC has its own
non-device-tree, non-ACPI bus description and interrupt system. Needing
to make new enums (or anything) like INTR_MAP_DATA_PS3 in sys/bus.h
seems a little silly, though hardly very bad.

I've attached a version of PowerPC's intr_machdep.c and pic_if.m that
fully implements this distributed-table scheme. The code ends up being
very short and clean (a third the number of lines of code as
kern/subr_intr.c, for example) and the diff to PowerPC, including the
new ofwbus.c code, ends up being net-negative.
Post by Michal Meloun
- we must support 3 'main' cases - OFW, ACPI and HINTS (for older arm
and mips boards) based systems
- single kernel must support all above cases, but only one can by
'active' (kernel must be able to select right one at runtime).
- we must support 'direct' (interrupt specifier is defined by one of
above methods) or 'indirect' (where interrupt is associated with other
function - GPIO pin, but don't have direct description).
- we must be able to access single physical pin by both methods,
'direct' and/or 'indirect'.
- we must be able to add new interrupt type as simple as possible
Agreed with all of this.
Post by Michal Meloun
- single controller must be able to parse multiple formats. 'direct' or
'indirect'. OFW or ACPI. ARM GIC must accept OFW or ACPI arguments (in
single kernel), GPIO based interrupt controller must accept 'direct' or
'indirect' formats.
- 'direct' interrupts are easy
- driver must be able to consume 'indirect' interrupt in 'root
bus'/'mapping-mechanism' agnostic manner.
For example, MMC driver must be able to get interrupt from CD gpio
pin in all possible cases/combinations .
Are we still connected?
And this.
Post by Michal Meloun
I don't think that all this is possible without single universal format
of interrupt mapping specifier.
And, if we must have universal format then why we needs different
mapping databases?
It actually works just fine in this mapping-table-in-the-root-bus model.
If you have an OFW-type of interrupt, it is set up by ofwbus.c. Since
everything that would use an OFW interrupt is a child of ofwbus, this is
totally indistinguishable from a global table from the perspective of
client code. For ACPI interrupts, they are set up by acpi.c, which is
indistinguishable from a global table for the same reason. For hints --
and single-domain systems generally -- you can just have the
machine-dependent code assume that interrupts it doesn't explicitly know
about belong to the root PIC. This is what you call the "direct" case.

For the "indirect" case, there are as usual a few things you could do.
They match the set of things you would do in any implementation:
1. Set up one or more global registries for "indirect" PICs, with
corresponding mapping methods, either as bus methods on some high-level
bus or on the machine-specific nexus, or just a global function that you
call.
2. Use the global registry that you already have to have for GPIO
controllers and provide a mapping method on the GPIO controller
(GPIO_GET_INTERRUPT_FOR_PIN() or something) that returns a mapped IRQ.

One of the nice things about distributing the tables this way is that
you have lots of flexibility with things like these "indirect"
interrupts and are free to do things like #2 at will locally in some
machine-specific set of drivers. For example, in the PS3 code, I don't
need to add a new "PS3" mapping type *anywhere*. The ps3bus root driver
just has a table when it hands out interrupts and talks to ps3pic
internally.
-Nathan
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Most of your control flow can be implemented at general level as is, or
already exist [intr_map_irq(), intr_pic_register()] .
Also, impact for current PPC code is, by me, minimal.
I can sketch my idea in more details, if you found it acceptable.
All ideas are fine -- and the solution does not need to apply to all
platforms anyway, so long as the OFW_BUS_MAP_INTR() semantics (and,
ideally, API) are preserved.
Post by Michal Meloun
Again, I'm sorry for delayed and very brief response.
No worries; good luck with the work emergencies.
-Nathan
All problems have been solved, sleep deficit remains :)
Michal
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Michal Meloun
2016-08-06 14:30:24 UTC
Permalink
Dne 04.08.2016 v 14:34 Nathan Whitehorn napsal(a):
[snip]
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
------------ The following is a large parenthesis
-------------------
One other possible route here that would also work well would be to
make ofwbus.c MD (it's a trivial piece of code anyway, so we don't
gain a lot by sharing it) and implement ofw_bus_map_intr() locally as
an ofwbus bus method. Then you could have the mapping table stored in
ofwbus's softc -- the API was designed for this initially. You would
need MD extensions for doing PIC registration there (which is fine),
but that segregates all the OFW-specific information into
OFW-specific code and would let the bus methods and the OFW interrupt
mapping table interact cleanly in the same place. This still
preserves the pre-r301453 API, makes PowerPC work, and maybe
interrupt code know about FDT (or ACPI). I'd be happy to mock this up
as well if you think it's a good approach.
[snip]
Post by Nathan Whitehorn
- Existing OFW API and semantics unchanged
- As such, PowerPC, PCI, etc. work fine with no changes
- Details encapsulated in MD code, so individual platforms can
implement this however they like
- arm/arm/intr.c (or whatever) only needs a method to allocate a
fresh interrupt, with no state, and anoter to set the device_t for an
interrupt sometime later.
- The internal table in the platform interrupt code has no knowledge
of any mappings whatsoever except having the appropriate device_t for
the PIC stored with the interrupt vector.
- Device tree bits handled purely in device tree code
- No action need be taken on any mapping until the interrupt is
actually allocated/set up, like r301453
- Easy to add more mapping mechanisms (e.g. ACPI) by having similar
enumeration-mechanism-specific code in the root bus for that mapping
mechanism.
-------------- End parenthesis -------------------------------
Here's an implementation of the parenthesis I wrote on an airplane
this afternoon. It should be complete, though has not been tested. The
code is short and simple (+70 lines in ofwbus.c). This preserves the
pre-r301453 API and semantics relative to drivers, which means PowerPC
and PCI work out of the box, while keeping the semantics relative to
the interrupt layer of r301453 (PIC methods only called on resource
allocation, no allocatable IRQs on unattached PICs, encapsulation of
OFW-specific code in OFW-specific bits of the tree). It turns out
those two things are compatible, somewhat to my surprise, and that
makes the result very clean. I like this approach and would be happy
1. OFW_BUS_MAP_INTR(). This has the semantics and API it has now: you
pass an interrupt specifier and parent, you get back an IRQ. No
changes. This is the core of the normal OFW interrupt API.
2. OFW_BUS_REGISTER_PIC(device_t pic, phandle_t phandle). This is a
new function that PIC drivers are supposed to use to register control
of an interrupt domain. This replaces machine-specific code like
powerpc_register_pic() to allow the PIC table to be in a bus parent
rather than in the interrupt core.
3. PIC_MAP_OFW_INTR(device_t pic, int irq, interrupt specifier). This
is a new function that PIC drivers that know how to handle
device-tree
interrupt descriptors implement (analogous to various existing ones
that vary by platform). It tells the PIC that the given abstract IRQ
means the given opaque interrupt specifier.
4. arm_allocate_irq(int suggested). This allocates a new IRQ number
not (yet) attached to a PIC, as in r301453. I've added a parameter
that lets you pass a suggested number to try in case it is
possible to
make it match an interrupt pin or something for human-readability.
5. arm_set_pic_for_irq(int irq, device_t). This tells the MD interrupt
layer to associate a given PIC device_t with an IRQ. That is all the
information the MD layer ever has about the IRQ mapping.
Functions #1 and #2 are now implemented completely in ofwbus.c: there
are no callouts anywhere else and the interrupt mapping table is
maintained entirely internally to ofwbus (in its softc). In order to
implement ACPI, or some other strategy, you would implement analogs to
functions #1 and #2 that live somewhere in the bus hierarchy that is
guaranteed to be above all devices that might want that type of
interrupt (e.g. in acpi0), and some analog to #3 that PIC drivers
implementing the mapping scheme would provide.
Since the system interrupt code has no knowledge at all of interrupt
mapping, of any type, in this scheme, adding new mapping types is
trivial and can be done on a driver-by-driver basis if necessary
without changing KPI and without any other part of the system even
being aware. For example, GPIOs can use a completely different
mechanism if they want and can do setup purely in the GPIO controller
driver. You could have a method GPIO_GET_INTERRUPT_FOR_PIN() on a GPIO
controller in which the GPIO controller allocates a generic IRQ,
assigns through some internal table just in the GPIO driver, and
returns to it to a consumer in some other device driver -- without a
GPIO mapping type, new bus functions, or modifications to the platform
interrupt code.
- Bus driver enumerates children, parses interrupts properties, calls
OFW_BUS_MAP_INTR() to get IRQs for them (as pre-r301453), adds to
interrupt list.
- ofwbus receives the OFW_BUS_MAP_INTR() call, allocates a blank
disconnected IRQ from the MD interrupt layer, and stores the mapping
from the new IRQ to the given interrupt specifier and phandle in an
internal table in ofwbus's softc.
NB: Nothing else happens here, like post-r301453. Changing this does
not change any semantics of the API pre-r301453, which means it
remains fully compatible with PCI and PowerPC. Also, like
post-r301453, there is no involvement of nexus.
- PICs attach and call OFW_BUS_REGISTER_PIC(). ofwbus receives these
messages and adds a (device_t, phandle_t) mapping to a second internal
table. Note that the interrupt layer does not need to handle PIC
registration anymore at all (except for the root PIC).
- Bus child eventually calls a function that tries to set the
interrupt up (e.g. bus_setup_intr()). That propagates up the bus
hierarchy, eventually getting to ofwbus. ofwbus notes the IRQ number,
looks it up in the table, looks up the appropriate PIC from the PIC
A) calls arm_set_pic_for_irq(irq, pic_device_t) -- this is the
interrupt layer's only interaction with the mapping code. All it deals
with is device_ts and abstract IRQ numbers.
B) calls PIC_MAP_OFW_INTR(pic, irq_number, interrupt-cells,
interrupt-specifier) to tell the PIC that the interrupt layer's IRQ
irq_number means the given specifier
C) finally, passes the call onto nexus, which will do whatever would
normally happen (unmasking the interrupt, setting handlers, etc.) in
terms only of the abstract IRQ and the device_t assigned by ofwbus.
You would implement ACPI just by doing a s/OFW/ACPI/g
search-and-replace above -- since the interrupt layer doesn't know
about OFW or ACPI or anything else, there is no need to touch it. This
seems clean, simple, compartmentalized, preserves the existing API,
and should work on all of our various hardware. PowerPC can't quite
work with it yet without some multipass foo, but, since the API is
preserved, that transition can happen gradually without KPI changes.
For the same reason that it is API-preserving, I think this code is
also MFC-able.
-Nathan
- please note that PIC API (in kern/pic_if.m) is different from the one
that PPC uses.
Yes, which is fine (this is machine-dependent code anyway). On
consideration, the PIC_MAP_OFW_INTR() function should probably be
called OFW_BUS_PIC_MAP_INTR() and live in ofw_bus_if.m anyway, then be
implemented by PICs.
Post by Michal Meloun
- we must be able to store configuration data (original interrupt
specifier) in all cases, not only for OFW. That's reason why I have
proposed
to create 'mapping table'.
It is stored in all cases, just not in the core interrupt code. I've
only mocked this up for OFW here. For ACPI, you would have the
equivalent table in acpi.c, along with ACPI_MAP_INTR(),
ACPI_REGISTER_PIC(), etc. in acpi_if.m and implemented in
dev/acpica/acpi.c.
For GPIOs, you would have a mechanism that just traces however you are
representing GPIOs anyway.
Post by Michal Meloun
Anyway, i think that we both talking about +/- same solution, only i
want to move it from OFW specific level implemented at OFW bus level to
bus/interrupt specifier format independent level implemented in
subr_intr.c.
This implements the same API in any case (identical to that
pre-r301453), so the implementation doesn't really matter at all in
terms of my needs.
Perfect.
Post by Nathan Whitehorn
That said, having it in the root bus for the mapping domain (ofwbus0,
acpi0) etc. seems cleaner to me, with no loss of functionality. The
core interrupt code (subr_intr.c or whatever) doesn't have any obvious
need to know anything at all about the mapping information so long as
it knows the PIC device_t that corresponds to every abstract IRQ so
that it can mask it or do other operations. Since, presumably, nothing
in an ACPI device tree references an interrupt in the OFW device tree
(if you even had both -- and how would you specify that, anyway?),
implementing the relevant bits one step up the bus hierarchy doesn't
change any behavior.
And nothing can possibly be more flexible in terms of the mapping
table in subr_intr.c than not having a mapping table: you don't need
enums for mapping types, or unions that need to be expanded, breaking
KBI, or anything. You can delete all the PIC registration (and MSI)
code, all the switches, and all the unions from subr_intr.c and have a
totally mapping-mechanism-agnostic KPI.
Where you see "all the switches, and all the unions" in subr_intr.c?
subr_intr.c uses nested structures approach, in exactly same way as is
used in OFW bus. Moreover, subr_intr.c *IS* currently totally
mapping-mechanism-agnostic KPI, and any form of mapping table must
follow this rule.
That was hyperbolic; my apologies.
The point was that you don't need intr_map_data, or the
intr_map_data_type enum, this way. One of the nice things about that
is that you don't need to add more entries to the enum to add new
one-off mapping types. For example, the PS3 platform on PPC has its
own non-device-tree, non-ACPI bus description and interrupt system.
Needing to make new enums (or anything) like INTR_MAP_DATA_PS3 in
sys/bus.h seems a little silly, though hardly very bad.
Hmm, right.
Is something like <machine/_intr.h> solution for you?
Post by Nathan Whitehorn
I've attached a version of PowerPC's intr_machdep.c and pic_if.m that
fully implements this distributed-table scheme. The code ends up being
very short and clean (a third the number of lines of code as
kern/subr_intr.c, for example) and the diff to PowerPC, including the
new ofwbus.c code, ends up being net-negative.
Yep, I read it very carefully.
Post by Nathan Whitehorn
Post by Michal Meloun
- we must support 3 'main' cases - OFW, ACPI and HINTS (for older arm
and mips boards) based systems
- single kernel must support all above cases, but only one can by
'active' (kernel must be able to select right one at runtime).
- we must support 'direct' (interrupt specifier is defined by one of
above methods) or 'indirect' (where interrupt is associated with other
function - GPIO pin, but don't have direct description).
- we must be able to access single physical pin by both methods,
'direct' and/or 'indirect'.
- we must be able to add new interrupt type as simple as possible
Agreed with all of this.
Post by Michal Meloun
- single controller must be able to parse multiple formats. 'direct' or
'indirect'. OFW or ACPI. ARM GIC must accept OFW or ACPI arguments (in
single kernel), GPIO based interrupt controller must accept 'direct' or
'indirect' formats.
- 'direct' interrupts are easy
- driver must be able to consume 'indirect' interrupt in 'root
bus'/'mapping-mechanism' agnostic manner.
For example, MMC driver must be able to get interrupt from CD gpio
pin in all possible cases/combinations .
Are we still connected?
And this.
Post by Michal Meloun
I don't think that all this is possible without single universal format
of interrupt mapping specifier.
And, if we must have universal format then why we needs different
mapping databases?
It actually works just fine in this mapping-table-in-the-root-bus
model. If you have an OFW-type of interrupt, it is set up by ofwbus.c.
Since everything that would use an OFW interrupt is a child of ofwbus,
this is totally indistinguishable from a global table from the
perspective of client code. For ACPI interrupts, they are set up by
acpi.c, which is indistinguishable from a global table for the same
reason. For hints -- and single-domain systems generally -- you can
just have the machine-dependent code assume that interrupts it doesn't
explicitly know about belong to the root PIC. This is what you call
the "direct" case.
For the "indirect" case, there are as usual a few things you could do.
1. Set up one or more global registries for "indirect" PICs, with
corresponding mapping methods, either as bus methods on some
high-level bus or on the machine-specific nexus, or just a global
function that you call.
2. Use the global registry that you already have to have for GPIO
controllers and provide a mapping method on the GPIO controller
(GPIO_GET_INTERRUPT_FOR_PIN() or something) that returns a mapped IRQ.
I still don't think that this can works.
Assume that i have driver with 2 interrupts, one is standard (OFW
property based), second is GPIO, targeting same interrupt controller.
First is allocated by bus_alloc_resource(), second by
gpio_alloc_intr_resource(), so each resource is stored in different table.
Then bus_setup_intr() is called, on both. So bus_setup_intr() must be
able to select right table, right? But how?
Post by Nathan Whitehorn
One of the nice things about distributing the tables this way is that
you have lots of flexibility with things like these "indirect"
interrupts and are free to do things like #2 at will locally in some
machine-specific set of drivers. For example, in the PS3 code, I don't
need to add a new "PS3" mapping type *anywhere*. The ps3bus root
driver just has a table when it hands out interrupts and talks to
ps3pic internally.
-Nathan
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Most of your control flow can be implemented at general level as is, or
already exist [intr_map_irq(), intr_pic_register()] .
Also, impact for current PPC code is, by me, minimal.
I can sketch my idea in more details, if you found it acceptable.
All ideas are fine -- and the solution does not need to apply to all
platforms anyway, so long as the OFW_BUS_MAP_INTR() semantics (and,
ideally, API) are preserved.
I have prepared working patch (it's not full, but it works on Tegra),
https://github.com/strejda/tegra/commit/2cf72e248877fb917c4fc618bcb6e46b7c1058a4
can you, please, take look on it?

Also, please, take in account:
- we have, currently, 20+ interrupt controllers converted to new INTRNG
PIC API.
- universal format of interrupt resources is generic part of this API.
- I'm not ready to commit any PIC API change together with above patch -
i hope that this is understandable.

also,
"we must be able to add new interrupt type as simple as possible" rule
is very important for me. Adding new table (with implementation), one
bus method and one PIC method for each new type is not *simple*.

In any case, can we concentrate to above patch first? I'm ready to
finish it in next few hours, then put it to phabricator for real review.

Michal
Nathan Whitehorn
2016-08-06 16:44:27 UTC
Permalink
[snip]
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Post by Nathan Whitehorn
------------ The following is a large parenthesis
-------------------
One other possible route here that would also work well would be to
make ofwbus.c MD (it's a trivial piece of code anyway, so we don't
gain a lot by sharing it) and implement ofw_bus_map_intr() locally as
an ofwbus bus method. Then you could have the mapping table stored in
ofwbus's softc -- the API was designed for this initially. You would
need MD extensions for doing PIC registration there (which is fine),
but that segregates all the OFW-specific information into
OFW-specific code and would let the bus methods and the OFW interrupt
mapping table interact cleanly in the same place. This still
preserves the pre-r301453 API, makes PowerPC work, and maybe
interrupt code know about FDT (or ACPI). I'd be happy to mock this up
as well if you think it's a good approach.
[snip]
Post by Nathan Whitehorn
- Existing OFW API and semantics unchanged
- As such, PowerPC, PCI, etc. work fine with no changes
- Details encapsulated in MD code, so individual platforms can
implement this however they like
- arm/arm/intr.c (or whatever) only needs a method to allocate a
fresh interrupt, with no state, and anoter to set the device_t for an
interrupt sometime later.
- The internal table in the platform interrupt code has no knowledge
of any mappings whatsoever except having the appropriate device_t for
the PIC stored with the interrupt vector.
- Device tree bits handled purely in device tree code
- No action need be taken on any mapping until the interrupt is
actually allocated/set up, like r301453
- Easy to add more mapping mechanisms (e.g. ACPI) by having similar
enumeration-mechanism-specific code in the root bus for that mapping
mechanism.
-------------- End parenthesis -------------------------------
Here's an implementation of the parenthesis I wrote on an airplane
this afternoon. It should be complete, though has not been tested. The
code is short and simple (+70 lines in ofwbus.c). This preserves the
pre-r301453 API and semantics relative to drivers, which means PowerPC
and PCI work out of the box, while keeping the semantics relative to
the interrupt layer of r301453 (PIC methods only called on resource
allocation, no allocatable IRQs on unattached PICs, encapsulation of
OFW-specific code in OFW-specific bits of the tree). It turns out
those two things are compatible, somewhat to my surprise, and that
makes the result very clean. I like this approach and would be happy
1. OFW_BUS_MAP_INTR(). This has the semantics and API it has now: you
pass an interrupt specifier and parent, you get back an IRQ. No
changes. This is the core of the normal OFW interrupt API.
2. OFW_BUS_REGISTER_PIC(device_t pic, phandle_t phandle). This is a
new function that PIC drivers are supposed to use to register control
of an interrupt domain. This replaces machine-specific code like
powerpc_register_pic() to allow the PIC table to be in a bus parent
rather than in the interrupt core.
3. PIC_MAP_OFW_INTR(device_t pic, int irq, interrupt specifier). This
is a new function that PIC drivers that know how to handle device-tree
interrupt descriptors implement (analogous to various existing ones
that vary by platform). It tells the PIC that the given abstract IRQ
means the given opaque interrupt specifier.
4. arm_allocate_irq(int suggested). This allocates a new IRQ number
not (yet) attached to a PIC, as in r301453. I've added a parameter
that lets you pass a suggested number to try in case it is possible to
make it match an interrupt pin or something for human-readability.
5. arm_set_pic_for_irq(int irq, device_t). This tells the MD interrupt
layer to associate a given PIC device_t with an IRQ. That is all the
information the MD layer ever has about the IRQ mapping.
Functions #1 and #2 are now implemented completely in ofwbus.c: there
are no callouts anywhere else and the interrupt mapping table is
maintained entirely internally to ofwbus (in its softc). In order to
implement ACPI, or some other strategy, you would implement analogs to
functions #1 and #2 that live somewhere in the bus hierarchy that is
guaranteed to be above all devices that might want that type of
interrupt (e.g. in acpi0), and some analog to #3 that PIC drivers
implementing the mapping scheme would provide.
Since the system interrupt code has no knowledge at all of interrupt
mapping, of any type, in this scheme, adding new mapping types is
trivial and can be done on a driver-by-driver basis if necessary
without changing KPI and without any other part of the system even
being aware. For example, GPIOs can use a completely different
mechanism if they want and can do setup purely in the GPIO controller
driver. You could have a method GPIO_GET_INTERRUPT_FOR_PIN() on a GPIO
controller in which the GPIO controller allocates a generic IRQ,
assigns through some internal table just in the GPIO driver, and
returns to it to a consumer in some other device driver -- without a
GPIO mapping type, new bus functions, or modifications to the platform
interrupt code.
- Bus driver enumerates children, parses interrupts properties, calls
OFW_BUS_MAP_INTR() to get IRQs for them (as pre-r301453), adds to
interrupt list.
- ofwbus receives the OFW_BUS_MAP_INTR() call, allocates a blank
disconnected IRQ from the MD interrupt layer, and stores the mapping
from the new IRQ to the given interrupt specifier and phandle in an
internal table in ofwbus's softc.
NB: Nothing else happens here, like post-r301453. Changing this does
not change any semantics of the API pre-r301453, which means it
remains fully compatible with PCI and PowerPC. Also, like
post-r301453, there is no involvement of nexus.
- PICs attach and call OFW_BUS_REGISTER_PIC(). ofwbus receives these
messages and adds a (device_t, phandle_t) mapping to a second internal
table. Note that the interrupt layer does not need to handle PIC
registration anymore at all (except for the root PIC).
- Bus child eventually calls a function that tries to set the
interrupt up (e.g. bus_setup_intr()). That propagates up the bus
hierarchy, eventually getting to ofwbus. ofwbus notes the IRQ number,
looks it up in the table, looks up the appropriate PIC from the PIC
A) calls arm_set_pic_for_irq(irq, pic_device_t) -- this is the
interrupt layer's only interaction with the mapping code. All it deals
with is device_ts and abstract IRQ numbers.
B) calls PIC_MAP_OFW_INTR(pic, irq_number, interrupt-cells,
interrupt-specifier) to tell the PIC that the interrupt layer's IRQ
irq_number means the given specifier
C) finally, passes the call onto nexus, which will do whatever would
normally happen (unmasking the interrupt, setting handlers, etc.) in
terms only of the abstract IRQ and the device_t assigned by ofwbus.
You would implement ACPI just by doing a s/OFW/ACPI/g
search-and-replace above -- since the interrupt layer doesn't know
about OFW or ACPI or anything else, there is no need to touch it. This
seems clean, simple, compartmentalized, preserves the existing API,
and should work on all of our various hardware. PowerPC can't quite
work with it yet without some multipass foo, but, since the API is
preserved, that transition can happen gradually without KPI changes.
For the same reason that it is API-preserving, I think this code is
also MFC-able.
-Nathan
- please note that PIC API (in kern/pic_if.m) is different from the one
that PPC uses.
Yes, which is fine (this is machine-dependent code anyway). On
consideration, the PIC_MAP_OFW_INTR() function should probably be
called OFW_BUS_PIC_MAP_INTR() and live in ofw_bus_if.m anyway, then be
implemented by PICs.
Post by Michal Meloun
- we must be able to store configuration data (original interrupt
specifier) in all cases, not only for OFW. That's reason why I have
proposed
to create 'mapping table'.
It is stored in all cases, just not in the core interrupt code. I've
only mocked this up for OFW here. For ACPI, you would have the
equivalent table in acpi.c, along with ACPI_MAP_INTR(),
ACPI_REGISTER_PIC(), etc. in acpi_if.m and implemented in
dev/acpica/acpi.c.
For GPIOs, you would have a mechanism that just traces however you are
representing GPIOs anyway.
Post by Michal Meloun
Anyway, i think that we both talking about +/- same solution, only i
want to move it from OFW specific level implemented at OFW bus level to
bus/interrupt specifier format independent level implemented in
subr_intr.c.
This implements the same API in any case (identical to that
pre-r301453), so the implementation doesn't really matter at all in
terms of my needs.
Perfect.
Post by Nathan Whitehorn
That said, having it in the root bus for the mapping domain (ofwbus0,
acpi0) etc. seems cleaner to me, with no loss of functionality. The
core interrupt code (subr_intr.c or whatever) doesn't have any obvious
need to know anything at all about the mapping information so long as
it knows the PIC device_t that corresponds to every abstract IRQ so
that it can mask it or do other operations. Since, presumably, nothing
in an ACPI device tree references an interrupt in the OFW device tree
(if you even had both -- and how would you specify that, anyway?),
implementing the relevant bits one step up the bus hierarchy doesn't
change any behavior.
And nothing can possibly be more flexible in terms of the mapping
table in subr_intr.c than not having a mapping table: you don't need
enums for mapping types, or unions that need to be expanded, breaking
KBI, or anything. You can delete all the PIC registration (and MSI)
code, all the switches, and all the unions from subr_intr.c and have a
totally mapping-mechanism-agnostic KPI.
Where you see "all the switches, and all the unions" in subr_intr.c?
subr_intr.c uses nested structures approach, in exactly same way as is
used in OFW bus. Moreover, subr_intr.c *IS* currently totally
mapping-mechanism-agnostic KPI, and any form of mapping table must
follow this rule.
That was hyperbolic; my apologies.
The point was that you don't need intr_map_data, or the
intr_map_data_type enum, this way. One of the nice things about that
is that you don't need to add more entries to the enum to add new
one-off mapping types. For example, the PS3 platform on PPC has its
own non-device-tree, non-ACPI bus description and interrupt system.
Needing to make new enums (or anything) like INTR_MAP_DATA_PS3 in
sys/bus.h seems a little silly, though hardly very bad.
Hmm, right.
Is something like <machine/_intr.h> solution for you?
Or just the existing machine/intr_whatever_it_is.h.
Post by Nathan Whitehorn
I've attached a version of PowerPC's intr_machdep.c and pic_if.m that
fully implements this distributed-table scheme. The code ends up being
very short and clean (a third the number of lines of code as
kern/subr_intr.c, for example) and the diff to PowerPC, including the
new ofwbus.c code, ends up being net-negative.
Yep, I read it very carefully.
Post by Nathan Whitehorn
Post by Michal Meloun
- we must support 3 'main' cases - OFW, ACPI and HINTS (for older arm
and mips boards) based systems
- single kernel must support all above cases, but only one can by
'active' (kernel must be able to select right one at runtime).
- we must support 'direct' (interrupt specifier is defined by one of
above methods) or 'indirect' (where interrupt is associated with other
function - GPIO pin, but don't have direct description).
- we must be able to access single physical pin by both methods,
'direct' and/or 'indirect'.
- we must be able to add new interrupt type as simple as possible
Agreed with all of this.
Post by Michal Meloun
- single controller must be able to parse multiple formats. 'direct' or
'indirect'. OFW or ACPI. ARM GIC must accept OFW or ACPI arguments (in
single kernel), GPIO based interrupt controller must accept 'direct' or
'indirect' formats.
- 'direct' interrupts are easy
- driver must be able to consume 'indirect' interrupt in 'root
bus'/'mapping-mechanism' agnostic manner.
For example, MMC driver must be able to get interrupt from CD gpio
pin in all possible cases/combinations .
Are we still connected?
And this.
Post by Michal Meloun
I don't think that all this is possible without single universal format
of interrupt mapping specifier.
And, if we must have universal format then why we needs different
mapping databases?
It actually works just fine in this mapping-table-in-the-root-bus
model. If you have an OFW-type of interrupt, it is set up by ofwbus.c.
Since everything that would use an OFW interrupt is a child of ofwbus,
this is totally indistinguishable from a global table from the
perspective of client code. For ACPI interrupts, they are set up by
acpi.c, which is indistinguishable from a global table for the same
reason. For hints -- and single-domain systems generally -- you can
just have the machine-dependent code assume that interrupts it doesn't
explicitly know about belong to the root PIC. This is what you call
the "direct" case.
For the "indirect" case, there are as usual a few things you could do.
1. Set up one or more global registries for "indirect" PICs, with
corresponding mapping methods, either as bus methods on some
high-level bus or on the machine-specific nexus, or just a global
function that you call.
2. Use the global registry that you already have to have for GPIO
controllers and provide a mapping method on the GPIO controller
(GPIO_GET_INTERRUPT_FOR_PIN() or something) that returns a mapped IRQ.
I still don't think that this can works.
Assume that i have driver with 2 interrupts, one is standard (OFW
property based), second is GPIO, targeting same interrupt controller.
First is allocated by bus_alloc_resource(), second by
gpio_alloc_intr_resource(), so each resource is stored in different table.
Then bus_setup_intr() is called, on both. So bus_setup_intr() must be
able to select right table, right? But how?
The interrupt defined by the GPIO controller from a GPIO property (as
opposed to an interrupt property) could be defined, in general, in four
ways:

1. You have a bus method on the GPIO controller that returns a
pre-mapped IRQ. Something like GPIO_INTERRUPT_FOR_PIN(). The GPIO
properties I have seen don't have standard names so this would need to
be done by the child. As a result, you don't need it to work before PIC
attachment (or GPIO attachment) and can directly call methods on the
GPIO controller. This kind of API has the nice feature that the GPIO
controller can return errors early if that pin doesn't actually support
interrupts, which is not a concern with normal interrupts property.

The IRQ would be mapped internally by the controller when you do
GPIO_INTERRUPT_FOR_PIN() and so ofwbus.c would pass it through to nexus
(you'd need to remove a KASSERT from my code), which would then go to
the interrupt code, which would see the device_t assigned earlier and
pass through all operations appropriately.

2. Identical to the first paragraph of (1), except the bus method
resolves to an interrupt-parent, specifier pair passed to
OFW_BUS_MAP_INTR() rather than an internal mapping in the GPIO
controller. This proceeds as normal for OF interrupts thereafter. API is
the same as (1).

3. Some global helper function that transforms GPIO specifiers into some
other domain (OF, ACPI) without talking to the GPIO controller. I think
this is fragile and a bad idea in any case, though would also work as
well (or as badly) here as in any other scheme.

4. A GPIO interrupt domain with specifier, like you have currently.
You'd have to implement this mapping table in nexus (potentially as a
thin wrapper for some code in intr.c or intr_gpio.c or whatever) as a
global table as now or as in the first patch I sent. ofwbus.c would,
like (1) pass through such requests. This is a perfectly good interface.

So I don't think you lose anything by putting the table in ofwbus.c. All
four cases would have the same API no matter where the mapping table is.
The only non-idiomatic case is #4, which would require some shims, but
those shims are wholly identical to the shims you would need if you
didn't do OF/FDT interrupts this way.

The larger point is that, since all four cases would have identical APIs
in either case, it doesn't much matter how it is implemented now. If we
picked one method and wanted to change later, we lose nothing and the
change can be MFC-ed at will since it has no effect on KPI.
Post by Nathan Whitehorn
One of the nice things about distributing the tables this way is that
you have lots of flexibility with things like these "indirect"
interrupts and are free to do things like #2 at will locally in some
machine-specific set of drivers. For example, in the PS3 code, I don't
need to add a new "PS3" mapping type *anywhere*. The ps3bus root
driver just has a table when it hands out interrupts and talks to
ps3pic internally.
-Nathan
Post by Michal Meloun
Post by Nathan Whitehorn
Post by Michal Meloun
Most of your control flow can be implemented at general level as is, or
already exist [intr_map_irq(), intr_pic_register()] .
Also, impact for current PPC code is, by me, minimal.
I can sketch my idea in more details, if you found it acceptable.
All ideas are fine -- and the solution does not need to apply to all
platforms anyway, so long as the OFW_BUS_MAP_INTR() semantics (and,
ideally, API) are preserved.
I have prepared working patch (it's not full, but it works on Tegra),
https://github.com/strejda/tegra/commit/2cf72e248877fb917c4fc618bcb6e46b7c1058a4
can you, please, take look on it?
Change looks great to me at first pass. Thank you for working on it!
- we have, currently, 20+ interrupt controllers converted to new INTRNG
PIC API.
- universal format of interrupt resources is generic part of this API.
- I'm not ready to commit any PIC API change together with above patch -
i hope that this is understandable.
Yes, of course. Also, as far as I can tell, there's no reason to change
the PIC API on ARM/MIPS anyway from the current INTRNG API. I think
there are some minor simplifications you could do, but there's no need
for them, and you can implement absolutely any of the APIs we have
discussed in this thread in terms of the current pic_if.m with no problems.
also,
"we must be able to add new interrupt type as simple as possible" rule
is very important for me. Adding new table (with implementation), one
bus method and one PIC method for each new type is not *simple*.
Fair enough! I don't think we need that for, e.g., GPIOs (see cases 1-2
above), just for bus enumeration schemes (ACPI, OFW are probably the
only ones) that usually require a ton of this kind of thing anyway. But,
fundamentally, it doesn't matter. There are three important things from
my end:
1. That it is possible to, at bus enumeration time, permanently assign
an IRQ to an interrupt specifier from OFW/ACPI.
2. That that assignment not depend on having the PIC attached yet.
3. That the implementation details of that mechanism be reasonably
abstracted so that they can change later or vary platform to platform.

Whether mapping tables are in some central place (subr_intr.c) or in the
parent bus, how the PIC API works, whether they are stored in that table
in the form of a union or in different tables, doesn't matter for those
three at all. And, with a constant API (3) we can even change our minds
later without a lot of hassle.
In any case, can we concentrate to above patch first? I'm ready to
finish it in next few hours, then put it to phabricator for real review.
Sounds good. As I said, first pass looks perfect to me. I'll look in
detail once it hits phabricator.
-Nathan
Michal
Warner Losh
2016-08-06 16:58:48 UTC
Permalink
On Sat, Aug 6, 2016 at 10:44 AM, Nathan Whitehorn
Post by Nathan Whitehorn
Fair enough! I don't think we need that for, e.g., GPIOs (see cases 1-2
above), just for bus enumeration schemes (ACPI, OFW are probably the only
ones) that usually require a ton of this kind of thing anyway. But,
fundamentally, it doesn't matter. There are three important things from my
1. That it is possible to, at bus enumeration time, permanently assign an
IRQ to an interrupt specifier from OFW/ACPI.
2. That that assignment not depend on having the PIC attached yet.
3. That the implementation details of that mechanism be reasonably
abstracted so that they can change later or vary platform to platform.
Whether mapping tables are in some central place (subr_intr.c) or in the
parent bus, how the PIC API works, whether they are stored in that table in
the form of a union or in different tables, doesn't matter for those three
at all. And, with a constant API (3) we can even change our minds later
without a lot of hassle.
First, I hate mapping tables at the nexus, unless they are created
dynamically at run time. There's too much variation between boards,
SoCs, etc to have that code live in the nexus otherwise. They simply
don't scale. This board has interrupts 1-16 wired this way, but that
board didn't do that and has an external PIC. This SoC based on
Cortext A<whatever> uses the GPIC, while that one based on the
same Cortext A<whatever> chose to use Atmel's PIC. Perhaps I'm
misunderstanding something here as to what is meant by a table
though.

Next, In your list there's another dependency that's implicit
but maybe not called out. You can have PICs that cascade into
other PICs, or GPIO controllers that need to enable external
PIC-like things before they can route interrupts from things
that are downstream (interrupt wise) from them. Maybe I'm
just hung up on the phrase "the PIC" and it really means
"whatever complex thing or things handles getting the
interrupt routed to the CPU." I don't see this design so much
on basic eval boards, but do see it in more complex boards
that control complicated things.

Generally, though, I like the direction things are going.

Warner
Nathan Whitehorn
2016-08-06 19:51:48 UTC
Permalink
Post by Warner Losh
On Sat, Aug 6, 2016 at 10:44 AM, Nathan Whitehorn
Post by Nathan Whitehorn
Fair enough! I don't think we need that for, e.g., GPIOs (see cases 1-2
above), just for bus enumeration schemes (ACPI, OFW are probably the only
ones) that usually require a ton of this kind of thing anyway. But,
fundamentally, it doesn't matter. There are three important things from my
1. That it is possible to, at bus enumeration time, permanently assign an
IRQ to an interrupt specifier from OFW/ACPI.
2. That that assignment not depend on having the PIC attached yet.
3. That the implementation details of that mechanism be reasonably
abstracted so that they can change later or vary platform to platform.
Whether mapping tables are in some central place (subr_intr.c) or in the
parent bus, how the PIC API works, whether they are stored in that table in
the form of a union or in different tables, doesn't matter for those three
at all. And, with a constant API (3) we can even change our minds later
without a lot of hassle.
First, I hate mapping tables at the nexus, unless they are created
dynamically at run time. There's too much variation between boards,
SoCs, etc to have that code live in the nexus otherwise. They simply
don't scale. This board has interrupts 1-16 wired this way, but that
board didn't do that and has an external PIC. This SoC based on
Cortext A<whatever> uses the GPIC, while that one based on the
same Cortext A<whatever> chose to use Atmel's PIC. Perhaps I'm
misunderstanding something here as to what is meant by a table
though.
The table in question is just a mapping of abstract IRQ numbers to the
corresponding interrupt specifier and parent.

For example,
5: <&gicX, 7 2>

Where 5 is the (potentially arbitrary) assigned number for the system
corresponding to whatever <7 2> means on gic0. The table is built on the
fly so that OFW bus nodes can specify interrupts as something <&gicX, 7
2> and get back scalars that work with rman and the PCI APIs and all the
other places that want definitions of interrupts as scalars. It
naturally handles all of your examples.

The discussion here is whether to:
a) keep an OFW/FDT-specific table like this in ofwbus.c (and analogs for
ACPI, etc. in their respective places)
b) keep a table that maps numbers to some union of OFW/ACPI/whatever
data in nexus or intr.c or some global place

It doesn't matter much from a functionality perspective (or an API
perspective) either way. The question is more about which is easier to
maintain and extend long-term. Since the API doesn't change either way,
we're free to decide incorrectly and revisit it at will later with
little consequence.
Post by Warner Losh
Next, In your list there's another dependency that's implicit
but maybe not called out. You can have PICs that cascade into
other PICs, or GPIO controllers that need to enable external
PIC-like things before they can route interrupts from things
that are downstream (interrupt wise) from them. Maybe I'm
just hung up on the phrase "the PIC" and it really means
"whatever complex thing or things handles getting the
interrupt routed to the CPU." I don't see this design so much
on basic eval boards, but do see it in more complex boards
that control complicated things.
Yes. We've supported this forever on PPC (where it is very common) and
it works here too. The implementation is really simple.

First, here's an explanation of the general interrupt mapping system,
for context:

Let's suppose you have some interrupt hierarchy like this:

CPU <- pic0 <- some device

PIC0 has a bunch of interrupt pins. As you enumerate the system, the
code encounters interrupt specifiers like <&pic0, 3 1>, <&pic0, 4, 1>,
<&pic1, 3, 2>. Through OFW_BUS_MAP_INTR(), the bus enumeration code
turns these into some arbitrary scalar IRQs assigned by
OFW_BUS_MAP_INTR(). The mapping from the (interrupt parent, specifier)
tuple to that assigned scalar IRQ is stored somewhere (see above) for later.

When the bus devices that use those IRQs attach, they call
bus_activate_resource() and bus_setup_intr(). At this point, some code
close to the root of the hierarchy (again, see above for exactly where)
looks up the corresponding vector specifier in the table, looks up the
registered PIC corresponding to the interrupt parent key, and tells the
driver attached to the interrupt parent to think of the scalar IRQ as
meaning whatever string of numbers it saw earlier. It also tells the
machine-dependent interrupt layer that the given scalar IRQ is handled
by the device_t for the PIC so that it can ask the PIC to
mask/unmask/bind/etc. the interrupt.

When PICs attach, they register themselves with whatever bus layer is
doing this mapping to say that a given interrupt parent key (the phandle
for OFW/FDT) corresponds to the device_t for the PIC. The PIC attached
directly to the CPU's interrupt pin[s] (pic0 here) also registers itself
somehow with the MD interrupt system. On PPC, there is a global device_t
called "root_pic" that the driver sets.

When "some device" signals an interrupt, pic0 (the hardware) signals the
CPU. The CPU signals the kernel. The MD interrupt layer notices and
calls the driver for pic0 (the root PIC). That code inspects whatever
registers on PIC0 give it the interrupt line and then signals the MD
interrupt layer (let's say by calling a function called
arm_dispatch_irq(int irq)) with the corresponding scalar IRQ. That then
invokes the corresponding filters and/or ithreads.

So how does this change with cascaded PICs? Very little. Let's suppose
you have some interrupt hierarchy like this:

CPU <- pic0 <- pic1 <- some device

PIC0 has a bunch of interrupt pins, one of which is connected to pic1,
which provides a bunch more interrupt pins.
Both pic0 and pic1 register themselves with the bus layer by the
appropriate handles and get their mask/unmask/bind functions called by
the interrupt layer during interrupt setup for interrupts they handle.
PIC0 registers itself as the root during attachment because it notices
that it does not have any interrupts in its resource list. Child PICs,
on the other hand, have interrupt properties in the FDT corresponding to
the pin on pic0 that is attached to pic1. These get assigned as normal
through newbus and the child PIC (pic1) calls bus_setup_intr and
registers a filter handler.

When "some device" signals an interrupt, pic1 drives a line on pic0,
which drives a line on the CPU. The driver for PIC0 calls
arm_dispatch_irq() with the IRQ corresponding to pic1, which calls
pic1's interrupt handler. Since filter handlers run in primary interrupt
context, pic1's interrupt handler can look exactly like what pic0 does
when signalled by the interrupt layer: it runs through its registers,
finds any lines with active interrupts, then calls arm_dispatch_irq()
with the appropriate corresponding scalar IRQ.

The nice thing here is that cascaded PICs require zero special handling
by the system. You can just treat them as normal interrupts, with no
special methods required in the bus layer, the interrupt system, or the
PIC driver.
-Nathan
Post by Warner Losh
Generally, though, I like the direction things are going.
Warner
Nathan Whitehorn
2016-08-12 15:31:06 UTC
Permalink
On 08/06/16 07:30, Michal Meloun wrote:

[snip]
Post by Michal Meloun
I have prepared working patch (it's not full, but it works on Tegra),
https://github.com/strejda/tegra/commit/2cf72e248877fb917c4fc618bcb6e46b7c1058a4
can you, please, take look on it?
- we have, currently, 20+ interrupt controllers converted to new INTRNG
PIC API.
- universal format of interrupt resources is generic part of this API.
- I'm not ready to commit any PIC API change together with above patch -
i hope that this is understandable.
One other non-urgent question about PCI code:

There's a new function ofw_bus_msimap() that does not seem to implement
any particular part of any real binding standard. There's a .txt file in
the device-tree repo, but many (most?, all?) PCI bridges don't seem to
implement MSI that way. This is out-of-scope for the immediate
discussion, but it would be good to fix later. If there are indeed only
a handful of bridges that do MSI that way, it should probably be moved
into the PCI bridge drivers that do use it.

Similarly, dev/pci/pci_host_generic.c isn't actually generic and is
instead a driver for some particular ARM bridges. It should be moved at
some point under sys/arm. Most of the code in it also duplicates
dev/ofw/ofwpci.c (but with some added bugs in handling the "ranges"
property).
Post by Michal Meloun
also,
"we must be able to add new interrupt type as simple as possible" rule
is very important for me. Adding new table (with implementation), one
bus method and one PIC method for each new type is not *simple*.
In any case, can we concentrate to above patch first? I'm ready to
finish it in next few hours, then put it to phabricator for real review.
Have you had a chance to finish this? Is there a way I can help? I think
non-MSI PCI interrupts are currently broken on ARM and it would be great
to get that fixed before 11.0.
-Nathan
Andrew Turner
2016-08-16 09:37:46 UTC
Permalink
On Fri, 12 Aug 2016 08:31:06 -0700
Post by Nathan Whitehorn
There's a new function ofw_bus_msimap() that does not seem to
implement any particular part of any real binding standard. There's
a .txt file in the device-tree repo, but many (most?, all?) PCI
bridges don't seem to implement MSI that way. This is out-of-scope
for the immediate discussion, but it would be good to fix later. If
there are indeed only a handful of bridges that do MSI that way, it
should probably be moved into the PCI bridge drivers that do use it.
The ofw_bus_msimap() implements the standard FDT MSI properties to
find the needed MSI/MSI-X controller. See [1] for the binding document.
Post by Nathan Whitehorn
Similarly, dev/pci/pci_host_generic.c isn't actually generic and is
instead a driver for some particular ARM bridges. It should be moved
at some point under sys/arm. Most of the code in it also duplicates
dev/ofw/ofwpci.c (but with some added bugs in handling the "ranges"
property).
I don't see why. There is nothing ARM specific in it. I also have
patches to use it with ACPI as the existing driver makes assumptions
about PCI that may not be true on all platforms.

Andrew

[1]
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/pci/pci-msi.txt
Nathan Whitehorn
2016-08-16 12:36:28 UTC
Permalink
Post by Andrew Turner
On Fri, 12 Aug 2016 08:31:06 -0700
Post by Nathan Whitehorn
There's a new function ofw_bus_msimap() that does not seem to
implement any particular part of any real binding standard. There's
a .txt file in the device-tree repo, but many (most?, all?) PCI
bridges don't seem to implement MSI that way. This is out-of-scope
for the immediate discussion, but it would be good to fix later. If
there are indeed only a handful of bridges that do MSI that way, it
should probably be moved into the PCI bridge drivers that do use it.
The ofw_bus_msimap() implements the standard FDT MSI properties to
find the needed MSI/MSI-X controller. See [1] for the binding document.
Yes, but ThunderX seems to be the only host bridge that actually
implements those bindings. I've never run into another, certainly, and
there don't seem to be any others in the tree.
Post by Andrew Turner
Post by Nathan Whitehorn
Similarly, dev/pci/pci_host_generic.c isn't actually generic and is
instead a driver for some particular ARM bridges. It should be moved
at some point under sys/arm. Most of the code in it also duplicates
dev/ofw/ofwpci.c (but with some added bugs in handling the "ranges"
property).
I don't see why. There is nothing ARM specific in it. I also have
patches to use it with ACPI as the existing driver makes assumptions
about PCI that may not be true on all platforms.
There's an "arm,gem5_pcie" in there that is certainly suggestive of
ARM-ness. At any rate, much like the MSI bits, I could imagine multiple
controllers implementing the interface defined in this driver but there
do not seem, at present, to be any. Most of the code is indeed generic
but duplicates ofwpci. That code should be replaced through inheritance
with the ofwpci.c equivalents; the only remaining bits are MSI
allocation and config space access, which seem to be specific to a few
kinds of ARM hardware.
-Nathan
Post by Andrew Turner
Andrew
[1]
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/pci/pci-msi.txt
Loading...