Discussion:
RF_CACHEABLE flag
Justin Hibbits
2016-02-22 01:42:49 UTC
Permalink
The Freescale/NXP Datapath Acceleration Architecture uses both
cache-inhibited and cache-enabled memory regions for buffer portals.
This doesn't quite fit right into the existing framework, so I've
added to my personal repo (on github) a RF_CACHEABLE flag to be used
by this. Now that I'm ready to commit the driver to head, I want to
float this on -arch to get opinions.

I did consider another route, using bus_space_map()/bus_space_unmap(),
and stashing sizes around, but adding a simple flag to rman would take
care of all the details, and rman already knows all the other details
for the region anyway.

I put the diff on phabricator, at https://reviews.freebsd.org/D5384 .

Thoughts on this?

- Justin
Konstantin Belousov
2016-02-22 12:18:36 UTC
Permalink
Post by Justin Hibbits
The Freescale/NXP Datapath Acceleration Architecture uses both
cache-inhibited and cache-enabled memory regions for buffer portals.
This doesn't quite fit right into the existing framework, so I've
added to my personal repo (on github) a RF_CACHEABLE flag to be used
by this. Now that I'm ready to commit the driver to head, I want to
float this on -arch to get opinions.
I did consider another route, using bus_space_map()/bus_space_unmap(),
and stashing sizes around, but adding a simple flag to rman would take
care of all the details, and rman already knows all the other details
for the region anyway.
I put the diff on phabricator, at https://reviews.freebsd.org/D5384 .
Thoughts on this?
Not making any opinion about RF_CACHEABLE, only pointing out some facts
that might be interesting to you. On x86, some BARs need specific memory
access modes at least for performance.

E.g., for Intel GPUs, there is a combined BAR where the first 512KB or
2M maps the registers and must be uncacheable, and the rest of the BAR
maps GTT. To get a reasonable performance from the graphics hardware,
GTT should be mapped as write-combining (i.e. not cacheable but writes
may sit in the combining buffers and flushed on serialization points).

Look at dev/agp/agp_i810.c:agp_gen4_install_gatt() to see direct
pmap_change_attr(WC) call to set it up.

No flag could accomodate this mode. OTOH, why couldn't you explicitely
add pmap_change_attr() to the driver of your device ?
Justin Hibbits
2016-02-23 20:19:57 UTC
Permalink
On Mon, Feb 22, 2016 at 6:18 AM, Konstantin Belousov
Post by Konstantin Belousov
Post by Justin Hibbits
The Freescale/NXP Datapath Acceleration Architecture uses both
cache-inhibited and cache-enabled memory regions for buffer portals.
This doesn't quite fit right into the existing framework, so I've
added to my personal repo (on github) a RF_CACHEABLE flag to be used
by this. Now that I'm ready to commit the driver to head, I want to
float this on -arch to get opinions.
I did consider another route, using bus_space_map()/bus_space_unmap(),
and stashing sizes around, but adding a simple flag to rman would take
care of all the details, and rman already knows all the other details
for the region anyway.
I put the diff on phabricator, at https://reviews.freebsd.org/D5384 .
Thoughts on this?
Not making any opinion about RF_CACHEABLE, only pointing out some facts
that might be interesting to you. On x86, some BARs need specific memory
access modes at least for performance.
E.g., for Intel GPUs, there is a combined BAR where the first 512KB or
2M maps the registers and must be uncacheable, and the rest of the BAR
maps GTT. To get a reasonable performance from the graphics hardware,
GTT should be mapped as write-combining (i.e. not cacheable but writes
may sit in the combining buffers and flushed on serialization points).
Look at dev/agp/agp_i810.c:agp_gen4_install_gatt() to see direct
pmap_change_attr(WC) call to set it up.
No flag could accomodate this mode. OTOH, why couldn't you explicitely
add pmap_change_attr() to the driver of your device ?
Already mentioned this on IRC yesterday, but best for me to follow up here.

This really isn't much different from bus_space_map() conceptually.
The work involved is pretty much the same, and if this route is deemed
the Wrong Way(TM), I'll go that route.

Grepping through sys/, only x86 currently implements
pmap_change_attr(), and arm has the declaration but no definition that
I could see. Writing it wouldn't be difficult of course, there's just
not much compelling case for it right now. But, yes, either of these
alternatives are acceptable, and I had explored it. Seeing John
Baldwin's comment on the phab review, it looks like he wants to go a
different (parallel) direction.

- Justin
Konstantin Belousov
2016-02-24 10:27:54 UTC
Permalink
Post by Justin Hibbits
This really isn't much different from bus_space_map() conceptually.
The work involved is pretty much the same, and if this route is deemed
the Wrong Way(TM), I'll go that route.
Grepping through sys/, only x86 currently implements
pmap_change_attr(), and arm has the declaration but no definition that
I could see. Writing it wouldn't be difficult of course, there's just
not much compelling case for it right now. But, yes, either of these
alternatives are acceptable, and I had explored it. Seeing John
Baldwin's comment on the phab review, it looks like he wants to go a
different (parallel) direction.
If this was not clear from my reply, I did not objected against RF_CACHEABLE,
but was more to highlight weird needs of seemingly simple architecture,
which are close to RF_CACHEABLE stuff. And, I think that architectures
that care about caching modes, should do provide real pmap_change_attr()
implementation. This might be important e.g. if drm is brought up on
these platforms.
Adrian Chadd
2016-02-24 17:14:17 UTC
Permalink
Post by Konstantin Belousov
Post by Justin Hibbits
This really isn't much different from bus_space_map() conceptually.
The work involved is pretty much the same, and if this route is deemed
the Wrong Way(TM), I'll go that route.
Grepping through sys/, only x86 currently implements
pmap_change_attr(), and arm has the declaration but no definition that
I could see. Writing it wouldn't be difficult of course, there's just
not much compelling case for it right now. But, yes, either of these
alternatives are acceptable, and I had explored it. Seeing John
Baldwin's comment on the phab review, it looks like he wants to go a
different (parallel) direction.
If this was not clear from my reply, I did not objected against RF_CACHEABLE,
but was more to highlight weird needs of seemingly simple architecture,
which are close to RF_CACHEABLE stuff. And, I think that architectures
that care about caching modes, should do provide real pmap_change_attr()
implementation. This might be important e.g. if drm is brought up on
these platforms.
heh, on ARM it's not "when". For MIPS it's also now "when". So I guess
yeah, we should implement it.



-a
Post by Konstantin Belousov
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Svatopluk Kraus
2016-02-29 16:18:16 UTC
Permalink
Post by Adrian Chadd
Post by Konstantin Belousov
Post by Justin Hibbits
This really isn't much different from bus_space_map() conceptually.
The work involved is pretty much the same, and if this route is deemed
the Wrong Way(TM), I'll go that route.
Grepping through sys/, only x86 currently implements
pmap_change_attr(), and arm has the declaration but no definition that
I could see. Writing it wouldn't be difficult of course, there's just
not much compelling case for it right now. But, yes, either of these
alternatives are acceptable, and I had explored it. Seeing John
Baldwin's comment on the phab review, it looks like he wants to go a
different (parallel) direction.
If this was not clear from my reply, I did not objected against RF_CACHEABLE,
but was more to highlight weird needs of seemingly simple architecture,
which are close to RF_CACHEABLE stuff. And, I think that architectures
that care about caching modes, should do provide real pmap_change_attr()
implementation. This might be important e.g. if drm is brought up on
these platforms.
heh, on ARM it's not "when". For MIPS it's also now "when". So I guess
yeah, we should implement it.
It's not so simple to change memory attributes on ARM. Some conditions
must be met. So, a question is - in which circumstances
pmap_change_attr() is used?

It's defined like
int pmap_change_attr(vm_offset_t va, vm_size_t size, int mode);

(1) As memory attributes can be changed on a page basis only, the va
and size are arranged according to that in i386 implementation. That's
okay.

(2) Can the memory be used by somebody else while the attributes are
being changed? In other words, can the memory be unmapped temporary?
Konstantin Belousov
2016-02-29 16:35:46 UTC
Permalink
Post by Svatopluk Kraus
It's not so simple to change memory attributes on ARM. Some conditions
must be met. So, a question is - in which circumstances
pmap_change_attr() is used?
It's defined like
int pmap_change_attr(vm_offset_t va, vm_size_t size, int mode);
(1) As memory attributes can be changed on a page basis only, the va
and size are arranged according to that in i386 implementation. That's
okay.
(2) Can the memory be used by somebody else while the attributes are
being changed? In other words, can the memory be unmapped temporary?
Is this for the change of pte through invalidation ? In other words,
do you mean, is it fine to temporary unmap the range during the
pmap_change_attr() execution ?

If yes, it is fine for uses of the function in the DRM code, since
it is utilized during a setup of things like GTT or buffers, and no
other accesses to the memory could happen until the setup is finished.

I noted that function is used by several network drivers as well, and
by ntb. It seem that cxgbe and mxge also use it during setup.
Svatopluk Kraus
2016-02-29 22:20:15 UTC
Permalink
On Mon, Feb 29, 2016 at 5:35 PM, Konstantin Belousov
Post by Konstantin Belousov
Post by Svatopluk Kraus
It's not so simple to change memory attributes on ARM. Some conditions
must be met. So, a question is - in which circumstances
pmap_change_attr() is used?
It's defined like
int pmap_change_attr(vm_offset_t va, vm_size_t size, int mode);
(1) As memory attributes can be changed on a page basis only, the va
and size are arranged according to that in i386 implementation. That's
okay.
(2) Can the memory be used by somebody else while the attributes are
being changed? In other words, can the memory be unmapped temporary?
Is this for the change of pte through invalidation ? In other words,
do you mean, is it fine to temporary unmap the range during the
pmap_change_attr() execution ?
Yep, the question was about that.
Post by Konstantin Belousov
If yes, it is fine for uses of the function in the DRM code, since
it is utilized during a setup of things like GTT or buffers, and no
other accesses to the memory could happen until the setup is finished.
I noted that function is used by several network drivers as well, and
by ntb. It seem that cxgbe and mxge also use it during setup.
The pmap_change_attr() was implemented in (new) armv6 pmap, but was
removed before the pmap was committed. It was not sure when the
function is used, and if even implemented right. It was about two
years ago. Now, with better knowledge and information you provided, it
should be possible to implement it again.

Loading...