Discussion:
a question about BUS_DMA_MIN_ALLOC_COMP flag meaning
Svatopluk Kraus
2015-11-18 16:00:49 UTC
Permalink
Hi,

I have fallen to some problem with inconsistent use of
BUS_DMA_MIN_ALLOC_COMP flag. This flag was introduced in x86 MD code
very very long ago and so, the problem covers all archs which came out
from it.

However, it's only about bus_dma_tag_t with BUS_DMA_COULD_BOUNCE flag set.

(1) When bus_dma_tag_t is being created with BUS_DMA_ALLOCNOW flag
specified, some bounce pages could be allocated in advance and
BUS_DMA_MIN_ALLOC_COMP flag is set to the tag. The bounce pages are
allocated only if the tag's maxsize property is higher than size of
all bounce pages already allocated in a bounce zone.

(2) When bus_dmamap_t is being created, then if BUS_DMA_MIN_ALLOC_COMP
is not set on associated tag, some bounce pages are ALWAYS allocated
and BUS_DMA_MIN_ALLOC_COMP is set afterwards,

(3) else some bounce pages could be allocated if there is not enough
pages in a bounce zone and BUS_DMA_MIN_ALLOC_COMP is set afterwards.

The problem is the following. Due to case (2), the number of pages in
bounce zone can grow infinitely, as bounce pages once allocated are
never freed. It can happen when a big number of bus_dma_tag_t together
with bus_dmamap_t are created, or they are created dynamically either
because of a loadable module or by design.

The inconsistency is that when bus_dma_tag_t is being created, there
is no limit for how much pages could be allocated. On the other hand,
when bus_dmamap_t is being created, there is MAX_BPAGES limitation.

I think that fix for case (2) presented as x86 fix is the following:

diff --git a/sys/x86/x86/busdma_bounce.c b/sys/x86/x86/busdma_bounce.c
index 4826a2b..a15139f 100644
--- a/sys/x86/x86/busdma_bounce.c
+++ b/sys/x86/x86/busdma_bounce.c
@@ -308,7 +308,7 @@ bounce_bus_dmamap_create(bus_dma_tag_t dmat, int
flags, bus_dmamap_t *mapp)
else
maxpages = MIN(MAX_BPAGES, Maxmem -
atop(dmat->common.lowaddr));
- if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 ||
+ if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 &&
(bz->map_count > 0 && bz->total_bpages < maxpages)) {
pages = MAX(atop(dmat->common.maxsize), 1);
pages = MIN(maxpages - bz->total_bpages, pages);


IMO, it also fixes logic by making it same as in bus_dma_tag_t case.

The next question is, if case (1) should be limited by MAX_BPAGES as
in case (3) or maybe better if there should be some internal
limitation for bounce zone itself.

As this is quite old code which covers many archs, I really would
appreciate your comments. Thanks,

Svatopluk Kraus
Konstantin Belousov
2015-11-20 14:45:44 UTC
Permalink
Post by Svatopluk Kraus
Hi,
I have fallen to some problem with inconsistent use of
BUS_DMA_MIN_ALLOC_COMP flag. This flag was introduced in x86 MD code
very very long ago and so, the problem covers all archs which came out
from it.
However, it's only about bus_dma_tag_t with BUS_DMA_COULD_BOUNCE flag set.
(1) When bus_dma_tag_t is being created with BUS_DMA_ALLOCNOW flag
specified, some bounce pages could be allocated in advance and
BUS_DMA_MIN_ALLOC_COMP flag is set to the tag. The bounce pages are
allocated only if the tag's maxsize property is higher than size of
all bounce pages already allocated in a bounce zone.
(2) When bus_dmamap_t is being created, then if BUS_DMA_MIN_ALLOC_COMP
is not set on associated tag, some bounce pages are ALWAYS allocated
and BUS_DMA_MIN_ALLOC_COMP is set afterwards,
(3) else some bounce pages could be allocated if there is not enough
pages in a bounce zone and BUS_DMA_MIN_ALLOC_COMP is set afterwards.
The problem is the following. Due to case (2), the number of pages in
bounce zone can grow infinitely, as bounce pages once allocated are
never freed. It can happen when a big number of bus_dma_tag_t together
with bus_dmamap_t are created, or they are created dynamically either
because of a loadable module or by design.
The inconsistency is that when bus_dma_tag_t is being created, there
is no limit for how much pages could be allocated. On the other hand,
when bus_dmamap_t is being created, there is MAX_BPAGES limitation.
diff --git a/sys/x86/x86/busdma_bounce.c b/sys/x86/x86/busdma_bounce.c
index 4826a2b..a15139f 100644
--- a/sys/x86/x86/busdma_bounce.c
+++ b/sys/x86/x86/busdma_bounce.c
@@ -308,7 +308,7 @@ bounce_bus_dmamap_create(bus_dma_tag_t dmat, int
flags, bus_dmamap_t *mapp)
else
maxpages = MIN(MAX_BPAGES, Maxmem -
atop(dmat->common.lowaddr));
- if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 ||
+ if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 &&
(bz->map_count > 0 && bz->total_bpages < maxpages)) {
pages = MAX(atop(dmat->common.maxsize), 1);
pages = MIN(maxpages - bz->total_bpages, pages);
IMO, it also fixes logic by making it same as in bus_dma_tag_t case.
I think that this patch is correct.
Post by Svatopluk Kraus
The next question is, if case (1) should be limited by MAX_BPAGES as
in case (3) or maybe better if there should be some internal
limitation for bounce zone itself.
Could we apply e.g. MAX_BPAGES limit to bounce zones, or should we allow
the limit to change based on the tag ? But I am not sure if there is
any reasonable way to formulate the limit.

MAX_BPAGES looks like some arbitrary sanity limit, e.g. we could have
unlimited maxsize, but also have an alignment constraints, and then
tag requires bouncing. I am not sure that hard-coded values, esp. the
amd64 32MB limit, makes much sense, or that basing the limit on the
tag constraints makes more sense. Might be, we should allow some total
percentage of the physical memory on machine to be consumed by all
bounce zones altogether ?
Svatopluk Kraus
2015-11-23 13:18:43 UTC
Permalink
On Fri, Nov 20, 2015 at 3:45 PM, Konstantin Belousov
Post by Konstantin Belousov
Post by Svatopluk Kraus
Hi,
I have fallen to some problem with inconsistent use of
BUS_DMA_MIN_ALLOC_COMP flag. This flag was introduced in x86 MD code
very very long ago and so, the problem covers all archs which came out
from it.
However, it's only about bus_dma_tag_t with BUS_DMA_COULD_BOUNCE flag set.
(1) When bus_dma_tag_t is being created with BUS_DMA_ALLOCNOW flag
specified, some bounce pages could be allocated in advance and
BUS_DMA_MIN_ALLOC_COMP flag is set to the tag. The bounce pages are
allocated only if the tag's maxsize property is higher than size of
all bounce pages already allocated in a bounce zone.
(2) When bus_dmamap_t is being created, then if BUS_DMA_MIN_ALLOC_COMP
is not set on associated tag, some bounce pages are ALWAYS allocated
and BUS_DMA_MIN_ALLOC_COMP is set afterwards,
(3) else some bounce pages could be allocated if there is not enough
pages in a bounce zone and BUS_DMA_MIN_ALLOC_COMP is set afterwards.
The problem is the following. Due to case (2), the number of pages in
bounce zone can grow infinitely, as bounce pages once allocated are
never freed. It can happen when a big number of bus_dma_tag_t together
with bus_dmamap_t are created, or they are created dynamically either
because of a loadable module or by design.
The inconsistency is that when bus_dma_tag_t is being created, there
is no limit for how much pages could be allocated. On the other hand,
when bus_dmamap_t is being created, there is MAX_BPAGES limitation.
diff --git a/sys/x86/x86/busdma_bounce.c b/sys/x86/x86/busdma_bounce.c
index 4826a2b..a15139f 100644
--- a/sys/x86/x86/busdma_bounce.c
+++ b/sys/x86/x86/busdma_bounce.c
@@ -308,7 +308,7 @@ bounce_bus_dmamap_create(bus_dma_tag_t dmat, int
flags, bus_dmamap_t *mapp)
else
maxpages = MIN(MAX_BPAGES, Maxmem -
atop(dmat->common.lowaddr));
- if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 ||
+ if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 &&
(bz->map_count > 0 && bz->total_bpages < maxpages)) {
pages = MAX(atop(dmat->common.maxsize), 1);
pages = MIN(maxpages - bz->total_bpages, pages);
IMO, it also fixes logic by making it same as in bus_dma_tag_t case.
I think that this patch is correct.
So, with r291142 and r291193 intermezzo, the question is: what is right?

In fact, there were two possibilities:
(1) to keep BUS_DMA_MIN_ALLOC_COMP flag and make it consistent, so
bounce pages are allocated only once for a tag, or
(2) to remove it, so bounce pages are allocated for every created map
which needs them, up to some sane limit.

It turned out that (1) is not good for some driver. I'm not sure why,
but only thing I can imagine now is that a tag was created with
BUS_DMA_ALLOCNOW flag and then, a consumer breaks tag's maxsize
property. Or, there is another inconsitency in the if statement:
bz->map_count > 0. Bounce zone map count is incremented later, so when
bounce zone is without map, the test fails. Not mention that this map
count is not atomic.

Anyhow, what is right, (1) or (2) ?
Post by Konstantin Belousov
Post by Svatopluk Kraus
The next question is, if case (1) should be limited by MAX_BPAGES as
in case (3) or maybe better if there should be some internal
limitation for bounce zone itself.
Could we apply e.g. MAX_BPAGES limit to bounce zones, or should we allow
the limit to change based on the tag ? But I am not sure if there is
any reasonable way to formulate the limit.
Neither me.
Post by Konstantin Belousov
MAX_BPAGES looks like some arbitrary sanity limit, e.g. we could have
unlimited maxsize, but also have an alignment constraints, and then
tag requires bouncing. I am not sure that hard-coded values, esp. the
amd64 32MB limit, makes much sense, or that basing the limit on the
tag constraints makes more sense. Might be, we should allow some total
percentage of the physical memory on machine to be consumed by all
bounce zones altogether ?
IMO, some global limit should be there in any case. A tunable sounds
good, with some warning if a box owner wants too much.
Svatopluk Kraus
2015-11-24 15:59:10 UTC
Permalink
Post by Svatopluk Kraus
On Fri, Nov 20, 2015 at 3:45 PM, Konstantin Belousov
Post by Konstantin Belousov
Post by Svatopluk Kraus
Hi,
I have fallen to some problem with inconsistent use of
BUS_DMA_MIN_ALLOC_COMP flag. This flag was introduced in x86 MD code
very very long ago and so, the problem covers all archs which came out
from it.
However, it's only about bus_dma_tag_t with BUS_DMA_COULD_BOUNCE flag set.
(1) When bus_dma_tag_t is being created with BUS_DMA_ALLOCNOW flag
specified, some bounce pages could be allocated in advance and
BUS_DMA_MIN_ALLOC_COMP flag is set to the tag. The bounce pages are
allocated only if the tag's maxsize property is higher than size of
all bounce pages already allocated in a bounce zone.
(2) When bus_dmamap_t is being created, then if BUS_DMA_MIN_ALLOC_COMP
is not set on associated tag, some bounce pages are ALWAYS allocated
and BUS_DMA_MIN_ALLOC_COMP is set afterwards,
(3) else some bounce pages could be allocated if there is not enough
pages in a bounce zone and BUS_DMA_MIN_ALLOC_COMP is set afterwards.
The problem is the following. Due to case (2), the number of pages in
bounce zone can grow infinitely, as bounce pages once allocated are
never freed. It can happen when a big number of bus_dma_tag_t together
with bus_dmamap_t are created, or they are created dynamically either
because of a loadable module or by design.
The inconsistency is that when bus_dma_tag_t is being created, there
is no limit for how much pages could be allocated. On the other hand,
when bus_dmamap_t is being created, there is MAX_BPAGES limitation.
diff --git a/sys/x86/x86/busdma_bounce.c b/sys/x86/x86/busdma_bounce.c
index 4826a2b..a15139f 100644
--- a/sys/x86/x86/busdma_bounce.c
+++ b/sys/x86/x86/busdma_bounce.c
@@ -308,7 +308,7 @@ bounce_bus_dmamap_create(bus_dma_tag_t dmat, int
flags, bus_dmamap_t *mapp)
else
maxpages = MIN(MAX_BPAGES, Maxmem -
atop(dmat->common.lowaddr));
- if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 ||
+ if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 &&
(bz->map_count > 0 && bz->total_bpages < maxpages)) {
pages = MAX(atop(dmat->common.maxsize), 1);
pages = MIN(maxpages - bz->total_bpages, pages);
IMO, it also fixes logic by making it same as in bus_dma_tag_t case.
I think that this patch is correct.
So, with r291142 and r291193 intermezzo, the question is: what is right?
(1) to keep BUS_DMA_MIN_ALLOC_COMP flag and make it consistent, so
bounce pages are allocated only once for a tag, or
(2) to remove it, so bounce pages are allocated for every created map
which needs them, up to some sane limit.
It turned out that (1) is not good for some driver. I'm not sure why,
but only thing I can imagine now is that a tag was created with
BUS_DMA_ALLOCNOW flag and then, a consumer breaks tag's maxsize
bz->map_count > 0. Bounce zone map count is incremented later, so when
bounce zone is without map, the test fails. Not mention that this map
count is not atomic.
Anyhow, what is right, (1) or (2) ?
Well, Michal (mmel) came up with an idea which seems to me most likely
now, as it explains why bz->total_bpages are tested against
MAX_BPAGES when map is being created, but tag's maxsize is used for
evaluation of how many bounce pages should be allocated.

The idea is the following (considering that BUS_DMA_COULD_BOUNCE is
set on a tag):

If a tag has only one map, bounce pages should be allocated only once.
Either when the tag is being created and BUS_DMA_ALLOCNOW flags is set
or when its map is being created. The tag's maxsize property is used
for evaluation of how many bounce pages are needed.

If the tag has a second (or next) map(s), bz->map_count > 0 check is
valid and more bounce pages could be allocated if bz->total_bpages has
not met yet some sane limit.

Of course, the bz->map_count > 0 check limps a bit as a bounce zone
could be shared by more tags (and so their maps).

Thus, real problem is that
(1) tag's maxsize property is not limited by MAX_BPAGES,
(2) each new tag add some bounce pages to bounce zone without any limit.

So, I suggest
(1) to limit tag's maxsize property,
(2) to limit page count used by all bounce zones.

It seems to me that these two limitations should be right. The limits
should have some defaults with possibility to be changed by config or
in boot time. Any objections?
Post by Svatopluk Kraus
Post by Konstantin Belousov
Post by Svatopluk Kraus
The next question is, if case (1) should be limited by MAX_BPAGES as
in case (3) or maybe better if there should be some internal
limitation for bounce zone itself.
Could we apply e.g. MAX_BPAGES limit to bounce zones, or should we allow
the limit to change based on the tag ? But I am not sure if there is
any reasonable way to formulate the limit.
Neither me.
Post by Konstantin Belousov
MAX_BPAGES looks like some arbitrary sanity limit, e.g. we could have
unlimited maxsize, but also have an alignment constraints, and then
tag requires bouncing. I am not sure that hard-coded values, esp. the
amd64 32MB limit, makes much sense, or that basing the limit on the
tag constraints makes more sense. Might be, we should allow some total
percentage of the physical memory on machine to be consumed by all
bounce zones altogether ?
IMO, some global limit should be there in any case. A tunable sounds
good, with some warning if a box owner wants too much.
Loading...