Discussion: PQ_LAUNDRY
Mark Johnston
2016-11-03 18:29:16 UTC
Hi,

Alan and I have been working on a branch at user/alc/PQ_LAUNDRY in svn.
It reworks the mechanism and policy used for dirty page laundering.

Currently, the inactive queue is used to store pages eligible for
reclamation by the pagedaemon. It contains both clean and dirty pages.
Dirty pages must be laundered before they may be reclaimed; that is,
they must either be written to swap, or to persistent storage (i.e., a
filesystem). Because laundering a page is an expensive operation, the
pagedaemon will perform at most a small number of launderings in a
first-pass scan of the inactive queue.
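
Roughly, the existing scan behaves as sketched below. This is only a toy
model, not the code in sys/vm/vm_pageout.c; all of the names in it are
made up for illustration:

#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the pre-PQ_LAUNDRY inactive queue scan.  All names here
 * are invented for illustration; the real logic lives in
 * sys/vm/vm_pageout.c.
 */
struct page {
	bool		 dirty;		/* must be laundered before reuse */
	struct page	*next;
};

/* Stand-ins for the real VM operations. */
static void page_free(struct page *p) { (void)p; }
static void page_launder(struct page *p) { p->dirty = false; }

static void
inactive_scan(struct page *inactive, int launder_budget)
{
	for (struct page *p = inactive; p != NULL; p = p->next) {
		if (!p->dirty)
			page_free(p);		/* clean: reclaim at once */
		else if (launder_budget-- > 0)
			page_launder(p);	/* dirty: write to swap or fs */
		/* dirty pages beyond the budget wait for a later pass */
	}
}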

The PQ_LAUNDRY branch adds a new page queue, PQ_LAUNDRY, to store dirty
pages that have passed once through the inactive queue. A dedicated
thread is responsible for both deciding when to launder pages, and
actually laundering them. The new policy uses the relative sizes of the
inactive and laundry queues to determine whether to launder pages at a
given point. This leads to more intelligent swapping behaviour in
general, since the laundry thread will avoid swapping when the marginal
benefit of doing so is low. Without a dedicated queue for dirty pages,
the pagedaemon doesn't have the information to determine whether
swapping provides any utility to the system. Thus, the current policy
often results in small but steadily increasing amounts of swap usage
when the system is under memory pressure, even when the inactive queue
consists mostly of clean pages. PQ_LAUNDRY addresses this, and
incidentally also helps pave the way for some future VM improvements by
removing the last source of object-cached clean pages (PG_CACHE pages).
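
As a very rough illustration of the idea, the pacing decision can be
modelled as below. The real heuristic in the branch is more involved; the
2x threshold and the names here are invented for illustration only:

/*
 * Toy model of the laundry thread's pacing decision; not the actual
 * policy in the PQ_LAUNDRY branch.
 */
static int
laundry_target(int inactive_cnt, int laundry_cnt, int free_shortage)
{
	/*
	 * No free-page shortage, or plenty of (presumably clean) inactive
	 * pages relative to the laundry queue: laundering now buys little,
	 * so don't swap.
	 */
	if (free_shortage <= 0 || inactive_cnt > 2 * laundry_cnt)
		return (0);

	/*
	 * Otherwise launder in rough proportion to how much of the
	 * reclaimable memory is dirty, capped by the shortage itself.
	 */
	int target = free_shortage * laundry_cnt /
	    (laundry_cnt + inactive_cnt + 1);
	return (target > 0 ? target : 1);
}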

Some more details and the diff for PQ_LAUNDRY can be viewed here:
https://reviews.freebsd.org/D8302

We would like to commit it next week. Any additional comments, review,
or testing would be welcome.
Gary Jennejohn
2016-11-05 09:31:28 UTC
On Thu, 3 Nov 2016 11:29:16 -0700
Post by Mark Johnston
Hi,
Alan and I have been working on a branch at user/alc/PQ_LAUNDRY in svn.
It reworks the mechanism and policy used for dirty page laundering.
Currently, the inactive queue is used to store pages eligible for
reclamation by the pagedaemon. It contains both clean and dirty pages.
Dirty pages must be laundered before they may be reclaimed; that is,
they must either be written to swap, or to persistent storage (i.e., a
filesystem). Because laundering a page is an expensive operation, the
pagedaemon will perform at most a small number of launderings in a
first-pass scan of the inactive queue.
The PQ_LAUNDRY branch adds a new page queue, PQ_LAUNDRY, to store dirty
pages that have passed once through the inactive queue. A dedicated
thread is responsible for both deciding when to launder pages, and
actually laundering them. The new policy uses the relative sizes of the
inactive and laundry queues to determine whether to launder pages at a
given point. This leads to more intelligent swapping behaviour in
general, since the laundry thread will avoid swapping when the marginal
benefit of doing so is low. Without a dedicated queue for dirty pages,
the pagedaemon doesn't have the information to determine whether
swapping provides any utility to the system. Thus, the current policy
often results in small but steadily increasing amounts of swap usage
when the system is under memory pressure, even when the inactive queue
consists mostly of clean pages. PQ_LAUNDRY addresses this, and
incidentally also helps pave the way for some future VM improvements by
removing the last source of object-cached clean pages (PG_CACHE pages).
https://reviews.freebsd.org/D8302
We would like to commit it next week. Any additional comments, review,
or testing would be welcome.
In my use case, which is moving multi-gigabyte video files from
one file system to another, this seems to swap more than the
previous code did. Moving such large files with the previous
code seemed to recycle Inact more quickly and IIRC only a few 10s
of MB were swapped out. In my test this morning 125MB were
swapped out and Inact was not recycled as quickly. The overall
size of the files moved was about the same in the two tests.

This code doesn't even come close to the behavior we had about 2
years ago. I could move 100s of GB and never see a single byte
get swapped because Inact was very quickly recycled in multi-GB
chunks; frequently the approx. 6GB of Inact would be recycled in
a (to the human eye) single transfer to Free.

In any case, the new code doesn't break anything.
--
Gary Jennejohn
Mark Johnston
2016-11-05 17:41:48 UTC
Post by Gary Jennejohn
On Thu, 3 Nov 2016 11:29:16 -0700
Post by Mark Johnston
https://reviews.freebsd.org/D8302
We would like to commit it next week. Any additional comments, review,
or testing would be welcome.
In my use case, which is moving multi-gigabyte video files from
one file system to another, this seems to swap more than the
previous code did. Moving such large files with the previous
code seemed to recycle Inact more quickly and IIRC only a few 10s
of MB were swapped out. In my test this morning 125MB were
swapped out and Inact was not recycled as quickly. The overall
size of the files moved was about the same in the two tests.
Are you computing the amount swapped out as the amount of memory swapped
out minus the amount of swapins? Or is 125MB the amount of swap used
after the test? Output from "sysctl vm.stats" taken before and after any
test on both HEAD and PQ_LAUNDRY would be most useful.
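
If it helps, here is a small sketch of grabbing a couple of the relevant
counters programmatically with sysctlbyname(3). The counter names and
widths below are assumptions and may differ between versions, so verify
them against the `sysctl vm.stats` output on the test machine:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <string.h>

static unsigned long long
read_counter(const char *name)
{
	unsigned long long val = 0;
	size_t len = sizeof(val);

	/* sysctlbyname() reports the counter's real width back in len. */
	if (sysctlbyname(name, &val, &len, NULL, 0) != 0)
		return (0);
	if (len == sizeof(unsigned int)) {	/* 32-bit counter */
		unsigned int v32;
		memcpy(&v32, &val, sizeof(v32));
		val = v32;
	}
	return (val);
}

int
main(void)
{
	/* Assumed counter names; check them with `sysctl vm.stats.vm`. */
	printf("swap pageouts: %llu\n",
	    read_counter("vm.stats.vm.v_swappgsout"));
	printf("swap pageins:  %llu\n",
	    read_counter("vm.stats.vm.v_swappgsin"));
	return (0);
}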
Post by Gary Jennejohn
This code doesn't even come close to the behavior we had about 2
years ago. I could move 100s of GB and never see a single byte
get swapped because Inact was very quickly recycled in multi-GB
chunks; frequently the approx. 6GB of Inact would be recycled in
a (to the human eye) single transfer to Free.
In any case, the new code doesn't break anything.
--
Gary Jennejohn
Gary Jennejohn
2016-11-06 10:23:26 UTC
On Sun, 6 Nov 2016 09:12:30 +0100
On Sat, 5 Nov 2016 10:41:48 -0700
Post by Mark Johnston
Post by Gary Jennejohn
On Thu, 3 Nov 2016 11:29:16 -0700
Post by Mark Johnston
https://reviews.freebsd.org/D8302
We would like to commit it next week. Any additional comments, review,
or testing would be welcome.
In my use case, which is moving multi-gigabyte video files from
one file system to another, this seems to swap more than the
previous code did. Moving such large files with the previous
code seemed to recycle Inact more quickly and IIRC only a few 10s
of MB were swapped out. In my test this morning 125MB were
swapped out and Inact was not recycled as quickly. The overall
size of the files moved was about the same in the two tests.
Are you computing the amount swapped out as the amount of memory swapped
out minus the amount of swapins? Or is 125MB the amount of swap used
after the test? Output from "sysctl vm.stats" taken before and after any
test on both HEAD and PQ_LAUNDRY would be most useful.
125MB was the swap value shown by top after the files had all been
mv'd. But fairly soon after completion a few MB were swapped back in.

OK, on a level playing field there's no difference between the old and
the new code. In fact, according to top the old code swapped out 272K
and the new code swapped out 220K. An insignificant difference.

The test scenario was as follows:
1) boot the box
2) start X
3) mount the source directory
4) start a bash script which copied the same set of files in a for-loop
5) start top and observe what happens

Since all the files were either 4.3GB or 2GB, cp didn't use mmap, but
rather did read/write in a loop (if the comment in utils.c is still valid).

My test yesterday did a `mv *`, but since mv used fastcopy(), which
also does read/write in a loop, the pressure on the vm should have
been very similar to cp.
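
For reference, the copy loop in question is essentially the following.
This is a simplified sketch, not the actual bin/cp or bin/mv code; the
buffer size and error handling are reduced to the minimum:

/*
 * Read/write copy loop of the kind cp(1) and mv(1)'s fastcopy() fall
 * back to for large files.  Every page of both files passes through the
 * page cache once, which is what puts pressure on the inactive (and now
 * laundry) queues during these tests.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

static int
copy_loop(const char *from, const char *to)
{
	char buf[64 * 1024];
	ssize_t n;
	int in, out;

	if ((in = open(from, O_RDONLY)) < 0)
		return (-1);
	if ((out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
		close(in);
		return (-1);
	}
	while ((n = read(in, buf, sizeof(buf))) > 0) {
		if (write(out, buf, (size_t)n) != n)
			break;		/* simplified: no short-write retry */
	}
	close(in);
	close(out);
	return (n == 0 ? 0 : -1);
}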

The major difference between today and yesterday was that I'd been
running firefox and claws-mail for hours when I started the mv, so
there was something to swap out.

Since I'm not too eager to noodle around for hours before starting
a test, let's just say that the new code appears to be no worse, or
perhaps even better, than the old code.
--
Gary Jennejohn
Alan Cox
2016-11-07 03:11:06 UTC
Post by Gary Jennejohn
On Sun, 6 Nov 2016 09:12:30 +0100
On Sat, 5 Nov 2016 10:41:48 -0700
Post by Mark Johnston
Post by Gary Jennejohn
On Thu, 3 Nov 2016 11:29:16 -0700
Post by Mark Johnston
https://reviews.freebsd.org/D8302
We would like to commit it next week. Any additional comments, review,
or testing would be welcome.
In my use case, which is moving multi-gigabyte video files from
one file system to another, this seems to swap more than the
previous code did. Moving such large files with the previous
code seemed to recycle Inact more quickly and IIRC only a few 10s
of MB were swapped out. In my test this morning 125MB were
swapped out and Inact was not recycled as quickly. The overall
size of the files moved was about the same in the two tests.
Are you computing the amount swapped out as the amount of memory swapped
out minus the amount of swapins? Or is 125MB the amount of swap used
after the test? Output from "sysctl vm.stats" taken before and after any
test on both HEAD and PQ_LAUNDRY would be most useful.
125MB was the swap value shown by top after the files had all been
mv'd. But fairly soon after completion a few MB were swapped back in.
OK, on a level playing field there's no difference between the old and
the new code. In fact, according to top the old code swapped out 272K
and the new code swapped out 220K. An insignificant difference.
1) boot the box
2) start X
3) mount the source directory
4) start a bash script which copied the same set of files in a for-loop
5) start top and observe what happens
Since all the files were either 4.3GB or 2GB, cp didn't use mmap, but
rather did read/write in a loop (if the comment in utils.c is still valid).
My test yesterday did a `mv *`, but since mv used fastcopy(), which
also does read/write in a loop, the pressure on the vm should have
been very similar to cp.
The major difference between today and yesterday was that I'd been
running firefox and claws-mail for hours when I started the mv, so
there was something to swap out.
Since I'm not too eager to noodle around for hours before starting
a test, let's just say that the new code appears to be no worse, or
perhaps even better, than the old code.
The behavior that you describe is most likely a consequence of r254304 (and
r254544). You can test this hypothesis by setting the sysctl
vm.pageout_update_period to zero.
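
(The usual way to do that is just `sysctl vm.pageout_update_period=0` as
root. If anyone wants to toggle it from a test harness instead, a minimal
sketch with sysctlbyname(3) follows; that the knob is an int is an
assumption, so verify before relying on it.)

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	int zero = 0;

	/* Requires root; assumes the knob is an int-sized sysctl. */
	if (sysctlbyname("vm.pageout_update_period", NULL, NULL, &zero,
	    sizeof(zero)) != 0) {
		perror("sysctlbyname");
		return (1);
	}
	return (0);
}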
