Gleb Smirnoff
2015-04-30 14:24:08 UTC
Hi!
The reason to write down this patch emerges from the
projects/sendfile branch, where vm_pager_get_pages() is
used in the sendfile(2) system call. Although the new
sendfile works flawlessly, it makes some assumptions
about vnode_pager that theoretically may not be valid,
however always hold in our current code.
Going deeper in the problem I have found more important
points, which yielded in the suggested patch. To start,
let me display the current KPI assumptions:
1) vm_pager_get_pages() works on an array of consequtive
array of pages. Pindex of (n+1)-th pages must be pindex
of n-th + 1. One page is special, it is called reqpage.
2) vm_pager_get_pages() guarantees to swapin only the reqpage,
and may skip or fail other pages for different reasons, that
may vary from pager to pager.
3) There also is function vm_pager_has_page(), which reports
availability of a page at given index in the pager, and also
provides hints on how many consequtive pages before this one
and after this one can be swapped in in single pager request.
Most pagers return zeros in these hints. The vnode pager for
UFS returns a strong promise, that one can later utilize in
vm_pager_get_pages().
4) All pages must be busied on enter. On exit only reqpage
will be left busied. The KPI doesn't guarantee that rest
of the pages is still in place. The pager usually calls
vm_page_readahead_finish() on them, which can either free,
or put the page on active/inactive queue, using quite
a strange approach to choose a queue.
5) The pages must not be wired, since vm_page_free() may be
called on them. However, this is violated by several
consumers of KPI, relying on lack of errors in the pager.
Moreover, the swap pager has a special function to skip
wired pages, while doing the sweep, to avoid this problem.
So, passing wired pages to swapper is OK, while to the
reset is not.
6) Pagers may replace a page in the object with a new one.
The sg_pager actually does that. To protect from this
event, consumers of vm_pager_get_pages() always run
vm_page_lookup() over the array of pages to relookup the pages.
However, not all consumers do this.
Speaking of pagers and their consumers:
- 11 consumers request array of size 1, a single page
- 3 consumers actually request array
My suggestion is to change the KPI assumptions to the following:
1) There is no reqpage. All pages are entered busied, all pages
are returned busied and validated. If pager fails to validate
all pages it must return error.
2) The consumer (not the pager!) is to decide what to do with the
pages: vm_page_active, vm_page_deactivate, vm_page_flash or just
vm_page_free them. The consumer also unbusies pages, if it
wants to. The consumer is free to wire pages before the call.
3) Consumers must first query the pager via vm_pager_has_page(),
and use the after/before hints to limit the size of the
requested pages array.
4) In case if pager replaces pages, it must also update the array,
so that consumer doesn't need to do relookup.
Doing this sweep, I also noticed that all pagers have a copy-pasted
code of zeroing invalid regions of partially valid pages. Also,
many pagers got a set of assertions copy and pasted from each
other. So, I decided to un-inline the vm_pager_get_pages(), bring
it to the vm_pager.c file and gather all these copy-pastes
into one place.
The suggested patch is attached. As expected, it simplifies and
removes quite a lot of code.
Right now it is tested on UFS only, testing NFS and ZFS is on my list.
There is one panic known, but it seems unrelated, and Peter pho@ says
that once it has been seen before.
The reason to write down this patch emerges from the
projects/sendfile branch, where vm_pager_get_pages() is
used in the sendfile(2) system call. Although the new
sendfile works flawlessly, it makes some assumptions
about vnode_pager that theoretically may not be valid,
however always hold in our current code.
Going deeper in the problem I have found more important
points, which yielded in the suggested patch. To start,
let me display the current KPI assumptions:
1) vm_pager_get_pages() works on an array of consequtive
array of pages. Pindex of (n+1)-th pages must be pindex
of n-th + 1. One page is special, it is called reqpage.
2) vm_pager_get_pages() guarantees to swapin only the reqpage,
and may skip or fail other pages for different reasons, that
may vary from pager to pager.
3) There also is function vm_pager_has_page(), which reports
availability of a page at given index in the pager, and also
provides hints on how many consequtive pages before this one
and after this one can be swapped in in single pager request.
Most pagers return zeros in these hints. The vnode pager for
UFS returns a strong promise, that one can later utilize in
vm_pager_get_pages().
4) All pages must be busied on enter. On exit only reqpage
will be left busied. The KPI doesn't guarantee that rest
of the pages is still in place. The pager usually calls
vm_page_readahead_finish() on them, which can either free,
or put the page on active/inactive queue, using quite
a strange approach to choose a queue.
5) The pages must not be wired, since vm_page_free() may be
called on them. However, this is violated by several
consumers of KPI, relying on lack of errors in the pager.
Moreover, the swap pager has a special function to skip
wired pages, while doing the sweep, to avoid this problem.
So, passing wired pages to swapper is OK, while to the
reset is not.
6) Pagers may replace a page in the object with a new one.
The sg_pager actually does that. To protect from this
event, consumers of vm_pager_get_pages() always run
vm_page_lookup() over the array of pages to relookup the pages.
However, not all consumers do this.
Speaking of pagers and their consumers:
- 11 consumers request array of size 1, a single page
- 3 consumers actually request array
My suggestion is to change the KPI assumptions to the following:
1) There is no reqpage. All pages are entered busied, all pages
are returned busied and validated. If pager fails to validate
all pages it must return error.
2) The consumer (not the pager!) is to decide what to do with the
pages: vm_page_active, vm_page_deactivate, vm_page_flash or just
vm_page_free them. The consumer also unbusies pages, if it
wants to. The consumer is free to wire pages before the call.
3) Consumers must first query the pager via vm_pager_has_page(),
and use the after/before hints to limit the size of the
requested pages array.
4) In case if pager replaces pages, it must also update the array,
so that consumer doesn't need to do relookup.
Doing this sweep, I also noticed that all pagers have a copy-pasted
code of zeroing invalid regions of partially valid pages. Also,
many pagers got a set of assertions copy and pasted from each
other. So, I decided to un-inline the vm_pager_get_pages(), bring
it to the vm_pager.c file and gather all these copy-pastes
into one place.
The suggested patch is attached. As expected, it simplifies and
removes quite a lot of code.
Right now it is tested on UFS only, testing NFS and ZFS is on my list.
There is one panic known, but it seems unrelated, and Peter pho@ says
that once it has been seen before.
--
Totus tuus, Glebius.
Totus tuus, Glebius.