Alan Somers
2018-11-13 22:09:24 UTC
Hole-punching has been discussed on these lists before[1]. It means
turning a dense file into a sparse file by deallocating storage for some
of the blocks in the middle. There's no standard API for it: Linux uses
fallocate(2), while Solaris and OSX each add a new opcode to fcntl(2).
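For reference, here is a minimal sketch of what hole-punching looks like
with the Linux fallocate(2) interface. It is Linux-only, and the helper
names (punch_hole, punch_hole_demo) and the file sizes are my own
illustration. The demo writes 1 MiB of data, punches out 256 KiB in the
middle, and checks that the file size is unchanged and the hole reads
back as zeros.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/falloc.h>	/* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */

/*
 * Deallocate [off, off + len) in a regular file without changing its size.
 * Returns 0 on success, or -1 with errno set.
 */
static int
punch_hole(int fd, off_t off, off_t len)
{
	assert(off >= 0 && len > 0);
	return (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
	    off, len));
}

/*
 * Illustrative self-check.  Returns 0 if the hole was punched and verified,
 * 1 if the filesystem does not support hole punching, -1 on any other error.
 */
static int
punch_hole_demo(void)
{
	enum { FILESZ = 1 << 20, HOLE_OFF = 256 << 10, HOLE_LEN = 256 << 10 };
	char path[] = "/tmp/punchXXXXXX";
	unsigned char *buf;
	struct stat sb;
	int fd, rv = -1;

	if ((buf = malloc(FILESZ)) == NULL)
		return (-1);
	if ((fd = mkstemp(path)) < 0) {
		free(buf);
		return (-1);
	}
	unlink(path);			/* file goes away when fd is closed */
	memset(buf, 0xAA, FILESZ);
	if (write(fd, buf, FILESZ) != FILESZ)
		goto out;
	if (punch_hole(fd, HOLE_OFF, HOLE_LEN) != 0) {
		rv = (errno == EOPNOTSUPP || errno == ENOSYS) ? 1 : -1;
		goto out;
	}
	/* The size must be unchanged, and the punched range must read as 0. */
	if (fstat(fd, &sb) != 0 || sb.st_size != FILESZ)
		goto out;
	if (pread(fd, buf, FILESZ, 0) != FILESZ)
		goto out;
	if (buf[HOLE_OFF - 1] != 0xAA || buf[HOLE_OFF] != 0 ||
	    buf[HOLE_OFF + HOLE_LEN - 1] != 0 ||
	    buf[HOLE_OFF + HOLE_LEN] != 0xAA)
		goto out;
	rv = 0;
out:
	close(fd);
	free(buf);
	return (rv);
}
```

Whether any blocks are actually freed depends on the underlying
filesystem; some simply return EOPNOTSUPP.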
A related concept is telling a block device that some blocks are no longer
used. SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
"Deallocate", ZBC and ZAC call it "Reset Write Pointer". They all do
basically the same thing, and it's analogous to hole-punching for regular
files. They are also all inaccessible from FreeBSD's userland except by
using pass(4), which is inconvenient and protocol-specific.
Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
but it's totally undocumented and doesn't work on regular files.
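As far as I can tell from existing users such as blkdiscard(8), the
ioctl takes a two-element uint64_t array holding a byte offset and a
byte length. The sketch below is Linux-only, and the helper names are
made up for illustration. The second function tries the ioctl on a
regular temporary file to show that it is rejected there.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>		/* BLKDISCARD */

/*
 * Ask the block device behind fd to discard [off, off + len).
 * Both values are in bytes and should be sector-aligned.
 * Returns 0 on success, or -1 with errno set.
 */
static int
discard_range(int fd, uint64_t off, uint64_t len)
{
	uint64_t range[2] = { off, len };

	return (ioctl(fd, BLKDISCARD, &range));
}

/*
 * Try BLKDISCARD on a regular temporary file.  Returns the errno from the
 * attempt, 0 if it unexpectedly succeeded, or -1 if setup failed.
 */
static int
discard_on_regular_file(void)
{
	char path[] = "/tmp/discardXXXXXX";
	int fd, err;

	if ((fd = mkstemp(path)) < 0)
		return (-1);
	unlink(path);
	err = (discard_range(fd, 0, 4096) == 0) ? 0 : errno;
	close(fd);
	return (err);
}
```

On the filesystems I know of, the regular-file attempt fails (typically
with ENOTTY), so a program that wants to handle both files and devices
on Linux needs two separate code paths.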
I propose adding support for all of these things using the fcntl(2) API.
Using the same syntax that Solaris defined, you would be able to punch a
hole in a regular file or TRIM blocks from an SSD. ZFS already supports it
(though FreeBSD's port never did, and the code was deleted in r303763).
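To make the Solaris syntax concrete, here is a rough sketch of what a
caller would look like. F_FREESP takes a struct flock describing the
range to free. The wrapper name freesp() is hypothetical, and since
F_FREESP does not exist on FreeBSD today (that is the point of this
proposal), the sketch falls back to failing with ENOTSUP wherever the
constant is undefined.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Free the storage backing [start, start + len) of fd, Solaris style.
 * Returns 0 on success, or -1 with errno set.
 */
static int
freesp(int fd, off_t start, off_t len)
{
#ifdef F_FREESP
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_whence = SEEK_SET;		/* l_start is an absolute offset */
	fl.l_start = start;
	fl.l_len = len;			/* Solaris: 0 means "to end of file" */
	return (fcntl(fd, F_FREESP, &fl));
#else
	/* Platform has no F_FREESP; nothing to call. */
	(void)fd;
	(void)start;
	(void)len;
	errno = ENOTSUP;
	return (-1);
#endif
}
```

Under the proposal, the same call would work whether fd refers to a
regular file or to a disk device node.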
Here's what I would do:
1) Add the F_FREESP command to fcntl(2).
2) Add a .fo_space field to struct fileops.
3) Add a devfs_space method that implements .fo_space.
4) Add a .d_space field to struct cdevsw.
5) Add a g_dev_space method for GEOM that implements .d_space using
BIO_DELETE.
6) Add a VOP_SPACE vop.
7) Implement VOP_SPACE for tmpfs.
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
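To illustrate how steps 1 through 5 would chain together for a device
node, here is a userland mock of the dispatch path. Everything in it is
hypothetical: the mock_ structures, field layouts, and signatures are
placeholders invented for this sketch, not the real kernel definitions,
and the actual signatures would be settled in review.

```c
#include <stddef.h>
#include <sys/types.h>

#define MOCK_BIO_DELETE	1	/* placeholder value, not the kernel's */

/* Hypothetical stand-ins for struct bio, cdevsw, fileops, and file. */
struct mock_bio {
	int	bio_cmd;
	off_t	bio_offset;
	off_t	bio_length;
};
struct mock_file;
struct mock_cdevsw {
	int	(*d_space)(off_t off, off_t len);
};
struct mock_fileops {
	int	(*fo_space)(struct mock_file *fp, off_t off, off_t len);
};
struct mock_file {
	const struct mock_fileops *f_ops;
	const struct mock_cdevsw *f_csw;
};

/* Records the last request that reached the bottom of the stack. */
static struct mock_bio last_bio;

/* Step 5: the GEOM layer turns the request into a BIO_DELETE. */
static int
mock_g_dev_space(off_t off, off_t len)
{
	last_bio.bio_cmd = MOCK_BIO_DELETE;
	last_bio.bio_offset = off;
	last_bio.bio_length = len;
	return (0);
}

/* Step 3: devfs forwards .fo_space to the driver's .d_space (step 4). */
static int
mock_devfs_space(struct mock_file *fp, off_t off, off_t len)
{
	return (fp->f_csw->d_space(off, len));
}

/* Step 1: fcntl(F_FREESP) dispatches through .fo_space (step 2). */
static int
mock_fcntl_freesp(struct mock_file *fp, off_t off, off_t len)
{
	if (fp->f_ops->fo_space == NULL)
		return (-1);	/* file type does not support it */
	return (fp->f_ops->fo_space(fp, off, len));
}

/* Push one request down the stack; 0 if it arrived intact, else -1. */
static int
mock_freesp_demo(void)
{
	static const struct mock_cdevsw csw = { .d_space = mock_g_dev_space };
	static const struct mock_fileops ops = { .fo_space = mock_devfs_space };
	struct mock_file f = { &ops, &csw };

	if (mock_fcntl_freesp(&f, 4096, 8192) != 0)
		return (-1);
	return ((last_bio.bio_cmd == MOCK_BIO_DELETE &&
	    last_bio.bio_offset == 4096 &&
	    last_bio.bio_length == 8192) ? 0 : -1);
}
```

Regular files would take the other branch: .fo_space for vnodes would
call VOP_SPACE (steps 6 and 7) instead of going through devfs and GEOM.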
The greatest beneficiaries of this work would be type 2 hypervisors like
QEMU and VirtualBox with guests that use TRIM, and userland filesystems
such as fusefs-ext2 and fusefs-exfat. High-performance storage systems
using SPDK would also benefit. The last item, aio_freesp(2), may seem
unnecessary, but it would really benefit my application.
Questions, objections, flames?
-Alan
[1] https://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010881.html