Discussion:
Hole-punching, TRIM, etc
Alan Somers
2018-11-13 22:09:24 UTC
Hole-punching has been discussed on these lists before[1]. It basically
means to turn a dense file into a sparse file by deallocating storage for
some of the blocks in the middle. There's no standard API for it. Linux
uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
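
To make the dense/sparse distinction concrete, here is a minimal C sketch
that is not part of the original message. It assumes st_blocks is counted in
512-byte units (true on FreeBSD and Linux, but not guaranteed by POSIX), and
the helper names file_looks_sparse() and make_sparse_tempfile() are made up
for illustration. A file is treated as sparse when fewer bytes are allocated
than its logical size; note that compression or delayed allocation can
produce the same reading.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 1 if the file looks sparse (fewer bytes allocated than its
 * logical size), 0 if it does not, -1 on error.
 * Assumption: st_blocks counts 512-byte units.
 */
int
file_looks_sparse(int fd)
{
	struct stat sb;

	assert(fd >= 0);
	if (fstat(fd, &sb) == -1)
		return (-1);
	return ((off_t)sb.st_blocks * 512 < sb.st_size);
}

/*
 * Create an unlinked temporary file with the given logical size but
 * without writing any data, so a filesystem that supports holes
 * allocates (almost) nothing for it.  Returns the descriptor or -1.
 */
int
make_sparse_tempfile(off_t size)
{
	char path[] = "/tmp/sparse.XXXXXX";
	int fd;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	(void)unlink(path);
	if (ftruncate(fd, size) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```

Hole-punching goes in the other direction: it takes a range whose blocks are
already allocated and makes it look like the hole that ftruncate(2) creates
here.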

A related concept is telling a block device that some blocks are no longer
used. SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
"Deallocate", ZBC and ZAC call it "Reset Write Pointer". They all do
basically the same thing, and it's analogous to hole-punching for regular
files. They are also all inaccessible from FreeBSD's userland except by
using pass(4), which is inconvenient and protocol-specific.

Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
but it's totally undocumented and doesn't work on regular files.

I propose adding support for all of these things using the fcntl(2) API.
Using the same syntax that Solaris defined, you would be able to punch a
hole in a regular file or TRIM blocks from an SSD. ZFS already supports it
(though FreeBSD's port never did, and the code was deleted in r303763).
Here's what I would do:

1) Add the F_FREESP command to fcntl(2).
2) Add a .fo_space field to struct fileops.
3) Add a devfs_space method that implements .fo_space.
4) Add a .d_space field to struct cdevsw.
5) Add a g_dev_space method for GEOM that implements .d_space using BIO_DELETE.
6) Add a VOP_SPACE vop.
7) Implement VOP_SPACE for tmpfs.
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
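
From the application's side, step 1 might look roughly like the sketch below.
It assumes the Solaris convention, in which fcntl(F_FREESP) takes a struct
flock whose l_whence, l_start and l_len describe the range, with l_len == 0
meaning "to end of file". F_FREESP does not exist on FreeBSD today, so the
call is guarded by #ifdef, and freesp_range() and punch_hole() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Describe the byte range [offset, offset + length) in the struct flock
 * form assumed for F_FREESP.  A length of 0 means "to end of file".
 */
void
freesp_range(struct flock *fl, off_t offset, off_t length)
{
	assert(fl != NULL);
	memset(fl, 0, sizeof(*fl));
	fl->l_whence = SEEK_SET;
	fl->l_start = offset;
	fl->l_len = length;
}

/*
 * Punch a hole in a regular file, or discard blocks on a device.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
punch_hole(int fd, off_t offset, off_t length)
{
	struct flock fl;

	if (offset < 0 || length < 0) {
		errno = EINVAL;
		return (-1);
	}
	freesp_range(&fl, offset, length);
#ifdef F_FREESP
	return (fcntl(fd, F_FREESP, &fl));
#else
	/* No F_FREESP on this platform; the proposal would add it. */
	(void)fd;
	errno = ENOTSUP;
	return (-1);
#endif
}
```

The same punch_hole(fd, off, len) call would serve both a regular file and a
disk device node, which is the main attraction of routing everything through
one fcntl(2) command.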

The greatest beneficiaries of this work would be type 2 hypervisors like
QEMU and VirtualBox with guests that use TRIM, and userland filesystems
such as fusefs-ext2 and fusefs-exfat. High-performance storage systems
using SPDK would also benefit. The last item, aio_freesp(2), may seem
unnecessary but it would really benefit my application.

Questions, objections, flames?

-Alan

[1] https://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010881.html
Warner Losh
2018-11-13 22:50:35 UTC
Post by Alan Somers
Hole-punching has been discussed on these lists before[1]. It basically
means to turn a dense file into a sparse file by deallocating storage for
some of the blocks in the middle. There's no standard API for it. Linux
uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
A related concept is telling a block device that some blocks are no longer
used. SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
"Deallocate", ZBC and ZAC call it "Reset Write Pointer". They all do
basically the same thing, and it's analogous to hole-punching for regular
files. They are also all inaccessible from FreeBSD's userland except by
using pass(4), which is inconvenient and protocol-specific.
Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
but it's totally undocumented and doesn't work on regular files.
I propose adding support for all of these things using the fcntl(2) API.
Using the same syntax that Solaris defined, you would be able to punch a
hole in a regular file or TRIM blocks from an SSD. ZFS already supports it
(though FreeBSD's port never did, and the code was deleted in r303763).
1) Add the F_FREESP command to fcntl(2).
2) Add a .fo_space field for struct fileops
3) Add a devfs_space method that implements .fo_space
4) Add a .d_space field to struct cdevsw
5) Add a g_dev_space method for GEOM that implements .d_space using
BIO_DELETE.
6) Add a VOP_SPACE vop
7) Implement VOP_SPACE for tmpfs
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
The greatest beneficiaries of this work would be type 2 hypervisors like
QEMU and VirtualBox with guests that use TRIM, and userland filesystems
such as fusefs-ext2 and fusefs-exfat. High-performance storage systems
using SPDK would also benefit. The last item, aio_freesp(2), may seem
unnecessary but it would really benefit my application.
Questions, objections, flames?
So the fcntl would deallocate blocks from a filesystem only. The filesystem
may issue BIO_DELETE as a result, but that's up to the filesystem, correct?

On a raw device it would be translated into a BIO_DELETE command directly,
correct?

Warner
Alan Somers
2018-11-13 22:52:39 UTC
Post by Warner Losh
Post by Alan Somers
Hole-punching has been discussed on these lists before[1]. It basically
means to turn a dense file into a sparse file by deallocating storage for
some of the blocks in the middle. There's no standard API for it. Linux
uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
A related concept is telling a block device that some blocks are no longer
used. SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
"Deallocate", ZBC and ZAC call it "Reset Write Pointer". They all do
basically the same thing, and it's analogous to hole-punching for regular
files. They are also all inaccessible from FreeBSD's userland except by
using pass(4), which is inconvenient and protocol-specific.
Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
but it's totally undocumented and doesn't work on regular files.
I propose adding support for all of these things using the fcntl(2) API.
Using the same syntax that Solaris defined, you would be able to punch a
hole in a regular file or TRIM blocks from an SSD. ZFS already supports it
(though FreeBSD's port never did, and the code was deleted in r303763).
1) Add the F_FREESP command to fcntl(2).
2) Add a .fo_space field for struct fileops
3) Add a devfs_space method that implements .fo_space
4) Add a .d_space field to struct cdevsw
5) Add a g_dev_space method for GEOM that implements .d_space using
BIO_DELETE.
6) Add a VOP_SPACE vop
7) Implement VOP_SPACE for tmpfs
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
The greatest beneficiaries of this work would be type 2 hypervisors like
QEMU and VirtualBox with guests that use TRIM, and userland filesystems
such as fusefs-ext2 and fusefs-exfat. High-performance storage systems
using SPDK would also benefit. The last item, aio_freesp(2), may seem
unnecessary but it would really benefit my application.
Questions, objections, flames?
So the fcntl would deallocate blocks from a filesystem only. The
filesystem may issue BIO_DELETE as a result, but that's up to the
filesystem, correct?
Correct.
Post by Warner Losh
On a raw device it would be translated into a BIO_DELETE command directly,
correct?
Correct, modulo edge cases.
Post by Warner Losh
Warner
Poul-Henning Kamp
2018-11-13 22:59:44 UTC
Post by Warner Losh
On a raw device it would be translated into a BIO_DELETE command directly,
correct?
We already have ioctl(DIOCGDELETE) for that. newfs(8) uses it.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Conrad Meyer
2018-11-13 22:51:36 UTC
Hi Alan,
Post by Alan Somers
Hole-punching has been discussed on these lists before[1]. It basically
means to turn a dense file into a sparse file by deallocating storage for
some of the blocks in the middle. There's no standard API for it. Linux
uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
A related concept is telling a block device that some blocks are no longer
used. SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
"Deallocate", ZBC and ZAC call it "Reset Write Pointer". They all do
basically the same thing, and it's analogous to hole-punching for regular
files. They are also all inaccessible from FreeBSD's userland except by
using pass(4), which is inconvenient and protocol-specific.
Geom devices have the DIOCGDELETE ioctl, which translates into
BIO_DELETE (which is TRIM, as I understand it). It's available in
libgeom as g_delete() and used by hastd, newfs_nandfs, and nandtool.
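
For reference, a sketch of issuing DIOCGDELETE directly from C. It assumes
the ioctl takes an off_t[2] holding {offset, length}, as declared in
<sys/disk.h>, and that both values must be multiples of the provider's sector
size (which can be queried with DIOCGSECTORSIZE). The helper names are
hypothetical; libgeom's g_delete() is the existing wrapper mentioned above.
On anything other than FreeBSD the sketch simply fails with ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/ioctl.h>
#include <sys/disk.h>
#endif

/*
 * Return 1 if [offset, offset + length) is non-empty and both its start
 * and its length fall on sector boundaries, else 0.
 */
int
range_is_sector_aligned(off_t offset, off_t length, off_t sectorsize)
{
	assert(sectorsize > 0);
	return (offset >= 0 && length > 0 &&
	    offset % sectorsize == 0 && length % sectorsize == 0);
}

/*
 * Ask the GEOM provider open on fd to delete (TRIM, UNMAP, ...) a range.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
delete_range(int fd, off_t offset, off_t length, off_t sectorsize)
{
	if (!range_is_sector_aligned(offset, length, sectorsize)) {
		errno = EINVAL;
		return (-1);
	}
#ifdef __FreeBSD__
	off_t arg[2] = { offset, length };	/* DIOCGDELETE argument */

	return (ioctl(fd, DIOCGDELETE, arg));
#else
	(void)fd;
	errno = ENOTSUP;	/* DIOCGDELETE is FreeBSD-specific */
	return (-1);
#endif
}
```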
Post by Alan Somers
Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
but it's totally undocumented and doesn't work on regular files.
I propose adding support for all of these things using the fcntl(2) API.
Using the same syntax that Solaris defined, you would be able to punch a
hole in a regular file or TRIM blocks from an SSD. ZFS already supports it
(though FreeBSD's port never did, and the code was deleted in r303763).
1) Add the F_FREESP command to fcntl(2).
2) Add a .fo_space field for struct fileops
3) Add a devfs_space method that implements .fo_space
4) Add a .d_space field to struct cdevsw
5) Add a g_dev_space method for GEOM that implements .d_space using
BIO_DELETE.
6) Add a VOP_SPACE vop
7) Implement VOP_SPACE for tmpfs
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
Why not just add DIOCGDELETE support to various VOP_IOCTL
implementations? The file objects forward correctly through vn_ioctl
to VOP_IOCTL for both regular files and devfs VCHR nodes.

We can emulate the Linux API if we want to be compatible there, but I
wouldn't bother with Solaris.

Best,
Conrad
Alan Somers
2018-11-13 22:58:26 UTC
Post by Conrad Meyer
Hi Alan,
Post by Alan Somers
Hole-punching has been discussed on these lists before[1]. It basically
means to turn a dense file into a sparse file by deallocating storage for
some of the blocks in the middle. There's no standard API for it. Linux
uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
A related concept is telling a block device that some blocks are no longer
used. SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
"Deallocate", ZBC and ZAC call it "Reset Write Pointer". They all do
basically the same thing, and it's analogous to hole-punching for regular
files. They are also all inaccessible from FreeBSD's userland except by
using pass(4), which is inconvenient and protocol-specific.
Geom devices have the DIOCGDELETE ioctl, which translates into
BIO_DELETE (which is TRIM, as I understand it). It's available in
libgeom as g_delete() and used by hastd, newfs_nandfs, and nandtool.
Ahh, I thought there must be such a thing, but I couldn't find it.
Post by Conrad Meyer
Post by Alan Somers
Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
but it's totally undocumented and doesn't work on regular files.
I propose adding support for all of these things using the fcntl(2) API.
Using the same syntax that Solaris defined, you would be able to punch a
hole in a regular file or TRIM blocks from an SSD. ZFS already supports it
(though FreeBSD's port never did, and the code was deleted in r303763).
1) Add the F_FREESP command to fcntl(2).
2) Add a .fo_space field for struct fileops
3) Add a devfs_space method that implements .fo_space
4) Add a .d_space field to struct cdevsw
5) Add a g_dev_space method for GEOM that implements .d_space using
BIO_DELETE.
6) Add a VOP_SPACE vop
7) Implement VOP_SPACE for tmpfs
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
Why not just add DIOCGDELETE support to various VOP_IOCTL
implementations? The file objects forward correctly through vn_ioctl
to VOP_IOCTL for both regular files and devfs VCHR nodes.
We can emulate the Linux API if we want to be compatible there, but I
wouldn't bother with Solaris.
The only reason that I prefer the Solaris API is because it doesn't require
adding another syscall, and because Linux's fallocate(2) does a whole bunch
of other things besides hole-punching.

What about an asynchronous version? ioctl(2) is still synchronous. Do you
see any better way to hole-punch/TRIM asynchronously than with aio?
Post by Conrad Meyer
Best,
Conrad
Conrad Meyer
2018-11-13 23:08:51 UTC
Post by Conrad Meyer
Post by Alan Somers
...
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
Why not just add DIOCGDELETE support to various VOP_IOCTL
implementations? The file objects forward correctly through vn_ioctl
to VOP_IOCTL for both regular files and devfs VCHR nodes.
We can emulate the Linux API if we want to be compatible there, but I
wouldn't bother with Solaris.
The only reason that I prefer the Solaris API is because it doesn't require adding another syscall, and because Linux's fallocate(2) does a whole bunch of other things besides hole-punching.
I am imagining that if we went this route, we would implement Linux
fallocate as a library shim around the native FreeBSD ioctl (or
whatever) rather than an independent system call. This would be for
API compatibility, not ABI compatibility. But Linux compat can be set
aside for now, I think — it's a secondary concern.
What about an asynchronous version? ioctl(2) is still synchronous. Do you see any better way to hole-punch/TRIM asynchronously than with aio?
Yeah, this is a good consideration. No, I don't have any better
suggestion for an asynchronous API. In general our VOPs tend to be
synchronous. Aio does seem like the logical home for a new
asynchronous API.

Best regards,
Conrad

Warner Losh
2018-11-13 22:58:45 UTC
Post by Conrad Meyer
Geom devices have the DIOCGDELETE ioctl, which translates into
BIO_DELETE (which is TRIM, as I understand it).
Correct. TRIM is both the catch-all term people use and the name of
a specific DSM (Data Set Management) command in the ATA command set. All
flash technologies have it (though what it means under the covers varies a
bit). Thin-provisioned resources, such as those in VMs, also have it.

Warner