Poul-Henning Kamp
2018-05-26 07:19:14 UTC
When I started writing Varnish Cache, on of the first decision was
that I would pepper the source code with asserts even "pointless"
ones, and I did to the order of 10% of all source file lines, and
it is probably the best decision I have ever made.
The primary effects of all the asserts is that we get crash-dumps
from the earliest possible point in time the trouble could be
spotted. For a process which can rutinely have several thousands
threads, that is what makes debugging possible in the first place.
The secondary effect is that the sheer number of asserts means
that crashes can almost always be isolated to a very small stretch
of code based on the assert location.
The one benefit from all the asserts I had not anticipated is that
they almost always prevent program bugs from turning into information
leakage vulnerabilities: We stop before we send wrong data out.
The biggest impact of all the asserts however, is that Varnish Cache
went 11 years while moving a very large fraction of all HTTP traffic
on the net, without a major security issue.
When we finally got our first big CVE, it was not a remote excution,
an information leak or anything horrible bad: It was a "harmless"
DoS caused by a wrong assert test, but a DoS is still pretty bad
news when so much traffic goes through Varnish.
I think FreeBSD needs to learn from Varnish Cache's experience:
We should have far more asserts in FreeBSD.
We already have a "private" assert facility in the kernel, and
people should simply use it more.
But in userland our asserts come from <assert.h>, and that is a
problem.
The main trouble with using assert(3) is that random people
illadvisedly set NDEBUG expecting their code to run faster as a
result.
It does not.
Almost all the asserts in Varnish happens on values already in CPU
registers and/or cache and I have never been able to credibly measure
a performance impact from the asserts in Varnish.
Besides, many of the asserts never make it to the CPU, modern
compilers have strong analysis which can see that the can never
trigger, so they are removed in optimization.
We cannot change <assert.h> to get rid of the NDEBUG mistake, and
in a more abstract line of thought we probably should not even use
<assert.h> for FreeBSD source code - it sort of belongs to the users.
I suggest and strongly urge that we add a set of userland assert-macros
which are never compiled out, for use *only* in FreeBSD userland
source code, and that we sprinkle them liberally, in particular
anything setuid etc.
In Varnish Cache we have four "kinds" of asserts, which allows us
to communicate crucial information in the message to the users. I
don't know if a similar subdivision of asserts would make sense for
FreeBSD, but I mention it here as inspiration:
1. "Regular asserts" - things which are just plain wrong, which
probably means we have a genuine bug somewhere. Examples could
be null pointers where previous checks should have ensured this
not be so. Also error situations for which there is no saner
handling that killing the projcess.
2. "xxx asserts" - situations which should (almost) never happen,
and for which we could do more productive error handling, but
where the seldomness of the condition makes it a bad idea (ie:
untested code) and a bad investment of our developer time to do
so. Disk full is a good example for Varnish.
3. "wrong asserts" - Internal state is messed up, program flow
has taken a "impossible" branch. A good example is the
default branch of a switch on a finite input set.
4. "Incomplete asserts" - Code we have not written yet, extension
points not open for business yet, very strange corner-cases
belived to be impossible, but not proven to be so yet.
Poul-Henning
that I would pepper the source code with asserts even "pointless"
ones, and I did to the order of 10% of all source file lines, and
it is probably the best decision I have ever made.
The primary effects of all the asserts is that we get crash-dumps
from the earliest possible point in time the trouble could be
spotted. For a process which can rutinely have several thousands
threads, that is what makes debugging possible in the first place.
The secondary effect is that the sheer number of asserts means
that crashes can almost always be isolated to a very small stretch
of code based on the assert location.
The one benefit from all the asserts I had not anticipated is that
they almost always prevent program bugs from turning into information
leakage vulnerabilities: We stop before we send wrong data out.
The biggest impact of all the asserts however, is that Varnish Cache
went 11 years while moving a very large fraction of all HTTP traffic
on the net, without a major security issue.
When we finally got our first big CVE, it was not a remote excution,
an information leak or anything horrible bad: It was a "harmless"
DoS caused by a wrong assert test, but a DoS is still pretty bad
news when so much traffic goes through Varnish.
I think FreeBSD needs to learn from Varnish Cache's experience:
We should have far more asserts in FreeBSD.
We already have a "private" assert facility in the kernel, and
people should simply use it more.
But in userland our asserts come from <assert.h>, and that is a
problem.
The main trouble with using assert(3) is that random people
illadvisedly set NDEBUG expecting their code to run faster as a
result.
It does not.
Almost all the asserts in Varnish happens on values already in CPU
registers and/or cache and I have never been able to credibly measure
a performance impact from the asserts in Varnish.
Besides, many of the asserts never make it to the CPU, modern
compilers have strong analysis which can see that the can never
trigger, so they are removed in optimization.
We cannot change <assert.h> to get rid of the NDEBUG mistake, and
in a more abstract line of thought we probably should not even use
<assert.h> for FreeBSD source code - it sort of belongs to the users.
I suggest and strongly urge that we add a set of userland assert-macros
which are never compiled out, for use *only* in FreeBSD userland
source code, and that we sprinkle them liberally, in particular
anything setuid etc.
In Varnish Cache we have four "kinds" of asserts, which allows us
to communicate crucial information in the message to the users. I
don't know if a similar subdivision of asserts would make sense for
FreeBSD, but I mention it here as inspiration:
1. "Regular asserts" - things which are just plain wrong, which
probably means we have a genuine bug somewhere. Examples could
be null pointers where previous checks should have ensured this
not be so. Also error situations for which there is no saner
handling that killing the projcess.
2. "xxx asserts" - situations which should (almost) never happen,
and for which we could do more productive error handling, but
where the seldomness of the condition makes it a bad idea (ie:
untested code) and a bad investment of our developer time to do
so. Disk full is a good example for Varnish.
3. "wrong asserts" - Internal state is messed up, program flow
has taken a "impossible" branch. A good example is the
default branch of a switch on a finite input set.
4. "Incomplete asserts" - Code we have not written yet, extension
points not open for business yet, very strange corner-cases
belived to be impossible, but not proven to be so yet.
Poul-Henning
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.