Discussion:
Removing build metadata, for reproducible kernel builds
Ed Maste
2015-12-02 17:36:52 UTC
The main issue currently preventing kernel builds from being
reproducible[1] is the build metadata itself that's included (time,
user, host, build path). In order to make the kernel build
reproducible I plan to remove these by default, and add a src.conf
knob to enable them for developers who want them in their own builds.

The user-facing effect of this is that the kern.version sysctl no
longer conveys this information, and uname -a changes from something
like:

FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0
r288681: Mon Oct 5 01:40:11 UTC 2015
***@build-11.freebsd.org:/usr/obj/usr/src/sys/CLUSTER11 amd64

to something like:

FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44
r288174+7644546(stable-10) amd64
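
To make the two shapes of the version string concrete, here is a minimal sketch in C. The names struct build_info and format_version are hypothetical and chosen only for illustration; the real string is assembled by the build scripts, not by code like this. The short form mirrors the sample output above, which still carries the build counter.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the fields discussed above. */
struct build_info {
	const char *ostype;     /* e.g. "FreeBSD" */
	const char *release;    /* e.g. "11.0-CURRENT" */
	int build_number;       /* the "#0" counter */
	const char *revision;   /* e.g. "r288681" */
	const char *date;       /* build time */
	const char *user;       /* who built it */
	const char *host;       /* where it was built */
	const char *path;       /* object directory */
};

/*
 * Write a version string into buf and return the snprintf() result.
 * With include_metadata == 0 the time, user, host and path are left
 * out, so two builds of the same revision produce the same text as
 * long as the build counter matches.
 */
int
format_version(char *buf, size_t size, const struct build_info *bi,
    int include_metadata)
{
	assert(buf != NULL && bi != NULL);
	if (include_metadata)
		return (snprintf(buf, size, "%s %s #%d %s: %s %s@%s:%s",
		    bi->ostype, bi->release, bi->build_number, bi->revision,
		    bi->date, bi->user, bi->host, bi->path));
	return (snprintf(buf, size, "%s %s #%d %s",
	    bi->ostype, bi->release, bi->build_number, bi->revision));
}
```

Calling format_version twice with different host and path values but include_metadata set to 0 yields identical strings, which is the property the proposal is after.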

The current version of the change is available for review at
https://reviews.freebsd.org/D4347.

[1] See https://reproducible-builds.org/ for more information on the
reproducible builds project.
Alfred Perlstein
2015-12-02 17:44:14 UTC
Post by Ed Maste
The main issue currently preventing kernel builds from being
reproducible[1] is the build metadata itself that's included (time,
user, host, build path). In order to make the kernel build
reproducible I plan to remove these by default, and add a src.conf
knob to enable them for developers who want them in their own builds.
The user-facing effect of this is that the kern.version sysctl no
longer conveys this information, and uname -a changes from something
FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0
r288681: Mon Oct 5 01:40:11 UTC 2015
FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44
r288174+7644546(stable-10) amd64
The current version of the change is available for review at
https://reviews.freebsd.org/D4347.
[1] See https://reproducible-builds.org/ for more information on the
reproducible builds project.
Can it not be done as a kernel module (containing the strings/numbers)
or injected after the fact by editing the binaries?

This info is very useful.

-Alfred
John Baldwin
2015-12-02 20:03:07 UTC
Post by Ed Maste
The main issue currently preventing kernel builds from being
reproducible[1] is the build metadata itself that's included (time,
user, host, build path). In order to make the kernel build
reproducible I plan to remove these by default, and add a src.conf
knob to enable them for developers who want them in their own builds.
The user-facing effect of this is that the kern.version sysctl no
longer conveys this information, and uname -a changes from something
FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0
r288681: Mon Oct 5 01:40:11 UTC 2015
FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44
r288174+7644546(stable-10) amd64
The current version of the change is available for review at
https://reviews.freebsd.org/D4347.
[1] See https://reproducible-builds.org/ for more information on the
reproducible builds project.
As I noted in the review, this will break kgdb -n (and possibly crashinfo,
less certain about that). Keeping the path (which should not vary if you
build out of the same tree) will be sufficient to let kgdb -n still work
(though it may need some changes to recognize both formats).

Keeping the path also means that 'uname -a' still tells you which kernel
config you are running (I assume you aren't changing 'uname -i', but
'uname -a' doesn't include 'uname -i').
--
John Baldwin
Ian Lepore
2015-12-02 20:14:57 UTC
Post by John Baldwin
Post by Ed Maste
The main issue currently preventing kernel builds from being
reproducible[1] is the build metadata itself that's included (time,
user, host, build path). In order to make the kernel build
reproducible I plan to remove these by default, and add a src.conf
knob to enable them for developers who want them in their own builds.
The user-facing effect of this is that the kern.version sysctl no
longer conveys this information, and uname -a changes from
something
FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0
r288681: Mon Oct 5 01:40:11 UTC 2015
FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44
r288174+7644546(stable-10) amd64
The current version of the change is available for review at
https://reviews.freebsd.org/D4347.
[1] See https://reproducible-builds.org/ for more information on the
reproducible builds project.
As I noted in the review, this will break kgdb -n (and possibly crashinfo,
less certain about that). Keeping the path (which should not vary if you
build out of the same tree) will be sufficient to let kgdb -n still work
(though it may need some changes to recognize both formats).
Keeping the path also means that 'uname -a' still tells you which kernel
config you are running (I assume you aren't changing 'uname -i', but
'uname -a' doesn't include 'uname -i').
But in the kinds of venues where reproducible builds are most
important, such as creating images that are part of commercial
products, the build path is one of the things most likely to change
between builds and least likely to be significant in terms of any
differences to the contents of the build. Likewise the hostname of the
build machine, which it appears is still in the uname output.

-- Ian
Erik Cederstrand
2015-12-03 09:28:10 UTC
Post by John Baldwin
As I noted in the review, this will break kgdb -n (and possibly crashinfo,
less certain about that). Keeping the path (which should not vary if you
build out of the same tree) will be sufficient to let kgdb -n still work
(though it may need some changes to recognize both formats).
Would it be feasible to include the relative build path instead of the absolute path? I seem to remember patches floating around for the __FILE__ macro, but I don't know if (k)gdb can work with relative paths.

Erik
John Baldwin
2015-12-03 16:51:29 UTC
Post by Erik Cederstrand
Post by John Baldwin
As I noted in the review, this will break kgdb -n (and possibly crashinfo,
less certain about that). Keeping the path (which should not vary if you
build out of the same tree) will be sufficient to let kgdb -n still work
(though it may need some changes to recognize both formats).
Would it be feasible to include the relative build path instead of the absolute path? I seem to remember patches floating around for the __FILE__ macro, but I don't know if (k)gdb can work with relative paths.
This is what kgdb -n does:

    /*
     * No kernel image here. Parse the dump header. The kernel object
     * directory can be found there and we probably have the kernel
     * image still in it. The object directory may also have a kernel
     * with debugging info (called kernel.debug). If we have a debug
     * kernel, use it.
     */
    snprintf(path, sizeof(path), "%s/info.%d", crashdir, nr);
    info = fopen(path, "r");
    if (info == NULL) {
        warn("%s", path);
        return;
    }
    while (fgets(path, sizeof(path), info) != NULL) {
        l = strlen(path);
        if (l > 0 && path[l - 1] == '\n')
            path[--l] = '\0';
        if (strncmp(path, "    ", 4) == 0) {
            s = strchr(path, ':');
            s = (s == NULL) ? path + 4 : s + 1;
            l = snprintf(path, sizeof(path), "%s/kernel.debug", s);
            if (stat(path, &st) == -1 || !S_ISREG(st.st_mode)) {
                path[l - 6] = '\0';
                if (stat(path, &st) == -1 ||
                    !S_ISREG(st.st_mode))
                    break;
            }
            kernel = strdup(path);
            break;
        }
    }
    fclose(info);

It basically pulls the path from the 'version' string in /var/crash/info.X,
appends 'kernel.debug' to it, and checks whether a file with that pathname
exists. If so, it uses it. This means it doesn't look for a kernel in some
/boot/foo; it looks in the build directory.

crashinfo instead finds all the 'kernel' files under /boot, extracts the
version string using gdb from each kernel, and does a string compare with the
version string in info.X. For this reason, crashinfo will still work if each
string is unique. However, with the proposal, kernels built with different
kernel configs from the same tree would have the same version string, thus being
indistinguishable.
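
The ambiguity described above can be sketched in C as follows. This is only an illustration of matching by string comparison, not crashinfo's actual code (crashinfo is a script that drives gdb); struct kernel_entry and match_kernel are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: an installed kernel and its version string. */
struct kernel_entry {
	const char *path;
	const char *version;
};

/*
 * Return the index of the only kernel whose version string equals
 * dump_version, -1 if none matches, or -2 if more than one matches
 * and the choice is therefore ambiguous.
 */
int
match_kernel(const struct kernel_entry *k, size_t n, const char *dump_version)
{
	int found = -1;

	assert(k != NULL && dump_version != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(k[i].version, dump_version) != 0)
			continue;
		if (found != -1)
			return (-2);
		found = (int)i;
	}
	return (found);
}
```

With the object path in the string, kernels built from two different configs compare as different and the lookup is unique; with the path removed, both entries carry the same text and the function can only report the ambiguity.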

A more robust solution than the string compares would be build-id, but that
requires a newer linker which we don't have.
--
John Baldwin
Andriy Gapon
2015-12-02 21:52:39 UTC
Post by Ed Maste
The main issue currently preventing kernel builds from being
reproducible[1] is the build metadata itself that's included (time,
user, host, build path). In order to make the kernel build
reproducible I plan to remove these by default, and add a src.conf
knob to enable them for developers who want them in their own builds.
The user-facing effect of this is that the kern.version sysctl no
longer conveys this information, and uname -a changes from something
FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0
r288681: Mon Oct 5 01:40:11 UTC 2015
FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44
r288174+7644546(stable-10) amd64
The current version of the change is available for review at
https://reviews.freebsd.org/D4347.
[1] See https://reproducible-builds.org/ for more information on the
reproducible builds project.
Personally, I would prefer that, at least initially, KERNEL_METADATA is "yes" by
default. My thinking is that people who really need reproducible builds would
have no trouble toggling the knob and the rest would have the traditional behavior.
--
Andriy Gapon
Tim Kientzle
2015-12-03 05:29:12 UTC
Post by Ed Maste
The main issue currently preventing kernel builds from being
reproducible[1] is the build metadata itself that's included (time,
user, host, build path). In order to make the kernel build
reproducible I plan to remove these by default, and add a src.conf
knob to enable them for developers who want them in their own builds.
The user-facing effect of this is that the kern.version sysctl no
longer conveys this information, and uname -a changes from something
FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0
r288681: Mon Oct 5 01:40:11 UTC 2015
FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44
r288174+7644546(stable-10) amd64
The current version of the change is available for review at
https://reviews.freebsd.org/D4347.
[1] See https://reproducible-builds.org/ for more information on the
reproducible builds project.
How feasible would it be for the various metadata here to
be overridable by src.conf?

That is, by default, the time, user, host, etc, are taken from
the local environment, but src.conf variables can override them
to produce more predictable results.
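
A minimal sketch of that fallback rule in C, assuming the override reaches the tool as an environment variable. The function names and any variable name passed in are hypothetical; no such src.conf variables are defined in this thread.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Return the override if one was supplied and is non-empty,
 * otherwise the value detected from the local environment.
 */
const char *
metadata_value(const char *override, const char *local_value)
{
	assert(local_value != NULL);
	return ((override != NULL && override[0] != '\0') ?
	    override : local_value);
}

/*
 * Same rule, with the override looked up in the environment
 * variable named by var (a hypothetical name chosen by the caller).
 */
const char *
metadata_from_env(const char *var, const char *local_value)
{
	assert(var != NULL);
	return (metadata_value(getenv(var), local_value));
}
```

A builder that wants predictable output would set the override to a fixed value; everyone else leaves it unset and keeps the locally detected time, user and host.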

Tim
Warner Losh
2015-12-03 05:51:29 UTC
Post by Ed Maste
The main issue currently preventing kernel builds from being
reproducible[1] is the build metadata itself that's included (time,
user, host, build path). In order to make the kernel build
reproducible I plan to remove these by default, and add a src.conf
knob to enable them for developers who want them in their own builds.
The user-facing effect of this is that the kern.version sysctl no
longer conveys this information, and uname -a changes from something
FreeBSD ref11-amd64.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0
r288681: Mon Oct 5 01:40:11 UTC 2015
FreeBSD feynman 10.2-STABLE FreeBSD 10.2-STABLE #44
r288174+7644546(stable-10) amd64
The current version of the change is available for review at
https://reviews.freebsd.org/D4347.
[1] See https://reproducible-builds.org/ for more information on the
reproducible builds project.
I noted in the review that I don’t like the default being no.

I also don’t like that we’re growing lots of different knobs that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.

I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.
And I’d rather see the default be to the historical behavior.
The build number too is kinda lame here, since that’s just a history
of the number of tries. If you are building from svn, it should be
zero. But if you’re rebuilding, you can easily get that number over
100 as you update from rev to rev and reboot. It’s better to have
the date / time of the build so if you are seeing a problem on a
test machine, you’ll know more firmly if the build has that thing
you fixed yesterday afternoon or not by the date / time it
was built, and by whom (since my kernels after 9:15am
have the fix, but nobody else does before 2:00pm since
that’s when I checked it in).

So I see the need for the feature, in general. But this doesn’t
implement a reproducible build due to the build number, the
user, the host and the path still being encoded into it. That makes
the change to remove date / time completely arbitrary which
is annoying because they are useful in many environments
where it would be difficult to force everybody to ‘opt in’ to
having them included. It’s easier to opt-out the release
process.

Warner
Ed Maste
2015-12-03 07:55:04 UTC
I noted in the review that I don’t like the default being no.
I also don’t like that we’re growing lots of different knobs that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.
My hope is that we'll have a reproducible build by default, and that
*no* knobs need to be set. That's what I intend with my patch. I can
rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's
generally desired. If there's a consensus to default to including the
metadata I'm fine with setting it in make release.
I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.
I don't follow -- other than the build iteration number (which I
indeed missed), it is all gone.
NGie Cooper
2015-12-03 08:07:14 UTC
Post by Ed Maste
I noted in the review that I don’t like the default being no.
I also don’t like that we’re growing lots of different knobs that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.
My hope is that we'll have a reproducible build by default, and that
*no* knobs need to be set. That's what I intend with my patch. I can
rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's
generally desired. If there's a consensus to default to including the
metadata I'm fine with setting it in make release.
I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.
I don't follow -- other than the build iteration number (which I
indeed missed), it is all gone.
I personally like being able to debug when user A builds on machine X versus user B on machine Y, because it has helped me find issues with people’s build environments in the past, where I could otherwise have ended up pulling teeth.

I think the single src.conf knob approach is wrong, though. Why not document how to do it in build(7) and tweak newvers.sh (which drives this to begin with) to do it? That would generalize the solution, accomplish this goal, and help $work too: right now we ($work) hack newvers.sh to change the version information and brand the product appropriately, instead of building on existing infrastructure, because the existing infrastructure is neither flexible nor documented and is very static.

Thanks,
-NGie
Warner Losh
2015-12-03 19:53:12 UTC
Post by Ed Maste
I noted in the review that I don’t like the default being no.
I also don’t like that we’re growing lots of different knobs that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.
My hope is that we'll have a reproducible build by default, and that
*no* knobs need to be set. That's what I intend with my patch. I can
rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's
generally desired. If there's a consensus to default to including the
metadata I'm fine with setting it in make release.
I think this is an unwise decision in the current form suggested. The kernel
metadata has saved my butt enough times I really don't want to see it
go by default. But see below for a reasonable (imho) middle ground that
would be a good default.
Post by Ed Maste
I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.
I don't follow -- other than the build iteration number (which I
indeed missed), it is all gone.
Yea I was reading things backwards.

In the review, I suggested that if you've modified the tree (which the SCM
will tell you), then do the old format to preserve useful metadata that's
really really needed and if not to use the shorter version. When you've
modified the tree, reproducible builds aren't a concern at all.

Warner
Ian Lepore
2015-12-03 21:15:25 UTC
Post by Warner Losh
Post by Ed Maste
I noted in the review that I don’t like the default being no.
I also don’t like that we’re growing lots of different knobs that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.
My hope is that we'll have a reproducible build by default, and that
*no* knobs need to be set. That's what I intend with my patch. I can
rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's
generally desired. If there's a consensus to default to including the
metadata I'm fine with setting it in make release.
I think this is an unwise decision in the current form suggested. The kernel
metadata has saved my butt enough times I really don't want to see it
go by default. But see below for a reasonable (imho) middle ground that
would be a good default.
I'm curious why anyone wants this enabled by default, like... are we
missing something? Does it improve freebsd-update behavior maybe?

If it's just for some general "reproducibility is good" philosophy then
I would counter with "information is even better, so don't throw it
away without a good reason."

Reproducibility is good for some people, and completely useless for
others, and the people who need it aren't going to mind turning on a
knob or two to get what they want.
Post by Warner Losh
Post by Ed Maste
I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.
I don't follow -- other than the build iteration number (which I
indeed missed), it is all gone.
Yea I was reading things backwards.
In the review, I suggested that if you've modified the tree (which the SCM
will tell you), then do the old format to preserve useful metadata that's
really really needed and if not to use the shorter version. When you've
modified the tree, reproducible builds aren't a concern at all.
How are you going to determine what constitutes a modified tree? What
you think of as modifications may be what I call my baseline version.

-- Ian
Justin Hibbits
2015-12-03 21:35:12 UTC
Post by Ian Lepore
Post by Warner Losh
Post by Ed Maste
I noted in the review that I don’t like the default being no.
I also don’t like that we’re growing lots of different knobs that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.
My hope is that we'll have a reproducible build by default, and that
*no* knobs need to be set. That's what I intend with my patch. I can
rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's
generally desired. If there's a consensus to default to including the
metadata I'm fine with setting it in make release.
I think this is an unwise decision in the current form suggested. The kernel
metadata has saved my butt enough times I really don't want to see it
go by default. But see below for a reasonable (imho) middle ground that
would be a good default.
I'm curious why anyone wants this enabled by default, like... are we
missing something? Does it improve freebsd-update behavior maybe?
If it's just for some general "reproducibility is good" philosophy then
I would counter with "information is even better, so don't throw it
away without a good reason."
Reproducibility is good for some people, and completely useless for
others, and the people who need it aren't going to mind turning on a
knob or two to get what they want.
Post by Warner Losh
Post by Ed Maste
I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.
I don't follow -- other than the build iteration number (which I
indeed missed), it is all gone.
Yea I was reading things backwards.
In the review, I suggested that if you've modified the tree (which the SCM
will tell you), then do the old format to preserve useful metadata that's
really really needed and if not to use the shorter version. When you've
modified the tree, reproducible builds aren't a concern at all.
How are you going to determine what constitutes a modified tree? What
you think of as modifications may be what I call my baseline version.
-- Ian
svnversion resulting in a 'nnnnnnM'?

- Justin
Ian Lepore
2015-12-03 21:45:09 UTC
Post by Justin Hibbits
Post by Ian Lepore
Post by Warner Losh
Post by Ed Maste
I noted in the review that I don’t like the default being no.
I also don’t like that we’re growing lots of different knobs that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.
My hope is that we'll have a reproducible build by default, and that
*no* knobs need to be set. That's what I intend with my patch. I can
rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's
generally desired. If there's a consensus to default to
including the
metadata I'm fine with setting it in make release.
I think this is an unwise decision in the current form suggested. The kernel
metadata has saved my butt enough times I really don't want to see it
go by default. But see below for a reasonable (imho) middle ground that
would be a good default.
I'm curious why anyone wants this enabled by default, like... are we
missing something? Does it improve freebsd-update behavior maybe?
If it's just for some general "reproducibility is good" philosophy then
I would counter with "information is even better, so don't throw it
away without a good reason."
Reproducibility is good for some people, and completely useless for
others, and the people who need it aren't going to mind turning on a
knob or two to get what they want.
Post by Warner Losh
Post by Ed Maste
I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.
I don't follow -- other than the build iteration number (which I
indeed missed), it is all gone.
Yea I was reading things backwards.
In the review, I suggested that if you've modified the tree (which the SCM
will tell you), then do the old format to preserve useful
metadata that's
really really needed and if not to use the shorter version. When you've
modified the tree, reproducible builds aren't a concern at all.
How are you going to determine what constitutes a modified tree?
What
you think of as modifications may be what I call my baseline
version.
-- Ian
svnversion resulting in a 'nnnnnnM'?
- Justin
svnversion isn't going to be able to return anything useful inside one
of my build sandboxes in which there is no hint of svn anything.

-- Ian
Warner Losh
2015-12-03 22:04:13 UTC
Post by Ian Lepore
Post by Justin Hibbits
Post by Ian Lepore
Post by Warner Losh
Post by Ed Maste
I noted in the review that I don’t like the default being no.
I also don’t like that we’re growing lots of different knobs
that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.
My hope is that we'll have a reproducible build by default, and that
*no* knobs need to be set. That's what I intend with my patch. I can
rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's
generally desired. If there's a consensus to default to including the
metadata I'm fine with setting it in make release.
I think this is an unwise decision in the current form suggested. The kernel
metadata has saved my butt enough times I really don't want to see it
go by default. But see below for a reasonable (imho) middle ground that
would be a good default.
I'm curious why anyone wants this enabled by default, like... are we
missing something? Does it improve freebsd-update behavior maybe?
If it's just for some general "reproducibility is good" philosophy then
I would counter with "information is even better, so don't throw it
away without a good reason."
Reproducibility is good for some people, and completely useless for
others, and the people who need it aren't going to mind turning on a
knob or two to get what they want.
Post by Warner Losh
Post by Ed Maste
I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.
I don't follow -- other than the build iteration number (which I
indeed missed), it is all gone.
Yea I was reading things backwards.
In the review, I suggested that if you've modified the tree (which the SCM
will tell you), then do the old format to preserve useful metadata that's
really really needed and if not to use the shorter version. When you've
modified the tree, reproducible builds aren't a concern at all.
How are you going to determine what constitutes a modified tree?
What
you think of as modifications may be what I call my baseline version.
-- Ian
svnversion resulting in a 'nnnnnnM'?
- Justin
svnversion isn't going to be able to return anything useful inside one
of my build sandboxes in which there is no hint of svn anything.
Then, in my proposal, you'd get the 'reproducible' format. We already
don't include the SVN info in this case.
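
The selection rule discussed in this subthread could be sketched in C roughly like this, assuming the SCM check is done with svnversion, which appends an 'M' to the revision for a working copy with local modifications. The enum and function names are hypothetical, and output that does not start with a revision number is treated as "no SCM information", which gives the reproducible format as described above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum version_format {
	FORMAT_REPRODUCIBLE,    /* short string, no time/user/host/path */
	FORMAT_FULL_METADATA    /* traditional string with build metadata */
};

/*
 * Return 1 if the svnversion output describes a modified working
 * copy (for example "288681M"), 0 otherwise. Output that does not
 * begin with a digit is taken to mean there is no working copy.
 */
int
tree_is_modified(const char *svnversion_output)
{
	assert(svnversion_output != NULL);
	if (!isdigit((unsigned char)svnversion_output[0]))
		return (0);
	return (strchr(svnversion_output, 'M') != NULL);
}

/* Modified tree: keep the metadata. Clean or unknown tree: drop it. */
enum version_format
choose_format(const char *svnversion_output)
{
	return (tree_is_modified(svnversion_output) ?
	    FORMAT_FULL_METADATA : FORMAT_REPRODUCIBLE);
}
```

This does not settle the objection about trees that are managed outside svn or git; it only shows what the default would be under the proposal.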

Perhaps this isn't desirable for you, but it's my proposal and my
suggestion and I'd welcome comments on it.

Warner
Ed Maste
2015-12-03 21:49:34 UTC
Post by Justin Hibbits
svnversion resulting in a 'nnnnnnM'?
Warner suggested this in the review also, and it might be a good way
to choose a default. In any case it's clear that there's strong (and
reasonable) objection to enabling this by default for all builds, so
I'll not commit the change as-is.

I believe there are three separate issues here:

1) It should be possible to build the kernel reproducibly. I hope this
isn't contentious.

2) Control over enabling reproducible builds -- build knob or no,
default to on/off, based on svnversion including 'M', forced on for
release builds, etc.

3) Some tools rely on the current format / data, and will need to be fixed.

I expect to make a change so that a reproducible build is possible,
but not introduce a new knob or change anything by default. After that
I'll work on the issues in #3 and once that's done we can start the
bikeshed about whether there should be a knob, what the default should
be etc.

Thanks all for the feedback.
Jonathan Anderson
2015-12-03 21:41:27 UTC
Post by Ian Lepore
I'm curious why anyone wants this enabled by default, like... are we
missing something? Does it improve freebsd-update behavior maybe?
There is value in being able to reproduce the things you run, especially
if you download them from somebody else (like releases or binary
packages). It's not a panacea (see "Reflections on Trusting Trust"), but
it’s helpful, even if you don't always do the reproduction work. The
very fact that someone *can* check a binary release for naughtiness is a
strong incentive for many adversaries not to try their hand.
Post by Ian Lepore
If it's just for some general "reproducibility is good" philosophy then
I would counter with "information is even better, so don't throw it
away without a good reason."
When you're building your own stuff, sure, it might help to know that
this is the kernel you built on "this machine" at "that time". When
running 10.2-RELEASE-p7, however, it’s not very useful to know that it
was built on amd64-builder.daemonology.net, or that the source tree was
located at /usr/src. It *might* be useful to know that {set of people}
all got kernels that hash to {some bit pattern} when they reproduced the
build (like Certificate Transparency). Or, more interestingly, that
{people using some configuration} got a different result. Again, like
Certificate Transparency. :)
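
As a rough illustration of "everyone got kernels that hash to the same bit pattern", here is a sketch in C. It uses 64-bit FNV-1a purely as a stand-in because it fits in a few lines; FNV-1a is not a cryptographic hash, and real verification would use something like SHA-256. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a byte buffer (non-cryptographic stand-in). */
uint64_t
fnv1a64(const unsigned char *data, size_t len)
{
	uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */

	assert(data != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 1099511628211ULL;          /* FNV prime */
	}
	return (h);
}

/*
 * Return 1 if every independently produced digest equals the
 * first one, 0 if any rebuild disagrees.
 */
int
all_builds_agree(const uint64_t *digests, size_t n)
{
	assert(digests != NULL || n == 0);
	for (size_t i = 1; i < n; i++)
		if (digests[i] != digests[0])
			return (0);
	return (1);
}
```

A single embedded timestamp or hostname is enough to make the digests differ, which is why the metadata has to be removable before this kind of cross-checking is possible.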
Post by Ian Lepore
Reproducibility is good for some people, and completely useless for
others, and the people who need it aren't going to mind turning on a
knob or two to get what they want.
Possibly. I don't have any strong opinions on whether the default is
"reproducible" or "full of information that helps me identify busted
kernels", just so long as "reproducible" is available and easy to turn
on. And my personal opinion is that it should be turned on for public
releases: I think that being able to validate the kernel is more
important than knowing what machine it was built on.
Post by Ian Lepore
Post by Warner Losh
Yea I was reading things backwards.
In the review, I suggested that if you've modified the tree (which the SCM
will tell you), then do the old format to preserve useful metadata that's
really really needed and if not to use the shorter version. When you've
modified the tree, reproducible builds aren't a concern at all.
How are you going to determine what constitutes a modified tree? What
you think of as modifications may be what I call my baseline version.
Since we host our code in Subversion and have an official Git mirror,
how about svn status || git status? If you're basing your code off of
anything other than an official mirror, you get to deal with the
reproducibility problem yourself, but it sounds like many people in this
camp would prefer the more verbose version string anyway.


Jon
--
Jonathan Anderson
***@FreeBSD.org
Ian Lepore
2015-12-03 21:59:26 UTC
Post by Jonathan Anderson
Post by Ian Lepore
I'm curious why anyone wants this enabled by default, like... are we
missing something? Does it improve freebsd-update behavior maybe?
There is value in being able to reproduce the things you run,
especially
if you download them from somebody else (like releases or binary
packages). It's not a panacea (see "Reflections on Trusting Trust"), but
it’s helpful, even if you don't always do the reproduction work. The
very fact that someone *can* check a binary release for naughtiness is a
strong incentive for many adversaries not to try their hand.
Post by Ian Lepore
If it's just for some general "reproducibility is good" philosophy then
I would counter with "information is even better, so don't throw it
away without a good reason."
When you're building your own stuff, sure, it might help to know that
this is the kernel you built on "this machine" at "that time". When
running 10.2-RELEASE-p7, however, it’s not very useful to know that it
was built on amd64-builder.daemonology.net, or that the source tree was
located at /usr/src. It *might* be useful to know that {set of people}
all got kernels that hash to {some bit pattern} when they reproduced the
build (like Certificate Transparency). Or, more interestingly, that
{people using some configuration} got a different result. Again, like
Certificate Transparency. :)
Post by Ian Lepore
Reproducibility is good for some people, and completely useless for
others, and the people who need it aren't going to mind turning on a
knob or two to get what they want.
Possibly. I don't have any strong opinions on whether the default is
"reproducible" or "full of information that helps me identify busted
kernels", just so long as "reproducible" is available and easy to turn
on. And my personal opinion is that it should be turned on for public
releases: I think that being able to validate the kernel is more
important than knowing what machine it was built on.
Post by Ian Lepore
Post by Warner Losh
Yea I was reading things backwards.
In the review, I suggested that if you've modified the tree (which the SCM
will tell you), then do the old format to preserve useful metadata that's
really really needed and if not to use the shorter version. When you've
modified the tree, reproducible builds aren't a concern at all.
How are you going to determine what constitutes a modified tree? What
you think of as modifications may be what I call my baseline version.
Since we host our code in Subversion and have an official Git mirror,
how about svn status || git status? If you're basing your code off of
anything other than an official mirror, you get to deal with the
reproducibility problem yourself, but it sounds like many people in this
camp would prefer the more verbose version string anyway.
By "we" you must mean "The FreeBSD Project" but surely you also realize
that the universe of freebsd users is much larger than just the
project, and not all of them use subversion or git to check out freebsd
and/or manage their local copies of it.

For a company building products based on freebsd, reproducibility is
important, but they're quite likely to be using something other than
subversion or git to manage the source. They're also quite likely to
have local modifications that they consider to be part of their
baseline even if they appear to be modifications from the project's
repo at the same svn revision number. Either way, these folks are
going to want to set some control that enforces reproducibility
regardless of any build system heuristics about what to default to.

For other companies or end users the important factor might be the
ability to reproduce an official release, which one presumes would
start with checking out the official sources using one of the official
SCMs and then a whole other set of "what constitutes a modification"
would apply.

As someone who works for one of those "not-svn, not-git" companies I
just want to make sure there's a "do what I say" knob that overrides
any attempts to be smart about detecting modifications.
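
A rough sketch of the precedence being asked for: an explicit setting always wins, and the modified-tree heuristic applies only when nothing was set. The function name, the yes/no values, and the choice of the short format for an unrecognized tree are all assumptions made for illustration, not existing build options:

```shell
#!/bin/sh
# Hypothetical decision helper.
#   $1: explicit user setting for a reproducible build: "yes", "no", or empty
#   $2: detected tree state: "clean", "modified", or "unknown"
# Prints "short" (no build metadata) or "verbose" (old-style version string).
pick_version_format() {
    override=$1
    state=$2
    case "$override" in
    yes) echo short ;;      # "do what I say": reproducible, no heuristics
    no)  echo verbose ;;    # "do what I say": keep the metadata
    *)
        # Nothing set explicitly: fall back to the modified-tree heuristic.
        if [ "$state" = "modified" ]; then
            echo verbose
        else
            echo short      # assumption: unknown trees get the short format
        fi
        ;;
    esac
}
```

With this ordering, `pick_version_format yes modified` still yields the short string, which is the behavior a vendor with a locally patched baseline would need.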

-- Ian
John Baldwin
2015-12-04 01:14:54 UTC
Permalink
Post by Jonathan Anderson
Post by Ian Lepore
Reproducibility is good for some people, and completely useless for
others, and the people who need it aren't going to mind turning on a
knob or two to get what they want.
Possibly. I don't have any strong opinions on whether the default is
"reproducible" or "full of information that helps me identify busted
kernels", just so long as "reproducible" is available and easy to turn
on. And my personal opinion is that it should be turned on for public
releases: I think that being able to validate the kernel is more
important than knowing what machine it was built on.
FYI, I think most folks agree that releases should be reproducible (and
in particular the release bits that are shipped). I think the primary
question people have raised is what the default behavior is if someone
is building a kernel themselves vs a kernel from an ISO or freebsd-update.

Secondly, the whole kgdb/crashinfo thing does sort of matter if we want
users to have usable crash summaries when reporting bugs on release
installs. (crashinfo matters more here than kgdb -n's hackish thing,
and crashinfo just needs 'version' to be unique)
--
John Baldwin
NGie Cooper
2015-12-04 02:00:16 UTC
Permalink
On Dec 2, 2015, at 23:55, Ed Maste <***@freebsd.org> wrote:

On 3 December 2015 at 05:51, Warner Losh <***@bsdimp.com> wrote:


I noted in the review that I don’t like the default being no.

I also don’t like that we’re growing lots of different knobs that need
to be set to get a repeatable build. Let’s have one, or barring that,
let’s have one that sets all the sub-knobs.


My hope is that we'll have a reproducible build by default, and that
*no* knobs need to be set. That's what I intend with my patch. I can
rename the knob to WITH_/WITHOUT_REPRODUCIBLE_BUILD though if that's
generally desired. If there's a consensus to default to including the
metadata I'm fine with setting it in make release.

I think that host and path are more worthless than date and time
in many environments. Who builds it likewise. Those are all things
that are likely to change between builds, yet change the kernel
image. I’d rather see it all gone when this option is in effect.


I don't follow -- other than the build iteration number (which I
indeed missed), it is all gone.


I personally like being able to debug when user A builds on machine X vs
user B on machine Y, because it's helped me find issues with people's
build environments in the past where I could have ended up pulling teeth.

I think the single src.conf knob approach is wrong though. Why not
document how to do it with build(7) and tweak newvers.sh to do this (which
drives this to begin with)? That would generalize the solution, accomplish
this goal, and help $work accomplish this goal, because right now we
($work) hack newvers.sh in order to change the version information to brand
the product appropriately, instead of building upon existing infrastructure,
as the existing infrastructure is not flexible/documented/static.

Thanks,
-NGie
NGie Cooper
2015-12-04 02:01:39 UTC
Permalink
On Thu, Dec 3, 2015 at 6:00 PM, NGie Cooper <***@gmail.com> wrote:
...

Sorry. Sent the same email twice by accident >_>.
Fabian Keil
2015-12-04 14:43:08 UTC
Permalink
Post by Ed Maste
The main issue currently preventing kernel builds from being
reproducible[1] is the build metadata itself that's included (time,
user, host, build path). In order to make the kernel build
reproducible I plan to remove these by default, and add a src.conf
knob to enable them for developers who want them in their own builds.
To make the ElectroBSD build (kernel, world and release)
reproducible the time, user and host can be overwritten.

To make this more convenient the user can do this through a shell
script (/usr/src/reproduce.sh) which reads the values from a small
config file (/usr/src/reproduce.conf) which is included in the src.txz.

Example content:

| BUILD=ElectroBSD-r291706-29246dc
| EPOCH=1449163375
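
A minimal sketch of how a script might consume such a file. This is not the actual reproduce.sh: the function name load_build_conf is hypothetical, and it handles only the two keys shown above:

```shell
#!/bin/sh
# Hypothetical reader for a reproduce.conf-style file of KEY=VALUE lines.
# Sets BUILD and EPOCH in the calling shell; unknown keys are ignored.
# Returns non-zero if either value is missing or EPOCH is not a plain number.
load_build_conf() {
    conf=$1
    BUILD=
    EPOCH=
    # Parse instead of sourcing, so the file cannot run arbitrary commands.
    while IFS='=' read -r key value; do
        case "$key" in
        BUILD) BUILD=$value ;;
        EPOCH) EPOCH=$value ;;
        esac
    done < "$conf"
    case "$EPOCH" in
    ''|*[!0-9]*) return 1 ;;    # reject an empty or non-numeric timestamp
    esac
    [ -n "$BUILD" ]
}
```

After `load_build_conf /usr/src/reproduce.conf` succeeds, the build can use $BUILD and $EPOCH in place of the current host name and wall-clock time.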

Currently the build path can't be changed between builds, mainly
because I expect most users to reproduce the build using a jail
in which case this limitation doesn't seem to matter.

The relevant patches (minus the ones I overlooked) are now available at:
https://www.fabiankeil.de/sourcecode/electrobsd/reproducible-build-goo-r291706-29246dc.diff

Due to the auto-untainting (also done by reproduce.sh) this is not
expected to build with vanilla FreeBSD, but if that code is disabled
it might work.

If anyone with a freebsd.org address and an OpenPGP key is interested
in the whole ElectroBSD patchset (which contains security fixes that
were (mostly) sent to freebsd-so@ months ago but have not been addressed
yet) I'll provide it upon request.
Post by Ed Maste
The user-facing effect of this is that the kern.version sysctl no
longer conveys this information, and uname -a changes from something
Allowing the values to be overwritten avoids this problem.

Fabian
