Discussion:
loader.efi architecture for replacing boot1.efi
Eric McCorkle
2017-12-16 00:57:14 UTC
I have posted a review which begins to move loader.efi in the direction
of replacing boot1.efi. The review can be found here:
https://reviews.freebsd.org/D13497

This patch enables loader.efi to be installed to /efi/boot/BOOTX64.EFI
on the ESP. It implements what I envision being the last-resort
fallback mechanism, but this is enough to allow it to boot a FreeBSD system.

It also preserves the existing behavior, so as not to break anyone's
install.

The *eventual* procedure for initial partition selection looks like this:

1) See if the boot loader arguments directly specify a kernel and/or
partition; use that if they do.

2) If not, then attempt to read EFI vars to determine the boot location

3) If no EFI vars are defined, and no partition was specified, fall back
to looking for an installed system on devices

4) As a last resort, fall back to the legacy behavior (what loader.efi
currently does).

Step (3) is done by attempting to stat /boot/loader.conf and
/boot/kernel. First, all partitions on the same disk are searched, then
all remaining partitions are searched.
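The selection order and the step (3) probe above can be sketched roughly as follows. This is a minimal Python model, not loader.efi code. `Partition`, `looks_installed`, `fallback_search`, `select_partition` and the `same_disk_only` switch are hypothetical names. A partition is modeled as the set of paths on which stat() would succeed, and the sketch assumes both /boot/loader.conf and /boot/kernel must exist for a partition to count as an installed system.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Partition:
    disk: str                                     # hypothetical disk id, e.g. "disk0"
    name: str                                     # hypothetical partition id, e.g. "disk0p2"
    files: Set[str] = field(default_factory=set)  # paths that stat() succeeds on


def looks_installed(part: Partition) -> bool:
    # Step (3) probe: stat /boot/loader.conf and /boot/kernel.
    # Assumption: both paths must be present.
    return {"/boot/loader.conf", "/boot/kernel"} <= part.files


def fallback_search(parts: List[Partition], boot_disk: str,
                    same_disk_only: bool = False) -> Optional[str]:
    # Partitions on the disk loader.efi came from are searched first,
    # then (unless restricted) every remaining partition.
    same = [p for p in parts if p.disk == boot_disk]
    rest = [] if same_disk_only else [p for p in parts if p.disk != boot_disk]
    for p in same + rest:
        if looks_installed(p):
            return p.name
    return None


def select_partition(arg_part: Optional[str], efivar_part: Optional[str],
                     parts: List[Partition], boot_disk: str
                     ) -> Tuple[Optional[str], str]:
    if arg_part is not None:                     # 1) loader arguments
        return arg_part, "args"
    if efivar_part is not None:                  # 2) EFI variables
        return efivar_part, "efivar"
    found = fallback_search(parts, boot_disk)    # 3) search for an install
    if found is not None:
        return found, "search"
    return None, "legacy"                        # 4) legacy loader.efi behavior


parts = [Partition("disk1", "disk1p2", {"/boot/loader.conf", "/boot/kernel"}),
         Partition("disk0", "disk0p2", {"/boot/loader.conf", "/boot/kernel"})]
print(select_partition(None, None, parts, "disk0"))  # ('disk0p2', 'search')
```

The `same_disk_only` flag models the configurability debated in this thread: setting it confines step (3) to the disk loader.efi was loaded from.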

This should allow mechanisms like EFI vars and command-line args to work
without interference from the fallback mechanisms. However, it also
provides robustness in the face of failure modes and uninitialized
systems (I personally ran into a problem a while back with a linux
system, where I couldn't boot with EFI, because the EFI vars weren't
set, because I couldn't set them if I couldn't boot with EFI; had to use
Shell.efi to sort out the mess...)

More importantly, it provides a seamless transition from the way things
are now to the way we want things to be.

Please provide comments and feedback.
Warner Losh
2017-12-16 01:09:18 UTC
On Dec 15, 2017 5:57 PM, "Eric McCorkle" <***@metricspace.net> wrote:

I have posted a review which begins to move loader.efi in the direction
of replacing boot1.efi. The review can be found here:
https://reviews.freebsd.org/D13497

This patch enables loader.efi to be installed to /efi/boot/BOOTX64.EFI
on the ESP. It implements what I envision being the last-resort
fallback mechanism, but this is enough to allow it to boot a FreeBSD system.


This will move to /efi/freebsd/loader.efi

It also preserves the existing behavior, so as not to break anyone's
install.

The *eventual* procedure for initial partition selection looks like this:

1) See if the boot loader arguments directly specify a kernel and/or
partition; use that if they do.


This should be second. UEFI variables trump all.

2) If not, then attempt to read EFI vars to determine the boot location

3) If no EFI vars are defined, and no partition was specified, fall back
to looking for an installed system on devices


This is fine, so long as it is only on the device that the loader loaded
from.

4) As a last resort, fall back to the legacy behavior (what loader.efi
currently does).


This is bogus. It violates the uefi boot loader protocol. We must abandon
this legacy behavior. The behavior is actively harmful since something
random will boot. This has caused actual operational issues at Netflix.
Guessing is really bad.

Step (3) is done by attempting to stat /boot/loader.conf and
/boot/kernel. First, all partitions on the same disk are searched, then
all remaining partitions are searched.

This should allow mechanisms like EFI vars and command-line args to work
without interference from the fallback mechanisms. However, it also
provides robustness in the face of failure modes and uninitialized
systems (I personally ran into a problem a while back with a linux
system, where I couldn't boot with EFI, because the EFI vars weren't
set, because I couldn't set them if I couldn't boot with EFI; had to use
Shell.efi to sort out the mess...)

More importantly, it provides a seamless transition from the way things
are now to the way we want things to be.

Please provide comments and feedback.


Please listen when I say searching all devices is actively harmful. The
uefi boot manager, which I'm in the process of bringing in, offers a way to
specifically say what you want to boot. If someone needs something
complicated, they must use that moving forward. Part of what makes the
protocol work is loaders giving up early so the next one on the list can be
tried.
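The "give up early" behavior can be illustrated with a simplified model of the UEFI boot manager walking BootOrder. This is a sketch under stated assumptions, not firmware code. Each Boot#### entry is reduced to a hypothetical Python callable that returns True if it booted something and False if it returned an error to the boot manager. `run_boot_order` and the loader functions are illustrative names.

```python
from typing import Callable, Dict, List, Optional

# A loader returns True if it booted something, False if it gave up
# (in UEFI terms, returned an error status to the boot manager).
Loader = Callable[[], bool]


def run_boot_order(boot_order: List[int],
                   options: Dict[int, Loader]) -> Optional[int]:
    """Try each Boot#### option in BootOrder; return the one that booted."""
    for num in boot_order:
        loader = options.get(num)
        if loader is None:          # stale entry with no Boot#### variable
            continue
        if loader():
            return num
        # The loader gave up, so the next entry gets a shot.
    return None


def freebsd_no_kernel() -> bool:
    return False    # no kernel on its own device: give up early


def other_os() -> bool:
    return True


winner = run_boot_order([0x0001, 0x0002],
                        {0x0001: freebsd_no_kernel, 0x0002: other_os})
print(f"Boot{winner:04X}")  # Boot0002
```

If the FreeBSD entry instead guessed and booted something from another device, it would report success and entry 0002 would never be tried.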

Warner
Eric McCorkle
2017-12-16 01:43:58 UTC
Post by Warner Losh
This should be second. UEFI variables trump all.
2) If not, then attempt to read EFI vars to determine the boot location
3) If no EFI vars are defined, and no partition was specified, fall back
to looking for an installed system on devices
This is fine, so long as it is only on the device that the loader loaded
from.
It's fine if it's configurable, but there needs to be sane behavior if
the EFI vars aren't set.
Post by Warner Losh
4) As a last resort, fall back to the legacy behavior (what loader.efi
currently does).
This is bogus. It violates the uefi boot loader protocol. We must
abandon this legacy behavior. The behavior is actively harmful since
something random will boot. This has caused actual operational issues at
Netflix. Guessing is really bad.
We can't just ditch the current behavior and break everyone's existing
install, though. Legacy behavior should be supported at least until the
next major release.
Post by Warner Losh
Step (3) is done by attempting to stat /boot/loader.conf and
/boot/kernel.  First, all partitions on the same disk are searched, then
all remaining partitions are searched.
This should allow mechanisms like EFI vars and command-line args to work
without interference from the fallback mechanisms.  However, it also
provides robustness in the face of failure modes and uninitialized
systems (I personally ran into a problem a while back with a linux
system, where I couldn't boot with EFI, because the EFI vars weren't
set, because I couldn't set them if I couldn't boot with EFI; had to use
Shell.efi to sort out the mess...)
More importantly, it provides a seamless transition from the way things
are now to the way we want things to be.
Please provide comments and feedback.
Please listen when I say searching all devices is actively harmful. The
uefi boot manager, which I'm in the process of bringing in, offers a way
to specifically say what you want to boot. If someone needs something
complicated, they must use that moving forward. Part of what makes the
protocol work is loaders giving up early so the next one on the list can
be tried.
We also have to deal with the reality that some EFI implementations are
adversarial. We have to be able to deal with implementations that make
it difficult to set EFI vars, or which mess with their values (Lenovo is
particularly notorious for this).

You can disable fallback mechanisms with command-line args or macros or
whatever, but they need to be there.
Warner Losh
2017-12-16 02:05:17 UTC
Post by Warner Losh
This should be second. UEFI variables trump all.
2) If not, then attempt to read EFI vars to determine the boot location
3) If no EFI vars are defined, and no partition was specified, fall back
to looking for an installed system on devices
This is fine, so long as it is only on the device that the loader loaded
from.
It's fine if it's configurable, but there needs to be sane behavior if
the EFI vars aren't set.


Where do we get this info for such a broken setup? Do you have actual
examples?
Post by Warner Losh
4) As a last resort, fall back to the legacy behavior (what loader.efi
currently does).
This is bogus. It violates the uefi boot loader protocol. We must
abandon this legacy behavior. The behavior is actively harmful since
something random will boot. This has caused actual operational issues at
Netflix. Guessing is really bad.
We can't just ditch the current behavior and break everyone's existing
install, though. Legacy behavior should be supported at least until the
next major release.


What useful setups does this break? Absent a real example, we absolutely
are breaking this. There is a real cost to keeping it, and as the de facto
maintainer of stand I'm unwilling to maintain it, test it, or commit to not
breaking it. The legacy behavior is broken and has caused me hours of pain in
production. No one has articulated a use case that it enables, especially
since the boot loader can be interrupted to specify something in recovery
scenarios.
Post by Warner Losh
Step (3) is done by attempting to stat /boot/loader.conf and
/boot/kernel. First, all partitions on the same disk are searched, then
all remaining partitions are searched.
This should allow mechanisms like EFI vars and command-line args to work
without interference from the fallback mechanisms. However, it also
provides robustness in the face of failure modes and uninitialized
systems (I personally ran into a problem a while back with a linux
system, where I couldn't boot with EFI, because the EFI vars weren't
set, because I couldn't set them if I couldn't boot with EFI; had to use
Shell.efi to sort out the mess...)
More importantly, it provides a seamless transition from the way things
are now to the way we want things to be.
Please provide comments and feedback.
Please listen when I say searching all devices is actively harmful. The
uefi boot manager, which I'm in the process of bringing in, offers a way
to specifically say what you want to boot. If someone needs something
complicated, they must use that moving forward. Part of what makes the
protocol work is loaders giving up early so the next one on the list can
be tried.
We also have to deal with the reality that some EFI implementations are
adversarial. We have to be able to deal with implementations that make
it difficult to set EFI vars, or which mess with their values (Lenovo is
particularly notorious for this).

You can disable fallback mechanisms with command-line args or macros or
whatever, but they need to be there.


No. Absent a sane use case, I refuse. Give me a reasonable use case, I will
reconsider.

Warner
Warner Losh
2017-12-16 03:28:00 UTC
So the current behavior leads to absurd results that nobody else does, and
that we don't do for legacy boot:

If we boot loader.efi/boot1.efi off a hard drive, and find there's no
kernel, we'll load off cdrom or a floppy if we happen to find a kernel
there. That's nuts. What's more, we'll load off a different device (say a
thumb drive), which is also crazy. The last thing you want is to
accidentally pick the thumb drive recovery kernel that happens to be in a
USB slot when you have a primary and secondary partition on two main disks,
but today's behavior chooses that. It's so crazy that I can see no benefit
from supporting, testing and maintaining this. If someone wants to recover
a system, they can do it at the boot loader prompt now (they couldn't
before). If someone really wants to boot his crazy thing, we have a new way
to specify it specifically w/o any ambiguity based on how the devices might
move around.

We already support about 100 boot scenarios that are hard enough to test. I
don't want to commit to supporting this and making it 120 or 150 once you
work out all the combinatorics. We have to trim the matrix of useless
things. So absent a use case that makes sense, that people are actually
doing, I'm having a hard time justifying keeping it around as we transition.

Warner

P.S. On x86, we support geli/nogeli, gpt/mbr, ufs/zfs, and uefi/legacy/both
(24 combinations). Plus we support booting off CDROM, netbooting, etc. For
arm and arm64 there is a similar number of possibilities: zfs/ufs,
u-boot/uefi, and mbr/gpt (plus a number of different u-boot boards). For
mips we have a similar mix. On powerpc we support 4 or 6 ways. It's just too
much to hope to test and ensure it all works. Each new thing has a
non-trivial cost, and I see zero benefit from this one more thing, especially
since it gets in the way of UEFI boot manager support.
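The 24 above is the product of the x86 axes (2 * 2 * 2 * 3). A short Python check with `itertools.product` enumerates them. The extra "search"/"nosearch" axis is a hypothetical illustration of how one more on/off knob doubles the x86 matrix. CD-ROM and netboot cases are not counted here.

```python
from itertools import product

# x86 dimensions listed in the P.S.
X86_AXES = [
    ("geli", "nogeli"),
    ("gpt", "mbr"),
    ("ufs", "zfs"),
    ("uefi", "legacy", "both"),
]

x86_matrix = list(product(*X86_AXES))
print(len(x86_matrix))  # 24

# Hypothetical: one more on/off knob (e.g. a fallback search) doubles it.
with_knob = list(product(*X86_AXES, ("search", "nosearch")))
print(len(with_knob))   # 48
```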
Eric McCorkle
2017-12-16 04:54:18 UTC
Whatever happens, this needs to not break existing installs. We can
remove probing floppy drives, fine (does anyone even HAVE those
anymore?). Dropping CD-ROM drives will break auto-detection when booting from a
liveDVD, but that can be mitigated by specifying loader args (I suppose
we'll need to have loader get args from the boot.config files
eventually). But for now, loader.efi has got to work whether installed
in a boot1/loader (legacy) configuration, or installed directly to the
ESP. Otherwise, there's going to be a lot of unhappy people out there.

As for the fallback search, it's just that: a fallback mechanism. Its
job is to make a sane guess as to where to find the system, but
ultimately it's not doing anything the user can't do themselves. And it
will only run if the EFI vars aren't set anyway, so it can't possibly
interfere with any of that.
Warner Losh
2017-12-16 05:49:52 UTC
Post by Eric McCorkle
Whatever happens, this needs to not break existing installs.
I don't think it will.

Post by Eric McCorkle
We can remove probing floppy drives, fine (does anyone even HAVE those
anymore?).
The kernel is likely too big. But my point was more that if I boot
loader.efi off a hard drive, the floppy isn't the place to find a kernel by
default in the absence of very explicit instructions to do so.
Post by Eric McCorkle
Dropping CD-ROM drives will break auto-detection when booting from a
liveDVD, but that can be mitigated by specifying loader args (I suppose
we'll need to have loader get args from the boot.config files
eventually).
CD/DVD booting won't break. We'll still load a kernel from them. No
boot.config needed for this case (though it might be for others).
Post by Eric McCorkle
But for now, loader.efi has got to work whether installed
in a boot1/loader (legacy) configuration, or installed directly to the
ESP. Otherwise, there's going to be a lot of unhappy people out there.
Correct. My proposed behavior will do just that, and if we get it wrong by
default (a) you can be explicit with boot variables or (b) you can type
something into the OK prompt, which you didn't have before.
Post by Eric McCorkle
As for the fallback search, it's just that: a fallback mechanism. Its
job is to make a sane guess as to where to find the system, but
ultimately it's not doing anything the user can't do themselves. And it
will only run if the EFI vars aren't set anyway, so it can't possibly
interfere with any of that.
And the fallback mechanism of typing what you want is wrong because? But
its job isn't to guess. If we don't know for sure what to boot, it's our
job to fail so the next OS in the list gets a shot at booting.

So, if we look at the sequence coming up, I'd like to propose the following:

We look at BootCurrent. If this exists, we look at BootXXXX to see the
current boot vars. This bootvar will have two things in it. It will have a
path to what was booted (possibly with a path of what to boot next) and a
command line. This command line is also passed to us by the BIOS. If the
command line has a root filesystem specifier, use it for currdir. If there
was a next thing to load (e.g. HD(<mumble>)/boot/kernel/kernel), then use
HD(<mumble>) as currdir. Otherwise, if we can find a ZFS pool (or there's
more than one and one is specified as bootenv), use it as currdir. Otherwise,
if we can find a UFS partition on the same drive as loader.efi came from that
has /boot/loader.rc (or whatever the file is in the Lua loader), use that for
currdir. If we still can't find currdir at this point, prompt for a currdir
(timeout after 10s) -- we have no script loaded at this point to do
prompting...
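The proposed currdir order can be sketched as a single decision function. This is a hedged Python model rather than loader code, built on several assumptions. Reading BootCurrent and BootXXXX is assumed to have already produced the option's command line and optional next path. `rootdev=` stands in for whatever root filesystem specifier syntax is eventually chosen. The device/file split of the path is simplified. `prompt` is a hypothetical callable standing in for a console prompt with a 10-second timeout. The ZFS rule is read as: a single pool is used directly, and with several pools only the one named as bootenv is used. All function names are illustrative.

```python
from typing import Callable, Optional, Sequence, Tuple


def split_device_path(path: str) -> Tuple[Optional[str], str]:
    """Split 'HD(<mumble>)/boot/kernel/kernel' into the device node and the
    file path. Simplification: the device is everything up to the last ')'."""
    i = path.rfind(")")
    if i < 0:
        return None, path
    return path[: i + 1], path[i + 1:]


def choose_currdir(cmdline: str,
                   next_path: Optional[str],
                   zfs_pools: Sequence[str],
                   bootenv_pool: Optional[str],
                   ufs_on_boot_drive: Sequence[str],
                   prompt: Callable[[int], Optional[str]]) -> Optional[str]:
    # cmdline and next_path come from the boot option named by BootCurrent;
    # pass "" and None when BootCurrent is absent.
    for tok in cmdline.split():
        if tok.startswith("rootdev="):       # hypothetical root specifier
            return tok[len("rootdev="):]
    if next_path:                             # e.g. HD(<mumble>)/boot/kernel/kernel
        dev, _ = split_device_path(next_path)
        if dev is not None:
            return dev
    if len(zfs_pools) == 1:                   # exactly one pool: use it
        return zfs_pools[0]
    if bootenv_pool is not None and bootenv_pool in zfs_pools:
        return bootenv_pool                   # several pools: the named one
    if ufs_on_boot_drive:                     # UFS with /boot/loader.rc on
        return ufs_on_boot_drive[0]           # loader.efi's own drive
    return prompt(10)                         # ask; None means it timed out


def no_answer(timeout_s: int) -> Optional[str]:
    return None  # stand-in for a console prompt that times out


print(choose_currdir("", "HD(1,GPT,abcd)/boot/kernel/kernel",
                     [], None, [], no_answer))  # HD(1,GPT,abcd)
```

Returning None at the end corresponds to the case where the loader should give up rather than guess.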

We could add loading boot.config from the same ESP \efi\freebsd\boot.config
at the beginning....

This is going to be tricky to code up as it is... This is basically what
I'd written up in two docs:

https://docs.google.com/document/d/1aK9IqF-60JPEbUeSAUAkYjF2W_8EnmczFs6RqCT90Jg/edit#heading=h.jdwnfj2sxlfb
(UEFI boot protocol, lightly edited to include the above summary)
https://docs.google.com/document/d/1l9tognVBx_QmWx6ZvilgEj2ndoIaMJhPPNllZZyHJj0/edit#heading=h.9ps7k4bunurf
(ZFS UEFI media type to be able to specify things exactly if one wants)

to try to get this all sorted out...

Warner
Eric McCorkle
2017-12-16 15:31:35 UTC
Post by Warner Losh
CD/DVD booting won't break. We'll still load a kernel from them. No
boot.config needed for this case (though it might be for others).
How is that possibly going to work for a liveDVD on a random system?
People expect it to "just work" (meaning, it correctly guesses the
kernel, then loads it).

I can see it working with boot.config (which I'd be fine with), but if
we don't search the CD drives, there's no way it can work.
Post by Warner Losh
But for now, loader.efi has got to work whether installed
in a boot1/loader (legacy) configuration, or installed directly to the
ESP.  Otherwise, there's going to be a lot of unhappy people out there.
Correct. My proposed behavior will do just that, and if we get it wrong
by default (a) you can be explicit with boot variables or (b) you can
type something into the OK prompt, which you didn't have before.
No, I'm talking about people with existing installations, which still
have both boot1 and loader.efi. A change this big needs to be phased in
over time, which means both modes of operation need to be supported for
a while.
 
Post by Warner Losh
As for the fallback search, it's just that: a fallback mechanism.  Its
job is to make a sane guess as to where to find the system, but
ultimately it's not doing anything the user can't do themselves.  And it
will only run if the EFI vars aren't set anyway, so it can't possibly
interfere with any of that.
And the fallback mechanism of typing what you want is wrong because?
Because every single person out there with an install is going to
suddenly have to type, and that's going to lead to a whole bunch of
people saying we broke loader.
Post by Warner Losh
But its job isn't to guess. If we don't know for sure what to boot, it's
our job to fail so the next OS in the list gets a shot at booting.
That won't happen though. If loader fails to find an installed system,
it drops out to a prompt, but it doesn't exit. Given that, it makes
sense to make an effort at finding an installed system.
Warner Losh
2017-12-16 17:07:37 UTC
Permalink
Post by Eric McCorkle
Post by Warner Losh
CD/DVD booting won't break. We'll still load a kernel from them. No
boot.config needed for this case (though it might be for others).
How is that possibly going to work for a liveDVD on a random system?
People expect it to "just work" (meaning, it correctly guesses the
kernel, then loads it).
I can see it working with boot.config (which I'd be fine with), but if
we don't search the CD drives, there's no way it can work.
And it will. It booted off the CD device, and will search the CD device
(and only the CD device) for the kernel. It will find it there. How could
that not work?
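A minimal sketch of the policy Warner describes for CD/DVD boots: probe only the device the loader itself was loaded from. In a real EFI loader that device would come from the DeviceHandle of the loaded image; here `struct boot_dev`, `dev_has_file()`, and `find_kernel_on_boot_dev()` are hypothetical stand-ins, and the probed path `/boot/kernel/kernel` is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a block device: a name plus the paths present on it. */
struct boot_dev {
    const char *name;
    const char *const *files;   /* NULL-terminated list of paths */
};

/* Hypothetical stand-in for the loader stat()ing a path on one device. */
static bool dev_has_file(const struct boot_dev *dev, const char *path)
{
    for (const char *const *f = dev->files; *f != NULL; f++)
        if (strcmp(*f, path) == 0)
            return true;
    return false;
}

/*
 * Probe only the device we were booted from. No other device is ever
 * consulted. Returns boot_dev if it holds a kernel, otherwise NULL,
 * which is the caller's signal to fail rather than go looking elsewhere.
 */
static const struct boot_dev *find_kernel_on_boot_dev(const struct boot_dev *boot_dev)
{
    assert(boot_dev != NULL);
    if (dev_has_file(boot_dev, "/boot/kernel/kernel"))
        return boot_dev;
    return NULL;
}
```

Because the function is never handed any other device, a second disk or a plugged-in thumb drive cannot change its answer.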
Post by Eric McCorkle
Post by Warner Losh
But for now, loader.efi has got to work whether installed
in a boot1/loader (legacy) configuration, or installed directly to the
ESP. Otherwise, there's going to be a lot of unhappy people out there.
Post by Warner Losh
Correct. My proposed behavior will do just that, and if we get it wrong
by default (a) you can be explicit with boot variables or (b) you can
type something into the OK prompt, which you didn't have before.
No, I'm talking about people with existing installations, which still
have both boot1 and loader.efi. A change this big needs to be phased in
over time, which means both modes of operation need to be supported for
a while.
Unless they have a totally whacked out system, the proposed thing that I'm
suggesting will just work for them.

If they are booting with multiple disks where /boot/loader comes from a
different disk than the boot disk, they will have to do something to
configure it. The number of such people is likely zero given how fragile
this setup is (it breaks when you plug in a thumb drive with a release on
it, for example).
Post by Eric McCorkle
As for the fallback search, it's just that: a fallback mechanism. Its
job is to make a sane guess as to where to find the system, but
ultimately it's not doing anything the user can't do themselves. And it
will only run if the EFI vars aren't set anyway, so it can't possibly
interfere with any of that.
And the fallback mechanism of typing what you want is wrong because?
Because every single person out there with an install is going to
suddenly have to type, and that's going to lead to a whole bunch of
people saying we broke loader.
I maintain no such people will have to do that. The UEFI BIOS is required
to set BootCurrent and BootXXXX. However, even in the absence of this,
we'll look for a ZFS pool (and disks) or UFS partition on the same disk.
This should generally work by default.
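To make the BootCurrent/BootXXXX mechanism concrete: BootCurrent is a UINT16 naming the load option the firmware started, and the matching Boot#### variable (four uppercase hex digits) holds an EFI_LOAD_OPTION: a UINT32 Attributes, a UINT16 FilePathListLength, a NUL-terminated UCS-2 Description, the device path list, then optional data. The sketch below, with hypothetical helper names, only builds the variable name and splits a raw payload into those fields. Reading the variables through the runtime GetVariable() service and walking the device path are left out, and the name is built as 8-bit text for simplicity, whereas real UEFI variable names are CHAR16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Build the variable name for a load option, e.g. 0x0003 -> "Boot0003". */
static void boot_var_name(uint16_t boot_current, char out[9])
{
    snprintf(out, 9, "Boot%04X", (unsigned)boot_current);
}

/* The EFI_LOAD_OPTION fields a loader would care about. */
struct load_option {
    uint32_t attributes;
    uint16_t file_path_list_length;   /* bytes of device path data */
    size_t description_chars;         /* UCS-2 characters, excluding NUL */
    const uint8_t *file_path_list;    /* points into the caller's buffer */
    const uint8_t *optional_data;     /* e.g. loader arguments; may be empty */
    size_t optional_data_len;
};

/* UEFI data is little-endian; decode byte by byte to avoid alignment issues. */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse a raw Boot#### payload. Returns 0 on success, -1 if malformed. */
static int parse_load_option(const uint8_t *buf, size_t len, struct load_option *opt)
{
    size_t off = 6;   /* Attributes (4 bytes) + FilePathListLength (2 bytes) */

    assert(buf != NULL && opt != NULL);
    if (len < off)
        return -1;
    opt->attributes = le32(buf);
    opt->file_path_list_length = le16(buf + 4);

    /* Description is a NUL-terminated UCS-2 string of unknown length. */
    opt->description_chars = 0;
    for (;;) {
        if (off + 2 > len)
            return -1;
        uint16_t ch = le16(buf + off);
        off += 2;
        if (ch == 0)
            break;
        opt->description_chars++;
    }

    if (len - off < opt->file_path_list_length)
        return -1;
    opt->file_path_list = buf + off;
    off += opt->file_path_list_length;
    opt->optional_data = buf + off;
    opt->optional_data_len = len - off;
    return 0;
}
```

The device path recovered this way is what would let the loader pick the exact partition it was launched from, before any fallback search is considered.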
Post by Eric McCorkle
Post by Warner Losh
But its job isn't to guess. If we don't know for sure what to boot, it's
our job to fail so the next OS in the list gets a shot at booting.
That won't happen though. If loader fails to find an installed system,
it drops out to a prompt, but it doesn't exit. Given that, it makes
sense to make an effort at finding an installed system.
No. It doesn't. You're assuming that if we fail, the system won't boot.
That's false. If we fail to boot device X, it's our job to fail so that if
there's a Y or a Z it can be next. We have no knowledge of whether the user
would prefer Y or Z as the next one to try, but the boot manager that runs
inside every single UEFI firmware does and it will go to the next one. Y
might be a recovery disk or copy of a FreeBSD memory stick release and Z
might be a redundant copy of X to use in cases where X fails. Or vice
versa. Do we want to boot to the installer? Not as a first choice, but
maybe as a last resort. But we should let UEFI orchestrate the retries.
Trying to second-guess is fundamentally wrong, especially in UEFI where the
boot order and boot recovery stuff is so extensively and particularly
defined. Having fought the "oh, I'm going to guess" code in boot1.efi for
over a year, and having watched it consistently pick the wrong thing to boot
on some tiny fraction of the hundreds of systems I've had deployed, I have
strong empirical data showing that guessing too hard is actually
actively harmful. I've thought about this a lot. I've thought through all
the supported scenarios. I've written up documents and solicited feedback.
Nobody to date has said "oh no! I really want the random installed system
roulette! I love it! Don't kill it."

Warner
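The retry argument above can be illustrated with a toy model. This is not firmware code: the status codes stand in for EFI_STATUS values, and `boot_manager()`, `loader_on_x()`, and `loader_on_y()` are invented for illustration. It shows the property Warner relies on: the next BootOrder entry only gets its turn if the previous loader returns an error instead of sitting at a prompt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified status codes; real UEFI uses EFI_STATUS (EFI_SUCCESS, EFI_NOT_FOUND, ...). */
typedef enum { STATUS_SUCCESS, STATUS_NOT_FOUND } status_t;

/* One load option: a label and an entry point that either boots or fails. */
struct load_opt {
    const char *label;
    status_t (*start)(void);
};

/*
 * Hypothetical model of the firmware boot manager walking BootOrder.
 * Returns the index of the option that booted, or -1 if none did.
 * This only works if each loader returns on failure; a loader that
 * drops to an interactive prompt never hands control back.
 */
static int boot_manager(const struct load_opt *opts, const uint16_t *order, size_t n)
{
    assert(opts != NULL && order != NULL);
    for (size_t i = 0; i < n; i++) {
        if (opts[order[i]].start() == STATUS_SUCCESS)
            return order[i];
    }
    return -1;
}

/* Example loaders: X has nothing it is sure it should boot; Y (a recovery stick) does. */
static status_t loader_on_x(void) { return STATUS_NOT_FOUND; }
static status_t loader_on_y(void) { return STATUS_SUCCESS; }
```

With BootOrder {X, Y}, X failing cleanly is what lets Y boot; the loader never needs to know that Y exists.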
Richard Perini
2017-12-17 00:17:52 UTC
Permalink
Post by Warner Losh
No. It doesn't. You're assuming that if we fail, the system won't boot.
That's false. If we fail to boot device X, it's our job to fail so that if
there's a Y or a Z it can be next. We have no knowledge of whether the user
would prefer Y or Z as the next one to try, but the boot manager that runs
inside every single UEFI firmware does and it will go to the next one. Y
might be a recovery disk or copy of a FreeBSD memory stick release and Z
might be a redundant copy of X to use in cases where X fails. Or vice
versa. Do we want to boot to the installer? Not as a first choice, but
maybe as a last resort. But we should let UEFI orchestrate the retries.
Trying to second-guess is fundamentally wrong, especially in UEFI where the
boot order and boot recovery stuff is so extensively and particularly
defined. Having fought the "oh, I'm going to guess" code in boot1.efi for
over a year, and having watched it consistently pick the wrong thing to boot
on some tiny fraction of the hundreds of systems I've had deployed, I have
strong empirical data showing that guessing too hard is actually
actively harmful. I've thought about this a lot. I've thought through all
the supported scenarios. I've written up documents and solicited feedback.
Nobody to date has said "oh no! I really want the random installed system
roulette! I love it! Don't kill it."
Warner
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To add support to Warner, as an administrator of 50+ FreeBSD systems on
a variety of hardware and disk configs, I totally support Warner's arguments.
Having the loader trying to guess in the case of unusual setups when it
can't find a kernel __on the same device as the loader__ causes grief. If
you want to have your thing boot that way, then configure it to do so, or
present a menu of boot options.
--
Richard Perini
Ramico Australia Pty Ltd Sydney, Australia ***@ci.com.au +61 2 9552 5500
-----------------------------------------------------------------------------
"The difference between theory and practice is that in theory there is no
difference, but in practice there is"
Eric McCorkle
2017-12-22 01:12:54 UTC
Permalink
Post by Richard Perini
Post by Warner Losh
No. It doesn't. You're assuming that if we fail, the system won't boot.
That's false. If we fail to boot device X, it's our job to fail so that if
there's a Y or a Z it can be next. We have no knowledge of whether the user
would prefer Y or Z as the next one to try, but the boot manager that runs
inside every single UEFI firmware does and it will go to the next one. Y
might be a recovery disk or copy of a FreeBSD memory stick release and Z
might be a redundant copy of X to use in cases where X fails. Or vice
versa. Do we want to boot to the installer? Not as a first choice, but
maybe as a last resort. But we should let UEFI orchestrate the retries.
Trying to second-guess is fundamentally wrong, especially in UEFI where the
boot order and boot recovery stuff is so extensively and particularly
defined. Having fought the "oh, I'm going to guess" code in boot1.efi for
over a year, and having watched it consistently pick the wrong thing to boot
on some tiny fraction of the hundreds of systems I've had deployed, I have
strong empirical data showing that guessing too hard is actually
actively harmful. I've thought about this a lot. I've thought through all
the supported scenarios. I've written up documents and solicited feedback.
Nobody to date has said "oh no! I really want the random installed system
roulette! I love it! Don't kill it."
Warner
To add support to Warner, as an administrator of 50+ FreeBSD systems on
a variety of hardware and disk configs, I totally support Warner's arguments.
Having the loader trying to guess in the case of unusual setups when it
can't find a kernel __on the same device as the loader__ causes grief. If
you want to have your thing boot that way, then configure it to do so, or
present a menu of boot options.
Ok, I've updated my review, simplifying the search code. The option to
search all devices is only enabled by a preprocessor macro at this point.
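The simplified search can be sketched as follows, assuming the order from the original proposal (partitions on the boot disk first, then everything else), with the second pass compiled in only when a preprocessor macro is defined. The macro name `SEARCH_ALL_DEVICES`, `struct part`, and `find_installed_system()` are hypothetical, since the actual names in D13497 are not given here; `has_system` stands in for the stat of /boot/loader.conf and /boot/kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical partition record: which disk it lives on, and whether it
 * looks like an installed system (/boot/loader.conf and /boot/kernel). */
struct part {
    int disk;
    bool has_system;
};

/*
 * Return the index of the first partition that looks like an installed
 * system. Partitions on the boot disk are always searched; other disks
 * are searched only when SEARCH_ALL_DEVICES (hypothetical) is defined.
 * Returns -1 if nothing is found.
 */
static int find_installed_system(const struct part *parts, size_t n, int boot_disk)
{
    assert(parts != NULL);
    for (size_t i = 0; i < n; i++)
        if (parts[i].disk == boot_disk && parts[i].has_system)
            return (int)i;
#ifdef SEARCH_ALL_DEVICES
    for (size_t i = 0; i < n; i++)
        if (parts[i].disk != boot_disk && parts[i].has_system)
            return (int)i;
#endif
    return -1;
}
```

In a default build, a system that exists only on another disk is not found, which matches the "same device as the loader" behavior Warner and Richard asked for.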

I have a mostly-complete follow-on, which adds parsing of args from the
boot.config files.
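The boot.config follow-on itself is not shown in the thread. As a rough, hypothetical sketch of a first step such parsing might take, the helper below splits the file's contents in place into whitespace-separated arguments that could then be passed to the loader's existing argument handling. The name `split_boot_config()` and its interface are assumptions, as is the sample input (boot2-style `-D -h` flags followed by a kernel path).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Split a boot.config buffer in place into whitespace-separated arguments,
 * storing up to max_args pointers in argv. Returns the argument count.
 * Separators are overwritten with NUL bytes, so buf must be writable.
 */
static int split_boot_config(char *buf, char **argv, int max_args)
{
    int argc = 0;
    char *p = buf;

    assert(buf != NULL && argv != NULL);
    while (*p != '\0' && argc < max_args) {
        while (isspace((unsigned char)*p))   /* skip leading whitespace */
            p++;
        if (*p == '\0')
            break;
        argv[argc++] = p;                     /* start of an argument */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p != '\0')
            *p++ = '\0';                      /* terminate it */
    }
    return argc;
}
```

For example, the contents `"  -D -h\n  /boot/kernel/kernel\n"` split into the three arguments `-D`, `-h`, and `/boot/kernel/kernel`.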

Anyway, the state of my review at this point is enough that I'm
unblocked on the GELI work.
