Discussion:
mtree "language" enhancements
Warner Losh
2015-11-29 18:04:26 UTC
Greetings,

As part of making NanoBSD buildable by non-root, I've found a need to have
a richer mtree language than we currently have.

mtree started out as a language to express hierarchies of files. It does a
decent job at that, even if some of the tools that we have in the tree
aren't so great about manipulating them. One could easily wish for better
tools, but that's not the topic of this thread.

So, I've started to move the language into one that can also journal
changes to a tree, and have been moving NanoBSD to using wrappers that do
the changes to the tree and record the journal events at the end of the
metalog produced from buildworld. I have a second tool that reads the
metalog, applies the actions to the earlier entries, and then produces a
final metalog that's used for makefs. These tools are still evolving, but
before I got too close to the point of committing, I thought I'd post a
proposed extension to mtree for comments so I don't have to change too much.

I'd like a new type called 'action' (so type=action in the records). This
type is defined loosely to manipulate an earlier entry (or maybe entries,
still unsure) in the file.

Each action entry would have an 'action' keyword. The keywords I've defined
so far are as follows:
1. "unlink" which throws away the previous entry. That entry has been
removed. It may apply to files or directories, but it is an error not to
remove all entries in a directory when removing the directory.
2. "move" which relocates a previous entry. An additional targetpath
keyword specifies the ultimate destination for this entry.
3. "copy" which duplicates a previous entry. It too takes targetpath.
4. "meta" which changes the meta data of the previous entry. All keywords
on this are merged with the previous entry.
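A minimal sketch of the post-processing step described above, in Python rather than the awk actually used. It assumes one record per line written as "path key=value ...", that every keyword has the key=value form, no escaped spaces in names, and that a directory's children are the entries whose paths start with "dir/". The names parse() and replay() are hypothetical. Here move and copy relocate only the named entry; carrying a directory's children along is left out.

```python
def parse(line):
    """Split an mtree line into (path, {keyword: value})."""
    path, *words = line.split()
    return path, dict(word.split("=", 1) for word in words)


def replay(lines):
    """Replay a metalog, applying type=action records to earlier entries."""
    tree = {}  # path -> keyword dict, kept in insertion order
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, kw = parse(line)
        if kw.get("type") != "action":
            tree[path] = kw  # ordinary entry: create (or replace) the node
            continue
        action = kw.pop("action")
        del kw["type"]
        if path not in tree:
            raise KeyError(f"{action}: no earlier entry for {path}")
        if action == "unlink":
            # Removing a directory is an error while entries remain inside it.
            inside = [p for p in tree if p.startswith(path + "/")]
            if inside:
                raise ValueError(f"unlink {path}: entries remain: {inside}")
            del tree[path]
        elif action in ("move", "copy"):
            target = kw.pop("targetpath")
            tree[target] = dict(tree[path])  # copy so later meta edits stay separate
            if action == "move":
                del tree[path]
        elif action == "meta":
            tree[path].update(kw)  # merge the new keywords into the entry
        else:
            raise ValueError(f"unknown action {action!r}")
    return tree
```

For example, replaying "./etc/ttys type=action action=copy targetpath=./etc/ttys.orig" followed by "./etc/ttys type=action action=meta mode=0600" leaves the copy with the original mode and only ./etc/ttys with mode 0600.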

The one other thing that my merging tool does is to remove all size
keywords. In the NanoBSD environment, size is irrelevant. Files are
replaced and appended to all the time in the build process, and it doesn't
make sense to track the size. makefs fails if the size is different, so
post-processing of the tree, say to add a new default to
/etc/defaults/rc.conf or to tweak /etc/ttys to turn on/off a tty (or append
a new entry) will cause it to fail. It would be nice if mtree could do this,
but it simply can't (but see above for whining about better tools being
beyond the scope of this).
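Removing the size keywords, as the merging tool does, amounts to a per-line substitution. A sketch, assuming keywords are separated by whitespace; strip_size() is a hypothetical name.

```python
import re

# A size keyword preceded by whitespace, with whatever value follows it.
SIZE_KW = re.compile(r"\s+size=\S*")


def strip_size(line):
    """Return an mtree line with any size= keyword removed."""
    return SIZE_KW.sub("", line)
```

Requiring whitespace before "size=" keeps the pattern from matching the tail of some other keyword.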

If things go well, we could eventually move these extensions into mtree so
that the post-processing stage is no longer necessary. I'm content to
maintain the hundred or two lines of awk I've written to implement it. I
chose awk because it does the job well enough, though python might do it
better. But I don't want to talk about that choice since right now it is
purely internal to NanoBSD (though I hope that other build orchestration
systems like src/release and crochet look to adopt it).

Comments?

Warner
Poul-Henning Kamp
2015-11-29 18:16:07 UTC
--------
Post by Warner Losh
As part of making NanoBSD buildable by non-root, I've found a need to have
a richer mtree language than we currently have.
I'd like a new type called 'action' (so type=action in the records). This
type is defined loosely to manipulate an earlier entry (or maybe entries,
still unsure) in the file.
I suggest you define this so that all records have an action, and that
the default action is "create"
Post by Warner Losh
2. "move" which relocates a previous entry. An additional targetpath
keyword specifies the ultimate destination for this entry.
3. "copy" which duplicates a previous entry. It too takes targetpath.
Is targetpath absolute or relative ?

Can it reach out of the mtree root ?
Post by Warner Losh
4. "meta" which changes the meta data of the previous entry. All keywords
on this are merged with the previous entry.
System-III called this "chmog" if I recall correctly :-)
Post by Warner Losh
The one other thing that my merging tool does is to remove all size
keywords.
That sounds wrong to me. Shouldn't you just emit "meta" records updating
the size as appropriate ?

What about digest fields ?
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Warner Losh
2015-11-29 18:58:48 UTC
Post by Poul-Henning Kamp
--------
Post by Warner Losh
As part of making NanoBSD buildable by non-root, I've found a need to have
a richer mtree language than we currently have.
I'd like a new type called 'action' (so type=action in the records). This
type is defined loosely to manipulate an earlier entry (or maybe entries,
still unsure) in the file.
I suggest you define this so that all records have an action, and that
the default action is "create"
From a practical point of view, I didn't consider this, but that is
what would be a logical consequence of these extensions.
Post by Poul-Henning Kamp
Post by Warner Losh
2. "move" which relocates a previous entry. An additional targetpath
keyword specifies the ultimate destination for this entry.
3. "copy" which duplicates a previous entry. It too takes targetpath.
Is targetpath absolute or relative ?
relative to top of tree.
Post by Poul-Henning Kamp
Can it reach out of the mtree root ?
Nope. Those cases need entirely new entries.
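One way a processor could enforce "relative to top of tree, never outside it" for targetpath. A sketch, assuming entry names are spelled relative to the root with an optional leading "./"; check_targetpath() is a hypothetical name.

```python
import posixpath


def check_targetpath(targetpath):
    """Normalise a targetpath and reject anything outside the tree root."""
    if targetpath.startswith("/"):
        raise ValueError(f"{targetpath}: targetpath must be relative")
    norm = posixpath.normpath(targetpath)  # folds "." and ".." components
    if norm == ".." or norm.startswith("../"):
        raise ValueError(f"{targetpath}: escapes the mtree root")
    return "." if norm == "." else "./" + norm
```

Normalising first means a path such as a/../../x is caught even though it does not start with "..".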
Post by Poul-Henning Kamp
Post by Warner Losh
4. "meta" which changes the meta data of the previous entry. All keywords
on this are merged with the previous entry.
System-III called this "chmog" if I recall correctly :-)
I love that term. I'll steal it :)
Post by Poul-Henning Kamp
Post by Warner Losh
The one other thing that my merging tool does is to remove all size
keywords.
That sounds wrong to me. Shouldn't you just emit "meta" records updating
the size as appropriate ?
Emitting records that change the size is possible, but would add an extra
step. It's easy to catch mv, rm, etc., but hard to catch >>. I took the easy
way out of just ignoring size changes, though one could add a
nano_resize <path> command that you need to call after changing the size of
a file in the post-processing phase.
Post by Poul-Henning Kamp
What about digest fields ?
In my use case, they are irrelevant. They aren't generated by buildworld's
metalog, and aren't generally useful. They might add some protection against
tampering between when the tree is created and when it is put into a
partition, but that's racy: if an attacker can replace the file after it is
created but before the checksum is run, they win. So there's little value
here for me.

However, having said that, digest fields should either be discarded (for the
same reason as size), or they should be correct before the dedup tool /
enhanced mtree gets to them. This gets into the nuts and bolts of NanoBSD: we
copy files around all the time, but have no spec for them. The usual answer
is to have a bunch of chmod / chown calls that 'fix' them up and generate an
mtree for the image so you can protect against corruption in the field (or
at least know what changed).
In a nopriv build, you need to somehow record these changes. Do I continue
the traditional behavior, or do I require a new mtree spec for all the files
you wish to copy and use that to modify the metalog, or hack the permissions
directly for the priv-build case? The decision between discard and check
likely is an input to the dedup tool. For NanoBSD the decision is likely
to default to discard. But other tools might want to check, and some
NanoBSD users may wish to climb the hill to being correct by adding
calls to correct the size everywhere.
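The discard-or-check decision could be passed to the dedup tool as a policy parameter. A sketch with hypothetical names (apply_policy, DIGESTS): it handles only a subset of mtree's digest keywords, and "check" reads the staged file under an assumed root directory.

```python
import hashlib
import os

# Subset of mtree digest keywords, mapped to hashlib algorithm names.
DIGESTS = {"md5digest": "md5", "sha1digest": "sha1",
           "sha256digest": "sha256", "sha512digest": "sha512"}


def apply_policy(root, path, kw, policy="discard"):
    """Apply the size/digest policy to one entry's keyword dict.

    "discard" drops size and digest keywords; "check" verifies them
    against the staged file and raises ValueError on a mismatch.
    """
    if policy == "discard":
        return {k: v for k, v in kw.items() if k != "size" and k not in DIGESTS}
    if policy != "check":
        raise ValueError(f"unknown policy {policy!r}")
    full = os.path.join(root, path)
    if "size" in kw and int(kw["size"]) != os.stat(full).st_size:
        raise ValueError(f"{path}: size mismatch")
    for key, algo in DIGESTS.items():
        if key in kw:
            with open(full, "rb") as f:
                actual = hashlib.new(algo, f.read()).hexdigest()
            if actual != kw[key].lower():
                raise ValueError(f"{path}: {key} mismatch")
    return kw
```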

My first goal is to create a tool that produces correct images with
the right permissions. A secondary goal would be to safeguard the process
from unintended changes that would be caught by size and/or digest
changes. It isn't a current feature of NanoBSD, but that doesn't make
it undesirable. Especially if your NanoBSD build process puts precious
files onto the media that you want to make sure the rest of the build
process doesn't tamper with accidentally to guard against bugs...

Warner
Tim Kientzle
2015-11-29 18:59:36 UTC
Sounds interesting.

Have you talked with Michal (CCed) who is working on a libmtree library?

The capabilities you're describing here really need to be bundled into a library, I think. In particular, the ability to "unlink", "copy", etc, is much more useful if you can directly query the mtree file contents to perform conditional changes. (For example, it may be important to remove an empty directory which requires you to be able to query whether a directory has files in it.)

I would also be interested in a description of the processing model. It sounds like you're assuming the same model used by the current mtree program -- mtree files are processed sequentially line-by-line as they are read.

For instance, libarchive's mtree processor works differently; it reads the entire input, merging redundant lines for the same file, and then processes the list. This is more explicitly declarative, and simplifies things like modifying the ownership or permissions of already-listed files.
Post by Warner Losh
Each action entry would have an 'action' keyword.
In terms of the language per se, this seems unnecessary. I've proposed alternate language below that omits the unnecessary "type=action" by just adding new keywords.
Post by Warner Losh
The keywords I've defined
1. "unlink" which throws away the previous entry. That entry has been
removed. It may apply to files or directories, but it is an error not to
remove all entries in a directory when removing the directory.
# When set on an entry, a matching file on disk will be removed.
# This would also be useful for things like ObsoleteFiles
unlink=true
Post by Warner Losh
2. "move" which relocates a previous entry. An additional targetpath
keyword specifies the ultimate destination for this entry.
# When set on an entry, moves the existing file to the new name
rename=<targetpath>

# Example
foo/bar type=file owner=root mode=0755 rename=foo/baz
Post by Warner Losh
3. "copy" which duplicates a previous entry. It too takes target path.
# As with rename, except it copies the contents.
copy_from=<original>

# properties that are not specified will be copied as well
# Create foo/bar by copying foo/baz, preserving all attributes
foo/bar type=file copy_from=foo/baz
# Create foo/bar as above, but modify the owner
foo/bar owner=dialer type=file copy_from=foo/baz
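The keyword-based variant can be replayed the same way, one entry at a time against a path -> keywords map. A sketch, assuming rename= and copy_from= take root-relative paths and that each entry uses at most one of the new keywords; apply_keywords() is hypothetical, and unlink here only drops the entry from the in-memory tree rather than removing anything on disk.

```python
def apply_keywords(tree, path, kw):
    """Apply one keyword-style entry to tree (a path -> keyword dict map)."""
    kw = dict(kw)
    if kw.pop("unlink", None) == "true":
        tree.pop(path, None)
    elif "rename" in kw:
        target = kw.pop("rename")
        # Keywords on this line are merged before the entry moves.
        tree[target] = {**tree.pop(path, {}), **kw}
    elif "copy_from" in kw:
        source = kw.pop("copy_from")
        # Properties not given on this line are inherited from the source.
        tree[path] = {**tree[source], **kw}
    else:
        # libarchive-style merge: a later line adds or overrides keywords.
        tree.setdefault(path, {}).update(kw)
    return tree
```

The usage below writes uname= rather than owner=, since uname is the existing mtree keyword.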
Post by Warner Losh
4. "meta" which changes the meta data of the previous entry. All keywords
on this are merged with the previous entry.
As above, libarchive's mtree processor already does this by default; no language change is needed.
Post by Warner Losh
The one other thing that my merging tool does is to remove all size
keywords. ... [comments about modifying existing files]
One common case here is appending new contents to an existing file. That could similarly be handled with the same pattern:

# Append from source
foo/bar append_from=<target path>

In particular, that removes the need to find the source file to modify it in-place. I've run into various headaches with Crochet when the /usr/obj layout changes between releases and Crochet cannot find the new location of a file. This would remove the need to always modify the file in-place. (But not all.)

Cheers,

Tim
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Warner Losh
2015-11-29 19:22:13 UTC
Post by Tim Kientzle
Sounds interesting.
Have you talked with Michal (CCed) who is working on a libmtree library?
No. I haven't. I've been thinking mostly about the fastest way to get
NanoBSD working in a nopriv (-DNO_ROOT) environment that wouldn't
be hard to push into a library later.
Post by Tim Kientzle
The capabilities you're describing here really need to be bundled into a
library, I think. In particular, the ability to "unlink", "copy", etc, is
much more useful if you can directly query the mtree file contents to
perform conditional changes. (For example, it may be important to remove
an empty directory which requires you to be able to query whether a
directory has files in it.)
In the NanoBSD context, these entries would be automatically generated,
so the tree is at hand. There'd be no need for this conditional stuff,
though having it as an additional extension wouldn't be bad.
Post by Tim Kientzle
I would also be interested in a description of the processing model. It
sounds like you're assuming the same model used by the current mtree
program -- mtree files are processed sequentially line-by-line as they are
read.
The processing model is that the resulting mtree file is read sequentially.
Each new entry either creates a new node in an internal representation, or
modifies a previous node. Once everything has been processed, the internal
representation would be used to do something. In my case, I'd output an
mtree file free of these extensions.
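The last step, writing out an mtree file free of the extensions, might look like the following. A sketch assuming the internal representation is a dict of path -> keyword dict; emit() and EXTENSION_KEYS are hypothetical names. Sorting by path is enough to list each directory before its contents, because a parent path is a prefix of its children's paths and a string always sorts before any longer string it prefixes.

```python
# Keywords introduced by the journal extensions; these must not reach makefs.
EXTENSION_KEYS = {"action", "targetpath"}


def emit(tree):
    """Return plain mtree lines, one per entry, parents before children."""
    lines = []
    for path in sorted(tree):
        words = [f"{k}={v}" for k, v in tree[path].items()
                 if k not in EXTENSION_KEYS]
        lines.append(" ".join([path, *words]))
    return lines
```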
Post by Tim Kientzle
For instance, libarchive's mtree processor works differently; it reads the
entire input, merging redundant lines for the same file, and then processes
the list. This is more explicitly declarative, and simplifies things like
modifying the ownership or permissions of already-listed files.
Yes. My awk script that is the first manifestation of these extensions
is implemented this way. That's why I described it as a journal, but
didn't explain that in my nomenclature, a journal is processed
first to last to get the current state.
Post by Tim Kientzle
Post by Warner Losh
Each action entry would have an 'action' keyword.
In terms of the language per se, this seems unnecessary. I've proposed
alternate language below that omits the unnecessary "type=action" by just
adding new keywords.
That would work too. I came up with the type=action thing as a way to avoid
a lot of new keywords, and to segregate the new actions from the old, but
what you propose would also work and might be more general.
Post by Tim Kientzle
The keywords I've defined
Post by Warner Losh
1. "unlink" which throws away the previous entry. That entry has been
removed. It may apply to files or directories, but it is an error not to
remove all entries in a directory when removing the directory.
# When set on an entry, a matching file on disk will be removed.
# This would also be useful for things like ObsoleteFiles
unlink=true
OK. That's a little different than what I had in mind. My notion was that
the tree would be modified in place to remove the file, and this entry
would announce that action so the mtree internal representation could
be modified to reflect that. Though I do see value in your approach.
Post by Tim Kientzle
Post by Warner Losh
2. "move" which relocates a previous entry. An additional targetpath
keyword specifies the ultimate destination for this entry.
# When set on an entry, moves the existing file to the new name
rename=<targetpath>
# Example
foo/bar type=file owner=root mode=0755 rename=foo/baz
That would work.
Post by Tim Kientzle
Post by Warner Losh
3. "copy" which duplicates a previous entry. It too takes target path.
# As with rename, except it copies the contents.
copy_from=<original>
Yes.
Post by Tim Kientzle
# properties that are not specified will be copied as well
# Create foo/bar by copying foo/baz, preserving all attributes
foo/bar type=file copy_from=foo/baz
# Create foo/bar as above, but modify the owner
foo/bar owner=dialer type=file copy_from=foo/baz
s/owner/uname=/ but I like that.
Post by Tim Kientzle
Post by Warner Losh
4. "meta" which changes the meta data of the previous entry. All keywords
on this are merged with the previous entry.
As above, libarchive's mtree processor already does this by default; no
language change is needed.
OK. If it matches existing practice, I'm cool with the change.
Post by Tim Kientzle
Post by Warner Losh
The one other thing that my merging tool does is to remove all size
keywords. ... [comments about modifying existing files]
One common case here is appending new contents to an existing file. That
# Append from source
foo/bar append_from=<target path>
That's a novel idea. My post-processor might have a little trouble with it
if we were trying not to modify the actual target tree. But with modify in
place, we could make it work.
Post by Tim Kientzle
In particular, that removes the need to find the source file to modify it
in-place. I've run into various headaches with Crochet when the /usr/obj
layout changes between releases and Crochet cannot find the new location of
a file. This would remove the need to always modify the file in-place.
(But not all.)
It is a useful pattern.

Most of the nanobsd scripts I've seen use >> to append to individual files,
one line at a time.


Warner

Tim Kientzle
2015-11-29 22:49:03 UTC
Post by Tim Kientzle
I would also be interested in a description of the processing model. It sounds like you're assuming the same model used by the current mtree program -- mtree files are processed sequentially line-by-line as they are read.
The processing model is that the resulting mtree file is read sequentially. Each
new entry either creates a new node in an internal representation, or modifies
a previous node. Once everything has been processed, the internal representation
would be used to do something. In my case, I'd output an mtree file free of these
extensions.
Good. I like that model.
Post by Tim Kientzle
Post by Warner Losh
1. "unlink" which throws away the previous entry.
# When set on an entry, a matching file on disk will be removed.
# This would also be useful for things like ObsoleteFiles
unlink=true
OK. That's a little different than what I had in mind. My notion was that
the tree would be modified in place to remove the file, and this entry
would announce that action so the mtree internal representation could
be modified to reflect that. Though I do see value in your approach.
I was thinking that the 'mtree' command-line tool could be useful for bulk-remove operations (or more generally for updating an existing tree including removal of obsolete files). But bulk-remove is probably easier to do with 'xargs rm', so that might be overkill.
Post by Simon J. Gerraty
which is good for manually maintained manifests,
usr/tests/bin/cat/d_align.in mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.in"
usr/tests/bin/cat/d_align.out mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.out"
the two can be combined - an mtree style header with autogenerated
info appended.
libarchive also supports this mixture. It's a little tricky to parse accurately, though. I think libarchive considers any line a "full path" line if the name has a '/' in it. So you occasionally need to use things like './foo' to force the right interpretation. And of course, there are tricky details like merging properties accurately when some are specified in the old format and some in the new.
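The heuristic as recalled above can be stated as a small classifier. This is a sketch of the rule as described ("full path" when the name contains a '/'), not libarchive's code; classify() is hypothetical, and treating names that start with '/' as commands such as /set, and ".." as stepping up a directory, follows the classic mtree format.

```python
def classify(line):
    """Return "command", "up", "full" or "relative" for a non-empty mtree line."""
    name = line.split(None, 1)[0]
    if name.startswith("/"):
        return "command"  # e.g. /set, /unset
    if name == "..":
        return "up"  # relative form: leave the current directory
    return "full" if "/" in name else "relative"
```

Under this rule "foo" is read relative to the current directory, while "./foo" is forced into the full-path interpretation.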
Post by Simon J. Gerraty
Indeed I'd really like the ability to provide default uid/gid
for the case that a uname/gname cannot be looked up.
I think 'tar' got this right: If uname and uid are both specified, then look up uname and if that fails, use the specified uid. Ditto for gname/gid. In particular, this lets a single specification be used to rebuild a tree on another system with different UIDs or on a system that does not (yet) have a full password file. An option could be provided for the (rare) case that someone really wants to prefer UIDs to unames.
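The tar-style rule (try uname, fall back to uid) could be sketched like this. passwd is assumed to be a name -> uid mapping read from the target image rather than the build host, and resolve_owner() is a hypothetical name; the same shape applies to gname/gid.

```python
def resolve_owner(kw, passwd):
    """Pick a numeric uid for one entry's keyword dict, tar-style."""
    uname = kw.get("uname")
    if uname is not None and uname in passwd:
        return passwd[uname]  # the name resolves: it wins over uid
    if "uid" in kw:
        return int(kw["uid"])  # name unknown (or absent): use the number
    raise LookupError(f"cannot resolve owner {uname!r}")
```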

Tim
Tim Kientzle
2015-11-30 04:28:42 UTC
Post by Tim Kientzle
Post by Simon J. Gerraty
Indeed I'd really like the ability to provide default uid/gid
for the case that a uname/gname cannot be looked up.
I think 'tar' got this right: If uname and uid are both specified, then look up uname and if that fails, use the specified uid. Ditto for gname/gid. In particular, this lets a single specification be used to rebuild a tree on another system with different UIDs or on a system that does not (yet) have a full password file. An option could be provided for the (rare) case that someone really wants to prefer UIDs to unames.
On further reflection, preferring UIDs to unames would actually be pretty common here.

In particular, NanoBSD (and Crochet and other similar tools) should prefer the UID when building images instead of looking up unames against the build host's password file.

Tim
Warner Losh
2015-11-30 05:49:10 UTC
Post by Tim Kientzle
Post by Tim Kientzle
Post by Simon J. Gerraty
Indeed I'd really like the ability to provide default uid/gid
for the case that a uname/gname cannot be looked up.
I think 'tar' got this right: If uname and uid are both specified, then
look up uname and if that fails, use the specified uid. Ditto for
gname/gid. In particular, this lets a single specification be used to
rebuild a tree on another system with different UIDs or on a system that
does not (yet) have a full password file. An option could be provided for
the (rare) case that someone really wants to prefer UIDs to unames.
On further reflection, preferring UIDs to unames would actually be pretty common here.
In particular, NanoBSD (and Crochet and other similar tools) should prefer
the UID when building images instead of looking up unames against the build
host's password file.
I've implemented what we've talked about, except this. When doing the
makefs, we should use the /etc/master.passwd that's inside the image in
preference to either of these alternatives. That's the most correct thing
to do: use as much of the data as you can, as late as you can.

The thing I'm struggling with now is why would both be present? Would that
indicate an error? Or someone changing the defaults? And if they are
changing the defaults, why use a uid in preference to a uname? Is this to
avoid contamination? To set something not in the password file, or just
comfort level of the user? FreeBSD will write unames for install*.

So I'm left thinking that maybe the rule should be 'last one wins' at least
for the use case where we use the target's /etc/master.passwd. That's
what I've actually implemented.
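The 'last one wins' rule could be sketched as a scan over one entry's keywords in file order. owner_last_wins() is a hypothetical name, and passwd is assumed to be a name -> uid mapping read from the target image's password database.

```python
def owner_last_wins(words, passwd):
    """Resolve ownership where whichever of uname=/uid= appears last wins.

    words is one entry's keyword list in file order, e.g. after meta
    records have been appended to it.
    """
    owner = None
    for word in words:
        key, _, value = word.partition("=")
        if key == "uname":
            owner = passwd[value]  # KeyError if the target lacks the user
        elif key == "uid":
            owner = int(value)
    return owner
```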

Preliminary testing of http://people.freebsd.org/~imp/mtree-dedup.awk
appears to be working. I haven't tried all the cases yet, but it is looking
promising. I don't need append_from, so that's just a stub in this file.
Since this is in awk, I don't use the host's /etc/passwd at all. That's
one of the failures of mtree that I've seen when I tried to use it, and
perhaps the source of your concern. I'd love to see any libmtree be able to
manipulate mtree files without the tree they describe, and even without any
uname -> uid mapping at all, to avoid these issues. The silly awk thing I
wrote is purely a tool for manipulating a mapping from paths to sets of
key-value pairs.

Once I'm more confident about this after some testing and integration into
NanoBSD, I'll post something to phabricator. But I'd welcome any comments
on what I've implemented in the meantime.

Warner
Tim Kientzle
2015-12-01 02:31:07 UTC
Post by Warner Losh
Post by Tim Kientzle
Post by Tim Kientzle
Post by Simon J. Gerraty
Indeed I'd really like the ability to provide default uid/gid
for the case that a uname/gname cannot be looked up.
I think 'tar' got this right: If uname and uid are both specified, then
look up uname and if that fails, use the specified uid. Ditto for
gname/gid. In particular, this lets a single specification be used to
rebuild a tree on another system with different UIDs or on a system that
does not (yet) have a full password file. An option could be provided for
the (rare) case that someone really wants to prefer UIDs to unames.
On further reflection, preferring UIDs to unames would actually be pretty common here.
In particular, NanoBSD (and Crochet and other similar tools) should prefer
the UID when building images instead of looking up unames against the build
host's password file.
I've implemented what we've talked about, except this. When doing the
makefs, we should use the /etc/master.passwd that's inside the image in
preference to either of these alternatives. That's the most correct thing
to do: use as much of the data as you can, as late as you can.
The thing I'm struggling with now is why would both be present? Would that
indicate an error? Or someone changing the defaults? And if they are
changing the defaults, why use a uid in preference to a uname? Is this to
avoid contamination? To set something not in the password file, or just
comfort level of the user? FreeBSD will write unames for install*.
So I'm left thinking that maybe the rule should be 'last one wins' at least
for the use case where we use the target's /etc/master.passwd. That's
what I've actually implemented.
There are two key cases that drove this design for tar:

1. Handling user info that is not (yet) in the target password file. In practice, images get built up in different orders: I might add a bunch of new files owned by a new user before some other process gets a chance to add the user.

2. Restoring info when the target has different user numbering than the host. (Or when the user isn’t in the host password file at all.)

For #1, you need the UID since the uname can’t be looked up anywhere. For #2, you must have the uname since the UID would be wrong. An image that can work in either scenario needs to have both.

For NanoBSD, you may be able to enforce that users are always present in the target password file before any data owned by those users is added to the image. So it may be reasonable to just rely on uname everywhere for now.

Tim
Simon J. Gerraty
2015-12-01 18:25:23 UTC
Post by Tim Kientzle
Post by Warner Losh
So I'm left thinking that maybe the rule should be 'last one wins' at least
for the use case where we use the target's /etc/master_password. That's
what I've actually implemented.
1. Handling user info that is not (yet) in the target password file.
In practice, images get built up in different orders: I might add a
bunch of new files owned by a new user before some other process gets
a chance to add the user.
This is the issue we face.
We don't like magic numbers, so we prefer to use names (uid=0 gid=0
is fine).

We use mtree with BSD.var.dist at various times, and in at least some of
those cases we cannot assume that the passwd or group databases will
be complete (or even valid, e.g. during recovery from corrupted storage).

In such cases we could easily tolerate mtree simply using 0:0 (or the
current uid:gid) for any uname:gname it could not resolve, since we
aren't likely to care about those dirs until we are up and running
properly, by which time the ownership would have been fixed.

What we don't want is for mtree to toss its cookies or flood the console
with pointless noise (which it is wont to do).
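The tolerant behaviour asked for here can be sketched as a lookup that substitutes a default UID and collects the unresolved names, so they can be reported once rather than once per directory. This is a hypothetical helper written in Python, not an existing mtree option. It assumes entries have already been parsed into (path, uname) pairs; gname would be handled the same way against the group file.

```python
def resolve_with_default(entries, users, default_uid=0):
    """Resolve unames, substituting default_uid when a lookup fails.

    entries: list of (path, uname) pairs.
    users: name -> UID map, possibly incomplete.
    Returns (owners, unresolved): owners maps path -> UID, and unresolved
    is the set of names that could not be looked up.
    """
    owners = {}
    unresolved = set()
    for path, uname in entries:
        if uname in users:
            owners[path] = users[uname]
        else:
            owners[path] = default_uid
            unresolved.add(uname)
    return owners, unresolved


if __name__ == "__main__":
    users = {"root": 0}  # incomplete password data, e.g. during recovery
    specs = [("var/log", "root"), ("var/spool/mqueue", "smmsp"),
             ("var/spool/clientmqueue", "smmsp")]
    owners, missing = resolve_with_default(specs, users)
    if missing:
        # One summary line instead of a warning per directory.
        print("unresolved unames, using uid 0:", ", ".join(sorted(missing)))
```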

What we currently have to do to avoid problems is run BSD.var.dist
through sed to replace all \([gu]\)name=[^ ]* with \1id=0, and it
would be nice to be able to skip that.
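For comparison, here is the same substitution as a short Python sketch. It assumes the sed command uses the global flag, since the message says "replace all". The function name is hypothetical.

```python
import re

# Mirrors s/\([gu]\)name=[^ ]*/\1id=0/g. sed works line by line, so
# newlines are excluded from the value as well as spaces.
NAME_KEYWORD = re.compile(r"([gu])name=[^ \n]*")


def names_to_root(spec_text):
    """Replace every uname=/gname= keyword with uid=0/gid=0."""
    return NAME_KEYWORD.sub(r"\1id=0", spec_text)


if __name__ == "__main__":
    print(names_to_root("/set type=dir uname=root gname=wheel mode=0755"))
    # /set type=dir uid=0 gid=0 mode=0755
```

This rewrite removes the names from the spec entirely. A fallback at lookup time would leave them in place.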
Masao Uebayashi
2015-12-03 09:58:32 UTC
Post by Tim Kientzle
Post by Warner Losh
Post by Tim Kientzle
Post by Tim Kientzle
Post by Simon J. Gerraty
Indeed I'd really like the ability to provide default uid/gid
for the case that a uname/gname cannot be looked up.
I think 'tar' got this right: If uname and uid are both specified, then
look up uname and if that fails, use the specified uid. Ditto for
gname/gid. In particular, this lets a single specification be used to
rebuild a tree on another system with different UIDs or on a system that
does not (yet) have a full password file. An option could be provided for
the (rare) case that someone really wants to prefer UIDs to unames.
On further reflection, preferring UIDs to unames would actually be pretty common here.
In particular, NanoBSD (and Crochet and other similar tools) should prefer
the UID when building images instead of looking up unames against the build
host's password file.
I've implemented what we've talked about, except this. When doing the
makefs, we should use the /etc/master_password that's inside the image in
preference to either of these alternatives. That's the most correct thing
to do: use as much of the data as you can, as late as you can.
The thing I'm struggling with now is why would both be present? Would that
indicate an error? Or someone changing the defaults? And if they are
changing the defaults, why use a uid in preference to a uname? Is this to
avoid contamination? To set something not in the password file, or just
comfort level of the user? FreeBSD will write unames for install*.
So I'm left thinking that maybe the rule should be 'last one wins' at least
for the use case where we use the target's /etc/master_password. That's
what I've actually implemented.
1. Handling user info that is not (yet) in the target password file. In practice, images get built up in different orders: I might add a bunch of new files owned by a new user before some other process gets a chance to add the user.
When you say "image", you surely mean "file-system image".
A file-system image contains on-disk data (inodes), which store numeric
UIDs/GIDs rather than symbolic names (uname/gname).

When you decide to create an image, you have a whole tree
(directories/files) that ends up in the generated file-system image.
This means that when you create an image, you must know all the files
and the UIDs/GIDs they will have. If you don't, what you are creating is
not yet an image. If you don't know the UIDs/GIDs, can't you just create
a tar archive and extract it later, when you actually create the image?

I don't want mtree(1) to be so smart that it makes unnecessary
decisions. I want it to be simple and deterministic.
Post by Tim Kientzle
2. Restoring info when the target has different user numbering than the host. (Or when the user isn’t in the host password file at all.)
For #1, you need the UID since the uname can’t be looked up anywhere. For #2, you must have the uname since the UID would be wrong. An image that can work in either scenario needs to have both.
For NanoBSD, you may be able to enforce that users are always present in the target password file before any data owned by those users is added to the image. So it may be reasonable to just rely on uname everywhere for now.
Tim
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
Mark Felder
2015-12-25 20:49:02 UTC
Post by Tim Kientzle
Post by Tim Kientzle
Post by Simon J. Gerraty
Indeed I'd really like the ability to provide default uid/gid
for the case that a uname/gname cannot be looked up.
I think 'tar' got this right: If uname and uid are both specified, then look up uname and if that fails, use the specified uid. Ditto for gname/gid. In particular, this lets a single specification be used to rebuild a tree on another system with different UIDs or on a system that does not (yet) have a full password file. An option could be provided for the (rare) case that someone really wants to prefer UIDs to unames.
On further reflection, preferring UIDs to unames would actually be pretty common here.
Just don't lose the functionality to use unames. It's really useful when
changing lots of UIDs. Just schedule maintenance, do an mtree capture of
the filesystem, change UIDs, re-apply the mtree. It will fix everything
for you :-)
--
Mark Felder
ports-secteam member
***@FreeBSD.org
Simon J. Gerraty
2015-11-29 19:10:24 UTC
Post by Warner Losh
As part of making NanoBSD buildable by non-root, I've found a need to have
a richer mtree language than we currently have.
No fundamental objection there.
Indeed I'd really like the ability to provide default uid/gid
for the case that a uname/gname cannot be looked up.
Or even just a flag saying "if lookup fails, use 0:0".
This would avoid the need to post-process BSD.var.dist to replace all
uname/gname with uid=0/gid=0 during various bootstrap situations.
Post by Warner Losh
I'd like a new type called 'action' (so type=action in the records). This
type is defined loosely to manipulate an earlier entry (or maybe entries,
still unsure) in the file.
Each action entry would have an 'action' keyword. The keywords I've defined
would or could?
Post by Warner Losh
1. "unlink" which throws away the previous entry. That entry has been
removed. It may apply to files or directories, but it is an error not to
remove all entries in a directory when removing the directory.
2. "move" which relocates a previous entry. An additional targetpath
keyword specifies the ultimate destination for this entry.
3. "copy" which duplicates a previous entry. It too takes targetpath.
4. "meta" which changes the meta data of the previous entry. All keywords
on this are merged with the previous entry.
Probably need to know a bit more about how NanoBSD is built/packaged to
comment more usefully. Any useful references?
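The four actions in the quoted proposal can be sketched in Python as a replay over an in-memory metalog. The sketch makes several assumptions:

- Records are already parsed into dicts with a "path" key plus their keywords.
- An action applies to the most recent entry for its path.
- "move" and "copy" relocate only that single entry, not a directory's children.
- Unlinking a directory that still has entries is rejected as an error, as the proposal states.
- size= keywords are dropped at the end, mirroring what the merging tool is described as doing.

The function name is hypothetical. The actual tool in the thread is written in awk and is not shown.

```python
def apply_actions(records):
    """Replay type=action records against earlier entries in a metalog.

    records: list of dicts, each with a "path" key plus mtree keywords.
    Returns an insertion-ordered dict: path -> keyword dict.
    """
    entries = {}
    for rec in records:
        kw = dict(rec)
        path = kw.pop("path")
        if kw.get("type") != "action":
            entries[path] = kw  # ordinary entry; a later one replaces an earlier one
            continue
        action = kw.pop("action")
        del kw["type"]
        if path not in entries:
            raise KeyError(f"{action}: no earlier entry for {path}")
        if action == "unlink":
            prefix = path + "/"
            if entries[path].get("type") == "dir" and any(p.startswith(prefix) for p in entries):
                raise ValueError(f"unlink: directory {path} still has entries")
            del entries[path]
        elif action == "move":
            entries[kw.pop("targetpath")] = entries.pop(path)
        elif action == "copy":
            entries[kw.pop("targetpath")] = dict(entries[path])
        elif action == "meta":
            entries[path].update(kw)  # merge the remaining keywords into the entry
        else:
            raise ValueError(f"unknown action {action!r}")
    # Drop size=, since files are replaced and appended to during the build.
    return {p: {k: v for k, v in e.items() if k != "size"} for p, e in entries.items()}


if __name__ == "__main__":
    metalog = [
        {"path": "./etc", "type": "dir", "mode": "0755"},
        {"path": "./etc/rc.conf", "type": "file", "mode": "0644", "size": "120"},
        {"path": "./etc/rc.conf", "type": "action", "action": "meta", "mode": "0600"},
        {"path": "./etc/motd", "type": "file", "mode": "0644"},
        {"path": "./etc/motd", "type": "action", "action": "unlink"},
    ]
    for path, kw in apply_actions(metalog).items():
        print(path, kw)
```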
Post by Warner Losh
The one other thing that my merging tool does is to remove all size
keywords. In the NanoBSD environment, size is irrelevant. Files are
..
Post by Warner Losh
replaced and appended to all the time in the build process, and it doesn't
make sense to track the size. makefs fails if the size is different, so
Agreed.

Where do these size keywords come from?
We (Juniper) do not have them in any of our mtree-based manifests,
which we use directly with makefs.

On the off chance it is of interest...
I wonder if this style of manifest would simplify your problem?
I believe all the code needed (other than makefiles) is in head at least.

There are two styles supported, classic mtree:

#mtree
#
# Group IDs used:
# 0 wheel
#
# User IDs used:
# 0 root
#
/set uid=0 gid=0 mode=555 type=file

bin type=dir
cat contents="${STAGE_OBJTOP}/bin/cat"
cp contents="${STAGE_OBJTOP}/bin/cp"

..

which is good for manually maintained manifests, and, for autogenerated
ones (e.g. via find), a full-path format:

usr/tests/bin/cat/d_align.in mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.in"
usr/tests/bin/cat/d_align.out mode=0644 contents="/b/sjg/work/stable10/obj/stage/i386/usr/tests/bin/cat/d_align.out"

The two can be combined: an mtree-style header with autogenerated
info appended.
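When the two styles are combined, a consumer has to turn the hierarchical part into full paths. Here is a minimal Python sketch of that flattening, under these assumptions:

- /set and /unset manage defaults.
- ".." returns to the parent directory.
- A name containing "/" is treated as a full path that does not change the current directory, as in the second style above.
- Shell-style tokenizing (shlex) is good enough for the quoted contents= values in the example.

Real mtree file names are vis(3)-encoded, which this sketch does not decode, and ${STAGE_OBJTOP} is left unexpanded. The function names are hypothetical.

```python
import shlex


def _keywords(words):
    """Turn ["mode=0644", "nochange"] into {"mode": "0644", "nochange": ""}."""
    kw = {}
    for word in words:
        key, _, value = word.partition("=")
        kw[key] = value
    return kw


def flatten_mtree(text):
    """Return (path, keywords) pairs with /set defaults merged in."""
    defaults = {}
    cwd = []  # directory components of the current position
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        words = shlex.split(line)
        name, rest = words[0], words[1:]
        if name == "/set":
            defaults.update(_keywords(rest))
        elif name == "/unset":
            for key in rest:
                defaults.pop(key, None)
        elif name == "..":
            cwd.pop()
        else:
            kw = {**defaults, **_keywords(rest)}
            full = "/" in name  # full-path entry: does not change cwd
            out.append((name if full else "/".join(cwd + [name]), kw))
            if not full and kw.get("type") == "dir":
                cwd.append(name)
    return out


SAMPLE = '''\
#mtree
/set uid=0 gid=0 mode=555 type=file
bin type=dir
cat contents="${STAGE_OBJTOP}/bin/cat"
..
usr/tests/bin/cat/d_align.in mode=0644 contents="/stage/usr/tests/bin/cat/d_align.in"
'''

if __name__ == "__main__":
    for path, kw in flatten_mtree(SAMPLE):
        print(path, kw)
```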
Post by Warner Losh
If things go well, we could eventually move these extensions into mtree so
that the post-processing stage is no longer necessary. I'm content to
maintain the hundred or two lines of awk I've written to implement it. I
chose awk because it does the job well enough, though python might do it
better. But I don't want to talk about that choice since right now it is
purely internal to NanoBSD (though I hope that other build orchestration
systems like src/release and crochet look to adopt).
FWIW, we use python when awk/sed etc. prove insufficient or cumbersome,
but awk/sed are usually adequate.