From: Demi Marie Obenour <demiobenour@gmail.com>
To: Alyssa Ross <hi@alyssa.is>
Cc: Spectrum OS Development <devel@spectrum-os.org>
Subject: Re: [PATCH v3] Generate file lists from a script
Date: Sun, 21 Sep 2025 12:51:31 -0400
Message-ID: <e8a7ce72-7f2a-480a-b6ee-55dcc5e31bac@gmail.com>
In-Reply-To: <87bjn4b39h.fsf@alyssa.is>
On 9/21/25 04:47, Alyssa Ross wrote:
> Demi Marie Obenour <demiobenour@gmail.com> writes:
>
>> Right now, the makefiles in host/rootfs, vm/sys/net, and img/app have
>> manually-maintained lists of files and symlinks. These duplicate the
>> information in the git repository and can easily get out of sync or
>> cause unnecessary merge conflicts. Fix all of these issues by having
>> the git repository be the source of truth, and using a script to
>> generate the file lists. Developers can regenerate the lists before
>> every commit, or even add a git hook to do that.
>>
>> Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
>> ---
>
> I like where this is going. :)
Yay!
>> Changes in v3:
>> - Only include the file list generator. Move the rest to separate patch
>> series.
>> - Remove the update-file-list make targets from img/app/Makefile and
>> vm/sys/net/Makefile.
>> - Link to v2: https://lore.kernel.org/r/20250910-genfiles-v2-0-37ebe07a3cdc@gmail.com
>>
>> Changes in v2:
>> - Drop the last patch (switching to /etc/s6-rc/compiled) as it is
>> controversial and should be reviewed separately.
>> - Add missing copyright notices.
>> - Use a wrapper shell script to make the awk code easier to read.
>> - Improve documentation.
>> - Add helper scripts for use in git hooks and rebasing.
>> - Link to v1: https://spectrum-os.org/lists/archives/spectrum-devel/20250903-genfiles-v1-0-cc993fcb1e4c@gmail.com/
>> ---
>> Documentation/development/built-in-vms.adoc | 17 ++++
>> host/rootfs/Makefile | 102 +----------------------
>> host/rootfs/file-list.mk | 99 +++++++++++++++++++++++
>> img/app/Makefile | 80 +++----------------
>> img/app/file-list.mk | 65 +++++++++++++++
>> lib/common.mk | 1 +
>> scripts/genfiles.awk | 120 ++++++++++++++++++++++++++++
>> scripts/genfiles.sh | 29 +++++++
>> scripts/git-rebase | 17 ++++
>> scripts/pre-commit.sh | 11 +++
>
> Let's take git-rebase and pre-commit.sh out of this patch, and focus on
> the generated file lists first.
Will change.
>> vm/sys/net/Makefile | 50 ++----------
>> vm/sys/net/file-list.mk | 42 ++++++++++
>> 12 files changed, 422 insertions(+), 211 deletions(-)
>>
>> diff --git a/Documentation/development/built-in-vms.adoc b/Documentation/development/built-in-vms.adoc
>> index e90009ee5a3c2c254a7ae11e36121576b819eee7..0addc7d1a2fd322fa12918656baa3d169478504d 100644
>> --- a/Documentation/development/built-in-vms.adoc
>> +++ b/Documentation/development/built-in-vms.adoc
>
> Copyright header please!
Will fix. Also, in the future you have permission to fix missing copyright
headers when you commit. It's fine if you aren't comfortable doing that.
>> @@ -44,6 +44,23 @@ NOTE: As a special convenience, it's not necessary to run `make clean`
>> if the only change to the Nix files is modifying the packages
>> installed in the VM.
>>
>> +The list of files used for the VM image is stored in a separate file,
>> +`file-lists.mk`. To update it, run `scripts/genfiles.sh`
>
> Typo: file-list*s*.mk. Also, so far we haven't used code syntax for
> file names.
>
> Maybe "used for images" would be better, since this also applies to
> host/rootfs. (Obviously the ideal would be if this documentation wasn't
> only written for VM images but that's out of scope. We'll get to it.)
Will fix.
>> +which will regenerate it from the output of `git ls-files`. Any
>> +changes you made will be lost. This script uses uses Git's index to
>
> I think "Any changes you made will be lost." is a bit scary, because
> it's not clear it only means changes to those files. The sentence could
> probably just be dropped altogether — I think it's implied by "regenerate".
Will fix.
>> +generate the list, so you need to use `git add`, `git rm`, and `git mv`
>> +to ensure that Git knows about your changes. It is not necessary to
>> +commit the changes.
>
> "so only staged changes will be reflected"? All the extra stuff has
> potential for confusion I think — for example "It is not necessary to
> commit the changes." could be read as "when you make a commit, do not
> include changes to file-list.mk".
Will fix.
>> diff --git a/lib/common.mk b/lib/common.mk
>> index 277c3544036d9a9057f8ba4ad37fe2207548cc59..0a03ff440cc671264d2b859a2ae048db9252d047 100644
>> --- a/lib/common.mk
>> +++ b/lib/common.mk
>> @@ -1,5 +1,6 @@
>> # SPDX-License-Identifier: EUPL-1.2+
>> # SPDX-FileCopyrightText: 2021, 2023, 2025 Alyssa Ross <hi@alyssa.is>
>> +# SPDX-FileCopyrightText: 2025 Demi Marie Obenour <demiobenour@gmail.com>
>>
>> BACKGROUND = background
>> CPIO = cpio
>
> Accident?
Yes.
>> diff --git a/scripts/genfiles.awk b/scripts/genfiles.awk
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..6fe327fd0a314d226dbce23854aa8f119e9c8f34
>> --- /dev/null
>> +++ b/scripts/genfiles.awk
>> @@ -0,0 +1,120 @@
>> +#!/usr/bin/env -S LC_ALL=C LANGUAGE=C awk -E
>> +# SPDX-License-Identifier: EUPL-1.2+
>> +# SPDX-FileCopyrightText: 2025 Demi Marie Obenour <demiobenour@gmail.com>
>> +BEGIN {
>> + RS = "\n";
>> + FS = "\t";
>> + file_count = 0;
>> + symlink_count = 0;
>> + rc_count = 0;
>> + is_rc = 0;
>> + exit_code = 0;
>> + done = 0;
>
> awk variables are implicitly initialized to 0 when you try to do
> arithmetic on an undefined variable, so no need for these.
GNU Awk's lint mode warns about uses of uninitialized variables. I used
lint mode because it also warns about non-portable constructs. Also, an
undefined awk variable used as an array subscript is treated as the
empty string, not 0, which could lead to confusion.
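
The subscript behavior can be checked with a minimal sketch. The helper name unset_subscript_key is hypothetical, and the only assumption is a POSIX awk on PATH:

```shell
# Hypothetical helper: shows which key is created when an unset awk
# variable is used as an array subscript.
unset_subscript_key() {
    awk 'BEGIN {
        arr[idx] = "value"                  # idx was never assigned
        for (key in arr) printf "[%s]\n", key
        # The empty-string key exists; the "0" key does not.
        print ((0 in arr) ? "0 present" : "0 absent")
    }'
}
unset_subscript_key
```

The empty brackets in the first output line are the empty-string key; the second line confirms that no "0" key was created.
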
>> + modes["120000"] = "symlink";
>> + modes["040755"] = "directory";
>> + modes["100644"] = "regular";
>> + modes["100755"] = "regular";
>> +}
>> +
>> +function fail(msg, status) {
>> + if (status ~ /^([1-9][0-9]?|1[0-9]{2}|2[0-4][1-9]|25[1-5])$/) {
>> + exit_code = status;
>> + } else {
>> + exit_code = 1;
>> + status = 1;
>> + }
>> + print ("FATAL: " msg) > "/dev/stderr";
>> + exit status;
>
> Do we ever want to exit something other than 1 from this function?
Nope.
>> +}
>> +done { fail("Junk after DONE", 1); }
>> +/^DONE$/ {
>> + done = 1
>> + next
>> +}
>> +
>> +# Make sure git produced valid output.
>> +!/^[0-7]{6}\t[ -~]+$/ {
>> + fail("git ls-files produced invalid output", 1);
>> +}
>> +
>
> This is very unlikely to happen, and if it does, it will be obvious from
> the diff.
Will drop.
>> +# Extract data from built-in variables.
>> +{
>> + filename = $2;
>> + raw_mode = $1;
>> + # awk autocreates empty string entries if the key is invalid,
>> + # but the code exits in this case so that is okay.
>> + mode = modes[raw_mode];
>> +}
>> +
>> +# Another check for a git bug.
>> +filename ~ /^\/|((^|\/)\.{0,2}($|\/))/ {
>> + fail("git ls-files output non-canonical or absolute path '" filename "'", 1);
>> +}
>> +
>
> If there are git bugs, we will notice and report them. We do not need
> to be the test suite for git here.
Okay, fair!
>> +filename ~ /[^[:alnum:]_.+@/-]/ {
>> + fail("filename '" filename "' has forbidden characters", 1);
>> +}
>> +
>> +/\.license$/ {
>> + if (raw_mode != "100644") {
>> + fail("License file '" filename "' is executable or not regular file", 1);
>> + }
>> + next;
>> +}
>
> This is also not really in scope for a script that does not care about
> license files.
Fair. I will leave that to the reuse check.
>> +
>> +mode == "directory" { next }
>
> Getting a directory from git ls-files would be sufficiently unexpected
> that I don't think we should treat it any differently from an
> unrecognized mode.
Will fix.
>> +
>> +filename ~ /^image\/etc\/s6-rc\// {
>> + if (mode != "regular") {
>> + fail("s6-rc-compile input '" filename "' isn't a regular file");
>> + }
>> + rc_count += 1;
>> + rc_files[rc_count] = filename;
>
> rc_files[rc_count++]
>
> (will make it 0-indexed though so update the loops too)
I think this might break without explicit variable initialization.
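
Whether it breaks can be checked directly. A minimal sketch, assuming any POSIX awk (the helper name is hypothetical): post-increment evaluates its operand in numeric context, so an unset variable yields 0 as the first subscript. GNU Awk's lint mode may still warn about the uninitialized use.

```shell
# Hypothetical helper: fills an array with arr[n++] without ever
# initializing n, then walks it with a 0-based loop.
postincrement_keys() {
    awk 'BEGIN {
        rc_files[rc_count++] = "first"      # rc_count never assigned
        rc_files[rc_count++] = "second"
        for (i = 0; i < rc_count; i++) printf "%d=%s\n", i, rc_files[i]
    }'
}
postincrement_keys
```
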
>> + next;
>> +}
>> +
>> +mode == "symlink" {
>> + symlink_count += 1;
>> + symlinks[symlink_count] = filename;
>> + next;
>> +}
>> +
>> +mode == "regular" {
>> + file_count += 1;
>> + files[file_count] = filename;
>> + next;
>> +}
>> +
>> +{ fail("File '" filename "' is not regular file, directory, or symlink (mode " raw_mode ")"); }
>> +
>> +END {
>> + if (exit_code) {
>> + exit exit_code;
>> + }
>> + if (!done) {
>> + fail("Did not receive DONE line", 1);
>> + }
>> + printf ("# SPDX-License-Identifier: CC0-1.0\n" \
>> + "# SPDX-FileCopyrightText: 2025 Demi Marie Obenour <demiobenour@gmail.com>\n" \
>
> Okay, so, it's silly that this needs to have a copyright header on it at
> all, but since we have to have one to make reuse happy, I think it
> should be mine from 2021, because the comment about links is the closest
> thing to creative expression in here.
Will fix.
>> + "# Generated by scripts/genfile.sh. Any changes will be overwritten.\n" \
>> + "FILES ::=") > out_file;
>
> I note the change to ::=. Do you think we should do that across the
> board in our Makefiles?
POSIX specifies ::= and it has better semantics in most cases, but I
don't know whether the BSD makes implement it. ::= expands the RHS
immediately, at the point of assignment, so later changes to variables
referenced by it do not affect the LHS.
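
A minimal sketch of the difference, assuming a make that supports ::= (GNU make 4.0 or later does). The helper name and the variable names A, IMMEDIATE, and DEFERRED are made up for illustration:

```shell
# Hypothetical helper: writes a throwaway Makefile and prints how each
# assignment operator saw the variable A.
demo_make_assignment() {
    dir=$(mktemp -d) || return 1
    printf '%s\n' \
        'A = one' \
        'IMMEDIATE ::= $(A)' \
        'DEFERRED = $(A)' \
        'A = two' \
        'all:' > "$dir/Makefile"
    # Recipe lines must start with a tab.
    printf '\t@echo immediate=$(IMMEDIATE) deferred=$(DEFERRED)\n' >> "$dir/Makefile"
    (cd "$dir" && make -s all)
    status=$?
    rm -rf -- "$dir"
    return "$status"
}
```

With GNU make, calling demo_make_assignment should print "immediate=one deferred=two": IMMEDIATE captured the value of A at assignment time, while DEFERRED picked up the later reassignment.
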
>> + for (array_index = 1; array_index <= file_count; array_index += 1) {
>> + printf " \\\n\t%s", files[array_index] > out_file;
>> + }
>> + printf ("\n\n" \
>> +"# These are separate because they need to be included, but putting\n" \
>> +"# them as make dependencies would confuse make.\n" \
>> +"LINKS ::=") > out_file;
>> + for (array_index = 1; array_index <= symlink_count; array_index += 1) {
>> + printf " \\\n\t%s", symlinks[array_index] > out_file;
>> + }
>> + printf "\n\nS6_RC_FILES ::=" > out_file;
>> + for (array_index = 1; array_index <= rc_count; array_index += 1) {
>> + printf " \\\n\t%s", rc_files[array_index] > out_file;
>> + }
>> + printf "\n" > out_file;
>> + if (close(out_file)) {
>> + print ("Cannot close output file: " ERRNO "\n") > "/dev/stderr";
>> + exit 1;
>> + }
>> +}
>> diff --git a/scripts/genfiles.sh b/scripts/genfiles.sh
>> new file mode 100755
>> index 0000000000000000000000000000000000000000..77a8d95e88b6851be9447698556efe4f1eab174b
>> --- /dev/null
>> +++ b/scripts/genfiles.sh
>> @@ -0,0 +1,29 @@
>> +#!/usr/bin/env -S LC_ALL=C LANGUAGE=C bash --
>
> env -S is not portable, and I don't think anything here needs bash
> specifically.
$'\t' doesn't work with all shells, though I believe it is either
part of the current POSIX standard or will be added. I'll use
/usr/bin/env bash, which breaks if the script is renamed to something
starting with '-'.
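
For comparison, a sketch of how the tab could be produced without $'\t', should plain sh ever be wanted. This is an assumption about a possible alternative, not what the patch does, and sort_by_path is a hypothetical name:

```shell
# printf interprets the \t escape, and command substitution strips only
# trailing newlines, so the tab character survives.
tab=$(printf '\t')

# Hypothetical helper: sort "mode<TAB>path" lines by path.
sort_by_path() {
    LC_ALL=C sort -t "$tab" -k 2
}

printf '100644\tb.txt\n120000\ta.txt\n' | sort_by_path
```
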
> We can set the locale variables after the script starts,
> because I don't think this wrapper script is going to do anything
> locale-specific. (And shouldn't they be C.UTF-8?)
The C locale is actually what I intended. The script does not rely
on support for non-ASCII characters, and it does use the fact that
negated character classes match all bytes. Admittedly, this will
only be needed if there is a git bug.
>> +set -euo pipefail
>> +unset output_file astatus
>
> This is a bit overly defensive IMO. Both of these variables are
> assigned before use, and if they weren't, the person making those
> changes would be very unlikely to not notice because they had those
> variables defined in their environment.
Fair!
>> +case $0 in
>> +(/*) cd "${0%/*}/..";;
>> +(*/*) cd "./${0%/*}/..";;
>> +(*) cd ..;;
>> +esac
>
> Perhaps we could use git rev-parse --show-toplevel?
git ls-files doesn't have that option.
>> +for i in host/rootfs img/app vm/sys/net; do
>> + output_file=$i/file-list.mk
>> + {
>> + git -C "$i" -c core.quotePath=true ls-files $'--format=%(objectmode)\t%(path)' -- image |
>> + sort -t $'\t' -k 2
>
> TIL sort -t and -k! 🤯
>
>> + echo DONE
>
> Why do we need this?
To avoid producing any output file if the input is truncated.
>> + } |
>> + gawk -v "out_file=$output_file.tmp" -E scripts/genfiles.awk
>
> Why not stdout?
The output file is created by awk so that it is only created if
nothing went wrong.
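
The two points above (the DONE sentinel and letting awk create the file) can be sketched together. This is a simplified illustration, not the real genfiles.awk; write_if_complete and the "# generated" header are made up:

```shell
# Hypothetical helper: copy stdin to the file named by $1, but only if
# the input ends with a DONE sentinel line.
write_if_complete() {
    awk -v "out_file=$1" '
        seen_done { bad = 1; exit 1 }        # input continued after DONE
        /^DONE$/  { seen_done = 1; next }
        { lines[++count] = $0 }              # buffer; nothing is written yet
        END {
            # Truncated or malformed input: fail before opening out_file.
            if (bad || !seen_done) exit 1
            print "# generated" > out_file
            for (i = 1; i <= count; i++) print lines[i] > out_file
            if (close(out_file)) exit 1
        }
    '
}

tmpdir=$(mktemp -d)
printf 'a\nb\nDONE\n' | write_if_complete "$tmpdir/complete.mk"
printf 'a\nb\n' | write_if_complete "$tmpdir/truncated.mk" ||
    echo "truncated input rejected, no file created"
```

Because the redirection happens only inside END, a producer that dies partway through leaves no partial output file behind.
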
> And why gawk? I didn't immediately notice anything
> non-POSIX, and as usual would prefer to stick to it.
POSIX does not specify -E. I can use -f instead, though.
>> + if [ -f "$output_file" ]; then
>> + # Avoid changing output file if it is up to date, as that
>> + # would cause unnecessary rebuilds.
>> + if cmp -s -- "$output_file.tmp" "$output_file"; then
>> + rm -- "$output_file.tmp"
>> + continue
>> + else
>> + astatus=$?
>> + if [ "$astatus" != 1 ]; then exit "$astatus"; fi
>
> Could avoid the need for the variable and multiple ifs. Up to you
> whether you prefer it:
>
> set +e
> cmp -s -- "$output_file.tmp" "$output_file"
> set -e
> case $? in
> 0)
> rm -- "$output_file.tmp"
> continue
> ;;
> 1)
> ;;
> *)
> exit $?
> ;;
> esac
This might set $? to the return value of 'set -e' (0). Whether or
not it actually does is at least not obvious from reading the code.
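
A small check of the concern, with false standing in for a cmp that reports a difference. The second form is an assumed alternative that captures the status on the same line; it is not taken from the patch:

```shell
# Status as seen by a "case $?" placed after "set -e": the exit status of
# "set -e" itself (0) has replaced the status of the failing command.
clobbered=$(sh -c 'set +e; false; set -e; echo "$?"')

# Capturing the status in the same list keeps it, and errexit does not
# fire because the failing command is on the left of ||.
preserved=$(sh -c 'set -e; status=0; false || status=$?; echo "$status"')

echo "after set -e: $clobbered, captured directly: $preserved"
```

This prints "after set -e: 0, captured directly: 1".
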
>> + fi
>> + fi
>> + mv -- "$output_file.tmp" "$output_file"
>> +done
--
Sincerely,
Demi Marie Obenour (she/her/hers)