From: Alyssa Ross <hi@alyssa.is>
To: Demi Marie Obenour <demiobenour@gmail.com>
Cc: Spectrum OS Development <devel@spectrum-os.org>
Subject: Re: [PATCH v3] Generate file lists from a script
Date: Sun, 21 Sep 2025 10:47:54 +0200
Message-ID: <87bjn4b39h.fsf@alyssa.is>
In-Reply-To: <20250920-genfiles-v3-1-d6c2b6767b42@gmail.com>
Demi Marie Obenour <demiobenour@gmail.com> writes:
> Right now, the makefiles in host/rootfs, vm/sys/net, and img/app have
> manually-maintained lists of files and symlinks. These duplicate the
> information in the git repository and can easily get out of sync or
> cause unnecessary merge conflicts. Fix all of these issues by having
> the git repository be the source of truth, and using a script to
> generate the file lists. Developers can regenerate the lists before
> every commit, or even add a git hook to do that.
>
> Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
> ---
I like where this is going. :)
> Changes in v3:
> - Only include the file list generator. Move the rest to separate patch
> series.
> - Remove the update-file-list make targets from img/app/Makefile and
> vm/sys/net/Makefile.
> - Link to v2: https://lore.kernel.org/r/20250910-genfiles-v2-0-37ebe07a3cdc@gmail.com
>
> Changes in v2:
> - Drop the last patch (switching to /etc/s6-rc/compiled) as it is
> controversial and should be reviewed separately.
> - Add missing copyright notices.
> - Use a wrapper shell script to make the awk code easier to read.
> - Improve documentation.
> - Add helper scripts for use in git hooks and rebasing.
> - Link to v1: https://spectrum-os.org/lists/archives/spectrum-devel/20250903-genfiles-v1-0-cc993fcb1e4c@gmail.com/
> ---
> Documentation/development/built-in-vms.adoc | 17 ++++
> host/rootfs/Makefile | 102 +----------------------
> host/rootfs/file-list.mk | 99 +++++++++++++++++++++++
> img/app/Makefile | 80 +++----------------
> img/app/file-list.mk | 65 +++++++++++++++
> lib/common.mk | 1 +
> scripts/genfiles.awk | 120 ++++++++++++++++++++++++++++
> scripts/genfiles.sh | 29 +++++++
> scripts/git-rebase | 17 ++++
> scripts/pre-commit.sh | 11 +++
Let's take git-rebase and pre-commit.sh out of this patch, and focus on
the generated file lists first.
> vm/sys/net/Makefile | 50 ++----------
> vm/sys/net/file-list.mk | 42 ++++++++++
> 12 files changed, 422 insertions(+), 211 deletions(-)
>
> diff --git a/Documentation/development/built-in-vms.adoc b/Documentation/development/built-in-vms.adoc
> index e90009ee5a3c2c254a7ae11e36121576b819eee7..0addc7d1a2fd322fa12918656baa3d169478504d 100644
> --- a/Documentation/development/built-in-vms.adoc
> +++ b/Documentation/development/built-in-vms.adoc
Copyright header please!
> @@ -44,6 +44,23 @@ NOTE: As a special convenience, it's not necessary to run `make clean`
> if the only change to the Nix files is modifying the packages
> installed in the VM.
>
> +The list of files used for the VM image is stored in a separate file,
> +`file-lists.mk`. To update it, run `scripts/genfiles.sh`
Typo: file-list*s*.mk. Also, so far we haven't used code syntax for
file names.
Maybe "used for images" would be better, since this also applies to
host/rootfs. (Obviously the ideal would be if this documentation wasn't
only written for VM images but that's out of scope. We'll get to it.)
> +which will regenerate it from the output of `git ls-files`. Any
> +changes you made will be lost. This script uses uses Git's index to
I think "Any changes you made will be lost." is a bit scary, because
it's not clear it only means changes to those files. The sentence could
probably just be dropped altogether — I think it's implied by "regenerate".
> +generate the list, so you need to use `git add`, `git rm`, and `git mv`
> +to ensure that Git knows about your changes. It is not necessary to
> +commit the changes.
"so only staged changes will be reflected"? All the extra stuff has
potential for confusion I think — for example "It is not necessary to
commit the changes." could be read as "when you make a commit, do not
include changes to file-list.mk".
> diff --git a/lib/common.mk b/lib/common.mk
> index 277c3544036d9a9057f8ba4ad37fe2207548cc59..0a03ff440cc671264d2b859a2ae048db9252d047 100644
> --- a/lib/common.mk
> +++ b/lib/common.mk
> @@ -1,5 +1,6 @@
> # SPDX-License-Identifier: EUPL-1.2+
> # SPDX-FileCopyrightText: 2021, 2023, 2025 Alyssa Ross <hi@alyssa.is>
> +# SPDX-FileCopyrightText: 2025 Demi Marie Obenour <demiobenour@gmail.com>
>
> BACKGROUND = background
> CPIO = cpio
Accident?
> diff --git a/scripts/genfiles.awk b/scripts/genfiles.awk
> new file mode 100644
> index 0000000000000000000000000000000000000000..6fe327fd0a314d226dbce23854aa8f119e9c8f34
> --- /dev/null
> +++ b/scripts/genfiles.awk
> @@ -0,0 +1,120 @@
> +#!/usr/bin/env -S LC_ALL=C LANGUAGE=C awk -E
> +# SPDX-License-Identifier: EUPL-1.2+
> +# SPDX-FileCopyrightText: 2025 Demi Marie Obenour <demiobenour@gmail.com>
> +BEGIN {
> + RS = "\n";
> + FS = "\t";
> + file_count = 0;
> + symlink_count = 0;
> + rc_count = 0;
> + is_rc = 0;
> + exit_code = 0;
> + done = 0;
Uninitialized awk variables evaluate to 0 in a numeric context (and to
the empty string in a string context), so there's no need for these.
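To illustrate, a minimal sketch (the names uninit_demo, count, and
unset_string are made up for this example and are not from the patch):
a variable that is never assigned can be incremented and printed
without any BEGIN-block setup.

```shell
#!/bin/sh
# Hypothetical demo, not part of the patch: neither "count" nor
# "unset_string" is ever initialized before use.
uninit_demo() {
	awk 'BEGIN {
		count += 1    # an unset variable is treated as 0 in arithmetic
		printf "%d [%s]\n", count, unset_string    # and as "" in a string
	}'
}
uninit_demo
```

The same applies to the counters and flags in the quoted BEGIN block;
only the modes array actually needs to be populated there.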
> + modes["120000"] = "symlink";
> + modes["040755"] = "directory";
> + modes["100644"] = "regular";
> + modes["100755"] = "regular";
> +}
> +
> +function fail(msg, status) {
> + if (status ~ /^([1-9][0-9]?|1[0-9]{2}|2[0-4][1-9]|25[1-5])$/) {
> + exit_code = status;
> + } else {
> + exit_code = 1;
> + status = 1;
> + }
> + print ("FATAL: " msg) > "/dev/stderr";
> + exit status;
Do we ever want to exit something other than 1 from this function?
> +}
> +done { fail("Junk after DONE", 1); }
> +/^DONE$/ {
> + done = 1
> + next
> +}
> +
> +# Make sure git produced valid output.
> +!/^[0-7]{6}\t[ -~]+$/ {
> + fail("git ls-files produced invalid output", 1);
> +}
> +
This is very unlikely to happen, and if it does, it will be obvious from
the diff.
> +# Extract data from built-in variables.
> +{
> + filename = $2;
> + raw_mode = $1;
> + # awk autocreates empty string entries if the key is invalid,
> + # but the code exits in this case so that is okay.
> + mode = modes[raw_mode];
> +}
> +
> +# Another check for a git bug.
> +filename ~ /^\/|((^|\/)\.{0,2}($|\/))/ {
> + fail("git ls-files output non-canonical or absolute path '" filename "'", 1);
> +}
> +
If there are git bugs, we will notice and report them. We do not need
to be the test suite for git here.
> +filename ~ /[^[:alnum:]_.+@/-]/ {
> + fail("filename '" filename "' has forbidden characters", 1);
> +}
> +
> +/\.license$/ {
> + if (raw_mode != "100644") {
> + fail("License file '" filename "' is executable or not regular file", 1);
> + }
> + next;
> +}
This is also not really in scope for a script that does not care about
license files.
> +
> +mode == "directory" { next }
Getting a directory from git ls-files would be sufficiently unexpected
that I don't think we should treat it any differently from an
unrecognized mode.
> +
> +filename ~ /^image\/etc\/s6-rc\// {
> + if (mode != "regular") {
> + fail("s6-rc-compile input '" filename "' isn't a regular file");
> + }
> + rc_count += 1;
> + rc_files[rc_count] = filename;
rc_files[rc_count++]
(will make it 0-indexed though so update the loops too)
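Roughly what I mean, as a standalone sketch (append_demo is a
hypothetical name and the output format is simplified; only the
indexing is the point):

```shell
#!/bin/sh
# Hypothetical demo, not the patch itself: append each input line to a
# 0-indexed array with post-increment, then walk it with a matching loop.
append_demo() {
	awk '
	{ rc_files[rc_count++] = $0 }    # first entry lands at index 0
	END {
		for (i = 0; i < rc_count; i++)    # note "<" rather than "<="
			print i ": " rc_files[i]
	}'
}
printf '%s\n' image/etc/s6-rc/a image/etc/s6-rc/b | append_demo
```

The same pattern would apply to the symlinks and files arrays.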
> + next;
> +}
> +
> +mode == "symlink" {
> + symlink_count += 1;
> + symlinks[symlink_count] = filename;
> + next;
> +}
> +
> +mode == "regular" {
> + file_count += 1;
> + files[file_count] = filename;
> + next;
> +}
> +
> +{ fail("File '" filename "' is not regular file, directory, or symlink (mode " raw_mode ")"); }
> +
> +END {
> + if (exit_code) {
> + exit exit_code;
> + }
> + if (!done) {
> + fail("Did not receive DONE line", 1);
> + }
> + printf ("# SPDX-License-Identifier: CC0-1.0\n" \
> + "# SPDX-FileCopyrightText: 2025 Demi Marie Obenour <demiobenour@gmail.com>\n" \
Okay, so, it's silly that this needs to have a copyright header on it at
all, but since we have to have one to make reuse happy, I think it
should be mine from 2021, because the comment about links is the closest
thing to creative expression in here.
> + "# Generated by scripts/genfile.sh. Any changes will be overwritten.\n" \
> + "FILES ::=") > out_file;
I note the change to ::=. Do you think we should do that across the
board in our Makefiles?
> + for (array_index = 1; array_index <= file_count; array_index += 1) {
> + printf " \\\n\t%s", files[array_index] > out_file;
> + }
> + printf ("\n\n" \
> +"# These are separate because they need to be included, but putting\n" \
> +"# them as make dependencies would confuse make.\n" \
> +"LINKS ::=") > out_file;
> + for (array_index = 1; array_index <= symlink_count; array_index += 1) {
> + printf " \\\n\t%s", symlinks[array_index] > out_file;
> + }
> + printf "\n\nS6_RC_FILES ::=" > out_file;
> + for (array_index = 1; array_index <= rc_count; array_index += 1) {
> + printf " \\\n\t%s", rc_files[array_index] > out_file;
> + }
> + printf "\n" > out_file;
> + if (close(out_file)) {
> + print ("Cannot close output file: " ERRNO "\n") > "/dev/stderr";
> + exit 1;
> + }
> +}
> diff --git a/scripts/genfiles.sh b/scripts/genfiles.sh
> new file mode 100755
> index 0000000000000000000000000000000000000000..77a8d95e88b6851be9447698556efe4f1eab174b
> --- /dev/null
> +++ b/scripts/genfiles.sh
> @@ -0,0 +1,29 @@
> +#!/usr/bin/env -S LC_ALL=C LANGUAGE=C bash --
env -S is not portable, and I don't think anything here needs bash
specifically. We can set the locale variables after the script starts,
because I don't think this wrapper script is going to do anything
locale-specific. (And shouldn't they be C.UTF-8?)
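Something along these lines, as a sketch only. It keeps the C locale
from the patch and leaves the C.UTF-8 question open. pipefail is left
out on the assumption that not every sh supports it.

```shell
#!/bin/sh
# Sketch: set the locale inside the script instead of via "env -S"
# in the shebang line. Values are copied from the patch's shebang.
set -eu
LC_ALL=C
LANGUAGE=C
export LC_ALL LANGUAGE
```

Everything the script runs afterwards (git, sort, awk) inherits the
exported values.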
> +set -euo pipefail
> +unset output_file astatus
This is a bit overly defensive IMO. Both of these variables are
assigned before use, and if a later change broke that, the person
making the change would be very unlikely to miss it just because they
happened to have those variables defined in their environment.
> +case $0 in
> +(/*) cd "${0%/*}/..";;
> +(*/*) cd "./${0%/*}/..";;
> +(*) cd ..;;
> +esac
Perhaps we could use git rev-parse --show-toplevel?
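A minimal sketch of that idea (repo_root is a hypothetical helper
name). Note that it resolves the work tree from a directory you pass
in, or the current directory, rather than from $0 as the case
statement above does.

```shell
#!/bin/sh
# Sketch: print the top-level directory of the Git work tree that
# contains directory $1 (default: the current directory).
repo_root() {
	git -C "${1:-.}" rev-parse --show-toplevel
}
```

The script would then start with something like cd "$(repo_root)",
and could pass "$(dirname "$0")" if it should still be anchored to the
script's location.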
> +for i in host/rootfs img/app vm/sys/net; do
> + output_file=$i/file-list.mk
> + {
> + git -C "$i" -c core.quotePath=true ls-files $'--format=%(objectmode)\t%(path)' -- image |
> + sort -t $'\t' -k 2
TIL sort -t and -k! 🤯
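For anyone else who hadn't seen these, a small standalone sketch
(sort_by_path and the sample paths are made up). It uses printf to
produce the tab, since $'\t' is not available in every sh.

```shell
#!/bin/sh
# Sketch: sort "mode<TAB>path" records by path. -t sets the field
# separator and -k 2 makes the sort key run from field 2 to end of line.
tab=$(printf '\t')
sort_by_path() {
	LC_ALL=C sort -t "$tab" -k 2
}
printf '100644\timage/etc/passwd\n120000\timage/bin\n100755\timage/etc/init\n' |
	sort_by_path
```

The records come out ordered by path (image/bin, image/etc/init,
image/etc/passwd) regardless of the mode in the first field.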
> + echo DONE
Why do we need this?
> + } |
> + gawk -v "out_file=$output_file.tmp" -E scripts/genfiles.awk
Why not stdout? And why gawk? I didn't immediately notice anything
non-POSIX, and as usual would prefer to stick to it.
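What I have in mind, sketched with a stand-in awk program (emit_list
is a hypothetical name and the program is much simpler than
genfiles.awk): awk writes to stdout and the shell decides where the
output goes.

```shell
#!/bin/sh
# Sketch: a stand-in for genfiles.awk that prints a make variable to
# stdout using plain awk, with no out_file variable.
emit_list() {
	awk 'BEGIN { printf "FILES =" }
	     { printf " \\\n\t%s", $0 }
	     END { printf "\n" }'
}
# Usage: printf '%s\n' image/a image/b | emit_list > file-list.mk.tmp
```

With this shape the explicit close() and its error handling go away,
and the shell can act on awk's exit status.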
> + if [ -f "$output_file" ]; then
> + # Avoid changing output file if it is up to date, as that
> + # would cause unnecessary rebuilds.
> + if cmp -s -- "$output_file.tmp" "$output_file"; then
> + rm -- "$output_file.tmp"
> + continue
> + else
> + astatus=$?
> + if [ "$astatus" != 1 ]; then exit "$astatus"; fi
Could avoid the need for the variable and multiple ifs. Up to you
whether you prefer it:
    set +e
    cmp -s -- "$output_file.tmp" "$output_file"
    case $? in
    0)
        set -e
        rm -- "$output_file.tmp"
        continue
        ;;
    1)
        set -e
        ;;
    *)
        exit $?
        ;;
    esac
> + fi
> + fi
> + mv -- "$output_file.tmp" "$output_file"
> +done