From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from atuin.qyliss.net (localhost [IPv6:::1]) by atuin.qyliss.net (Postfix) with ESMTP id D5C133F04; Sun, 21 Sep 2025 08:48:19 +0000 (UTC) Received: by atuin.qyliss.net (Postfix, from userid 993) id D3DC23EE7; Sun, 21 Sep 2025 08:48:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 4.0.1 (2024-03-26) on atuin.qyliss.net X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DMARC_MISSING,RCVD_IN_DNSWL_LOW,SPF_HELO_PASS autolearn=unavailable autolearn_force=no version=4.0.1 Received: from fhigh-a6-smtp.messagingengine.com (fhigh-a6-smtp.messagingengine.com [103.168.172.157]) by atuin.qyliss.net (Postfix) with ESMTPS id ED67C3EE5 for ; Sun, 21 Sep 2025 08:48:15 +0000 (UTC) Received: from phl-compute-11.internal (phl-compute-11.internal [10.202.2.51]) by mailfhigh.phl.internal (Postfix) with ESMTP id 5185E14000D5; Sun, 21 Sep 2025 04:48:14 -0400 (EDT) Received: from phl-mailfrontend-01 ([10.202.2.162]) by phl-compute-11.internal (MEProxy); Sun, 21 Sep 2025 04:48:14 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=alyssa.is; h=cc :cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm3; t=1758444494; x=1758530894; bh=5LGeYVn1kh /j7zbIp6SgGyhMPfNIKE/gxrds32IeF2w=; b=o0bqF7R/+vKshzpOKDLZDXnuPp 1YazPhIL2y0Yd1yzNAGZPmYoHXJX6vMKJ0hqYykhFjjnpgtey2b43Iqr+FW5rVs3 RuZ97PVTX5DKQmNk+h3or04cNXg3pIVPSeSzAzAhXVVzBYiU3vnVBQOi7jj4dH/u xhhSobJCQNZJZdKKZfLDc/ZCRQdgzONuk83rNUi7pjiBF5J/pkQUgZg170DEJVOY Js79eaFhCAc8gt7/RpZawdfNDO0+jSY8GNDgZhnit8nSjiL6vVs+IiDLRV0kho5b GInDc9swDKzxO/dFSepJOc3MWzF6FAH9mK9/++fglosT3IJGtesd/YwVdoXA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t= 1758444494; x=1758530894; bh=5LGeYVn1kh/j7zbIp6SgGyhMPfNIKE/gxrd s32IeF2w=; b=WuG4n/gduHM2pK1hiG1FvpcCqrXW1Wy1yw2GYjR1LiGS+pLOwCp iDdXOwc27n7xmfBOcGGnXppbU2/MynKckwO4aOa3tZBUdOpPpF8C+7T2nXp2zfDs W4R+OUxaTZD5jQ5pCrQkxYNLem2+4qrcQ9CMy8v3/CXWZLVNKqAZ+DFMjV9mmWwo 8DLot1Llm7WblSVISgTU4ZaFNQwNDd7ubNdk7WtjN3DjfOYewBSZHlDA2Oo6iM8O GwhVaPbHIREN8fIrJpsJOJLZL3OKFc7B7fPyq+JX488gG74hhgeKqVHqbIkoSEdx fq+6BWtkrWNORa9D/UUktoaDo475ibodd8Q== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdeggdehgeehvdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpuffrtefokffrpgfnqfghnecuuegr ihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjug hrpefhvfevufgjfhffkfggtgesghdtreertddtjeenucfhrhhomheptehlhihsshgrucft ohhsshcuoehhihesrghlhihsshgrrdhisheqnecuggftrfgrthhtvghrnhepvdeuieelfe fgheelvddvjeevtedvfefgieehuddtfeejtdffjeeigeeuteejueeknecuffhomhgrihhn pehkvghrnhgvlhdrohhrghdpshhpvggtthhruhhmqdhoshdrohhrghenucevlhhushhtvg hrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehhihesrghlhihsshgrrdhi shdpnhgspghrtghpthhtohepvddpmhhouggvpehsmhhtphhouhhtpdhrtghpthhtohepug gvmhhiohgsvghnohhurhesghhmrghilhdrtghomhdprhgtphhtthhopeguvghvvghlsehs phgvtghtrhhumhdqohhsrdhorhhg X-ME-Proxy: Feedback-ID: i12284293:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun, 21 Sep 2025 04:48:13 -0400 (EDT) Received: by rock.qyliss.net (Postfix, from userid 1000) id 18FF528D357; Sun, 21 Sep 2025 10:47:57 +0200 (CEST) From: Alyssa Ross To: Demi Marie Obenour Subject: Re: [PATCH v3] Generate file lists from a script In-Reply-To: <20250920-genfiles-v3-1-d6c2b6767b42@gmail.com> References: <20250910-genfiles-v2-0-37ebe07a3cdc@gmail.com> <20250920-genfiles-v3-1-d6c2b6767b42@gmail.com> Date: Sun, 21 Sep 2025 10:47:54 +0200 Message-ID: <87bjn4b39h.fsf@alyssa.is> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature" Message-ID-Hash: PVDJQ7OPNPU5NWYRDUWZF2PBSQBXDAZA X-Message-ID-Hash: PVDJQ7OPNPU5NWYRDUWZF2PBSQBXDAZA X-MailFrom: hi@alyssa.is X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-devel.spectrum-os.org-0; header-match-devel.spectrum-os.org-1; header-match-devel.spectrum-os.org-2; header-match-devel.spectrum-os.org-3; header-match-devel.spectrum-os.org-4; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Spectrum OS Development X-Mailman-Version: 3.3.9 Precedence: list List-Id: Patches and low-level development discussion Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Demi Marie Obenour writes: > Right now, the makefiles in host/rootfs, vm/sys/net, and img/app have > manually-maintained lists of files and symlinks. These duplicate the > information in the git repository and can easily get out of sync or > cause unnecessary merge conflicts. Fix all of these issues by having > the git repository be the source of truth, and using a script to > generate the file lists. Developers can regenerate the lists before > every commit, or even add a git hook to do that. > > Signed-off-by: Demi Marie Obenour > --- I like where this is going. :) > Changes in v3: > - Only include the file list generator. Move the rest to separate patch > series. > - Remove the update-file-list make targets from img/app/Makefile and > vm/sys/net/Makefile. > - Link to v2: https://lore.kernel.org/r/20250910-genfiles-v2-0-37ebe07a3c= dc@gmail.com > > Changes in v2: > - Drop the last patch (switching to /etc/s6-rc/compiled) as it is > controversial and should be reviewed separately. > - Add missing copyright notices. > - Use a wrapper shell script to make the awk code easier to read. > - Improve documentation. > - Add helper scripts for use in git hooks and rebasing. > - Link to v1: https://spectrum-os.org/lists/archives/spectrum-devel/20250= 903-genfiles-v1-0-cc993fcb1e4c@gmail.com/ > --- > Documentation/development/built-in-vms.adoc | 17 ++++ > host/rootfs/Makefile | 102 +---------------------- > host/rootfs/file-list.mk | 99 +++++++++++++++++++++++ > img/app/Makefile | 80 +++---------------- > img/app/file-list.mk | 65 +++++++++++++++ > lib/common.mk | 1 + > scripts/genfiles.awk | 120 ++++++++++++++++++++++= ++++++ > scripts/genfiles.sh | 29 +++++++ > scripts/git-rebase | 17 ++++ > scripts/pre-commit.sh | 11 +++ Let's take git-rebase and pre-commit.sh out of this patch, and focus on the generated file lists first. > vm/sys/net/Makefile | 50 ++---------- > vm/sys/net/file-list.mk | 42 ++++++++++ > 12 files changed, 422 insertions(+), 211 deletions(-) > > diff --git a/Documentation/development/built-in-vms.adoc b/Documentation/= development/built-in-vms.adoc > index e90009ee5a3c2c254a7ae11e36121576b819eee7..0addc7d1a2fd322fa12918656= baa3d169478504d 100644 > --- a/Documentation/development/built-in-vms.adoc > +++ b/Documentation/development/built-in-vms.adoc Copyright header please! > @@ -44,6 +44,23 @@ NOTE: As a special convenience, it's not necessary to = run `make clean` > if the only change to the Nix files is modifying the packages > installed in the VM. >=20=20 > +The list of files used for the VM image is stored in a separate file, > +`file-lists.mk`. To update it, run `scripts/genfiles.sh` Typo: file-list*s*.mk. Also, so far we haven't used code syntax for file names. Maybe "used for images" would be better, since this also applies to host/rootfs. (Obviously the ideal would be if this documentation wasn't only written for VM images but that's out of scope. We'll get to it.) > +which will regenerate it from the output of `git ls-files`. Any > +changes you made will be lost. This script uses uses Git's index to I think "Any changes you made will be lost." is a bit scary, because it's not clear it only means changes to those files. The sentence could probably just be dropped altogether =E2=80=94 I think it's implied by "rege= nerate". > +generate the list, so you need to use `git add`, `git rm`, and `git mv` > +to ensure that Git knows about your changes. It is not necessary to > +commit the changes. "so only staged changes will be reflected"? All the extra stuff has potential for confusion I think =E2=80=94 for example "It is not necessary = to commit the changes." could be read as "when you make a commit, do not include changes to file-list.mk". > diff --git a/lib/common.mk b/lib/common.mk > index 277c3544036d9a9057f8ba4ad37fe2207548cc59..0a03ff440cc671264d2b859a2= ae048db9252d047 100644 > --- a/lib/common.mk > +++ b/lib/common.mk > @@ -1,5 +1,6 @@ > # SPDX-License-Identifier: EUPL-1.2+ > # SPDX-FileCopyrightText: 2021, 2023, 2025 Alyssa Ross > +# SPDX-FileCopyrightText: 2025 Demi Marie Obenour >=20=20 > BACKGROUND =3D background > CPIO =3D cpio Accident? > diff --git a/scripts/genfiles.awk b/scripts/genfiles.awk > new file mode 100644 > index 0000000000000000000000000000000000000000..6fe327fd0a314d226dbce2385= 4aa8f119e9c8f34 > --- /dev/null > +++ b/scripts/genfiles.awk > @@ -0,0 +1,120 @@ > +#!/usr/bin/env -S LC_ALL=3DC LANGUAGE=3DC awk -E > +# SPDX-License-Identifier: EUPL-1.2+ > +# SPDX-FileCopyrightText: 2025 Demi Marie Obenour > +BEGIN { > + RS =3D "\n"; > + FS =3D "\t"; > + file_count =3D 0; > + symlink_count =3D 0; > + rc_count =3D 0; > + is_rc =3D 0; > + exit_code =3D 0; > + done =3D 0; awk variables are implicitly initialized to 0 when you try to do arithmetic on an undefined variable, so no need for these. > + modes["120000"] =3D "symlink"; > + modes["040755"] =3D "directory"; > + modes["100644"] =3D "regular"; > + modes["100755"] =3D "regular"; > +} > + > +function fail(msg, status) { > + if (status ~ /^([1-9][0-9]?|1[0-9]{2}|2[0-4][1-9]|25[1-5])$/) { > + exit_code =3D status; > + } else { > + exit_code =3D 1; > + status =3D 1; > + } > + print ("FATAL: " msg) > "/dev/stderr"; > + exit status; Do we ever want to exit something other than 1 from this function? > +} > +done { fail("Junk after DONE", 1); } > +/^DONE$/ { > + done =3D 1 > + next > +} > + > +# Make sure git produced valid output. > +!/^[0-7]{6}\t[ -~]+$/ { > + fail("git ls-files produced invalid output", 1); > +} > + This is very unlikely to happen, and if it does, it will be obvious from the diff. > +# Extract data from built-in variables. > +{ > + filename =3D $2; > + raw_mode =3D $1; > + # awk autocreates empty string entries if the key is invalid, > + # but the code exits in this case so that is okay. > + mode =3D modes[raw_mode]; > +} > + > +# Another check for a git bug. > +filename ~ /^\/|((^|\/)\.{0,2}($|\/))/ { > + fail("git ls-files output non-canonical or absolute path '" filename "'= ", 1); > +} > + If there are git bugs, we will notice and report them. We do not need to be the test suite for git here. > +filename ~ /[^[:alnum:]_.+@/-]/ { > + fail("filename '" filename "' has forbidden characters", 1); > +} > + > +/\.license$/ { > + if (raw_mode !=3D "100644") { > + fail("License file '" filename "' is executable or not regular file", = 1); > + } > + next; > +} This is also not really in scope for a script that does not care about license files. > + > +mode =3D=3D "directory" { next } Getting a directory from git ls-files would be sufficiently unexpected that I don't think we should treat it any differently from an unrecognized mode. > + > +filename ~ /^image\/etc\/s6-rc\// { > + if (mode !=3D "regular") { > + fail("s6-rc-compile input '" filename "' isn't a regular file"); > + } > + rc_count +=3D 1; > + rc_files[rc_count] =3D filename; rc_files[rc_count++] (will make it 0-indexed though so update the loops too) > + next; > +} > + > +mode =3D=3D "symlink" { > + symlink_count +=3D 1; > + symlinks[symlink_count] =3D filename; > + next; > +} > + > +mode =3D=3D "regular" { > + file_count +=3D 1; > + files[file_count] =3D filename; > + next; > +} > + > +{ fail("File '" filename "' is not regular file, directory, or symlink (= mode " raw_mode ")"); } > + > +END { > + if (exit_code) { > + exit exit_code; > + } > + if (!done) { > + fail("Did not receive DONE line", 1); > + } > + printf ("# SPDX-License-Identifier: CC0-1.0\n" \ > + "# SPDX-FileCopyrightText: 2025 Demi Marie Obenour \n" \ Okay, so, it's silly that this needs to have a copyright header on it at all, but since we have to have one to make reuse happy, I think it should be mine from 2021, because the comment about links is the closest thing to creative expression in here. > + "# Generated by scripts/genfile.sh. Any changes will be overwri= tten.\n" \ > + "FILES ::=3D") > out_file; I note the change to ::=3D. Do you think we should do that across the board in our Makefiles? > + for (array_index =3D 1; array_index <=3D file_count; array_index +=3D 1= ) { > + printf " \\\n\t%s", files[array_index] > out_file; > + } > + printf ("\n\n" \ > +"# These are separate because they need to be included, but putting\n" \ > +"# them as make dependencies would confuse make.\n" \ > +"LINKS ::=3D") > out_file; > + for (array_index =3D 1; array_index <=3D symlink_count; array_index += =3D 1) { > + printf " \\\n\t%s", symlinks[array_index] > out_file; > + } > + printf "\n\nS6_RC_FILES ::=3D" > out_file; > + for (array_index =3D 1; array_index <=3D rc_count; array_index +=3D 1) { > + printf " \\\n\t%s", rc_files[array_index] > out_file; > + } > + printf "\n" > out_file; > + if (close(out_file)) { > + print ("Cannot close output file: " ERRNO "\n") > "/dev/stderr"; > + exit 1; > + } > +} > diff --git a/scripts/genfiles.sh b/scripts/genfiles.sh > new file mode 100755 > index 0000000000000000000000000000000000000000..77a8d95e88b6851be94476985= 56efe4f1eab174b > --- /dev/null > +++ b/scripts/genfiles.sh > @@ -0,0 +1,29 @@ > +#!/usr/bin/env -S LC_ALL=3DC LANGUAGE=3DC bash -- env -S is not portable, and I don't think anything here needs bash specifically. We can set the locale variables after the script starts, because I don't think this wrapper script is going to do anything locale-specific. (And shouldn't they be C.UTF-8?) > +set -euo pipefail > +unset output_file astatus This is a bit overly defensive IMO. Both of these variables are assigned before use, and if they weren't, the person making those changes would be very unlikely to not notice because they had those variables defined in their environment. > +case $0 in > +(/*) cd "${0%/*}/..";; > +(*/*) cd "./${0%/*}/..";; > +(*) cd ..;; > +esac Perhaps we could use git rev-parse --show-toplevel? > +for i in host/rootfs img/app vm/sys/net; do > + output_file=3D$i/file-list.mk > + { > + git -C "$i" -c core.quotePath=3Dtrue ls-files $'--format=3D%(objectmode= )\t%(path)' -- image | > + sort -t $'\t' -k 2 TIL sort -t and -k! =F0=9F=A4=AF > + echo DONE Why do we need this? > + } | > + gawk -v "out_file=3D$output_file.tmp" -E scripts/genfiles.awk Why not stdout? And why gawk? I didn't immediately notice anything non-POSIX, and as usual would prefer to stick to it. > + if [ -f "$output_file" ]; then > + # Avoid changing output file if it is up to date, as that > + # would cause unnecessary rebuilds. > + if cmp -s -- "$output_file.tmp" "$output_file"; then > + rm -- "$output_file.tmp" > + continue > + else > + astatus=3D$? > + if [ "$astatus" !=3D 1 ]; then exit "$astatus"; fi Could avoid the need for the variable and multiple ifs. Up to you whether you prefer it: set +e cmp -s -- "$output_file.tmp" "$output_file" set -e case $? in 0) rm -- "$output_file.tmp" continue ;; 1) ;; *) exit $? ;; esac > + fi > + fi > + mv -- "$output_file.tmp" "$output_file" > +done --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEH9wgcxqlHM/ARR3h+dvtSFmyccAFAmjPu7oACgkQ+dvtSFmy ccAvgw/+Nt0dB8gsXrcHGCasJDKgSWLKOCI4tlif+rC+Dx9RdnDb8TgQKwYHuQVP 2M12XGNQ8vsQHNdsLCnxmhFua5KHtUK8j/NK5UEdzETjemsGQ44hT88oYppQ5n03 MYpicFG6qjy9l0ovLe2ZdStKy0ffnaXAI+UkPMu3BJxf4C6Ws1Q7oiYBrROOVgti O2mAehhgEafBDOdnT7o3ZeHVJoIsp26WvmwgDXayBPuub0XwOz3xdToxt74q+G3/ zcuIM54eeQEG/KS/AARUjs6UesLrXg70eYwg31vaEErAyvDOP879lHYEMHQNyXfP CjKOmFFmz3GaFL0UfmYAw9ClfrKr/IdooYY3nqlQRirdecRlqtHhu6imf/fFE/DU Rq3+ZAH8L1EdLP6XnWMwU6m2oG1HwuHY74/sGE88kIxA9I70H0SSTkS+PvaBIHey GxKJMBnF4jFsXK08q88zr86nu1HY71eP/x1E31wD41WXNXdA3ul7Xadyl5qay3FY bTqXEcFnPwDvWBOxR1VOJ/UUHe7ozIb+i045WkN5wZXLyU89tSz+hCebXIJMJo9y v6zRvUNhtZxeHx7YdrytfsKGt+KgiBEiAxY5fs7B8sCSd3bUcDgyilFNIyDbj1h9 YFe/jGIsLTDYXfSj3JFDN5ZZds7Vv3mA0tjdq/Wzyr+e6YB5gpQ= =+qNA -----END PGP SIGNATURE----- --=-=-=--