kiwi image builds fail with nspawn due to broken loop devices #1554

Open
Conan-Kudo opened this issue Feb 20, 2025 · 5 comments

@Conan-Kudo
Member

Conan-Kudo commented Feb 20, 2025

We've had this problem for a while now where kiwi image builds seem to fail in nspawn environments due to broken loop device nodes:

[ DEBUG   ]: 10:01:31 | EXEC: [losetup -f --show /builddir/kiwi-build/Fedora.x86_64-42.raw]
[ DEBUG   ]: 10:01:31 | Looking for systemd-id128 in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:31 | EXEC: [systemd-id128 show]
[ DEBUG   ]: 10:01:31 | Initialize gpt disk
[ DEBUG   ]: 10:01:31 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:31 | EXEC: [sgdisk --zap-all /dev/loop0]
[ INFO    ]: 10:01:32 | --> creating EFI CSM(legacy bios) partition
[ DEBUG   ]: 10:01:32 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:32 | EXEC: [sgdisk -n 1:2048:+2M -c 1:p.legacy /dev/loop0]
[ DEBUG   ]: 10:01:33 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:33 | EXEC: [sgdisk -t 1:EF02 /dev/loop0]
[ INFO    ]: 10:01:34 | --> creating EFI partition
[ DEBUG   ]: 10:01:34 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:34 | EXEC: [sgdisk -n 2:0:+500M -c 2:p.UEFI /dev/loop0]
[ DEBUG   ]: 10:01:35 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:35 | EXEC: [sgdisk -t 2:EF00 /dev/loop0]
[ DEBUG   ]: 10:01:36 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:36 | EXEC: [sgdisk --typecode 2:c12a7328f81f11d2ba4b00a0c93ec93b /dev/loop0]
[ INFO    ]: 10:01:37 | --> creating boot partition [with 0 clone(s)]
[ DEBUG   ]: 10:01:37 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:37 | EXEC: [sgdisk -n 3:0:+1024M -c 3:p.lxboot /dev/loop0]
[ DEBUG   ]: 10:01:38 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:38 | EXEC: [sgdisk -t 3:8300 /dev/loop0]
[ DEBUG   ]: 10:01:39 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:39 | EXEC: [sgdisk --typecode 3:bc13c2ff59e64262a352b275fd6f7172 /dev/loop0]
[ INFO    ]: 10:01:40 | --> Using all_freeMB for the root(rw) partition if present
[ INFO    ]: 10:01:40 | --> creating root partition [with 0 clone(s)]
[ DEBUG   ]: 10:01:40 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:40 | EXEC: [sgdisk -n 4:0:0 -c 4:p.lxroot /dev/loop0]
[ DEBUG   ]: 10:01:41 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:41 | EXEC: [sgdisk -t 4:8300 /dev/loop0]
[ DEBUG   ]: 10:01:42 | Looking for sgdisk in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:42 | EXEC: [sgdisk --typecode 4:4f68bce3e8cd4db196e7fbcaf984b709 /dev/loop0]
[ DEBUG   ]: 10:01:43 | Looking for partx in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:43 | EXEC: [partx --add /dev/loop0]
[ INFO    ]: 10:01:43 | Cleaning up Disk instance
[ DEBUG   ]: 10:01:43 | Looking for partx in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:43 | EXEC: [partx --delete /dev/loop0]
[ DEBUG   ]: 10:01:43 | Looking for losetup in /usr/bin:/bin:/usr/sbin:/sbin
[ DEBUG   ]: 10:01:43 | EXEC: [losetup -d /dev/loop0]
[ ERROR   ]: 10:01:43 | KiwiMappedDeviceError: Device /dev/loop0p1 does not exist

I've got a test case script to reproduce the issue:

#!/bin/bash
# Author: Neal Gompa
# SPDX-License-Identifier: MIT

# Shell options go in "set" because the kernel passes everything after the
# interpreter on a shebang line as one single argument.
set -euxo pipefail

# Use ${N:-} so that "set -u" does not abort before the usage check runs.
if [ -z "${1:-}" ] || [ -z "${2:-}" ] || [ -z "${3:-}" ]; then
        echo "Missing args!"
        exit 1
fi

# A two-digit release (e.g. 42) maps to branch f42; any other value is
# used unchanged for both the release and the branch.
releasever="$1"
if [[ "$1" =~ ^[0-9][0-9]$ ]]; then
        branchver="f${1}"
else
        branchver="$1"
fi

# $2 is the kiwi image type, $3 is the kiwi profile.
mock_base="mock --root fedora-${releasever}-x86_64 --no-bootstrap-image --isolation=nspawn"
kiwi_build_cmd="kiwi-ng --debug --type=${2} --profile=${3} --kiwi-file=Fedora.kiwi system build --description=/builddir/fedora-kiwi-descriptions --target-dir=/builddir/kiwi-build"

# $mock_base is intentionally unquoted so it splits into separate arguments.
$mock_base --clean
$mock_base --init
$mock_base --install kiwi-cli kiwi-systemdeps distribution-gpg-keys git-core
$mock_base --enable-network --chroot "git clone -b ${branchver} https://pagure.io/fedora-kiwi-descriptions.git /builddir/fedora-kiwi-descriptions"
$mock_base --enable-network --chroot "$kiwi_build_cmd"

This can be invoked like so: ./kiwi-mockbuild.sh 42 oem Tiny-Disk

We've fixed problems with device nodes in nspawn before, but this issue is rather baffling.

@Conan-Kudo
Member Author

cc @DaanDeMeyer @keszybz if either of you can help with nspawn debugging.

@Conan-Kudo
Member Author

Conan-Kudo commented Feb 20, 2025

@supakeen pointed me to systemd/systemd#6553 as the underlying issue. We've theorized that if Mock creates its own transient machine unit instead of letting nspawn do it, we can set it up in such a way that we can create the nodes the same way we do for regular chroots, which hopefully resolves the problem.

@Conan-Kudo
Member Author

FYI @nirik

@supakeen

supakeen commented Feb 20, 2025

Yeah, kiwi isn't the only one running into this issue. I'm currently working on getting image-builder to work on koji builders as well, where we have to opt for the "simple" isolation model, which causes other issues for us.

So far I've tried a few things:

  1. Add --capability CAP_SYS_ADMIN to nspawn_args in the mock config.
  2. Add --property 'DeviceAllow=/dev/loop* rwm' in the mock config.

Both of those get applied, but we still can't mknod inside the mock root.
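To check the mknod observation directly, a small probe can be run inside the mock root. This is only a sketch: the function name probe_mknod is hypothetical, and it assumes the usual loop block major number of 7. It tries to create a block node in a temporary directory and reports whether the call was permitted.

```shell
#!/bin/bash
# Hypothetical probe: can this environment create a loop block device node?
# Assumes loop devices use block major 7 (minor 0 corresponds to loop0).
probe_mknod() {
    local dir node result
    dir="$(mktemp -d)"
    node="${dir}/probe-loop"
    # The node is created in a scratch directory, never in /dev itself.
    if mknod "${node}" b 7 0 2>/dev/null; then
        result="allowed"
    else
        result="denied"
    fi
    rm -rf "${dir}"
    echo "${result}"
}
```

Running probe_mknod once on the host as root and once inside the nspawn-isolated mock root should show whether the nspawn_args changes had any effect on node creation.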

My hunch is that we're allowed to touch /dev/loop-control and the /dev/loopX devices, but when we mount one of those, a bunch of /dev/loopXpY devices should get created. Whatever tries to create them (losetup in the case of kiwi, while image-builder calls mknod directly) is then denied, and we fail on that.

Personally, I think there's probably some magical incantation with --property, --bind, and friends that allows this to work, but we haven't found it yet.


For what it's worth, image-builder manually creates the loop devices as we've had previous issues along the same lines in containers. I'll tag in @mvo5 as he's worked on that bit previously and I might be misunderstanding.


Update: I misremembered. Nowadays, when image-builder detects that it is running in a container, it remounts /dev as devtmpfs instead of creating the loop device nodes itself; see: https://github.com/osbuild/bootc-image-builder/blob/main/bib/internal/setup/setup.go#L19
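A rough shell equivalent of that idea might look like the sketch below. It is not taken from image-builder: the function names and the DRY_RUN variable are hypothetical, container detection is assumed to go through systemd-detect-virt, and the mount itself needs privileges that the environment may well refuse.

```shell
#!/bin/bash
# Hypothetical helpers, not image-builder's actual code.

# Succeeds when systemd-detect-virt reports container virtualization.
in_container() {
    command -v systemd-detect-virt >/dev/null 2>&1 || return 1
    systemd-detect-virt --container --quiet
}

# Mounts a devtmpfs instance on the target directory (default: /dev).
# With DRY_RUN=1 (the default here) the command is only printed.
remount_dev_as_devtmpfs() {
    local target="${1:-/dev}"
    local cmd=(mount -t devtmpfs devtmpfs "${target}")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

A caller would combine the two, for example: if in_container; then DRY_RUN=0 remount_dev_as_devtmpfs; fi. Keeping the dry-run default makes it safe to inspect the command before letting it touch /dev.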

@nikromen nikromen moved this from Needs triage to In 3 months in CPT Kanban Feb 26, 2025
@Conan-Kudo
Member Author

> @supakeen pointed me to systemd/systemd#6553 as the underlying issue. We've theorized that if Mock creates its own transient machine unit instead of letting nspawn do it, we can set it up in such a way that we can create the nodes the same way we do for regular chroots, which hopefully resolves the problem.

Just for everyone's edification, this did not work. The problem is that we cannot create and discover partitions on loop devices, since the loopXpY nodes do not show up in the nspawn environment.
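One way to confirm this from inside the environment is to compare what the kernel exposes in sysfs with what exists under /dev. The helper below is a sketch with a hypothetical name; it assumes the standard sysfs layout, where each partition appears under /sys/class/block with a "partition" attribute file, and it prints every partition of a disk that has no block device node.

```shell
#!/bin/bash
# Hypothetical helper: list partitions known to the kernel (via sysfs)
# that have no matching block device node under /dev.
# Usage: missing_partition_nodes loop0 [sysfs_root] [dev_root]
missing_partition_nodes() {
    local disk="$1"
    local sysfs_root="${2:-/sys/class/block}"
    local dev_root="${3:-/dev}"
    local entry name
    for entry in "${sysfs_root}/${disk}"p*; do
        # Only real partition entries carry a "partition" attribute file;
        # this also skips the literal pattern when the glob matches nothing.
        [ -e "${entry}/partition" ] || continue
        name="${entry##*/}"
        # -b is true only for block special files.
        [ -b "${dev_root}/${name}" ] || echo "${name}"
    done
}
```

Calling missing_partition_nodes loop0 right after the partx --add step from the log would print loop0p1 and the other partitions if the kernel registered them but their nodes never appeared in /dev.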
