support recursive read only (rro) option for mounts #25680

Open
wants to merge 1 commit into base: main

21 changes: 21 additions & 0 deletions libpod/container_internal_common.go
@@ -374,6 +374,27 @@ func (c *Container) generateSpec(ctx context.Context) (s *spec.Spec, cleanupFunc
// Podman decided for --no-dereference as many
// bin-utils tools (e.g., touch, chown, cp) do.
options = append(options, "copy-symlink")
// TODO: this also ends up checking non-user mounts
case "ro", "rro":
Member

Why check rro here? It seems like we should always pass it along unmodified, even if unsupported - if the user explicitly requested rro and we silently downgrade, that seems like a problem.

Member

Yes, rro must be passed as is.

Also, we need a way to restore the previous behavior. If a user really wants plain ro, there is no way to achieve it when we always force ro -> rro.

Contributor Author

> Why check rro here? It seems like we should always pass it along unmodified, even if unsupported - if the user explicitly requested rro and we silently downgrade, that seems like a problem.

A downgrade shouldn't happen. The intent of the rro check here is to upgrade ro to rro when it is supported, and to error out when a user explicitly requests rro and it isn't supported.

Contributor Author

> Also, we need a way to restore the previous behavior. If a user really wants plain ro, there is no way to achieve it when we always force ro -> rro.

But why? I can understand it for backwards compatibility, but it is wrong from a security point of view. I don't know how bad the impact would be when we roll out this change, but we could introduce a non-recursive option (norro) as part of the transition until the buggy ro behaviour fades out. What do you think?

// There are 2 cases:
// 1. User requests `rro`
// * Return error if runtime does not support `rro`
// 2. User requests `ro`
// * Use `rro` if runtime supports `rro`
// * Use `ro` if runtime does not support `rro`
rro := true
if err := util.SupportsRecursiveReadonly(c.ociRuntime.Features()); err != nil {
rro = false
if o == "rro" {
return nil, nil, err
}
}

if rro {
options = append(options, "rro")
} else {
options = append(options, "ro")
}
default:
options = append(options, o)
}
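
The two cases listed in the hunk's comment can be read as a small decision table. The following is a minimal sketch of that table as a pure function, not the PR's code: `resolveReadOnly`, `errRRONotSupported` and the `rroSupported` flag (standing in for the combined kernel and runtime check) are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errRRONotSupported is a hypothetical sentinel used only in this sketch.
var errRRONotSupported = errors.New("recursive read-only (rro) is not supported")

// resolveReadOnly maps the option the user requested to the option that is
// handed to the runtime. rroSupported stands in for the kernel+runtime check.
func resolveReadOnly(requested string, rroSupported bool) (string, error) {
	switch requested {
	case "rro":
		// Explicit rro is never downgraded: fail if it cannot be honoured.
		if !rroSupported {
			return "", errRRONotSupported
		}
		return "rro", nil
	case "ro":
		// Plain ro is upgraded when possible and kept otherwise.
		if rroSupported {
			return "rro", nil
		}
		return "ro", nil
	default:
		// Every other option passes through untouched.
		return requested, nil
	}
}

func main() {
	for _, supported := range []bool{true, false} {
		for _, opt := range []string{"ro", "rro", "rw"} {
			got, err := resolveReadOnly(opt, supported)
			fmt.Printf("requested=%s supported=%t -> %q err=%v\n", opt, supported, got, err)
		}
	}
}
```

Note that this table encodes the author's stated intent. The reviewers' objection, that a user who wants non-recursive ro has no way to ask for it, applies to the `"ro"` branch.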
4 changes: 4 additions & 0 deletions libpod/oci.go
@@ -8,6 +8,7 @@ import (
"github.com/containers/common/pkg/resize"
"github.com/containers/podman/v5/libpod/define"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/opencontainers/runtime-spec/specs-go/features"
)

// OCIRuntime is an implementation of an OCI runtime.
@@ -131,6 +132,9 @@ type OCIRuntime interface { //nolint:interfacebloat
// without KVM separation
SupportsKVM() bool

// Features returns the features struct from the OCI runtime
Features() *features.Features

// AttachSocketPath is the path to the socket to attach to a given
// container.
// TODO: If we move Attach code in here, this should be made internal.
27 changes: 27 additions & 0 deletions libpod/oci_conmon_common.go
@@ -36,6 +36,7 @@ import (
"github.com/containers/podman/v5/utils"
"github.com/containers/storage/pkg/idtools"
spec "github.com/opencontainers/runtime-spec/specs-go"
"github.com/opencontainers/runtime-spec/specs-go/features"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
)
@@ -66,6 +67,7 @@ type ConmonOCIRuntime struct {
supportsNoCgroups bool
enableKeyring bool
persistDir string
features *features.Features
}

// Make a new Conmon-based OCI runtime with the given options.
@@ -131,6 +133,12 @@ func newConmonOCIRuntime(name string, paths []string, conmonPath string, runtime
break
}

features, err := runtime.getOCIRuntimeFeatures()
if err != nil {
return nil, fmt.Errorf("getting %s features: %w", runtime.name, err)
}
runtime.features = features
Comment on lines +136 to +140
Member

IMO this is not acceptable for performance reasons. Every podman command would then have to run this, even though the vast majority of them will never need it.
I know the features command seems very fast, but it is still a total overhead of a few ms, and we have users on embedded systems who are much more concerned about every small thing.

Second, how many runtimes actually support the features command? Right now this is a hard error, which means it can cause regressions.

I see three options:

  • don't do any feature checks and assume rro is always supported (that means we would need to bump the kernel baseline and document which runtimes support it)
  • wrap this into a sync.OnceValues() function so it is only called once and only when needed.
  • add a compile time option for podman to specify if rro should be used. Distros that ship podman with a new kernel and runtime then don't have to pay the feature check overhead.
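
As a rough illustration of the second option, the sketch below wraps a probe in `sync.OnceValues` (available since Go 1.21) so that it runs on first use only. `runtimeFeatures`, `probeFeatures` and `probeCalls` are hypothetical stand-ins: the real probe would execute the runtime's `features` command and decode its output.

```go
package main

import (
	"fmt"
	"sync"
)

// runtimeFeatures is a trimmed, hypothetical stand-in for features.Features.
type runtimeFeatures struct {
	MountOptions []string
}

// probeCalls counts how often the expensive probe runs (demonstration only).
var probeCalls int

// probeFeatures stands in for executing "<runtime> features" and decoding it.
func probeFeatures() (*runtimeFeatures, error) {
	probeCalls++
	return &runtimeFeatures{MountOptions: []string{"ro", "rro", "rw"}}, nil
}

// lazyFeatures runs probeFeatures on the first call and replays the same
// value and error on every later call.
var lazyFeatures = sync.OnceValues(probeFeatures)

func main() {
	// Commands that never call lazyFeatures never pay for the probe.
	for i := 0; i < 3; i++ {
		f, err := lazyFeatures()
		fmt.Println(f.MountOptions, err)
	}
	fmt.Println("probe ran", probeCalls, "time(s)")
}
```

The result is fixed for the lifetime of the process, so on its own this approach cannot notice a runtime binary that is replaced while a long-running service is up.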

Member

We have also, in the past, added fields to containers.conf to indicate which runtimes supported certain features - see supportsNoCgroups for example. Adding to containers.conf is not perfect (it shifts the burden to the packager to make sure their combination of kernel + runtime version supports RRO) but it is very performant.

Member

Alternatively, I wonder if we could lazy-initialize this

Member

ro is a common option, so we will always end up with the price of running $OCI_RUNTIME features on each invocation. Also there is the risk with podman system service that we will not detect when the OCI runtime was updated.

IMO we could cache the result under the tmpdir using the runtime path as the key, and then use the runtime binary stat (st_dev, st_ino and mtime) information to detect when the cache is out of date.

What do you think?
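
A minimal sketch of the invalidation idea, under stated assumptions: the cache lives in memory rather than in a file under the tmpdir, the probe is a placeholder, and all type and function names (`binaryStamp`, `featureCache`, `runDemo`) are hypothetical. It relies on `syscall.Stat_t`, so it targets Linux and other Unix-like systems.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

// binaryStamp identifies one on-disk version of a runtime binary.
type binaryStamp struct {
	Dev, Ino uint64
	MtimeNs  int64
}

// stampOf stats path and extracts st_dev, st_ino and the modification time.
func stampOf(path string) (binaryStamp, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return binaryStamp{}, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return binaryStamp{}, fmt.Errorf("no raw stat data for %q", path)
	}
	return binaryStamp{Dev: uint64(st.Dev), Ino: uint64(st.Ino), MtimeNs: fi.ModTime().UnixNano()}, nil
}

type cacheEntry struct {
	stamp        binaryStamp
	mountOptions []string
}

// featureCache is an in-memory stand-in for the proposed on-disk cache.
type featureCache struct {
	entries map[string]cacheEntry
	probes  int // number of times the slow path ran
}

// mountOptions returns cached data while the binary's stamp is unchanged.
func (c *featureCache) mountOptions(runtimePath string) ([]string, error) {
	stamp, err := stampOf(runtimePath)
	if err != nil {
		return nil, err
	}
	if e, ok := c.entries[runtimePath]; ok && e.stamp == stamp {
		return e.mountOptions, nil
	}
	c.probes++ // placeholder for running "<runtime> features"
	opts := []string{"ro", "rro", "rw"}
	c.entries[runtimePath] = cacheEntry{stamp: stamp, mountOptions: opts}
	return opts, nil
}

// runDemo looks a fake runtime up twice, bumps its mtime, and looks it up
// again. It returns how many times the slow path ran.
func runDemo() (int, error) {
	f, err := os.CreateTemp("", "fake-runtime-")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	f.Close()

	c := &featureCache{entries: map[string]cacheEntry{}}
	for i := 0; i < 2; i++ {
		if _, err := c.mountOptions(f.Name()); err != nil {
			return 0, err
		}
	}
	later := time.Now().Add(time.Hour)
	if err := os.Chtimes(f.Name(), later, later); err != nil {
		return 0, err
	}
	if _, err := c.mountOptions(f.Name()); err != nil {
		return 0, err
	}
	return c.probes, nil
}

func main() {
	probes, err := runDemo()
	fmt.Println("probes:", probes, "err:", err)
}
```

The second lookup is served from the cache, and the lookup after the mtime change falls back to the slow path, so the demo probes twice in total.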

Member

> Alternatively, I wonder if we could lazy-initialize this

That is my suggestion 2, using sync.OnceValues().

> ro is a common option, so we will always end up with the price of running $OCI_RUNTIME features on each invocation

How so? Any command that does not start a container will certainly never need it: podman stop, podman rm, podman ps, podman logs, etc. The way I look at it, only a minority of commands will actually need this.

> IMO we could cache the result under the tmpdir using the runtime path as the key, and then use the runtime binary stat (st_dev, st_ino and mtime) information to detect when the cache is out of date.

Yes, that sounds good to me. There is an overhead for the stat and for reading the file, but compared to the rest of what we do that is likely not a problem. It should be much better than spawning a new process each time.

Even worse, reading the code again, we actually call newConmonOCIRuntime() on all runtimes from containers.conf, so we pay the overhead once for every runtime, which really doesn't feel right.

Member

> ro is a common option, so we will always end up with the price of running $OCI_RUNTIME features on each invocation

> How so? Any command that does not start a container will certainly never need it: podman stop, podman rm, podman ps, podman logs, etc. The way I look at it, only a minority of commands will actually need this.

I was referring only to the commands that run a container, and that is where we care more about performance. sync.OnceValues() won't work with system service, as we need to check that the runtime didn't change.

On my system I get:

➜ hyperfine 'crun features'
Benchmark 1: crun features
  Time (mean ± σ):       2.3 ms ±   0.6 ms    [User: 0.8 ms, System: 1.6 ms]
  Range (min … max):     1.6 ms …   6.2 ms    553 runs

A stat() is much faster than that. If we really care about performance we could even consider statx() and query only the information we need. Most of the price will come from dealing with json anyway, so maybe we want to store the information in a different format and store only the bits we care about.

Member

> Most of the price will come from dealing with json anyway, so maybe we want to store the information in a different format and store only the bits we care about.

Sounds like a good idea.

Contributor Author

(sorry for the multi-week delay in response)

> wrap this into a sync.OnceValues() function so it is only called once and only when needed.

I also prefer lazy-initializing the runtime features over the other two options suggested by @Luap99. As for caching the output under the tmpdir, my understanding, as @giuseppe pointed out, is that it is meant to cover the case where the runtime changes. It shouldn't make too much of a dent in performance.


// Search the $PATH as last fallback
if !foundPath {
if foundRuntime, err := exec.LookPath(name); err == nil {
@@ -839,6 +847,11 @@ func (r *ConmonOCIRuntime) SupportsKVM() bool {
return r.supportsKVM
}

// Features returns the features struct from the OCI runtime
func (r *ConmonOCIRuntime) Features() *features.Features {
return r.features
}

// AttachSocketPath is the path to a single container's attach socket.
func (r *ConmonOCIRuntime) AttachSocketPath(ctr *Container) (string, error) {
if ctr == nil {
@@ -1485,6 +1498,20 @@ func (r *ConmonOCIRuntime) getConmonVersion() (string, error) {
return strings.TrimSuffix(strings.Replace(output, "\n", ", ", 1), "\n"), nil
}

func (r *ConmonOCIRuntime) getOCIRuntimeFeatures() (*features.Features, error) {
var features *features.Features
output, err := utils.ExecCmd(r.path, "features")
if err != nil {
return features, err
}

if jsonErr := json.Unmarshal([]byte(output), &features); jsonErr != nil {
// Return the decode error; err is nil at this point.
return nil, jsonErr
}

return features, nil
}

// getOCIRuntimeVersion returns a string representation of the OCI runtime's
// version.
func (r *ConmonOCIRuntime) getOCIRuntimeVersion() (string, error) {
6 changes: 6 additions & 0 deletions libpod/oci_missing.go
@@ -11,6 +11,7 @@ import (
"github.com/containers/common/pkg/resize"
"github.com/containers/podman/v5/libpod/define"
spec "github.com/opencontainers/runtime-spec/specs-go"
"github.com/opencontainers/runtime-spec/specs-go/features"
"github.com/sirupsen/logrus"
)

@@ -194,6 +195,11 @@ func (r *MissingRuntime) SupportsKVM() bool {
return false
}

// Features returns nil since this is a missing runtime
func (r *MissingRuntime) Features() *features.Features {
return nil
Member

This should still return an error, so I recommend (*features.Features, error) - the missing runtime can have methods called on it and we want to avoid nil pointer panics.

}
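
The reviewer's suggestion could look roughly like the sketch below, where the accessor returns an error alongside the pointer so callers are pushed to check before dereferencing. `featureProvider`, `missingRuntime`, `errRuntimeMissing` and `mountOptionsOf` are hypothetical names for illustration, not libpod's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// runtimeFeatures is a trimmed, hypothetical stand-in for features.Features.
type runtimeFeatures struct {
	MountOptions []string
}

// featureProvider mirrors the suggested (*features.Features, error) shape.
type featureProvider interface {
	Features() (*runtimeFeatures, error)
}

// errRuntimeMissing is a hypothetical sentinel for this sketch.
var errRuntimeMissing = errors.New("runtime is missing")

// missingRuntime models a runtime whose binary is not installed.
type missingRuntime struct {
	name string
}

// Features reports an error instead of handing back a bare nil pointer.
func (r *missingRuntime) Features() (*runtimeFeatures, error) {
	return nil, fmt.Errorf("querying features of %q: %w", r.name, errRuntimeMissing)
}

// mountOptionsOf shows the caller side: the error is handled before any
// field of the features struct is touched.
func mountOptionsOf(p featureProvider) ([]string, error) {
	f, err := p.Features()
	if err != nil {
		return nil, err
	}
	return f.MountOptions, nil
}

func main() {
	_, err := mountOptionsOf(&missingRuntime{name: "example-runtime"})
	fmt.Println(err)
}
```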

// AttachSocketPath does not work as there is no runtime to attach to.
// (Theoretically we could follow ExitFilePath but there is no guarantee the
// container is running and thus has an attach socket...)
3 changes: 1 addition & 2 deletions pkg/specgen/volumes.go
@@ -43,8 +43,7 @@ type OverlayVolume struct {
}

// ImageVolume is a volume based on a container image. The container image is
-// first mounted on the host and is then bind-mounted into the container. An
-// ImageVolume is always mounted read-only.
+// first mounted on the host and is then bind-mounted into the container.
type ImageVolume struct {
// Source is the source of the image volume. The image can be referred
// to by name and by ID.
36 changes: 20 additions & 16 deletions pkg/specgenutil/volumes.go
@@ -345,34 +345,38 @@ func parseMountOptions(mountType string, args []string) (*universalMount, error)
} else {
mnt.mount.Options = append(mnt.mount.Options, "idmap")
}
-case "readonly", "ro", "rw":
+case "readonly", "ro", "recursivereadonly", "rro", "rw":
if setRORW {
-return nil, fmt.Errorf("cannot pass 'readonly', 'ro', or 'rw' mnt.Options more than once: %w", errOptionArg)
+return nil, fmt.Errorf("cannot pass 'readonly', 'ro', 'rro', 'recursivereadonly' or 'rw' options more than once: %w", errOptionArg)
}
setRORW = true

// Can be formatted as one of:
// readonly
// readonly=[true|false]
// ro
// ro=[true|false]
// recursivereadonly
// recursivereadonly=[true|false]
// rro
// rro=[true|false]
// rw
// rw=[true|false]
if name == "readonly" {
name = "ro"
}
if hasValue {
switch strings.ToLower(value) {
case "true":
mnt.mount.Options = append(mnt.mount.Options, name)
case "false":
// Set the opposite only for rw
// ro's opposite is the default
if name == "rw" {
mnt.mount.Options = append(mnt.mount.Options, "ro")
switch name {
case "rro", "recursivereadonly":
mnt.mount.Options = append(mnt.mount.Options, "rro")
case "ro", "readonly":
mnt.mount.Options = append(mnt.mount.Options, "ro")
case "rw":
if hasValue {
switch strings.ToLower(value) {
case "true":
mnt.mount.Options = append(mnt.mount.Options, name)
case "false":
// default to rro instead of ro
mnt.mount.Options = append(mnt.mount.Options, "rro")
}
}
} else {
mnt.mount.Options = append(mnt.mount.Options, name)
}
case "nodev", "dev":
if setDev {
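
The accepted spellings listed in the hunk's comment (`name` or `name=[true|false]`) can be sketched as a small standalone parser. This is an illustration under assumptions rather than the PR's logic: `parseReadOnlyOption` is a hypothetical helper, and it assumes that `ro=false` and `rro=false` mean "use the default", following the pre-existing comment that ro's opposite is the default. The only behaviour taken directly from the new code is that `rw=false` maps to `rro`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseReadOnlyOption turns one key[=value] token into a mount option.
// It returns "" when the token asks for the default (for example ro=false).
func parseReadOnlyOption(token string) (string, error) {
	name, value, hasValue := strings.Cut(token, "=")
	enabled := true
	if hasValue {
		switch strings.ToLower(value) {
		case "true":
		case "false":
			enabled = false
		default:
			return "", fmt.Errorf("%q must be true or false, got %q", name, value)
		}
	}

	switch name {
	case "ro", "readonly":
		if enabled {
			return "ro", nil
		}
		return "", nil // assumption: ro=false means "default"
	case "rro", "recursivereadonly":
		if enabled {
			return "rro", nil
		}
		return "", nil // assumption: rro=false means "default"
	case "rw":
		if enabled {
			return "rw", nil
		}
		return "rro", nil // as in the new code: rw=false becomes rro
	default:
		return "", fmt.Errorf("unknown option %q", name)
	}
}

func main() {
	for _, tok := range []string{"readonly", "recursivereadonly=true", "rw=false", "ro=false"} {
		opt, err := parseReadOnlyOption(tok)
		fmt.Printf("%-24s -> %q err=%v\n", tok, opt, err)
	}
}
```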
2 changes: 1 addition & 1 deletion pkg/util/mount_opts.go
@@ -94,7 +94,7 @@ func processOptionsInternal(options []string, isTmpfs bool, sourcePath string, g
return nil, fmt.Errorf("only one of 'nodev' and 'dev' can be used: %w", ErrDupeMntOption)
}
foundDev = true
-case "rw", "ro":
+case "rw", "ro", "rro":
if foundWrite {
return nil, fmt.Errorf("only one of 'rw' and 'ro' can be used: %w", ErrDupeMntOption)
}
76 changes: 75 additions & 1 deletion pkg/util/utils_linux.go
@@ -6,21 +6,28 @@ import (
"io/fs"
"os"
"path/filepath"
"slices"
"strconv"
"strings"
"sync"
"syscall"

"github.com/containers/podman/v5/libpod/define"
"github.com/containers/podman/v5/pkg/rootless"
"github.com/containers/psgo"
spec "github.com/opencontainers/runtime-spec/specs-go"
"github.com/opencontainers/runtime-spec/specs-go/features"
"github.com/opencontainers/runtime-tools/generate"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
)

var (
-errNotADevice = errors.New("not a device node")
+errNotADevice = errors.New("not a device node")
errKernelDoesNotSupportRRO = errors.New("kernel does not support recursive readonly mount option `rro`")
errRuntimeDoesNotSupportRRO = errors.New("runtime does not support recursive readonly mount option `rro`")

kernelSupportsRROOnce sync.Once
Member

New code should use sync.OnceValue(s) instead.

)

// GetContainerPidInformationDescriptors returns a string slice of all supported
@@ -258,3 +265,70 @@ func DeviceFromPath(path string) (*spec.LinuxDevice, error) {
Minor: int64(unix.Minor(devNumber)),
}, nil
}

// kernelSupportsRecursivelyReadOnly returns nil if the kernel supports recursive readonly mounts
// from https://github.com/moby/moby/blob/master/daemon/daemon_linux.go#L222
func kernelSupportsRecursivelyReadOnly() error {
fn := func() error {
tmpMnt, err := os.MkdirTemp("", "podman-detect-rro")
if err != nil {
return fmt.Errorf("failed to create a temp directory: %w", err)
}
for {
err = unix.Mount("", tmpMnt, "tmpfs", 0, "")
if !errors.Is(err, unix.EINTR) {
break
}
}
if err != nil {
return fmt.Errorf("failed to mount tmpfs on %q: %w", tmpMnt, err)
}
defer func() {
var umErr error
for {
umErr = unix.Unmount(tmpMnt, 0)
if !errors.Is(umErr, unix.EINTR) {
break
}
}
if umErr != nil {
logrus.Errorf("Failed to unmount %q: %v", tmpMnt, umErr)
}
}()
Comment on lines +273 to +297
Member

Please use open_tree or fsopen so we don't have to create a mount that could be leaked to the host.

attr := &unix.MountAttr{
Attr_set: unix.MOUNT_ATTR_RDONLY,
}
for {
err = unix.MountSetattr(-1, tmpMnt, unix.AT_RECURSIVE, attr)
if !errors.Is(err, unix.EINTR) {
break
}
}
// ENOSYS on kernel < 5.12
if err != nil {
return fmt.Errorf("failed to call mount_setattr with AT_RECURSIVE: %w", err)
}
return nil
}

kernelSupportsRROOnce.Do(func() {
errKernelDoesNotSupportRRO = fn()
})
return errKernelDoesNotSupportRRO
Comment on lines +314 to +317
Contributor

You can simplify this by using sync.OnceFunc.

}
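
One way to act on the review comments about `sync.Once` is sketched below with `sync.OnceValue` (Go 1.21+), which caches the probe's result without a separate `sync.Once` variable and without assigning to a package-level error. `probeKernel` and `probeRuns` are hypothetical stand-ins. The real probe would perform the `mount_setattr` check, whereas this one simply simulates an unsupported kernel.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// probeRuns counts executions of the probe (demonstration only).
var probeRuns int

// probeKernel stands in for the mount_setattr(AT_RECURSIVE) probe. In this
// sketch it always reports failure so the cached error is visible.
func probeKernel() error {
	probeRuns++
	return errors.New("kernel does not support rro (simulated)")
}

// kernelSupportsRRO runs probeKernel once and returns the same error
// (or nil on success) on every later call.
var kernelSupportsRRO = sync.OnceValue(probeKernel)

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(kernelSupportsRRO())
	}
	fmt.Println("probe ran", probeRuns, "time(s)")
}
```

Because the cached value is the function's own return value, a sentinel such as `errKernelDoesNotSupportRRO` could stay constant and be wrapped by the probe instead of being overwritten.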

// SupportsRecursiveReadonly returns nil if both the kernel and the runtime support recursive readonly mounts
func SupportsRecursiveReadonly(features *features.Features) error {
if err := kernelSupportsRecursivelyReadOnly(); err != nil {
return err
}

if features == nil || features.MountOptions == nil {
return errRuntimeDoesNotSupportRRO
}
if !slices.Contains(features.MountOptions, "rro") {
return errRuntimeDoesNotSupportRRO
}

return nil
}
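
The runtime half of this check boils down to looking for `rro` in the `mountOptions` array of the runtime's features document. The sketch below shows that step in isolation. The local `runtimeFeatures` type keeps only the one field needed here, `runtimeSupportsRRO` is a hypothetical helper, and the JSON inputs are made-up samples rather than real runtime output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"slices"
)

// runtimeFeatures keeps only the field this check needs; the real type is
// features.Features from the runtime-spec module.
type runtimeFeatures struct {
	MountOptions []string `json:"mountOptions"`
}

// runtimeSupportsRRO decodes a features document and reports whether the
// runtime lists "rro" among its mount options.
func runtimeSupportsRRO(featuresJSON []byte) (bool, error) {
	var f runtimeFeatures
	if err := json.Unmarshal(featuresJSON, &f); err != nil {
		return false, fmt.Errorf("decoding runtime features: %w", err)
	}
	// slices.Contains on a nil slice is simply false, so a missing
	// mountOptions field needs no special case.
	return slices.Contains(f.MountOptions, "rro"), nil
}

func main() {
	samples := []string{
		`{"mountOptions": ["ro", "rro", "rw"]}`, // made-up sample
		`{"mountOptions": ["ro", "rw"]}`,        // made-up sample
		`{}`,
	}
	for _, s := range samples {
		ok, err := runtimeSupportsRRO([]byte(s))
		fmt.Printf("%-40s -> %t err=%v\n", s, ok, err)
	}
}
```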
25 changes: 25 additions & 0 deletions test/system/060-mount.bats
@@ -573,3 +573,28 @@ glob | /* | /mountroot/ | in

run_podman rmi -f $img
}

# bats test_tags=ci:parallel
@test "podman bind mount rro" {
skip_if_rootless

volName="vol-$(safename)"
volPath=${PODMAN_TMPDIR}/$volName
mkdir -p $volPath

mount -t tmpfs tmpfs $volPath
mkdir -p $volPath/foo $volPath/bar
mount -t tmpfs tmpfs $volPath/foo
mount -t tmpfs tmpfs $volPath/bar

run_podman 1 run --rm -it -v $volPath/:/tmp/mounts:rro $IMAGE touch /tmp/mounts/foo/test
assert "$output" =~ "Read-only file system" "Error should indicate read-only filesystem"

run_podman 1 run --rm -it --mount type=bind,source=$volPath,destination=/tmp/mounts,recursivereadonly=true $IMAGE touch /tmp/mounts/bar/test
assert "$output" =~ "Read-only file system" "Error should indicate read-only filesystem"

umount $volPath/foo
umount $volPath/bar
umount $volPath
rm -rf $volPath
}