Optionally make exec session terminate with parent #1204

Open
ghost opened this issue Nov 15, 2022 · 15 comments

Comments

@ghost

ghost commented Nov 15, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind feature

Description

Add a flag to podman exec that makes the exec session terminate with its parent, similar to bubblewrap's bwrap --die-with-parent.

Immutable distributions use toolbox / distrobox to provide a mutable environment. A common use is to run commands directly inside the container (toolbox run [COMMAND] / distrobox enter -- [COMMAND]). Since these use an exec session, they share the same limitation: the child process is not terminated when the terminal emulator is closed.
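
For reference, bwrap's --die-with-parent relies on the Linux parent-death signal (prctl(PR_SET_PDEATHSIG)). A minimal host-side sketch of the same behavior, assuming setpriv from util-linux 2.33 or later is installed; die_with_parent is a hypothetical helper name, not an existing podman or toolbox command:

```shell
# Sketch only: run a command so that the kernel sends it SIGKILL when its
# parent process exits (Linux parent-death signal).
# Assumes `setpriv --pdeathsig` is available (util-linux 2.33 or later).
die_with_parent() {
    setpriv --pdeathsig KILL -- "$@"
}
```

Calling `die_with_parent sleep 30` from a shell would make sleep die together with that shell. The signal is tied to the direct parent only, so this does not by itself cover podman exec, where the process inside the container is a child of conmon rather than of the podman exec process.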

Steps to reproduce the issue:

  1. Open System Monitor (or the Task Manager equivalent in your desktop environment) and search for sleep

  2. Run one of the following sets of commands in your terminal emulator (any one will work):

podman:

podman run --rm -it \
    --name debian \
    --entrypoint /bin/sh \
    docker.io/library/debian:11

# In a new terminal emulator window
podman exec debian sleep 30

toolbox:

toolbox create
toolbox run sleep 30

distrobox:

distrobox create
distrobox enter -- sleep 30

  3. Then try to close the terminal emulator; it will prompt something like this:

[image: the terminal emulator's close confirmation prompt]

  4. Insist on closing it, then look at System Monitor

Describe the results you received:

sleep 30 keeps running inside the container.

Describe the results you expected:

This is the expected behavior today, hence this feature request.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Client:       Podman Engine
Version:      4.2.1
API Version:  4.2.1
Go Version:   go1.18.5
Built:        Thu Sep  8 03:58:19 2022
OS/Arch:      linux/amd64

Output of podman info:

host:
  arch: amd64
  buildahVersion: 1.27.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.4-3.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.4, commit: '
  cpuUtilization:
    idlePercent: 60.53
    systemPercent: 23.27
    userPercent: 16.2
  cpus: 4
  distribution:
    distribution: fedora
    variant: silverblue
    version: "36"
  eventLogger: journald
  hostname: fedora
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.0.5-200.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 224858112
  memTotal: 16705081344
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.6-2.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.6
      commit: 18cf2efbb8feb2b2f20e316520e0fd0b6c41ef4d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 28844679168
  swapTotal: 34359734272
  uptime: 42h 55m 22.00s (Approximately 1.75 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/user/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 1
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/user/.local/share/containers/storage
  graphRootAllocated: 510389125120
  graphRootUsed: 201430126592
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 104
  runRoot: /run/user/1000/containers
  volumePath: /var/home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.2.1
  Built: 1662580699
  BuiltTime: Thu Sep  8 03:58:19 2022
  GitCommit: ""
  GoVersion: go1.18.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.2.1

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

podman-4.2.1-2.fc36.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

  • Latest: No
  • Troubleshooting: Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

Fedora Silverblue 36

@mheon
Member

mheon commented Nov 15, 2022

It's more accurate to say that we can't make them not terminate when the container exits. The kernel enforces the rule that any PID namespace will kill every process in the namespace if PID 1 in the namespace dies; Podman will take down PID 1, guaranteeing that the kernel will unwind the rest of the namespace. For containers without a PID namespace, it's a bit trickier, but we do have an accurate list of processes in the container, which we then individually kill as part of stopping the container. In short, I strongly doubt your Podman reproducer actually does what you think it does; the kernel simply won't allow that to happen.

@ghost
Author

ghost commented Nov 15, 2022

Sorry if I didn't explain it well enough. I'm not talking about processes lingering after the container is stopped; that has never been the case, just as you said.

When toolbox / distrobox executes a command inside a container, the container is NOT stopped after the command finishes.

And the parent I'm talking about isn't PID 1 of the container, but the podman exec process in the terminal emulator. I guess a better term should be used here, since structurally the podman exec process isn't a direct parent of the container process.

This is a screenshot which represents the issue better:

[image: screenshot of the process tree]

If I try to close the terminal emulator, it'll prompt the following:

[image: the terminal emulator's close confirmation prompt]

If I press "Close Terminal", sh, toolbox and podman (which runs the exec command) will be terminated because they're child processes of the VTE session.

However, notice that conmon and its child process sleep aren't descendants of gnome-terminal-server. When sh is terminated, podman (the exec command) is terminated too, but the corresponding conmon process is left intact. As a result, sleep 30 isn't terminated.

And sleep 30 is only used for demonstration. In reality one could run something resource intensive, and then close the terminal emulator not knowing it is still lingering in the background.

This is probably only an issue for the pet container use case. toolbox / distrobox tend to start a trap program inside the container to keep it running. Anything interactive is executed through podman exec, hence this issue.

The feature request, to be precise, is to add an optional flag that makes the conmon process terminate when the corresponding podman process dies.

@Luap99
Member

Luap99 commented Nov 15, 2022

@mheon I think the request is basically to not double fork conmon and not let it create a new process group, so that it stays attached to the podman parent process.
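
The effect of a double fork can be sketched in plain shell. This only illustrates the general mechanism, not conmon's actual code; detached_ppids is a hypothetical helper name and the sketch assumes a Linux /proc filesystem:

```shell
# Sketch only: print the parent PID of a process twice, once while the
# intermediate process that started it is still alive and once after that
# intermediate has exited.
detached_ppids() {
    (
        sh -c '
            echo "$PPID"                    # parent at start: the intermediate subshell
            sleep 2                         # outlive the intermediate
            cut -d" " -f4 "/proc/$$/stat"   # parent now: init or a subreaper
        ' &
        sleep 1                             # keep the intermediate alive briefly, then exit
    )
}
```

The two printed PIDs differ: once the intermediate is gone, the process is no longer a descendant of whatever launched it, which matches the way conmon ends up outside the terminal's process tree in the screenshot described earlier.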

@ghost
Author

ghost commented Nov 15, 2022

Yes. This would work as well.

@mheon
Member

mheon commented Nov 15, 2022

@Luap99 Don't know if that works. Conmon dying is only going to take out the first PID the exec session started; anything else it did, probably just reparents on top of PID 1 in the container. So we can definitely kill a single-process exec session, but a podman exec -ti $ctr bash like Toolbox does, we only get bash, not anything bash was doing (unless the shell automatically kills its children on exit, not something we can guarantee for every program).

We don't really have a robust way of tracking what processes were spawned from an exec session right now. We'd basically have to walk the process tree in the container, which seems potentially racy. On CGv2, a child cgroup might be a solution? Just need to make sure it doesn't interfere with the container itself being stopped...

@rhatdan
Member

rhatdan commented Nov 15, 2022

I believe it walks the cgroup and kills all of the pids within the cgroup, or at least I remember this is what we wrote many years ago.

@debarshiray
Member

I wonder if Toolbx could detect this scenario and explicitly terminate the process that it had launched inside the container.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Dec 18, 2022

This seems like more of an issue for toolbx rather than podman.

rhatdan transferred this issue from containers/podman Dec 18, 2022

@debarshiray
Member

This seems like more of an issue for toolbx rather than podman.

Umm... it's not really clear to me what Toolbx could do here. Is there a recommended way to get to the process ID of the conmon process?

@debarshiray
Member

@Luap99 Don't know if that works. Conmon dying is only going to take out the first PID the exec session started;

I think it's good enough if conmon died and took out the first PID that the exec session started, because ...

anything else it did, probably just reparents on top of PID 1 in the container. So we can definitely kill a single-process exec session, but a podman exec -ti $ctr bash like Toolbox does, we only get bash, not anything bash was doing (unless the shell automatically kills its children on exit, not something we can guarantee for every program).

... if this was a shell directly running on the host without involving any containers, the expectation is that closing the terminal emulator takes out the shell and anything that's willing to die with it. If someone started a process in the background (say, sleep +Inf &), then it's OK if it keeps running in the background.

@debarshiray
Member

@giuseppe ameliorated one problematic outcome of this - the processes inside the exec sessions blocking shutdown. See containers/podman#17025

However, it's still worth trying to ensure that the processes inside the exec session go away as soon as the terminal emulator is closed, just as happens when one is working directly on the host.

I have to say that I am a bit puzzled that the processes are outliving their controlling terminal. I know there's an inner nested terminal device for the container, but isn't it supposed to go away with the outer terminal?

@debarshiray
Member

This seems like more of an issue for toolbx rather than podman.

Umm... it's not really clear to me what Toolbx could do here. Is there a recommended way to get to the process ID of the conmon process?

@mheon @rhatdan @Luap99 @giuseppe Could one of you please help answer this question?

We are brainstorming various options at #1207 but it's not clear if it's possible for the podman exec caller to get the process ID of conmon(8) or the process inside the container.

Also, it's not clear to me why podman exec --interactive --tty should not terminate the foreground container process with it. Especially when podman exec -it is getting terminated by a SIGHUP from its controlling terminal.

@giuseppe
Member

giuseppe commented Oct 4, 2023

I wonder if it will be easier for you to just use the OCI runtime to do the exec.

E.g. if you do crun exec you circumvent podman and conmon. I am fine with adding something like --die-with-parent to crun, similar to what bwrap does.

Can you please play with it and see if "crun exec" does all you need?

Adding it to Podman/conmon will be much more complicated, since we would need to change the way conmon works so that it does not perform a double fork.

@Luap99
Member

Luap99 commented Oct 4, 2023

That said, podman run forwards all signals (well, the ones that can be caught) into the container, so maybe podman exec should do that too.

ref #1400
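
The signal forwarding pattern mentioned above can be sketched as a small wrapper. This is a generic illustration, not Podman's implementation; run_with_sig_proxy is a hypothetical name, and in the podman exec case the relay would have to reach the process inside the container rather than a direct child:

```shell
# Sketch only: start a command and relay SIGHUP and SIGTERM to it, so the
# command goes away when the wrapper is told to.
run_with_sig_proxy() {
    "$@" &
    child=$!
    for sig in HUP TERM; do
        # Expanded now, so each trap carries its own signal name and PID.
        trap "kill -s $sig $child 2>/dev/null" "$sig"
    done
    status=0
    wait "$child" || status=$?
    # A trapped signal makes wait return early; keep waiting until the
    # child has actually exited so its real status is reported.
    while kill -0 "$child" 2>/dev/null; do
        status=0
        wait "$child" || status=$?
    done
    trap - HUP TERM
    return "$status"
}
```

With this in place, a SIGHUP delivered to the wrapper when the terminal closes is passed on to the command instead of leaving it behind. SIGKILL cannot be caught, so it cannot be relayed this way.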
