Add --wait option to wait for acquiring a transaction lock instead of failing immediately #2186

Open
realsobek opened this issue Apr 8, 2025 · 7 comments
Labels
Priority: LOW RFE Request For Enhancement (as opposed to a bug) Triaged Someone on the DNF 5 team has read the issue and determined the next steps to take

Comments

@realsobek
Contributor

dnf5 does not seem to wait and retry to gain the transaction lock. I could not find an answer, or a hint at changed behaviour ("Changes between DNF and DNF5"), at https://dnf5.readthedocs.io/en/latest/, hence this report.

Is there an option to make dnf wait&retry?
If not, could dnf5 be reprogrammed to wait&retry like dnf-3 before?

dnf's behaviour when trying to gain the transaction lock has changed compared to dnf on F40. Since Fedora 41, dnf5 is the default dnf. On Fedora 42 beta with:

$ rpm -q dnf5 akmods
dnf5-5.2.12.0-2.fc42.x86_64
akmods-0.6.0-9.fc42.noarch

you can reproduce it easily with:

  • open 2 terminal sessions on the same computer
  • in both terminals run the same long-running command at the same time; works with install and remove:
    dnf remove -y libreoffice*, dnf install -y libreoffice*

Another use-case (may be harder to reproduce, because the timing depends on akmod->kmod compilation and installation):

  • open 1 terminal session on the computer; a 2nd one is recommended to watch for cc1 with top
  • dnf remove -y kmod-v4l2loopback-6.14.0-63.fc42.x86_64-0.14.0-1.fc42.x86_64 kmod-intel-ipu6-6.14.0-63.fc42.x86_64-0.0-20.20250115git13c466e.fc42.x86_64
  • for P in $(dnf repoquery gstreamer* --qf "%{NAME}.%{ARCH} %{repoid}\n" | grep -vE -- '-devel|i686' | awk '{print $1}'); do echo _ $P; dnf remove -yq $P; done
  • reboot # to purge built kmod RPM files
  • dnf install -y gstreamer*
  • dnf install -y libreoffice*

On F42 dnf only tries once before failing:

# dnf [install|remove] libreoffice*
...
Transaction Summary:
 Removing:          X packages
 
After this operation, X MiB will be freed (install X B, remove X MiB).
Running transaction
Transaction failed: Failed to obtain rpm transaction lock. Another transaction is in progress.

I first noticed the "dnf5 does not wait&retry" behaviour with dnf install -y gstreamer*, which pulls in akmod-intel-ipu6 and akmod-v4l2loopback. Their scriptlets run akmods, which builds kmod-*.rpm files and then tries to install them. That installation might or might not succeed, and while it runs it can prevent the installation of other packages during an automated, script-based system installation. Other akmod packages use similar code:

$ rpm -q --scripts akmod-v4l2loopback
postinstall scriptlet (using /bin/sh):
[ -x /usr/sbin/akmods-ostree-post ] && /usr/sbin/akmods-ostree-post v4l2loopback /usr/src/akmods/v4l2loopback-kmod-0.14.0-1.fc42.src.rpm
posttrans scriptlet (using /bin/sh):
nohup /usr/sbin/akmods --from-akmod-posttrans --akmod v4l2loopback &> /dev/null &

akmods creates and installs: kmod-v4l2loopback-6.14.0-63.fc42.x86_64-0.14.0-1.fc42.x86_64 kmod-intel-ipu6-6.14.0-63.fc42.x86_64-0.0-20.2025011.x86_64
These packages get installed automatically with dnf -y install --nogpgcheck --disablerepo=* /tmp/akmods.JioRK8g4/results/kmod-v4l2loopback-6.14.0-63.fc42.x86_64-0.14.0-1.fc42

More information in:
$ sudo journalctl -u akmods.service
$ sudo cat /var/cache/akmods/v4l2loopback/0.14.0-1-for-6.14.0-63.fc42.x86_64.failed.log

Brainstorming: A failed kmod RPM installation can be fixed without restarting the system by running systemctl restart akmods. It blocks the prompt until compilation and installation are done. Maybe akmods or akmods --force does the same.

On F40, dnf waits for the other process to finish (the same applies to kmod package installation, not shown here). I show only the "waiting" side:

# dnf remove -y libreoffice*
...
Running transaction check
Waiting for process with pid 1958 to finish.
Error: An rpm exception occurred: package not installed

# dnf install -y libreoffice*
Waiting for process with pid 2021 to finish. # following lines appear after other terminal finished dependency check
Last metadata expiration check: 0:00:02 ago on Tue 08 Apr 2025 12:41:56 PM CEST.
Dependencies resolved.
...
Transaction Summary
======================================================================
Install  1396 Packages
Upgrade    11 Packages

Total download size: 1.1 G
Downloading Packages:
Waiting for process with pid 2021 to finish.
[SKIPPED] biber-2.19-5.fc40.noarch.rpm: Already downloaded # this line and the following line appear while installation is performed in other terminal
...
[SKIPPED] qt6-qtwayland-6.8.2-1.fc40.x86_64.rpm: Already downloaded
Running transaction check
Waiting for process with pid 2021 to finish. # following lines appear after installation is done in other terminal
[Errno 2] No such file or directory: '/var/cache/dnf/fedora-80710cde32d1ec51/packages/texlive-transparent-svn64852-71.fc40.noarch.rpm'
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
@ppisar
Contributor

ppisar commented Apr 8, 2025

While DNF5 could gain this feature, I don't think it would be good default behavior:

My reason is that an already running RPM transaction can affect the waiting transaction: either the new state will conflict with the latter transaction (e.g. the first transaction installs a package that conflicts with a package from the second transaction), or, worse, it will result in a different package set (e.g. the first transaction installs a package that satisfies a rich dependency from the latter transaction).

I understand that the feature can be handy for batch-installing large package sets where the user is confident that the batches are commutative.

Thus, I think that DNF5 should not wait on concurrent transactions by default. But there could be a command-line option, or a configuration option, to resort to waiting on the lock, preferably parametrized with a timeout value.

Examples:

  • Non-blocking operation like now:
$ dnf5 install FOO
  • Wait until the lock is acquired:
$ dnf5 install --wait FOO
  • Wait at most 5 seconds for the lock:
$ dnf5 install --wait=5 FOO

I'm not sure how the waiting was implemented in DNF4. I guess it was a loop with try-an-RPM-transaction and sleep.
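
A minimal sketch of such a try-and-sleep loop, written as a shell wrapper around an arbitrary command. This is only an illustration of the idea, not how DNF4 or DNF5 implement anything; the helper name retry_until and its arguments are hypothetical.

```shell
#!/bin/sh
# retry_until TIMEOUT INTERVAL CMD...: rerun CMD until it succeeds or until
# roughly TIMEOUT seconds have been spent sleeping between attempts.
# Hypothetical helper, not part of DNF. Variables are global (POSIX sh has no "local").
retry_until() {
    timeout=$1 interval=$2
    shift 2
    waited=0
    while :; do
        "$@" && return 0                  # success: stop retrying
        status=$?                         # exit status of the failed attempt
        if [ "$waited" -ge "$timeout" ]; then
            return "$status"              # give up, report the last failure
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Possible use (assumption: any failure is worth retrying):
#   retry_until 60 5 dnf5 install -y FOO
```

Note that this retries on every failure, not only on a busy lock, and that INTERVAL should be positive in real use so the loop cannot spin.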

Would a feature like that be sufficient for you?

@ppisar ppisar added RFE Request For Enhancement (as opposed to a bug) Priority: LOW labels Apr 8, 2025
@realsobek
Contributor Author

Would your proposal solve the akmod-kmod part? If "wait&retry" is not the default behaviour, then I guess it would not.
I cannot remember ever intentionally running 2 dnf install/remove commands at the same time in production. Only ever one after the other, i.e. let us skip the "wait&retry to gain transaction lock" part and focus on the "akmod-induced kmod RPM building/installation can make following RPM action fail" part.

After running dnf install -y gstreamer* (installs akmod-intel-ipu6 and akmod-v4l2loopback) the prompt is returned. I can confirm running akmods blocks prompt till both kmod RPMs are built and installed.
Building kmod RPMs takes different amounts of time and can even fail. I would leave handling up to akmods. Would it be possible to teach dnf5 the following?

if package installation transaction is done and any of the newly installed package names match pattern "^akmod-.*", then run akmods

That should block the prompt from returning till kmod RPMs are built and installed (or other conclusion). Hence no more problems with following package actions.
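
The name check in that rule could be sketched in shell as follows. How the list of newly installed package names is obtained is left open here, and the helper name any_akmod is hypothetical; only the "^akmod-" pattern and the akmods command come from the text above.

```shell
#!/bin/sh
# any_akmod: read package names on stdin (one per line) and succeed if at
# least one of them starts with "akmod-". Hypothetical helper.
any_akmod() {
    grep -q '^akmod-'
}

# Possible use, assuming $new_packages holds one package name per line:
#   if printf '%s\n' "$new_packages" | any_akmod; then
#       akmods    # blocks until the kmod RPMs are built and installed
#   fi
```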

@ppisar
Contributor

ppisar commented Apr 9, 2025

> Would your proposal solve the akmod-kmod part? If "wait&retry" is not the default behaviour, then I guess it would not.

> These packages get installed automatically with dnf -y install --nogpgcheck --disablerepo=* /tmp/akmods.JioRK8g4/results/kmod-v4l2loopback-6.14.0-63.fc42.x86_64-0.14.0-1.fc42

Why not? Add --wait option to this automatic command.

> if package installation transaction is done and any of the newly installed package names match pattern "^akmod-.*", then run akmods.

Sure, you can write a DNF plugin which will do it. But I cannot see how it would help you besides not exiting DNF until akmod finishes.

The transaction lock you observe is an RPM lock, not a DNF lock. Once the RPM transaction finishes, and while the DNF plugin is executing akmod, a concurrent DNF instance would be able to run and perform another RPM transaction in parallel. Of course you could add the locking on DNF level to your DNF plugin.
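
For illustration only, here is a different and simpler technique than a DNF plugin: cooperative locking at script level with flock(1) from util-linux. It only serializes callers that agree to use the same lock file and does not interact with RPM's own lock. The helper name with_lock and the lock path in the usage comment are assumptions.

```shell
#!/bin/sh
# with_lock LOCKFILE TIMEOUT CMD...: run CMD while holding an exclusive lock
# on LOCKFILE, waiting at most TIMEOUT seconds to acquire it.
# Hypothetical wrapper around flock(1); it fails if the lock is not
# obtained in time.
with_lock() {
    lockfile=$1 timeout=$2
    shift 2
    flock -w "$timeout" "$lockfile" "$@"
}

# Possible use (the lock path is made up for this example):
#   with_lock /run/lock/akmods-dnf.lock 300 dnf5 install -y FOO
```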

Frankly, the whole akmod orchestration is a collection of antipatterns. If akmod needs to do something asynchronously, then it is its job to handle the synchronization.

@realsobek
Contributor Author

Ahhhh! I am sorry for my confusion. Now I see my mistake (apply new option to gstreamer* or akmods installation, not kmod installation). Now I understand your proposal:

  1. add option to dnf5 to be able to wait until the lock is acquired
  2. once dnf5 has the new option, modify the akmods script to use it (request via bug report at https://bugz.fedoraproject.org/akmods)

To answer your question:

> Would a feature like that be sufficient for you?

yes

@ppisar ppisar changed the title multiple retries to gain transaction lock Add --wait option to wait for acquiring a transaction lock instead of failing immediatelly Apr 9, 2025
@ppisar ppisar added the Triaged Someone on the DNF 5 team has read the issue and determined the next steps to take label Apr 9, 2025
@realsobek realsobek changed the title Add --wait option to wait for acquiring a transaction lock instead of failing immediatelly Add --wait option to wait for acquiring a transaction lock instead of failing immediately Apr 9, 2025
@realsobek
Contributor Author

RFE for akmods to use to-be-implemented --wait https://bugzilla.redhat.com/show_bug.cgi?id=2358625
Thank you for your help. Take your time.

@kwizart

kwizart commented Apr 11, 2025

Thanks for raising the issue. (akmods maintainer speaking).

Somehow I never managed to reproduce on my setups on f41+ (dnf5) for now.

Using --wait is not easy as the argument doesn't exist yet. So I will have to parse the dnf exact (future) version to know if I can use this term...

Also, maybe I can deal with the "new behavior" if a proper error code can help me to identify the situation.
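
Until a dedicated exit code exists, one fragile workaround would be to match the error text quoted in the report above. A sketch, assuming the message is not translated and keeps its current wording; the helper name is_lock_failure is hypothetical.

```shell
#!/bin/sh
# is_lock_failure: read captured command output on stdin and succeed if it
# contains the lock error text quoted earlier in this issue.
# Assumption: the wording stays the same and is not localized.
is_lock_failure() {
    grep -qF 'Failed to obtain rpm transaction lock'
}

# Possible use:
#   out=$(dnf5 install -y FOO 2>&1) || {
#       if printf '%s\n' "$out" | is_lock_failure; then
#           echo "transaction lock is busy, retry later" >&2
#       fi
#   }
```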

For now I would advocate to revert to the previous behavior (aka blocking for the transaction lock to be acquired), because this is less surprising. Eventually introduce --no-block (same option as systemd-run) to change the default-locking mechanism

I don't know which "locks" are available in dnf5, but I think I used to be able to do dnf searches while installing packages on another process.

@ppisar
Contributor

ppisar commented Apr 11, 2025

> Thanks for raising the issue. (akmods maintainer speaking).
>
> Somehow I never managed to reproduce on my setups on f41+ (dnf5) for now.
>
> Using --wait is not easy as the argument doesn't exist yet. So I will have to parse the dnf exact (future) version to know if I can use this term...

"dnf install --help" output will enumerate the new option once it becomes supported.

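A sketch of that feature detection, scanning help output for the option instead of parsing version numbers. The --wait option is only proposed in this issue and does not exist yet; the helper name has_option is hypothetical.

```shell
#!/bin/sh
# has_option OPTION CMD...: run CMD (typically a "--help" invocation) and
# succeed if its output mentions OPTION as a whole word. Hypothetical helper.
has_option() {
    opt=$1
    shift
    "$@" 2>/dev/null | grep -qw -e "$opt"
}

# Possible use, once (and if) the option is implemented:
#   if has_option --wait dnf5 install --help; then
#       dnf5 install --wait -y FOO
#   else
#       dnf5 install -y FOO
#   fi
```
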
> Also, maybe I can deal with the "new behavior" if a proper error code can help me to identify the situation.

Interesting idea. DNF4 documents an exit code 200 for that purpose. DNF5 returns 1. We should change the exit code of DNF5. However, I don't think any number > 127 is usable, because codes above that limit are used for termination by a signal. If that's so, then DNF5 should not emit 200. At the same time, we cannot change the exit code of DNF4, for backward compatibility.
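
To illustrate the ambiguity: shells such as bash and dash report a command killed by signal N as status 128+N, so a caller looking only at $? cannot tell a plain exit code of 200 from a signal death. A small sketch with a hypothetical helper classify_status:

```shell
#!/bin/sh
# classify_status STATUS: print how a caller following the common 128+N
# convention would interpret an exit status. Hypothetical helper.
classify_status() {
    if [ "$1" -gt 128 ]; then
        echo "possibly killed by signal $(($1 - 128))"
    else
        echo "exited with code $1"
    fi
}

# Possible use:
#   dnf5 install -y FOO
#   classify_status "$?"
```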

> For now I would advocate to revert to the previous behavior (aka blocking for the transaction lock to be acquired), because this is less surprising. Eventually introduce --no-block (same option as systemd-run) to change the default-locking mechanism

There is nothing to revert. DNF5 simply never implemented that blocking behavior.

> I don't know which "locks" are available in dnf5, but I think I used to be able to do dnf searches while installing packages on another process.

Yes. DNF5 does not lock anything, except that when updating a repository cache it locks write access to it, and RPM locks its database when installing packages. That's why with DNF5 you can perform multiple read-only operations in parallel.
