Skip to content

Add Sys.is_directory, Sys.remove and Sys.rename #304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 29 commits into
base: main
Choose a base branch
from
Open

Conversation

jmid
Copy link
Collaborator

@jmid jmid commented Mar 6, 2023

This PR adds Sys.is_directory, Sys.remove and Sys.rename to the current model-based STM tests.

While doing so, I realized I could remove some of the duplicated code by pulling out special cases of insert and remove functions, and modeling things with generic insert/remove abstractions instead.

While writing this I discovered

  • that Sys.rename is not documented to work on anything but files
  • the Windows behaviour of Sys.rename differs in two ways:
    • On Windows it is OK to Sys.rename existing dir to existing file
    • On Windows it is not OK to Sys.rename existing dir to existing dir ("permission denied")

The these reasons the CI tests currently fail.

I've opened a PR and an issue on the compiler for the above.

@jmid jmid force-pushed the add-sys-is-dir branch from 867b9f8 to 7f10339 Compare May 8, 2023 15:30
@jmid
Copy link
Collaborator Author

jmid commented May 8, 2023

Rebased on main to trigger a fresh run now that ocaml/ocaml#12184 has been merged.
As a result, I expect (read: hope) that the trunk tests will now pass... 🤞

@jmid jmid force-pushed the add-sys-is-dir branch from 7f10339 to d645f6f Compare May 24, 2023 13:36
@jmid
Copy link
Collaborator Author

jmid commented May 24, 2023

I've spent some time on this PR today:

  • I refactored the code into a Model module with relevant operations
  • I restructured the error cases in postcond to be more uniform reusing ideas from @shym's Mkfile case
  • I added conditions for the various Windows corner cases that differ
  • Finally I removed 3 spurious "Permission denied" matches that weren't used!

I've observed that the Linux parallel test may now sometimes fail within 200 attempts.
I'll wait and see if it needs to be turned back into a negative test again.
I've checked manually that the resulting test runs on 5.0 under Linux, MingW, and macOS.

Some of the error cases could still benefit from being rewritten with suitable function abstractions.
The overhaul nevertheless represents a good footing for adding symlinks and permissions... 🤓

@jmid
Copy link
Collaborator Author

jmid commented Jun 7, 2023

Rebased on main as I wanted to get an idea of whether this helps on #359.
I expect the Windows runs to still fail.

Also (as a note to my self) the 'focused runners' are GitHub actions only - so this will trigger a full testsuite run of Multicoretests-CI despite be5d514.

@jmid
Copy link
Collaborator Author

jmid commented Jul 14, 2023

Rebased yesterday to trigger these on the patched 5.1.0~beta1 release.
There is clearly some way to go 😬
While deciphering errors, I've identified that this has a side-effect of removing an existing empty target dir during a failing Sys.rename:

# Sys.mkdir "sandbox\\eee" 0o755;;
- : unit = ()
# Sys.rename "sandbox" "sandbox\\eee";;
Exception: Sys_error "Invalid argument".
# Sys.file_exists "sandbox\\eee";;
- : bool = false

@jmid
Copy link
Collaborator Author

jmid commented Apr 19, 2024

I did a fresh rebase of this PR.
As part of it, I removed the pattern-matching on specific error messages, as I found this was digressing into reverse-engineering of Linux/macOS/Windows-specific error message conditions, rather than focusing on a specification.
After all the Sys specification says

Every function in this module raises Sys_error with an informative message when the underlying system call signal an error.

The above is considered a clean-up, paving the way for further extensions.
It does not solve the Windows issue mentioned above, so the CI should still fail on Windows...

@jmid
Copy link
Collaborator Author

jmid commented Apr 22, 2024

I pushed a clean-up commit d2a1345 which

  • accepts the Windows rename regression
  • quantifies all "accept workarounds" with OCaml versions

CI summary

  • 32bit trunk found an unexpected counterexample in STM Sys test parallel in 1/10 runs
  • linux-s390x-5.2 failed while setting up with gnutls_handshake() failed: The TLS connection was non-properly terminated
  • freebsd-amd64-5.2 failed with Internal error

Out of these, 1 is a false alarm and 2 are CI infrastructure errors.

@jmid
Copy link
Collaborator Author

jmid commented May 17, 2024

Rebased on main with conflicts resolved

@jmid
Copy link
Collaborator Author

jmid commented May 21, 2024

CI summary for f8dc3b9 - this is still running focused Sys STM tests:

  • 32bit 5.2 - 0/10 attempts triggered a parallel counterexample
  • 32bit trunk - 4/10 attempts triggered a parallel counterexample
  • Bytecode 5.2 - 3/10 attempts triggered a parallel counterexample
  • Bytecode trunk - 2/10 attempts triggered a parallel counterexample
  • FP 5.2 - 1/10 attempts triggered a parallel counterexample
  • FP trunk - 5/10 attempts triggered a parallel counterexample
  • Linux 5.2 - 4/10 attempts triggered a parallel counterexample
  • Linux 5.2 debug - 2/10 attempts triggered a parallel counterexample
  • Linux trunk - 3/10 attempts triggered a parallel counterexample
  • Linux trunk debug - 2/10 attempts triggered a parallel counterexample - and one assertion failure [ocaml5-issue] Assertion failure s->running during parallel STM or Lin tests #447
  • MSVC bytecode trunk - 10/10 sequential tests failed as the Sys.rename fix has been merged to trunk
  • MSVC trunk - 10/10 sequential tests failed as the Sys.rename fix has been merged to trunk
  • MinGW bytecode trunk - 10/10 sequential tests failed as the Sys.rename fix has been merged to trunk
  • MinGW trunk - 10/10 sequential tests failed as the Sys.rename fix has been merged to trunk

It is clearly harder to trigger counterexamples under Linux.
For the latter 4 failures, the model should be adjusted to account for the patch merge.

@jmid jmid force-pushed the add-sys-is-dir branch from f8dc3b9 to 6782a89 Compare May 23, 2025 09:52
@jmid
Copy link
Collaborator Author

jmid commented May 23, 2025

I've removed the attempt to test for sequential consistency, as we end up testing properties of the underlying file system, which may vary widely across Linux and Windows and between CI and locally. I'm keeping a parallel stress-test though, as we would like to ensure that no crashes, infinite loops, or assertion errors are triggered by parallel usage.

In addition I've reworked the generator based on statistics (stats code removed again on this branch), as they revealed relatively few "happy path" tests, e.g., to rename an existing file. This lead me to increase the chance of generating random files and directories both generally and at the beginning of a test run, to have a better basis, for operations such as rename to choose from.

Finally, I've

  • further cleaned up the workarounds for past bugs on OCaml 5.0-5.2
  • added a focused test that repeats the Sys test 5 times
  • rebased the code on main

While playing and testing this on 5.4 workflows I saw a timeout after 3h on macOS Intel 5.4 in the middle of the 4th run:
https://github.com/ocaml-multicore/multicoretests/actions/runs/15164152332/job/42637405898

Starting 4-th run

random seed: 423573053
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential (generating)
[✓] 1000    0    0 1000 / 1000     9.8s STM Sys test sequential

[ ]    0    0    0    0 / 1000     0.0s STM Sys stress test parallel
[ ]  139    0    0  139 / 1000    50.5s STM Sys stress test parallel
[ ]  271    0    0  271 / 1000   110.7s STM Sys stress test parallel
[ ]  398    0    0  398 / 1000   171.3s STM Sys stress test parallel
[ ]  564    0    0  564 / 1000   231.3s STM Sys stress test parallel
Error: The operation was canceled.

I'm curious whether this will show up elsewhere, e.g., on 5.3 and 5.5 too... 🤔

@jmid
Copy link
Collaborator Author

jmid commented May 23, 2025

Oh dear, I've gotten used to triggering CI runs on branches without PRs, and hence included the usual workaround in commit 6782a89, resulting in twice as many CI workflow runs as intended... 😬

@jmid
Copy link
Collaborator Author

jmid commented May 26, 2025

CI summary for 6782a89

  • Linux-ARM64 5.3 failed and was strangely cancelled after 54m in run 5 of STM Sys stress test parallel
  • macOS-ARM64 5.4 failed and was strangely cancelled after 164m in run 2 of STM Sys stress test parallel
  • macOS-intel 5.3 was cancelled after 185m in run 3 of STM Sys stress test parallel
  • macOS-intel 5.3 was cancelled after 185m in run 4 of STM Sys stress test parallel
  • macOS-intel 5.4 was cancelled after 185m in run 1 of STM Sys stress test parallel
  • macOS-intel trunk was cancelled after 185m in run 2 of STM Sys stress test parallel
  • macOS-intel trunk was cancelled after 185m in run 2 of STM Sys stress test parallel

All failures were due to #557

Out of 85 workflows, 7 failed or timeout with what appears to be a genuine issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant