Skip to content

Mask values in h5dump output files #5473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 6, 2025

Conversation

mattjala
Copy link
Contributor

Some values in h5dump output are potentially inconsistent even between files created using the exact same set of instructions with the Native connector. Specifically, these values are the offset of objects within the file and the index assigned to anonymous datatypes by h5dump.

  • Replace the displayed values of these fields with 'XXX' in saved reference files

  • Update the CMake testing code to perform this masking on output files before comparing them to the saved reference files

This is done through the TEST_FILTER argument to runTest/grepTest. Because some tests already provide one filter to apply, adding the masking capability required changing runTest and grepTest to support applying multiple filters from a provided semicolon-separated list of filters.

  • Add a command line script to convert specific output files to the masked form (h5dump_mask_output.sh)

This was used to convert existing test files, and will be useful if more test files are added, though it is not currently used during testing runs.

@mattjala mattjala added Priority - 2. Medium It would be nice to have this in the next release Component - Testing Code in test or testpar directories, GitHub workflows Type - Improvement labels Apr 22, 2025
@mattjala mattjala self-assigned this Apr 22, 2025
Copy link
Contributor

@byrnHDF byrnHDF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the capability to have multiple filters is a good idea. However masks should use the TEST_MASK argument.

The runTest.cmake file probably should pull the TEST_MASK_STORE and TEST_MASK_MOD options out of runTest.cmake to the calling tools and make it consistent.

TEST_MASK_ERROR is a different issue and probably should be dealt with in a different issue.

@mattjala
Copy link
Contributor Author

Adding the capability to have multiple filters is a good idea. However masks should use the TEST_MASK argument.

The runTest.cmake file probably should pull the TEST_MASK_STORE and TEST_MASK_MOD options out of runTest.cmake to the calling tools and make it consistent.

Right now, TEST_MASK just removes the expression entirely. If we modified it to perform a replace operation in order to do the masking, then it'd be an exact copy of TEST_FILTER/TEST_FILTER_APPEND.

Masking should probably use a generic find/replace argument to runTest - there's no reason for runTest to care whether the specific operation is 'masking'.

Regardless, changes to runTest seem out of scope for this. TEST_FILTER is already used for some masks (see ADD_H5_COMP_TEST), so I think reworking runTest should be considered separately.

@byrnHDF
Copy link
Contributor

byrnHDF commented Apr 28, 2025

Regardless, changes to runTest seem out of scope for this. TEST_FILTER is already used for some masks (see ADD_H5_COMP_TEST), so I think reworking runTest should be considered separately.

Agree the larger changes is a separate issue - the whether to use in the CMake add_test or incorporate in the runTests should be decided on how much it is to be used - if every test in the use case needs to do it - put it in the runTest - if only a special couple tests need to use the mask, put it in the add_test command.

@mattjala mattjala requested a review from byrnHDF April 30, 2025 21:04
@mattjala
Copy link
Contributor Author

mattjala commented Apr 30, 2025

  • Output files and reference files are now temporarily masked at test time, with the copies on disk still having their real values

  • Factored out the masking work into a macro

  • In order to avoid running into secondary behavior for TEST_FILTER, I moved the masking behavior to the (formerly unused) TEST_MASK and TEST_MASK_REPLACE arguments


# Apply mask replacements to FILENAME and write the results to FILENAME_masked
# Uses TEST_MASKS and TEST_MASKS_REPLACE from its execution environment
macro(H5_MASK_FILE FILENAME)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro uses variables from its environment that aren't specified as arguments. This is not ideal, but trying to directly pass in the list of mask replace expressions as a parameter proved difficult due to 1. It collapsing from a list to a string, and 2. Loss of special character escaping in a way that created a string CMake can't directly evaluate at all, preventing a simple \ -> \\ replacement to re-escape it.

@lrknox lrknox merged commit bafca72 into HDFGroup:develop May 6, 2025
60 checks passed
@mattjala mattjala deleted the mask_h5dump_tests branch May 29, 2025 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component - Testing Code in test or testpar directories, GitHub workflows Priority - 2. Medium It would be nice to have this in the next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants