re-enable `sort_query_fuzzer_runner` #16491

adriangb · 2025-06-20T20:25:46Z

Re-enable the test and verify that CI fails (this might take a couple of runs).
Revert "TopK dynamic filter pushdown attempt 2" #15770 (the suspected cause).
Verify that CI no longer fails after multiple runs.

alamb · 2025-06-21T11:42:54Z

Context for anyone interested: #16452 (comment)

This reverts commit 6e83cf4.

adriangb · 2025-06-21T13:27:35Z

datafusion/common/Cargo.toml

@@ -55,6 +55,7 @@ apache-avro = { version = "0.17", default-features = false, features = [
 arrow = { workspace = true }
 arrow-ipc = { workspace = true }
 base64 = "0.22.1"
+chrono = { workspace = true }


This is temporary until the upstream bug is fixed in arrow. chrono is also already in the dependency tree because arrow uses it.

adriangb · 2025-06-21T13:29:35Z

I think with these fixes to Display<ScalarValue> the tests will pass consistently.

I used this script to test:

#!/usr/bin/env python3

import argparse
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Event

def run_test(command, run_num, total_runs, stop_event):
    """Run a single test and return result"""
    if stop_event.is_set():
        return run_num, "SKIPPED", None
    
    try:
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"Run {run_num}/{total_runs}: {status}")
        return run_num, status, result
    except Exception as e:
        print(f"Run {run_num}/{total_runs}: ERROR - {e}")
        return run_num, "ERROR", None

def main():
    parser = argparse.ArgumentParser(description="Run a command multiple times and report failure rate")
    parser.add_argument("-P", "--parallel", type=int, default=1, help="Number of parallel jobs (default: 1)")
    parser.add_argument("-n", "--runs", type=int, default=100, help="Number of runs (default: 100)")
    parser.add_argument("-x", "--stop-on-failure", action="store_true", help="Stop at first failure")
    parser.add_argument("command", nargs=argparse.REMAINDER, help="Command to run")
    
    args = parser.parse_args()
    
    command = " ".join(args.command)
    print(f"Running command {args.runs} times with {args.parallel} parallel jobs...")
    print(f"Command: {command}")
    print("----------------------------------------")
    
    stop_event = Event()
    failures = 0
    completed_runs = 0
    failure_outputs = []
    
    with ThreadPoolExecutor(max_workers=args.parallel) as executor:
        # Submit all jobs
        futures = []
        for i in range(1, args.runs + 1):
            future = executor.submit(run_test, command, i, args.runs, stop_event)
            futures.append(future)
        
        # Process results as they complete
        for future in as_completed(futures):
            run_num, status, result = future.result()
            completed_runs += 1
            
            if status == "FAIL" or status == "ERROR":
                failures += 1
                if result and (result.stdout or result.stderr):
                    failure_outputs.append((run_num, result.stdout, result.stderr))
                if args.stop_on_failure:
                    print(f"Stopping at first failure (run {run_num})")
                    stop_event.set()
                    # Cancel remaining futures
                    for f in futures:
                        f.cancel()
                    break
    
    print("----------------------------------------")
    print("Results:")
    print(f"Total runs: {completed_runs}")
    print(f"Failures: {failures}")
    print(f"Passes: {completed_runs - failures}")
    if completed_runs > 0:
        failure_rate = (failures * 100) / completed_runs
        print(f"Failure rate: {failure_rate:.2f}%")
    else:
        print("Failure rate: 0%")
    
    # Print failure outputs
    if failure_outputs:
        print("\n" + "="*50)
        print("FAILURE OUTPUTS:")
        print("="*50)
        for run_num, stdout, stderr in failure_outputs:
            print(f"\n--- Run {run_num} ---")
            if stdout:
                print("STDOUT:")
                print(stdout)
            if stderr:
                print("STDERR:")
                print(stderr)

if __name__ == "__main__":
    main()

And was able to run with no errors:

./run-test.py -P 10 -n 600 -x cargo test --package datafusion --test fuzz -- fuzz_cases::sort_query_fuzz::sort_query_fuzzer_runner --exact --show-output

I'm now running it with 1200 iterations to confirm.
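As a rough guide to how much a clean batch of runs says about a flaky test, here is a minimal sketch. It assumes each run fails independently with the same probability, which may not hold for parallel runs sharing a machine. The function names are made up for illustration and are not part of the script above.

```python
def prob_all_pass(p: float, n: int) -> float:
    """Probability that n independent runs all pass when each run fails with probability p."""
    return (1.0 - p) ** n


def max_failure_rate(n: int, confidence: float = 0.95) -> float:
    """Largest per-run failure rate still consistent with n clean runs.

    Solves (1 - p)^n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    for n in (600, 1200):
        bound = max_failure_rate(n)
        print(f"{n} clean runs: per-run failure rate below {bound:.4%} at 95% confidence")
```

Under these assumptions, doubling the number of clean runs roughly halves the upper bound on the failure rate, which is why the longer run is a useful confirmation.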

adriangb · 2025-06-21T13:32:49Z

I understand why but I do find it kind of strange that Literal::new() calls Display on ScalarValue. I wonder if we could just make the Field name "lit"?
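To illustrate the concern, here is a small Python analogue, not DataFusion code. `ExplodingValue`, `literal_named_by_display`, and `literal_named_lit` are hypothetical names. The sketch only shows why deriving a field name from a value's display form ties construction to the formatting code, while a constant name does not.

```python
class ExplodingValue:
    """Stand-in for a value whose display formatting can fail (e.g. on overflow)."""

    def __str__(self) -> str:
        raise OverflowError("value out of range for display")


def literal_named_by_display(value) -> dict:
    # The field name comes from the value's display form,
    # so building the literal runs the formatting code.
    return {"field_name": str(value), "value": value}


def literal_named_lit(value) -> dict:
    # A constant field name never touches the formatting code.
    return {"field_name": "lit", "value": value}


if __name__ == "__main__":
    value = ExplodingValue()
    print(literal_named_lit(value)["field_name"])
    try:
        literal_named_by_display(value)
    except OverflowError as err:
        print(f"construction failed: {err}")
```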

adriangb · 2025-06-21T13:57:43Z

For context, the failures reported here and here are both related to the overflow error fixed in this PR. If anyone has seen a different error for sort_query_fuzzer_runner since #16465 was merged, please share it!

adriangb · 2025-06-21T13:58:56Z

@AdamGS @alamb @blaginin could you please review?

AdamGS · 2025-06-21T15:07:51Z

LGTM. IDK if there's a precedent for formatting the error case as an empty string elsewhere in DataFusion. It seems like format_option! uses "NULL" as a sort of display sentinel value; maybe values like this need their own placeholder?
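The placeholder question can be sketched in Python as follows. This is an analogue for discussion, not the Rust change in this PR. `format_date64` and its `placeholder` parameter are hypothetical, and the sketch assumes Date64 means milliseconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_date64(millis, placeholder=""):
    """Format milliseconds since the Unix epoch as an ISO date.

    None is shown as "NULL". Values that cannot be represented as a date
    are shown as `placeholder` instead of raising.
    """
    if millis is None:
        return "NULL"
    try:
        return (EPOCH + timedelta(milliseconds=millis)).date().isoformat()
    except OverflowError:
        # Out-of-range input: fall back to the placeholder rather than failing.
        return placeholder


if __name__ == "__main__":
    print(format_date64(0))
    print(repr(format_date64(2**63 - 1)))
    print(format_date64(2**63 - 1, placeholder="<out of range>"))
```

Making the placeholder an explicit argument keeps the choice between an empty string and a dedicated sentinel in one place.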

revert

2ff68f0

github-actions bot added the core Core DataFusion crate label Jun 20, 2025

Revert "Dynamic filter pushdown for TopK sorts (apache#15770)"

6e83cf4

alamb marked this pull request as draft June 21, 2025 11:42

adriangb added 3 commits June 21, 2025 07:46

fix ScalarValue Display impl for Date64

5134a72

handle another case

a5d646b

Revert "Revert "Dynamic filter pushdown for TopK sorts (apache#15770)""

c2f4954

This reverts commit 6e83cf4.

github-actions bot removed documentation Improvements or additions to documentation optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Jun 21, 2025

adriangb added 2 commits June 21, 2025 08:26

move test

6e5f037

move test

d0d4689

github-actions bot removed the physical-expr Changes to the physical-expr crates label Jun 21, 2025

fmt

d0cbfd4

adriangb marked this pull request as ready for review June 21, 2025 13:27

adriangb mentioned this pull request Jun 21, 2025

SortQueryFuzzer found a failing case on main #16452

Open

adriangb changed the title ~~(debugging) re-enable sort fuzz test~~ re-enable sort_query_fuzzer_runner Jun 21, 2025

adriangb mentioned this pull request Jun 21, 2025

use 'lit' as the field name for literal values #16498

Open

adriangb requested a review from alamb June 21, 2025 13:54
