
fix: flatten before detect #1361


Merged
3 commits merged into parseablehq:main on Jun 28, 2025

Conversation

@nikhilsinhaparseable (Contributor) commented Jun 28, 2025

  • Flatten the input JSON
  • Convert it to a JSON array
  • Add the separator expected by the ingestion flow
  • Then infer the schema

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling for deeply nested JSON inputs during schema detection.
    • Enhanced feedback when JSON flattening fails, providing clearer error messages.
  • New Features

    • Added automatic flattening of JSON data before schema inference for better compatibility with various JSON structures.

coderabbitai bot (Contributor) commented Jun 28, 2025

Walkthrough

The detect_schema function in src/handlers/http/logstream.rs was updated to introduce multiple JSON flattening steps before schema inference. The function now checks for excessive nesting, applies flattening, and handles errors for deeply nested or unflattenable JSON, returning appropriate error responses when necessary.
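The sequence below is a minimal, self-contained sketch of that pipeline. The helper names are simplified stand-ins for the real generic_flattening, has_more_than_max_allowed_levels, and flatten::flatten in src/utils/json/flatten.rs (the real first stage also expands nested arrays into separate records):

```rust
use serde_json::{json, Map, Value};

/// Depth check, counting the top level as 1 (simplified stand-in for
/// has_more_than_max_allowed_levels).
fn deeper_than(value: &Value, level: usize, max: usize) -> bool {
    if level > max {
        return true;
    }
    match value {
        Value::Object(map) => map.values().any(|v| deeper_than(v, level + 1, max)),
        Value::Array(arr) => arr.iter().any(|v| deeper_than(v, level + 1, max)),
        _ => false,
    }
}

/// Flatten nested objects into one map, joining keys with "_" as the
/// ingestion flow expects (simplified stand-in for flatten::flatten;
/// arrays are kept as leaves here for brevity).
fn flatten_into(prefix: &str, value: &Value, out: &mut Map<String, Value>) {
    match value {
        Value::Object(map) => {
            for (key, val) in map {
                let joined = if prefix.is_empty() {
                    key.clone()
                } else {
                    format!("{prefix}_{key}")
                };
                flatten_into(&joined, val, out);
            }
        }
        leaf => {
            out.insert(prefix.to_string(), leaf.clone());
        }
    }
}

fn main() {
    let input = json!({"a": {"b": 1, "c": {"d": "x"}}});

    // 1. Reject input nested beyond the configured limit
    //    (the handler returns BAD_REQUEST instead of asserting).
    assert!(!deeper_than(&input, 1, 10));

    // 2. Flatten, then 3. normalize to an array of records for schema inference.
    let mut flat = Map::new();
    flatten_into("", &input, &mut flat);
    let flattened = Value::Object(flat);
    let records = match flattened {
        Value::Array(arr) => arr,
        obj @ Value::Object(_) => vec![obj],
        _ => unreachable!("flattening only yields arrays or objects"),
    };
    println!("{records:?}"); // [Object {"a_b": Number(1), "a_c_d": String("x")}]
}
```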

Changes

  • src/handlers/http/logstream.rs: Modified detect_schema to check for deep nesting, perform JSON flattening, handle flattening errors, and reject overly nested JSON before schema inference. No signature changes.
  • build.rs: Updated the println! macro to use inline variable interpolation ({ui_version}) instead of a positional {} placeholder when setting the UI_VERSION environment variable during the build.
  • 31 other files (src/alerts/alerts_utils.rs, src/alerts/mod.rs, src/catalog/mod.rs, src/cli.rs, src/connectors/common/mod.rs, src/connectors/kafka/config.rs, src/handlers/airplane.rs, src/handlers/http/cluster/mod.rs, src/handlers/http/cluster/utils.rs, src/handlers/http/llm.rs, src/handlers/http/mod.rs, src/handlers/http/modal/ingest_server.rs, src/handlers/http/oidc.rs, src/handlers/http/query.rs, src/handlers/http/users/dashboards.rs, src/handlers/http/users/filters.rs, src/hottier.rs, src/option.rs, src/otel/metrics.rs, src/parseable/mod.rs, src/rbac/utils.rs, src/storage/azure_blob.rs, src/storage/localfs.rs, src/storage/object_storage.rs, src/storage/s3.rs, src/storage/store_metadata.rs, src/users/filters.rs, src/utils/arrow/flight.rs, src/utils/human_size.rs, src/utils/json/mod.rs, src/utils/mod.rs): Modernized string formatting by replacing positional {} placeholders with Rust's inline named-variable interpolation ({var}). No changes to logic, control flow, or error handling.
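For illustration, the pattern applied across these files (shown on the build.rs UI_VERSION line described above; the exact surrounding code is assumed):

```rust
fn main() {
    let ui_version = "v1.2.3"; // hypothetical value read during the build
    // Before: positional placeholder
    println!("cargo:rustc-env=UI_VERSION={}", ui_version);
    // After: inline named-variable interpolation (stable since Rust 1.58)
    println!("cargo:rustc-env=UI_VERSION={ui_version}");
}
```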

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler (detect_schema)
    participant Flatten Utils

    Client->>Handler (detect_schema): Send JSON input
    Handler->>Handler: Check nesting level
    alt Too deeply nested
        Handler-->>Client: Return BAD_REQUEST (cannot flatten)
    else Not deeply nested
        Handler->>Flatten Utils: generic_flattening
        alt Flattening fails
            Handler-->>Client: Return BAD_REQUEST (flattening failed)
        else Flattening succeeds
            Handler->>Flatten Utils: flatten::flatten
            alt Detailed flattening fails
                Handler-->>Client: Return BAD_REQUEST (flattening failed)
            else Flattened JSON ready
                Handler->>Handler: Convert to array & infer schema
                Handler-->>Client: Return inferred schema
            end
        end
    end

Suggested labels

for next release

Poem

A warren of JSON, nested and deep,
Now flattened before schema we keep.
If levels are many, a warning we send—
"Too nested!" we say, and politely defend.
But if all goes well, the schema appears,
Flattened and ready—let's give three cheers!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 228428e and 95f4782.

📒 Files selected for processing (32)
  • src/alerts/alerts_utils.rs (3 hunks)
  • src/alerts/mod.rs (1 hunks)
  • src/catalog/mod.rs (1 hunks)
  • src/cli.rs (4 hunks)
  • src/connectors/common/mod.rs (1 hunks)
  • src/connectors/kafka/config.rs (3 hunks)
  • src/handlers/airplane.rs (1 hunks)
  • src/handlers/http/cluster/mod.rs (1 hunks)
  • src/handlers/http/cluster/utils.rs (1 hunks)
  • src/handlers/http/llm.rs (1 hunks)
  • src/handlers/http/logstream.rs (2 hunks)
  • src/handlers/http/mod.rs (1 hunks)
  • src/handlers/http/modal/ingest_server.rs (1 hunks)
  • src/handlers/http/oidc.rs (1 hunks)
  • src/handlers/http/query.rs (2 hunks)
  • src/handlers/http/users/dashboards.rs (3 hunks)
  • src/handlers/http/users/filters.rs (3 hunks)
  • src/hottier.rs (2 hunks)
  • src/option.rs (1 hunks)
  • src/otel/metrics.rs (1 hunks)
  • src/parseable/mod.rs (1 hunks)
  • src/rbac/utils.rs (1 hunks)
  • src/storage/azure_blob.rs (2 hunks)
  • src/storage/localfs.rs (1 hunks)
  • src/storage/object_storage.rs (1 hunks)
  • src/storage/s3.rs (1 hunks)
  • src/storage/store_metadata.rs (2 hunks)
  • src/users/filters.rs (1 hunks)
  • src/utils/arrow/flight.rs (1 hunks)
  • src/utils/human_size.rs (1 hunks)
  • src/utils/json/mod.rs (1 hunks)
  • src/utils/mod.rs (2 hunks)
✅ Files skipped from review due to trivial changes (24)
  • src/option.rs
  • src/connectors/common/mod.rs
  • src/handlers/http/llm.rs
  • src/rbac/utils.rs
  • src/handlers/http/users/filters.rs
  • src/parseable/mod.rs
  • src/handlers/http/modal/ingest_server.rs
  • src/storage/localfs.rs
  • src/storage/s3.rs
  • src/cli.rs
  • src/catalog/mod.rs
  • src/handlers/airplane.rs
  • src/handlers/http/cluster/mod.rs
  • src/storage/store_metadata.rs
  • src/connectors/kafka/config.rs
  • src/utils/arrow/flight.rs
  • src/handlers/http/oidc.rs
  • src/otel/metrics.rs
  • src/storage/azure_blob.rs
  • src/handlers/http/query.rs
  • src/alerts/mod.rs
  • src/storage/object_storage.rs
  • src/alerts/alerts_utils.rs
  • src/hottier.rs
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1329
File: src/handlers/http/kinesis.rs:89-98
Timestamp: 2025-05-28T04:45:40.879Z
Learning: In Kinesis log processing, when handling deeply nested JSON that exceeds flattening limits, the original JSON structure must be preserved as-is. If the JSON is an array at the root level, attempting to add metadata fields like `requestId` and `timestamp` would fail because arrays don't support direct key-value insertion. The flattening process handles arrays by converting elements to individual objects, but when flattening is skipped, the original array structure must remain intact to prevent data loss.
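A small illustrative sketch of the constraint this learning describes: serde_json objects accept key insertion while arrays do not, so metadata can only be attached when the root is an object:

```rust
use serde_json::{json, Value};

fn main() {
    let mut record = json!([{"msg": "a"}, {"msg": "b"}]); // root-level array
    match &mut record {
        // Only an object root can carry metadata fields like requestId.
        Value::Object(map) => {
            map.insert("requestId".to_string(), json!("abc-123"));
        }
        // For an array root, keep the original structure as-is to avoid data loss.
        _ => {}
    }
    println!("{record}");
}
```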
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.
src/handlers/http/users/dashboards.rs (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/users/dashboards.rs:154-165
Timestamp: 2025-05-01T12:22:42.363Z
Learning: Title validation for dashboards is performed in the `create_dashboard` HTTP handler function rather than in the `DASHBOARDS.create` method, avoiding redundant validation.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:63-71
Timestamp: 2025-05-01T10:22:45.695Z
Learning: Dashboard design in Parseable follows a sharing model where any user can read (list/get) any dashboard, enabling dashboard sharing functionality, but only the dashboard author can modify (update/delete) their own dashboards.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1348
File: src/prism/home/mod.rs:366-368
Timestamp: 2025-06-15T18:18:14.590Z
Learning: In the Parseable dashboard system, dashboard_id is guaranteed to always be present (never None) because IDs are generated at dashboard creation time, and there are no pre-v1 dashboards that could lack IDs.
src/handlers/http/mod.rs (1)
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
src/utils/human_size.rs (1)
Learnt from: vkhinvasara
PR: parseablehq/parseable#1352
File: src/handlers/http/resource_check.rs:70-74
Timestamp: 2025-06-18T08:43:30.112Z
Learning: The sysinfo crate (version 0.30+) returns memory values from `used_memory()` and `total_memory()` in bytes, not kilobytes. Converting bytes to GiB requires dividing by 1024³ (1,073,741,824). In older versions (pre-0.30), the methods returned kilobytes.
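A short sketch under that assumption (sysinfo 0.30+, values in bytes):

```rust
use sysinfo::System;

fn main() {
    let mut sys = System::new_all();
    sys.refresh_memory();
    // sysinfo 0.30+ reports bytes; convert to GiB by dividing by 1024^3.
    let used_gib = sys.used_memory() as f64 / 1024f64.powi(3);
    let total_gib = sys.total_memory() as f64 / 1024f64.powi(3);
    println!("memory: {used_gib:.2} GiB / {total_gib:.2} GiB");
}
```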
src/utils/mod.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.
src/handlers/http/logstream.rs (6)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1329
File: src/handlers/http/kinesis.rs:89-98
Timestamp: 2025-05-28T04:45:40.879Z
Learning: In Kinesis log processing, when handling deeply nested JSON that exceeds flattening limits, the original JSON structure must be preserved as-is. If the JSON is an array at the root level, attempting to add metadata fields like `requestId` and `timestamp` would fail because arrays don't support direct key-value insertion. The flattening process handles arrays by converting elements to individual objects, but when flattening is skipped, the original array structure must remain intact to prevent data loss.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1288
File: src/handlers/http/modal/mod.rs:279-301
Timestamp: 2025-04-07T13:23:10.092Z
Learning: For critical operations like writing metadata to disk in NodeMetadata::put_on_disk(), it's preferred to let exceptions propagate (using expect/unwrap) rather than trying to recover with fallback mechanisms, as the failure indicates a fundamental system issue that needs immediate attention.
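As a one-line illustration of that fail-fast stance (the path and payload below are hypothetical):

```rust
use std::fs;

fn main() {
    let path = "node-metadata.json";           // hypothetical path
    let bytes = br#"{"node_id": "ingest-1"}"#; // hypothetical payload
    // Let the failure propagate: a metadata write failing indicates a
    // fundamental system issue, so no fallback recovery is attempted.
    fs::write(path, bytes).expect("failed to write node metadata to disk");
}
```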
Learnt from: de-sh
PR: parseablehq/parseable#0
File: :0-0
Timestamp: 2025-03-20T15:50:45.435Z
Learning: Pay close attention to code comments for typos and semantic clarity during reviews for the Parseable project.
🧬 Code Graph Analysis (2)
src/handlers/http/users/dashboards.rs (1)
src/storage/object_storage.rs (1)
  • dashboard_path (1312-1314)
src/handlers/http/cluster/utils.rs (1)
src/cli.rs (1)
  • format (616-617)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: coverage
🔇 Additional comments (15)
src/utils/json/mod.rs (1)

114-114: LGTM - String formatting modernization.

Good update to use Rust's inline variable interpolation syntax, improving code readability while maintaining identical functionality.

src/handlers/http/mod.rs (1)

92-92: LGTM - Path formatting improvement.

The inline variable interpolation makes the path construction more readable while maintaining the same functionality.

src/handlers/http/cluster/utils.rs (1)

206-206: LGTM - URL formatting modernization.

The inline variable interpolation improves readability while preserving the URL construction logic.

src/utils/human_size.rs (1)

42-42: LGTM - Byte formatting improvement.

Good modernization of the string formatting syntax while maintaining identical output.

src/utils/mod.rs (2)

56-56: LGTM - Time formatting improvement.

The inline variable interpolation makes it clearer which variables are being formatted into the time string.


85-85: LGTM - Error message formatting improvement.

The inline variable interpolation improves readability of the error message construction.

src/handlers/http/logstream.rs (7)

33-35: LGTM!

The new imports for JSON flattening utilities are properly added to support the enhanced detect_schema function.


108-127: Approve the nesting check and generic flattening logic.

The implementation correctly:

  • Checks nesting levels before attempting flattening
  • Uses proper error handling instead of unwrap()
  • Provides descriptive error messages for debugging

The error handling aligns with the retrieved learning about preserving JSON structure when flattening exceeds limits.


128-133: LGTM!

The detailed flattening step is properly implemented with appropriate error handling and follows the established pattern for error conversion.


134-138: LGTM!

The array conversion logic correctly handles both array and object cases that can result from the flattening process.


139-147: LGTM!

The schema inference properly handles errors with descriptive messages. The previous unwrap() issue has been correctly addressed with proper error handling.
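As a rough sketch of error-propagating inference, assuming arrow-json's infer_json_schema_from_iterator (the review does not show which inference API the handler actually calls):

```rust
use arrow_json::reader::infer_json_schema_from_iterator;
use serde_json::json;

fn main() -> Result<(), arrow_schema::ArrowError> {
    let records = vec![json!({"a_b": 1, "a_c_d": "x"})];
    // Propagate inference failures via `?` so a handler can map them to
    // BAD_REQUEST instead of panicking on unwrap().
    let schema = infer_json_schema_from_iterator(records.into_iter().map(Ok))?;
    println!("{schema:?}");
    Ok(())
}
```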


148-150: LGTM!

The data type override logic correctly processes each flattened JSON object to refine the inferred schema.


152-161: LGTM!

The error handling for deeply nested JSON provides clear feedback to users and uses the appropriate HTTP status code for client errors.

src/users/filters.rs (1)

137-137: LGTM!

The string formatting has been modernized to use Rust's inline variable interpolation syntax, improving readability.

src/handlers/http/users/dashboards.rs (1)

86-86: LGTM!

The string formatting has been consistently modernized across all dashboard operations (post, update, delete) to use Rust's inline variable interpolation syntax.

Also applies to: 123-123, 148-148



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/handlers/http/logstream.rs (1)

137-143: Consider providing a more detailed error message.

The error message for heavily nested JSON could be more helpful by indicating the actual nesting level or the configured limit.

-        Err(StreamError::Custom {
-            msg: "heavily nested, cannot flatten this JSON".to_string(),
-            status: StatusCode::BAD_REQUEST,
-        })
+        Err(StreamError::Custom {
+            msg: format!("JSON is too deeply nested (exceeds level {}), cannot flatten", 
+                        PARSEABLE.options.event_flatten_level),
+            status: StatusCode::BAD_REQUEST,
+        })
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04078cf and a843a5f.

📒 Files selected for processing (1)
  • src/handlers/http/logstream.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1329
File: src/handlers/http/kinesis.rs:89-98
Timestamp: 2025-05-28T04:45:40.879Z
Learning: In Kinesis log processing, when handling deeply nested JSON that exceeds flattening limits, the original JSON structure must be preserved as-is. If the JSON is an array at the root level, attempting to add metadata fields like `requestId` and `timestamp` would fail because arrays don't support direct key-value insertion. The flattening process handles arrays by converting elements to individual objects, but when flattening is skipped, the original array structure must remain intact to prevent data loss.
Learnt from: de-sh
PR: parseablehq/parseable#1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.
src/handlers/http/logstream.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1329
File: src/handlers/http/kinesis.rs:89-98
Timestamp: 2025-05-28T04:45:40.879Z
Learning: In Kinesis log processing, when handling deeply nested JSON that exceeds flattening limits, the original JSON structure must be preserved as-is. If the JSON is an array at the root level, attempting to add metadata fields like `requestId` and `timestamp` would fail because arrays don't support direct key-value insertion. The flattening process handles arrays by converting elements to individual objects, but when flattening is skipped, the original array structure must remain intact to prevent data loss.
🧬 Code Graph Analysis (1)
src/handlers/http/logstream.rs (2)
src/utils/json/flatten.rs (4)
  • flatten (58-93)
  • convert_to_array (351-357)
  • generic_flattening (269-328)
  • has_more_than_max_allowed_levels (335-348)
src/event/format/mod.rs (2)
  • new (126-131)
  • override_data_type (324-368)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (5)
src/handlers/http/logstream.rs (5)

33-35: LGTM: Import additions are appropriate.

The imported flatten utilities are necessary for the new flattening functionality in the detect_schema function.


120-125: Approve the detailed flattening logic.

The detailed flattening step with separator "_" and proper error handling is correctly implemented. The error message could be more descriptive, but the logic is sound.


126-130: Good pattern matching for array conversion.

The logic correctly handles both Array and Object cases from the flattened JSON, and the unreachable case is appropriate since the flatten function would have failed earlier for invalid types.


109-109: No changes required for nesting level check

Starting at current_level = 1 correctly treats the top-level JSON as level 1, aligns with the default event_flatten_level (10), and matches its use elsewhere in the codebase.


107-144: Two-stage flattening is intentional and follows established pattern

I’ve confirmed that this same sequence—calling generic_flattening to expand arrays followed by flatten::flatten to flatten nested objects—is used in src/utils/json/mod.rs. The two functions are complementary (one handles array branching, the other handles object nesting) and do not conflict. No changes needed.
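A simplified stand-in for that first stage, showing how array branching fans one object out into one record per array element before object flattening runs (the real generic_flattening handles arbitrary nesting; this sketch branches on a single level):

```rust
use serde_json::{json, Value};

// Simplified stand-in for the array-branching stage (generic_flattening):
// the first array-valued field produces one record per element.
fn branch_on_arrays(value: &Value) -> Vec<Value> {
    match value {
        Value::Array(arr) => arr.iter().flat_map(branch_on_arrays).collect(),
        Value::Object(map) => {
            if let Some((key, Value::Array(items))) = map
                .iter()
                .find(|(_, v)| v.is_array())
                .map(|(k, v)| (k.clone(), v.clone()))
            {
                items
                    .into_iter()
                    .map(|item| {
                        let mut record = map.clone();
                        record.insert(key.clone(), item);
                        Value::Object(record)
                    })
                    .collect()
            } else {
                vec![value.clone()]
            }
        }
        leaf => vec![leaf.clone()],
    }
}

fn main() {
    let input = json!({"id": 1, "tags": ["a", "b"]});
    let branched = branch_on_arrays(&input);
    // Two records: {"id":1,"tags":"a"} and {"id":1,"tags":"b"};
    // object flattening (stage two) would then run on each record.
    assert_eq!(branched.len(), 2);
    println!("{branched:?}");
}
```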

coderabbitai[bot] previously approved these changes Jun 28, 2025
@nitisht merged commit 32d105e into parseablehq:main on Jun 28, 2025
13 of 14 checks passed