
Ayushpal11/Wrangler-Zeotap_assignmnet-


CDAP Wrangler Enhancement: Byte Size & Time Duration Parsing

📌 Overview

This enhancement adds native support to CDAP Wrangler for parsing and processing byte-size units (e.g., 10KB, 1.5MB) and time-duration units (e.g., 10ms, 2s). Recipes that work with data sizes and time intervals become cleaner and more efficient. It also adds a new directive, aggregate-stats, for aggregating such values.


✅ Key Features

1. New Unit Parsers

  • ByteSize.java: parses strings such as 10KB and 1.5MB and returns the value in bytes.
  • TimeDuration.java: parses strings such as 10ms and 2s and returns the value in nanoseconds.
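The sketch below shows the parsing step. It assumes binary multipliers (1 KB = 1,024 bytes), which matches the 10KB → 10240 bytes test case. The suffix set is illustrative: B through TB for sizes, and ns, us, ms, s, m, h for durations. The class `UnitParsers` and its method names are hypothetical and are not the actual ByteSize.java/TimeDuration.java code. `BigDecimal` keeps fractional inputs such as 1.5MB exact before the result is rounded to a whole count.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class UnitParsers {
  // Number followed by an alphabetic unit suffix, e.g. "10KB", "1.5MB", "250ms".
  private static final Pattern VALUE_WITH_UNIT =
      Pattern.compile("\\s*(\\d+(?:\\.\\d+)?)\\s*([A-Za-z]+)\\s*");

  // Assumed binary multipliers: 1 KB = 1024 bytes, matching "10KB" -> 10240.
  private static final Map<String, Long> BYTE_UNITS = Map.of(
      "B", 1L,
      "KB", 1024L,
      "MB", 1024L * 1024,
      "GB", 1024L * 1024 * 1024,
      "TB", 1024L * 1024 * 1024 * 1024);

  // Multipliers to nanoseconds; the accepted suffix set is an assumption.
  private static final Map<String, Long> DURATION_UNITS = Map.of(
      "ns", 1L,
      "us", 1_000L,
      "ms", 1_000_000L,
      "s", 1_000_000_000L,
      "m", 60L * 1_000_000_000L,
      "h", 3_600L * 1_000_000_000L);

  private UnitParsers() {}

  /** Parses a byte-size string such as "10KB" or "1.5MB" into bytes (suffix is case-insensitive). */
  public static long parseBytes(String text) {
    Matcher m = match(text);
    return scale(text, m.group(1), BYTE_UNITS.get(m.group(2).toUpperCase(Locale.ROOT)));
  }

  /** Parses a duration string such as "10ms" or "2s" into nanoseconds (suffix is case-sensitive). */
  public static long parseNanos(String text) {
    Matcher m = match(text);
    return scale(text, m.group(1), DURATION_UNITS.get(m.group(2)));
  }

  private static Matcher match(String text) {
    Matcher m = VALUE_WITH_UNIT.matcher(text);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a number with a unit: " + text);
    }
    return m;
  }

  // Multiplies exactly, then rounds to a whole number of bytes or nanoseconds.
  private static long scale(String text, String number, Long multiplier) {
    if (multiplier == null) {
      throw new IllegalArgumentException("Unknown unit in: " + text);
    }
    return new BigDecimal(number)
        .multiply(BigDecimal.valueOf(multiplier))
        .setScale(0, RoundingMode.HALF_UP)
        .longValueExact();
  }

  public static void main(String[] args) {
    System.out.println(parseBytes("10KB"));  // 10240
    System.out.println(parseBytes("1.5MB")); // 1572864
    System.out.println(parseNanos("2s"));    // 2000000000
    System.out.println(parseNanos("10ms"));  // 10000000
  }
}
```

An unknown suffix such as 10XB raises `IllegalArgumentException`. It is not silently treated as bytes.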

2. Grammar Enhancements

  • Lexer and parser rules added to Directives.g4:
    • New tokens: BYTE_SIZE, TIME_DURATION
    • New parser rules: byteSizeArg, timeDurationArg
  • Parser sources regenerated by running mvn compile.
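Regular expressions can approximate the new lexer rules, which makes it easier to see which lexemes they are meant to accept. The patterns and the `UnitTokenClassifier` class below are hypothetical stand-ins for the BYTE_SIZE and TIME_DURATION rules. They are not the actual definitions in Directives.g4. A real ANTLR lexer resolves overlaps with other rules by longest match and then by rule order. This sketch only checks whole strings.

```java
import java.util.regex.Pattern;

public final class UnitTokenClassifier {
  /** Token kinds named after the new lexer rules; NONE means neither rule matches. */
  public enum Kind { BYTE_SIZE, TIME_DURATION, NONE }

  // Hypothetical equivalents of the lexer rules: digits, optional fraction, unit suffix.
  private static final Pattern BYTE_SIZE =
      Pattern.compile("\\d+(\\.\\d+)?[kKmMgGtT]?[bB]");
  private static final Pattern TIME_DURATION =
      Pattern.compile("\\d+(\\.\\d+)?(ns|us|ms|s|m|h)");

  private UnitTokenClassifier() {}

  /** Classifies a whole lexeme; partial matches are not accepted. */
  public static Kind classify(String lexeme) {
    if (BYTE_SIZE.matcher(lexeme).matches()) {
      return Kind.BYTE_SIZE;
    }
    if (TIME_DURATION.matcher(lexeme).matches()) {
      return Kind.TIME_DURATION;
    }
    return Kind.NONE;
  }

  public static void main(String[] args) {
    String[] samples = {"10KB", "1.5MB", "10ms", "2s", "10XB", "KB10", "1.5.2MB"};
    for (String s : samples) {
      System.out.println(s + " -> " + classify(s));
    }
  }
}
```

A parser test along these lines would assert that the valid forms map to their token kind and that malformed inputs such as 10XB or 1.5.2MB do not.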

3. New Directive: aggregate-stats

Aggregates ByteSize and TimeDuration columns in recipes.

Usage (source size column, source time column, target size column, target time column):

aggregate-stats :data_transfer_size :response_time total_size_mb total_time_sec

Supports:

  • Unit conversion (e.g., bytes → MB, nanoseconds → seconds)
  • Aggregation types: total (the default), with optional alternatives such as average and median
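The sketch below shows one way aggregation and unit conversion could combine. It assumes the values are already in canonical units (bytes or nanoseconds) and that 1 MB = 1,048,576 bytes. The `Aggregation` enum and the helper names are hypothetical, and the directive's real option names may differ. The values are aggregated in canonical units and only the final result is converted, which keeps rounding to a single step.

```java
import java.util.Arrays;
import java.util.List;

public final class UnitAggregation {
  /** Illustrative aggregation types; the directive's real option names may differ. */
  public enum Aggregation { TOTAL, AVERAGE, MEDIAN }

  // Assumed binary megabyte, consistent with 1 KB = 1024 bytes.
  private static final double BYTES_PER_MB = 1024.0 * 1024.0;
  private static final double NANOS_PER_SECOND = 1_000_000_000.0;

  private UnitAggregation() {}

  /** Aggregates canonical values (bytes or nanoseconds) with the requested type. */
  public static double aggregate(List<Long> values, Aggregation type) {
    if (values.isEmpty()) {
      throw new IllegalArgumentException("No values to aggregate");
    }
    switch (type) {
      case TOTAL:
        return values.stream().mapToLong(Long::longValue).sum();
      case AVERAGE:
        return values.stream().mapToLong(Long::longValue).average().getAsDouble();
      case MEDIAN: {
        long[] sorted = values.stream().mapToLong(Long::longValue).sorted().toArray();
        int mid = sorted.length / 2;
        // Odd count: middle element; even count: mean of the two middle elements.
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
      }
      default:
        throw new IllegalArgumentException("Unsupported aggregation: " + type);
    }
  }

  /** Converts bytes to (binary) megabytes. */
  public static double bytesToMegabytes(double bytes) {
    return bytes / BYTES_PER_MB;
  }

  /** Converts nanoseconds to seconds. */
  public static double nanosToSeconds(double nanos) {
    return nanos / NANOS_PER_SECOND;
  }

  public static void main(String[] args) {
    List<Long> sizes = Arrays.asList(1_048_576L, 2_097_152L, 4_194_304L); // 1, 2, 4 MB
    List<Long> times = Arrays.asList(500_000_000L, 1_500_000_000L);       // 0.5 s, 1.5 s
    System.out.println(bytesToMegabytes(aggregate(sizes, Aggregation.TOTAL)));  // 7.0
    System.out.println(bytesToMegabytes(aggregate(sizes, Aggregation.MEDIAN))); // 2.0
    System.out.println(nanosToSeconds(aggregate(times, Aggregation.AVERAGE)));  // 1.0
  }
}
```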

4. API Updates

  • ByteSize and TimeDuration implement the Token interface in wrangler-api
  • Canonical conversion methods:
    • getBytes() for ByteSize
    • getNanoseconds() for TimeDuration
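The API shape described above could look like the sketch below: each token stores a canonical value and exposes it through getBytes() or getNanoseconds(). The `Token` interface here is a simplified stand-in written for this example and is not the real wrangler-api interface. Parsing is integer-only, with a small assumed set of units. Because values are normalized when the token is constructed, a directive such as aggregate-stats can sum them without re-parsing unit strings.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class UnitTokens {
  /** Simplified stand-in for a token interface; NOT the actual wrangler-api Token. */
  public interface Token {
    Object value();

    String type();
  }

  // Integer amount followed by a unit suffix; fractions are left out for brevity.
  private static final Pattern NUMBER_UNIT = Pattern.compile("(\\d+)([A-Za-z]+)");

  private UnitTokens() {}

  private static Matcher split(String text) {
    Matcher m = NUMBER_UNIT.matcher(text.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Cannot parse: " + text);
    }
    return m;
  }

  // Assumed binary multipliers (1 KB = 1024 bytes).
  private static long byteFactor(String unit) {
    switch (unit.toUpperCase(Locale.ROOT)) {
      case "B": return 1L;
      case "KB": return 1024L;
      case "MB": return 1024L * 1024;
      case "GB": return 1024L * 1024 * 1024;
      default: throw new IllegalArgumentException("Unknown byte unit: " + unit);
    }
  }

  // Multipliers to nanoseconds for an assumed set of suffixes.
  private static long nanoFactor(String unit) {
    switch (unit) {
      case "ns": return 1L;
      case "us": return 1_000L;
      case "ms": return 1_000_000L;
      case "s": return 1_000_000_000L;
      default: throw new IllegalArgumentException("Unknown duration unit: " + unit);
    }
  }

  /** Byte-size token normalized to bytes when constructed. */
  public static final class ByteSize implements Token {
    private final long bytes;

    public ByteSize(String text) {
      Matcher m = split(text);
      this.bytes = Math.multiplyExact(Long.parseLong(m.group(1)), byteFactor(m.group(2)));
    }

    public long getBytes() { return bytes; }

    @Override public Object value() { return bytes; }

    @Override public String type() { return "BYTE_SIZE"; }
  }

  /** Time-duration token normalized to nanoseconds when constructed. */
  public static final class TimeDuration implements Token {
    private final long nanos;

    public TimeDuration(String text) {
      Matcher m = split(text);
      this.nanos = Math.multiplyExact(Long.parseLong(m.group(1)), nanoFactor(m.group(2)));
    }

    public long getNanoseconds() { return nanos; }

    @Override public Object value() { return nanos; }

    @Override public String type() { return "TIME_DURATION"; }
  }

  public static void main(String[] args) {
    Token size = new ByteSize("10KB");
    Token time = new TimeDuration("2s");
    System.out.println(size.type() + " " + size.value()); // BYTE_SIZE 10240
    System.out.println(time.type() + " " + time.value()); // TIME_DURATION 2000000000
  }
}
```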

🧪 Testing

1. Unit Tests

  • ByteSizeTest: validates parsing ("10KB" → 10240 bytes).
  • TimeDurationTest: validates conversion ("2s" → 2_000_000_000 ns).

2. Parser Tests

  • Verify that the grammar accepts valid unit formats and rejects invalid ones.

3. Directive Tests

Sample input:

data_transfer_size response_time
1024 150
2048 200
3072 250

Recipe:

aggregate-stats :data_transfer_size :response_time total_size_mb total_time_sec

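Expected Output (treating data_transfer_size as bytes, response_time as milliseconds, and 1 MB as 1,048,576 bytes):

total_size_mb total_time_sec
0.005859375 0.6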
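The sample columns have no unit suffixes, so the totals depend on how those values are read. The sketch below checks the arithmetic under three assumptions: data_transfer_size is in bytes, response_time is in milliseconds, and 1 MB = 1,048,576 bytes. The class and method names are hypothetical.

```java
public final class SampleTotals {
  // Sample rows; units are assumed: bytes and milliseconds.
  private static final long[] DATA_TRANSFER_SIZE_BYTES = {1024, 2048, 3072};
  private static final long[] RESPONSE_TIME_MS = {150, 200, 250};

  private SampleTotals() {}

  /** Sums the size column and converts to binary megabytes. */
  public static double totalSizeMb() {
    long totalBytes = 0;
    for (long b : DATA_TRANSFER_SIZE_BYTES) {
      totalBytes += b;
    }
    return totalBytes / (1024.0 * 1024.0);
  }

  /** Sums the time column and converts milliseconds to seconds. */
  public static double totalTimeSec() {
    long totalMs = 0;
    for (long ms : RESPONSE_TIME_MS) {
      totalMs += ms;
    }
    return totalMs / 1000.0;
  }

  public static void main(String[] args) {
    System.out.println(totalSizeMb());  // 0.005859375 (6144 bytes)
    System.out.println(totalTimeSec()); // 0.6 (600 ms)
  }
}
```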

🚀 Quick Start

# Clone
git clone https://github.com/<your-username>/wrangler.git
cd wrangler

# Add upstream
git remote add upstream https://github.com/data-integrations/wrangler.git

# Build
mvn clean install

# Run tests
mvn test

📂 File Changes Summary

  • wrangler-api:
    • ByteSize.java, TimeDuration.java, Token updates
  • wrangler-core:
    • Directives.g4, parser visitor updates
    • AggregateStats.java directive
  • wrangler-core/src/test:
    • Unit tests for the new classes and the directive

📄 Assignment Checklist

  • Grammar updated (Directives.g4)
  • Token parser classes added
  • New directive: aggregate-stats
  • Unit & parser tests written
  • End-to-end test with sample input
  • Prompts.txt committed (AI tools were used)
