This enhancement adds native support in CDAP Wrangler for parsing and processing Byte Size (e.g., `10KB`, `1.5MB`) and Time Duration (e.g., `10ms`, `2s`) units. It enables cleaner and more efficient recipes when working with data sizes and time intervals, and adds a new directive for aggregation.
- `ByteSize.java`: Parses strings like `10KB`, `1.5MB` and returns the value in bytes.
- `TimeDuration.java`: Parses strings like `10ms`, `2s` and returns the value in nanoseconds.
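The parsing described above can be sketched roughly as follows. This is a minimal illustration, not the code in `ByteSize.java` or `TimeDuration.java`: the class `UnitParsingSketch`, its method names, and the exact set of accepted units are assumptions made for the example. It uses 1 KB = 1024 bytes, which matches the `"10KB"` → `10240` case in the unit tests.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper, not the project's ByteSize/TimeDuration classes.
final class UnitParsingSketch {
  // A number (optionally fractional) immediately followed by a unit suffix.
  private static final Pattern VALUE_AND_UNIT =
      Pattern.compile("(\\d+(?:\\.\\d+)?)([A-Za-z]+)");

  // Converts a byte-size string to bytes (assumed units: B, KB, MB, GB).
  static long parseBytes(String text) {
    Matcher m = match(text);
    long multiplier;
    switch (m.group(2).toUpperCase(Locale.ROOT)) {
      case "B":  multiplier = 1L; break;
      case "KB": multiplier = 1024L; break;
      case "MB": multiplier = 1024L * 1024L; break;
      case "GB": multiplier = 1024L * 1024L * 1024L; break;
      default: throw new IllegalArgumentException("Unknown byte unit in: " + text);
    }
    return scale(m.group(1), multiplier);
  }

  // Converts a duration string to nanoseconds (assumed units: ns, us, ms, s).
  static long parseNanos(String text) {
    Matcher m = match(text);
    long multiplier;
    switch (m.group(2).toLowerCase(Locale.ROOT)) {
      case "ns": multiplier = 1L; break;
      case "us": multiplier = 1_000L; break;
      case "ms": multiplier = 1_000_000L; break;
      case "s":  multiplier = 1_000_000_000L; break;
      default: throw new IllegalArgumentException("Unknown time unit in: " + text);
    }
    return scale(m.group(1), multiplier);
  }

  private static Matcher match(String text) {
    Matcher m = VALUE_AND_UNIT.matcher(text.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a <number><unit> value: " + text);
    }
    return m;
  }

  // BigDecimal keeps fractional inputs such as "1.5" exact before rounding to a whole count.
  private static long scale(String number, long multiplier) {
    return new BigDecimal(number)
        .multiply(BigDecimal.valueOf(multiplier))
        .setScale(0, RoundingMode.HALF_UP)
        .longValueExact();
  }
}
```

With these assumptions, `UnitParsingSketch.parseBytes("1.5MB")` returns `1572864` and `UnitParsingSketch.parseNanos("10ms")` returns `10000000`.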
- Lexer & parser rules added in `Directives.g4`.
  - New tokens: `BYTE_SIZE`, `TIME_DURATION`
  - New parser rules: `byteSizeArg`, `timeDurationArg`
- Regenerated using `mvn compile`.
Aggregates ByteSize and TimeDuration columns in recipes.
Usage:
aggregate-stats :data_transfer_size :response_time total_size_mb total_time_sec
Supports:
- Unit conversion (e.g., bytes → MB, nanoseconds → seconds)
- Aggregation types: total (default), with optional alternatives such as average and median
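The aggregation and unit conversion listed above could look roughly like this. It is a sketch under stated assumptions, not the `AggregateStats.java` implementation: the class `AggregateStatsSketch` and its methods are hypothetical, the inputs are assumed to be already converted to canonical bytes and nanoseconds, and 1 MB is taken as 1024 × 1024 bytes.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical helper, not the actual aggregate-stats directive.
final class AggregateStatsSketch {
  // Assumption: binary megabytes (1 MB = 1024 * 1024 bytes).
  private static final double BYTES_PER_MB = 1024.0 * 1024.0;
  private static final double NANOS_PER_SECOND = 1_000_000_000.0;

  // Sums canonical byte counts, then converts the total to megabytes.
  static double totalMegabytes(List<Long> sizesInBytes) {
    long total = 0L;
    for (long bytes : sizesInBytes) {
      total += bytes;
    }
    return total / BYTES_PER_MB;
  }

  // Sums canonical nanosecond counts, then converts the total to seconds.
  static double totalSeconds(List<Long> durationsInNanos) {
    long total = 0L;
    for (long nanos : durationsInNanos) {
      total += nanos;
    }
    return total / NANOS_PER_SECOND;
  }

  // The "average" aggregation type: total divided by the row count.
  static double averageSeconds(List<Long> durationsInNanos) {
    if (durationsInNanos.isEmpty()) {
      throw new IllegalArgumentException("No rows to average");
    }
    return totalSeconds(durationsInNanos) / durationsInNanos.size();
  }
}
```

Accumulating in the canonical unit and converting once at the end avoids compounding rounding error across rows.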
- `ByteSize` and `TimeDuration` extend `Token` in `wrangler-api`
- Canonical conversion methods:
  - `getBytes()` for ByteSize
  - `getNanoseconds()` for TimeDuration
- `ByteSizeTest`: Validates parsing (`"10KB"` → `10240` bytes).
- `TimeDurationTest`: Validates conversion (`"2s"` → `2_000_000_000` ns).
- Ensures grammar accepts valid unit formats, rejects invalid ones.
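The accept/reject behaviour checked by the grammar tests can be approximated with plain regular expressions. The patterns below are assumptions standing in for the `BYTE_SIZE` and `TIME_DURATION` lexer rules. They are not copied from `Directives.g4`, and the class `UnitFormatCheck` is hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative only: regex stand-ins for the lexer tokens, not the ANTLR rules themselves.
final class UnitFormatCheck {
  // Assumed token shape: digits, optional fraction, then a byte unit.
  private static final Pattern BYTE_SIZE =
      Pattern.compile("\\d+(\\.\\d+)?(B|KB|MB|GB|TB)");
  // Assumed token shape: digits, optional fraction, then a time unit.
  private static final Pattern TIME_DURATION =
      Pattern.compile("\\d+(\\.\\d+)?(ns|us|ms|s)");

  // True only when the whole string is a byte-size literal.
  static boolean isByteSize(String text) {
    return BYTE_SIZE.matcher(text).matches();
  }

  // True only when the whole string is a time-duration literal.
  static boolean isTimeDuration(String text) {
    return TIME_DURATION.matcher(text).matches();
  }
}
```

Under these patterns, `isByteSize("10KB")` and `isTimeDuration("2s")` are true, while `isByteSize("10XB")` and `isTimeDuration("10KB")` are false.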
Sample input:
data_transfer_size | response_time |
---|---|
1024 | 150 |
2048 | 200 |
3072 | 250 |
Recipe:
aggregate-stats :data_transfer_size :response_time total_size_mb total_time_sec
Expected Output:
total_size_mb | total_time_sec |
---|---|
0.003 | 0.0008 |
# Clone
git clone https://github.com/<your-username>/wrangler.git
cd wrangler
# Add upstream
git remote add upstream https://github.com/data-integrations/wrangler.git
# Build
mvn clean install
# Run tests
mvn test
- `wrangler-api`: `ByteSize.java`, `TimeDuration.java`, Token updates
- `wrangler-core`: `Directives.g4`, parser visitor updates, `AggregateStats.java` directive
- `wrangler-core/src/test`: Unit tests for classes and directive
- Grammar updated (`Directives.g4`)
- Token parser classes added
- New directive: `aggregate-stats`
- Unit & parser tests written
- End-to-end test with sample input
- Prompts.txt committed (AI assistance was used)