1.1 BPF Compiler Implementation

Mark Bednarczyk edited this page Nov 11, 2024 · 2 revisions

BPF Compiler Implementation Guide

This document describes the implementation architecture of the JNetRuntime BPF compiler system, which supports multiple filter expression dialects through a common compilation framework.

Architecture Overview

┌────────────────────┐
│ Filter Expression  │
└────────┬───────────┘
         ↓
┌────────────────────┐
│ Dialect Compiler   │
├────────────────────┤
│   1. Lexical       │
│   2. Parsing       │
│   3. IR Generation │
│   4. Optimization  │
│   5. Code Gen      │
└────────────────────┘
         ↓
┌────────────────────┐
│   BPF Program      │
└────────────────────┘

Base Compiler Implementation

The AbstractBpfCompiler class provides a common framework for all dialect implementations:

public abstract class AbstractBpfCompiler<T extends TokenType, N extends ASTNode>
        implements BpfCompiler<T, N> {
    
    protected CompilerDialect<T, N> dialect;
    protected CompilerOptions options;
    
    // Main compilation pipeline
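    public BpfProgram compile(String source, CompilerOptions options) {
        // Outline of the stages, in order:
        // 1. Create lexer           -> createLexer(source)
        // 2. Create parser          -> createParser(lexer)
        // 3. Generate AST
        // 4. Convert to IR          -> generateIR(ast)
        // 5. Optimize (optional, only when the optimization level > 0)
        // 6. Validate
        // 7. Generate BPF program   -> generateProgram(ir)
    }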
}
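
The fixed ordering of stages is a template-method pattern: the base class owns the pipeline and each dialect fills in the hooks. Below is a minimal, self-contained sketch of that idea. `MiniCompiler`, `WhitespaceDialect`, `tokenize`, `parse`, `toIr`, and the `trace` list are hypothetical stand-ins invented for illustration; they are not JNetRuntime classes, and the real stages do far more work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical miniature of the pipeline; not the JNetRuntime API. */
abstract class MiniCompiler {
    /** Records which stages ran, purely for illustration. */
    final List<String> trace = new ArrayList<>();

    // Dialect-specific hooks (stand-ins for createLexer/createParser/generateIR).
    protected abstract List<String> tokenize(String source);
    protected abstract String parse(List<String> tokens);
    protected abstract List<String> toIr(String ast);

    /** Fixed pipeline: final, so subclasses cannot reorder the stages. */
    public final List<String> compile(String source, int optimizationLevel) {
        List<String> tokens = tokenize(source);
        trace.add("lex");
        String ast = parse(tokens);
        trace.add("parse");
        List<String> ir = toIr(ast);
        trace.add("ir");
        if (optimizationLevel > 0) {
            trace.add("optimize");   // optional stage; a real optimizer would rewrite ir here
        }
        if (ir.isEmpty()) {
            throw new IllegalStateException("empty program");
        }
        trace.add("validate");
        trace.add("codegen");        // a real compiler would emit bytecode here
        return ir;
    }
}

/** Toy dialect: tokens are whitespace-separated words. */
class WhitespaceDialect extends MiniCompiler {
    @Override protected List<String> tokenize(String source) {
        return Arrays.asList(source.trim().split("\\s+"));
    }
    @Override protected String parse(List<String> tokens) {
        return String.join(" ", tokens);
    }
    @Override protected List<String> toIr(String ast) {
        return Arrays.asList("match " + ast, "ret");
    }
}
```

Marking `compile` as `final` in the sketch is a design choice, not something the source confirms: it keeps every dialect on the same stage order while still letting each one supply its own lexer, parser, and IR generator.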

Compilation Pipeline

1. Lexical Analysis

Each dialect implements its own lexer:

protected abstract Lexer<T> createLexer(String source) throws CompilerException;

PCAP/TCPDump Lexer

class PcapLexer extends Lexer<PcapToken> {
    // Tokens: proto, port, host, and, or, not, numbers, etc.
    // Example: "tcp port 80 and not broadcast"
}
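
To make the token categories concrete, here is a rough sketch of how the example expression could be split into tokens. `PcapLexerSketch`, its `Kind` enum, and the keyword set are assumptions made for this example; the actual `PcapLexer` and `PcapToken` types may be organized quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical whitespace-driven lexer for a tiny subset of the pcap filter syntax. */
final class PcapLexerSketch {
    enum Kind { KEYWORD, NUMBER, IDENTIFIER }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    // Assumed keyword subset; the real dialect has many more.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "tcp", "udp", "ip", "port", "host", "and", "or", "not", "broadcast"));

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        for (String word : source.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;                                   // blank input yields no tokens
            }
            if (KEYWORDS.contains(word)) {
                tokens.add(new Token(Kind.KEYWORD, word));
            } else if (word.chars().allMatch(Character::isDigit)) {
                tokens.add(new Token(Kind.NUMBER, word));
            } else {
                tokens.add(new Token(Kind.IDENTIFIER, word)); // e.g. host names or addresses
            }
        }
        return tokens;
    }
}
```

With this sketch, `PcapLexerSketch.tokenize("tcp port 80 and not broadcast")` yields five keyword tokens and one number token (`80`).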

Wireshark Lexer

class WiresharkLexer extends Lexer<WiresharkToken> {
    // Tokens: field.subfield, ==, !=, &&, ||, strings, numbers, etc.
    // Example: "http.request.method == "GET" && ip.addr != 10.0.0.0/8"
}
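
Display-filter syntax cannot be split on whitespace alone, because string literals and multi-character operators have to be recognized as units. One possible approach, sketched here with `java.util.regex`, tries one alternative per token class at the current position. The class name, the group names, and the exact patterns are illustrative assumptions and cover only the constructs in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based lexer for a small subset of display-filter syntax. */
final class DisplayFilterLexerSketch {
    // One named group per token class; alternatives are tried in this order.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:(?<STRING>\"[^\"]*\")"
            + "|(?<OP>==|!=|&&|\\|\\|)"
            + "|(?<FIELD>[A-Za-z_][A-Za-z0-9_.]*)"
            + "|(?<VALUE>[0-9][0-9A-Fa-f.:/x]*))");

    private static final String[] GROUPS = {"STRING", "OP", "FIELD", "VALUE"};

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                if (source.substring(pos).trim().isEmpty()) {
                    break;                                  // only trailing whitespace remains
                }
                throw new IllegalArgumentException(
                        "Unexpected input at or after offset " + pos);
            }
            for (String group : GROUPS) {
                if (m.group(group) != null) {
                    tokens.add(group + ":" + m.group(group));
                }
            }
            pos = m.end();
        }
        return tokens;
    }
}
```

For the example expression, this sketch produces seven tokens: a field, `==`, the quoted string, `&&`, a second field, `!=`, and the CIDR value.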

NTPL Lexer

class NtplLexer extends Lexer<NtplToken> {
    // Tokens: Layer3Protocol, IPv4, TCP, [field], ==, AND, OR, etc.
    // Example: "Layer3Protocol == IPv4 AND TCP[DstPort] == 80"
}

2. Parsing

Each dialect implements its own parser:

protected abstract Parser<T, N> createParser(Lexer<T> lexer) throws CompilerException;

Parser Implementation by Dialect

PCAP Parser

class PcapParser extends Parser<PcapToken, PcapASTNode> {
    // Grammar rules:
    // expression → primitive ((and|or) primitive)*
    // primitive → [not] (proto|port|host) value
}

Wireshark Parser

class WiresharkParser extends Parser<WiresharkToken, WiresharkASTNode> {
    // Grammar rules:
    // expression → field op value ((and|or) expression)*
    // field → proto.field(.subfield)?
}

NTPL Parser

class NtplParser extends Parser<NtplToken, NtplASTNode> {
    // Grammar rules:
    // expression → proto op value ((AND|OR) expression)*
    // proto → Layer[2-4]Protocol | proto[field]
}
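
Grammars of this shape map naturally onto a recursive-descent parser, with one method per rule. The sketch below follows the two PCAP rules literally (a qualifier `proto`, `port`, or `host` followed by a value) and prints the tree as an S-expression. `PcapParserSketch` and its string-based AST are hypothetical simplifications; the real parser consumes `PcapToken` objects and builds `PcapASTNode` instances.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical recursive-descent parser for the two PCAP grammar rules. */
final class PcapParserSketch {
    private static final List<String> QUALIFIERS = Arrays.asList("proto", "port", "host");

    private final List<String> tokens;
    private int pos = 0;

    private PcapParserSketch(List<String> tokens) { this.tokens = tokens; }

    /** Parses the source and returns the AST rendered as an S-expression. */
    static String parse(String source) {
        PcapParserSketch p = new PcapParserSketch(Arrays.asList(source.trim().split("\\s+")));
        String ast = p.expression();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.peek());
        }
        return ast;
    }

    // expression → primitive ((and|or) primitive)*   (left-associative, no precedence)
    private String expression() {
        String left = primitive();
        while ("and".equals(peek()) || "or".equals(peek())) {
            String op = next();
            left = "(" + op + " " + left + " " + primitive() + ")";
        }
        return left;
    }

    // primitive → [not] (proto|port|host) value
    private String primitive() {
        boolean negated = "not".equals(peek());
        if (negated) {
            next();
        }
        String qualifier = next();
        if (!QUALIFIERS.contains(qualifier)) {
            throw new IllegalArgumentException("Expected proto, port or host but found: " + qualifier);
        }
        String node = "(" + qualifier + " " + next() + ")";
        return negated ? "(not " + node + ")" : node;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of expression");
        }
        return tokens.get(pos++);
    }
}
```

For example, `PcapParserSketch.parse("port 80 and not host 10.0.0.1")` returns `(and (port 80) (not (host 10.0.0.1)))`.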

3. IR Generation

Convert AST to BPF Intermediate Representation:

protected abstract BpfIR generateIR(N ast) throws CompilerException;

IR Generation Process

1. AST Traversal
   ├── Visit each node
   ├── Generate IR instructions
   └── Link instructions

2. IR Structure
   ├── Basic blocks
   ├── Control flow
   └── Data flow
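
The traversal and linking steps can be illustrated with a common technique for boolean filters: each test becomes a basic block that ends in a two-way branch, and `and`, `or`, and `not` are expressed purely by how the branch targets are wired. The following is a minimal sketch under that assumption. `IrSketch`, `Node`, and the textual block format are invented for illustration and say nothing about how `BpfIR` is actually structured.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical IR builder: one basic block per test, linked by branch targets. */
final class IrSketch {
    /** Emitted blocks, in emission order. */
    final List<String> blocks = new ArrayList<>();

    /** An AST node emits its blocks and returns the label of its entry block. */
    interface Node {
        String emit(IrSketch ir, String onTrue, String onFalse);
    }

    /** Leaf: a single test that branches to onTrue or onFalse. */
    static Node test(String condition) {
        return (ir, onTrue, onFalse) -> {
            String label = "B" + ir.blocks.size();
            ir.blocks.add(label + ": if " + condition + " goto " + onTrue + " else " + onFalse);
            return label;
        };
    }

    /** AND: the right operand is emitted first so the left can jump to its entry on success. */
    static Node and(Node left, Node right) {
        return (ir, onTrue, onFalse) -> left.emit(ir, right.emit(ir, onTrue, onFalse), onFalse);
    }

    /** OR: the left operand falls through to the right operand's entry on failure. */
    static Node or(Node left, Node right) {
        return (ir, onTrue, onFalse) -> left.emit(ir, onTrue, right.emit(ir, onTrue, onFalse));
    }

    /** NOT: no block of its own, it only swaps the branch targets. */
    static Node not(Node operand) {
        return (ir, onTrue, onFalse) -> operand.emit(ir, onFalse, onTrue);
    }
}
```

Emitting `and(test("port 80"), not(test("host 10.0.0.1")))` with targets `ACCEPT` and `REJECT` produces two blocks: `B0` tests the host with swapped targets, and the entry block `B1` tests the port and jumps to `B0` on success.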

4. Optimization

The optimizer runs only when the optimization level is greater than 0:

if (options.getOptimizationLevel() > 0) {
    ir.optimize();
}

Optimization Levels

Level 0: No optimization
Level 1: Basic optimizations
         ├── Constant folding
         ├── Dead code elimination
         └── Jump optimization

Level 2: Advanced optimizations
         ├── Instruction combining
         ├── Redundancy elimination
         └── Control flow optimization
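
As an illustration of one Level 1 pass, dead code elimination over a block graph can be done with a reachability walk from the entry block: anything the walk never visits is removed. This is a generic sketch of that technique, not the JNetRuntime optimizer. `DeadBlockElimination`, the successor-map representation, and the `level` parameter are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical dead-block elimination over a label -> successors map. */
final class DeadBlockElimination {

    /** Returns only the blocks reachable from entry, preserving their original order. */
    static Map<String, List<String>> optimize(
            Map<String, List<String>> successors, String entry, int level) {
        if (level <= 0) {
            return successors;                       // Level 0: leave the IR untouched
        }
        Set<String> reachable = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            String label = work.pop();
            if (!reachable.add(label)) {
                continue;                            // already visited
            }
            List<String> next = successors.getOrDefault(label, Collections.<String>emptyList());
            for (String target : next) {
                work.push(target);
            }
        }
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : successors.entrySet()) {
            if (reachable.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

Given blocks `B0`, `B1`, `B2`, `ACCEPT`, and `REJECT` where nothing jumps to `B2`, a call with `level` 1 drops `B2` and keeps the rest in order, while `level` 0 returns the map unchanged.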

5. Code Generation

Convert IR to BPF bytecode:

protected abstract BpfProgram generateProgram(BpfIR ir) throws CompilerException;
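
To show what the output of this stage can look like, the sketch below hand-assembles a four-instruction classic BPF program that accepts IPv4-over-Ethernet frames, then runs it through a tiny interpreter. It assumes the target is classic BPF with the usual `code`/`jt`/`jf`/`k` instruction layout and the opcode values from `<linux/filter.h>`. `CbpfSketch`, `Insn`, and `run` are hypothetical; how `BpfProgram` stores its instructions is not stated in this guide.

```java
/** Hypothetical classic-BPF encoding sketch with a three-opcode interpreter. */
final class CbpfSketch {
    // Opcode fields as defined for classic BPF.
    static final int BPF_LD = 0x00, BPF_JMP = 0x05, BPF_RET = 0x06;
    static final int BPF_H = 0x08, BPF_ABS = 0x20, BPF_JEQ = 0x10, BPF_K = 0x00;

    static final class Insn {
        final int code, jt, jf;
        final long k;
        Insn(int code, int jt, int jf, long k) {
            this.code = code; this.jt = jt; this.jf = jf; this.k = k;
        }
    }

    static Insn stmt(int code, long k) { return new Insn(code, 0, 0, k); }
    static Insn jump(int code, long k, int jt, int jf) { return new Insn(code, jt, jf, k); }

    /** Accepts frames whose EtherType (offset 12) is 0x0800, rejects everything else. */
    static Insn[] ipv4Filter() {
        return new Insn[] {
            stmt(BPF_LD | BPF_H | BPF_ABS, 12),              // A = halfword at offset 12
            jump(BPF_JMP | BPF_JEQ | BPF_K, 0x0800, 0, 1),   // A == 0x0800 ? next : skip one
            stmt(BPF_RET | BPF_K, 0x40000),                  // accept, up to 262144 bytes
            stmt(BPF_RET | BPF_K, 0),                        // reject
        };
    }

    /** Interprets only the three opcodes used above; returns the accepted length (0 = reject). */
    static long run(Insn[] program, byte[] packet) {
        long a = 0;
        for (int pc = 0; pc < program.length; pc++) {
            Insn insn = program[pc];
            if (insn.code == (BPF_LD | BPF_H | BPF_ABS)) {
                int offset = (int) insn.k;
                if (offset + 2 > packet.length) {
                    return 0;                                // out-of-bounds load rejects the packet
                }
                a = ((packet[offset] & 0xFF) << 8) | (packet[offset + 1] & 0xFF);
            } else if (insn.code == (BPF_JMP | BPF_JEQ | BPF_K)) {
                pc += (a == insn.k) ? insn.jt : insn.jf;     // offsets are relative to the next insn
            } else if (insn.code == (BPF_RET | BPF_K)) {
                return insn.k;
            } else {
                throw new IllegalArgumentException("Unsupported opcode: " + insn.code);
            }
        }
        return 0;
    }
}
```

Jump offsets in classic BPF are relative to the following instruction, which is why the loop's own `pc++` combined with `jt = 0` lands on the accept instruction and `jf = 1` lands on the reject instruction.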

Dialect-Specific Implementations

Each concrete compiler wires in its dialect's lexer and parser. The generateIR and generateProgram overrides, which the abstract base class also requires, are omitted from these listings.

1. PCAP/TCPDump Compiler

public class PcapCompiler extends AbstractBpfCompiler<PcapToken, PcapASTNode> {
    @Override
    protected Lexer<PcapToken> createLexer(String source) {
        return new PcapLexer(source);
    }
    
    @Override
    protected Parser<PcapToken, PcapASTNode> createParser(Lexer<PcapToken> lexer) {
        return new PcapParser(lexer);
    }
}

2. Wireshark Compiler

public class WiresharkCompiler extends AbstractBpfCompiler<WiresharkToken, WiresharkASTNode> {
    @Override
    protected Lexer<WiresharkToken> createLexer(String source) {
        return new WiresharkLexer(source);
    }
    
    @Override
    protected Parser<WiresharkToken, WiresharkASTNode> createParser(Lexer<WiresharkToken> lexer) {
        return new WiresharkParser(lexer);
    }
}

3. NTPL Compiler

public class NtplCompiler extends AbstractBpfCompiler<NtplToken, NtplASTNode> {
    @Override
    protected Lexer<NtplToken> createLexer(String source) {
        return new NtplLexer(source);
    }
    
    @Override
    protected Parser<NtplToken, NtplASTNode> createParser(Lexer<NtplToken> lexer) {
        return new NtplParser(lexer);
    }
}

Usage Examples

Basic Compilation

// Choose dialect
BpfCompiler<PcapToken, PcapASTNode> compiler = new PcapCompiler();

// Compile with default options
BpfProgram program = compiler.compile("tcp port 80");

// Compile with specific options
CompilerOptions options = new CompilerOptions()
    .setOptimizationLevel(2)
    .setDebug(true);
BpfProgram optimizedProgram = compiler.compile("tcp port 80", options);

Multi-Dialect Support

// Use the factory to select a dialect
BpfCompiler<?, ?> pcapCompiler = BpfCompiler.forDialect(CompilerDialect.PCAP);
BpfCompiler<?, ?> wiresharkCompiler = BpfCompiler.forDialect(CompilerDialect.WIRESHARK);
BpfCompiler<?, ?> ntplCompiler = BpfCompiler.forDialect(CompilerDialect.NTPL);
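
One plausible way to back a factory like this is a registry that maps each dialect constant to a supplier of compiler instances. The sketch below shows that pattern in isolation. `DialectFactorySketch`, its nested `Dialect` enum, and the one-method `Compiler` interface are hypothetical and much simpler than the real `BpfCompiler` and `CompilerDialect` types.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical registry-backed factory; not the JNetRuntime implementation. */
final class DialectFactorySketch {
    enum Dialect { PCAP, WIRESHARK, NTPL }

    /** Stand-in for a compiler; only reports which dialect it handles. */
    interface Compiler {
        String dialectName();
    }

    private static final Map<Dialect, Supplier<Compiler>> REGISTRY = new EnumMap<>(Dialect.class);

    static {
        // Each supplier creates a compiler for one dialect.
        REGISTRY.put(Dialect.PCAP, () -> () -> "pcap");
        REGISTRY.put(Dialect.WIRESHARK, () -> () -> "wireshark");
        REGISTRY.put(Dialect.NTPL, () -> () -> "ntpl");
    }

    static Compiler forDialect(Dialect dialect) {
        Supplier<Compiler> supplier = REGISTRY.get(dialect);
        if (supplier == null) {
            throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
        return supplier.get();
    }
}
```

Keeping the mapping in one table means a new dialect needs only one extra registry entry, and callers never name a concrete compiler class.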

Error Handling

try {
    BpfProgram program = compiler.compile(expression);
} catch (LexicalException e) {
    // Handle lexical errors
} catch (SyntaxException e) {
    // Handle parsing errors
} catch (SemanticException e) {
    // Handle semantic errors
} catch (CompilerException e) {
    // Handle general compilation errors
}
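
The order of the catch clauses matters if, as the ordering above suggests, the specific exceptions are subclasses of `CompilerException`: the most specific handlers must come first, with the general one last as a fallback. The sketch below demonstrates this with a hypothetical exception hierarchy and a toy `compile` method. The class relationships and the trigger conditions are assumptions made for illustration only.

```java
/** Hypothetical exception hierarchy showing specific-before-general catch ordering. */
final class ErrorHandlingSketch {
    static class CompilerException extends Exception {
        CompilerException(String message) { super(message); }
    }
    static class LexicalException extends CompilerException {
        LexicalException(String message) { super(message); }
    }
    static class SyntaxException extends CompilerException {
        SyntaxException(String message) { super(message); }
    }
    static class SemanticException extends CompilerException {
        SemanticException(String message) { super(message); }
    }

    /** Toy compile step: each stage is reduced to a single made-up check. */
    static void compile(String expression) throws CompilerException {
        if (expression.trim().isEmpty()) {
            throw new CompilerException("empty expression");
        }
        if (expression.contains("@")) {
            throw new LexicalException("illegal character '@'");
        }
        if (expression.trim().endsWith("and")) {
            throw new SyntaxException("dangling 'and'");
        }
        if (expression.contains("port 70000")) {
            throw new SemanticException("port out of range");
        }
    }

    /** Maps each failure to the stage that reported it. */
    static String classify(String expression) {
        try {
            compile(expression);
            return "ok";
        } catch (LexicalException e) {
            return "lexical: " + e.getMessage();
        } catch (SyntaxException e) {
            return "syntax: " + e.getMessage();
        } catch (SemanticException e) {
            return "semantic: " + e.getMessage();
        } catch (CompilerException e) {       // fallback; must come after the subclasses
            return "compiler: " + e.getMessage();
        }
    }
}
```

In Java, placing the `CompilerException` clause before its subclasses would be a compile-time error, since the later clauses could never be reached.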