
Go Target: Parser Fails at EOF Expecting Different Token (Minimal Example) #4813

Open
aprice2704 opened this issue Apr 7, 2025 · 7 comments

Comments

@aprice2704

(Composed for me by Gemini 4.5 Pro Experimental; hope it is ok.) ANTLR has helped us out a couple of times, but we can't get past this :(


Environment:

  • ANTLR Tool Version: 4.13.2 (from antlr4-4.13.2-complete.jar)
  • Target Language: Go
  • Go ANTLR Runtime: github.com/antlr4-go/antlr/v4 v4.13.1 (the latest version available via go get at the time; see the follow-up comment below)
  • Go Version: go version go1.24.1 linux/amd64

Description:

We are encountering a persistent parsing error with the ANTLR Go target where the parser fails upon reaching the EOF token, incorrectly expecting a different token based on the preceding rule structure. This occurs even with a highly simplified grammar designed to parse just one line followed by EOF.

Minimal Reproducible Example:

  1. Grammar (NeuroDataChecklist.g4 v0.0.37):

    // Grammar for parsing simple Markdown-style checklists.
    // ULTRA-SIMPLIFIED FOR DEBUGGING: Only one itemLine followed by EOF.
    // Version: 0.0.37
    grammar NeuroDataChecklist;
    
    // --- PARSER RULES ---
    checklistFile : itemLine EOF ;
    itemLine      : ITEM_LINE_CONTENT NEWLINE+ ;
    
    // --- LEXER RULES ---
    WS              : [ \t]+             -> channel(HIDDEN) ;
    ITEM_LINE_CONTENT : '-' [ \t]* '[' [ \t]* [xX ] [ \t]* ']' ( [ \t]+ ~[\r\n]+ )? ;
    NEWLINE         : ( '\r'? '\n' | '\r' )+ ;
    // ENDOFLIST (kept lexer rule from prior test, but not used in parser rule)
    ENDOFLIST       : '---ENDOFLIST---' ;
  2. Go Test Code (checklist_test.go):

    // pkg/neurodata/checklist/checklist_test.go
    package checklist
    
    import (
    	"reflect"
    	"testing"
    )
    
    // Minimal valid input string for grammar v0.0.37 (Single Line Test)
    const minimalValidInput = `- [ ] Task 1
    ` // NOTE: Only one line, trailing newline required by itemLine rule.
    
    // TestParseChecklistContent simplified for happy path
    func TestParseChecklistContent(t *testing.T) {
    
    	// Expected result for the single-line input
    	want := []map[string]interface{}{
    		{"text": "Task 1", "status": "pending"},
    	}
    
    	t.Run("Ultra_Simple_Single_Line_Test", func(t *testing.T) {
    		// Parse the input string
    		// (Keep debug prints from checklist.go enabled during testing)
    		got, err := ParseChecklistContent(minimalValidInput) // Assumes ParseChecklistContent exists
    
    		// Check for unexpected errors
    		if err != nil {
    			t.Errorf("ParseChecklistContent() returned unexpected error = %v", err)
    			t.Logf("Input:\n---\n%s\n---", minimalValidInput)
    			return // Stop test on error
    		}
    
    		// Check result (Listener logic would populate 'got', test checks if it matches 'want')
    		if !reflect.DeepEqual(got, want) {
    			t.Errorf("ParseChecklistContent() mismatch:")
    			t.Logf("  Input:\n---\n%s\n---", minimalValidInput)
    			t.Logf("  Got : %#v", got)  // Actual 'got' depends on listener implementation
    			t.Logf("  Want: %#v", want)
    		} else {
    			t.Logf("ParseChecklistContent() Success. Got: %#v", got)
    		}
    	})
    }
  3. Go Parser Invocation (checklist.go - relevant function):

    // Relevant parts of ParseChecklistContent function from checklist.go
    package checklist
    
    import (
    	"fmt"
    	"strings"
        // ... other imports: antlr, generated, regexp etc.
    	"github.com/antlr4-go/antlr/v4"
    	generated "github.com/aprice2704/neuroscript/pkg/neurodata/checklist/generated" // Adjust import path
    )
    
    // Custom error listener (standard implementation)
    type checklistErrorListener struct {
    	*antlr.DefaultErrorListener
    	Errors     []string
    	SourceName string
    }
    func newChecklistErrorListener(sourceName string) *checklistErrorListener { /* ... */ }
    func (l *checklistErrorListener) SyntaxError(recognizer antlr.Recognizer, offendingSymbol interface{}, line, column int, msg string, e antlr.RecognitionException) { /* ... appends error */ }
    
    
    // Simplified ParseChecklistContent focusing on parser setup
    func ParseChecklistContent(content string) ([]map[string]interface{}, error) {
    	fmt.Println("\n--- [DEBUG CL Parser] Starting ParseChecklistContent ---")
    	inputStream := antlr.NewInputStream(content)
    	lexer := generated.NewNeuroDataChecklistLexer(inputStream)
    	lexerErrorListener := newChecklistErrorListener("lexer")
    	lexer.RemoveErrorListeners()
    	lexer.AddErrorListener(lexerErrorListener)
    
    	tokenStream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
    	tokenStream.Fill() // Generate all tokens
    
        // --- Token Debug Printing ---
    	tokens := tokenStream.GetAllTokens()
    	fmt.Printf("--- [DEBUG CL Parser] Lexer Tokens (%d total) ---\n", len(tokens))
    	// (Loop to print token details as shown in test output)
        // ... token printing loop ...
    	fmt.Println("--- [DEBUG CL Parser] End Lexer Tokens ---")
    	tokenStream.Reset() // Reset stream for parser
    
    	parser := generated.NewNeuroDataChecklistParser(tokenStream)
    	parserErrorListener := newChecklistErrorListener("parser")
    	parser.RemoveErrorListeners()
    	parser.AddErrorListener(parserErrorListener)
    
    	// *** PARSE EXECUTION ***
    	tree := parser.ChecklistFile() // Call the start rule
    
    	allErrors := append(lexerErrorListener.Errors, parserErrorListener.Errors...)
    	if len(allErrors) > 0 {
    		fmt.Println("--- [DEBUG CL Parser] PARSE FAILED ---")
    		errorString := strings.Join(allErrors, "\n        ")
    		return nil, fmt.Errorf("syntax error(s) parsing checklist:\n        %s", errorString)
    	}
    
    	fmt.Println("--- [DEBUG CL Parser] PARSE OK, walking tree... ---")
        // Listener logic would go here if parse succeeded
        // listener := newChecklistListener(parser)
        // antlr.ParseTreeWalkerDefault.Walk(listener, tree)
        // ... process listener.Items ...
    	return make([]map[string]interface{}, 0), nil // Dummy success return
    }

    (Note: Listener details omitted as the error occurs during parsing, before the listener walk)

  4. Test Output (go test -v .):

    === RUN   TestParseChecklistContent
    === RUN   TestParseChecklistContent/Ultra_Simple_Single_Line_Test
    
    --- [DEBUG CL Parser] Starting ParseChecklistContent ---
    --- [DEBUG CL Parser] Lexer Tokens (3 total) ---
      [0] Type=ITEM_LINE_CONTENT (2), Text="- [ ] Task 1", Line=1, Col=0, Channel=0
      [1] Type=NEWLINE (4), Text="\n", Line=1, Col=12, Channel=0
      [2] Type=EOF (-1), Text="<EOF>", Line=2, Col=0, Channel=0
    --- [DEBUG CL Parser] End Lexer Tokens ---
    --- [DEBUG CL Parser] PARSE FAILED ---
        checklist_test.go:31: ParseChecklistContent() returned unexpected error = syntax error(s) parsing checklist:
                parser:2:0: mismatched input '<EOF>' expecting ITEM_LINE_CONTENT
        checklist_test.go:32: Input:
            ---
            - [ ] Task 1
    
            ---
    --- FAIL: TestParseChecklistContent (0.00s)
        --- FAIL: TestParseChecklistContent/Ultra_Simple_Single_Line_Test (0.00s)
    FAIL
    FAIL    github.com/aprice2704/neuroscript/pkg/neurodata/checklist       0.005s
    ?       github.com/aprice2704/neuroscript/pkg/neurodata/checklist/generated     [no test files]
    FAIL
    

Observed Behavior:

The lexer correctly tokenizes the input string - [ ] Task 1\n into ITEM_LINE_CONTENT, NEWLINE, EOF.
The parser rule checklistFile : itemLine EOF ; successfully consumes the itemLine (ITEM_LINE_CONTENT NEWLINE+).
However, when the parser then encounters the EOF token (at Line 2, Col 0), it reports an error: mismatched input '<EOF>' expecting ITEM_LINE_CONTENT.

Expected Behavior:

According to the grammar rule checklistFile : itemLine EOF ;, after successfully parsing itemLine, the parser should expect and successfully match the EOF token, resulting in a successful parse.

Debugging Steps Attempted:

We arrived at this minimal example after trying several variations on a more complex grammar intended to parse multiple checklist items:

  • Using itemLine+ EOF
  • Using itemLine (itemLine)* EOF
  • Introducing an explicit ENDOFLIST token before EOF (... ENDOFLIST EOF)
  • Allowing optional NEWLINE tokens around ENDOFLIST (... NEWLINE* ENDOFLIST NEWLINE* EOF)

All variations exhibited the same fundamental failure mode: the parser correctly consumed the structure before the EOF (or the element immediately preceding EOF) but then failed upon seeing EOF, incorrectly expecting ITEM_LINE_CONTENT instead.

This suggests a potential issue in the Go target's prediction or state handling when approaching the end of the input stream for these grammar structures.
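
For completeness, here is a minimal sketch that prints the generated lexer's symbolic-name table alongside the tokens it actually emits for the minimal input, so the numeric token types in the debug output above can be compared against the current grammar. It assumes only the generated package import path from the repro; apart from the lexer's standard NextToken() call, every API used here already appears in checklist.go. The file name is hypothetical.

    // tokendump.go (hypothetical file name) - debugging sketch, not part of the repro itself.
    package main
    
    import (
    	"fmt"
    
    	"github.com/antlr4-go/antlr/v4"
    	generated "github.com/aprice2704/neuroscript/pkg/neurodata/checklist/generated" // Adjust import path
    )
    
    func main() {
    	lexer := generated.NewNeuroDataChecklistLexer(antlr.NewInputStream("- [ ] Task 1\n"))
    
    	// Print the token-type table the generated lexer was built with.
    	for i, name := range lexer.GetSymbolicNames() {
    		fmt.Printf("token type %d = %s\n", i, name)
    	}
    
    	// Then print what the lexer actually emits for the minimal input.
    	for {
    		tok := lexer.NextToken()
    		fmt.Printf("emitted: type=%d text=%q\n", tok.GetTokenType(), tok.GetText())
    		if tok.GetTokenType() == antlr.TokenEOF {
    			break
    		}
    	}
    }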

@kaby76
Contributor

kaby76 commented Apr 7, 2025

I am not getting any parse error.

$ trgen -t Go
CSharp  NeuroDataChecklist.g4 success 0.0361382
Dependency graph of grammar:
2 vertices, 1 edges
NeuroDataChecklistParser: NeuroDataChecklistParser->NeuroDataChecklistLexer
NeuroDataChecklistLexer:

Top-level grammars NeuroDataChecklistLexer NeuroDataChecklistParser
Start rule checklistFile
Rendering template file from Go/st.build.ps1 to ./Generated-Go/build.ps1
Rendering template file from Go/st.build.sh to ./Generated-Go/build.sh
Rendering template file from Go/st.clean.ps1 to ./Generated-Go/clean.ps1
Rendering template file from Go/st.clean.sh to ./Generated-Go/clean.sh
Rendering template file from Go/st.go.mod to ./Generated-Go/go.mod
Rendering template file from Go/st.makefile to ./Generated-Go/makefile
Rendering template file from Go/st.perf.sh to ./Generated-Go/perf.sh
Rendering template file from Go/st.run.ps1 to ./Generated-Go/run.ps1
Rendering template file from Go/st.run.sh to ./Generated-Go/run.sh
Rendering template file from Go/st.test.ps1 to ./Generated-Go/test.ps1
Rendering template file from Go/st.test.sh to ./Generated-Go/test.sh
Rendering template file from Go/Test.go.st to ./Generated-Go/Test.go
Copying template file from C:/msys64/home/Kenne/issues/a4-4813/NeuroDataChecklist.g4 to ./Generated-Go/parser/NeuroDataChecklist.g4
04/07-06:42:43 ~/issues/a4-4813
$ cd Generated-Go/
04/07-06:42:46 ~/issues/a4-4813/Generated-Go
$ make
bash build.sh
go: github.com/antlr4-go/antlr/[email protected] requires go >= 1.22; switching to go1.23.8
go: upgraded go 1.20 => 1.22
go: added toolchain go1.23.8
go: added github.com/antlr4-go/antlr/v4 v4.13.1
go: added golang.org/x/exp v0.0.0-20240506185415-9bf2ced13842
04/07-06:43:05 ~/issues/a4-4813/Generated-Go
$ ./Test.exe -tokens
build.ps1  clean.ps1  go.mod     makefile   perf.sh    run.sh     Test.go    test.sh
build.sh   clean.sh   go.sum     parser/    run.ps1    Test.exe   test.ps1
04/07-06:43:05 ~/issues/a4-4813/Generated-Go
$ ./Test.exe -tokens ../Generated-CSharp/in.txt
[@0,0:11='- [ ] Task 1',<2>,1:0]
[@1,12:13='\r\n',<3>,1:12]
[@2,14:13='<EOF>',<-1>,2:0]
Go 0 ../Generated-CSharp/in.txt success 0.000
Total Time: 0.001
04/07-06:44:07 ~/issues/a4-4813/Generated-Go
$ cat ../Generated-CSharp/in.txt
- [ ] Task 1
04/07-06:45:26 ~/issues/a4-4813/Generated-Go
$ cat parser/NeuroDataChecklist.g4
// Grammar for parsing simple Markdown-style checklists.
// ULTRA-SIMPLIFIED FOR DEBUGGING: Only one itemLine followed by EOF.
// Version: 0.0.37
grammar NeuroDataChecklist;

// --- PARSER RULES ---
checklistFile : itemLine EOF ;
itemLine      : ITEM_LINE_CONTENT NEWLINE+ ;

// --- LEXER RULES ---
WS              : [ \t]+             -> channel(HIDDEN) ;
ITEM_LINE_CONTENT : '-' [ \t]* '[' [ \t]* [xX ] [ \t]* ']' ( [ \t]+ ~[\r\n]+ )? ;
NEWLINE         : ( '\r'? '\n' | '\r' )+ ;
// ENDOFLIST (kept lexer rule from prior test, but not used in parser rule)
ENDOFLIST       : '---ENDOFLIST---' ;
04/07-06:46:22 ~/issues/a4-4813/Generated-Go

@jimidle
Collaborator

jimidle commented Apr 7, 2025

Have you tried completely cleaning your output from compiling and regenerating?

@aprice2704
Author

Hey folks! Thanks so much for looking into this, especially so quickly :D :D ANTLR has been a real help with NeuroScript and has fished me out of the soup a couple of times, so I was surprised when we hit this snag with a very simple format. As an interim measure I am using goyacc for the checklist format for now.

I built a new, empty directory and put the files below in it, along with the 4.13.2 jar. When I ran the _test file, I got the same error as before. Using Go 1.24.1 and github.com/antlr4-go/antlr/v4 v4.13.1 (which seems to be the latest I can get; also, java claims that a bunch of old 4.10 jars I tried are corrupt, btw).

I got the same error, which Gemini says means there must be something different between my Go approach and the trgen one, which succeeded:

dev/bugtest/checklist  $ ./build.sh
Generating parser code...
Running tests...
=== RUN   TestParseChecklistContent
=== RUN   TestParseChecklistContent/Ultra_Simple_Single_Line_Test

--- [DEBUG CL Parser] Starting ParseChecklistContent ---
--- [DEBUG CL Parser] Lexer Tokens (3 total) ---
  [0] Type=ITEM_LINE_CONTENT (2), Text="- [ ] Task 1", Line=1, Col=0, Channel=0
  [1] Type=NEWLINE (3), Text="\n", Line=1, Col=12, Channel=0
  [2] Type=EOF (-1), Text="<EOF>", Line=2, Col=0, Channel=0
--- [DEBUG CL Parser] End Lexer Tokens ---
--- [DEBUG CL Parser] PARSE FAILED ---
    checklist_test.go:44: ParseChecklistContent() correctly failed with expected EOF error: syntax error(s) parsing checklist:
                parser:2:0: mismatched input '<EOF>' expecting ITEM_LINE_CONTENT
--- PASS: TestParseChecklistContent (0.00s)
    --- PASS: TestParseChecklistContent/Ultra_Simple_Single_Line_Test (0.00s)
PASS
ok      minimal_antlr_bug/checklist     0.003s

Here are the source files for completeness:

#!/bin/bash

# Ensure ANTLR tool jar is available (e.g., in parent dir or specified path)
# Using 4.13.2 as specified in the original build.md
ANTLR_JAR="antlr4-4.13.2-complete.jar"
# Or use 4.13.1 if testing that specific alignment:
# ANTLR_JAR="../../antlr4-4.13.1-complete.jar"

# Check if JAR exists
if [ ! -f "$ANTLR_JAR" ]; then
    echo "Error: ANTLR JAR not found at $ANTLR_JAR"
    exit 1
fi

echo "Generating parser code..."
# Generate into a 'generated' subdirectory
java -jar "$ANTLR_JAR" -Dlanguage=Go -o generated -visitor -listener -package generated NeuroDataChecklist.g4

echo "Running tests..."
# Fetch dependencies and run tests
go mod tidy
go test -v .
// checklist_test.go
package checklist

import (
	"strings"
	"testing"
)

// Minimal valid input string for grammar v0.0.37 (Single Line Test)
const minimalValidInput = `- [ ] Task 1
` // NOTE: Only one line, trailing newline required by itemLine rule.

// TestParseChecklistContent simplified for happy path
func TestParseChecklistContent(t *testing.T) {

	// Expected result for the single-line input
	// Note: The parse fails BEFORE the listener runs, so the content doesn't matter for the bug itself,
	// but this is what we *would* expect if it worked.
	_ = []map[string]interface{}{
		{"text": "Task 1", "status": "pending"},
	}

	t.Run("Ultra_Simple_Single_Line_Test", func(t *testing.T) {
		// Parse the input string (using the ANTLR-based function)
		got, err := ParseChecklistContent(minimalValidInput)

		// Check for the specific parse error
		if err == nil {
			t.Errorf("ParseChecklistContent() did NOT return an error, expected EOF mismatch error.")
			t.Logf("Input:\n---\n%s\n---", minimalValidInput)
			t.Logf("Got result: %#v", got) // Log what was returned if no error occurred
			return
		}

		// Check if the error message contains the expected mismatch (adjust if needed)
		// Example check: Look for "mismatched input '<EOF>' expecting" or similar
		expectedErrorSubstring := "mismatched input '<EOF>' expecting ITEM_LINE_CONTENT" // Or the "missing" variant
		if !strings.Contains(err.Error(), expectedErrorSubstring) {
			t.Errorf("ParseChecklistContent() returned an unexpected error:")
			t.Logf("  Input:\n---\n%s\n---", minimalValidInput)
			t.Logf("  Got error: %v", err)
			t.Logf("  Expected error containing: %q", expectedErrorSubstring)
		} else {
			t.Logf("ParseChecklistContent() correctly failed with expected EOF error: %v", err)
		}

		// We don't compare 'got' and 'want' here because the parse is expected to fail.
	})
}
// checklist.go
package checklist

import (
	"fmt"
	"regexp"
	"strings"

	"github.com/antlr4-go/antlr/v4"
	// NOTE: Remove core/tool imports if isolating this package
	// "github.com/aprice2704/neuroscript/pkg/core"
	generated "minimal_antlr_bug/checklist/generated" // Adjust import path if needed
)

// --- Custom ANTLR Error Listener ---
type checklistErrorListener struct {
	*antlr.DefaultErrorListener
	Errors     []string
	SourceName string
}

func newChecklistErrorListener(sourceName string) *checklistErrorListener {
	return &checklistErrorListener{Errors: make([]string, 0), SourceName: sourceName}
}
func (l *checklistErrorListener) SyntaxError(recognizer antlr.Recognizer, offendingSymbol interface{}, line, column int, msg string, e antlr.RecognitionException) {
	errorMessage := fmt.Sprintf("%s:%d:%d: %s", l.SourceName, line, column, msg)
	l.Errors = append(l.Errors, errorMessage)
}

// === PARSER LOGIC ===
type ChecklistItem struct{ Text, Status string }

// checklistListener
type checklistListener struct {
	*generated.BaseNeuroDataChecklistListener
	Items     []ChecklistItem
	err       error
	itemRegex *regexp.Regexp
	parser    *generated.NeuroDataChecklistParser
}

func newChecklistListener(parser *generated.NeuroDataChecklistParser) *checklistListener {
	regex := regexp.MustCompile(`^\s*-\s*\[\s*([xX ])\s*\](?:\s*(.*))?\s*$`)
	return &checklistListener{
		Items:     make([]ChecklistItem, 0),
		itemRegex: regex,
		parser:    parser,
	}
}

// EnterItemLine (Needed for v0.0.37 grammar)
func (l *checklistListener) EnterItemLine(ctx *generated.ItemLineContext) {
	// This logic runs if parsing succeeds, extracts item data
	fmt.Printf("[DEBUG CL Listener] === Method EnterItemLine called ===\n")
	if l.err != nil {
		return
	}

	itemToken := ctx.ITEM_LINE_CONTENT()
	if itemToken == nil {
		l.err = fmt.Errorf("internal listener error: ITEM_LINE_CONTENT token missing in itemLine context: %q", ctx.GetText())
		fmt.Printf("[ERROR CL Listener] %v\n", l.err)
		return
	}
	rawLineText := itemToken.GetText()
	lineText := strings.TrimSuffix(rawLineText, "\n")
	lineText = strings.TrimSuffix(lineText, "\r")

	matches := l.itemRegex.FindStringSubmatch(lineText)
	if len(matches) == 3 {
		mark := matches[1]
		text := strings.TrimSpace(matches[2])
		status := "pending"
		if strings.ToLower(mark) == "x" {
			status = "done"
		}
		newItem := ChecklistItem{Text: text, Status: status}
		l.Items = append(l.Items, newItem)
		fmt.Printf("[DEBUG CL Listener]   -> Parsed Item: %+v\n", newItem)
	} else {
		fmt.Printf("[WARN CL Listener] Regex failed for ITEM_LINE_CONTENT token text: %q\n", lineText)
	}
}

// ParseChecklistContent (Entry point for parsing)
func ParseChecklistContent(content string) ([]map[string]interface{}, error) {
	fmt.Println("\n--- [DEBUG CL Parser] Starting ParseChecklistContent ---")
	inputStream := antlr.NewInputStream(content)
	lexer := generated.NewNeuroDataChecklistLexer(inputStream)
	lexerErrorListener := newChecklistErrorListener("lexer")
	lexer.RemoveErrorListeners()
	lexer.AddErrorListener(lexerErrorListener)

	tokenStream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
	tokenStream.Fill()
	tokens := tokenStream.GetAllTokens()
	fmt.Printf("--- [DEBUG CL Parser] Lexer Tokens (%d total) ---\n", len(tokens))
	// (Token printing loop - same as before)
	lexerSymbolicNames := lexer.GetSymbolicNames()
	lexerLiteralNames := lexer.GetLiteralNames()
	for i, token := range tokens {
		tokenType := token.GetTokenType()
		tokenName := fmt.Sprintf("Type(%d)", tokenType)
		if tokenType == antlr.TokenEOF {
			tokenName = "EOF"
		} else if tokenType > 0 {
			if tokenType < len(lexerSymbolicNames) && lexerSymbolicNames[tokenType] != "" {
				tokenName = lexerSymbolicNames[tokenType]
			} else if tokenType < len(lexerLiteralNames) && lexerLiteralNames[tokenType] != "" {
				tokenName = lexerLiteralNames[tokenType]
			}
		}
		fmt.Printf("  [%d] Type=%s (%d), Text=%q, Line=%d, Col=%d, Channel=%d\n", i, tokenName, tokenType, token.GetText(), token.GetLine(), token.GetColumn(), token.GetChannel())
	}
	fmt.Println("--- [DEBUG CL Parser] End Lexer Tokens ---")
	tokenStream.Reset()

	parser := generated.NewNeuroDataChecklistParser(tokenStream)
	parserErrorListener := newChecklistErrorListener("parser")
	parser.RemoveErrorListeners()
	parser.AddErrorListener(parserErrorListener)

	// *** PARSE EXECUTION ***
	// Wrap call to ChecklistFile in a recover block if needed, but main error check is below
	tree := parser.ChecklistFile() // Call start rule

	allErrors := append(lexerErrorListener.Errors, parserErrorListener.Errors...)
	if len(allErrors) > 0 {
		fmt.Println("--- [DEBUG CL Parser] PARSE FAILED ---")
		errorString := strings.Join(allErrors, "\n        ")
		// Return early on parse error
		return nil, fmt.Errorf("syntax error(s) parsing checklist:\n        %s", errorString)
	}

	// --- Parse Tree Walk (Only if parse succeeds) ---
	fmt.Println("--- [DEBUG CL Parser] PARSE OK, walking tree... ---")
	listener := newChecklistListener(parser)
	antlr.ParseTreeWalkerDefault.Walk(listener, tree)

	if listener.err != nil {
		return nil, fmt.Errorf("error during parse tree walk: %w", listener.err)
	}

	// Convert listener's items to result map
	resultMaps := make([]map[string]interface{}, len(listener.Items))
	for i, item := range listener.Items {
		resultMaps[i] = map[string]interface{}{"text": item.Text, "status": item.Status}
	}
	fmt.Printf("--- [DEBUG CL Parser] ParseChecklistContent finished. Found %d items.\n", len(resultMaps))
	return resultMaps, nil
}
// Grammar for parsing simple Markdown-style checklists.
// ULTRA-SIMPLIFIED FOR DEBUGGING: Only one itemLine followed by EOF.
// Version: 0.0.37
grammar NeuroDataChecklist;

// --- PARSER RULES ---
checklistFile : itemLine EOF ;
itemLine      : ITEM_LINE_CONTENT NEWLINE+ ;

// --- LEXER RULES ---
WS              : [ \t]+             -> channel(HIDDEN) ;
ITEM_LINE_CONTENT : '-' [ \t]* '[' [ \t]* [xX ] [ \t]* ']' ( [ \t]+ ~[\r\n]+ )? ;
NEWLINE         : ( '\r'? '\n' | '\r' )+ ;

HTH!

Andy

@kaby76
Contributor

kaby76 commented Apr 8, 2025

Please remove the token printout and the reset code, and test the parse again. TokenStream Reset is deprecated.
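
That is, something like this (just a sketch, assuming it sits in your checklist package so it can reuse the newChecklistErrorListener helper and the generated import; the function name is only illustrative):

// Sketch: parse only -- no Fill(), no GetAllTokens(), no Reset().
// The parser pulls tokens from the stream on demand.
func parseOnly(content string) error {
	lexer := generated.NewNeuroDataChecklistLexer(antlr.NewInputStream(content))
	stream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)

	parser := generated.NewNeuroDataChecklistParser(stream)
	errListener := newChecklistErrorListener("parser")
	parser.RemoveErrorListeners()
	parser.AddErrorListener(errListener)

	parser.ChecklistFile()
	if len(errListener.Errors) > 0 {
		return fmt.Errorf("syntax error(s): %s", strings.Join(errListener.Errors, "; "))
	}
	return nil
}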

@aprice2704
Author

I see the same error with this code:

// checklist.go
package checklist

import (
	"fmt"
	"regexp"
	"strings"

	"github.com/antlr4-go/antlr/v4"
	// NOTE: Remove core/tool imports if isolating this package
	// "github.com/aprice2704/neuroscript/pkg/core"
	generated "minimal_antlr_bug/checklist/generated" // Adjust import path if needed
)

// --- Custom ANTLR Error Listener ---
type checklistErrorListener struct {
	*antlr.DefaultErrorListener
	Errors     []string
	SourceName string
}

func newChecklistErrorListener(sourceName string) *checklistErrorListener {
	return &checklistErrorListener{Errors: make([]string, 0), SourceName: sourceName}
}
func (l *checklistErrorListener) SyntaxError(recognizer antlr.Recognizer, offendingSymbol interface{}, line, column int, msg string, e antlr.RecognitionException) {
	errorMessage := fmt.Sprintf("%s:%d:%d: %s", l.SourceName, line, column, msg)
	l.Errors = append(l.Errors, errorMessage)
}

// === PARSER LOGIC ===
type ChecklistItem struct{ Text, Status string }

// checklistListener
type checklistListener struct {
	*generated.BaseNeuroDataChecklistListener
	Items     []ChecklistItem
	err       error
	itemRegex *regexp.Regexp
	parser    *generated.NeuroDataChecklistParser
}

func newChecklistListener(parser *generated.NeuroDataChecklistParser) *checklistListener {
	regex := regexp.MustCompile(`^\s*-\s*\[\s*([xX ])\s*\](?:\s*(.*))?\s*$`)
	return &checklistListener{
		Items:     make([]ChecklistItem, 0),
		itemRegex: regex,
		parser:    parser,
	}
}

// EnterItemLine (Needed for v0.0.37 grammar)
func (l *checklistListener) EnterItemLine(ctx *generated.ItemLineContext) {
	// This logic runs if parsing succeeds, extracts item data
	fmt.Printf("[DEBUG CL Listener] === Method EnterItemLine called ===\n")
	if l.err != nil {
		return
	}

	itemToken := ctx.ITEM_LINE_CONTENT()
	if itemToken == nil {
		l.err = fmt.Errorf("internal listener error: ITEM_LINE_CONTENT token missing in itemLine context: %q", ctx.GetText())
		fmt.Printf("[ERROR CL Listener] %v\n", l.err)
		return
	}
	rawLineText := itemToken.GetText()
	lineText := strings.TrimSuffix(rawLineText, "\n")
	lineText = strings.TrimSuffix(lineText, "\r")

	matches := l.itemRegex.FindStringSubmatch(lineText)
	if len(matches) == 3 {
		mark := matches[1]
		text := strings.TrimSpace(matches[2])
		status := "pending"
		if strings.ToLower(mark) == "x" {
			status = "done"
		}
		newItem := ChecklistItem{Text: text, Status: status}
		l.Items = append(l.Items, newItem)
		fmt.Printf("[DEBUG CL Listener]   -> Parsed Item: %+v\n", newItem)
	} else {
		fmt.Printf("[WARN CL Listener] Regex failed for ITEM_LINE_CONTENT token text: %q\n", lineText)
	}
}

// ParseChecklistContent (Entry point for parsing)
func ParseChecklistContent(content string) ([]map[string]interface{}, error) {
	fmt.Println("\n--- [DEBUG CL Parser] Starting ParseChecklistContent ---")
	inputStream := antlr.NewInputStream(content)
	lexer := generated.NewNeuroDataChecklistLexer(inputStream)
	lexerErrorListener := newChecklistErrorListener("lexer")
	lexer.RemoveErrorListeners()
	lexer.AddErrorListener(lexerErrorListener)

	tokenStream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
	// tokenStream.Fill() // <-- REMOVED/COMMENTED OUT: Not strictly necessary, parser pulls tokens as needed
	// tokens := tokenStream.GetAllTokens() // <-- REMOVED/COMMENTED OUT
	// fmt.Printf("--- [DEBUG CL Parser] Lexer Tokens (%d total) ---\n", len(tokens)) // <-- REMOVED/COMMENTED OUT
	// // (Token printing loop - same as before)
	// lexerSymbolicNames := lexer.GetSymbolicNames() // <-- REMOVED/COMMENTED OUT
	// lexerLiteralNames := lexer.GetLiteralNames() // <-- REMOVED/COMMENTED OUT
	// // --- START: Token Print Out ---
	// for i, token := range tokens { // <-- ENTIRE LOOP REMOVED/COMMENTED OUT
	// 	tokenType := token.GetTokenType()
	// 	tokenName := fmt.Sprintf("Type(%d)", tokenType)
	// 	if tokenType == antlr.TokenEOF {
	// 		tokenName = "EOF"
	// 	} else if tokenType > 0 {
	// 		if tokenType < len(lexerSymbolicNames) && lexerSymbolicNames[tokenType] != "" {
	// 			tokenName = lexerSymbolicNames[tokenType]
	// 		} else if tokenType < len(lexerLiteralNames) && lexerLiteralNames[tokenType] != "" {
	// 			tokenName = lexerLiteralNames[tokenType]
	// 		}
	// 	}
	// 	fmt.Printf("  [%d] Type=%s (%d), Text=%q, Line=%d, Col=%d, Channel=%d\n", i, tokenName, tokenType, token.GetText(), token.GetLine(), token.GetColumn(), token.GetChannel())
	// }
	// fmt.Println("--- [DEBUG CL Parser] End Lexer Tokens ---") // <-- REMOVED/COMMENTED OUT
	// // --- END: Token Print Out ---
	// tokenStream.Reset() // <-- Reset Code REMOVED/COMMENTED OUT

	// The parser is created directly with the token stream.
	// It will pull tokens from the lexer via the stream as it parses.
	parser := generated.NewNeuroDataChecklistParser(tokenStream)
	parserErrorListener := newChecklistErrorListener("parser")
	parser.RemoveErrorListeners()
	parser.AddErrorListener(parserErrorListener)

	// *** PARSE EXECUTION ***
	// Wrap call to ChecklistFile in a recover block if needed, but main error check is below
	fmt.Println("--- [DEBUG CL Parser] Calling parser.ChecklistFile() ---") // Added for clarity
	tree := parser.ChecklistFile()                                          // Call start rule

	allErrors := append(lexerErrorListener.Errors, parserErrorListener.Errors...)
	if len(allErrors) > 0 {
		fmt.Println("--- [DEBUG CL Parser] PARSE FAILED ---")
		errorString := strings.Join(allErrors, "\n        ")
		// Return early on parse error
		return nil, fmt.Errorf("syntax error(s) parsing checklist:\n        %s", errorString)
	}

	// --- Parse Tree Walk (Only if parse succeeds) ---
	fmt.Println("--- [DEBUG CL Parser] PARSE OK, walking tree... ---")
	listener := newChecklistListener(parser)
	antlr.ParseTreeWalkerDefault.Walk(listener, tree)

	if listener.err != nil {
		return nil, fmt.Errorf("error during parse tree walk: %w", listener.err)
	}

	// Convert listener's items to result map
	resultMaps := make([]map[string]interface{}, len(listener.Items))
	for i, item := range listener.Items {
		resultMaps[i] = map[string]interface{}{"text": item.Text, "status": item.Status}
	}
	fmt.Printf("--- [DEBUG CL Parser] ParseChecklistContent finished. Found %d items.\n", len(resultMaps))
	return resultMaps, nil
}

@jimidle
Collaborator

jimidle commented Apr 8, 2025

The jar you use and the runtime you use need to be the same version, but whatever is happening here, it isn't ANTLR or the Go runtime. Maybe dump the contents of the context var before trying to parse it? But take this down to the minimum by just providing the input string in the code as a hardcoded assignment.
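
For example, a stand-alone driver along these lines (only a sketch; the import path is the one from the minimal repro above, so adjust it to your module) takes the test harness out of the picture entirely. antlr.TreesStringTree is the runtime's LISP-style tree printer:

// main.go (hypothetical) - minimal stand-alone repro with a hardcoded input string.
package main

import (
	"fmt"

	"github.com/antlr4-go/antlr/v4"
	generated "minimal_antlr_bug/checklist/generated" // adjust to your module path
)

func main() {
	input := "- [ ] Task 1\n" // hardcoded: no file I/O, no test harness

	lexer := generated.NewNeuroDataChecklistLexer(antlr.NewInputStream(input))
	stream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
	parser := generated.NewNeuroDataChecklistParser(stream)

	// Keep the default console error listener so any syntax error is
	// printed to stderr, then dump the resulting parse tree.
	tree := parser.ChecklistFile()
	fmt.Println(antlr.TreesStringTree(tree, nil, parser))
}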

@kaby76
Contributor

kaby76 commented Apr 8, 2025

I rewrote the generated app from trgen to be more like your program, simplifying your code even more. It still works. Here's a zip with the program (other.zip); adjust build.sh to what you need. I adjusted the build.sh script to use the 4.13.2 tool with the 4.13.1 runtime, and it still parses just fine. go version => go version go1.23.8 windows/amd64.
