Go Target: Parser Fails at EOF Expecting Different Token (Minimal Example) #4813
Comments
I am not getting any parse error.
|
Have you tried completely cleaning your compiled output and regenerating? |
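(For reference, a minimal sketch of that cleanup, using the directory and file names from the build.sh shown later in this thread:)

rm -rf generated            # drop all previously generated parser code
java -jar antlr4-4.13.2-complete.jar -Dlanguage=Go -o generated -visitor -listener -package generated NeuroDataChecklist.g4
go clean -testcache         # discard cached test results as well
go test -v .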
Hey folks! Thanks so much for looking into this, especially so quickly :D :D ANTLR has been a real help with NeuroScript and has fished me out of the soup a couple of times, so I was surprised when we hit this snag with a very simple format. As an interim measure I am using goyacc for the checklist format for now.

I built a new, empty directory and put the files below in it, along with the 4.13.2 jar. When I ran the _test file, I got the same error as before. I am using go 1.24.1 and github.com/antlr4-go/antlr/v4 v4.13.1 (which seems to be the latest I can get; also, java claims a bunch of old 4.10 jars I tried are corrupt, btw). I got the same error, which Gemini says means there must be something different between my Go approach and the trgen one, which succeeded:

dev/bugtest/checklist $ ./build.sh
Generating parser code...
Running tests...
=== RUN TestParseChecklistContent
=== RUN TestParseChecklistContent/Ultra_Simple_Single_Line_Test
--- [DEBUG CL Parser] Starting ParseChecklistContent ---
--- [DEBUG CL Parser] Lexer Tokens (3 total) ---
[0] Type=ITEM_LINE_CONTENT (2), Text="- [ ] Task 1", Line=1, Col=0, Channel=0
[1] Type=NEWLINE (3), Text="\n", Line=1, Col=12, Channel=0
[2] Type=EOF (-1), Text="<EOF>", Line=2, Col=0, Channel=0
--- [DEBUG CL Parser] End Lexer Tokens ---
--- [DEBUG CL Parser] PARSE FAILED ---
checklist_test.go:44: ParseChecklistContent() correctly failed with expected EOF error: syntax error(s) parsing checklist:
parser:2:0: mismatched input '<EOF>' expecting ITEM_LINE_CONTENT
--- PASS: TestParseChecklistContent (0.00s)
--- PASS: TestParseChecklistContent/Ultra_Simple_Single_Line_Test (0.00s)
PASS
ok minimal_antlr_bug/checklist 0.003s

Here are the source files for completeness:

# build.sh
#!/bin/bash
# Ensure ANTLR tool jar is available (e.g., in parent dir or specified path)
# Using 4.13.2 as specified in original build.md
ANTLR_JAR="antlr4-4.13.2-complete.jar"
# Or use 4.13.1 if testing that specific alignment:
# ANTLR_JAR="../../antlr4-4.13.1-complete.jar"
# Check if JAR exists
if [ ! -f "$ANTLR_JAR" ]; then
echo "Error: ANTLR JAR not found at $ANTLR_JAR"
exit 1
fi
echo "Generating parser code..."
# Generate into a 'generated' subdirectory
java -jar "$ANTLR_JAR" -Dlanguage=Go -o generated -visitor -listener -package generated NeuroDataChecklist.g4
echo "Running tests..."
# Fetch dependencies and run tests
go mod tidy
go test -v .

// checklist_test.go
package checklist
import (
"strings"
"testing"
)
// Minimal valid input string for grammar v0.0.37 (Single Line Test)
const minimalValidInput = `- [ ] Task 1
` // NOTE: Only one line, trailing newline required by itemLine rule.
// TestParseChecklistContent simplified for happy path
func TestParseChecklistContent(t *testing.T) {
// Expected result for the single-line input
// Note: The parse fails BEFORE the listener runs, so the content doesn't matter for the bug itself,
// but this is what we *would* expect if it worked.
_ = []map[string]interface{}{
{"text": "Task 1", "status": "pending"},
}
t.Run("Ultra_Simple_Single_Line_Test", func(t *testing.T) {
// Parse the input string (using the ANTLR-based function)
got, err := ParseChecklistContent(minimalValidInput)
// Check for the specific parse error
if err == nil {
t.Errorf("ParseChecklistContent() did NOT return an error, expected EOF mismatch error.")
t.Logf("Input:\n---\n%s\n---", minimalValidInput)
t.Logf("Got result: %#v", got) // Log what was returned if no error occurred
return
}
// Check if the error message contains the expected mismatch (adjust if needed)
// Example check: Look for "mismatched input '<EOF>' expecting" or similar
expectedErrorSubstring := "mismatched input '<EOF>' expecting ITEM_LINE_CONTENT" // Or the "missing" variant
if !strings.Contains(err.Error(), expectedErrorSubstring) {
t.Errorf("ParseChecklistContent() returned an unexpected error:")
t.Logf(" Input:\n---\n%s\n---", minimalValidInput)
t.Logf(" Got error: %v", err)
t.Logf(" Expected error containing: %q", expectedErrorSubstring)
} else {
t.Logf("ParseChecklistContent() correctly failed with expected EOF error: %v", err)
}
// We don't compare 'got' and 'want' here because the parse is expected to fail.
})
}

// checklist.go
package checklist
import (
"fmt"
"regexp"
"strings"
"github.com/antlr4-go/antlr/v4"
// NOTE: Remove core/tool imports if isolating this package
// "github.com/aprice2704/neuroscript/pkg/core"
generated "minimal_antlr_bug/checklist/generated" // Adjust import path if needed
)
// --- Custom ANTLR Error Listener ---
type checklistErrorListener struct {
*antlr.DefaultErrorListener
Errors []string
SourceName string
}
func newChecklistErrorListener(sourceName string) *checklistErrorListener {
return &checklistErrorListener{Errors: make([]string, 0), SourceName: sourceName}
}
func (l *checklistErrorListener) SyntaxError(recognizer antlr.Recognizer, offendingSymbol interface{}, line, column int, msg string, e antlr.RecognitionException) {
errorMessage := fmt.Sprintf("%s:%d:%d: %s", l.SourceName, line, column, msg)
l.Errors = append(l.Errors, errorMessage)
}
// === PARSER LOGIC ===
type ChecklistItem struct{ Text, Status string }
// checklistListener
type checklistListener struct {
*generated.BaseNeuroDataChecklistListener
Items []ChecklistItem
err error
itemRegex *regexp.Regexp
parser *generated.NeuroDataChecklistParser
}
func newChecklistListener(parser *generated.NeuroDataChecklistParser) *checklistListener {
regex := regexp.MustCompile(`^\s*-\s*\[\s*([xX ])\s*\](?:\s*(.*))?\s*$`)
return &checklistListener{
Items: make([]ChecklistItem, 0),
itemRegex: regex,
parser: parser,
}
}
// EnterItemLine (Needed for v0.0.37 grammar)
func (l *checklistListener) EnterItemLine(ctx *generated.ItemLineContext) {
// This logic runs if parsing succeeds, extracts item data
fmt.Printf("[DEBUG CL Listener] === Method EnterItemLine called ===\n")
if l.err != nil {
return
}
itemToken := ctx.ITEM_LINE_CONTENT()
if itemToken == nil {
l.err = fmt.Errorf("internal listener error: ITEM_LINE_CONTENT token missing in itemLine context: %q", ctx.GetText())
fmt.Printf("[ERROR CL Listener] %v\n", l.err)
return
}
rawLineText := itemToken.GetText()
lineText := strings.TrimSuffix(rawLineText, "\n")
lineText = strings.TrimSuffix(lineText, "\r")
matches := l.itemRegex.FindStringSubmatch(lineText)
if len(matches) == 3 {
mark := matches[1]
text := strings.TrimSpace(matches[2])
status := "pending"
if strings.ToLower(mark) == "x" {
status = "done"
}
newItem := ChecklistItem{Text: text, Status: status}
l.Items = append(l.Items, newItem)
fmt.Printf("[DEBUG CL Listener] -> Parsed Item: %+v\n", newItem)
} else {
fmt.Printf("[WARN CL Listener] Regex failed for ITEM_LINE_CONTENT token text: %q\n", lineText)
}
}
// ParseChecklistContent (Entry point for parsing)
func ParseChecklistContent(content string) ([]map[string]interface{}, error) {
fmt.Println("\n--- [DEBUG CL Parser] Starting ParseChecklistContent ---")
inputStream := antlr.NewInputStream(content)
lexer := generated.NewNeuroDataChecklistLexer(inputStream)
lexerErrorListener := newChecklistErrorListener("lexer")
lexer.RemoveErrorListeners()
lexer.AddErrorListener(lexerErrorListener)
tokenStream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
tokenStream.Fill()
tokens := tokenStream.GetAllTokens()
fmt.Printf("--- [DEBUG CL Parser] Lexer Tokens (%d total) ---\n", len(tokens))
// (Token printing loop - same as before)
lexerSymbolicNames := lexer.GetSymbolicNames()
lexerLiteralNames := lexer.GetLiteralNames()
for i, token := range tokens {
tokenType := token.GetTokenType()
tokenName := fmt.Sprintf("Type(%d)", tokenType)
if tokenType == antlr.TokenEOF {
tokenName = "EOF"
} else if tokenType > 0 {
if tokenType < len(lexerSymbolicNames) && lexerSymbolicNames[tokenType] != "" {
tokenName = lexerSymbolicNames[tokenType]
} else if tokenType < len(lexerLiteralNames) && lexerLiteralNames[tokenType] != "" {
tokenName = lexerLiteralNames[tokenType]
}
}
fmt.Printf(" [%d] Type=%s (%d), Text=%q, Line=%d, Col=%d, Channel=%d\n", i, tokenName, tokenType, token.GetText(), token.GetLine(), token.GetColumn(), token.GetChannel())
}
fmt.Println("--- [DEBUG CL Parser] End Lexer Tokens ---")
tokenStream.Reset()
parser := generated.NewNeuroDataChecklistParser(tokenStream)
parserErrorListener := newChecklistErrorListener("parser")
parser.RemoveErrorListeners()
parser.AddErrorListener(parserErrorListener)
// *** PARSE EXECUTION ***
// Wrap call to ChecklistFile in a recover block if needed, but main error check is below
tree := parser.ChecklistFile() // Call start rule
allErrors := append(lexerErrorListener.Errors, parserErrorListener.Errors...)
if len(allErrors) > 0 {
fmt.Println("--- [DEBUG CL Parser] PARSE FAILED ---")
errorString := strings.Join(allErrors, "\n ")
// Return early on parse error
return nil, fmt.Errorf("syntax error(s) parsing checklist:\n %s", errorString)
}
// --- Parse Tree Walk (Only if parse succeeds) ---
fmt.Println("--- [DEBUG CL Parser] PARSE OK, walking tree... ---")
listener := newChecklistListener(parser)
antlr.ParseTreeWalkerDefault.Walk(listener, tree)
if listener.err != nil {
return nil, fmt.Errorf("error during parse tree walk: %w", listener.err)
}
// Convert listener's items to result map
resultMaps := make([]map[string]interface{}, len(listener.Items))
for i, item := range listener.Items {
resultMaps[i] = map[string]interface{}{"text": item.Text, "status": item.Status}
}
fmt.Printf("--- [DEBUG CL Parser] ParseChecklistContent finished. Found %d items.\n", len(resultMaps))
return resultMaps, nil
}

// NeuroDataChecklist.g4
// Grammar for parsing simple Markdown-style checklists.
// ULTRA-SIMPLIFIED FOR DEBUGGING: Only one itemLine followed by EOF.
// Version: 0.0.37
grammar NeuroDataChecklist;
// --- PARSER RULES ---
checklistFile : itemLine EOF ;
itemLine : ITEM_LINE_CONTENT NEWLINE+ ;
// --- LEXER RULES ---
WS : [ \t]+ -> channel(HIDDEN) ;
ITEM_LINE_CONTENT : '-' [ \t]* '[' [ \t]* [xX ] [ \t]* ']' ( [ \t]+ ~[\r\n]+ )? ;
NEWLINE : ( '\r'? '\n' | '\r' )+ ;

HTH! Andy |
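(For reference, one way to check whether this grammar accepts the same input independently of the Go runtime is ANTLR's Java TestRig; a sketch, assuming the jar sits in the current directory and the sample input is saved as input.txt:)

java -jar antlr4-4.13.2-complete.jar NeuroDataChecklist.g4                 # generate the Java lexer/parser
javac -cp antlr4-4.13.2-complete.jar NeuroDataChecklist*.java              # compile against the runtime bundled in the jar
printf '%s\n' '- [ ] Task 1' > input.txt                                   # the same single-line input the Go test uses
java -cp .:antlr4-4.13.2-complete.jar org.antlr.v4.gui.TestRig NeuroDataChecklist checklistFile -tokens -tree input.txt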
Please remove the token print-out and reset code, then test the parse. TokenStream Reset is deprecated. |
I see the same error with this code:

// checklist.go
package checklist
import (
"fmt"
"regexp"
"strings"
"github.com/antlr4-go/antlr/v4"
// NOTE: Remove core/tool imports if isolating this package
// "github.com/aprice2704/neuroscript/pkg/core"
generated "minimal_antlr_bug/checklist/generated" // Adjust import path if needed
)
// --- Custom ANTLR Error Listener ---
type checklistErrorListener struct {
*antlr.DefaultErrorListener
Errors []string
SourceName string
}
func newChecklistErrorListener(sourceName string) *checklistErrorListener {
return &checklistErrorListener{Errors: make([]string, 0), SourceName: sourceName}
}
func (l *checklistErrorListener) SyntaxError(recognizer antlr.Recognizer, offendingSymbol interface{}, line, column int, msg string, e antlr.RecognitionException) {
errorMessage := fmt.Sprintf("%s:%d:%d: %s", l.SourceName, line, column, msg)
l.Errors = append(l.Errors, errorMessage)
}
// === PARSER LOGIC ===
type ChecklistItem struct{ Text, Status string }
// checklistListener
type checklistListener struct {
*generated.BaseNeuroDataChecklistListener
Items []ChecklistItem
err error
itemRegex *regexp.Regexp
parser *generated.NeuroDataChecklistParser
}
func newChecklistListener(parser *generated.NeuroDataChecklistParser) *checklistListener {
regex := regexp.MustCompile(`^\s*-\s*\[\s*([xX ])\s*\](?:\s*(.*))?\s*$`)
return &checklistListener{
Items: make([]ChecklistItem, 0),
itemRegex: regex,
parser: parser,
}
}
// EnterItemLine (Needed for v0.0.37 grammar)
func (l *checklistListener) EnterItemLine(ctx *generated.ItemLineContext) {
// This logic runs if parsing succeeds, extracts item data
fmt.Printf("[DEBUG CL Listener] === Method EnterItemLine called ===\n")
if l.err != nil {
return
}
itemToken := ctx.ITEM_LINE_CONTENT()
if itemToken == nil {
l.err = fmt.Errorf("internal listener error: ITEM_LINE_CONTENT token missing in itemLine context: %q", ctx.GetText())
fmt.Printf("[ERROR CL Listener] %v\n", l.err)
return
}
rawLineText := itemToken.GetText()
lineText := strings.TrimSuffix(rawLineText, "\n")
lineText = strings.TrimSuffix(lineText, "\r")
matches := l.itemRegex.FindStringSubmatch(lineText)
if len(matches) == 3 {
mark := matches[1]
text := strings.TrimSpace(matches[2])
status := "pending"
if strings.ToLower(mark) == "x" {
status = "done"
}
newItem := ChecklistItem{Text: text, Status: status}
l.Items = append(l.Items, newItem)
fmt.Printf("[DEBUG CL Listener] -> Parsed Item: %+v\n", newItem)
} else {
fmt.Printf("[WARN CL Listener] Regex failed for ITEM_LINE_CONTENT token text: %q\n", lineText)
}
}
// ParseChecklistContent (Entry point for parsing)
func ParseChecklistContent(content string) ([]map[string]interface{}, error) {
fmt.Println("\n--- [DEBUG CL Parser] Starting ParseChecklistContent ---")
inputStream := antlr.NewInputStream(content)
lexer := generated.NewNeuroDataChecklistLexer(inputStream)
lexerErrorListener := newChecklistErrorListener("lexer")
lexer.RemoveErrorListeners()
lexer.AddErrorListener(lexerErrorListener)
tokenStream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
// tokenStream.Fill() // <-- REMOVED/COMMENTED OUT: Not strictly necessary, parser pulls tokens as needed
// tokens := tokenStream.GetAllTokens() // <-- REMOVED/COMMENTED OUT
// fmt.Printf("--- [DEBUG CL Parser] Lexer Tokens (%d total) ---\n", len(tokens)) // <-- REMOVED/COMMENTED OUT
// // (Token printing loop - same as before)
// lexerSymbolicNames := lexer.GetSymbolicNames() // <-- REMOVED/COMMENTED OUT
// lexerLiteralNames := lexer.GetLiteralNames() // <-- REMOVED/COMMENTED OUT
// // --- START: Token Print Out ---
// for i, token := range tokens { // <-- ENTIRE LOOP REMOVED/COMMENTED OUT
// tokenType := token.GetTokenType()
// tokenName := fmt.Sprintf("Type(%d)", tokenType)
// if tokenType == antlr.TokenEOF {
// tokenName = "EOF"
// } else if tokenType > 0 {
// if tokenType < len(lexerSymbolicNames) && lexerSymbolicNames[tokenType] != "" {
// tokenName = lexerSymbolicNames[tokenType]
// } else if tokenType < len(lexerLiteralNames) && lexerLiteralNames[tokenType] != "" {
// tokenName = lexerLiteralNames[tokenType]
// }
// }
// fmt.Printf(" [%d] Type=%s (%d), Text=%q, Line=%d, Col=%d, Channel=%d\n", i, tokenName, tokenType, token.GetText(), token.GetLine(), token.GetColumn(), token.GetChannel())
// }
// fmt.Println("--- [DEBUG CL Parser] End Lexer Tokens ---") // <-- REMOVED/COMMENTED OUT
// // --- END: Token Print Out ---
// tokenStream.Reset() // <-- Reset Code REMOVED/COMMENTED OUT
// The parser is created directly with the token stream.
// It will pull tokens from the lexer via the stream as it parses.
parser := generated.NewNeuroDataChecklistParser(tokenStream)
parserErrorListener := newChecklistErrorListener("parser")
parser.RemoveErrorListeners()
parser.AddErrorListener(parserErrorListener)
// *** PARSE EXECUTION ***
// Wrap call to ChecklistFile in a recover block if needed, but main error check is below
fmt.Println("--- [DEBUG CL Parser] Calling parser.ChecklistFile() ---") // Added for clarity
tree := parser.ChecklistFile() // Call start rule
allErrors := append(lexerErrorListener.Errors, parserErrorListener.Errors...)
if len(allErrors) > 0 {
fmt.Println("--- [DEBUG CL Parser] PARSE FAILED ---")
errorString := strings.Join(allErrors, "\n ")
// Return early on parse error
return nil, fmt.Errorf("syntax error(s) parsing checklist:\n %s", errorString)
}
// --- Parse Tree Walk (Only if parse succeeds) ---
fmt.Println("--- [DEBUG CL Parser] PARSE OK, walking tree... ---")
listener := newChecklistListener(parser)
antlr.ParseTreeWalkerDefault.Walk(listener, tree)
if listener.err != nil {
return nil, fmt.Errorf("error during parse tree walk: %w", listener.err)
}
// Convert listener's items to result map
resultMaps := make([]map[string]interface{}, len(listener.Items))
for i, item := range listener.Items {
resultMaps[i] = map[string]interface{}{"text": item.Text, "status": item.Status}
}
fmt.Printf("--- [DEBUG CL Parser] ParseChecklistContent finished. Found %d items.\n", len(resultMaps))
return resultMaps, nil
} |
The jar you use and the runtime you use need to be the same version, but whatever is happening here, it isn’t ANTLR or the Go runtime. Maybe dump the contents of the |
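(A quick sketch of confirming which tool and runtime versions are actually in play, in case they have drifted:)

java -jar antlr4-4.13.2-complete.jar      # the first line of the usage output reports the tool version
go list -m github.com/antlr4-go/antlr/v4  # the runtime module version resolved from go.mod
go version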
I rewrote the generated app from trgen to be more like your program, simplifying your code even more. It still works. Here's a zip with the program (other.zip). Adjust build.sh to what you need. I adjusted the build.sh script to use the 4.13.2 tool with the 4.13.1 runtime, and it still parses just fine. |
(Composed for me by Gemini 4.5 Pro Experimental), hope it is ok. ANTLR has helped us out a couple of times, but we can't get past this :(
Title: Go Target: Parser Fails at EOF Expecting Different Token (Minimal Example)
Environment:
ANTLR tool: antlr4-4.13.2-complete.jar
Go runtime: github.com/antlr4-go/antlr/v4 (Specify version if known, otherwise state 'latest' or 'standard import')

Description:
We are encountering a persistent parsing error with the ANTLR Go target where the parser fails upon reaching the EOF token, incorrectly expecting a different token based on the preceding rule structure. This occurs even with a highly simplified grammar designed to parse just one line followed by EOF.

Minimal Reproducible Example:
Grammar (NeuroDataChecklist.g4 v0.0.37)
Go Test Code (checklist_test.go)
Go Parser Invocation (checklist.go - relevant function)
(Note: Listener details omitted as the error occurs during parsing, before the listener walk)
Test Output (go test -v .)

Observed Behavior:
The lexer correctly tokenizes the input string "- [ ] Task 1\n" into ITEM_LINE_CONTENT, NEWLINE, EOF.
The parser rule checklistFile : itemLine EOF ; successfully consumes the itemLine (ITEM_LINE_CONTENT NEWLINE+).
However, when the parser then encounters the EOF token (at Line 2, Col 0), it reports an error: mismatched input '<EOF>' expecting ITEM_LINE_CONTENT.

Expected Behavior:
According to the grammar rule checklistFile : itemLine EOF ;, after successfully parsing itemLine, the parser should expect and successfully match the EOF token, resulting in a successful parse.

Debugging Steps Attempted:
We arrived at this minimal example after trying several variations on a more complex grammar intended to parse multiple checklist items:
- itemLine+ EOF
- itemLine (itemLine)* EOF
- an ENDOFLIST token before EOF (... ENDOFLIST EOF)
- optional NEWLINE tokens around ENDOFLIST (... NEWLINE* ENDOFLIST NEWLINE* EOF)

All variations exhibited the same fundamental failure mode: the parser correctly consumed the structure before the EOF (or the element immediately preceding EOF) but then failed upon seeing EOF, incorrectly expecting ITEM_LINE_CONTENT instead.

This suggests a potential issue in the Go target's prediction or state handling when approaching the end of the input stream for these grammar structures.