Skip to content

Support option to limit RE2 program size #1142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
3 tasks done
sergiitk opened this issue Mar 12, 2025 · 0 comments
Open
3 tasks done

Support option to limit RE2 program size #1142

sergiitk opened this issue Mar 12, 2025 · 0 comments

Comments

@sergiitk
Copy link
Member

sergiitk commented Mar 12, 2025

Feature request checklist

  • There are no issues that match the desired change
  • The change is large enough it can't be addressed with a simple Pull Request
  • If this is a bug, please file a Bug Report.

Change

To make CEL environment setup consistent across CEL implementations, I propose to add a program option to set maximum RE2 program. This option should work similar to CelOptions#maxRegexProgramSize(int) in CEL-Java and InterpreterOptions.regex_max_program_size in CEL-Cpp (see nuances below).

Implementation details

  1. To support this option, cel-go will have to switch its regexp implementation from the regex package to the regexp/syntax (both in standard go library). In latter, the instruction count is available as the length of regexp/syntax#Prog.Inst.
  2. RE2 program size should be verified when the CEL program is created from the AST (AFAIK this is how cel-cpp works). If this is not possible, java-style implementation where the error is raised during program execution is acceptable as well.

Nuances

The program size represents a very approximate measure of a regexp's "cost". There are no guarantees on the implementation details or claims about the properties of the program size (except "larger numbers are more expensive than smaller numbers").

Currently the program size is the same as the number of instructions of the regex program. However, the number of instructions in a regex depends on the concrete RE2 implementation. The implication of using the number of instructions as the program size is that:

Important

There's no guarantee that RE2 program size has the exact same value in CPP, Go and Java.
We should communicate this in the docs.

For example:

["", "a", "^", "^$", "a+b", "a+b?", "(a+b)", "a+b.*", "(a+b?)"]  # pattern
[4,   5,   2,   2,     7,     8,       9,      15,       10]     # program size: cpp re2-2024-07-02
[3,   3,   3,   4,     5,     6,       7,       7,        8]     # program size: java re2j v1.8 
[3,   3,   3,   4,     5,     6,       7,       7,        8]     # program size: go v1.24.1
# (go v1.24.1 is identical to re2j v1.8)

Example

// Actual option names up to the implementer.
prg, err := env.Program(ast, cel.MaxRegexProgramSize(100))

Related

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant