suggested feature: super-linear regex detection #256

Closed
davisjam opened this issue Apr 12, 2018 · 4 comments

Comments

@davisjam

See my vuln-regex-detector project.

I also created an npm module that queries a service I'm hosting. See discussion of behavior here.

For a lighter-weight approach there's safe-regex, but it misses 90% of vulnerable regexes and has about a 90% false positive rate. I'm working on improvements.

@gskinner
Owner

gskinner commented Jun 5, 2018

RegExr currently runs all matches asynchronously, and will display an error if a match takes too long. I know that's not exactly what you're suggesting, but it serves a similar purpose, and is much more straightforward for us than relying on a third party service or hosting a computationally expensive module ourselves.

I'm open to a discussion of the value for this if you think I'm missing something.
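The time-limit approach described above can be illustrated with a minimal sketch. This is not RegExr's actual implementation (which is not shown in this thread); it assumes a Python backtracking engine and runs each match in a child process so it can be abandoned when it exceeds a deadline. The function name `match_with_timeout`, the exit codes, and the default timeout are all assumptions made for illustration.

```python
import subprocess
import sys

# Child program: search for argv[1] in argv[2].
# Exit code 0 = match, 2 = no match; any other code (e.g. 1 from an
# uncaught exception on an invalid pattern) is reported as an error.
_CHILD = (
    "import re, sys; "
    "sys.exit(0 if re.search(sys.argv[1], sys.argv[2]) else 2)"
)

def match_with_timeout(pattern: str, text: str, timeout_s: float = 1.0) -> str:
    """Run one regex search in a child process, giving up after timeout_s."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            timeout=timeout_s,          # the child is killed if this expires
            stderr=subprocess.DEVNULL,  # hide the child's traceback, if any
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode == 0:
        return "match"
    if proc.returncode == 2:
        return "no match"
    return "error"

if __name__ == "__main__":
    print(match_with_timeout(r"a+", "aaa", timeout_s=5.0))
    print(match_with_timeout(r"(a+)+$", "a" * 40 + "!", timeout_s=0.5))
```

A guard like this only reports a problem when the supplied test input happens to be slow, which is the limitation raised in the reply that follows.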

@gskinner gskinner closed this as completed Jun 5, 2018
@davisjam
Author

davisjam commented Jun 5, 2018

RegExr will only detect a problematic regex if the supplied input actually triggers its slow behavior. If I test a super-linear regex on non-triggering input, I won't realize it is vulnerable. Since I believe RegExr is a widely used regex service, identifying super-linear regexes would be a helpful enhancement.
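To make the point concrete, here is a small sketch using Python's backtracking `re` engine. The pattern `^(a+)+$` is a textbook super-linear regex chosen for illustration, not one taken from this thread. It matches a benign input quickly, but a single trailing character that prevents a match forces the engine to try every way of splitting the run of `a`s.

```python
import re
import time

# Nested quantifiers: a backtracking engine can divide a run of 'a's
# between the inner '+' and the outer '+' in exponentially many ways.
PATTERN = re.compile(r"^(a+)+$")

def time_match(text: str) -> float:
    """Return the seconds spent on one match attempt against `text`."""
    start = time.perf_counter()
    PATTERN.match(text)
    return time.perf_counter() - start

n = 18                      # kept small so the slow case still finishes
benign = "a" * n            # matches on the first attempt
trigger = "a" * n + "!"     # cannot match, so every split is explored

t_benign = time_match(benign)
t_trigger = time_match(trigger)
print(f"benign:  {t_benign:.6f} s")
print(f"trigger: {t_trigger:.6f} s")
```

Someone who only ever tries inputs like `benign` would never hit a time limit, even though each extra `a` in `trigger` roughly doubles the work.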

@gskinner
Owner

gskinner commented Jun 5, 2018

Can you go into more detail on this?

  1. How would it be hosted? On our server, or yours? What if your server goes down? Can it handle potentially tens of thousands of tests a day?
  2. How do you see it being integrated into the UI? Would it be a Tool I need to specifically run, or would it run on edit like everything else? (Obviously the latter fits our model better, but the former is much less resource intensive.)

@davisjam
Author

davisjam commented Jun 5, 2018

  1. Hosting: I have open-sourced the code necessary to answer these queries. The service I am currently hosting (basically a DB so you don't need to recompute previously-computed results) would not scale to tens of thousands of queries, since it is just a desktop in our lab. But it could, for example, be containerized for scaling if some generous sponsor were willing to provide hosting. The queries are independent and the results just get merged into a DB, so this wouldn't be too hard.

  2. I would suggest a Tool that will do best-effort responses during editing, but which can answer queries on-demand on a button press. Since the vulnerability of a regex doesn't change over time, once a regex has been checked (expensive) the result can be saved for subsequent lookup (cheap), so during editing we can flag already-known vulnerable regexes.

Since testing for vulnerability can be expensive, doing on-demand lookups for never-before-seen regexes could be a bottleneck and I wouldn't recommend doing so without an explicit request. That way never-before-seen regexes can be tested in the background and used the next time the regex is seen. This would be beneficial provided that RegExr is used widely enough that you see the same regex more than once -- I imagine this is the case?
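The lookup scheme described above can be sketched roughly as follows. This is an illustrative in-memory model, not the actual vuln-regex-detector service or its API: the class `RegexVerdictCache`, its method names, and the injected `expensive_check` callable are all hypothetical, and a real deployment would presumably use a persistent DB and a worker process instead of a dict and a queue.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional

class RegexVerdictCache:
    """Saved verdicts (cheap lookup) plus a queue of untested patterns."""

    def __init__(self, expensive_check: Callable[[str], bool]) -> None:
        # expensive_check is a stand-in for a real vulnerability detector.
        self._check = expensive_check
        self._verdicts: Dict[str, bool] = {}
        self._pending: Deque[str] = deque()

    def lookup(self, pattern: str) -> Optional[bool]:
        """Cheap path for use while editing.

        Returns the saved verdict, or None for a never-before-seen
        pattern, which is queued for background testing.
        """
        if pattern in self._verdicts:
            return self._verdicts[pattern]
        if pattern not in self._pending:
            self._pending.append(pattern)
        return None

    def check_now(self, pattern: str) -> bool:
        """Expensive path, run only on explicit request (button press)."""
        if pattern not in self._verdicts:
            self._verdicts[pattern] = self._check(pattern)
        return self._verdicts[pattern]

    def drain_background(self) -> int:
        """Test every queued pattern; returns how many were processed."""
        done = 0
        while self._pending:
            self.check_now(self._pending.popleft())
            done += 1
        return done

if __name__ == "__main__":
    # Placeholder detector for the demo only: flags one known pattern.
    cache = RegexVerdictCache(lambda p: p == "(a+)+$")
    print(cache.lookup("(a+)+$"))   # unknown on first sight, so queued
    cache.drain_background()        # background worker catches up
    print(cache.lookup("(a+)+$"))   # now answered from the saved result
```

Because a verdict never changes once computed, the only expensive step is the first check of each distinct pattern; how much this helps depends on how often the same regex is seen again.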
