--allowHashUrls option silently does nothing #790

Open · Mr0grog opened this issue Mar 8, 2025 · 0 comments

Mr0grog commented Mar 8, 2025

The crawler has a documented --allowHashUrls option, but it doesn’t appear to do anything. Searching the codebase, I can’t find any references to it except in the argument parser, so it doesn’t seem to actually be used.
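For reference, here’s roughly how I’m passing the flag (a sketch assuming the Docker invocation from the project README; the URL here is just a placeholder):

```bash
# Sketch: pass --allowHashUrls on the command line. The parser accepts it,
# but as far as I can tell nothing downstream ever reads it.
docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler \
  crawl --url 'https://example.com/#/some-page' \
  --scopeType page --allowHashUrls
```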

I expected this to allow a seed I’d listed with a hash URL to be captured. For example, using the following config:

```yaml
scopeType: page
allowHashUrls: true

seeds:
  - 'https://www.eia.gov/naturalgas/ngqs/#?report=RP9&year1=2017&year2=2017&company=Name'
```

Is this something that’s just not hooked up, or maybe a vestigial feature that was supposed to be removed?

The workarounds I’m currently trying are:

```yaml
scopeType: page
include: ['.*']
```

or:

```yaml
scopeType: custom
depth: 0
include: ['.*']
```

or:

```yaml
scopeType: custom
depth: 0

seeds:
  - url: 'https://www.eia.gov/naturalgas/ngqs/#?report=RP9&year1=2017&year2=2017&company=Name'
    allowHash: true
```

(Side note: I’d hoped I could use allowHash on the seed with scopeType: page at the top level, but it looks like that scope type always prevents allowHash from being configured, which seems less than ideal.)
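For concreteness, this is the shape of config I’d hoped would work, but as far as I can tell scopeType: page always disables allowHash on the seed:

```yaml
scopeType: page

seeds:
  - url: 'https://www.eia.gov/naturalgas/ngqs/#?report=RP9&year1=2017&year2=2017&company=Name'
    allowHash: true
```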
