404 response with empty body causes crawler to think page crashed and not record response in WARC #789

Mr0grog · 2025-03-08T02:45:50Z

When trying to archive a URL that returns a 404 status code and an empty response body, the crawler logs that the page crashed, retries a few times, and then never records the request and response in the WARC, even though it is a correct, complete, successful HTTP response. Skimming the code, I suspect this might be the case for any non-2xx response, since that causes the direct fetch to fail. But there are clearly also issues in the part that automates the browser.

Here’s an example URL with this behavior: https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf

To reproduce:

Using the webrecorder/browsertrix-crawler:1.5.8 Docker image and the following config:

# test.crawl.yaml
scopeType: page
rolloverSize: 8000000000
workers: 1
saveStateHistory: 1

seeds:
  - 'https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf'

And the following command:

docker run \
    --rm \
    --attach stdout --attach stderr \
    --volume "./test.crawl.yaml:/app/config.yaml" \
    --volume "./crawls:/crawls/" \
    webrecorder/browsertrix-crawler:1.5.8 \
    crawl \
    --config /app/config.yaml \
    --collection "test--20250307182348" \
    --saveState always \
    --logging debug,stats

Logs warnings and errors like:

{"context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","frameId":"3FB853DBA7F1F9F08A07C0C540C98813"}}
{"context":"recorder","message":"Request failed","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","errorText":"net::ERR_HTTP_RESPONSE_CODE_FAILURE","type":"Document","status":404,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"context":"pageStatus","message":"Page Crashed on Load: will retry","details":{"retry":0,"retries":2,"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","status":404,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}

...Repeat a few times...

{"context":"pageStatus","message":"Page Crashed on Load: retry limit reached","details":{"retry":2,"retries":2,"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","status":404,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}

Complete Log Output

{"timestamp":"2025-03-08T02:23:49.505Z","logLevel":"info","context":"general","message":"Browsertrix-Crawler 1.5.8 (with warcio.js 2.4.3)","details":{}}
{"timestamp":"2025-03-08T02:23:49.505Z","logLevel":"info","context":"general","message":"Seeds","details":[{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","scopeType":"page","include":[],"exclude":[],"allowHash":false,"depth":-1,"sitemap":null,"auth":null,"_authEncoded":null,"maxExtraHops":0,"maxDepth":0}]}
{"timestamp":"2025-03-08T02:23:49.505Z","logLevel":"info","context":"general","message":"Link Selectors","details":[{"selector":"a[href]","extract":"href","isAttribute":false}]}
{"timestamp":"2025-03-08T02:23:49.505Z","logLevel":"info","context":"general","message":"Behavior Options","details":{"message":"{\"autoplay\":true,\"autofetch\":true,\"autoscroll\":true,\"siteSpecific\":true,\"log\":\"__bx_log\",\"startEarly\":true,\"clickSelector\":\"a\"}"}}
{"timestamp":"2025-03-08T02:23:49.544Z","logLevel":"debug","context":"state","message":"Storing state via Redis redis://localhost:6379/0 @ key prefix \"4c25568b7dc8\"","details":{}}
{"timestamp":"2025-03-08T02:23:49.544Z","logLevel":"debug","context":"state","message":"Max Page Time: 190 seconds","details":{}}
{"timestamp":"2025-03-08T02:23:49.545Z","logLevel":"debug","context":"state","message":"Saving crawl state every 300 seconds, keeping last 1 states","details":{}}
{"timestamp":"2025-03-08T02:23:49.551Z","logLevel":"debug","context":"general","message":"Text Extraction: None","details":{}}
{"timestamp":"2025-03-08T02:23:49.552Z","logLevel":"debug","context":"general","message":"Text Extraction: None","details":{}}
{"timestamp":"2025-03-08T02:23:49.564Z","logLevel":"debug","context":"links","message":"Queued new page url","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:49.784Z","logLevel":"info","context":"worker","message":"Creating 1 workers","details":{}}
{"timestamp":"2025-03-08T02:23:49.784Z","logLevel":"info","context":"worker","message":"Worker starting","details":{"workerid":0}}
{"timestamp":"2025-03-08T02:23:49.787Z","logLevel":"debug","context":"worker","message":"Getting page in new window","details":{"workerid":0}}
{"timestamp":"2025-03-08T02:23:49.848Z","logLevel":"debug","context":"browser","message":"Service Workers: always disabled","details":{}}
{"timestamp":"2025-03-08T02:23:49.855Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:49.856Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2025-03-08T02:23:49.786Z\",\"extraHops\":0,\"url\":\"https:\\/\\/www.whitehouse.gov\\/wp-content\\/uploads\\/2023\\/01\\/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf\",\"added\":\"2025-03-08T02:23:49.563Z\",\"depth\":0}"]}}
{"timestamp":"2025-03-08T02:23:49.856Z","logLevel":"debug","context":"memoryStatus","message":"Memory","details":{"maxHeapUsed":41141704,"maxHeapTotal":72593408,"rss":129142784,"heapTotal":72593408,"heapUsed":41141704,"external":5673430,"arrayBuffers":571525}}
{"timestamp":"2025-03-08T02:23:49.867Z","logLevel":"debug","context":"recorder","message":"Async started: fetch","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:50.088Z","logLevel":"debug","context":"fetch","message":"Direct fetch response not accepted, continuing with browser fetch","details":{"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:50.088Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:50.257Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","frameId":"3FB853DBA7F1F9F08A07C0C540C98813"}}
{"timestamp":"2025-03-08T02:23:50.259Z","logLevel":"debug","context":"general","message":"Setting page timestamp","details":{"ts":"2025-03-08T02:23:50.091Z","url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","status":404}}
{"timestamp":"2025-03-08T02:23:50.262Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","errorText":"net::ERR_HTTP_RESPONSE_CODE_FAILURE","type":"Document","status":404,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:50.293Z","logLevel":"debug","context":"behaviorScript","message":"Using AutoFetcher","details":{"page":"chrome-error://chromewebdata/","workerid":0}}
{"timestamp":"2025-03-08T02:23:50.294Z","logLevel":"debug","context":"behaviorScript","message":"Using Autoplay","details":{"page":"chrome-error://chromewebdata/","workerid":0}}
{"timestamp":"2025-03-08T02:23:50.294Z","logLevel":"debug","context":"behaviorScript","message":"Using Autoscroll","details":{"page":"chrome-error://chromewebdata/","workerid":0}}
{"timestamp":"2025-03-08T02:23:50.861Z","logLevel":"warn","context":"pageStatus","message":"Page Crashed on Load: will retry","details":{"retry":0,"retries":2,"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","status":404,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:50.862Z","logLevel":"debug","context":"worker","message":"Closing page","details":{"crashed":false,"workerid":0}}
{"timestamp":"2025-03-08T02:23:50.894Z","logLevel":"debug","context":"recorder","message":"WARC Record Written","details":{"type":"pageinfo","url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:50.897Z","logLevel":"debug","context":"worker","message":"Getting page in new window","details":{"workerid":0}}
{"timestamp":"2025-03-08T02:23:50.967Z","logLevel":"debug","context":"browser","message":"Service Workers: always disabled","details":{}}
{"timestamp":"2025-03-08T02:23:50.980Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:50.980Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"url\":\"https:\\/\\/www.whitehouse.gov\\/wp-content\\/uploads\\/2023\\/01\\/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf\",\"seedId\":0,\"started\":\"2025-03-08T02:23:50.896Z\",\"extraHops\":0,\"depth\":0,\"added\":\"2025-03-08T02:23:49.563Z\",\"retry\":1}"]}}
{"timestamp":"2025-03-08T02:23:50.980Z","logLevel":"debug","context":"memoryStatus","message":"Memory","details":{"maxHeapUsed":46035520,"maxHeapTotal":72855552,"rss":150286336,"heapTotal":72855552,"heapUsed":46035520,"external":7002810,"arrayBuffers":846374}}
{"timestamp":"2025-03-08T02:23:50.981Z","logLevel":"debug","context":"recorder","message":"Async started: fetch","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:51.020Z","logLevel":"debug","context":"fetch","message":"Direct fetch response not accepted, continuing with browser fetch","details":{"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.020Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.092Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","frameId":"4B9D860A4321D84512E77C75AAE9A47F"}}
{"timestamp":"2025-03-08T02:23:51.092Z","logLevel":"debug","context":"general","message":"Setting page timestamp","details":{"ts":"2025-03-08T02:23:51.022Z","url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","status":404}}
{"timestamp":"2025-03-08T02:23:51.095Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","errorText":"net::ERR_HTTP_RESPONSE_CODE_FAILURE","type":"Document","status":404,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.125Z","logLevel":"debug","context":"behaviorScript","message":"Using AutoFetcher","details":{"page":"chrome-error://chromewebdata/","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.126Z","logLevel":"debug","context":"behaviorScript","message":"Using Autoplay","details":{"page":"chrome-error://chromewebdata/","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.126Z","logLevel":"debug","context":"behaviorScript","message":"Using Autoscroll","details":{"page":"chrome-error://chromewebdata/","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.680Z","logLevel":"warn","context":"pageStatus","message":"Page Crashed on Load: will retry","details":{"retry":1,"retries":2,"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","status":404,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.681Z","logLevel":"debug","context":"worker","message":"Closing page","details":{"crashed":false,"workerid":0}}
{"timestamp":"2025-03-08T02:23:51.699Z","logLevel":"debug","context":"recorder","message":"WARC Record Written","details":{"type":"pageinfo","url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:51.702Z","logLevel":"debug","context":"worker","message":"Getting page in new window","details":{"workerid":0}}
{"timestamp":"2025-03-08T02:23:51.782Z","logLevel":"debug","context":"browser","message":"Service Workers: always disabled","details":{}}
{"timestamp":"2025-03-08T02:23:51.789Z","logLevel":"info","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:51.790Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":1,"failed":0,"limit":{"max":0,"hit":false},"pendingPages":["{\"extraHops\":0,\"seedId\":0,\"started\":\"2025-03-08T02:23:51.701Z\",\"url\":\"https:\\/\\/www.whitehouse.gov\\/wp-content\\/uploads\\/2023\\/01\\/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf\",\"depth\":0,\"added\":\"2025-03-08T02:23:49.563Z\",\"retry\":2}"]}}
{"timestamp":"2025-03-08T02:23:51.790Z","logLevel":"debug","context":"memoryStatus","message":"Memory","details":{"maxHeapUsed":48391960,"maxHeapTotal":72855552,"rss":152907776,"heapTotal":72855552,"heapUsed":48391960,"external":7452239,"arrayBuffers":1033659}}
{"timestamp":"2025-03-08T02:23:51.791Z","logLevel":"debug","context":"recorder","message":"Async started: fetch","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:51.828Z","logLevel":"debug","context":"fetch","message":"Direct fetch response not accepted, continuing with browser fetch","details":{"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.828Z","logLevel":"info","context":"general","message":"Awaiting page load","details":{"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.868Z","logLevel":"warn","context":"recorder","message":"Skipping URL from unknown frame","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","frameId":"98B10A39F38EEE8BDCCE7A493F1488C8"}}
{"timestamp":"2025-03-08T02:23:51.869Z","logLevel":"debug","context":"general","message":"Setting page timestamp","details":{"ts":"2025-03-08T02:23:51.830Z","url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","status":404}}
{"timestamp":"2025-03-08T02:23:51.871Z","logLevel":"warn","context":"recorder","message":"Request failed","details":{"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","errorText":"net::ERR_HTTP_RESPONSE_CODE_FAILURE","type":"Document","status":404,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.899Z","logLevel":"debug","context":"behaviorScript","message":"Using AutoFetcher","details":{"page":"chrome-error://chromewebdata/","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.900Z","logLevel":"debug","context":"behaviorScript","message":"Using Autoplay","details":{"page":"chrome-error://chromewebdata/","workerid":0}}
{"timestamp":"2025-03-08T02:23:51.900Z","logLevel":"debug","context":"behaviorScript","message":"Using Autoscroll","details":{"page":"chrome-error://chromewebdata/","workerid":0}}
{"timestamp":"2025-03-08T02:23:52.458Z","logLevel":"error","context":"pageStatus","message":"Page Crashed on Load: retry limit reached","details":{"retry":2,"retries":2,"url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","status":404,"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:52.458Z","logLevel":"debug","context":"worker","message":"Closing page","details":{"crashed":false,"workerid":0}}
{"timestamp":"2025-03-08T02:23:52.471Z","logLevel":"debug","context":"recorder","message":"WARC Record Written","details":{"type":"pageinfo","url":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf"}}
{"timestamp":"2025-03-08T02:23:52.475Z","logLevel":"info","context":"general","message":"Saving crawl state to: /crawls/collections/test--20250307182348/crawls/crawl-20250308022352-4c25568b7dc8.yaml","details":{}}
{"timestamp":"2025-03-08T02:23:52.485Z","logLevel":"info","context":"worker","message":"Worker done, all tasks complete","details":{"workerid":0}}
{"timestamp":"2025-03-08T02:23:52.486Z","logLevel":"debug","context":"recorder","message":"Finishing Fetcher Queue","details":{"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:52.486Z","logLevel":"debug","context":"recorder","message":"Finishing WARC writing","details":{"page":"https://www.whitehouse.gov/wp-content/uploads/2023/01/01-2023-Framework-for-Federal-Scientific-Integrity-Policy-and-Practice.pdf","workerid":0}}
{"timestamp":"2025-03-08T02:23:52.534Z","logLevel":"info","context":"general","message":"Saving crawl state to: /crawls/collections/test--20250307182348/crawls/crawl-20250308022352-4c25568b7dc8.yaml","details":{}}
{"timestamp":"2025-03-08T02:23:52.535Z","logLevel":"info","context":"general","message":"Removing old save-state: /crawls/collections/test--20250307182348/crawls/crawl-20250308022352-4c25568b7dc8.yaml","details":{}}
{"timestamp":"2025-03-08T02:23:52.536Z","logLevel":"info","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":0,"total":1,"pending":0,"failed":1,"limit":{"max":0,"hit":false},"pendingPages":[]}}
{"timestamp":"2025-03-08T02:23:52.536Z","logLevel":"debug","context":"memoryStatus","message":"Memory","details":{"maxHeapUsed":50368720,"maxHeapTotal":73117696,"rss":154611712,"heapTotal":73117696,"heapUsed":50368720,"external":7833883,"arrayBuffers":1153159}}
{"timestamp":"2025-03-08T02:23:52.537Z","logLevel":"info","context":"general","message":"Crawling done","details":{}}
{"timestamp":"2025-03-08T02:23:52.542Z","logLevel":"info","context":"general","message":"Exiting, Crawl status: done","details":{}}
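
To make the retry pattern in a log like this easier to see, here is a minimal TypeScript sketch that keeps only the `pageStatus` entries from JSON-lines output. The `LogEntry` interface and the `pageStatusEvents` helper are hypothetical names for illustration and are not part of browsertrix-crawler; the field names (`context`, `message`, `details.retry`, `details.status`) are taken from the log lines above.

```typescript
// Hypothetical helper, not part of browsertrix-crawler.
// Field names mirror the JSON log lines shown above.
interface LogEntry {
  logLevel?: string;
  context?: string;
  message?: string;
  details?: { url?: string; status?: number; retry?: number; retries?: number };
}

// Return every parseable log entry whose context is "pageStatus".
function pageStatusEvents(logText: string): LogEntry[] {
  const events: LogEntry[] = [];
  for (const line of logText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let entry: LogEntry;
    try {
      entry = JSON.parse(trimmed) as LogEntry;
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (entry !== null && typeof entry === "object" && entry.context === "pageStatus") {
      events.push(entry);
    }
  }
  return events;
}

// Usage with two abbreviated lines in the same shape as the log above.
const sampleLog = [
  '{"logLevel":"warn","context":"recorder","message":"Request failed","details":{"status":404}}',
  '{"logLevel":"error","context":"pageStatus","message":"Page Crashed on Load: retry limit reached","details":{"retry":2,"retries":2,"status":404}}',
].join("\n");

for (const event of pageStatusEvents(sampleLog)) {
  console.log(event.logLevel, event.message, event.details?.status);
}
```

Run against the complete output above, a filter like this should leave just the two "will retry" warnings and the final "retry limit reached" error, each carrying `status: 404`.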


ikreymer · 2025-04-01T04:41:36Z

This is an interesting edge case. I think the browser considers this a crash, as it shows the chrome error page here, since it generates no content and can't be loaded.
It's possible to detect and write to WARC, though. Perhaps also shouldn't retry? I guess that's probably better than the current behavior.
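
As a rough illustration of the handling proposed here, the TypeScript sketch below separates "the server answered with an HTTP error status and no body" from other load failures. Everything in it (`LoadFailure`, `LoadDecision`, `decideOnLoadFailure`) is a hypothetical name made up for this example and does not reflect the crawler's actual code; only the error string and the 404 status come from the logs in this issue.

```typescript
// Hypothetical types and names for illustration; not the crawler's actual code.
const HTTP_RESPONSE_CODE_FAILURE = "net::ERR_HTTP_RESPONSE_CODE_FAILURE";

interface LoadFailure {
  errorText: string; // network error string reported by the browser
  status: number;    // HTTP status seen for the document, or 0 if none
}

type LoadDecision =
  | { action: "record-empty-response"; retry: false; status: number }
  | { action: "retry"; retry: true };

// Decide what to do after a page load reports a failure.
function decideOnLoadFailure(failure: LoadFailure): LoadDecision {
  // Assumption: a status in the valid HTTP range means a real response arrived.
  const gotHttpStatus = failure.status >= 100 && failure.status <= 599;
  if (failure.errorText === HTTP_RESPONSE_CODE_FAILURE && gotHttpStatus) {
    // A complete HTTP exchange happened: record it and do not retry.
    return { action: "record-empty-response", retry: false, status: failure.status };
  }
  // Anything else is treated as a possibly transient failure.
  return { action: "retry", retry: true };
}

const decision = decideOnLoadFailure({
  errorText: HTTP_RESPONSE_CODE_FAILURE,
  status: 404,
});
console.log(decision.action);
```

The point of the split is that a 404 with an empty body is a final answer from the server, so retrying it is unlikely to change the outcome, while a connection-level error may still succeed on a later attempt.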


Mr0grog · 2025-04-02T21:06:38Z

> I think the browser considers this a crash, as it shows the chrome error page here

Oh interesting, I tried it in Safari and Firefox, which just show a blank screen and no error, but did not try Chrome. I wonder if it would make sense to handle net::ERR_HTTP_RESPONSE_CODE_FAILURE specially.

Taking a quick look at the Chromium source, it looks like it intentionally bails out and declares this error code if there is no response body on a non-2xx response (see the corresponding header file for a comment explaining a bit more). Later on it uses that signal to render a custom page instead of a blank screen like other browsers. So FWIW, I don’t think Chromium is really considering this a crash so much as it’s taking a kind of roundabout way to render a nice error message for users (much nicer than the blank screen!) that happens to have weird results for CDP/Puppeteer consumers.
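
To make that description concrete, here is a small TypeScript model of the rule as I read it. It is only a sketch of the behavior described above, not Chromium's actual code; the function and type names are invented for this example, and redirects and other special cases are ignored.

```typescript
// A model of the behavior described above, not Chromium's actual code.
type MainFrameOutcome = "render-response-body" | "show-error-page";

// Hypothetical helper: given a final status and body size, say what the
// browser would do with the main-frame navigation under the described rule.
function describedChromiumOutcome(status: number, bodyLength: number): MainFrameOutcome {
  const is2xx = status >= 200 && status < 300;
  if (!is2xx && bodyLength === 0) {
    // Surfaced to CDP/Puppeteer consumers as net::ERR_HTTP_RESPONSE_CODE_FAILURE.
    return "show-error-page";
  }
  return "render-response-body";
}

console.log(describedChromiumOutcome(404, 0));   // "show-error-page"
console.log(describedChromiumOutcome(404, 512)); // "render-response-body"
```

Under this reading, a 404 that ships its own HTML error page loads like any other document, and only the empty-body case ends up on the browser's built-in error page.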

> It's possible to detect and write to WARC, though. Perhaps also shouldn't retry?

Yes please to both of these.

github-project-automation bot added this to Webrecorder Projects Mar 8, 2025

github-project-automation bot moved this to Triage in Webrecorder Projects Mar 8, 2025

ikreymer added a commit that referenced this issue Apr 1, 2025

handle 404 / other error code with no response:

4522f42

- chrome returns net::ERR_HTTP_RESPONSE_CODE_FAILURE
- store WARC record with empty response
- don't retry page, save with loadState: 1
- fixes #789
