Add WARC-Protocol header #715
Re: the WARC-Protocol header being comma-separated rather than repeated: it looks like the IIPC membership is trying to reach consensus this week on whether that is acceptable. See the ongoing discussion in iipc/warc-specifications#42.
One suggestion re: import cleanup.
Community consensus in iipc/warc-specifications#42 seems to favor repeated headers over a single header with a comma-separated list, so we should probably modify this PR to take that approach.
Updated to now generate multiple WARC-Protocol headers, per consensus there.
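For readers following along, the repeated-header approach can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `protocolHeaders` and `serializeHeaders` are hypothetical helpers, and the protocol values (`h2`, `tls/1.3`) are only examples of what might be recorded.

```typescript
// Hypothetical sketch: emit one WARC-Protocol header per protocol,
// instead of a single comma-separated value.
type HeaderPair = [string, string];

function protocolHeaders(protocols: string[]): HeaderPair[] {
  const seen = new Set<string>();
  const out: HeaderPair[] = [];
  for (const p of protocols) {
    const value = p.trim();
    // Skip empty values and duplicates so each protocol appears once.
    if (value !== "" && !seen.has(value)) {
      seen.add(value);
      out.push(["WARC-Protocol", value]);
    }
  }
  return out;
}

// Serialize header pairs as CRLF-terminated "Name: value" lines.
function serializeHeaders(headers: HeaderPair[]): string {
  return headers.map(([name, value]) => `${name}: ${value}\r\n`).join("");
}

const example = serializeHeaders(protocolHeaders(["h2", "tls/1.3"]));
```

Here `example` holds two separate `WARC-Protocol` lines, one for each protocol, rather than `WARC-Protocol: h2, tls/1.3`.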
Though, we may also want to get clarification on WARC-Cipher-Suite, since it's not an exact one-to-one mapping there.
…parated - add WARC-Cipher-Suite header, mapping Chrome NetworkSecurityDetails to known cipher suites - fixes #641
support WARC-Protocol as multiple headers
tests: add tests for WARC-Protocol, WARC-Cipher-Suite
Co-authored-by: Tessa Walsh <[email protected]>
check if protocol ever matches HTTP/1.0 and use that in WARC header, otherwise always use HTTP/1.1
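One possible reading of this commit, sketched under assumptions: the browser reports a protocol string per response (for example `http/1.0`, `http/1.1`, or `h2`), and the HTTP version written into the record is `HTTP/1.0` only when the browser reported exactly that, falling back to `HTTP/1.1` for everything else. The function name `httpVersionForRecord` is hypothetical and the comparison rule is an assumption, not taken from the PR diff.

```typescript
// Hypothetical sketch: choose the HTTP version string to write,
// given the protocol string reported by the browser (may be missing).
function httpVersionForRecord(
  reportedProtocol: string | undefined,
): "HTTP/1.0" | "HTTP/1.1" {
  // Only an explicit HTTP/1.0 report is preserved; h2, h3, http/1.1
  // and unknown values all fall back to HTTP/1.1.
  if (reportedProtocol !== undefined && reportedProtocol.toLowerCase() === "http/1.0") {
    return "HTTP/1.0";
  }
  return "HTTP/1.1";
}
```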
Tested independently with the crawler and this is working as expected. Nice work.
add WARC-Cipher-Suite header, mapping Chrome NetworkSecurityDetails to known cipher suites
A few caveats:
For now, just adding WARC-Protocol here as WARC-Cipher-Suite needs more testing.
- The WARC-Cipher-Suite data is not directly available from the browser, so must be inferred based on the available info.
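To illustrate why the inference is not one-to-one, here is a minimal sketch that is not the mapping used in this PR. It assumes the browser's security details expose a `protocol` string such as `TLS 1.3` and a `cipher` string such as `AES_128_GCM`; the `SecurityDetailsLike` interface, the table, and `inferCipherSuite` are all hypothetical. Only TLS 1.3 is handled, because there the cipher name is enough to pick one of the common suites, whereas earlier TLS versions would also need key exchange and authentication details.

```typescript
// Hypothetical subset of the security details reported by the browser.
interface SecurityDetailsLike {
  protocol: string; // assumed format, e.g. "TLS 1.3"
  cipher: string; // assumed format, e.g. "AES_128_GCM"
}

// Assumed mapping from reported cipher names to TLS 1.3 suite names.
const TLS13_SUITES: Record<string, string | undefined> = {
  AES_128_GCM: "TLS_AES_128_GCM_SHA256",
  AES_256_GCM: "TLS_AES_256_GCM_SHA384",
  CHACHA20_POLY1305: "TLS_CHACHA20_POLY1305_SHA256",
};

// Returns a suite name when it can be inferred, otherwise undefined
// (in which case no WARC-Cipher-Suite header would be written).
function inferCipherSuite(details: SecurityDetailsLike): string | undefined {
  if (details.protocol !== "TLS 1.3") {
    return undefined;
  }
  return TLS13_SUITES[details.cipher];
}
```

Returning `undefined` for anything that cannot be mapped keeps the header absent rather than guessed.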