ikreymer added a commit to webrecorder/wombat that referenced this issue on Feb 23, 2024.
The site loads a script from a base64 data URI, e.g. <script type="text/javascript" src="data:text/javascript;base64,dmFyIHJlbGV2YW5zc2l... This prevents the rewriting from being applied properly, though fortunately it is possible to detect this in the history intercept.
It also declares self with var self, which conflicts with how the rewriting works: the rewriting overrides self with a let self assignment.
The history fix needs to be done in wombat, while the other fixes need to be done in pywb / wabac.js.
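To illustrate the first two points, here is a minimal Python sketch of decoding a base64 `data:` script URI and checking whether the decoded source declares `var self`. This is not how pywb or wabac.js handle it; the function name `decode_data_uri_script` and the sample payload are hypothetical, and the payload is made up for the example rather than taken from the site.

```python
import base64
import re

DATA_URI_PREFIX = "data:text/javascript;base64,"


def decode_data_uri_script(src: str) -> str:
    """Return the JavaScript source embedded in a base64 data: URI."""
    if not src.startswith(DATA_URI_PREFIX):
        raise ValueError("not a base64 JavaScript data URI")
    return base64.b64decode(src[len(DATA_URI_PREFIX):]).decode("utf-8")


# Hypothetical payload, built here so the example is self-contained.
sample_js = "var self = this;"
sample_src = DATA_URI_PREFIX + base64.b64encode(sample_js.encode("utf-8")).decode("ascii")

decoded = decode_data_uri_script(sample_src)

# A script that declares `var self` would clash with a `let self`
# introduced around it by the rewriting.
declares_var_self = re.search(r"\bvar\s+self\b", decoded) is not None
```

Once the script text is available in decoded form, it could in principle be passed through the same JavaScript rewriting as an external script.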
Describe the bug
When recording specific sites, pywb record appears to duplicate content in the recorded URL. The same thing seems to happen in playback. The original target page appears to be captured correctly, but when you navigate away from it and try to return to the page you get a 404. I have also tried this using ArchiveWeb.page and am getting similar behaviour.
Steps to reproduce the bug
Attempt to record one of the affected pages with pywb record. The recording URL will look something like this:
[pywb instance URL]/[collection]/record/https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/
The URL then changes to one with the collection and record prefix repeated:
[pywb instance URL]/[collection]/record/https://teesvalley-ca.gov.uk/[collection]/record/mp_/https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/
Play back the captured page:
[pywb instance URL]/[collection]/20240222092319/https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/
The page renders but the URL changes to:
[pywb instance URL]/[collection]/20240222092319/https://teesvalley-ca.gov.uk/[collection]/20240222092319mp_/https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/
If you click on another captured page linked to from the page and try to go back, you get a 'URL not found' error.
Expected behavior
I would expect it not to insert the additional information in the URLs and to play back normally.
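As a rough illustration of the difference between the observed and expected URLs, the Python sketch below detects the duplicated prefix in a path and collapses it back to the expected form. It is only a sketch based on the URL shapes shown above, not pywb's or wombat's logic; the function name `collapse_duplicated_prefix` and the collection name `my-coll` are hypothetical.

```python
import re


def collapse_duplicated_prefix(path: str, coll: str) -> str:
    """Collapse '/<coll>/<mode>/<origin>/<coll>/<mode>[mp_]/<url>' to '/<coll>/<mode>/<url>'.

    <mode> is either 'record' or a 14-digit timestamp. Paths that do not
    show the duplication are returned unchanged.
    """
    pattern = re.compile(
        "^/{c}/(?P<mode>record|[0-9]{{14}})/"        # outer prefix
        "https?://[^/]+/"                             # origin of the target page
        "{c}/(?P=mode)/?(?:[a-z]{{2}}_)?/"            # repeated prefix, optional modifier such as mp_
        "(?P<target>https?://.+)$".format(c=re.escape(coll))
    )
    match = pattern.match(path)
    if match is None:
        return path
    return "/{}/{}/{}".format(coll, match.group("mode"), match.group("target"))


page = "https://teesvalley-ca.gov.uk/business/tees-valley-business-board/"

# Record mode: the prefix is repeated as '/my-coll/record/mp_/'.
broken_record = "/my-coll/record/https://teesvalley-ca.gov.uk/my-coll/record/mp_/" + page
fixed_record = collapse_duplicated_prefix(broken_record, "my-coll")

# Playback: the prefix is repeated as '/my-coll/20240222092319mp_/'.
broken_replay = "/my-coll/20240222092319/https://teesvalley-ca.gov.uk/my-coll/20240222092319mp_/" + page
fixed_replay = collapse_duplicated_prefix(broken_replay, "my-coll")
```

A path that already has the expected shape is returned as-is, so the check is safe to run on every URL.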
Screenshots
How the page looks after the URL has changed

Similar issue with ArchiveWeb.page playback

Environment
We have just updated to the latest version of pywb. I can try to find more specific version information if required.
I am using v0.11.3 of ArchiveWeb.page
Additional context
This only seems to have occurred on this site; other sites seem to be captured as normal. The specific pages I have tried are:
https://teesvalley-ca.gov.uk/business/tees-valley-business-board/
https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership
https://teesvalley-ca.gov.uk/about/leadership/cabinet-boards-committees/meetings/local-enterprise-partnership/local-enterprise-partnership-archive/
Not sure if this is related, but there also appear to be some minor layout differences between the captured versions and the live web (i.e. the title text is left-aligned instead of centred in the captured version).