
Issue with embed PDFs on www.professeurphifix.net #801


Open
benoit74 opened this issue Mar 28, 2025 · 2 comments

Comments

@benoit74
Contributor

I'm trying to crawl www.professeurphifix.net and I have an issue with embedded PDFs.

Let's focus on https://www.professeurphifix.net/orthographe_impression/ortho_a_1.html as an example.

The code that displays the PDF is:

<embed src="ortho_a_1.pdf" width="680px" height="600px">

The crawler therefore does not explore it by default, but that is not a big deal thanks to the "recent" --selectLinks setting ;)

Command used:

crawl --scopeIncludeRx ortho_a_1 --selectLinks "a[href]->href,embed[src]->src" --seeds https://www.professeurphifix.net/orthographe_impression/ortho_a_1.html
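To illustrate what the --selectLinks value in the command above asks for, here is a minimal Python sketch. It is not the crawler's actual implementation: it only handles the simple `tag[attr]->attr` form used in this command, and `parse_select_links`, `LinkExtractor` and `extract_links` are hypothetical names introduced for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def parse_select_links(spec):
    """Parse 'a[href]->href,embed[src]->src' into (tag, required_attr, extract_attr) rules.

    Assumption: only the plain 'tag[attr]->attr' form is supported here.
    """
    rules = []
    for item in spec.split(","):
        selector, extract_attr = item.strip().split("->")
        tag, _, rest = selector.partition("[")
        required_attr = rest.rstrip("]")
        rules.append((tag.lower(), required_attr.lower(), extract_attr.lower()))
    return rules


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from start tags that match one of the rules."""

    def __init__(self, rules, base_url):
        super().__init__()
        self.rules = rules
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for rule_tag, required_attr, extract_attr in self.rules:
            if tag == rule_tag and required_attr in attr_map and attr_map.get(extract_attr):
                # Resolve relative URLs such as "ortho_a_1.pdf" against the page URL.
                self.links.append(urljoin(self.base_url, attr_map[extract_attr]))


def extract_links(html, base_url, spec):
    parser = LinkExtractor(parse_select_links(spec), base_url)
    parser.feed(html)
    return parser.links
```

With only `a[href]->href`, the `<embed>` tag from the page yields no link; adding `embed[src]->src` makes the relative `ortho_a_1.pdf` resolve to a URL next to the HTML page, which is why the PDF ends up in scope for the crawl.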

With this "tweak", the resulting WARC contains the PDF, but "something" seems to prevent it from being displayed on replayweb.page (and in the ZIM as well, obviously).

Am I missing something? Or is this rather a wombat.js issue?

Sample WARC with the HTML and the PDF:
rec-da74c0c8fc0b-20250328092919995-0.warc.gz
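For anyone who wants to check what was actually captured for the PDF, here is a rough, standard-library-only Python sketch that lists the target URI, HTTP status line and Content-Type of each response record in a WARC. It is a simplified reader written for this example (the function names are hypothetical), not a replacement for a proper WARC library, and it assumes well-formed records with a Content-Length header.

```python
import gzip
import io


def iter_warc_records(stream):
    """Yield (warc_headers, block_bytes) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.startswith(b"WARC/"):
            continue  # skip the blank lines that separate records
        headers = {}
        for raw in iter(stream.readline, b""):
            raw = raw.strip()
            if not raw:
                break  # blank line ends the WARC header section
            name, _, value = raw.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        yield headers, stream.read(length)


def summarize_responses(stream):
    """Return (target_uri, status_line, content_type) for each HTTP response record."""
    rows = []
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        head, _, _ = block.partition(b"\r\n\r\n")
        lines = head.decode("iso-8859-1").split("\r\n")
        http_headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            http_headers[name.strip().lower()] = value.strip()
        rows.append((headers.get("warc-target-uri"), lines[0], http_headers.get("content-type")))
    return rows


def summarize_warc_file(path):
    """Open a .warc.gz file (possibly multi-member gzip) and summarize its responses."""
    with gzip.open(path, "rb") as stream:
        return summarize_responses(stream)
```

Calling `summarize_warc_file("rec-da74c0c8fc0b-20250328092919995-0.warc.gz")` on the attached sample should show one row for the HTML page and one for the PDF, which makes it easy to compare the stored status and Content-Type of the PDF with what the replay displays.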

@ikreymer
Member

ikreymer commented Apr 1, 2025

I was able to load the PDF in both Chrome and Firefox just now in ReplayWeb.page. Or maybe it only works in some cases?
What browser were you using?

@benoit74
Contributor Author

benoit74 commented Apr 1, 2025

I still can't get it to work in either Firefox or Chrome on macOS (latest versions or so).

Firefox: [screenshot]

Chrome (message is a bit clearer): [screenshot]
