Charset in meta content does not correctly parse for trailing semi-colon #92

Open
1619digital opened this issue Jul 19, 2013 · 2 comments

Comments

@1619digital
Reference: http://www.w3.org/html/wg/drafts/html/master/infrastructure.html#algorithm-for-extracting-a-character-encoding-from-a-meta-element

ContentAttrParser looks only for a space character to terminate an unquoted charset value, so

<meta http-equiv="Content-Type" content="charset=iso8859-2;text/html">

is incorrectly inferred to have the charset 'iso8859-2;text/html'. The fix is to add a semicolon to the spaceCharacters scanned in SkipUntil (line 860).
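For illustration, here is a minimal standalone sketch of the extraction step as I read the referenced algorithm, where an unquoted value ends at the first whitespace character or semicolon. It is not html5lib's actual code: the names `extract_charset`, `HTML_WHITESPACE` and `CHARSET_RE` are made up for this example, and it works on `str` rather than the byte stream the real parser sees.

```python
import re

# ASCII whitespace characters as listed in the HTML specification.
HTML_WHITESPACE = "\t\n\x0c\r "
# ASCII-only, case-insensitive match for the literal word "charset".
CHARSET_RE = re.compile("charset", re.IGNORECASE | re.ASCII)


def extract_charset(content):
    """Return the raw charset label in a meta content value, or None.

    Hypothetical helper for illustration only; not part of html5lib.
    """
    position = 0
    while True:
        match = CHARSET_RE.search(content, position)
        if match is None:
            return None
        position = match.end()
        # Skip whitespace between "charset" and "=".
        while position < len(content) and content[position] in HTML_WHITESPACE:
            position += 1
        if position >= len(content) or content[position] != "=":
            # Not a charset declaration; look for the next occurrence.
            continue
        position += 1
        # Skip whitespace between "=" and the value.
        while position < len(content) and content[position] in HTML_WHITESPACE:
            position += 1
        if position >= len(content):
            return None
        first = content[position]
        if first in "\"'":
            # Quoted value: it must have a matching closing quote.
            end = content.find(first, position + 1)
            return content[position + 1:end] if end != -1 else None
        # Unquoted value: stop at whitespace OR a semicolon.
        end = position
        while end < len(content) and content[end] not in HTML_WHITESPACE + ";":
            end += 1
        return content[position:end]


print(extract_charset("charset=iso8859-2;text/html"))
```

With the semicolon included in the terminator set, the content value from the example above yields 'iso8859-2' rather than 'iso8859-2;text/html'.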

EDIT: This is as per the specification. Also, I don't know what the status of the parser tests is, but they're out of date and incorrect and (obviously) not used. Most of the tests are still valid, though, so it would not take much to bring them back into the full test regime.

@gsnedders gsnedders modified the milestones: 1.0, 0.999999999 Jun 6, 2016
@willkg willkg modified the milestones: 0.9999999999, 1.0 Oct 3, 2017
@willkg willkg removed this from the 1.0 milestone Oct 31, 2017
@willkg
Contributor

willkg commented Oct 31, 2017

If someone needs this, please submit a PR.

@gsnedders
Member

gsnedders commented Nov 8, 2017

The tests for this have a problem: they currently don't agree on whether they're testing the eventual encoding (potentially including the tokenizer changing the encoding while parsing, after the pre-parse) or just the result of the pre-parse.

See html5lib/html5lib-tests#28 for that.
