Diff output for HTML standard broken #66

Closed
annevk opened this issue Jun 2, 2020 · 9 comments

@annevk
Contributor

annevk commented Jun 2, 2020

At the top I see:

<html lang=en-US-x-hixie class=split><script src="/link-fixup.js" defer></script><meta charset=utf-8><meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name=viewport><title>HTML Standard</title><meta content=#3c790a name=theme-color><link href="https://resources.whatwg.org/spec.css" crossorigin="" rel=stylesheet><link href="https://resources.whatwg.org/standard.css" crossorigin="" rel=stylesheet><link href="https://resources.whatwg.org/standard-shared-with-dev.css" crossorigin="" rel=stylesheet><link href="https://resources.whatwg.org/logo.svg" crossorigin="" rel=icon><link href="/styles.css" crossorigin="" rel=stylesheet><script>
   function toggleStatus(div) {
     div.parentNode.classList.toggle('wrapped');
   }
   function setLinkFragment(link) {
     link.hash = location.hash;
   }
<style type='text/css'>
.diff-old-a {
  font-size: smaller;
  color: red;
}

.diff-new { background-color: yellow; }
.diff-chg { background-color: lime; }
.diff-new:before,
.diff-new:after
    { content: "\2191" }
.diff-chg:before, .diff-chg:after
    { content: "\2195" }
.diff-old { text-decoration: line-through; background-color: #FBB; }
.diff-old:before,
.diff-old:after
    { content: "\2193" }
</style>
<script src="https://w3c.github.io/htmldiff-nav/index.js"></script><script async crossorigin="" src="/html-dfn.js">
</script>

Note how various things end up nested inside a script element somehow. The document is also in quirks mode.

Could this be because we made a change to this header at some point to have more external style sheets?

whatwg/html#5600

cc @domenic @tobie

@tobie
Owner

tobie commented Jun 2, 2020

There seem to be a number of JS files that aren't pulled in properly. Not too sure what the issue is here.

How recent is that issue?

@annevk
Contributor Author

annevk commented Jun 2, 2020

According to IRC reports it's recentish. Also note how before the inline script is closed you get <style type='text/css'>. What ends up generating this HTML? Is this something we can control?

@tobie
Owner

tobie commented Jun 2, 2020

Hey @dontcallmedom, anything changed in the way services.w3.org/htmldiff appends its CSS and JS libs? It seems broken here.

@annevk
Contributor Author

annevk commented Jun 3, 2020

If the links below https://services.w3.org/htmldiff are accurate there have been no recent changes that could have caused this I think. We did change something in source (moving an inline style sheet to be external) which I thought could perhaps be the reason, but I don't really know how this system works.

I suspect it's that and the Perl script just handles our HTML badly or something like that.

@dontcallmedom

No recent change in htmldiff; the parsing algorithm the tool uses to insert the style and script is pretty fragile, so I'm guessing it's not happy with the recent change @annevk mentioned.

@annevk
Contributor Author

annevk commented Jun 10, 2020

@dontcallmedom I created whatwg/html#5629 and that is indeed what is going on. You can see https://whatpr.org/html/5629/acknowledgements.html as an input to the diff and https://whatpr.org/html/5629/78ba017...66dd925/acknowledgements.html as the mangled output.

So this means that the end of our "head section" is marked with <body> and it looks like https://github.com/w3c/htmldiff-ui/blob/master/htmldiff.pl#L369 does anticipate such a scenario, but somehow it ends up removing the entire line as I guess it expects <body> to appear on its own line or some such? Would that be easy to fix or should we mangle our HTML serializer somehow?
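
To illustrate the failure mode guessed at above, here is a minimal Python sketch. It is not the actual htmldiff.pl logic (which is Perl and which I have not reproduced here); the function names, the injected style rule, and the sample input are all hypothetical. It contrasts an insertion that replaces the whole line containing `<body>` with one that splices at the tag's position.

```python
import re

# Hypothetical stand-in for the CSS/JS the diff tool injects.
INJECTED = "<style>.diff-new { background-color: yellow; }</style>"


def inject_line_based(html: str) -> str:
    """Fragile variant: assumes <body> sits on a line of its own."""
    out = []
    for line in html.split("\n"):
        if re.search(r"<body", line, re.IGNORECASE):
            # The entire line is replaced, so anything sharing the line
            # with <body> (here a closing </script>) is silently dropped.
            out.append(INJECTED)
            out.append("<body>")
        else:
            out.append(line)
    return "\n".join(out)


def inject_at_tag(html: str) -> str:
    """Sturdier variant: splice the injected markup right before <body>."""
    match = re.search(r"<body\b", html, re.IGNORECASE)
    if match is None:
        return html  # no <body> tag found; leave the input untouched
    return html[:match.start()] + INJECTED + html[match.start():]


# Assumed input shape: the head ends with </script><body> on a single line.
sample = "<script>\n  function f() {}\n</script><body>\n<p>Hello</p>"

print(inject_line_based(sample))
print(inject_at_tag(sample))
```

In the first output the `</script>` is gone, so the injected `<style>` ends up inside the still-open script element, which matches the symptom in the original report. The second keeps the closing tag intact.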

@gosko

gosko commented Jun 10, 2020

@annevk I think this is fixed – our server was using an obsolete version of htmldiff.pl

@annevk
Contributor Author

annevk commented Jun 11, 2020

@gosko thank you, that was indeed the problem. It's still a bit weird that the DOCTYPE ends up being stripped. I filed w3c/htmldiff-ui#4 on that.
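
For anyone wanting to check diff output for the stripped DOCTYPE, a rough Python sketch follows. The helper name is hypothetical and the check is a simplification: a missing DOCTYPE puts browsers in quirks mode, but some legacy DOCTYPEs do too, and this does not distinguish those.

```python
import re


def has_html_doctype(html: str) -> bool:
    """Return True if the document starts with <!DOCTYPE html ...>.

    Only leading whitespace is skipped; leading comments, which HTML
    permits before the DOCTYPE, are not handled in this sketch.
    """
    return re.match(r"\s*<!doctype\s+html\b", html, re.IGNORECASE) is not None


# The mangled output quoted at the top of this issue starts directly with <html>.
print(has_html_doctype("<html lang=en-US-x-hixie class=split>"))
print(has_html_doctype("<!DOCTYPE html><html lang=en-US-x-hixie class=split>"))
```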

I'm also a little uncomfortable with pulling in a third-party script, but I guess that's okay for now as this is all on a separate domain anyway.

annevk closed this as completed Jun 11, 2020

@tobie
Owner

tobie commented Jun 11, 2020

Filed a new issue to make it more clear who to file an issue with when you're encountering a problem with a particular service: #71.
