Skip to content

fix: parse select correctly when it contains HTML element #1070

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lbrooney
Copy link

Closes #1069. Select was the problem not options, but that issue was prior to me looking at the WHATWG documentation and the Astro parser.

Changes

  • This fixes parsing of select elements when they contain HTML elements.
  • The issue is specifically with how End Tags were being handle when inSelect mode
    • Only option, optgroup, select, and template end tags were being handled.
    • Others were unhandled, this PR just changes so that if the end tag is not any of the ones from the previous bullet point, the parser should treat the tag as if it were inBody.
    • Start tag handles it this way already.
  • Previously the compiler could correctly produce a select containing a single HTML element, which I'll refer to as X, or a select containing an option containing a single X.
  • However if a HTML element, Y, is added as the next sibling to either the select, option, or X, Y will cannibalized by X becoming its child. Each subsequent element(child or sibling) will have whatever element was preceding it as its parent.

ie:

<select id="select-span-sibling">
  <span>Lemon</span>
  <option>Lime</option>
</select>
<div><span>Orange</span></div>
<div>Melon</div>

produces (formatted for readability):

<select id="select-span-sibling">
  <span>
    Lemon
    <option>
      Lime
    </option>
    <div>
      <span>
        Orange
        <div>
          Melon
        </div>
      </span>
    </div>
  </span>
</select>
  • This change fixes it so that the previous example now produces the expected output:
<select id="select-span-sibling">
  <span>Lemon</span> 
  <option>Lime</option>
</select>
<div><span>Orange</span></div>
<div>Melon</div>
  • I briefly tested the compiled WASM in a local Astro project and the generated HTML from the .astro was matching the expected/"correct" output.
  • I don't know if there are additional changes needed to be made to handle special cases or better match the spec or if this somehow broke how people were previously using it, but I believe this PR produces a more correct output than before.

Testing

I wasn't sure where best to do testing so I just added to test_printer where I could most immediately observe outputs. Additionally I wasn't sure what was the appropriate amount of tests I should add or how verbose to be with the naming, so I added only a few to cover some limited scenarios and it's far from exhaustive. I can add, remove, or change them if it is desired.

Docs

I did not change any documentation as I believe it's only a bugfix.

Copy link

changeset-bot bot commented Apr 24, 2025

🦋 Changeset detected

Latest commit: f7af818

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@astrojs/compiler Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@ematipico
Copy link
Member

@lbrooney are you sure about the validity of the bug? The w3c validator returns an error when span is a child of select

@lbrooney
Copy link
Author

@lbrooney are you sure about the validity of the bug? The w3c validator returns an error when span is a child of select

So I believe you are correct in that a span is not a valid child of a select(if my understanding is correct the current spec says anything not explicity defined is an error).

To my knowledge, currently a span in an option is also not spec compliant either, but considering as of Chrome 135, with the introduction of customizable selects and the ability for them to contain rich HTML content that is changing. There is an active spec pull request here: whatwg/html#9799. Additional details can also be found in the previously linked Chrome dev blog as well.

Admittedly my example in the initial PR is bad as I'm not sure that even within the new spec changes whether that would be valid, but as of right now Chrome doesn't raise any concerns about it, it exists within the DOM, but it not visible/presented(maybe with some CSS you can make it visible). Apologies, however, I do think it demonstrated how broken the output produced could be.

I probably should have included another example in my opening issue, but for instance:

<select>
  <option><span>Lemon</span></option>
  <option><span>Lime</span></option>
</select>

currently produces(formatted):

<select>
  <option>
    <span>Lemon
      <option>
        <span>Lime</span>
      </option>
    </span>
  </option>
</select>

Taking advantage of this new feature is currently impossible within an Astro component.

Blink currently has this behind a runtime flag (the function above it does the actual check). Checking the references you can see how they've changed parsing, with html_tree_builder.cc I believe being most relevent in this case.

@MoustaphaDev
Copy link
Member

Thanks for contributing!

Looking at the updated spec and this https://chromium-review.googlesource.com/c/chromium/src/+/5118452/10/third_party/blink/renderer/core/html/parser/html_tree_builder.cc the change looks a bit more involved than parsing with the inBody insertion mode.

Wondering if we should get rid of HTML error correction. Browsers are already optimized to do that, and it would avoid having to keep up with the HTML spec.

@MoustaphaDev
Copy link
Member

I posted a proposal to discuss that!

withastro/roadmap#1166

@lbrooney
Copy link
Author

lbrooney commented May 5, 2025

Thanks for contributing!

For why my solution is rather terse is that while, to my understanding, the new select spec is largely finalized, I didn't want to engineer a spec-for-spec implementation only for it to change and have it now be incorrect (long way to say I was lazy 😅). The current changes introduced by the PR solved my problems so I was satisfied with it. To your second point, if the produced HTML with this solution was not spec compliant, the browser would handle it, which while not ideal is well within their capabilities.

I can push some additional snapshot tests, but with the current PR if a select tag (properly closed/opened or not) is a child of an an element within a select, it is does not exist in the produced output.

<!-- any of these inputs -->
<select><option><span><select>Lemon</select></span></option></select>
<select><option><span><select></select>Lemon</span></option></select>
<select><option><span><select>Lemon</span></option></select>
<select><option><span></select>Lemon</span></option></select>
<!-- same output by astro -->
<select><option><span>Lemon</span></option></select>`

The behaviour on handled cases within inSelect.

<!-- select tag in option any of these inputs -->
<select><option><select></select><span>Lemon</span></option></select>
<select><option><select><span>Lemon</span></option></select>
<select><option></select><span>Lemon</span></option></select>
<!-- same output by astro -->
<select><option></option></select><span>Lemon</span>

I posted a proposal to discuss that!

withastro/roadmap#1166

In regards to your proposal, I was having similar thoughts when I initally came across this bug and found that I didn't encounter it with React TSX. I believe that's essentially how the current Astro TSX parser workers by just ignoring parse errors, and I may be incorrect in this, I believe React TSX also doesn't "correct" improper HTML.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Astro incorrectly closes option tag when it contains an HTML tag
3 participants