Extend INP attribution with extra LoAF information: longest script and buckets #592

Open
wants to merge 16 commits into
base: v5

Conversation

tunetheweb
Member

@tunetheweb tunetheweb commented Feb 27, 2025

Closes #559

This is an alternative to #574 which provides some extra attribution information:

  • totalScriptDuration - the total duration of all intersecting scripts
  • totalStyleAndLayoutDuration - The total style and layout duration including any end-of-frame style and layout duration plus any forced style and layout duration.
  • totalPaintDuration - the off-main-thread presentation delays (end of LoAF -> end of INP).
  • totalUnattributedDuration - The total unattributed time not included in any of the previous totals including scripts < 5 milliseconds and other timings not attributed by LoAF (including when a frame is < 50ms and so has no LoAF).
  • longestScript - the longest intersecting script
    • entry - the entry
    • subpart - the subpart the script occurred in
    • intersectingDuration - the intersecting duration (as it may be not all of the script intersected the INP time).

The first four give more information on whether this is a script, or style and layout, or frame presentation delay.

The last item pulls out the important script (by intersecting duration, which is not easily obtainable from the current raw LoAF entry).
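As a rough illustration of what "intersecting duration" means here, the overlap between a script and the INP window can be sketched as an interval intersection. The `TimeRange` type and `intersectingDuration` helper are hypothetical names for this sketch, not the code in this PR:

```typescript
// Hypothetical helper: how much of a script's run time falls inside the INP window.
interface TimeRange {
  start: number; // ms, on the same time origin as performance.now()
  end: number;
}

function intersectingDuration(script: TimeRange, interaction: TimeRange): number {
  const overlapStart = Math.max(script.start, interaction.start);
  const overlapEnd = Math.min(script.end, interaction.end);
  // No overlap yields a negative difference, so clamp to zero.
  return Math.max(0, overlapEnd - overlapStart);
}

// A 100 ms script that started 40 ms before the interaction began.
const partlyBefore = intersectingDuration(
  {start: 960, end: 1060},
  {start: 1000, end: 1200},
);
console.log(partlyBefore); // 60
```

Only the 60 ms after the interaction started would count towards the interaction, even though the script itself ran for 100 ms.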

Example output:

{
    "interactionTarget": "div.devsite-top-logo-row-wrapper>div.devsite-top-logo-row>devsite-appearance-selector",
    "interactionType": "pointer",
    "interactionTime": 25221.30000000447,
    "nextPaintTime": 25477.30000000447,
    "processedEventEntries": ...,
    "longAnimationFrameEntries": ...,
    "inputDelay": 32.599999994039536,
    "processingDuration": 164.20000000298023,
    "presentationDelay": 59.20000000298023,
    "loadState": "complete",
    "longestScript": {
        "entry": {
            "name": "script",
            "entryType": "script",
            "startTime": 25264.89999999851,
            "duration": 141,
            "invoker": "DEVSITE-HEADER.onclick",
            "invokerType": "event-listener",
            "windowAttribution": "self",
            "executionStart": 25264.89999999851,
            "forcedStyleAndLayoutDuration": 141,
            "pauseDuration": 0,
            "sourceURL": "https://www.gstatic.com/devrel-devsite/prod/v6bfb74446ce17cd0d3af9b93bf26e056161cb79c5a6475bd6a9c25286fcb7861/js/devsite_app_module.js",
            "sourceFunctionName": "",
            "sourceCharPosition": -1
        },
        "subpart": "processing-duration",
        "intersectingDuration": 141
    },
    "totalScriptDuration": 15,
    "totalStyleAndLayoutDuration": 156.29999999701977,
    "totalPaintDuration": 43.900000005960464,
    "totalUnattributedDuration": 40.79999999701977
}
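In this example output the four `total*` values appear to add up to the INP window (`nextPaintTime - interactionTime`). A small check, using only the numbers shown above (the variable names are just for this sketch):

```typescript
// Values copied from the example output above.
const exampleAttribution = {
  interactionTime: 25221.30000000447,
  nextPaintTime: 25477.30000000447,
  totalScriptDuration: 15,
  totalStyleAndLayoutDuration: 156.29999999701977,
  totalPaintDuration: 43.900000005960464,
  totalUnattributedDuration: 40.79999999701977,
};

const bucketSum =
  exampleAttribution.totalScriptDuration +
  exampleAttribution.totalStyleAndLayoutDuration +
  exampleAttribution.totalPaintDuration +
  exampleAttribution.totalUnattributedDuration;

const inpWindow =
  exampleAttribution.nextPaintTime - exampleAttribution.interactionTime;

// Both should be roughly 256 ms, allowing for floating point noise.
console.log(bucketSum.toFixed(1), inpWindow.toFixed(1));
```

Note that the longest script reports a `forcedStyleAndLayoutDuration` of 141 while `totalScriptDuration` is only 15, which suggests the forced style and layout time is counted in the style and layout bucket in this output.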

@mmocny
Member

mmocny commented Feb 28, 2025

Before review, some input on naming (since you asked!)


"thrashing" is only thrashing if there is actually duplication of effort. Forced style and layout can happen during script for several reasons and often it just eagerly does what would otherwise have happened lazily, not actually more work. So it's hard to know if it's thrashing without looking. (Also scriptThrashing without style/layout is a weird name).

WDYT:

  • styleAndLayoutInScript
  • styleAndLayoutInRendering

Or, prepare for a sort of "sub-parts":

  • scriptsDuration for the whole, and then subparts:
    • scriptsStyleAndLayout
  • renderingDuration for the whole, and then subparts:
    • renderingStyleAndLayout
    • renderingRafAndObservers (Not sure about this)

RE: framePresentationDelay -- I like that fine. It matches the PaintTimingMixin direction that we have paintTime and presentationTime and so this is the paintTime -> presentationTime gap, and it should be a "delay" not "duration" because it isn't occupying the main thread (much like input delay)


RE: longestScript and especially longestScript.subpart -- Hmmm. I guess the main value would be to differentiate input delay vs not, though we already get that via sub-parts?

I'm sort of worried that you could have multiple long scripts (such as a long event and input delay).

Given that we have actual INP sub-parts, and access to the loaf entry with the full list of scripts, I'm just not sure about the value of highlighting just a single longest script-- but no specific concerns either.

@mmocny
Member

mmocny commented Feb 28, 2025

Oh I think I see now the value of intersectingDuration. I think this could only ever apply to a single script which only partially overlaps first event timeStamp (and so partially affects input delay).

Maybe instead of longestScript with an overlappingDuration we have a specific value for this:

  • scriptWhichAddedSignificantInputDelayBikeshedName

...which could be null or not and would have extra metadata like "overlappingDuration" or "blockingDurationAdded" or something. All the other scripts in the LoAF entries after this one would affect the main processing time of the frame?

@tunetheweb
Member Author

Dammit! I thought I'd found a workaround to avoid confusion with the "total style and layout" not including forced 😔

WDYT:

  • styleAndLayoutInScript
  • styleAndLayoutInRendering

I'm not loving the inconsistency away from the LoAF styleAndLayoutStart to be honest.

I'm still not 100% convinced knowing forcedStyleAndLayout is that useful (and therefore is worth the confusion with end of frame style and layout). Once you know the script is long running, it should be easy to repeat that in devtools and then see that it forced style and layout (whether due to thrashing, or just an early style and layout calc). So I think labelling this as just "script" time is maybe OK. So maybe this should just be dropped to avoid that confusion.

As to the by subpart breakdown, I actually experimented with full subparts (and have this on another branch):

image

In the end I decided that was very verbose. People will really care about all the scripts (in which case look at the LoAF entries directly), or the longest script (in which case you need to consider intersecting durations only so library should help make that easier). Having script details for each subpart seems a bit verbose for the library to expose. So I took an opinionated choice of only showing the longest one, but reporting extra details (like which subpart it was in and the intersecting duration).
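To make the "longest script by intersecting duration" idea concrete, here is a minimal sketch. The `ScriptTiming` shape borrows `invoker`, `startTime` and `duration` from the LoAF script entries, but the `findLongestScript` helper and its return type are assumptions for illustration, not the PR's implementation:

```typescript
// Hypothetical, simplified view of a LoAF script entry.
interface ScriptTiming {
  invoker: string;
  startTime: number;
  duration: number;
}

interface LongestScript {
  entry: ScriptTiming;
  intersectingDuration: number;
}

function findLongestScript(
  scripts: ScriptTiming[],
  windowStart: number,
  windowEnd: number,
): LongestScript | undefined {
  let longest: LongestScript | undefined;
  for (const entry of scripts) {
    const scriptEnd = entry.startTime + entry.duration;
    // Only the part of the script inside the INP window counts.
    const overlap = Math.max(
      0,
      Math.min(scriptEnd, windowEnd) - Math.max(entry.startTime, windowStart),
    );
    if (overlap > 0 && (!longest || overlap > longest.intersectingDuration)) {
      longest = {entry, intersectingDuration: overlap};
    }
  }
  return longest;
}

// Made-up timings: the first script is longer overall (200 ms) but only
// 100 ms of it overlaps the interaction, so the second script wins.
const picked = findLongestScript(
  [
    {invoker: 'TimerHandler:setTimeout', startTime: 900, duration: 200},
    {invoker: 'BUTTON.onclick', startTime: 1100, duration: 120},
  ],
  1000,
  1250,
);
console.log(picked?.entry.invoker, picked?.intersectingDuration);
```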

Part of my issue with #574 is I found the totalDurationsPerSubpart object similarly overly complex.

Oh I think I see now the value of intersectingDuration. I think this could only ever apply to a single script which only partially overlaps first event timeStamp (and so partially affects input delay).

Yes that's correct. It's to explain when a script duration exceeds the total INP. Or when script is just a small part of INP. Maybe this is niche enough that we can just accept it will be over the duration sometimes and skip it?

Maybe instead of longestScript with an overlappingDuration we have a specific value for this:

scriptWhichAddedSignificantInputDelayBikeshedName
...which could be null or not and would have extra metadata like "overlappingDuration" or "blockingDurationAdded" or something. All the other scripts in the LoAF entries after this one would affect the main processing time of the frame?

Well the way I see it, this new data in this PR allow you to see if your problem is:

  • script (including forced style and layout in that script) - and if so the worst one.
  • style and layout
  • off main thread work (i.e. browser work).

So the longest script is still valuable even if it's not in input delay. Granted you don't need intersectingDuration then and can just look at script duration but it will be the same as intersectingDuration then anyway so no harm to always look at that.
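One way a consumer might use these totals to answer "what kind of problem is this?" is to pick the largest bucket. This is only a sketch: the field names follow the PR description, while `largestBucket` is a hypothetical helper:

```typescript
// Field names follow the PR description; the helper itself is hypothetical.
interface BucketTotals {
  totalScriptDuration: number;
  totalStyleAndLayoutDuration: number;
  totalPaintDuration: number;
  totalUnattributedDuration: number;
}

type BucketKey = keyof BucketTotals;

function largestBucket(totals: BucketTotals): BucketKey {
  const keys: BucketKey[] = [
    'totalScriptDuration',
    'totalStyleAndLayoutDuration',
    'totalPaintDuration',
    'totalUnattributedDuration',
  ];
  let largest: BucketKey = 'totalScriptDuration';
  for (const key of keys) {
    if (totals[key] > totals[largest]) {
      largest = key;
    }
  }
  return largest;
}

// Totals taken from the example output in the PR description.
const dominant = largestBucket({
  totalScriptDuration: 15,
  totalStyleAndLayoutDuration: 156.29999999701977,
  totalPaintDuration: 43.900000005960464,
  totalUnattributedDuration: 40.79999999701977,
});
console.log(dominant); // totalStyleAndLayoutDuration
```

If the script bucket dominates instead, `longestScript` would be the natural next thing to look at.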

@tunetheweb
Member Author

tunetheweb commented Mar 3, 2025

Updated scriptThrashing to scriptsForcedStyleAndLayoutDuration.

@philipwalton
Member

One minor concern I have with adding new "duration" properties to the top-level attribution object is that then it's much harder to know which properties are part of the official INP subparts and which properties are just other duration values not related to the official subparts.

I'm not sure if there's much we can do about that, and anything we do would likely be inconsistent with the subparts in LCP attribution, but I wanted to make the comment anyway and have the discussion.

@tunetheweb
Member Author

One minor concern I have with adding new "duration" properties to the top-level attribution object is that then it's much harder to know which properties are part of the official INP subparts and which properties are just other duration values not related to the official subparts.

I'm not sure if there's much we can do about that, and anything we do would likely be inconsistent with the subparts in LCP attribution, but I wanted to make the comment anyway and have the discussion.

It's a fair concern. I can think of two options to solve this:

  1. We have talked about whether this all should fit under a "LoAF" object (which would also make it clear why it's missing on non-LoAF supporting browsers). I've kind of moved away from that here as I thought it made the attribution object more complex in documentation and the like, but we could bring it back. We always have longestScriptDuration, but that could become buried more with this. Think my preference is to leave it flat, but not a super strong preference so could be persuaded.

  2. We could lean into the "total" more. At the moment styleAndLayoutDuration and framePresentationDelay are basically the two new properties for your concern. They are currently the last frame entries, as per

styleAndLayoutDuration - the style and layout duration of the final presentation frame (note we do not include other frames if there are multiple LoAFs to avoid double counting work, as that should be counted in "input delay").

They could also be totals for when INP spans more than one frame (so totalStyleAndLayoutDuration and totalFramePresentationDelay) which would differentiate them from the 3 core durations more. In most cases they would probably be for one entry, but for multi-frame INPs I was trying to decide which to show anyway.

I think there are positives and negatives to both options even with that:

  • Totals across all frames - allows you to split the whole INP time in an alternative way (totalScript/totalStyleAndLayout/totalFramePresentationDelay). Though that could also be confusing as a competitor to the core subparts. Then again, since we've done it for scripts already, maybe it's weird not to do it for the last two? The other issue is you could get a totalFramePresentationDelay larger than presentationDelay so that would be weird.
  • Only final frame - allows you to split presentationDelay phase more. In which case they are more presentationDelay subparts (and we could name them more like this). But then maybe it's weird that we have total for scripts, but not for non-script durations?

At the moment I'm thinking 2) makes more sense. The third option is to just leave it as is. WDYT?
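The difference between the two options can be sketched for end-of-frame style and layout. This assumes the LoAF fields `startTime`, `duration` and `styleAndLayoutStart`, treats a `styleAndLayoutStart` of 0 as "no rendering in this frame", and ignores both forced style and layout inside scripts and clipping to the INP window. The frame values are made up:

```typescript
// Simplified, hypothetical view of a long animation frame entry.
interface FrameTiming {
  startTime: number;
  duration: number;
  styleAndLayoutStart: number; // assumed to be 0 when the frame did no rendering
}

function endOfFrameStyleAndLayout(frame: FrameTiming): number {
  if (frame.styleAndLayoutStart === 0) return 0;
  const frameEnd = frame.startTime + frame.duration;
  return Math.max(0, frameEnd - frame.styleAndLayoutStart);
}

// Two made-up frames intersecting one interaction.
const sampleFrames: FrameTiming[] = [
  {startTime: 1000, duration: 120, styleAndLayoutStart: 1100},
  {startTime: 1130, duration: 150, styleAndLayoutStart: 1250},
];

const perFrame: number[] = sampleFrames.map((frame) =>
  endOfFrameStyleAndLayout(frame),
);

// "Only final frame": just the last entry.
const finalFrameOnly = perFrame[perFrame.length - 1] ?? 0;
// "Totals across all frames": the sum of every entry.
const totalAcrossFrames = perFrame.reduce((sum, value) => sum + value, 0);

console.log(finalFrameOnly, totalAcrossFrames); // 30 50
```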

@tunetheweb tunetheweb requested a review from philipwalton March 9, 2025 13:06
@philipwalton
Member

philipwalton commented Mar 10, 2025

At the moment I'm thinking 2) makes more sense. The third option is to just leave it as is. WDYT?

Yeah, option (2) seems reasonable to me. My first reaction was that any style/layout/presentation work not in the last frame should be part of "input delay", but the more I thought about it the more I think there is value in an "alternate" breakdown, e.g. phases vs. buckets—where the buckets are not necessarily sequential and help you determine whether poor INP is primarily caused by script, style/layout, or off-main-thread work.

That said,

  • The other issue is you could get a totalFramePresentationDelay larger than presentationDelay so that would be weird.

I'm not super bothered by this—however, I do think it's a bit weird that "presentation delay" in the subparts is not the same thing as "presentation delay" in totalFramePresentationDelay (the former includes main thread work and the latter doesn't).

What if we didn't limit this to just presentation and included all off-main-thread work (including off-main thread input delay)? Then we could call it something like totalOffMainThreadDuration or totalOffMainThreadDelay? WDYT?

Also @mmocny since he may have opinions here.

@tunetheweb
Member Author

But how can we measure off main thread work? The only reason we have the framePresentationDelay is because it’s the time between LoAF end and INP end.

Which btw means we also don’t have this for prior frames. But then we also don’t need it for them either when looking at buckets, as we’d double count as that would overlap the next frame starting.

So framePresentationDelay should stay as it is, without the total I think. Which also means it’s back to being a subpart of presentationDelay so maybe that’s OK?

Alternatively we just exclude framePresentationDelay and leave it as unattributed time—and maybe we should add totalUnattributedTime for that matter?
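For reference, the "LoAF end to INP end" gap described above could be computed roughly like this. The `framePresentationDelay` helper and the sample numbers are illustrative assumptions:

```typescript
// Hypothetical helper: time from the end of the last LoAF to the end of the
// interaction (the next paint time reported for INP).
function framePresentationDelay(
  lastFrame: {startTime: number; duration: number},
  nextPaintTime: number,
): number {
  const frameEnd = lastFrame.startTime + lastFrame.duration;
  // Clamp in case the frame is reported as ending after the paint time.
  return Math.max(0, nextPaintTime - frameEnd);
}

// Made-up values: the frame ends at 1200 and the paint is presented at 1244.
const sampleDelay = framePresentationDelay(
  {startTime: 1000, duration: 200},
  1244,
);
console.log(sampleDelay); // 44
```

Because it is derived from the end of a LoAF, the same value is not available when there is no LoAF entry for the frame.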

@philipwalton
Member

But how can we measure off main thread work? The only reason we have the framePresentationDelay is because it’s the time between LoAF end and INP end.

I think you can use LoAF to determine what was happening when the interaction took place, and if it's not happening during script execution or style/layout, then you can assume it's off-main delay. I'm not sure if that's always 100% true, but with some quick testing it seems to be.

Which btw means we also don’t have this for prior frames. But then we also don’t need it for them either when looking at buckets, as we’d double count as that would overlap the next frame starting.

For bucketing, I don't think you would double count it—at least not how I was imagining things working. E.g. with bucketing you wouldn't have input delay, so any style/layout or presentation time in the previous frame would contribute to those respective buckets.

Alternatively we just exclude framePresentationDelay and leave it as unattributed time—and maybe we should add totalUnattributedTime for that matter?

I think this would be a reasonable option as well. And we could document the various things that could end up being "unattributed", e.g. off-main presentation, input delay with no LoAF script attribution, <5ms, etc.

@tunetheweb
Member Author

But how can we measure off main thread work? The only reason we have the framePresentationDelay is because it’s the time between LoAF end and INP end.

I think you can use LoAF to determine what was happening when the interaction took place, and if it's not happening during script execution or style/layout, then you can assume it's off-main delay. I'm not sure if that's always 100% true, but with some quick testing it seems to be.

Here's an example from just yesterday showing there is on-main-thread unattributed time:

https://webperformance.slack.com/archives/C04BK7K1X/p1741616198612019
https://issues.chromium.org/issues/402028633

Which btw means we also don’t have this for prior frames. But then we also don’t need it for them either when looking at buckets, as we’d double count as that would overlap the next frame starting.

For bucketing, I don't think you would double count it—at least not how I was imagining things working. E.g. with bucketing you wouldn't have input delay, so any style/layout or presentation time in the previous frame would contribute to those respective buckets.

I meant if we could measure off-main thread frame presentation delay for the first LoAF, it would overlap the on-main thread second LoAF. But we can't measure that anyway so doesn't really matter. So agree it couldn't overlap.

Alternatively we just exclude framePresentationDelay and leave it as unattributed time—and maybe we should add totalUnattributedTime for that matter?

I think this would be a reasonable option as well. And we could document the various things that could end up being "unattributed", e.g. off-main presentation, input delay with no LoAF script attribution, <5ms, etc.

The other option is to bucket end of frame styleAndLayout and frame presentation delay as totalRenderTime so you'd have:

  • totalScriptDuration (with a subpart of totalForcedStyleAndLayoutDuration)
  • totalRenderDuration (with a subpart of totalStyleAndLayoutDuration and totalFramePresentationDuration if we want)
  • totalUnattributedTime (anything else not in above buckets).
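Under this three-bucket proposal the unattributed bucket is simply whatever is left of the INP value once the other two are subtracted. A minimal sketch, with a hypothetical helper name and made-up round numbers:

```typescript
// Hypothetical helper: unattributed time is the remainder of the INP value.
function unattributedTime(
  inpValue: number,
  totalScriptDuration: number,
  totalRenderDuration: number,
): number {
  // Clamp so rounding in the inputs can never produce a negative bucket.
  return Math.max(0, inpValue - totalScriptDuration - totalRenderDuration);
}

// Made-up values: a 256 ms interaction with 15 ms of script and 200 ms of rendering.
const leftover = unattributedTime(256, 15, 200);
console.log(leftover); // 41
```

Defining the last bucket as a remainder is what keeps the buckets summing to the total.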

@philipwalton
Member

Here's an example from just yesterday showing there is on-main-thread unattributed time:

https://webperformance.slack.com/archives/C04BK7K1X/p1741616198612019 https://issues.chromium.org/issues/402028633

Interesting. I'm having trouble reproducing a LoAF for that frame. Are you able to repro it? Is the style block that shows up in the trace just a gap in the LoAF?

The other option is to bucket end of frame styleAndLayout and frame presentation delay as totalRenderTime so you'd have:

What if instead of trying to solve the presentation delay naming issue, we just didn't have a property for that at all. So we only have the following (all of which include intersecting durations from all reported LoAFs):

  • totalScriptDuration
  • totalStyleAndLayoutDuration
  • totalForcedStyleAndLayoutDuration

TBH, if we did want to have another bucket within the "presentation" subpart, I'd rather have something like totalRenderingScriptDuration, which would only include script entries that occur after renderStart within each LoAF. I think this would address some of Michal's original concerns that large presentation delays can be caused by main-thread scripts.
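A rough sketch of what such a `totalRenderingScriptDuration` could look like, assuming the LoAF fields `renderStart` and `scripts`, and assuming a `renderStart` of 0 means the frame did no rendering. It does not clip scripts to the INP window, which a real implementation would presumably need to do:

```typescript
// Simplified, hypothetical views of LoAF entries and their scripts.
interface LoafScript {
  startTime: number;
  duration: number;
}

interface LoafFrame {
  renderStart: number; // assumed to be 0 when the frame did no rendering
  scripts: LoafScript[];
}

function totalRenderingScriptDuration(frames: LoafFrame[]): number {
  let total = 0;
  for (const frame of frames) {
    if (frame.renderStart === 0) continue;
    for (const script of frame.scripts) {
      // Count only scripts that start once rendering has begun,
      // e.g. requestAnimationFrame callbacks or observer callbacks.
      if (script.startTime >= frame.renderStart) {
        total += script.duration;
      }
    }
  }
  return total;
}

// Made-up frames: only the 30 ms and 10 ms scripts start after renderStart.
const renderingScriptTotal = totalRenderingScriptDuration([
  {
    renderStart: 1100,
    scripts: [
      {startTime: 1000, duration: 80},
      {startTime: 1105, duration: 30},
      {startTime: 1140, duration: 10},
    ],
  },
  {renderStart: 0, scripts: [{startTime: 1200, duration: 50}]},
]);
console.log(renderingScriptTotal); // 40
```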

@tunetheweb
Member Author

tunetheweb commented Mar 14, 2025

As discussed off-line, moved this to a bucketing model which matches total INP:

image

@tunetheweb tunetheweb changed the title from "Extend INP attribution with LoAF information" to "Extend INP attribution with extra LoAF information: longest script and buckets" Mar 14, 2025
@mmocny
Member

mmocny commented Mar 19, 2025

I like where this conversation ended up, and I think the final diagram Barry put together is starting to look really good. (I haven't started to review the patch for details, but will do so after a few questions).

  1. I see that the buckets are: Script, S&L, Paint, Unattributed--- but I think paint is also unattributed in LoAF? Are there just 3 total buckets? (looking at some of my old diagrams, I think that's where I settled: Yellow, Gray, Purple, with Green being for "devtools only")

  2. I think the simplified buckets, without splitting by phases, might be "just right" in the aggregate. But would we also still report "input delay" (up to first event processing) for historical consistency and just to differentiate actual-interaction-slow from page-TBT-type-issues? And also perhaps expose an (now-off-main-only) "presentation delay"?

@tunetheweb
Member Author

To answer your questions:

  1. "Paint and composite" in the diagram is from end of LoAF until end of INP. So yes it doesn't include any on-main-thread Paint timing (as that's not in LoAF). Open to ideas on better naming here?
  2. So we're keeping the traditional 3-way subparts, but now have an alternative view of buckets as well. This does worry me a little that there's basically two breakdowns now, which is not ideal but I think the alternative of breaking down the subparts into buckets (so potentially 3 x 4 = 12 breakdowns) also has its downsides.

To me the subparts are our primary recommendation (is it your interaction that's the problem, or general busyness?), and the buckets give an alternative view as to what's the overall problem (is it primarily scripting, s&l, paint, something we don't know, or a mixture of all three?). It's not quite the full invoker/invoker-type combo as was proposed in #574 but I found that overly complex to grok so hopefully this is, as you say, "just right".

So with this we now kind of have the view you see in DevTools - the subparts is like the Interactions lane with the 3 phases, and the buckets are like the Summary breakdown if you highlight that section of the flame chart. We of course don't have the full flame chart details, but the Longest Script at least gives the entry point of the worst part of that (which is probably where you should concentrate if that's large). And those wanting a fuller picture, to be even closer to the flame chart can use the LoAF entries to get all the scripts if they so desire, but that's overkill for many IMHO. With subparts, buckets, and longest script we think we're striking a balance between enough detail, but not too much complexity.

@mmocny
Member

mmocny commented Mar 20, 2025

Open to ideas on better naming here?

Ah, thanks. In that case it's ~mostly what I wanted: off-main-presentation-delay. I think that given the history, probably "presentation delay" is not a good name to use, so:

  • Do as you suggest and name it for ALL the actual stages of rendering:

    • Looking at RenderingNG docs, simplifying a bit, I think the stages are: Paint, Commit, Raster (+Image Decode), Draw.
    • This can get wordy and in theory can go out of sync with what happens.
    • Interop concerns, too.
  • Find a single name and make it stick. I think the main-thread stages are generally known as "rendering" to folks (though there is some inconsistency), so maybe all the off-thread stages should have a single common name...

    • From the HTML spec, step 22, nothing jumps out:

      update the rendering or user interface of doc and its node navigable to reflect the current state

    • Perhaps "Rasterization" or "GPU Rendering" or something

So we're keeping the traditional 3-way subparts,

I think that the four timestamps:

  • first-event-timeStamp
  • first-event-processing-start
  • last-event-processing-end
  • presentationTime

Are all useful because it lets you "draw" the interaction.

All the 3 phases can easily be computed from these values, but maybe only the "input delay" should be done by default, as the other buckets just do a better job? Not sure!

So with this we now kind of have the view you see in DevTools [...]

+1.
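The point that the three phases follow from the four timestamps can be sketched as below. The interface and function names are hypothetical, and the sample timestamps are derived (and rounded) from the example output in the PR description:

```typescript
// Hypothetical names for the four timestamps listed above.
interface InteractionTimestamps {
  firstEventTimeStamp: number;
  firstEventProcessingStart: number;
  lastEventProcessingEnd: number;
  presentationTime: number;
}

function toPhases(t: InteractionTimestamps) {
  return {
    inputDelay: t.firstEventProcessingStart - t.firstEventTimeStamp,
    processingDuration: t.lastEventProcessingEnd - t.firstEventProcessingStart,
    presentationDelay: t.presentationTime - t.lastEventProcessingEnd,
  };
}

// Rounded values reconstructed from the example output
// (interactionTime, nextPaintTime and the three subpart durations).
const phases = toPhases({
  firstEventTimeStamp: 25221.3,
  firstEventProcessingStart: 25253.9,
  lastEventProcessingEnd: 25418.1,
  presentationTime: 25477.3,
});

// Roughly 32.6, 164.2 and 59.2 ms respectively.
console.log(
  phases.inputDelay.toFixed(1),
  phases.processingDuration.toFixed(1),
  phases.presentationDelay.toFixed(1),
);
```

The reverse also holds: given the first timestamp and the three durations, the other three timestamps can be recovered by addition.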

@tunetheweb
Member Author

I think that the four timestamps:

  • first-event-timeStamp
  • first-event-processing-start
  • last-event-processing-end
  • presentationTime

Are all useful because it lets you "draw" the interaction.

All the 3 phases can easily be computed from these values, but maybe only the "input delay" should be done by default, as the other buckets just do a better job? Not sure!

Alternatively, having the 3 subpart durations avoids breaking backwards compatibility and allows those timings to be easily computed from them anyway! So not seeing the need to change this.

Also not understanding what you mean by "but maybe only the "input delay" should be done by default, as the other buckets just do a better job?"? totalScriptDuration will span input delay, processing duration, and even presentation delay, so it doesn't tell you if it was a "your event handlers" problem (i.e. "processing duration"), or a before or after problem (i.e. "input delay" or "presentation delay", which can often be unrelated to the specific interaction and just a sign of general busyness).

@mmocny
Member

mmocny commented Mar 20, 2025

I meant, maybe exposed as a unique value by default-- but I missed that we already expose these values so not worth removing.

@@ -260,6 +261,91 @@ export const onINP = (
return intersectingLoAFs;
};

const attributeLoAFDetails = (attribution: INPAttribution, value: number) => {
Member


(Sending first pass comments before reviewing this fyi)

Member

@philipwalton philipwalton left a comment


I think it might be worth adding another test that creates a situation where there are two LoAFs, just to make sure that case is handled. (I'm happy to offer some ideas for how to do that if it's unclear.)

@tunetheweb
Member Author

I think it might be worth adding another test that creates a situation where there are two LoAFs, just to make sure that case is handled. (I'm happy to offer some ideas for how to do that if it's unclear.)

Given the flakiness of LoAFs in CI already at the moment (it's one of the tests that fails most often), and your recent discovery that they aren't always emitted in time without experimental web platform features enabled, I'd prefer to wait until one or both of those are fixed before expanding on this in the test suite.

WDYT @philipwalton ?

@philipwalton
Member

Given the flakiness of LoAFs in CI already at the moment (it's one of the tests that fails most often), and your recent discovery that they aren't always emitted in time without experimental web platform features enabled, I'd prefer to wait until one or both of those are fixed before expanding on this in the test suite.

WDYT @philipwalton ?

Sure, happy to address as a separate PR, but we should manually verify that it works in these cases and then come up with a plan to ensure it's properly tested in the future.

3 participants