fix: apply final hashes deterministically with stable placeholders set #5644

mattkubej · 2024-09-11T16:13:41Z

This PR contains:

Are tests included?

yes (bugfixes and features will not be merged without tests)
no

Breaking Changes?

yes (breaking changes will not be merged unless absolutely necessary)
no

List any relevant issue numbers:

Issue does not exist, but I can create one if that would be helpful.

Description

generateFinalHashes iterates over the renderedChunksByPlaceholder map in a for..of loop, which visits entries in insertion order. The entries of renderedChunksByPlaceholder are set asynchronously within transformChunksAndGenerateContentHashes, so their insertion order can differ between builds with the same input, and generateFinalHashes does not apply final hashes in the same order each time. When hash collisions occur, the hashes involved may therefore be applied to chunks in a different order across builds. This change iterates over the chunks using the stable placeholders set so that final hashes are applied deterministically.
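The ordering problem can be illustrated with a small sketch. It does not use Rollup's code: `buildMap`, `stableOrder`, the placeholder strings, and the completion orders are all hypothetical, and the asynchronous transforms are modelled by inserting keys in the order in which they would have finished.

```typescript
// Hypothetical placeholders, in the stable order in which chunks are rendered.
const stablePlaceholders: string[] = ['!~{001}~', '!~{002}~', '!~{003}~'];

// Models the async step: keys are inserted in the order the transforms finish.
function buildMap(completionOrder: string[]): Map<string, string> {
  const renderedChunksByPlaceholder = new Map<string, string>();
  for (const placeholder of completionOrder) {
    renderedChunksByPlaceholder.set(placeholder, `chunk for ${placeholder}`);
  }
  return renderedChunksByPlaceholder;
}

// Two "builds" with the same input but different completion orders.
const buildA = buildMap(['!~{001}~', '!~{002}~', '!~{003}~']);
const buildB = buildMap(['!~{003}~', '!~{001}~', '!~{002}~']);

// Iterating the maps directly follows insertion order, so the builds disagree.
const mapOrderA = Array.from(buildA.keys());
const mapOrderB = Array.from(buildB.keys());

// Walking the stable placeholders and looking each one up gives both builds
// the same processing order.
function stableOrder(map: Map<string, string>): string[] {
  return stablePlaceholders.filter(placeholder => map.has(placeholder));
}
```

In this model, `mapOrderA` and `mapOrderB` differ even though both maps hold the same entries, while `stableOrder(buildA)` and `stableOrder(buildB)` are identical.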

Why is this important?

We encountered an issue where a chunk with the same name had different file contents between builds, which then failed an SRI check on the produced asset. We expected that if the hash did not change, then the contents did not change.

What we found was that a manual chunk depended on two dependencies with the same name and the same file contents, which resulted in a collision. The hashes of the two dependencies would flip between builds depending on which one was processed first within generateFinalHashes. Processing chunks in the same order when applying final hashes will ensure a deterministic result and avoid this problem.

How this happens

1. renderChunks invokes render on all chunks to set the preliminaryFileName for each chunk (i.e. placeholder hash) (ref)
2. transformChunksAndGenerateContentHashes is invoked with the chunks and returns renderedChunksByPlaceholder (ref)
   a. Iterates through all chunks and transforms them asynchronously (ref)
   b. Sets the transformedChunk on the renderedChunksByPlaceholder map (ref)
      i. This is the beginning of the problem
3. generateFinalHashes is invoked with renderedChunksByPlaceholder (ref)
   a. Iterates through the renderedChunksByPlaceholder map (ref)
      i. Maps are iterated in insertion order. Because of the async insertion in step 2b, two builds can produce maps with the same key/values inserted in different orders, so the iteration order is not consistent across builds.
      ii. For each chunk, it calculates the hash via contentToHash
          1. contentToHash is calculated by combining the hashes of the current chunk and its dependencies (ref, ref)
          2. getHash is invoked on contentToHash and the result is sliced to the placeholder length (ref)
             a. If a collision occurs, then the hash becomes contentToHash and is hashed again on the next loop iteration (ref)
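The collision handling in the last steps above can be sketched as follows. This is a simplified model, not Rollup's implementation: `toyHash` is a stand-in FNV-1a hash rather than Rollup's getHash, `assignFinalHashes` and the `Chunk` shape are hypothetical, and collisions are tracked in a plain set of used hashes on the assumption that the colliding chunks share the same file name pattern. The `name` labels exist only to tell the two chunks apart in the output.

```typescript
// Toy FNV-1a hash returning 8 hex characters; a stand-in, not Rollup's getHash.
function toyHash(input: string): string {
  let hash = 0x811c9dc5;
  for (let index = 0; index < input.length; index++) {
    hash ^= input.charCodeAt(index);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('0000000' + hash.toString(16)).slice(-8);
}

interface Chunk {
  name: string; // hypothetical label, used only to tell the chunks apart
  contentHash: string; // stands in for the combined content hash (contentToHash)
}

// Assigns final hashes in the given processing order, rehashing on collisions.
function assignFinalHashes(chunks: Chunk[], hashLength: number): Map<string, string> {
  const usedHashes = new Set<string>();
  const finalHashByName = new Map<string, string>();
  for (const chunk of chunks) {
    let contentToHash = chunk.contentHash;
    let finalHash = toyHash(contentToHash).slice(0, hashLength);
    // On a collision, the colliding hash becomes the next input to hash.
    while (usedHashes.has(finalHash)) {
      contentToHash = finalHash;
      finalHash = toyHash(contentToHash).slice(0, hashLength);
    }
    usedHashes.add(finalHash);
    finalHashByName.set(chunk.name, finalHash);
  }
  return finalHashByName;
}

// Two dependencies with identical contents, processed in either order.
const depA: Chunk = { name: 'depA', contentHash: 'same-content' };
const depB: Chunk = { name: 'depB', contentHash: 'same-content' };

const firstBuild = assignFinalHashes([depA, depB], 8);
const secondBuild = assignFinalHashes([depB, depA], 8);
```

Under these assumptions, `depA` receives the first hash in `firstBuild` and the rehashed value in `secondBuild`, which is the kind of flip described above.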

vercel · 2024-09-11T16:13:46Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
rollup	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Sep 19, 2024 4:24am

codecov · 2024-09-12T04:34:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.39%. Comparing base (184bc4e) to head (ddf4b13).
Report is 1 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5644   +/-   ##
=======================================
  Coverage   99.39%   99.39%           
=======================================
  Files         242      242           
  Lines        9348     9349    +1     
  Branches     2470     2470           
=======================================
+ Hits         9291     9292    +1     
  Misses         48       48           
  Partials        9        9


lukastaegert · 2024-09-12T05:05:04Z

Makes sense, though I guess it is probably hard to write a meaningful test. One could write a manual test that bundles the same input twice and introduces a renderChunk hook that slows down the rendering of one or the other chunk in both runs and in the end verifies that the hashes for both runs match.
I think that would fit into misc.js where we already have a test that chunks are sorted.

I wonder, though, if we really need to manually sort the placeholders before iterating. There is a placeholders Set in transformChunksAndGenerateContentHashes that is created synchronously from the chunks respecting their order. I wonder if that Set would have a sufficiently stable insertion order so that when returned from transformChunksAndGenerateContentHashes and used in generateFinalHashes would have the same effect as the manual sorting?

mattkubej · 2024-09-12T15:58:22Z

> One could write a manual test that bundles the same input twice and introduces a renderChunk hook that slows down the rendering of one or the other chunk in both runs and in the end verifies that the hashes for both runs match.
> I think that would fit into misc.js where we already have a test that chunks are sorted.

I can take a look into drafting that up. Just to ensure I found what you're referring to, you're suggesting modeling that test off of this one?

> I wonder, though, if we really need to manually sort the placeholders before iterating. There is a placeholders Set in transformChunksAndGenerateContentHashes that is created synchronously from the chunks respecting their order. I wonder if that Set would have a sufficiently stable insertion order so that when returned from transformChunksAndGenerateContentHashes and used in generateFinalHashes would have the same effect as the manual sorting?

Just to be clear, you're suggesting the following?

1. Have transformChunksAndGenerateContentHashes return the placeholders set, which contains all the hashPlaceholders subsequently used as keys for renderedChunksByPlaceholder
2. Pass the resulting placeholders set as an additional param to generateFinalHashes
3. Iterate over the placeholders set and retrieve chunks from renderedChunksByPlaceholder by those placeholder keys

I think that could work too. As you said, the placeholders set appears to be built synchronously and should be deterministic. This would also avoid the additional overhead of sorting by keys, which is likely fairly minimal, but I can empathize with the desire to squeeze out performance and memory overhead. My only concern with using placeholders is the implicit assumption that the placeholders set and the renderedChunksByPlaceholder keys have a one-to-one correspondence (i.e. same size and values). That does appear to be the case, but if that assumption no longer holds true, then there might be a bug. Sorting the keys of renderedChunksByPlaceholder avoids that potential problem, but does have more cost.

I can refactor this change with your suggested approach and vet it against my use case if you think that is preferable. I'm good either way.
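To make the trade-off concrete, here is a minimal sketch of iterating by the placeholders set while guarding the one-to-one assumption. It is not the code in this PR: `chunksInPlaceholderOrder`, the `RenderedChunk` shape, and the sample placeholder strings and file names are hypothetical.

```typescript
interface RenderedChunk {
  fileName: string; // hypothetical minimal shape; the real objects carry more fields
}

// Visits chunks in the insertion order of `placeholders`, not of the map.
function chunksInPlaceholderOrder(
  placeholders: Set<string>,
  renderedChunksByPlaceholder: Map<string, RenderedChunk>
): RenderedChunk[] {
  // Guard for the one-to-one assumption: same size and every placeholder is a key.
  if (placeholders.size !== renderedChunksByPlaceholder.size) {
    throw new Error('placeholders and renderedChunksByPlaceholder differ in size');
  }
  const ordered: RenderedChunk[] = [];
  placeholders.forEach(placeholder => {
    const chunk = renderedChunksByPlaceholder.get(placeholder);
    if (chunk === undefined) {
      throw new Error(`no rendered chunk for placeholder ${placeholder}`);
    }
    ordered.push(chunk);
  });
  return ordered;
}

// Placeholders are added synchronously in chunk order...
const placeholders = new Set<string>(['!~{001}~', '!~{002}~']);
// ...while the map is filled in whatever order the transforms finish.
const rendered = new Map<string, RenderedChunk>();
rendered.set('!~{002}~', { fileName: 'b-!~{002}~.js' });
rendered.set('!~{001}~', { fileName: 'a-!~{001}~.js' });

const orderedFileNames = chunksInPlaceholderOrder(placeholders, rendered).map(
  chunk => chunk.fileName
);
```

The guard costs two cheap checks and turns a silent mismatch between the set and the map into an explicit error, which is one way to address the concern without paying for a sort.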

lukastaegert · 2024-09-13T04:37:46Z

> you're suggesting modeling that test off of this one?

Yes, roughly, except that of course you would build twice. Note that this is an old test; you can use async-await these days to make the test much nicer ;)

> I can refactor this change with your suggested approach and vet it against my use case if you think that is preferable. I'm good either way.

That is why I would also like to have this test. As I see it, the renderChunk hook is the major source of asynchronicity we have, so if we have a test specifically for that, then I would feel good that we do not break it in the future.
I must admit that it is a micro-optimization to use placeholders instead of sorting; on the other hand, the data structure is already created. And my expectation is that it is as deterministic as sorting by placeholder name. Which is to say, the latter could also be broken. Which brings me back to the test ;)

In any case, thank you so much for looking into this and providing a solution for this issue in the first place! I know that indeterminism is a major source of pain for those affected by it, and you are making it better for everyone. If you need help, let me know, then I can also have a look at writing that test.

mattkubej · 2024-09-13T17:04:28Z

> > you're suggesting modeling that test off of this one?
>
> Yes, roughly, except that of course you would build twice. And note that this is an old test, you can use async-await these days to make the test much nicer ;)
>
> > I can refactor this change with your suggested approach and vet it against my use case if you think that is preferable. I'm good either way.
>
> That is why I would also like to have this test. As I see it, the renderChunk hook is the major source of asynchronicity we have, so if we have a test specifically for that, then I would feel good that we do not break it in the future. I must admit that it is micro-optimization to use placeholders instead of sorting, on the other hand, the data structure is already created. And my expectation is that is is as deterministic as sorting by placeholder name. Which is to say, the latter could also be broken. Which brings me back to the test ;)
>
> In any case, thank you so much for looking into this and providing a solution for this issue in the first place! I know that indeterminism is a major source of pain for those affected by it, and you are making it better for everyone. If you need help, let me know, then I can also have a look at writing that test.

Thanks for your thoughts and guidance! I'll take a swing at the test shortly and also adjust the PR to use the existing placeholder set. I'll report back once that is done or if I could use any assistance with the test. Thanks again!

mattkubej · 2024-09-17T04:47:08Z

@lukastaegert I removed the sorting, switched to the stable placeholder set, and introduced a test.

I validated that the test fails on master as well.

Test failing on master

I've updated the PR description to reflect the latest changes. Let me know if you'd like me to make any additional changes.

lukastaegert

Amazing, great work!

github-actions · 2024-09-19T04:56:08Z

This PR has been released as part of rollup@4.22.0. You can test it via npm install rollup.

It appears that this might cause some build failures that need further investigation.

vercel bot deployed to Preview September 11, 2024 16:15 View deployment

fix: apply final hashes deterministically with stable placeholders set

af0fb72

mattkubej force-pushed the fix/deterministic-final-hash-assignment branch from 5726962 to af0fb72 Compare September 17, 2024 04:41

vercel bot deployed to Preview September 17, 2024 04:43 View deployment

mattkubej changed the title ~~fix: apply final hashes deterministically via sorted processing of chunks~~ fix: apply final hashes deterministically with stable placeholders set Sep 17, 2024

Merge branch 'master' into fix/deterministic-final-hash-assignment

ddf4b13

lukastaegert approved these changes Sep 19, 2024

View reviewed changes

lukastaegert enabled auto-merge September 19, 2024 04:23

vercel bot deployed to Preview September 19, 2024 04:24 View deployment

lukastaegert added this pull request to the merge queue Sep 19, 2024

Merged via the queue into rollup:master with commit 447c191 Sep 19, 2024
38 checks passed

mattkubej mentioned this pull request Sep 20, 2024

fix: upgrade rollup 4.22.2, deterministic chunk hashing fix vitejs/vite#18151

Closed

lukastaegert added a commit that referenced this pull request Sep 20, 2024

Partially revert #5644

68c23da

It appears that this might cause some build failures that need further investigation.

lukastaegert added a commit that referenced this pull request Sep 20, 2024

Partially revert #5658 and re-apply #5644

0cd259e

It appears that this might cause some build failures that need further investigation.

lukastaegert added a commit that referenced this pull request Sep 20, 2024

Partially revert #5658 and re-apply #5644 (#5667)

d5ff63d

It appears that this might cause some build failures that need further investigation.

This was referenced Oct 22, 2024

[Snyk] Upgrade rollup from 3.29.5 to 4.22.5 leonardoadame/Affiliate-tech#991

Open

[Snyk] Upgrade rollup from 3.29.5 to 4.24.0 leonardoadame/Affiliate-tech#998

Open

leonardoadame mentioned this pull request Oct 28, 2024

[Snyk] Upgrade rollup from 3.29.5 to 4.24.0 leonardoadame/Affiliate-tech#1008

Open

YashGovekar mentioned this pull request Oct 30, 2024

[Snyk] Upgrade rollup from 4.12.1 to 4.24.0 YashGovekar/Gove-Commerce#424

Open

leonardoadame mentioned this pull request Nov 2, 2024

[Snyk] Upgrade rollup from 3.29.5 to 4.24.0 leonardoadame/Affiliate-tech#1038

Open

YashGovekar mentioned this pull request Nov 6, 2024

[Snyk] Upgrade rollup from 4.12.1 to 4.24.0 YashGovekar/Gove-Commerce#427

Open
