TLDR: We've discovered several files where we get inconsistent scan results due to platform-local line endings (e.g. a match is found from mac/linux but NOT windows).
Generated wfps are identical cross-platform with the exception of the initial md5sum + filesize. The server appears to find a match based ONLY on that checksum and not the actual content.
Sample file
We've got a simple 15-line example file, which is also available here:
// Copyright 1998-2018 Epic Games, Inc. All Rights Reserved.
#pragma once
#include "CoreMinimal.h"
#include "Modules/ModuleManager.h"
class FMirrorAnimationSystemModule : public IModuleInterface
{
public:
    /** IModuleInterface implementation */
    virtual void StartupModule() override;
    virtual void ShutdownModule() override;
};
Mostly-identical Fingerprints
Since winnowing normalizes away everything but [a-zA-Z0-9] and maintains its own line-numbers based purely on 0x0a, the meat of the file remains the same.
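To make that concrete, here is a minimal Python sketch of the normalization as described in this issue. It is not the actual winnowing implementation: `normalize` and `line_count` are hypothetical helper names, and the two-line sample is a stand-in for the real file. It only illustrates why LF and CRLF copies look the same once everything outside [a-zA-Z0-9] is dropped and lines are counted on 0x0a alone.

```python
import re


def normalize(data: bytes) -> bytes:
    """Drop every byte outside [a-zA-Z0-9] (simplified model of the text above)."""
    return re.sub(rb"[^a-zA-Z0-9]", b"", data)


def line_count(data: bytes) -> int:
    """Count lines purely by 0x0a bytes, ignoring any 0x0d."""
    return data.count(b"\n")


# Stand-in content with LF endings, and the same content with CRLF endings.
lf = b'#pragma once\n#include "CoreMinimal.h"\n'
crlf = lf.replace(b"\n", b"\r\n")

print(normalize(lf) == normalize(crlf))    # carriage returns are normalized away
print(line_count(lf) == line_count(crlf))  # 0x0a count is unchanged
```

Under these assumptions both comparisons hold, which matches the observation that the fingerprint body is the same on every platform.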
On mac/linux, it's a 333 byte file, and the generated wfp:
Depending on how exactly it gets to a Windows machine, the line endings get altered (e.g. by doing a git checkout).
As expected, if those get changed to CRLF, the 15-line file gains an additional 15 bytes, and the generated wfp becomes:
Different results
Hitting the scan service with those two wfps gets different results:
As implied, it's a 100% match, but it's discovering that ONLY based on the md5sum, not the other winnowed bits. For instance, adding a single carriage return changes the md5sum of the file but NOT the winnowed bits:
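The effect can be reproduced locally with a small Python sketch. This is an illustration only, not output from the scanner: `whole_file_key`, `alnum_only`, and `to_crlf` are hypothetical helpers, and the sample bytes are a stand-in for the real file. It shows that appending one carriage return changes the md5sum and size while the alphanumeric-only content stays the same, and that converting LF to CRLF adds exactly one byte per line (hence 15 extra bytes for a 15-line file).

```python
import hashlib
import re


def whole_file_key(data: bytes) -> tuple:
    """Return (md5 hex digest, size in bytes), the two values that differ per platform."""
    return hashlib.md5(data).hexdigest(), len(data)


def alnum_only(data: bytes) -> bytes:
    """Simplified stand-in for the winnowed content: keep only [a-zA-Z0-9]."""
    return re.sub(rb"[^a-zA-Z0-9]", b"", data)


def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CRLF (assumes the input has no CR yet)."""
    return data.replace(b"\n", b"\r\n")


original = b"virtual void StartupModule() override;\nvirtual void ShutdownModule() override;\n"
with_cr = original + b"\r"      # one extra carriage return
as_crlf = to_crlf(original)     # what a Windows checkout might produce

print(whole_file_key(original))
print(whole_file_key(with_cr))                        # different md5sum, size + 1
print(alnum_only(original) == alnum_only(with_cr))    # same alphanumeric content
print(len(as_crlf) - len(original))                   # one extra byte per line
```

If whole-file matching keys only on the md5sum, the second and third variants would miss even though their normalized content is identical to the first.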
DIFFERENT Matches Based On Line Endings
Closely related, it's possible for files to get different matches based on this.
To simulate, I grabbed a file and tested it two ways: verbatim from upstream, and with a single carriage return added, so the fingerprints are identical but the md5sum is NOT.
What to do?
As a result, our Windows-based developers get very different results than anyone else and/or our CI system:
Some files are missing completely
Some files match against different packages
Missing files means there's no UI to do anything about them with something like code compare, while different matches means it shows up in the UI, but anything they do based on it winds up being ignored elsewhere since the purl is completely different from what other developers/the CI system sees.
It would be possible to tell everything to not do md5-based whole-file checking, but we'd like to keep that enabled; these files are above the default minimum size threshold for scanning (256 bytes) and clearly are legitimate matches.