xplat issues with detection and use of SSE instructions for SSE versions greater than SSE2 #3551

Open
@cosinusoidally

In my spare time I have been chipping away at #3494. While doing this I had a look at some of the existing code that selectively uses instructions from SSE versions newer than SSE2 at runtime. For example: https://github.com/Microsoft/ChakraCore/blob/6a222b7d049ad10b3dde81d5a97f29e7fba7fc00/lib/Backend/Encoder.cpp#L997

uint Encoder::CalculateCRC(uint bufferCRC, size_t data)
{
#if defined(_WIN32) || defined(__SSE4_2__)
#if defined(_M_IX86)
    if (AutoSystemInfo::Data.SSE4_2Available())
    {
        return _mm_crc32_u32(bufferCRC, data);
    }
#elif defined(_M_X64)
    if (AutoSystemInfo::Data.SSE4_2Available())
    {
        //CRC32 always returns a 32-bit result
        return (uint)_mm_crc32_u64(bufferCRC, data);
    }
#endif
#endif
    return CalculateCRC32(bufferCRC, data);
}

In order to generate a portable x86_64 binary I need to compile without -msse4.2. Doing so essentially turns this function into this:

uint Encoder::CalculateCRC(uint bufferCRC, size_t data)
{
    return CalculateCRC32(bufferCRC, data);
}

Note that this unconditionally disables use of the SSE4.2 crc32 instruction.

Unfortunately, if -msse4.2 is enabled then SSE4.2 instructions are enabled globally throughout the codebase. Clang is then free to emit SSE4.2 code wherever it pleases (not just where you have used an intrinsic such as _mm_crc32_u32), so the resulting binary may not work correctly on processors that only support SSE2. I think this is an example of how Clang's behaviour differs from MSVC's: I think MSVC will only emit SSE4.2 instructions specifically where you have used SSE4.2 intrinsics.

When the -msse4.2 flag is enabled you will essentially compile the following:

uint Encoder::CalculateCRC(uint bufferCRC, size_t data)
{
    if (AutoSystemInfo::Data.SSE4_2Available())
    {
        //CRC32 always returns a 32-bit result
        return (uint)_mm_crc32_u64(bufferCRC, data);
    }
    return CalculateCRC32(bufferCRC, data);
}

But if you substitute that implementation in for CalculateCRC and then compile without -msse4.2, you get the following error:

[ 30%] Building CXX object lib/Backend/CMakeFiles/Chakra.Backend.dir/Encoder.cpp.o
/root/src/ChakraCore-1.7.0/lib/Backend/Encoder.cpp:1033:14: error: always_inline function '_mm_crc32_u64' requires target feature 'ssse3', but would be inlined into function 'CalculateCRC' that is compiled without support for 'ssse3'
return (uint)_mm_crc32_u64(bufferCRC, data);
             ^
1 error generated.
make[2]: *** [lib/Backend/CMakeFiles/Chakra.Backend.dir/Encoder.cpp.o] Error 1
make[1]: *** [lib/Backend/CMakeFiles/Chakra.Backend.dir/all] Error 2
make: *** [all] Error 2
See error details above. Exit code was 2

This means that, as written, you cannot use the intrinsic without the -msse4.2 flag.
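For reference, the restriction in that error is per function rather than per file. Assuming GCC or a Clang recent enough to support the `target` function attribute, it may be possible to enable SSE4.2 for a single helper and leave the rest of the build at the baseline. The sketch below is untested against ChakraCore: `Crc32cHardware`, `Crc32cSoftware` and `CalculateCrcSketch` are hypothetical names, `__builtin_cpu_supports` stands in for `AutoSystemInfo::Data.SSE4_2Available()`, and the bitwise fallback is only an illustration of the same CRC-32C step, not ChakraCore's `CalculateCRC32`.

```cpp
#include <cstdint>

#if defined(__x86_64__) && defined(__GNUC__)
#include <nmmintrin.h>  // declares _mm_crc32_u64

// Only this function is compiled with SSE4.2 enabled; the rest of the
// translation unit keeps the baseline instruction set.
__attribute__((target("sse4.2")))
static uint32_t Crc32cHardware(uint32_t crc, uint64_t data)
{
    // crc32q leaves a 32-bit result in a 64-bit register.
    return static_cast<uint32_t>(_mm_crc32_u64(crc, data));
}
#endif

// Hypothetical portable fallback: bitwise CRC-32C step (reflected
// polynomial 0x82F63B78) over the 8 bytes of `data`, low byte first.
static uint32_t Crc32cSoftware(uint32_t crc, uint64_t data)
{
    for (int byte = 0; byte < 8; ++byte)
    {
        crc ^= static_cast<uint32_t>((data >> (8 * byte)) & 0xFF);
        for (int bit = 0; bit < 8; ++bit)
        {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

static uint32_t CalculateCrcSketch(uint32_t crc, uint64_t data)
{
#if defined(__x86_64__) && defined(__GNUC__)
    // Assumption: stand-in for AutoSystemInfo::Data.SSE4_2Available().
    if (__builtin_cpu_supports("sse4.2"))
    {
        return Crc32cHardware(crc, data);
    }
#endif
    return Crc32cSoftware(crc, data);
}
```

Because the caller is compiled without SSE4.2, the compiler should not inline `Crc32cHardware` into it, so the SSE4.2 instruction stays behind the runtime check.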

There are a couple of workarounds for this (e.g. putting the SSE4.2 code in a separate file and compiling only that file with -msse4.2). Using inline assembly is probably closest to what the intrinsic does. I've had a couple of tries at this, but I can't seem to get the right incantation. For example, the following illustrates the idea but is buggy (oddly, it seems to work fine when compiling with -O0 but breaks when compiling with -O3, on Clang 3.8.0):

uint Encoder::CalculateCRC(uint bufferCRC, size_t data)
{
#if defined(_WIN32)
#if defined(_M_IX86)
    if (AutoSystemInfo::Data.SSE4_2Available())
    {   
        return _mm_crc32_u32(bufferCRC, data);
    }
#elif defined(_M_X64)
    if (AutoSystemInfo::Data.SSE4_2Available())
    {   
        //CRC32 always returns a 32-bit result
        return (uint)_mm_crc32_u64(bufferCRC, data);
    }
#endif
#else
#if defined(_M_X64)
    if (AutoSystemInfo::Data.SSE4_2Available())
    {   
        unsigned long long tmp;
        unsigned long long tmp2=0;
        tmp=(unsigned long long)bufferCRC;
        __asm__ __volatile__("push %1;crc32q %2, %1; movq %1, %0;pop %1" : "=r" (tmp2): "r" (tmp), "r" ((unsigned long long)data));
        return (uint)tmp2;
    }
#endif
#endif
    return CalculateCRC32(bufferCRC, data);
}
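A likely cause of the -O3 breakage, not verified here against the generated code, is the operand handling in the asm statement. The push/pop pair moves the stack pointer without the compiler knowing (on x86-64 System V this can overwrite the red zone), an operand declared as input-only is modified, and the `"=r"` output has no early-clobber marker, so it may share a register with an input. A simpler form, sketched below under the assumption of GCC-style extended asm on x86-64, declares the accumulator as a read-write operand and needs no scratch code. `Crc32qAsm` is a hypothetical name.

```cpp
#include <cstdint>

#if defined(__x86_64__) && defined(__GNUC__)
// Hypothetical helper: one crc32q step via extended inline asm.
// The caller must still check at runtime that SSE4.2 is available.
static inline uint32_t Crc32qAsm(uint32_t crc, uint64_t data)
{
    // crc32q needs a 64-bit destination; "+r" tells the compiler that
    // the register is both read and written by the instruction.
    uint64_t acc = crc;
    __asm__("crc32q %1, %0" : "+r"(acc) : "r"(data));
    // CRC32 always returns a 32-bit result.
    return static_cast<uint32_t>(acc);
}
#endif
```

In the non-Windows `_M_X64` branch above, the body of the `if` would then reduce to something like `return Crc32qAsm(bufferCRC, data);`.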
