[Issue]: hip::host not working for NVIDIA Platforms #3748

Open
wme7 opened this issue Feb 14, 2025 · 5 comments · May be fixed by ROCm/clr#150

@wme7

wme7 commented Feb 14, 2025

Problem Description

I'm working on porting a kaiju CUDA/C++17 project. To do this, I'm using an NVIDIA-based workstation with HIP 6.2.4 on top of CUDA Toolkit 12.6 and, to play it safe, Ubuntu 24.04 as my OS. All package installations went smoothly.

My target is to port my code to an AMD platform with ROCm 6.0.1 and the GPU architecture "gfx90a".

However, while doing this porting work, I noticed that my build with HIP on my NVIDIA platform always fails due to linkage errors. Investigating the matter further, I started to suspect that the hip::host module is not doing its job as indicated in the HIP documentation: Consuming the HIP API in C++ code

I reported this issue on discourse.cmake. After the build trace was inspected and a CMake maintainer was able to reproduce my issue, I suspect that the NVIDIA-backed implementation of HIP's CMake configuration is to blame.

I therefore request an investigation of this issue.
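
For context, the consumption pattern in question can be sketched roughly as below. This is a minimal sketch, not the attached project's actual CMakeLists.txt: the target and file names are taken from the build log further down, while the project layout, the minimum CMake version (assumed to be one recent enough to drive HIP through nvcc), and the way the sources are marked as HIP are assumptions.

```cmake
# Hypothetical reconstruction of the consumer project; the real file is in the attached zip.
cmake_minimum_required(VERSION 3.28)  # assumed: HIP-on-NVIDIA support needs a recent CMake
project(devEvent_library LANGUAGES CXX HIP)

# Locates hip-config.cmake, which is expected to define hip::host and hip::device.
find_package(hip REQUIRED)

# Library whose sources contain device code and are compiled with the HIP compiler.
add_library(Test SHARED library/library.cpp)
set_source_files_properties(library/library.cpp PROPERTIES LANGUAGE HIP)

# Plain C++ executable that only calls the HIP host API through the library headers.
add_executable(devEvent_library_clang main.cpp)

# hip::host is expected to carry the HIP include directories and platform definitions
# so that the host compiler can resolve <hip/hip_runtime.h>.
target_link_libraries(devEvent_library_clang PRIVATE Test hip::host)
```

If hip::host carries no usage requirements, the last line links nothing and adds no include path, which would match the missing hip/hip_runtime.h error shown below.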

Operating System

Ubuntu 24.04.1 LTS (Noble Numbat)

CPU

11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

GPU

NVIDIA Corporation GA100 [A100 SXM4 40GB] (rev a1)

ROCm Version

ROCm 6.2.4

ROCm Component

HIP

Steps to Reproduce

A fully reproducible example is explained in this post on the discourse.cmake forum.

For the sake of completeness, I also attach a full copy of the example here to reproduce and study the issue:

devEvent_library.zip

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

$ /opt/rocm/bin/rocminfo --support
ROCk module is NOT loaded, possibly no GPU devices

Additional Information

I have verified that my example works well on AMD Platforms with MI200x GPUs.

Again, the issue is that when the present example is built on any NVIDIA platform, the CMake build fails, usually in the linking process, as if no hip::host module existed, producing output like:

$ cmake ..
-- The CXX compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The HIP compiler identification is NVIDIA 12.6.85
-- Detecting HIP compiler ABI info
-- Detecting HIP compiler ABI info - done
-- Check for working HIP compiler: /usr/local/cuda-12.6/bin/nvcc - skipped
-- Detecting HIP compile features
-- Detecting HIP compile features - done
-- Configuring done (2.1s)
-- Generating done (0.0s)
-- Build files have been written to: /home/mdiaz/Depots/devLibrary/devEvent_library/build

$ make
[ 16%] Building HIP object CMakeFiles/Test.dir/library/library.cpp.o
[ 33%] Linking HIP shared library libTest.so
[ 33%] Built target Test
[ 50%] Building CXX object CMakeFiles/devEvent_library_clang.dir/main.cpp.o
In file included from /home/mdiaz/Depots/devLibrary/devEvent_library/library/library.h:6,
                 from /home/mdiaz/Depots/devLibrary/devEvent_library/main.cpp:1:
/home/mdiaz/Depots/devLibrary/devEvent_library_FAIL/library/common.h:16:10: fatal error: hip/hip_runtime.h: No such file or directory
   16 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/devEvent_library_clang.dir/build.make:76: CMakeFiles/devEvent_library_clang.dir/main.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:113: CMakeFiles/devEvent_library_clang.dir/all] Error 2
make: *** [Makefile:101: all] Error 2

Let me remark that this example builds and executes correctly on any AMD Platform, but fails to build on any NVIDIA Platform.

@ppanchad-amd

Hi @wme7. Internal ticket has been created to investigate your issue. Thanks!

@wme7

wme7 commented Mar 26, 2025

Hi @ppanchad-amd, just checking if there is any progress with the investigation?

darren-amd linked a pull request on Mar 27, 2025 that will close this issue
@darren-amd

darren-amd commented Mar 27, 2025

Hi @wme7,

I was able to reproduce the issue and it is indeed due to an incomplete hip-config-nvidia.cmake implementation, as the hip::host target is created but never populated with set_target_properties. I have a fix for this issue in the PR here: ROCm/clr#150, which you can build from source. Alternatively, you could replace /opt/rocm-6.3.3/lib/cmake/hip/hip-config-nvidia.cmake with:

set(_IMPORT_PREFIX ${HIP_PACKAGE_PREFIX_DIR})
foreach(__lib device host amdhip64)
    if (NOT TARGET hip::${__lib})
        add_library(hip::${__lib} INTERFACE IMPORTED)
        set_target_properties(hip::${__lib} PROPERTIES
            INTERFACE_COMPILE_DEFINITIONS "__HIP_PLATFORM_NVIDIA__=1"
            INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include"
            INTERFACE_SYSTEM_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include")
    endif()
endforeach()

You can't replace the CMake file directly with the PR version, since @PACKAGE_INCLUDE_INSTALL_DIR@ is a placeholder that gets substituted during the build. Please give that a try and let me know if you run into any issues, thanks!
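
One way to check whether an installed hip-config-nvidia.cmake populates the target is to print its interface properties from a throwaway project. The following is a minimal sketch under a few assumptions: the project name is hypothetical, and find_package(hip) is assumed to succeed on the machine (it may need the ROCm install prefix in CMAKE_PREFIX_PATH and the NVIDIA platform selected).

```cmake
# Hypothetical probe project: prints what hip::host exposes to its consumers.
cmake_minimum_required(VERSION 3.21)
project(hip_host_probe LANGUAGES CXX)

find_package(hip REQUIRED)

# For an unset property, get_target_property stores "<variable>-NOTFOUND",
# so an unpopulated target prints "_value-NOTFOUND" for both entries.
foreach(_prop INTERFACE_INCLUDE_DIRECTORIES INTERFACE_COMPILE_DEFINITIONS)
    get_target_property(_value hip::host ${_prop})
    message(STATUS "hip::host ${_prop}: ${_value}")
endforeach()
```

Running `cmake` on this before and after editing the config file should show whether the include directory and the __HIP_PLATFORM_NVIDIA__ definition are attached to the target.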

@wme7

wme7 commented Mar 28, 2025

Thanks @darren-amd,
Glad to know that you managed to locate the source of the problem and provide a quick fix for it!
I'll modify my /opt/rocm-6.3.3/lib/cmake/hip/hip-config-nvidia.cmake as you indicated and confirm whether it works!

@wme7

wme7 commented Apr 6, 2025

Hi @darren-amd,

I tried replacing the contents of /opt/rocm/lib/cmake/hip/hip-config-nvidia.cmake with

# Commented previous definitions
#add_library(hip::device INTERFACE IMPORTED)
#add_library(hip::host INTERFACE IMPORTED)
#add_library(hip::amdhip64 INTERFACE IMPORTED)

set(_IMPORT_PREFIX ${HIP_PACKAGE_PREFIX_DIR})
foreach(__lib device host amdhip64)
    if (NOT TARGET hip::${__lib})
        add_library(hip::${__lib} INTERFACE IMPORTED)
        set_target_properties(hip::${__lib} PROPERTIES
            INTERFACE_COMPILE_DEFINITIONS "__HIP_PLATFORM_NVIDIA__=1"
            INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include"
            INTERFACE_SYSTEM_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include")
    endif()
endforeach()

and proceeded to rebuild the example as

$ rm -rf build/

$ mkdir build && cd build

$ cmake ..
-- The CXX compiler identification is GNU 13.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The HIP compiler identification is NVIDIA 12.6.85
-- Detecting HIP compiler ABI info
-- Detecting HIP compiler ABI info - done
-- Check for working HIP compiler: /usr/local/cuda-12.6/bin/nvcc - skipped
-- Detecting HIP compile features
-- Detecting HIP compile features - done
-- Configuring done (2.9s)
-- Generating done (0.0s)
-- Build files have been written to: /home/mdiaz/Depots/devLibrary/devEvent_library/build

$ make
[ 16%] Building HIP object CMakeFiles/Test.dir/library/library.cpp.o
[ 33%] Linking HIP shared library libTest.so
[ 33%] Built target Test
[ 50%] Building CXX object CMakeFiles/devEvent_library_clang.dir/main.cpp.o
In file included from /opt/rocm/include/hip/hip_runtime.h:64,
                 from /home/mdiaz/Depots/devLibrary/devEvent_library/library/common.h:16,
                 from /home/mdiaz/Depots/devLibrary/devEvent_library/library/library.h:6,
                 from /home/mdiaz/Depots/devLibrary/devEvent_library/main.cpp:1:
/opt/rocm/include/hip/nvidia_detail/nvidia_hip_runtime.h:26:10: fatal error: cuda_runtime.h: No such file or directory
   26 | #include <cuda_runtime.h>
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/devEvent_library_clang.dir/build.make:76: CMakeFiles/devEvent_library_clang.dir/main.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:113: CMakeFiles/devEvent_library_clang.dir/all] Error 2
make: *** [Makefile:101: all] Error 2

Perhaps I missed something?
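
The new error suggests that the host compiler now finds the HIP headers but not the CUDA headers that nvidia_hip_runtime.h pulls in. A possible workaround on the consumer side, offered as an untested sketch rather than the fix from the linked PR, is to also attach the CUDA toolkit's include path through CMake's own CUDAToolkit package. The target name comes from the build log above; the rest of the layout is assumed.

```cmake
# Untested sketch: add the CUDA runtime headers next to hip::host for host-only C++ targets.
find_package(hip REQUIRED)

# CMake's FindCUDAToolkit module (CMake 3.17+) defines imported targets such as CUDA::cudart,
# which carry the toolkit include directories as usage requirements.
find_package(CUDAToolkit REQUIRED)

add_executable(devEvent_library_clang main.cpp)
target_link_libraries(devEvent_library_clang
    PRIVATE
        Test          # the HIP shared library from the example
        hip::host     # HIP headers and __HIP_PLATFORM_NVIDIA__ definition
        CUDA::cudart  # makes <cuda_runtime.h> visible to the host compiler
)
```

Whether this belongs in the project or in hip-config-nvidia.cmake itself is a design question for the maintainers; the sketch only shows where the missing include path could come from.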
