Basic implementation of a new communication layer for Charm++
If you want to run Reconverse locally (single node), all you have to do is the following:
$ cd reconverse
$ mkdir build
$ cd build
$ cmake -DRECONVERSE_TRY_ENABLE_COMM_LCI2=OFF ..
$ make
Optional CMake flags:
- -DENABLE_CPU_AFFINITY=ON (off by default): set CPU affinity with HWLOC (HWLOC must be installed).
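As a minimal sketch, the single-node steps above can be combined with the affinity flag in one shell function. `configure_single_node` is a hypothetical helper, not something shipped with Reconverse. It assumes you call it from the reconverse source directory and that HWLOC is already installed.

```shell
# Hypothetical helper, not part of Reconverse.
# Call from the reconverse source directory; HWLOC must already be installed.
configure_single_node() {
  # The subshell keeps the caller's working directory unchanged.
  (
    mkdir -p build && cd build || exit 1
    # Single node: skip LCI, but enable HWLOC-based CPU affinity.
    cmake -DRECONVERSE_TRY_ENABLE_COMM_LCI2=OFF -DENABLE_CPU_AFFINITY=ON .. &&
      make
  )
}
```

Usage: source the function into your shell, `cd reconverse`, then run `configure_single_node`.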
Currently, Reconverse multi-node support is based on LCI (https://github.com/uiuc-hpc/lci). You can either install LCI yourself or use the CMake autofetch support.
To use the CMake autofetch support:
$ cd reconverse
$ mkdir build
$ cd build
$ cmake -DRECONVERSE_TRY_ENABLE_COMM_LCI2=ON -DRECONVERSE_AUTOFETCH_LCI2=ON ..
$ make
Additional CMake variables can be passed to fine-tune the LCI build. Useful ones include:
- -DLCI_NETWORK_BACKENDS=[ofi|ibv]: explicitly select libfabric (ofi) or libibverbs (ibv) as LCI's network backend. Use ibv on InfiniBand and RoCE clusters; use ofi on shared-memory systems (e.g. a laptop) and Slingshot-11 clusters.
- -DLCT_PMI_BACKEND_ENABLE_MPI=ON (default: OFF): let LCI bootstrap with MPI. This can be useful when the running environment lacks PMI support and lcrun becomes slow.
Note: by default, LCI automatically probes for and selects an available network backend, but this sometimes gives unsatisfactory results (e.g. on Delta, where libibverbs is installed but no InfiniBand devices are available).
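When the automatic probe picks the wrong backend, one option is to choose it yourself before configuring. The sketch below uses a heuristic of our own, not something LCI does: it assumes that a non-empty /sys/class/infiniband directory (where Linux lists RDMA devices, InfiniBand and RoCE alike) means ibv is usable, and falls back to ofi otherwise. `pick_lci_backend` and `configure_multi_node` are hypothetical names; check the result against the guidance above (e.g. Slingshot-11 clusters should use ofi).

```shell
# Hypothetical helpers, not part of Reconverse or LCI.
# Heuristic: any RDMA device listed by the kernel -> ibv, otherwise ofi.
# This avoids choosing ibv where libibverbs is installed but no device exists.
pick_lci_backend() {
  ib_dir=${1:-/sys/class/infiniband}   # overridable for testing
  if [ -d "$ib_dir" ] && [ -n "$(ls -A "$ib_dir" 2>/dev/null)" ]; then
    echo ibv
  else
    echo ofi
  fi
}

# Configure with an autofetched LCI and an explicitly chosen backend.
# $1: optional directory passed to pick_lci_backend
# $2: "mpi" to let LCI bootstrap with MPI instead of PMI
configure_multi_node() {
  backend=$(pick_lci_backend "$1")
  if [ "$2" = mpi ]; then pmi_mpi=ON; else pmi_mpi=OFF; fi
  cmake -DRECONVERSE_TRY_ENABLE_COMM_LCI2=ON -DRECONVERSE_AUTOFETCH_LCI2=ON \
        -DLCI_NETWORK_BACKENDS="$backend" \
        -DLCT_PMI_BACKEND_ENABLE_MPI="$pmi_mpi" ..
}
```

Usage, from reconverse/build: `configure_multi_node`, or `configure_multi_node "" mpi` when PMI is unavailable.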
To run an example, go to the build/examples/<program_name> folder and run the reconverse_<program_name> executable. Currently, the first arguments must be +pe <num_pes>.
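The convention above fits in a small wrapper. `run_example` is a hypothetical helper, not part of Reconverse. It assumes the default build layout, takes the executable name separately because it does not always match the folder name (the pingpong folder holds reconverse_ping_ack, for instance), and always puts +pe <num_pes> first, appending any program-specific arguments after it.

```shell
# Hypothetical helper, not part of Reconverse.
# Usage: run_example <build_dir> <program_name> <executable> <num_pes> [args...]
run_example() {
  if [ $# -lt 4 ]; then
    echo "usage: run_example <build_dir> <program_name> <executable> <num_pes> [args...]" >&2
    return 2
  fi
  build_dir=$1 prog=$2 exe=$3 npes=$4
  shift 4
  # +pe <num_pes> must come first; remaining arguments follow it.
  (cd "$build_dir/examples/$prog" && "./$exe" +pe "$npes" "$@")
}
```

For example, `run_example build pingpong reconverse_ping_ack 4` from the reconverse directory.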
On a shared-memory system (e.g. a laptop), use libfabric as LCI's network backend. On Debian/Ubuntu, you can install it with:
$ sudo apt install libfabric-bin libfabric-dev
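To confirm the installation, libfabric-bin ships the fi_info utility, which reports the providers libfabric can use. The wrapper below is a hypothetical convenience (`has_fi_provider` and `report_fi_providers` are our names) and assumes fi_info exits with a non-zero status when no provider matches the -p filter. Which provider LCI ends up using on your machine is not specified here, so the usage simply checks a couple of common ones.

```shell
# Hypothetical helpers, not part of libfabric or LCI.
# Succeeds if libfabric reports the given provider.
# Assumes fi_info exits non-zero when nothing matches -p.
has_fi_provider() {
  fi_info -p "$1" >/dev/null 2>&1
}

# Print one "name: available|missing" line per requested provider.
report_fi_providers() {
  for prov in "$@"; do
    if has_fi_provider "$prov"; then
      echo "$prov: available"
    else
      echo "$prov: missing"
    fi
  done
}
```

Usage: `report_fi_providers shm tcp`.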
$ git clone https://github.com/charmplusplus/reconverse.git
$ cd reconverse
$ mkdir build
$ cd build
$ cmake -DRECONVERSE_TRY_ENABLE_COMM_LCI2=ON -DRECONVERSE_AUTOFETCH_LCI2=ON -DLCI_NETWORK_BACKENDS=ofi ..
$ make
Using lcrun is typically the simplest way to run the Reconverse examples. First, locate LCI's lcrun executable. It lives in the LCI source directory and is installed to the bin folder if you installed LCI yourself. If you used the CMake autofetch support, it is typically located in the <build_directory>/_deps/lci-src folder.
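The search described above can be scripted. `find_lcrun` is a hypothetical helper, not part of Reconverse or LCI. It assumes a self-installed LCI has its bin folder on your PATH, and otherwise searches recursively under <build_directory>/_deps/lci-src, since the exact subdirectory is not pinned down here.

```shell
# Hypothetical helper, not part of Reconverse or LCI.
# Prints the path to lcrun and returns 0, or returns 1 if none is found.
# $1: Reconverse build directory (searched for an autofetched LCI)
find_lcrun() {
  # A self-installed LCI puts lcrun in <prefix>/bin; check PATH first.
  if command -v lcrun >/dev/null 2>&1; then
    command -v lcrun
    return 0
  fi
  # With CMake autofetch, lcrun sits somewhere under _deps/lci-src.
  found=$(find "$1/_deps/lci-src" -name lcrun -type f 2>/dev/null | head -n 1)
  [ -n "$found" ] || return 1
  echo "$found"
}
```

Usage, from build/examples/pingpong: `LCRUN=$(find_lcrun <path_to_reconverse>/build) && "$LCRUN" -n 2 ./reconverse_ping_ack +pe 4`.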
Then, from the reconverse directory, run the example with lcrun (give its full path if it is not on your PATH):
$ cd build/examples/pingpong
$ lcrun -n 2 ./reconverse_ping_ack +pe 4
Note: if you installed libfabric in a non-standard location, the dynamic linker may fail to find the libfabric shared library. In that case, add its directory to the library search path:
$ export LD_LIBRARY_PATH=<path_to_libfabric_lib>${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
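If you are unsure where libfabric's libraries ended up, pkg-config can usually tell you, since libfabric installs a libfabric.pc file. The hypothetical helper below (`add_libfabric_to_ld_path` is our name) prepends that directory, or one you pass explicitly, to LD_LIBRARY_PATH. It assumes pkg-config can see libfabric.pc; for a custom prefix that typically means setting PKG_CONFIG_PATH to the pkgconfig folder under that prefix. The `${LD_LIBRARY_PATH:+:...}` form avoids a trailing colon when the variable is empty, which the loader would otherwise treat as the current directory.

```shell
# Hypothetical helper, not part of libfabric or LCI.
# Source it into your shell: a child process cannot change the parent's environment.
# $1: optional libfabric lib directory; otherwise ask pkg-config.
add_libfabric_to_ld_path() {
  libdir=${1:-$(pkg-config --variable=libdir libfabric 2>/dev/null)}
  if [ -z "$libdir" ]; then
    echo "libfabric not found; pass its lib directory explicitly" >&2
    return 1
  fi
  # Prepend without leaving an empty entry when LD_LIBRARY_PATH is unset or empty.
  export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```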
On NCSA's Delta machine, to use the CMake autofetch support:
$ git clone https://github.com/charmplusplus/reconverse.git
$ cd reconverse
$ mkdir build
$ cd build
$ cmake -DRECONVERSE_TRY_ENABLE_COMM_LCI2=ON -DRECONVERSE_AUTOFETCH_LCI2=ON -DLCI_NETWORK_BACKENDS=ofi ..
$ make
Note: NCSA's machines have built-in PMI support; LCI automatically detects and uses it.
If you want to install LCI yourself, here is an example build procedure on NCSA's Delta machine using the OFI backend:
$ git clone https://github.com/uiuc-hpc/lci.git --branch=lci2
$ cd lci
$ export CMAKE_INSTALL_PREFIX=/u/<username>/opt   # or anywhere else you prefer
$ export OFI_ROOT=/opt/cray/libfabric/1.15.2.0
$ cmake -DLCI_NETWORK_BACKENDS=ofi -DCMAKE_INSTALL_PREFIX=${CMAKE_INSTALL_PREFIX} .
$ make install
$ cd ..
$ git clone https://github.com/charmplusplus/reconverse.git
$ cd reconverse
$ mkdir build && cd build
$ cmake ..
$ make
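The bare `cmake ..` above relies on CMake finding the LCI you just installed. If it does not, a standard CMake mechanism is CMAKE_PREFIX_PATH. This sketch assumes, without confirmation here, that Reconverse locates LCI through find_package (which honors CMAKE_PREFIX_PATH), and defaults to the CMAKE_INSTALL_PREFIX exported during the LCI build. `configure_with_lci_prefix` is a hypothetical name.

```shell
# Hypothetical helper, not part of Reconverse.
# Assumes Reconverse finds LCI via find_package, which searches CMAKE_PREFIX_PATH.
# $1: LCI install prefix (defaults to $CMAKE_INSTALL_PREFIX)
configure_with_lci_prefix() {
  prefix=${1:-$CMAKE_INSTALL_PREFIX}
  if [ -z "$prefix" ]; then
    echo "usage: configure_with_lci_prefix <lci_install_prefix>" >&2
    return 2
  fi
  cmake -DCMAKE_PREFIX_PATH="$prefix" ..
}
```

Usage, from reconverse/build: `configure_with_lci_prefix` (or pass the prefix explicitly), then `make`.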
$ cd examples/pingpong
$ srun -n 2 ./reconverse_ping_ack +pe 4
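Depending on the allocation, Slurm may place both processes of `srun -n 2` on the same node. To ask for one process per node, Slurm's -N (number of nodes) and --ntasks-per-node options can be used instead. This is a sketch: it assumes you are already inside an allocation (from salloc or an sbatch script, with whatever account and partition your Delta project uses), and `run_ping_ack_per_node` is a hypothetical name.

```shell
# Hypothetical helper, not part of Reconverse.
# Run from build/examples/pingpong inside a Slurm allocation.
# $1: number of nodes (one Reconverse process per node)
# $2: PEs per process, passed as +pe
run_ping_ack_per_node() {
  nodes=$1 pes=$2
  srun -N "$nodes" --ntasks-per-node=1 ./reconverse_ping_ack +pe "$pes"
}
```

For example, `run_ping_ack_per_node 2 4` starts two processes on two different nodes, each with 4 PEs.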