I have a distributed program that runs on HPC clusters and communicates over ZeroMQ.
ZeroMQ uses TCP sockets, so by default on an HPC cluster the communication goes over the administration network. I have therefore introduced an environment variable, read by my code, that forces communication onto a particular network interface. With InfiniBand (IB) it is usually ib0, but there are cases where another IB interface is used for the parallel file system; on Cray systems the interface is ipogif; and on some non-HPC systems it can be eth1, eno1, p4p2, em2, enp96s0f0, or whatever...
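For illustration, the override currently amounts to something like the following minimal sketch (MYPROG_IFACE is just a placeholder name for the variable; ZeroMQ's TCP transport accepts an interface name directly in the endpoint string):

```c
/* Minimal sketch of the interface override. MYPROG_IFACE is a placeholder
 * name for the environment variable; ZeroMQ's TCP transport accepts an
 * interface name directly in the endpoint, e.g. "tcp://ib0:5555". */
#include <stdio.h>
#include <stdlib.h>
#include <zmq.h>

int main(void)
{
    const char *iface = getenv("MYPROG_IFACE");   /* e.g. "ib0", "ipogif0", "eth1" */
    if (iface == NULL)
        iface = "*";                              /* fall back to all interfaces */

    char endpoint[256];
    snprintf(endpoint, sizeof endpoint, "tcp://%s:5555", iface);

    void *ctx  = zmq_ctx_new();
    void *sock = zmq_socket(ctx, ZMQ_REP);
    if (zmq_bind(sock, endpoint) != 0) {
        fprintf(stderr, "zmq_bind(%s): %s\n", endpoint, zmq_strerror(zmq_errno()));
        return 1;
    }
    /* ... normal request/reply traffic would follow ... */
    zmq_close(sock);
    zmq_ctx_term(ctx);
    return 0;
}
```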
The problem is that I need to ask the administrator of the cluster for the name of the network interface to use, whereas codes using MPI don't need to, because MPI "knows" which network to use.
What is the most portable way to discover the name of the high-performance network interface on a Linux HPC cluster? (I don't mind writing a small MPI program for this if there is no simple way.)
There is no simple way and I doubt a complete solution exists. For example, Open MPI comes with an extensive set of ranked network communication modules and tries to instantiate all of them, selecting in the end the one with the highest rank. The idea is that the ranks somehow reflect the speed of the underlying network and that, if a given network type is not present, its module will fail to instantiate, so on a system that has both Ethernet and InfiniBand it will pick InfiniBand, since that module has higher precedence. This is part of why larger Open MPI jobs start relatively slowly, and it is definitely not foolproof - in some cases one has to intervene and manually select the right modules, especially if the node has several network interfaces or InfiniBand HCAs and not all of them provide node-to-node connectivity. This is usually configured system-wide by the system administrator or the vendor and is why MPI "just works" (pro tip: in a not-so-small number of cases it actually doesn't).
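To make the ranking idea concrete, here is a purely illustrative sketch of that selection pattern (the module names, priorities, and stub probes are made up; this is not Open MPI's actual code):

```c
/* Illustrative pattern only (NOT Open MPI's actual code): probe candidate
 * transports in descending priority order and keep the highest-ranked one
 * whose initialization succeeds. */
#include <stdio.h>

static int init_infiniband(void) { return 0; }  /* stub: 0 = not usable here */
static int init_shared_mem(void) { return 1; }  /* stub: 1 = usable */
static int init_tcp(void)        { return 1; }

struct transport {
    const char *name;
    int priority;            /* higher = preferred, mirrors module ranking */
    int (*init)(void);       /* returns nonzero if the transport is usable */
};

int main(void)
{
    /* Candidates sorted by priority, highest first. */
    struct transport candidates[] = {
        { "infiniband", 90, init_infiniband },
        { "shared_mem", 50, init_shared_mem },
        { "tcp",        10, init_tcp        },
    };
    const struct transport *chosen = NULL;

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (candidates[i].init()) {
            chosen = &candidates[i];
            break;            /* list is sorted, so the first hit wins */
        }
    }

    printf("selected transport: %s\n", chosen ? chosen->name : "none");
    return 0;
}
```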
You may copy the approach taken by Open MPI and develop a set of detection modules for your program. For TCP, spawn two or more copies on different nodes, have each list its active network interfaces and the corresponding IP addresses, match the addresses into common subnets, bind on all interfaces on one node, then try to connect to it from the other node(s). Upon successful connection, run something like the TCP version of NetPIPE to measure the network speed and latency and pick the fastest network. Once you've obtained this information from the initial small set of nodes, it is very likely that the same interface is used on all other nodes too, since most HPC systems are as homogeneous as possible when it comes to their nodes' network configuration.
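The interface-enumeration part of that procedure can be done portably on Linux with getifaddrs(); a minimal sketch that lists the UP, non-loopback IPv4 interfaces and their addresses (the candidate list the nodes would then exchange and probe):

```c
/* Sketch: enumerate candidate interfaces (UP, not loopback) and their IPv4
 * addresses with getifaddrs(). This is the list the nodes would exchange
 * and probe for connectivity and bandwidth. */
#include <stdio.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    struct ifaddrs *ifa_list, *ifa;

    if (getifaddrs(&ifa_list) == -1) {
        perror("getifaddrs");
        return 1;
    }

    for (ifa = ifa_list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;                     /* IPv4 only for this sketch */
        if (!(ifa->ifa_flags & IFF_UP) || (ifa->ifa_flags & IFF_LOOPBACK))
            continue;                     /* skip down/loopback interfaces */

        char addr[INET_ADDRSTRLEN];
        struct sockaddr_in *sin = (struct sockaddr_in *)ifa->ifa_addr;
        inet_ntop(AF_INET, &sin->sin_addr, addr, sizeof addr);
        printf("%-12s %s\n", ifa->ifa_name, addr);   /* e.g. "ib0  10.1.2.3" */
    }

    freeifaddrs(ifa_list);
    return 0;
}
```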
If there is a working MPI implementation installed, you can use it to launch the test program. You may also enable debug logging in the MPI library and parse the output, but this requires that the target system has an MPI implementation supported by your log parser. Also, most MPI libraries use the native InfiniBand (or whatever high-speed network) API and will not tell you which interface carries the corresponding IP-over-whatever traffic, because they won't use it at all (unless configured otherwise by the system administrator).
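If you go the small-MPI-program route, you could use MPI purely as a launcher and as a transport for collecting each node's candidate addresses, then do the probing yourself; a rough sketch under those assumptions (IPv4 only, subnet matching and bandwidth probing left out):

```c
/* Rough sketch: use MPI only to launch the probes and to collect each
 * rank's interface/address list on rank 0, which would then match subnets
 * and drive the actual connectivity/bandwidth tests (not shown). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define ENTRY_LEN 64             /* fixed-size "ifname=address" records */
#define MAX_IFS    8             /* candidates reported per rank */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char local[MAX_IFS * ENTRY_LEN];
    memset(local, 0, sizeof local);

    /* Collect this rank's UP, non-loopback IPv4 interfaces. */
    struct ifaddrs *list, *ifa;
    int n = 0;
    if (getifaddrs(&list) == 0) {
        for (ifa = list; ifa != NULL && n < MAX_IFS; ifa = ifa->ifa_next) {
            if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
                continue;
            if (!(ifa->ifa_flags & IFF_UP) || (ifa->ifa_flags & IFF_LOOPBACK))
                continue;
            char addr[INET_ADDRSTRLEN];
            inet_ntop(AF_INET,
                      &((struct sockaddr_in *)ifa->ifa_addr)->sin_addr,
                      addr, sizeof addr);
            snprintf(local + n * ENTRY_LEN, ENTRY_LEN, "%s=%s",
                     ifa->ifa_name, addr);
            n++;
        }
        freeifaddrs(list);
    }

    /* Rank 0 receives every rank's list. */
    char *all = NULL;
    if (rank == 0)
        all = malloc((size_t)nprocs * MAX_IFS * ENTRY_LEN);
    MPI_Gather(local, MAX_IFS * ENTRY_LEN, MPI_CHAR,
               all, MAX_IFS * ENTRY_LEN, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int r = 0; r < nprocs; r++)
            for (int i = 0; i < MAX_IFS; i++) {
                const char *e = all + ((size_t)r * MAX_IFS + i) * ENTRY_LEN;
                if (*e != '\0')
                    printf("rank %d: %s\n", r, e);
            }
        /* Matching the addresses into common subnets and timing TCP
         * transfers over each match would follow here. */
        free(all);
    }

    MPI_Finalize();
    return 0;
}
```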