By the link is written: https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf
1.1. AT A GLANCE
1.1.1. MPS
The Multi-Process Service (MPS) is an alternative, binary-compatible implementation of the CUDA Application Programming Interface (API). The MPS runtime architecture is designed to transparently enable co-operative multi-process CUDA applications, typically MPI jobs, to utilize Hyper-Q capabilities on the latest NVIDIA (Kepler-based) Tesla and Quadro GPUs. Hyper-Q allows CUDA kernels to be processed concurrently on the same GPU; this can benefit performance when the GPU compute capacity is underutilized by a single application process.
Do I have to use the MPS (MULTI-PROCESS SERVICE) when using CUDA6.5 + MPI (OpenMPI / IntelMPI), or can I not use MPS with lost some performance but without any errors?
If I will not use MPS, does it mean that all my MPI-processes on a single server will execute their GPU-kernel-functions sequentially (not concurrent) on a single GPU-card, but all other behavior will stay the same?
MPS is not required to use MPI
If you don't use MPS, but you launch multiple MPI ranks per node (i.e. per GPU), then if you have the compute mode set to default, then your GPU activity will serialize. If you have your compute mode set to EXCLUSIVE_PROCESS or EXCLUSIVE_THREAD, you'll get errors when multiple MPI ranks attempt to use a single GPU.
CUDA MPS documentation is available here.