HAL: in2p3-00703341, version 1

MPI Support in the DIRAC Pilot Job Workload Management System, New York, United States (2012)
MPI support in the DIRAC Pilot Job Workload Management System
A. Tsaregorodtsev (1), V. Hamar (1)
(2012-05-22)

Parallel job execution in the grid environment using MPI technology presents a number of challenges for the sites providing this support. Multiple flavors of MPI libraries, shared working directories required by certain applications, and special batch system settings make MPI support difficult for site managers. At the same time, workload management systems with pilot jobs have become ubiquitous, yet support for MPI applications in pilot frameworks was not available. This support was recently added to the DIRAC Project in the context of the GISELA Latin American Grid. Special services for the dynamic allocation of virtual computer pools on grid sites were developed in order to deploy MPI rings that match the requirements of the jobs in the central task queue of the DIRAC Workload Management System. The required MPI software is installed automatically by the pilot agents using user-space file system techniques. The same technique is used to emulate shared working directories for the parallel MPI processes. This makes it possible to execute MPI jobs even on sites that do not officially support them. Reusing MPI rings constructed in this way to execute a series of parallel jobs dramatically improves the jobs' efficiency and turnaround.
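
The ring allocation and reuse idea summarized in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the DIRAC implementation: the names `MPIJob`, `Ring`, and `schedule` are hypothetical, jobs are assumed to run one after another on a ring, and a ring is assumed to be reusable by any later job that requests the same MPI flavor and no more processes than the ring holds.

```python
from dataclasses import dataclass, field


@dataclass
class MPIJob:
    # Hypothetical job description; not a DIRAC data structure.
    job_id: int
    flavor: str      # MPI library flavor requested by the job
    num_procs: int   # number of parallel processes requested


@dataclass
class Ring:
    # Hypothetical model of a pool of pilots joined into one MPI ring.
    flavor: str
    size: int
    jobs_run: list = field(default_factory=list)

    def can_run(self, job):
        # Assumption: a ring fits a job with the same flavor that
        # needs no more processes than the ring provides.
        return job.flavor == self.flavor and job.num_procs <= self.size


def schedule(queue, free_pilots):
    """Assign queued jobs to rings, reusing an existing ring when one fits.

    Returns the list of rings and the list of jobs left waiting.
    """
    rings = []
    waiting = []
    for job in queue:
        # Prefer reusing a ring that already exists.
        ring = next((r for r in rings if r.can_run(job)), None)
        if ring is None and job.num_procs <= free_pilots:
            # Otherwise build a new ring from free pilots.
            free_pilots -= job.num_procs
            ring = Ring(job.flavor, job.num_procs)
            rings.append(ring)
        if ring is None:
            waiting.append(job)
        else:
            ring.jobs_run.append(job.job_id)
    return rings, waiting


queue = [
    MPIJob(1, "openmpi", 4),
    MPIJob(2, "openmpi", 2),
    MPIJob(3, "mpich2", 4),
    MPIJob(4, "mpich2", 8),
]
rings, waiting = schedule(queue, free_pilots=8)
for ring in rings:
    print(ring.flavor, ring.size, ring.jobs_run)
print("waiting:", [job.job_id for job in waiting])
```

In this toy run the second job reuses the ring built for the first, so no new pilots have to be gathered for it, which is the effect the abstract credits for the improved turnaround.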
1:  CPPM - Centre de Physique des Particules de Marseille
Computer Science/Databases

Computer Science/Operating Systems