HAL : in2p3-00657498, version 1

Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms
Leggett C., Binet S., Jackson K., Levinthal D., Tatarkhanov M. et al
In Journal of Physics: Conference Series - Conference on Computing in High Energy and Nuclear Physics 2010 (CHEP 2010), Taipei, Taiwan (2010) - http://hal.in2p3.fr/in2p3-00657498
Computer Science/Modeling and Simulation
Computer Science/Performance and Reliability
Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms
C. Leggett (1), S. Binet (2), K. Jackson (1), D. Levinthal, M. Tatarkhanov (1), Y. Yao (1)
1 :  LBNL - Lawrence Berkeley National Laboratory
http://www.lbl.gov
Lawrence Berkeley National Lab MS 50F Cyclotron Rd., Berkeley, CA 94720
United States
2 :  LAL - Laboratoire de l'Accélérateur Linéaire
http://www.lal.in2p3.fr/
CNRS : UMR8607 – IN2P3 – Université Paris XI - Paris Sud
Centre Scientifique d'Orsay B.P. 34 91898 ORSAY Cedex
France
Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Furthermore, the cores themselves can run multiple threads with zero-overhead context switching, allowing low-level resource sharing (Intel Hyper-Threading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the ATLAS event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8- and 16-core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places a huge burden on the memory infrastructure of today's processors.
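The event-based parallelism and final-stage I/O synchronization described in the abstract can be illustrated with a minimal Python multiprocessing sketch. The worker count, event count and function names below are assumptions chosen for illustration; this is not the actual AthenaMP implementation.

```python
# Minimal sketch (assumed names, not the AthenaMP code itself): a parent
# process forks N workers, each worker loops over a disjoint slice of the
# event numbers, and the parent merges the partial outputs in a final
# I/O stage.

import multiprocessing as mp
import os


def process_event(event_id):
    """Stand-in for the per-event reconstruction/simulation work."""
    # In a real framework this would run the full algorithm sequence on one event.
    return {"event": event_id, "pid": os.getpid()}


def worker(event_ids, out_queue):
    """Each forked worker processes its share of events independently."""
    results = [process_event(e) for e in event_ids]
    # Workers hand back their partial output; the parent merges it at the
    # end (the "final-stage I/O synchronization" of the abstract).
    out_queue.put(results)


if __name__ == "__main__":
    n_workers = 8      # assumption: one worker per physical core
    n_events = 1000    # assumption: total number of events to process

    queue = mp.Queue()
    # Round-robin the event numbers across workers (event-based parallelism).
    slices = [range(w, n_events, n_workers) for w in range(n_workers)]
    procs = [mp.Process(target=worker, args=(s, queue)) for s in slices]
    for p in procs:
        p.start()

    # Final-stage merge: collect every worker's output before joining, so
    # large queue payloads cannot block the children on exit.
    merged = []
    for _ in procs:
        merged.extend(queue.get())
    for p in procs:
        p.join()

    print(f"processed {len(merged)} events with {n_workers} workers")
```

In AthenaMP itself the workers are created by forking the already-initialized parent process, so read-only memory (code, geometry, conditions data) is shared copy-on-write across cores; the sketch above only mirrors the event-splitting and final merge steps.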
Keyword(s): ATLAS

Document type: Conference paper (with proceedings)
Publication year: 2011
Conference date: 10/2010
Journal: Journal of Physics: Conference Series
Audience: international
Volume: 331
Article number: 042015
Publisher: IOP Publishing

Conference: Conference on Computing in High Energy and Nuclear Physics 2010 (CHEP 2010)
City: Taipei
Country: Taiwan
Start date: 18/10/2010
End date: 22/10/2010

Report number: LAL 10-299