HAL : in2p3-00702588, version 1

23rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2011), WAMCA 2011, Vitória, Brazil (2011)
Large Scale Kronecker Product on Supercomputers
Claude Tadonki (1, 2)

The Kronecker product, also called the tensor product, is a fundamental matrix algebra operation, widely used as a natural formalism to express a convolution of many interactions or representations. Given a set of matrices, we need to multiply their Kronecker product by a vector. This operation is a critical kernel for iterative algorithms and thus needs to be computed efficiently. In previous work, we proposed a cost-optimal parallel algorithm for the problem, in terms of both floating-point computation time and interprocessor communication steps. However, the lower bound on data transfers can only be achieved if the (local) broadcasts are truly logarithmic. In practice, a communication loop is used instead, so the real cost of each broadcast becomes important. Because this local broadcast is performed simultaneously by every processor, the situation worsens on a large number of processors (supercomputers). We address the problem in this paper in two ways. On the one hand, we propose a way to build a virtual topology that has the smallest gap to the theoretical lower bound. On the other hand, we consider a hybrid implementation, which has the advantage of reducing the number of communicating nodes. We illustrate our work with some benchmarks on a large 8-core SMP supercomputer.
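
To make the kernel described in the abstract concrete, the following is a minimal sequential sketch of multiplying a Kronecker product of matrices by a vector without ever forming the product explicitly. It is not the parallel algorithm of the paper: it only illustrates the standard factorization in which each factor is applied along its own "mode" of the vector. The function names (`kron`, `matvec`, `kron_matvec`) are hypothetical, and the factors are assumed to be square dense matrices stored as lists of rows.

```python
def kron(a, b):
    """Explicit Kronecker product of two dense matrices (lists of rows).

    Used here only as a reference to check the factored product.
    """
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matvec(m, v):
    """Plain dense matrix-vector product."""
    return [sum(c * x for c, x in zip(row, v)) for row in m]


def kron_matvec(factors, x):
    """Compute (A1 (x) A2 (x) ... (x) AN) * x without building the big matrix.

    Assumption: every factor is square. The vector is viewed as an
    N-dimensional array stored in row-major order, and factor i is applied
    to every fibre along dimension i.
    """
    sizes = [len(a) for a in factors]
    total = 1
    for n in sizes:
        total *= n
    if len(x) != total:
        raise ValueError("vector length must equal the product of factor sizes")

    y = list(x)
    left, right = 1, total
    for a, n in zip(factors, sizes):
        right //= n  # product of the sizes of the factors after this one
        for l in range(left):
            for r in range(right):
                base = l * n * right + r
                # Gather one fibre, multiply it by the factor, scatter it back.
                fibre = [y[base + k * right] for k in range(n)]
                out = matvec(a, fibre)
                for k in range(n):
                    y[base + k * right] = out[k]
        left *= n  # product of the sizes of the factors up to this one
    return y
```

With factors of sizes n_1, ..., n_N, the explicit product costs on the order of (n_1 n_2 ... n_N)^2 multiplications, whereas the factored form costs on the order of (n_1 n_2 ... n_N)(n_1 + ... + n_N), which is why the kernel is normally computed this way.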
1 :  LAL - Laboratoire de l'Accélérateur Linéaire
2 :  CRI - Centre de Recherche en Informatique
Computer Science / Data Structures and Algorithms
Files attached to this document:
Tadonki.pdf (82.2 KB)