1. [Publications](/index.php/publications)
2. Low Communication FMM-Accelerated FFT on GPUs
 
 # Low Communication FMM-Accelerated FFT on GPUs

  ![](/sites/default/files/styles/wide/public/publications/Screen%20Shot%202018-01-05%20at%2010.09.38%20AM.png?itok=BC6Z15eY)

 Communication-avoiding algorithms have been a subject of growing interest in the last decade due to the growth of distributed memory systems and the disproportionate increase of computational throughput to communication bandwidth. For distributed 1D FFTs, communication costs quickly dominate execution time as all industry-standard implementations perform three all-to-all transpositions of the data. In this work, we reformulate an existing algorithm that employs the Fast Multipole Method to reduce the communication requirements to approximately a single all-to-all transpose. We present a detailed and clear implementation strategy that relies heavily on existing library primitives, demonstrate that our strategy achieves consistent speed-ups between 1.3x and 2.2x against cuFFTXT on up to eight NVIDIA Tesla P100 GPUs, and develop an accurate compute model to analyze the performance and dependencies of the algorithm.



 ## Authors



[Cris Cecka](/index.php/person/cris-cecka)

 

 

 ## Publication Date



Sunday, November 12, 2017

 

 ## Published in



[The International Conference for High Performance Computing, Networking, Storag…](https://sc17.supercomputing.org/)

 

 ## Research Area



[Algorithms and Numerical Methods](/index.php/research-area/algorithms)

[High Performance Computing](/index.php/research-area/high-performance-computing)

 

 

 ## External Links



[GTC 2017 Slides](http://on-demand.gputechconf.com/gtc/2017/presentation/s7418-cecka-low-communication-fft-fast-multipole-method.pdf)

[GTC 2017 Talk](http://on-demand.gputechconf.com/gtc/2017/video/s7418-cecka-low-communcation-fast-multipole-method.mp4)

 

 

 ## Uploaded Files



[fmmfft\_sc17.pdf](https://research.nvidia.com/sites/default/files/pubs/2017-10_Low-Communication-FMM-Accelerated//fmmfft_sc17.pdf "Open file in new window")2.18 MB

 

 

 ## Copyright



Copyright by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or <permissions@acm.org>. The definitive version of this paper can be found at ACM's Digital Library <http://www.acm.org/dl/>.