 # Evaluating and Accelerating High-Fidelity Error Injection for HPC

  ![](/sites/default/files/styles/wide/public/publications/hamartia_sc.JPG?itok=ug4f0ihI)

We address two important concerns in analyzing how applications behave in the presence of hardware errors: (1) when it is important to model, with high fidelity, how hardware faults lead to erroneous values (instruction-level errors), as opposed to using simple bit-flipping models, and (2) how to enable fast high-fidelity error-injection campaigns, particularly when error detectors are employed. We present and verify a new nested Monte Carlo methodology for evaluating high-fidelity gate-level fault models and error-detector coverage, which is orders of magnitude faster than current approaches. We use that methodology to demonstrate that, without detectors, simple error models suffice for evaluating errors in 9 HPC benchmarks.



 ## Authors



Chun-Kai Chang (The University of Texas at Austin)

Sangkug Lym (The University of Texas at Austin)

Nicholas Kelly (The University of Texas at Austin)

[Michael B. Sullivan](/person/mike-sullivan)

Mattan Erez (The University of Texas at Austin)

 

 

 ## Publication Date



Sunday, November 11, 2018

 

 ## Published in



[The International Conference on High Performance Computing, Networking, Storage…](https://ieeexplore.ieee.org/abstract/document/8665790)

 

 ## Research Area



[High Performance Computing](/research-area/high-performance-computing)

[Resilience and Safety](/research-area/resilience)

 

 

 ## External Links



[IEEE Digital Library](https://ieeexplore.ieee.org/abstract/document/8665790)

 

 

 ## Uploaded Files



[Published manuscript](https://d1qx31qr3h6wln.cloudfront.net/publications/SC_2018_Hamartia.pdf) (2.35 MB)

 

 

 ## Copyright



This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to <pubs-permissions@ieee.org>.