## Communications in Applied Mathematics and Computational Science

### Achieving algorithmic resilience for temporal integration through spectral deferred corrections

#### Abstract

Spectral deferred corrections (SDC) is an iterative approach for constructing higher-order-accurate numerical approximations to the solutions of ordinary differential equations. SDC starts with an initial approximation of the solution defined at a set of Gaussian or spectral collocation nodes over a time interval and iteratively applies lower-order time discretizations to a correction equation to improve the solution at these nodes. Each deferred correction sweep increases the formal order of accuracy of the method, up to the limit set by the accuracy of the underlying collocation scheme. In this paper, we demonstrate that SDC is well suited to recovering from soft (transient) hardware faults in the data. We propose a strategy in which extra correction iterations are used to recover from soft errors and provide algorithmic resilience. Specifically, the iteration is continued until the residual (a measure of the error in the approximation) is small relative to the residual of the first correction iteration and changes slowly between successive iterations. We demonstrate the effectiveness of this strategy both for canonical test problems and for a comprehensive case involving a mature scientific application code that solves the reacting Navier–Stokes equations for combustion research.
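The iteration and stopping rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes explicit Euler correction sweeps on Gauss–Lobatto nodes for a scalar ODE, and the names `sdc_step`, `rel_tol`, `slow_ratio`, and `fault`, as well as the threshold values, are hypothetical choices made for illustration only.

```python
import numpy as np

def lobatto_nodes(n):
    """Return n Gauss-Lobatto nodes mapped to [0, 1] (endpoints included)."""
    interior = np.polynomial.legendre.Legendre.basis(n - 1).deriv().roots()
    x = np.concatenate(([-1.0], np.sort(interior.real), [1.0]))
    return 0.5 * (x + 1.0)

def integration_matrix(t):
    """S[m, j] = integral of the j-th Lagrange basis polynomial over [t[m], t[m+1]]."""
    n = len(t)
    V = np.vander(t, n, increasing=True)        # V[i, k] = t_i**k
    coeffs = np.linalg.inv(V)                   # column j: monomial coefficients of ell_j
    powers = np.arange(1, n + 1)
    antideriv = t[:, None] ** powers / powers   # t_i**(k+1) / (k+1)
    cumulative = antideriv @ coeffs             # integral of ell_j from 0 to t_i
    return np.diff(cumulative, axis=0)

def sdc_step(f, y0, dt, n_nodes=5, rel_tol=1e-8, slow_ratio=0.5,
             max_sweeps=50, fault=None):
    """One SDC time step for the scalar ODE y' = f(y).

    fault = (sweep, node, delta) optionally adds delta to one node value after
    the given sweep, mimicking a transient (soft) error in the data.
    Thresholds rel_tol and slow_ratio are illustrative assumptions.
    """
    t = lobatto_nodes(n_nodes)
    S = dt * integration_matrix(t)
    h = dt * np.diff(t)
    y = np.full(n_nodes, float(y0))             # initial guess: constant in time
    residuals = []
    for k in range(1, max_sweeps + 1):
        F_old = f(y)
        integral = S @ F_old                    # spectral quadrature of the old iterate
        y_new = np.empty_like(y)
        y_new[0] = y0
        for m in range(n_nodes - 1):            # explicit Euler correction sweep
            y_new[m + 1] = y_new[m] + h[m] * (f(y_new[m]) - F_old[m]) + integral[m]
        if fault is not None and fault[0] == k:
            y_new[fault[1]] += fault[2]         # injected soft error
        y = y_new
        # Residual of the integral (Picard) form of the ODE at every node.
        picard = y0 + np.concatenate(([0.0], np.cumsum(S @ f(y))))
        res = np.max(np.abs(picard - y))
        residuals.append(res)
        small = res <= rel_tol * residuals[0]   # small relative to first sweep
        slow = k > 1 and res >= slow_ratio * residuals[-2]  # no longer dropping fast
        if small and slow:
            break
    return y[-1], residuals

if __name__ == "__main__":
    rhs = lambda y: -y                          # test problem y' = -y, y(0) = 1
    y_clean, r_clean = sdc_step(rhs, 1.0, 0.5)
    y_fault, r_fault = sdc_step(rhs, 1.0, 0.5, fault=(3, 2, 1.0))
    print(len(r_clean), abs(y_clean - np.exp(-0.5)))
    print(len(r_fault), abs(y_fault - np.exp(-0.5)))
```

In this sketch the injected error makes the residual jump at the faulty sweep, so the stopping test is not satisfied and the sweeps simply continue until the iterate has returned to the collocation solution. No rollback or recomputation from a checkpoint is involved, which is the sense in which the extra iterations supply resilience.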

#### Article information

Source
Commun. Appl. Math. Comput. Sci., Volume 12, Number 1 (2017), 25–50.

Dates
Revised: 9 September 2016
Accepted: 18 January 2017
First available in Project Euclid: 19 October 2017

https://projecteuclid.org/euclid.camcos/1508432638

Digital Object Identifier
doi:10.2140/camcos.2017.12.25

Mathematical Reviews number (MathSciNet)
MR3652439

#### Citation

Grout, Ray; Kolla, Hemanth; Minion, Michael; Bell, John. Achieving algorithmic resilience for temporal integration through spectral deferred corrections. Commun. Appl. Math. Comput. Sci. 12 (2017), no. 1, 25–50. doi:10.2140/camcos.2017.12.25. https://projecteuclid.org/euclid.camcos/1508432638
