CEA, March 2024
Francesco Rizzi
(NexGen Analytics)
nexgenanalytics.github.io/cea-seminar-march-2024/
KokkosSparse::CrsMatrix, parallel_*, and View
github.com/Pressio/SHAW
was accepted to the ECP proxy apps catalog
proxyapps.exascaleproject.org/app/shaw
Many-query problems need surrogate models
e.g. uncertainty quantification (UQ), design optimization
Projection-based reduced-order models (pROMs):
Project the governing equations onto a subspace
Explainable (physics-based), error bounds, full-field predictions
Historically relying on linear subspaces (e.g. POD):
POD: fast to compute, few knobs to tune
Advection/hyperbolic problems need many linear modes
pROMs community sees this as a "limitation/bottleneck"
Can we make virtue out of necessity?
Freytag's pyramid: introduced in 1863 by Gustav Freytag
E.g. Hansel and Gretel (Grimm, 1812)
Introduction
Initial event
Rising action
Climax
Falling action
Resolution
Denouement
Ubiquitous in science and engineering
Parameters, their uncertainties and correlations are critical
Typical parameters: material properties, geometry, boundary conditions (BCs)
Image credits: UT Austin, NASA, web, and the author
Parameter count is (generally) a good indicator of complexity
Source: NVIDIA
Rough tiers: < 10, 10 - 100, >> 100 parameters
Uncertainty Quantification (UQ):
Can we just take any problem and apply UQ?
It depends! How many parameters are too many for UQ?
Any guess?
| # of parameters | Total runs | Total simulation time |
|---|---|---|
| 2 | 25 | ~4 mins |
| 3 | 125 | ~20 mins |
| 5 | 3,125 | ~9 hours |
| 7 | 78,125 | ~9 days |
| 9 | 1,953,125 | ~32 weeks |
| 11 | 48,828,125 | ~15 years |
| 15 | 30,517,578,125 | ~9,000 years |
| 20 | 95,367,431,640,625 | ~3e7 years |
| ... | ... | ... |
Assume: 1 simulation = 10 secs, full tensor grid with 5 points along each parameter axis
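The table follows from simple arithmetic; a minimal sketch that reproduces it (assuming exactly the 10 s/run and 5-points-per-axis figures above):

```cpp
// Sanity check of the table above (assumptions from the slide: 10 s per
// simulation, full tensor grid with 5 points per parameter axis).
#include <cmath>
#include <cstdio>
#include <initializer_list>

int main() {
  const double secsPerRun = 10.0;
  const double ptsPerAxis = 5.0;
  for (int d : {2, 3, 5, 7, 9, 11, 15, 20}) {
    const double runs = std::pow(ptsPerAxis, d);  // 5^d grid points
    const double years = runs * secsPerRun / (3600.0 * 24 * 365);
    std::printf("d = %2d : %.3e runs, %.3e years\n", d, runs, years);
  }
}
```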
Possible counterarguments:
Can we break the trend and make virtue out of using a "large" linear subspace?
1901: PCA invented (Karl Pearson)
1987: Method of POD snapshots (L. Sirovich)
~2017: Manifold learning starts...
Introduction
Initial event
Rising action
Climax
Falling action
Resolution
Denouement
Surface waves: travel at the Earth's surface
Body waves: travel through the Earth
Affected by the material properties (density, modulus)
Primary (P-waves) are compressional, secondary or shear (S-waves) are transversal (particles oscillate perpendicularly to the direction of wave propagation)
Very limited model reduction work exists for this problem
Likely neglected by the pROM community because it is hyperbolic
[Earth cross-section: surface and core-mantle boundaries]
Shear effects are negligible in liquids, so the core is not considered
Given: material properties (density $\rho$, shear modulus $\mu$) and a forcing $f(\mathbf{x}, t)$
Find: the velocity $v(\mathbf{x}, t)$ and shear stresses $\boldsymbol{\sigma}(\mathbf{x}, t)$ satisfying the velocity-stress form of the elastic SH-wave equations*: $\rho\,\partial_t v = \nabla \cdot \boldsymbol{\sigma} + f$, $\;\partial_t \boldsymbol{\sigma} = \mu \nabla v$
* H. Igel, M. Weber, Geophys. Res. Lett. 22 (6) (1995)
Sparse large coefficient matrices
(depend on material properties, not on time)
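A plausible sketch (notation assumed here, not taken from the slides) of the resulting semi-discrete system: after spatial discretization, the velocity and stress dofs satisfy a linear time-invariant system whose sparse blocks couple the two fields:

$$
\frac{d}{dt}
\begin{bmatrix} \boldsymbol{v} \\ \boldsymbol{\sigma} \end{bmatrix}
=
\begin{bmatrix} 0 & \boldsymbol{A}_{v\sigma} \\ \boldsymbol{A}_{\sigma v} & 0 \end{bmatrix}
\begin{bmatrix} \boldsymbol{v} \\ \boldsymbol{\sigma} \end{bmatrix}
+
\begin{bmatrix} \boldsymbol{f} \\ 0 \end{bmatrix}
$$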
Open-source, using Kokkos: proxyapps.exascaleproject.org/app/shaw
Cartoon, not real sparsity pattern!
Contour plots of the velocity field: Ricker wavelet source, T = 60 sec, depth = 640 km
Interference
Reflection
Refraction (from discontinuities)
Time-evolution of the velocity field contours (PREM Earth model) at time = 250, 1000, and 2000 sec
FOM
ROM
For simplicity, assume the same # of modes K for velocity and stresses
Approximations: $\boldsymbol{v} \approx \boldsymbol{\Phi}_v \hat{\boldsymbol{v}}$ and $\boldsymbol{\sigma} \approx \boldsymbol{\Phi}_\sigma \hat{\boldsymbol{\sigma}}$, with each basis holding $K$ POD modes
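Substituting the approximations and projecting onto the (orthonormal) bases yields the standard Galerkin ROM; a sketch in the notation assumed above:

$$
\frac{d\hat{\boldsymbol{v}}}{dt} = \boldsymbol{\Phi}_v^T \boldsymbol{A}_{v\sigma} \boldsymbol{\Phi}_\sigma\, \hat{\boldsymbol{\sigma}} + \boldsymbol{\Phi}_v^T \boldsymbol{f},
\qquad
\frac{d\hat{\boldsymbol{\sigma}}}{dt} = \boldsymbol{\Phi}_\sigma^T \boldsymbol{A}_{\sigma v} \boldsymbol{\Phi}_v\, \hat{\boldsymbol{v}}
$$

The reduced operators are dense $K \times K$ matrices that can be precomputed once, offline.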
Also called the ROM "offline stage"
Execute solves of the FOM for "training" parameter instances
and collect the "snapshots" (mode-1 concatenation)
Snapshot matrix: (# state dofs) x (# time steps)
Identify low-dimensional structure in data (POD)
Factor the (# state dofs) x (# time steps) snapshot matrix via the SVD
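A compact sketch of the POD step (symbols assumed): gather the snapshots column-wise, take a thin SVD, and retain the $K$ leading left singular vectors as the basis:

$$
\boldsymbol{S} = \big[\,\boldsymbol{v}(t_1)\;\cdots\;\boldsymbol{v}(t_{n_t})\,\big] \in \mathbb{R}^{N \times n_t},
\qquad
\boldsymbol{S} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T,
\qquad
\boldsymbol{\Phi} = \boldsymbol{U}(:, 1\!:\!K)
$$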
Many modes needed, as expected
Data-driven interpolation fails
Velocity field at time = 2000 secs, computed for the forcing period T = 69 sec (an extrapolation point)
ROM using 436 modes for velocity and 417 modes for stresses
Introduction
Initial event
Rising action
Climax
Falling action
Resolution
Denouement
| | # of degrees of freedom | Runtime |
|---|---|---|
| Full-order model | ~3,150,000 | t |
| pROM | ~850 | 0.1 t |
| Reduction factor | ~3,700x | 10x |
Introduction
Initial event
Rising action
Climax
Falling action
Resolution
Denouement
How do we evaluate the efficiency of this kernel?
Assume: square system of size $N$, using doubles (8 bytes each)
FLOPS: $2N^2$ (a multiply and an add per matrix entry)
Data movement: $8N^2$ (read $A$) + $8N$ (read $x$) + $8N$ (write $y$) bytes
Result: $2N^2 / (8N^2 + 16N) \approx 1/4$ (flops/byte)
gemv kernel: arithmetic intensity $\approx 1/4$
Memory bandwidth bound!
Roofline model: the theoretically attainable performance as a function of the arithmetic intensity
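In formula form (a standard statement of the roofline model, symbols assumed here): a kernel with arithmetic intensity $I$ can at best achieve

$$
P_{\text{attainable}} = \min\big(P_{\text{peak}},\; I \times B_{\text{mem}}\big)
$$

where $P_{\text{peak}}$ is the peak floating-point rate and $B_{\text{mem}}$ the peak main-memory bandwidth. With $I \approx 1/4$, gemv sits deep in the bandwidth-bound region.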
Modern many-core chips
Best when:
cores are kept busy, data is local
access patterns are optimal for the targeted arch
Standard Galerkin ROM
This is useful when we need many solves, e.g. for UQ
Let's consider M trajectories simultaneously
e.g. different forcing evaluations
Let's put on the UQ and HPC glasses
Arithmetic intensity: ~ K/16 (flops/byte)
This is now a function of K (# of modes)!
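A back-of-the-envelope for where $K/16$ can come from (assumptions mine: $\hat{\boldsymbol{C}} \mathrel{+}= \hat{\boldsymbol{A}}\hat{\boldsymbol{X}}$ with $\hat{\boldsymbol{A}} \in \mathbb{R}^{K \times K}$, $\hat{\boldsymbol{X}} \in \mathbb{R}^{K \times M}$, $\hat{\boldsymbol{C}}$ both read and written, doubles, and $M \approx K$):

$$
I = \frac{2K^2M}{8K^2 + 8KM + 16KM}
\;\approx\;
\frac{2K^3}{32K^2} = \frac{K}{16}
\quad (M \approx K)
$$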
Standard formulation: $\hat{\boldsymbol{y}} = \hat{\boldsymbol{A}}\hat{\boldsymbol{x}}$ (gemv, one trajectory)
Rank-2 formulation: $\hat{\boldsymbol{Y}} = \hat{\boldsymbol{A}}\hat{\boldsymbol{X}}$ (gemm, M trajectories as columns)
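To make the rank-2 idea concrete, here is a minimal sketch, not the SHAW implementation (function and variable names are illustrative assumptions): one explicit Euler step advancing M trajectories at once with a single dense gemm from kokkos-kernels.

```cpp
// Minimal sketch of a rank-2 Galerkin ROM time step (names are illustrative).
#include <Kokkos_Core.hpp>
#include <KokkosBlas3_gemm.hpp>

using matrix_t = Kokkos::View<double**>;

// Ahat : K x K dense reduced operator (precomputed offline)
// x    : K x M reduced states, one column per trajectory
// xnext: K x M output, xnext = x + dt * Ahat * x
void rank2_euler_step(const matrix_t& Ahat, const matrix_t& x,
                      const matrix_t& xnext, double dt) {
  Kokkos::deep_copy(xnext, x);                          // xnext = x
  KokkosBlas::gemm("N", "N", dt, Ahat, x, 1.0, xnext);  // xnext += dt * Ahat * x
}

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int K = 512, M = 64;  // assumed sizes
    matrix_t Ahat("Ahat", K, K), x("x", K, M), xnext("xnext", K, M);
    rank2_euler_step(Ahat, x, xnext, 0.01);
  }
  Kokkos::finalize();
}
```

The design point: one $K \times K$ by $K \times M$ gemm replaces $M$ separate gemv calls, turning a bandwidth-bound kernel into an increasingly compute-bound one as $K$ and $M$ grow.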
Uses kokkos-kernels with the OpenMP backend;
workstation with two 18-core Intel(R) Xeon(R) Gold 6154 CPU @ 3.00 GHz (24.75MB L3 cache, 125GB total mem)*
M = 1: very limited benefit from threading
M > 1: increasing the # of threads helps
Large K and M are an advantage:
they allow us to fully exploit the machine!
M = # of simultaneous trajectories
M = 1 : standard pROM
M >= 2: rank-2 ROM formulation
* F. Rizzi et al., CMAME, 2021
What combination of thread count (n) and number of simultaneous trajectories (M) is most efficient for obtaining those P samples while satisfying the given constraints?
Suppose:
Launch 36 single-thread ROM runs each using M=1
and repeat until all P samples are done
Launch 18 two-threaded ROM runs each using M=1
and repeat until all P samples are done
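A back-of-the-envelope sketch of the batching arithmetic behind this question (all numbers assumed, not measured): with 36 cores, n threads per run, and M trajectories per run, each "wave" of concurrent runs yields (36/n)·M samples.

```cpp
// Batching arithmetic for the scheduling question above (numbers assumed).
#include <cstdio>

int main() {
  const int cores = 36;  // two 18-core CPUs, as in the workstation above
  const int P = 512;     // assumed total number of samples needed
  struct Choice { int n, M; } choices[] = {{1, 1}, {2, 1}, {4, 8}, {36, 64}};
  for (const auto& c : choices) {
    const int concurrent = cores / c.n;            // runs in flight at once
    const int perWave = concurrent * c.M;          // samples per wave
    const int waves = (P + perWave - 1) / perWave; // waves needed for P samples
    std::printf("n=%2d M=%2d : %2d concurrent runs, %4d samples/wave, %3d waves\n",
                c.n, c.M, concurrent, perWave, waves);
  }
}
```

The actual wall time per wave depends on how the per-run time scales with n and M, which is exactly what the heatmaps below measure.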
If we increase # of modes (K), things improve!
# of modes (K) = 512
# of modes (K) = 2048
Greener is better
Recall that standard ROM is the case M = 1
Inadmissible combinations are excluded
The larger the number of modes, the more efficient it is to evaluate an ensemble of trajectories!
| # of modes (K) | How many times more efficient than rank-1 pROMs? |
|---|---|
| 256 | 13x |
| 512 | 19x |
| 1024 | 23x |
| 2048 | 26x |
MC study: 512 trajectories sampling the forcing period T
Rank-2 ROM is 950 times faster than FOM
If the FOM takes 1 hour, the ROM takes about 4 seconds
Introduction
Initial event
Rising action
Climax
Falling action
Resolution
Denouement
Aeroelasticity:
deforming structures modeled as linear, with a nonlinear load
Acoustic waves:
modeled with a linear PDE, but can have a number of nonlinear sources (turbulent shear layers from wakes)
Neutral particle (neutron, photon, etc.) transport
Linear circuit models
What if the matrix A changes?
What about nonlinear problems?
Tensors are getting more and more attention
ROMs for LTI can benefit from them
Leverage hardware evolution: CUDA tensor cores
Rank-3 ROMs?
Batched gemm?
A compute-bound formulation of Galerkin model reduction for linear time-invariant dynamical systems, F. Rizzi, E. Parish, P. Blonigan, J. Tencer, CMAME, 2021
Eric Parish (SNL)
Patrick Blonigan (SNL)
John Tencer (SNL)
This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. This work was funded by the Advanced Simulation and Computing program and the Laboratory Directed Research and Development program at Sandia National Laboratories, a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525
In collaboration with Sandia National Labs.
Questions?
francesco.rizzi@ng-analytics.com
nexgenanalytics.github.io/cea-seminar-march-2024/
If you are here today, you likely use, study, and/or believe in surrogate modeling. I could spend minutes on this, but...
Computing/hardware progresses and changes quickly
Exascale is already here: China has two machines
How does this impact surrogates (if at all)?
Can we/how to leverage this for surrogate modeling?
"It allows me to run my same old surrogate faster": not ideal!
More synergistic development of surrogates and computing?
Source: https://www.alcf.anl.gov/files/DMello-Nguyen-ALCF-CP-Workshop-MKL-2019-05-01-2019.pdf
Historically, a key focus of pROMs work has been:
"finding the smallest subspace that can represent/solve a problem"
Intuitively: small system, more convenient to compute
Mathematically: intriguing but hard
Computationally: is this really the best approach?
What if we can formulate the problem such that we don't need to reduce it so much while being efficient?
This talk aims to provide a counterargument
Focus on pROMs for LTI systems
Weren't they a solved problem...?
Emphasis on computational aspects
Little math; little on error bounds, ML, or deep learning (sorry!)
Disclaimer: this work might seem "obvious" (depending on whom you are talking to)
Format: this is going to be a "story"
Walk through how this work started and developed
Finally, we talk about generalization
Hardware has changed since the '80s!
Visual performance model obtained by plotting attainable performance (in GFLOP/s) against arithmetic intensity
Evaluates resource efficiency by relating an algorithm's arithmetic intensity to the hardware's peak main-memory bandwidth and floating-point performance
Exposes the hardware limitations for a given kernel and helps prioritize optimizations
A quadrant view, with axes Physics/Equations (exact vs. approximate) and Numerics (exact vs. approximate):
Full-order model (FOM): exact physics, exact numerics
pROMs: exact physics, approximate numerics
Reduced physics: approximate physics, exact numerics
Data-driven surrogates: approximate physics, approximate numerics