H-AMR: The Next-Gen GRMHD Code

I am the main developer of H-AMR, a massively scalable GPU-accelerated general relativistic magnetohydrodynamics (GRMHD) code. H-AMR features adaptive grids, local adaptive timestepping, and novel load-balancing routines that enable it to tackle challenging problems, and it scales up to large GPU clusters, including OLCF's Summit. H-AMR uses a grid-based Godunov scheme to solve the fluid equations, including the induction equation, and incorporates the M1 closure to model radiation. It can also evolve the electron and ion entropies to model two-temperature plasmas. H-AMR is parallelized at three levels: CUDA performs most of the computations, OpenMP handles communication and gridding, and non-blocking MPI (Message Passing Interface) transfers boundary cells between nodes.
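To illustrate the third level, the sketch below shows a generic non-blocking MPI boundary-cell exchange that overlaps communication with interior work. It is a minimal stand-in, not H-AMR's actual source code: the ring topology, ghost-cell count, and all variable names are hypothetical.

// Minimal sketch of a non-blocking boundary-cell exchange in a ring of MPI
// ranks, as a stand-in for the 3D block decomposition. All names and sizes
// are hypothetical, not taken from H-AMR.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Neighbors in a 1D ring (a real code would use the 3D neighbor table).
    const int left  = (rank - 1 + size) % size;
    const int right = (rank + 1) % size;

    const int nghost = 3;  // hypothetical number of ghost cells per face
    std::vector<double> send_left(nghost, rank), send_right(nghost, rank);
    std::vector<double> recv_left(nghost), recv_right(nghost);

    MPI_Request reqs[4];
    // Post non-blocking receives and sends for the boundary cells.
    MPI_Irecv(recv_left.data(),  nghost, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recv_right.data(), nghost, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(send_left.data(),  nghost, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(send_right.data(), nghost, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

    // ... interior cells (which need no neighbor data) would be updated here,
    //     e.g. by launching CUDA kernels, while the messages are in flight ...

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    std::printf("rank %d received boundaries from ranks %d and %d\n", rank, left, right);

    MPI_Finalize();
    return 0;
}

Posting the receives and sends before the interior update, and only waiting afterwards, is what lets the message transfer hide behind the computation.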

[Image: grid.png]

H-AMR: Adaptive Mesh Refinement

H-AMR makes use of adaptive mesh refinement (AMR). This allows H-AMR to focus the resolution where it is most useful, as shown in the image on the right: regions in the accretion disk are covered by more blocks and thus have a higher effective resolution than regions outside of the disk.
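As a schematic illustration of the idea (not H-AMR's actual refinement criterion; the density threshold and block contents below are hypothetical), a block-structured AMR code flags blocks for refinement when they contain material of interest, such as dense disk gas:

// Illustrative block-refinement decision on a block-structured AMR grid.
// The criterion and threshold are hypothetical, not H-AMR's actual ones.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Block {
    int level;                     // current refinement level of this block
    std::vector<double> density;   // cell densities inside the block
};

bool needs_refinement(const Block& b, double rho_threshold, int max_level)
{
    if (b.level >= max_level) return false;
    const double rho_max = *std::max_element(b.density.begin(), b.density.end());
    return rho_max > rho_threshold;   // dense (disk) regions get more blocks
}

int main()
{
    Block disk_block  {0, {1.0, 5.0, 3.0}};   // overlaps dense disk material
    Block funnel_block{0, {1e-4, 2e-4}};      // dilute region outside the disk
    std::printf("disk block refine?   %d\n", needs_refinement(disk_block,  1.0, 3));
    std::printf("funnel block refine? %d\n", needs_refinement(funnel_block, 1.0, 3));
    return 0;
}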

In addition, H-AMR uses a local adaptive timestep (LAT). H-AMR exploits the fact that the grid is logarithmically spaced, i.e. the cell size increases with distance from the black hole, which makes it possible to reduce the number of timesteps taken far from the black hole. This leads to a factor of 3--8 speedup and increases the numerical accuracy by decreasing the noise generated by the conserved-to-primitive variable inversion (see our paper).
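The sketch below illustrates the underlying idea under simple assumptions (a 1D logarithmic radial grid and a CFL-like timestep that scales with the local cell size); it is not H-AMR's actual LAT algorithm, and all numbers are hypothetical:

// Minimal sketch of assigning local timestep levels on a logarithmically
// spaced radial grid: outer cells are wider, so they can take timesteps that
// are power-of-two multiples of the innermost (smallest) timestep.
#include <cmath>
#include <cstdio>

int main()
{
    const double r_in = 1.0, r_out = 1000.0;
    const int    nr   = 64;                        // radial zones (hypothetical)
    const double dlogr = std::log(r_out / r_in) / nr;

    // On a log grid, dr ~ r * dlogr, so the CFL timestep roughly scales with r.
    const double dt_min = r_in * dlogr;            // schematic innermost timestep

    for (int i = 0; i < nr; i += 8) {
        const double r  = r_in * std::exp((i + 0.5) * dlogr);   // cell-center radius
        const double dt = r * dlogr;               // local CFL-like estimate
        // Restrict to power-of-two multiples of dt_min for a nested hierarchy.
        const int level = static_cast<int>(std::floor(std::log2(dt / dt_min)));
        std::printf("r = %8.2f  ->  timestep = 2^%d x dt_min\n", r, level);
    }
    return 0;
}

Grouping cells into such power-of-two levels is what allows the outer grid to be advanced far less often than the innermost cells.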

[Image: grid (1).png]

H-AMR: Treatment of the Pole

H-AMR efficiently and accurately handles the polar regions of a spherical grid. It achieves this by applying transmissive boundary conditions across the pole while using static mesh refinement (SMR) to avoid the squeezing of cells in the azimuthal direction, which would otherwise reduce the computational speed by orders of magnitude. For this, H-AMR derefines the grid only in the troublesome azimuthal direction while maintaining the full resolution in the r- and theta-directions. This unique approach makes it possible to resolve features that pass close to the pole.
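The sketch below shows what a transmissive polar boundary amounts to in terms of grid indices: a ghost cell beyond the pole is filled from the real cell on the opposite side of the axis, i.e. at the mirrored theta index and an azimuthal index shifted by half the phi range. The index conventions are hypothetical, not H-AMR's, and the sign flips of vector components along theta and phi are omitted.

// Minimal sketch of a transmissive boundary across the polar axis of a
// spherical grid. Index conventions are hypothetical.
#include <cstdio>

struct CellIndex { int j; int k; };   // j: theta index, k: phi index

// Map a ghost index j < 0 (beyond the pole) onto the corresponding real cell
// on the opposite side of the axis. (Theta- and phi-components of vectors
// would additionally change sign, which is omitted here.)
CellIndex map_across_pole(int j_ghost, int k, int nphi)
{
    CellIndex src;
    src.j = -j_ghost - 1;             // mirror across theta = 0
    src.k = (k + nphi / 2) % nphi;    // opposite side of the axis in phi
    return src;
}

int main()
{
    const int nphi = 8;
    for (int k = 0; k < nphi; ++k) {
        CellIndex s = map_across_pole(/*j_ghost=*/-1, k, nphi);
        std::printf("ghost (j=-1, k=%d) <- real (j=%d, k=%d)\n", k, s.j, s.k);
    }
    return 0;
}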

[Image: radiation2.png]

H-AMR: Radiation and Two-temperature Fluids

To perform simulations of radiatively cooled accretion disks, we have implemented a direct treatment of radiation in H-AMR. For this, we use a two-moment (M1) closure approach in the gray approximation. This means that the energy and momentum of the radiation field are able to feed back on the gas dynamics, but the spectral information is not retained.
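For reference, the commonly used (Levermore) form of the M1 closure, written here in flat space for clarity, expresses the radiation pressure tensor in terms of the radiation energy density E and flux F; the covariant version used in GRMHD differs in the details:

\[
  P^{ij} \;=\; \left[\frac{1-\chi}{2}\,\delta^{ij} \;+\; \frac{3\chi-1}{2}\,n^i n^j\right] E,
  \qquad n^i = \frac{F^i}{|F|},
\]
\[
  \chi(f) \;=\; \frac{3+4f^2}{5+2\sqrt{4-3f^2}},
  \qquad f = \frac{|F|}{c\,E}.
\]

The closure interpolates between the optically thick limit (f -> 0, chi -> 1/3) and free streaming (f -> 1, chi -> 1).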

In addition, to model dilute plasmas, which are present in most accretion disks, we have implemented two-temperature fluids. We evolve the ion and electron entropies separately, calculate the total dissipation, and divide it between the two species. We account for Coulomb collisions between ions and electrons through an (implicit) source term. The image on the right shows transverse slices of the density, the gas and radiation internal energies, and the electron and ion temperatures from a radiative two-temperature simulation of an accretion disk at 35% of the Eddington limit.
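Schematically, and without claiming H-AMR's exact expressions (the heating fraction delta_e and the Coulomb rate coefficient are model choices), the update described above takes the form:

\[
  Q_e = \delta_e\,Q, \qquad Q_i = (1-\delta_e)\,Q,
\]
\[
  \frac{du_e}{d\tau} = (\text{adiabatic terms}) + Q_e + q_{\mathrm{C}}, \qquad
  \frac{du_i}{d\tau} = (\text{adiabatic terms}) + Q_i - q_{\mathrm{C}},
\]
\[
  q_{\mathrm{C}} \;\propto\; n_e\,n_i\,\ln\Lambda\,(T_i - T_e)
  \quad\text{(up to a temperature-dependent factor)},
\]

where Q is the total dissipation, u_e and u_i are the electron and ion internal energies, and q_C is the Coulomb energy-exchange rate; treating q_C implicitly keeps the update stable when the coupling becomes stiff.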

[Image: benchmark.png]

H-AMR: Single CPU and GPU Benchmarks

H-AMR is a leader in CPU and GPU performance. It achieves 0.5x10^6 zone-cycles/s/core on a Skylake CPU in non-radiative GRMHD with full AVX-512 vectorization. On an A100 GPU, H-AMR achieves 1.8x10^8 zone-cycles/s/GPU. In radiative two-temperature GRMHD this drops to ~1.0--3.0x10^7 zone-cycles/s/GPU, depending on the complexity of the setup.
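Taking the quoted non-radiative numbers at face value, a single GPU replaces a sizable CPU partition:

\[
  \frac{1.8\times10^{8}\ \text{zone-cycles/s per A100}}
       {0.5\times10^{6}\ \text{zone-cycles/s per Skylake core}} \;\approx\; 360,
\]

i.e. one A100 delivers roughly the non-radiative GRMHD throughput of ~360 Skylake cores.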

[Image: Scaling.png]

H-AMR: Scalability on OLCF Summit

H-AMR shows excellent weak scaling for both simple and complex grids. For simple grids, it reaches 80% numerical efficiency on 900 OLCF Summit nodes (5,400 GPUs) with a single block of 150x150x150 cells per GPU, without using adaptive mesh refinement (AMR) or local adaptive timestepping (LAT; dash-dotted orange line). For more complex grids with 20 blocks of 48x48x64 (~50^3) cells each per GPU, the efficiency at 900 nodes drops to 60% (not shown for brevity). While using local adaptive timestepping (LAT) decreases the raw parallel efficiency to 35% (dashed green line), it also speeds up the simulations by a factor of 4 (on 1 GPU) to 9 (on 5,400 GPUs; solid blue line), leading to an effective numerical efficiency of nearly 200% on 5,400 GPUs. These data are based on the largest GRMHD simulation to date, which contained ~15--20 billion cells and ran for 85 million timesteps, demonstrating H-AMR's ability to scale up under the most strenuous conditions.
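If the efficiencies above are read with the standard weak-scaling definition (fixed workload per GPU), they correspond to

\[
  \varepsilon_{\mathrm{weak}}(N) \;=\; \frac{P(N)}{N\,P(1)},
\]

where P(N) is the aggregate throughput in zone-cycles/s on N GPUs; 80% efficiency on 5,400 GPUs thus means the aggregate throughput is about 0.8 x 5,400 times the single-GPU rate.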

Public Release

Presently, we are developing a novel Monte Carlo radiation and test-particle scheme to address accretion onto black holes at high luminosities. Time constraints on the various junior people in the collaboration make it challenging to prepare and support a public release of H-AMR at the time of writing. However, we look forward to collaborating with other interested groups on topics of mutual interest. Please contact me for more details.

Publications

1) H-AMR: A New GPU-accelerated GRMHD Code for Exascale Computing With 3D Adaptive Mesh Refinement and Local Adaptive Time-stepping

2) The Event Horizon General Relativistic Magnetohydrodynamic Code Comparison Project
