ReliQ uses Global Arrays (GA) for distributed memory management and MPI for inter-process communication. The GA layer provides one-sided (PGAS) access to distributed arrays with automatic ghost region management.
```
┌─────────────────────────────────────┐
│ TensorField / GlobalShifter         │  ← User-facing
├─────────────────────────────────────┤
│ GlobalArray[D, T]                   │  ← Nim wrapper
├─────────────────────────────────────┤
│ GA C API (ga.h / macdecls.h)        │  ← FFI bindings
├─────────────────────────────────────┤
│ MPI (message passing)               │  ← Communication
└─────────────────────────────────────┘
```
GlobalArray[D, T] wraps a single GA handle with Nim-level resource management (RAII via =destroy):
```nim
import reliq

# Usually created indirectly through TensorField:
var field = lat.newTensorField([3, 3]): float64
# field.data is a GlobalArray[D+R+1, T]

# Direct construction (advanced use):
var ga = newGlobalArray[4](
  globalGrid = [8, 8, 8, 16],
  mpiGrid    = [1, 1, 1, 4],
  ghostGrid  = [1, 1, 1, 1],
  T = float64
)
```
| Proc | Returns | Description |
|---|---|---|
| numSites() | int | Total sites in the global array |
| getGlobalGrid() | array[D, int] | Full grid dimensions |
| getLocalGrid() | array[D, int] | This rank's local partition |
| getMPIGrid() | array[D, int] | MPI process grid |
| getGhostGrid() | array[D, int] | Ghost widths per dimension |
| getBounds() | (lo, hi) | This rank's index bounds |
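The query procs above can be combined to inspect a rank's partition. A minimal sketch, assuming the tensor field's backing array is reachable as `field.data` (per the comment in the construction example) and that the procs return values as listed in the table:

```nim
import reliq

parallel:
  let lat = newSimpleCubicLattice([8, 8, 8, 16])
  block:
    var field = lat.newTensorField([3, 3]): float64
    let ga = field.data
    echo "global grid: ", ga.getGlobalGrid()
    echo "local grid:  ", ga.getLocalGrid()
    # Index bounds owned by this rank
    let (lo, hi) = ga.getBounds()
    echo "rank ", GA_Nodeid(), " owns ", lo, " .. ", hi
```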
```nim
# Get a pointer to this rank's local data (including ghost region).
# Note: `ptr` is a reserved keyword in Nim, so use a different name.
let localPtr = ga.accessLocal()
# ... use the pointer ...

# Release the local access
ga.releaseLocal()

# Access ghost data
let ghostPtr = ga.accessGhosts()
```
```nim
# Update ghost regions in one direction
ga.updateGhostDirection(dim = 3, direction = 1)

# Update all ghost regions
ga.updateGhosts()
```
**GA 5.8.2 limitation:** all dimensions must have ghost width ≥ 1 for `GA_Update_ghost_dir` to work correctly. A ghost width of 0 on any dimension causes a crash ("cannot locate region" with invalid bounds).
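In practice this means requesting a ghost width of at least 1 on every dimension, even those that need no halo exchange. A sketch reusing the constructor shown earlier:

```nim
# Safe: every dimension gets ghost width >= 1, even though only the
# last dimension is distributed across ranks here.
var ga = newGlobalArray[4](
  globalGrid = [8, 8, 8, 16],
  mpiGrid    = [1, 1, 1, 4],
  ghostGrid  = [1, 1, 1, 1],   # never 0 on any dimension (GA 5.8.2 crash)
  T = float64
)
```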
The globalarrays/gampi module provides MPI initialization and collective operations:
```nim
import reliq

# The parallel: template handles init/finalize automatically:
parallel:
  echo "Rank ", GA_Nodeid(), " of ", GA_Nnodes()
  # ...

# Equivalent to:
initMPI()
initGA()
# ... user code ...
finalizeGA()
finalizeMPI()
```
```nim
# Barrier synchronization (GA level)
GA_Sync()

# Typed all-reduce operations
var localSum = 42.0
allReduceFloat64(addr localSum, 1)   # In-place sum across ranks

var localInt: int32 = 10
allReduceInt32(addr localInt, 1)

# GA broadcast (rank 0 sends to all)
var data: float64 = 3.14
GA_Brdcst(addr data, sizeof(float64).cint, 0.cint)
```
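A common pattern combines a rank-local partial result with the typed all-reduce to form a global sum. A minimal sketch using only the calls shown above:

```nim
import reliq

parallel:
  # Each rank contributes a partial value (here derived from its rank id)
  var partial = float64(GA_Nodeid() + 1)
  # After the all-reduce, every rank holds the sum over all ranks
  allReduceFloat64(addr partial, 1)
  if GA_Nodeid() == 0:
    echo "global sum = ", partial
```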
The parallel: block template sets up MPI + GA and handles cleanup:
```nim
import reliq

parallel:
  # MPI and GA are initialized here
  echo "Running on ", GA_Nnodes(), " ranks"

  let lat = newSimpleCubicLattice([8, 8, 8, 16])
  block:
    # Create distributed objects inside a block:
    var field = lat.newTensorField([3, 3]): float64
    # ... computation ...
  # GA objects destroyed at block exit (before finalizeGA)

# MPI and GA finalized automatically
```
**Important:** use `block:` scoping for GA-backed objects. They must be destroyed before `finalizeGA()` is called.
ReliQ provides a launcher script (reliq) that dispatches to mpirun:
```shell
# Run on 4 MPI ranks
./reliq -e tensor -n 4

# Run on 1 rank (default)
./reliq -e tensor
```
LocalTensorField provides a direct pointer into the rank-local GA memory (via NGA_Access) together with a precomputed siteOffsets lookup table for navigating the padded strides. No contiguous buffer is allocated — reads and writes go directly to the Global Array:
```nim
parallel:
  let lat = newSimpleCubicLattice([8, 8, 8, 16])
  block:
    var field = lat.newTensorField([3, 3]): float64

    # Obtain a direct pointer into the rank-local GA memory
    var local = field.newLocalTensorField()

    # Work with local data — writes go directly to the GA
    for n in 0..<local.numSites():
      var site = local.getSite(n)
      site[0, 0] = 1.0
    # No manual flush needed — data is already in the GA

    # Ghost exchange after modification
    field.updateAllGhosts()
```
| Module | Description |
|---|---|
| globalarrays/gatypes | GlobalArray[D,T] distributed array wrapper |
| globalarrays/gabase | GA initialization and finalization |
| globalarrays/gawrap | Low-level C FFI to Global Arrays |
| globalarrays/gampi | MPI initialization and collective operations |