GlobalTensor - Distributed Tensor Fields via Global Arrays

This module provides TensorField[D,R,L,T], the primary distributed data type in ReliQ. Each tensor field is backed by a Global Array with ghost (halo) regions, enabling transparent MPI communication of boundary data.

Key capabilities:

Construction: newTensorField creates a GA-backed field with automatic MPI decomposition and ghost region allocation
Local data access: accessLocal, accessGhosts, releaseLocal provide raw pointers into the local partition (with or without ghost cells)
Ghost exchange: updateGhosts, updateAllGhosts synchronise boundary data between MPI ranks via GA_Update_ghost_dir
Coordinate utilities: lexToCoords, coordsToLex, localIdx, localLexIdx handle the C row-major memory layout used by GA
Shifting / transport: GlobalShifter performs distributed shifts (dest[x] = src[x + e_mu]) using ghost exchange, with correct handling of periodic boundary conditions for both distributed and non-distributed dimensions
Stencil operations: applyStencilShift, discreteLaplacian implement common nearest-neighbour operations
Stencil integration: newLatticeStencil creates a unified LatticeStencil[D] from a tensor field's geometry

GA Memory Layout

Global Arrays store data in C row-major order (last dimension fastest varying). A TensorField[D=4, R=2, T=float64] with shape = [3, 3] maps to a 7-dimensional GA [Lx, Ly, Lz, Lt, S0, S1, Cplx]. The first D dimensions (lattice) carry ghost regions for boundary communication; the last R+1 dimensions (tensor shape + complex component) also carry ghost width 1 because GA 5.8.2 requires all dimensions to have ghost ≥ 1 for GA_Update_ghost_dir to function. The padded inner block therefore has (S_i + 2) entries per tensor dimension and (complexFactor + 2) for the complex dimension; only the central product(shape) * complexFactor elements are real data. Use innerPaddedBlockSize / innerPaddedOffset to navigate.

Two pointer types are available:

Local pointer (accessLocal): starts at the center of the padded inner block for the first local lattice site. p[0] is element 0 at local coordinates (0,0,...,0). The stride between adjacent lattice sites is innerPaddedBlockSize (the product of all padded inner dimensions), not the number of real elements. Use localIdx to compute the correct flat index.
Ghost pointer (accessGhosts): starts at the origin of the full padded array. Use coordsToPaddedFlat to index into the ghost-padded lattice dimensions; at each site, add innerPaddedOffset to reach the first real element.

Example

import reliq

parallel:
  let lat = newSimpleCubicLattice([8, 8, 8, 16], [1, 1, 1, 4], [1, 1, 1, 1])
  var src  = lat.newTensorField([1, 1]): float64
  var dest = lat.newTensorField([1, 1]): float64
  
  # Fill src with global lex index
  let p = src.accessLocal()
  for site in 0..<nLocal:
    p[src.localLexIdx(site)] = float64(site)
  src.releaseLocal()
  
  # Shift forward in t-direction
  let shifter = newGlobalShifter(src, dim = 3, len = 1)
  shifter.apply(src, dest)   # dest[x] = src[x + e_t]
  
  # Discrete Laplacian
  var lap = lat.newTensorField([1, 1]): float64
  var tmp = lat.newTensorField([1, 1]): float64
  discreteLaplacian(src, lap, tmp)

Imports

../lattice, ../globalarrays/gatypes, ../globalarrays/gawrap, ../utils/complex, ../lattice/stencil

Types

GlobalShifter[D; R; L; T] = object
  dim*: int                  ## Shift dimension
  len*: int                  ## Displacement (+forward, -backward)
  ## MPI decomposition grid
  ## Product of localGeom[]
  ## Product of shape[]

Distributed-memory field shifter backed by Global Arrays ghost exchange

Unlike the single-rank Shifter[D,T] in transporter.nim, a GlobalShifter performs real MPI communication via GA_Update_ghost_dir so that neighbor lookups near partition boundaries read correct remote data.

Usage:

let lat = newSimpleCubicLattice([8,8,8,16], [1,1,1,4], [1,1,1,1])
var field = lat.newTensorField([1,1]): float64
let shifter = newGlobalShifter(field, dim=3, len=1)
var dest   = lat.newTensorField([1,1]): float64
shifter.apply(field, dest)   # dest[x] = field[x + e_3]

TensorField[D; R; L; T] = object
  lattice*: L
  shape*: array[R, int]
  when D + R == 1:
    when isComplex32(T):
      data*: GlobalArray[2, float32]
    elif isComplex64(T):
      data*: GlobalArray[2, float64]
    else:
      data*: GlobalArray[2, T]
  elif D + R == 2:
    when isComplex32(T):
      data*: GlobalArray[3, float32]
    elif isComplex64(T):
      data*: GlobalArray[3, float64]
    else:
      data*: GlobalArray[3, T]
  elif D + R == 3:
    when isComplex32(T):
      data*: GlobalArray[4, float32]
    elif isComplex64(T):
      data*: GlobalArray[4, float64]
    else:
      data*: GlobalArray[4, T]
  elif D + R == 4:
    when isComplex32(T):
      data*: GlobalArray[5, float32]
    elif isComplex64(T):
      data*: GlobalArray[5, float64]
    else:
      data*: GlobalArray[5, T]
  elif D + R == 5:
    when isComplex32(T):
      data*: GlobalArray[6, float32]
    elif isComplex64(T):
      data*: GlobalArray[6, float64]
    else:
      data*: GlobalArray[6, T]
  elif D + R == 6:
    when isComplex32(T):
      data*: GlobalArray[7, float32]
    elif isComplex64(T):
      data*: GlobalArray[7, float64]
    else:
      data*: GlobalArray[7, T]
  elif D + R == 7:
    when isComplex32(T):
      data*: GlobalArray[8, float32]
    elif isComplex64(T):
      data*: GlobalArray[8, float64]
    else:
      data*: GlobalArray[8, T]
  elif D + R == 8:
    when isComplex32(T):
      data*: GlobalArray[9, float32]
    elif isComplex64(T):
      data*: GlobalArray[9, float64]
    else:
      data*: GlobalArray[9, T]
  elif D + R == 9:
    when isComplex32(T):
      data*: GlobalArray[10, float32]
    elif isComplex64(T):
      data*: GlobalArray[10, float64]
    else:
      data*: GlobalArray[10, T]
  elif D + R == 10:
    when isComplex32(T):
      data*: GlobalArray[11, float32]
    elif isComplex64(T):
      data*: GlobalArray[11, float64]
    else:
      data*: GlobalArray[11, T]

Distributed tensor field backed by a Global Array.

Represents a rank-R tensor field defined on a D-dimensional lattice, stored as a distributed GA with ghost regions for MPI communication. The element type T may be a real scalar or a complex type (Complex32, Complex64).

Procs

proc accessGhosts[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]): ptr UncheckedArray[T]: Access local data including ghost regions

Returns a raw pointer to the padded data (local + ghost regions). Ghost regions must have been updated via updateAllGhosts first. Must call releaseLocal when done.
proc accessLocal[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]): ptr UncheckedArray[T]: Access local data segment of the tensor field (no ghosts)

Important: The returned pointer shares the padded memory layout. The stride between consecutive lattice sites is NOT 1 — it is determined by the padded inner dimensions (tensor shape + complex, each ghost-padded). Use localIdx to compute the correct flat index for a lattice site.

Must call releaseLocal when done.
proc apply[D: static[int]; R: static[int]; L: Lattice[D]; T]( shifter: GlobalShifter[D, R, L, T]; src: TensorField[D, R, L, T]; dest: var TensorField[D, R, L, T]): Apply the shift: dest[x] = src[x + shift]

Performs GA ghost exchange in the shift dimension, then reads neighbor values from the ghost-padded local data.
proc applyStencilShift[D: static[int]; R: static[int]; L: Lattice[D]; T]( src: TensorField[D, R, L, T]; dest: var TensorField[D, R, L, T]; dim: int; direction: int): Apply a single stencil shift using GA ghost exchange

A convenience wrapper: dest[x] = src[x + direction*e_dim]

Parameters:

src: Source tensor field

dest: Destination tensor field (overwritten)

dim: Shift dimension (0..D-1)

direction: +1 for forward, -1 for backward
proc coordsToLex[D: static int](coords: array[D, int]; geom: array[D, int]): int: Convert D-dimensional coordinates to lexicographic index

GA C-order: last dimension is fastest varying.
proc coordsToPaddedFlat[D: static int](coords: array[D, int]; paddedGeom: array[D, int]; ghostWidth: array[D, int]; innerBlockSize: int): int: Convert local lattice coordinates to a flat index in the ghost-padded array

The padded array has D lattice dimensions (each padded by 2*ghostWidth) followed by R+1 inner dimensions (also padded). This function computes the index for the center element of the inner block at the given lattice site.

Parameters:

coords: Local lattice coordinates (can be negative for backward ghost)

paddedGeom: Padded lattice geometry (localGeom + 2*ghostWidth per dim)

ghostWidth: Ghost width per lattice dimension

innerBlockSize: Product of all padded inner dimensions

Returns: Flat index into the padded data array (not including inner offset)
proc discreteLaplacian[D: static[int]; R: static[int]; L: Lattice[D]; T]( src: TensorField[D, R, L, T]; dest: var TensorField[D, R, L, T]; scratch: var TensorField[D, R, L, T]): Compute the discrete Laplacian: dest[x] = sum_mu (src[x+mu] + src[x-mu]) - 2*D * src[x]

Uses GA ghost exchange for boundary communication. scratch is used as temporary storage.
proc ghostWidth[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]): array[D, int]: Get the ghost width in each dimension
proc hasGhosts[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]): bool: Check if tensor field has ghost regions configured
proc innerBlockSize(R: static int; shape: openArray[int]; isComplex: bool): int: Compute the number of contiguous elements per lattice site

This is the product of all tensor shape entries times the complex factor.
proc innerPaddedBlockSize(R: static int; ghostWidth: int): int: Compute the padded inner block size for ghost-padded arrays

Each inner dimension of size S gets padded to S + 2*ghostWidth. The block size is the product of all padded inner dimensions. For scalar real with S=1, ghost=1: each of R+1 dims is padded to 3, so block = 3^(R+1).
proc innerPaddedOffset(R: static int; ghostWidth: int): int: Compute the flat offset to the center element of the padded inner block

In the ghost-padded inner block, each inner dimension of padded size P has the real data at index ghostWidth. The offset from the start of the padded block to the center (first real element) is: sum_{i=0..R} ghostWidth * stride_i
proc lexToCoords[D: static int](idx: int; geom: array[D, int]): array[D, int]: Convert lexicographic index to D-dimensional coordinates

Uses GA C-order convention: last dimension is fastest varying. idx = coords[0] * stride0 + coords[1] * stride1 + ... + coords[D-1]
proc localGrid[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]): array[D, int]: Get local grid dimensions (excluding ghosts)
proc localIdx[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]; coords: array[D, int]): int: Compute the flat index into the local data pointer for a lattice site

The GA inner dimensions (tensor shape + complex) are ghost-padded to satisfy GA's requirement that all dimensions have ghost width ≥ 1. NGA_Access returns a pointer to the center of the padded block for the first local lattice site, so p[0] is the first real element. The stride between consecutive lattice sites is the product of all padded inner dimensions (innerPaddedBlockSize).

The returned index points to the center of the padded inner block for the given lattice site. Add e to reach element e (but only 0..<innerBlockSize are real data; the rest is padding).

Parameters:

tensor: The tensor field (used for shape and padding info)

coords: Local lattice coordinates

Returns: Flat index into the local data pointer for the first real element at the given lattice site.
proc localLexIdx[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]; site: int): int: Compute the flat index for a lexicographically-ordered site

Converts a lexicographic site index to the flat index in the local data pointer, accounting for lattice-dimension ghost padding while keeping inner (tensor/complex) dimensions contiguous.
proc localSiteOffset[D: static int](coords: array[D, int]; paddedGeom: array[D, int]; innerBlockSize: int): int: Convert local lattice coordinates to a flat offset in the local data pointer

The local pointer from NGA_Access points into the padded memory at the start of the local region. The memory layout still uses padded strides for the lattice dimensions (which carry ghost regions). The inner block at each lattice site is contiguous with innerBlockSize elements.

Parameters:

coords: Local lattice coordinates

paddedGeom: Padded lattice geometry (localGeom + 2*ghostWidth per dim)

innerBlockSize: Number of contiguous inner elements per lattice site

Returns: Flat offset from local pointer p0 to the start of the inner block for the given site.
proc newGlobalBackwardShifters[D: static[int]; R: static[int]; L: Lattice[D]; T]( field: TensorField[D, R, L, T]; len: int = 1): array[D, GlobalShifter[D, R, L, T]]: Create backward shifters for all D dimensions
proc newGlobalShifter[D: static[int]; R: static[int]; L: Lattice[D]; T]( field: TensorField[D, R, L, T]; dim: int; len: int = 1): GlobalShifter[D, R, L, T]: Create a global shifter for a tensor field in a given dimension

Parameters:

field: The tensor field (used only for geometry, not mutated)

dim: Dimension to shift along (0..D-1)

len: Displacement (+1 = forward, -1 = backward)
proc newGlobalShifters[D: static[int]; R: static[int]; L: Lattice[D]; T]( field: TensorField[D, R, L, T]; len: int = 1): array[D, GlobalShifter[D, R, L, T]]: Create forward shifters for all D dimensions
proc newLatticeStencil[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]; pattern: StencilPattern[D]): LatticeStencil[ D]: Create a unified lattice stencil from a tensor field

Automatically extracts local geometry and ghost width from the tensor. The stencil understands both local and ghost regions.

Example:

let lat = newSimpleCubicLattice([16, 16, 16, 32], [1, 1, 1, 4], [1, 1, 1, 1]) var field = lat.newTensorField([3, 3]): Complex64 let stencil = field.newLatticeStencil(nearestNeighborStencil[4]()) # Use stencil with views in each loop var local = field.newLocalTensorField() field.updateAllGhosts() block: var view = local.newTensorFieldView(iokRead) for site in 0..<stencil.nSites: for dir in 0..<4: let fwd = view[stencil.fwd(site, dir)] let bwd = view[stencil.bwd(site, dir)]
proc newTensorField[D: static[int]; R: static[int]; L: Lattice[D]](lattice: L; shape: array[R, int]; T: typedesc): TensorField[D, R, L, T]: Create a new TensorField

Parameters:

lattice: The lattice on which the tensor field is defined

shape: The shape of the tensor field

Returns: A new TensorField instance

Example:

let lattice = newSimpleCubicLattice([16, 16, 16, 16]) let TensorField = lattice.newTensorField([3, 3]): float64
proc paddedGrid[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]): array[D, int]: Get padded grid dimensions (including ghosts on both sides)
proc releaseLocal[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]): Release local data access obtained via accessLocal or accessGhosts
proc updateAllGhosts[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]; updateCorners: bool = false): Update ghost regions in all lattice dimensions

Skips dimensions with ghost width 0 (including inner tensor/complex dimensions). Uses GA_Update_ghost_dir for each lattice dimension.

Example:

var field = lat.newTensorField([3, 3]): Complex64 # ... modify local data ... field.updateAllGhosts() # Update all ghost regions
proc updateGhosts[D: static[int]; R: static[int]; L: Lattice[D]; T]( tensor: TensorField[D, R, L, T]; dim: int; direction: int = 0; updateCorners: bool = false): Update ghost regions for a tensor field in specified dimension

This synchronizes ghost (halo) regions between MPI ranks. Must be called before reading from ghost regions after local updates.

Parameters: tensor: The tensor field to update dim: Dimension to update (0..D-1) direction: +1 for forward, -1 for backward, 0 for both (default) updateCorners: Whether to update corner ghost cells

Example:

var field = lat.newTensorField([3, 3]): Complex64 # ... modify local data ... field.updateGhosts(0) # Update ghosts in x direction field.updateGhosts(1) # Update ghosts in y direction # Now ghost regions contain correct neighbor data