SIMD Lattice Layout

This module provides infrastructure for SIMD-vectorized AoSoA (Array of Structures of Arrays) memory layouts for lattice field theory computations.

The key concept is splitting the lattice into:

innerGeom: SIMD lane grid - sites processed together in one SIMD vector
outerGeom: Remaining sites - outer loop iterates over these

For example, on a 4D lattice [8, 8, 8, 16] with simdGrid [1, 2, 2, 2]:

innerGeom = [1, 2, 2, 2] → nSitesInner = 8 SIMD lanes
outerGeom = [8, 4, 4, 8] → nSitesOuter = 1024 outer iterations
Total sites = 8 * 1024 = 8192
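The arithmetic in this example can be reproduced with a short Nim sketch. `splitGeom` is a hypothetical helper written for illustration only; it is not part of this module's API, and it assumes the geometry and grid are passed as plain integer arrays.

```nim
# Hypothetical helper (not the module's API): split a local geometry
# into an outer geometry and lane/group counts for a given SIMD grid.
proc splitGeom(localGeom, simdGrid: openArray[int]):
    tuple[outer: seq[int], nInner, nOuter: int] =
  doAssert localGeom.len == simdGrid.len
  result.outer = newSeq[int](localGeom.len)
  result.nInner = 1
  result.nOuter = 1
  for d in 0 ..< localGeom.len:
    # simdGrid must evenly divide the local extent in every dimension
    doAssert localGeom[d] mod simdGrid[d] == 0
    result.outer[d] = localGeom[d] div simdGrid[d]
    result.nInner *= simdGrid[d]
    result.nOuter *= result.outer[d]

let s = splitGeom([8, 8, 8, 16], [1, 2, 2, 2])
echo "outerGeom = ", s.outer
echo "nSitesInner = ", s.nInner, ", nSitesOuter = ", s.nOuter
```

With the values from the example, the sketch yields 8 lanes and 1024 outer iterations, for 8192 sites in total.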

This layout enables efficient SIMD processing where:

Outer loop iterates over vector groups (OpenMP parallelized)
Inner "loop" processes VectorWidth sites simultaneously via SIMD
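The two-level iteration described above might look roughly like the following. This is a serial sketch under stated assumptions: a scalar field with one value per site, stored group by group, and the lane and group counts taken from the example. The threading annotation and explicit SIMD types are deliberately left out.

```nim
# Assumed sizes, taken from the [8, 8, 8, 16] / [1, 2, 2, 2] example.
const nSitesInner = 8      # lanes per vector group
const nSitesOuter = 1024   # number of vector groups

# One float per site, stored as [outer][lane] (assumption for this sketch).
var field = newSeq[float](nSitesOuter * nSitesInner)
for i in 0 ..< field.len:
  field[i] = 1.0

for outer in 0 ..< nSitesOuter:        # outer loop: candidate for threading
  let base = outer * nSitesInner
  for lane in 0 ..< nSitesInner:       # inner "loop": contiguous, fixed trip count
    field[base + lane] *= 2.0
```

Because the inner loop touches nSitesInner contiguous values, a compiler or an explicit SIMD type can process one vector group per outer iteration.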

MPI Compatibility:

The simdGrid operates on the LOCAL lattice (after MPI partitioning)
localGrid = globalGrid / mpiGrid for each dimension
simdGrid must evenly divide localGrid in each dimension
Example: globalGrid = [16, 16, 16, 32], mpiGrid = [2, 2, 2, 2] → localGrid = [8, 8, 8, 16]
Valid simdGrid: [1, 2, 2, 2], [2, 2, 2, 1], [1, 1, 1, 8], etc.
Invalid simdGrid: [3, 1, 1, 1], since 3 does not divide 8
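The divisibility rules above can be checked mechanically. The following is a minimal sketch of such a check; `checkSimdGrid` is a hypothetical name, and the module's own validateSimdGrid may differ in its messages and in the cases it covers.

```nim
# Hypothetical check (not the module's validateSimdGrid): verify that
# mpiGrid divides globalGeom and simdGrid divides the resulting local grid.
proc checkSimdGrid(globalGeom, mpiGrid, simdGrid: openArray[int]):
    tuple[valid: bool, message: string] =
  if globalGeom.len != mpiGrid.len or globalGeom.len != simdGrid.len:
    return (false, "dimension count mismatch")
  for d in 0 ..< globalGeom.len:
    if mpiGrid[d] <= 0 or simdGrid[d] <= 0:
      return (false, "non-positive grid entry in dimension " & $d)
    if globalGeom[d] mod mpiGrid[d] != 0:
      return (false, "mpiGrid does not divide globalGeom in dimension " & $d)
    let localSize = globalGeom[d] div mpiGrid[d]
    if localSize mod simdGrid[d] != 0:
      return (false, "simdGrid entry " & $simdGrid[d] &
              " does not divide local size " & $localSize &
              " in dimension " & $d)
  result = (true, "ok")

let good = checkSimdGrid([16, 16, 16, 32], [2, 2, 2, 2], [1, 2, 2, 2])
let bad = checkSimdGrid([16, 16, 16, 32], [2, 2, 2, 2], [3, 1, 1, 1])
echo good.valid, " ", bad.valid, " ", bad.message
```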

Reference: QEX (https://github.com/jcosborn/qex) layout implementation

Types

SimdLatticeLayout = object
  nDim*: int                 ## Number of lattice dimensions
  localGeom*: seq[int]       ## Local lattice dimensions
  simdGrid*: seq[int]        ## SIMD lane grid per dimension
  innerGeom*: seq[int]       ## Inner (SIMD) geometry = simdGrid
  outerGeom*: seq[int]       ## Outer geometry = localGeom / simdGrid
  nSitesInner*: int          ## Total SIMD lanes = product(innerGeom)
  nSitesOuter*: int          ## Outer iterations = product(outerGeom)
  nSites*: int               ## Total sites = nSitesInner * nSitesOuter
  innerStrides*: seq[int]    ## Strides for inner (lane) indexing
  outerStrides*: seq[int]    ## Strides for outer (group) indexing
  localStrides*: seq[int]    ## Strides for full local indexing

SIMD-aware lattice layout for vectorized operations

Stores the decomposition of a lattice into inner (SIMD) and outer (iteration) components for efficient vectorized memory access.

Procs

proc `$`(layout: SimdLatticeLayout): string {....raises: [], tags: [], forbids: [].}: String representation of SIMD layout
proc aosoaIndex(outerIdx, innerIdx, elemIdx, elemsPerSite, nSitesInner: int): int {. inline, ...raises: [], tags: [], forbids: [].}: Compute AoSoA memory index for vectorized layout

AoSoA layout: [outerIdx][elemIdx][innerIdx]

outerIdx: vector group index (0 to nSitesOuter-1)

innerIdx: SIMD lane (0 to nSitesInner-1)

elemIdx: element within tensor (0 to elemsPerSite-1)

Memory: group0: e0: lane0, lane1, ...; e1: lane0, lane1, ...; then group1, ...
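Reading the memory order above as group, then element, then lane, the index computation could be sketched as follows. The formula is inferred from that description rather than taken from the module's source, and `aosoaIndexSketch` is a hypothetical stand-in for aosoaIndex.

```nim
# Assumed ordering: [outerIdx][elemIdx][innerIdx], lanes contiguous in memory.
proc aosoaIndexSketch(outerIdx, innerIdx, elemIdx,
                      elemsPerSite, nSitesInner: int): int {.inline.} =
  (outerIdx * elemsPerSite + elemIdx) * nSitesInner + innerIdx

# Illustrative sizes: 18 elements per site, 8 lanes.
echo aosoaIndexSketch(0, 1, 0, 18, 8)   # next lane, same element
echo aosoaIndexSketch(0, 0, 1, 18, 8)   # next element, lane 0
echo aosoaIndexSketch(1, 0, 0, 18, 8)   # first slot of the next group
```

Under this ordering, the nSitesInner lanes of one element sit next to each other, so a single vector load fetches that element for every site in the group.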
proc aosoaIndexFromLocal(localIdx, elemIdx, elemsPerSite: int; layout: SimdLatticeLayout): int {.inline, ...raises: [], tags: [], forbids: [].}: Compute AoSoA memory index from local site index
proc computeLocalGeom(globalGeom, mpiGrid: openArray[int]): seq[int] {. ...raises: [], tags: [], forbids: [].}: Compute local geometry from global geometry and MPI grid

Parameters:
globalGeom: Global lattice dimensions
mpiGrid: MPI rank grid

Returns: Local geometry for each MPI rank: localGeom[d] = globalGeom[d] / mpiGrid[d]
proc computeProduct(geom: seq[int]): int {....raises: [], tags: [], forbids: [].}: Compute product of all dimensions
proc computeStrides(geom: seq[int]): seq[int] {....raises: [], tags: [], forbids: [].}: Compute lexicographic strides for a geometry: strides[d] = product of the extents of dimensions 0..<d, so strides[0] = 1
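As an illustration of the stride definition (each stride is the product of all lower-dimension extents), here is a minimal sketch. `stridesSketch` is a hypothetical name and is not claimed to match the body of computeStrides.

```nim
# Lexicographic strides with dimension 0 varying fastest:
# strides[d] = geom[0] * geom[1] * ... * geom[d-1], and strides[0] = 1.
proc stridesSketch(geom: openArray[int]): seq[int] =
  result = newSeq[int](geom.len)
  var acc = 1
  for d in 0 ..< geom.len:
    result[d] = acc
    acc *= geom[d]

let strides = stridesSketch([8, 8, 8, 16])
echo strides
```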
proc coordsToLexicographic(coords: seq[int]; strides: seq[int]): int {. ...raises: [], tags: [], forbids: [].}: Convert coordinates to lexicographic index
proc generateCoordTable(layout: SimdLatticeLayout): seq[seq[int]] {....raises: [], tags: [], forbids: [].}: Generate coordinate lookup table: coordTable[outerIdx][lane] = localSiteIdx

Pre-computes the mapping from (outerIdx, lane) to local site index for efficient vectorized iteration.
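One way such a table could be filled is to walk every local site and split each coordinate into its outer and inner parts. The sketch below assumes lexicographic ordering with dimension 0 fastest for local, outer, and lane indices, and a block decomposition in which the outer coordinate is `coord div simdGrid[d]` and the inner coordinate is `coord mod simdGrid[d]`. `coordTableSketch` is a hypothetical helper that takes raw geometries instead of a SimdLatticeLayout.

```nim
# Hypothetical table builder: table[outerIdx][lane] = local site index.
proc coordTableSketch(localGeom, simdGrid: openArray[int]): seq[seq[int]] =
  var nInner = 1
  var nOuter = 1
  for d in 0 ..< localGeom.len:
    nInner *= simdGrid[d]
    nOuter *= localGeom[d] div simdGrid[d]
  result = newSeq[seq[int]](nOuter)
  for o in 0 ..< nOuter:
    result[o] = newSeq[int](nInner)
  for localIdx in 0 ..< nInner * nOuter:
    var rest = localIdx
    var outerIdx = 0
    var innerIdx = 0
    var outerStride = 1
    var innerStride = 1
    for d in 0 ..< localGeom.len:
      let c = rest mod localGeom[d]          # local coordinate in dimension d
      rest = rest div localGeom[d]
      outerIdx += (c div simdGrid[d]) * outerStride
      innerIdx += (c mod simdGrid[d]) * innerStride
      outerStride *= localGeom[d] div simdGrid[d]
      innerStride *= simdGrid[d]
    result[outerIdx][innerIdx] = localIdx

# Small 2D case: 4x4 local lattice, 2x2 lane grid, so 4 groups of 4 lanes.
let table = coordTableSketch([4, 4], [2, 2])
echo table
```

Building the table once moves the divisions and remainders out of the hot loop; the vectorized kernel then only performs table lookups.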
proc lexicographicToCoords(idx: int; strides: seq[int]; geom: seq[int]): seq[int] {. ...raises: [], tags: [], forbids: [].}: Convert lexicographic index to coordinates
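The two conversions are inverses of each other. A minimal round-trip sketch follows, assuming dimension 0 varies fastest and that the strides are the running products of the geometry. `toLex` and `toCoords` are hypothetical names used only here.

```nim
# coords -> index: weighted sum of coordinates by their strides.
proc toLex(coords, strides: openArray[int]): int =
  for d in 0 ..< coords.len:
    result += coords[d] * strides[d]

# index -> coords: divide out the stride, then wrap by the extent.
proc toCoords(idx: int; strides, geom: openArray[int]): seq[int] =
  result = newSeq[int](geom.len)
  for d in 0 ..< geom.len:
    result[d] = (idx div strides[d]) mod geom[d]

let geom = [8, 8, 8, 16]
let strides = [1, 8, 64, 512]      # running products of geom
let idx = toLex([3, 5, 7, 11], strides)
echo idx, " -> ", toCoords(idx, strides, geom)
```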
proc localToOuterInner(localIdx: int; layout: SimdLatticeLayout): tuple[ outer, inner: int] {.inline, ...raises: [], tags: [], forbids: [].}: Convert local site index to (outerIdx, innerIdx) pair

Inverse of outerInnerToLocal.
proc newSimdLatticeLayout(localGeom: openArray[int]; simdGrid: openArray[int]): SimdLatticeLayout {. ...raises: [], tags: [], forbids: [].}: Create a new SIMD lattice layout

Parameters:
localGeom: Local lattice dimensions (e.g., [8, 8, 8, 16])
simdGrid: SIMD lane grid per dimension (e.g., [1, 2, 2, 2] for 8 lanes)

The simdGrid must evenly divide localGeom in each dimension. Total SIMD lanes = product of simdGrid elements.

Example:
  let layout = newSimdLatticeLayout([8, 8, 8, 16], [1, 2, 2, 2])
  # nSitesInner = 8 (SIMD width)
  # nSitesOuter = 1024 (outer loop iterations)
proc newSimdLatticeLayout(localGeom: openArray[int]; simdWidth: int): SimdLatticeLayout {. ...raises: [], tags: [], forbids: [].}: Create SIMD layout with automatic lane distribution

Automatically distributes simdWidth lanes across dimensions, prioritizing faster-varying (lower) dimensions.

Example:
  let layout = newSimdLatticeLayout([8, 8, 8, 16], 8)
  # Might produce simdGrid = [2, 2, 2, 1] or [1, 1, 2, 4] depending on divisibility
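The documentation does not spell out the distribution rule, so the following shows only one plausible strategy, not the module's algorithm. It hands out factors of two round-robin, starting from dimension 0, wherever the local extent remains divisible. `distributeLanesSketch` is a hypothetical name, and returning an empty sequence on failure is an assumption of this sketch.

```nim
# One possible strategy (assumption): place factors of 2, lowest dimension first.
proc distributeLanesSketch(localGeom: openArray[int]; simdWidth: int): seq[int] =
  result = newSeq[int](localGeom.len)
  for d in 0 ..< result.len:
    result[d] = 1
  var remaining = simdWidth
  while remaining > 1:
    var placed = false
    for d in 0 ..< localGeom.len:
      # Double the lane count in dimension d only if it still divides the extent.
      if remaining mod 2 == 0 and localGeom[d] mod (result[d] * 2) == 0:
        result[d] *= 2
        remaining = remaining div 2
        placed = true
        if remaining == 1:
          break
    if not placed:
      return @[]    # width cannot be placed under this strategy

let grid = distributeLanesSketch([8, 8, 8, 16], 8)
echo grid
```

For the [8, 8, 8, 16] lattice this strategy yields the first of the two grids mentioned in the example; a different priority rule would give a different, equally valid grid.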
proc outerInnerToLocal(outerIdx, innerIdx: int; layout: SimdLatticeLayout): int {. inline, ...raises: [], tags: [], forbids: [].}: Convert (outerIdx, innerIdx) pair to local site index

Given an outer index (vector group) and inner index (SIMD lane), returns the corresponding local site index.

Local coordinates: localCoord[d] = outerCoord[d] * innerGeom[d] + innerCoord[d]
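The coordinate formula above can be turned into an index computation as sketched below. The sketch assumes lexicographic ordering with dimension 0 fastest for all three index spaces and takes raw geometries rather than a SimdLatticeLayout. `outerInnerToLocalSketch` is a hypothetical name.

```nim
# Decode outerIdx and innerIdx into per-dimension coordinates, combine them
# with localCoord[d] = outerCoord[d] * innerGeom[d] + innerCoord[d],
# then re-encode the result as a local lexicographic index.
proc outerInnerToLocalSketch(outerIdx, innerIdx: int;
                             localGeom, simdGrid: openArray[int]): int =
  var outerStride = 1
  var innerStride = 1
  var localStride = 1
  for d in 0 ..< localGeom.len:
    let outerExtent = localGeom[d] div simdGrid[d]
    let outerCoord = (outerIdx div outerStride) mod outerExtent
    let innerCoord = (innerIdx div innerStride) mod simdGrid[d]
    let localCoord = outerCoord * simdGrid[d] + innerCoord
    result += localCoord * localStride
    outerStride *= outerExtent
    innerStride *= simdGrid[d]
    localStride *= localGeom[d]

# 4x4 local lattice with a 2x2 lane grid.
echo outerInnerToLocalSketch(1, 0, [4, 4], [2, 2])
echo outerInnerToLocalSketch(3, 3, [4, 4], [2, 2])
```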
proc simdLanes(layout: SimdLatticeLayout): int {.inline, ...raises: [], tags: [], forbids: [].}: Return the number of SIMD lanes (sites per vector group)
proc validateSimdGrid(globalGeom, mpiGrid, simdGrid: openArray[int]): tuple[ valid: bool, message: string] {....raises: [], tags: [], forbids: [].}: Validate that simdGrid is compatible with the MPI partitioned lattice

Parameters:
globalGeom: Global lattice dimensions (e.g., [16, 16, 16, 32])
mpiGrid: MPI rank grid (e.g., [2, 2, 2, 2])
simdGrid: Proposed SIMD lane grid (e.g., [1, 2, 2, 2])

Returns: (valid, message), where valid is true if the grids are compatible; otherwise valid is false and message describes the error

Example:
  let (ok, msg) = validateSimdGrid([16, 16, 16, 32], [2, 2, 2, 2], [1, 2, 2, 2])
  if not ok: echo "Error: ", msg
proc vectorGroups(layout: SimdLatticeLayout): int {.inline, ...raises: [], tags: [], forbids: [].}: Return the number of vector groups (outer loop iterations)

simd/simdlayout
