SIMD Lattice Layout
This module provides infrastructure for SIMD-vectorized AoSoA (Array of Structures of Arrays) memory layouts for lattice field theory computations.
The key concept is splitting the lattice into:
- innerGeom: SIMD lane grid - sites processed together in one SIMD vector
- outerGeom: Remaining sites - outer loop iterates over these
For example, on a 4D lattice [8, 8, 8, 16] with simdGrid [1, 2, 2, 2]:
- innerGeom = [1, 2, 2, 2] → nSitesInner = 8 SIMD lanes
- outerGeom = [8, 4, 4, 8] → nSitesOuter = 1024 outer iterations
- Total sites = 8 * 1024 = 8192
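As a quick check of this arithmetic, the Nim sketch below recomputes innerGeom, outerGeom and their products for the example. `splitGeom` and `product` are hypothetical helpers written for illustration only; they are not procs exported by this module.

```nim
# Hypothetical helper: innerGeom = simdGrid, outerGeom = localGeom / simdGrid.
proc splitGeom(localGeom, simdGrid: openArray[int]): tuple[inner, outer: seq[int]] =
  doAssert localGeom.len == simdGrid.len
  for d in 0 ..< localGeom.len:
    doAssert localGeom[d] mod simdGrid[d] == 0, "simdGrid must divide localGeom"
    result.inner.add(simdGrid[d])                  # innerGeom[d]
    result.outer.add(localGeom[d] div simdGrid[d]) # outerGeom[d]

# Hypothetical helper: product of all entries.
proc product(xs: openArray[int]): int =
  result = 1
  for x in xs: result *= x

let (inner, outer) = splitGeom([8, 8, 8, 16], [1, 2, 2, 2])
echo outer                                # @[8, 4, 4, 8]
echo product(inner), " ", product(outer)  # 8 1024
```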
This layout enables efficient SIMD processing where:
- Outer loop iterates over vector groups (OpenMP parallelized)
- Inner "loop" processes VectorWidth (= nSitesInner) sites simultaneously via SIMD
MPI Compatibility:
- The simdGrid operates on the LOCAL lattice (after MPI partitioning)
- localGrid = globalGrid / mpiGrid for each dimension
- simdGrid must evenly divide localGrid in each dimension
- Example: globalGrid = [16, 16, 16, 32], mpiGrid = [2, 2, 2, 2] → localGrid = [8, 8, 8, 16]. Valid simdGrid choices include [1, 2, 2, 2], [2, 2, 2, 1], [1, 1, 1, 8] and [4, 4, 1, 1]. An invalid choice is [3, 1, 1, 1], since 3 does not divide 8.
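A minimal sketch of the divisibility rules above, assuming the only requirements are that mpiGrid divides globalGrid and that simdGrid divides the resulting localGrid in every dimension. `checkSimdGrid` is a hypothetical stand-in that mirrors the (valid, message) shape of validateSimdGrid; the module's own checks and messages may differ.

```nim
import std/strformat

# Hypothetical stand-in for validateSimdGrid: checks only the two
# divisibility conditions described in the MPI compatibility notes.
proc checkSimdGrid(globalGeom, mpiGrid, simdGrid: openArray[int]): tuple[valid: bool, message: string] =
  if globalGeom.len != mpiGrid.len or globalGeom.len != simdGrid.len:
    return (false, "dimension count mismatch")
  for d in 0 ..< globalGeom.len:
    # mpiGrid must divide globalGrid (guard against zero before mod)
    if mpiGrid[d] <= 0 or globalGeom[d] mod mpiGrid[d] != 0:
      return (false, &"mpiGrid[{d}]={mpiGrid[d]} does not divide globalGrid[{d}]={globalGeom[d]}")
    let localExtent = globalGeom[d] div mpiGrid[d]  # localGrid[d]
    # simdGrid must divide the local (per-rank) extent
    if simdGrid[d] <= 0 or localExtent mod simdGrid[d] != 0:
      return (false, &"simdGrid[{d}]={simdGrid[d]} does not divide localGrid[{d}]={localExtent}")
  result = (true, "ok")

let (ok, msg) = checkSimdGrid([16, 16, 16, 32], [2, 2, 2, 2], [3, 1, 1, 1])
if not ok: echo "Error: ", msg  # Error: simdGrid[0]=3 does not divide localGrid[0]=8
```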
Reference: QEX (https://github.com/jcosborn/qex) layout implementation
Types
SimdLatticeLayout = object
  nDim*: int               ## Number of lattice dimensions
  localGeom*: seq[int]     ## Local lattice dimensions
  simdGrid*: seq[int]      ## SIMD lane grid per dimension
  innerGeom*: seq[int]     ## Inner (SIMD) geometry = simdGrid
  outerGeom*: seq[int]     ## Outer geometry = localGeom / simdGrid
  nSitesInner*: int        ## Total SIMD lanes = product(innerGeom)
  nSitesOuter*: int        ## Outer iterations = product(outerGeom)
  nSites*: int             ## Total sites = nSitesInner * nSitesOuter
  innerStrides*: seq[int]  ## Strides for inner (lane) indexing
  outerStrides*: seq[int]  ## Strides for outer (group) indexing
  localStrides*: seq[int]  ## Strides for full local indexing
- SIMD-aware lattice layout for vectorized operations
Stores the decomposition of a lattice into inner (SIMD) and outer (iteration) components for efficient vectorized memory access.
Procs
proc `$`(layout: SimdLatticeLayout): string {.raises: [], tags: [], forbids: [].}
- String representation of SIMD layout
proc aosoaIndex(outerIdx, innerIdx, elemIdx, elemsPerSite, nSitesInner: int): int {.inline, raises: [], tags: [], forbids: [].}
- Compute AoSoA memory index for vectorized layout
AoSoA layout: [outerIdx][elemIdx][innerIdx]
- outerIdx: vector group index (0 to nSitesOuter-1)
- innerIdx: SIMD lane (0 to nSitesInner-1)
- elemIdx: element within tensor (0 to elemsPerSite-1)
Memory: group0[e0: [lane0, lane1, ...], e1: [lane0, lane1, ...], ...], group1[...], ...
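The memory order above corresponds to the offset (outerIdx * elemsPerSite + elemIdx) * nSitesInner + innerIdx. The sketch below assumes that formula (it is inferred from the described order, not copied from the module) and prints the offsets of the first vector group; `aosoaIndexSketch` is a hypothetical name.

```nim
# Hypothetical sketch: offset of (outerIdx, elemIdx, innerIdx) in a flat
# [outerIdx][elemIdx][innerIdx] buffer, with lanes contiguous per element.
proc aosoaIndexSketch(outerIdx, innerIdx, elemIdx, elemsPerSite, nSitesInner: int): int {.inline.} =
  (outerIdx * elemsPerSite + elemIdx) * nSitesInner + innerIdx

# Offsets for vector group 0 with 3 elements per site and 4 lanes.
for e in 0 ..< 3:
  var row: seq[int]
  for lane in 0 ..< 4:
    row.add(aosoaIndexSketch(0, lane, e, 3, 4))
  echo "e", e, ": ", row
# e0: @[0, 1, 2, 3]
# e1: @[4, 5, 6, 7]
# e2: @[8, 9, 10, 11]
```

Because the lanes of each element are contiguous, nSitesInner consecutive values form one SIMD vector for that element.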
proc aosoaIndexFromLocal(localIdx, elemIdx, elemsPerSite: int; layout: SimdLatticeLayout): int {.inline, raises: [], tags: [], forbids: [].}
- Compute AoSoA memory index from local site index
proc computeLocalGeom(globalGeom, mpiGrid: openArray[int]): seq[int] {.raises: [], tags: [], forbids: [].}
- Compute local geometry from global geometry and MPI grid
Parameters:
  globalGeom: Global lattice dimensions
  mpiGrid: MPI rank grid
Returns: Local geometry for each MPI rank: localGeom[d] = globalGeom[d] / mpiGrid[d]
proc computeProduct(geom: seq[int]): int {.raises: [], tags: [], forbids: [].}
- Compute product of all dimensions
proc computeStrides(geom: seq[int]): seq[int] {.raises: [], tags: [], forbids: [].}
- Compute lexicographic strides for a geometry: strides[d] = product of geom[0..<d], so strides[0] = 1 and dimension 0 varies fastest
proc coordsToLexicographic(coords: seq[int]; strides: seq[int]): int {.raises: [], tags: [], forbids: [].}
- Convert coordinates to lexicographic index
proc generateCoordTable(layout: SimdLatticeLayout): seq[seq[int]] {.raises: [], tags: [], forbids: [].}
- Generate coordinate lookup table: coordTable[outerIdx][lane] = localSiteIdx
Pre-computes the mapping from (outerIdx, lane) to local site index for efficient vectorized iteration.
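A sketch of how such a table could be built, assuming outer, inner and local indices are all lexicographic with dimension 0 fastest, and that local coordinates follow localCoord[d] = outerCoord[d] * innerGeom[d] + innerCoord[d] (the relation documented for outerInnerToLocal). `buildCoordTable` is hypothetical and takes localGeom and simdGrid directly rather than a SimdLatticeLayout.

```nim
# Hypothetical sketch of the (outerIdx, lane) -> local site lookup table.
proc buildCoordTable(localGeom, simdGrid: seq[int]): seq[seq[int]] =
  let nDim = localGeom.len
  var outerGeom = newSeq[int](nDim)
  var nOuter = 1
  var nInner = 1
  for d in 0 ..< nDim:
    outerGeom[d] = localGeom[d] div simdGrid[d]
    nOuter *= outerGeom[d]
    nInner *= simdGrid[d]
  result = newSeq[seq[int]](nOuter)
  for o in 0 ..< nOuter:
    result[o] = newSeq[int](nInner)
    for lane in 0 ..< nInner:
      var oRem = o        # remaining outer index to decode
      var iRem = lane     # remaining lane index to decode
      var localIdx = 0
      var stride = 1      # lexicographic stride of dimension d in localGeom
      for d in 0 ..< nDim:
        let oc = oRem mod outerGeom[d]   # outerCoord[d]
        let ic = iRem mod simdGrid[d]    # innerCoord[d]
        oRem = oRem div outerGeom[d]
        iRem = iRem div simdGrid[d]
        localIdx += (oc * simdGrid[d] + ic) * stride
        stride *= localGeom[d]
      result[o][lane] = localIdx

echo buildCoordTable(@[4, 2], @[2, 1])  # @[@[0, 1], @[2, 3], @[4, 5], @[6, 7]]
```

Every local site appears exactly once in the table, so a kernel can look up a lane's site directly instead of recomputing coordinates.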
proc lexicographicToCoords(idx: int; strides: seq[int]; geom: seq[int]): seq[int] {.raises: [], tags: [], forbids: [].}
- Convert lexicographic index to coordinates
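The three lexicographic helpers fit together as follows. This is a sketch assuming the stride convention stated for computeStrides (dimension 0 fastest); `stridesSketch`, `toLex` and `fromLex` are hypothetical names, and the div/mod decoding in `fromLex` is one straightforward way to invert the encoding, not necessarily the module's.

```nim
# strides[0] = 1, strides[d] = geom[0] * ... * geom[d-1]
proc stridesSketch(geom: seq[int]): seq[int] =
  result = newSeq[int](geom.len)
  var s = 1
  for d in 0 ..< geom.len:
    result[d] = s
    s *= geom[d]

# Coordinates -> index: sum of coords[d] * strides[d]
proc toLex(coords, strides: seq[int]): int =
  for d in 0 ..< coords.len:
    result += coords[d] * strides[d]

# Index -> coordinates: coords[d] = (idx div strides[d]) mod geom[d]
proc fromLex(idx: int; strides, geom: seq[int]): seq[int] =
  result = newSeq[int](geom.len)
  for d in 0 ..< geom.len:
    result[d] = (idx div strides[d]) mod geom[d]

let geom = @[8, 8, 8, 16]
let strides = stridesSketch(geom)
echo strides                        # @[1, 8, 64, 512]
echo toLex(@[1, 2, 3, 4], strides)  # 2257 = 1 + 16 + 192 + 2048
echo fromLex(2257, strides, geom)   # @[1, 2, 3, 4]
```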
proc localToOuterInner(localIdx: int; layout: SimdLatticeLayout): tuple[outer, inner: int] {.inline, raises: [], tags: [], forbids: [].}
- Convert local site index to (outerIdx, innerIdx) pair
Inverse of outerInnerToLocal.
proc newSimdLatticeLayout(localGeom: openArray[int]; simdGrid: openArray[int]): SimdLatticeLayout {.raises: [], tags: [], forbids: [].}
- Create a new SIMD lattice layout
Parameters:
  localGeom: Local lattice dimensions (e.g., [8, 8, 8, 16])
  simdGrid: SIMD lane grid per dimension (e.g., [1, 2, 2, 2] for 8 lanes)
The simdGrid must evenly divide localGeom in each dimension. Total SIMD lanes = product of simdGrid elements.
Example:
  let layout = newSimdLatticeLayout([8, 8, 8, 16], [1, 2, 2, 2])
  # nSitesInner = 8 (SIMD width)
  # nSitesOuter = 1024 (outer loop iterations)
proc newSimdLatticeLayout(localGeom: openArray[int]; simdWidth: int): SimdLatticeLayout {.raises: [], tags: [], forbids: [].}
- Create SIMD layout with automatic lane distribution
Automatically distributes simdWidth lanes across dimensions, prioritizing faster-varying (lower) dimensions.
Example:
  let layout = newSimdLatticeLayout([8, 8, 8, 16], 8)
  # Might produce simdGrid = [2, 2, 2, 1] or [1, 1, 2, 4] depending on divisibility
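The exact distribution heuristic is not specified here. One plausible reading of "prioritizing faster-varying (lower) dimensions" is to hand out factors of 2 round-robin, starting at dimension 0, as in this hypothetical `distributeLanes` sketch. It assumes simdWidth is a power of two and is not taken from the module's source.

```nim
# Hypothetical heuristic: double the lane count of dimensions 0, 1, 2, ...
# in turn, only where the doubled count still divides localGeom[d].
# Assumes simdWidth is a power of two.
proc distributeLanes(localGeom: seq[int]; simdWidth: int): seq[int] =
  result = newSeq[int](localGeom.len)
  for d in 0 ..< result.len: result[d] = 1
  var remaining = simdWidth   # lanes still to be placed
  var progress = true
  while remaining > 1 and progress:
    progress = false
    for d in 0 ..< localGeom.len:
      if remaining > 1 and remaining mod 2 == 0 and localGeom[d] mod (result[d] * 2) == 0:
        result[d] *= 2
        remaining = remaining div 2
        progress = true
  doAssert remaining == 1, "could not place all SIMD lanes"

echo distributeLanes(@[8, 8, 8, 16], 8)  # @[2, 2, 2, 1]
```

With this heuristic, [8, 8, 8, 16] and 8 lanes give [2, 2, 2, 1], one of the outcomes mentioned above.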
proc outerInnerToLocal(outerIdx, innerIdx: int; layout: SimdLatticeLayout): int {.inline, raises: [], tags: [], forbids: [].}
- Convert (outerIdx, innerIdx) pair to local site index
Given an outer index (vector group) and inner index (SIMD lane), returns the corresponding local site index.
Local coordinates: localCoord[d] = outerCoord[d] * innerGeom[d] + innerCoord[d]
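The per-dimension relation above and its inverse (div and mod by innerGeom[d]) can be written directly on coordinate vectors. `toLocalCoords` and `splitLocalCoords` are hypothetical helpers; flattening coordinates to site indices, which outerInnerToLocal and localToOuterInner also handle, is omitted.

```nim
# localCoord[d] = outerCoord[d] * innerGeom[d] + innerCoord[d]
proc toLocalCoords(outerCoord, innerCoord, innerGeom: seq[int]): seq[int] =
  for d in 0 ..< innerGeom.len:
    result.add(outerCoord[d] * innerGeom[d] + innerCoord[d])

# Inverse: outerCoord[d] = localCoord[d] div innerGeom[d],
#          innerCoord[d] = localCoord[d] mod innerGeom[d]
proc splitLocalCoords(localCoord, innerGeom: seq[int]): tuple[outer, inner: seq[int]] =
  for d in 0 ..< innerGeom.len:
    result.outer.add(localCoord[d] div innerGeom[d])
    result.inner.add(localCoord[d] mod innerGeom[d])

let innerGeom = @[1, 2, 2, 2]
let localCoord = toLocalCoords(@[7, 3, 0, 5], @[0, 1, 1, 0], innerGeom)
echo localCoord                     # @[7, 7, 1, 10]
let (outerCoord, innerCoord) = splitLocalCoords(localCoord, innerGeom)
echo outerCoord, " ", innerCoord    # @[7, 3, 0, 5] @[0, 1, 1, 0]
```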
proc simdLanes(layout: SimdLatticeLayout): int {.inline, raises: [], tags: [], forbids: [].}
- Return the number of SIMD lanes (sites per vector group)
proc validateSimdGrid(globalGeom, mpiGrid, simdGrid: openArray[int]): tuple[valid: bool, message: string] {.raises: [], tags: [], forbids: [].}
- Validate that simdGrid is compatible with the MPI-partitioned lattice
Parameters:
  globalGeom: Global lattice dimensions (e.g., [16, 16, 16, 32])
  mpiGrid: MPI rank grid (e.g., [2, 2, 2, 2])
  simdGrid: Proposed SIMD lane grid (e.g., [1, 2, 2, 2])
Returns: (valid, message), where valid is true if the grids are compatible; otherwise valid is false and message describes the problem.
Example:
  let (ok, msg) = validateSimdGrid([16, 16, 16, 32], [2, 2, 2, 2], [1, 2, 2, 2])
  if not ok: echo "Error: ", msg
proc vectorGroups(layout: SimdLatticeLayout): int {.inline, raises: [], tags: [], forbids: [].}
- Return the number of vector groups (outer loop iterations)