simd/simdlayout

SIMD Lattice Layout

This module provides infrastructure for SIMD-vectorized AoSoA (Array of Structures of Arrays) memory layouts for lattice field theory computations.

The key concept is splitting the lattice into:

  • innerGeom: SIMD lane grid - sites processed together in one SIMD vector
  • outerGeom: Remaining sites - outer loop iterates over these

For example, on a 4D lattice [8, 8, 8, 16] with simdGrid [1, 2, 2, 2]:

  • innerGeom = [1, 2, 2, 2] → nSitesInner = 8 SIMD lanes
  • outerGeom = [8, 4, 4, 8] → nSitesOuter = 1024 outer iterations
  • Total sites = 8 * 1024 = 8192
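
The arithmetic above can be reproduced with a few lines of Nim. This is a minimal sketch, not the module's implementation: `GeomSplit` and `splitGeometry` are hypothetical names introduced here for illustration only.

```nim
# Sketch only: GeomSplit and splitGeometry are hypothetical, not part of this module.
type GeomSplit = tuple[outerGeom: seq[int], nSitesInner, nSitesOuter: int]

proc splitGeometry(localGeom, simdGrid: openArray[int]): GeomSplit =
  ## Divide each local dimension by its SIMD lane count and
  ## accumulate the inner and outer site counts.
  doAssert localGeom.len == simdGrid.len
  result.nSitesInner = 1
  result.nSitesOuter = 1
  for d in 0 ..< localGeom.len:
    doAssert localGeom[d] mod simdGrid[d] == 0, "simdGrid must divide localGeom"
    let outer = localGeom[d] div simdGrid[d]
    result.outerGeom.add outer
    result.nSitesInner *= simdGrid[d]
    result.nSitesOuter *= outer

let split = splitGeometry([8, 8, 8, 16], [1, 2, 2, 2])
echo split.outerGeom                        # @[8, 4, 4, 8]
echo split.nSitesInner                      # 8
echo split.nSitesOuter                      # 1024
echo split.nSitesInner * split.nSitesOuter  # 8192
```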

This layout enables efficient SIMD processing where:

  • Outer loop iterates over vector groups (OpenMP parallelized)
  • Inner "loop" processes VectorWidth sites simultaneously via SIMD

MPI Compatibility:

  • The simdGrid operates on the LOCAL lattice (after MPI partitioning)
  • localGrid = globalGrid / mpiGrid for each dimension
  • simdGrid must evenly divide localGrid in each dimension
  • Example: globalGrid = [16, 16, 16, 32], mpiGrid = [2, 2, 2, 2] → localGrid = [8, 8, 8, 16]
      Valid simdGrid: [1, 2, 2, 2], [2, 2, 2, 1], [1, 1, 1, 8], etc.
      Invalid simdGrid: [3, 1, 1, 1], since 3 does not divide 8
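
The divisibility rules in this list can be checked mechanically. The sketch below is an illustration under stated assumptions: `isValidSimdGrid` is a hypothetical helper, not the module's validateSimdGrid, and it returns only a bool rather than a (valid, message) tuple.

```nim
# Sketch only: isValidSimdGrid is a hypothetical helper, not part of this module.
proc isValidSimdGrid(globalGrid, mpiGrid, simdGrid: openArray[int]): bool =
  ## True when mpiGrid divides globalGrid and simdGrid divides the
  ## resulting local grid in every dimension.
  if globalGrid.len != mpiGrid.len or globalGrid.len != simdGrid.len:
    return false
  for d in 0 ..< globalGrid.len:
    if mpiGrid[d] <= 0 or simdGrid[d] <= 0:
      return false
    if globalGrid[d] mod mpiGrid[d] != 0:
      return false
    let localDim = globalGrid[d] div mpiGrid[d]   # local extent on one rank
    if localDim mod simdGrid[d] != 0:
      return false
  result = true

echo isValidSimdGrid([16, 16, 16, 32], [2, 2, 2, 2], [1, 2, 2, 2])  # true
echo isValidSimdGrid([16, 16, 16, 32], [2, 2, 2, 2], [3, 1, 1, 1])  # false
```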

Reference: QEX (https://github.com/jcosborn/qex) layout implementation

Types

SimdLatticeLayout = object
  nDim*: int                 ## Number of lattice dimensions
  localGeom*: seq[int]       ## Local lattice dimensions
  simdGrid*: seq[int]        ## SIMD lane grid per dimension
  innerGeom*: seq[int]       ## Inner (SIMD) geometry = simdGrid
  outerGeom*: seq[int]       ## Outer geometry = localGeom / simdGrid
  nSitesInner*: int          ## Total SIMD lanes = product(innerGeom)
  nSitesOuter*: int          ## Outer iterations = product(outerGeom)
  nSites*: int               ## Total sites = nSitesInner * nSitesOuter
  innerStrides*: seq[int]    ## Strides for inner (lane) indexing
  outerStrides*: seq[int]    ## Strides for outer (group) indexing
  localStrides*: seq[int]    ## Strides for full local indexing

SIMD-aware lattice layout for vectorized operations

Stores the decomposition of a lattice into inner (SIMD) and outer (iteration) components for efficient vectorized memory access.

Procs

proc `$`(layout: SimdLatticeLayout): string {....raises: [], tags: [], forbids: [].}
String representation of SIMD layout
proc aosoaIndex(outerIdx, innerIdx, elemIdx, elemsPerSite, nSitesInner: int): int {.
    inline, ...raises: [], tags: [], forbids: [].}

Compute AoSoA memory index for vectorized layout

AoSoA layout: [outerIdx][elemIdx][innerIdx]

  • outerIdx: vector group index (0 to nSitesOuter-1)
  • innerIdx: SIMD lane (0 to nSitesInner-1)
  • elemIdx: element within tensor (0 to elemsPerSite-1)

Memory: group0[e0: lane0, lane1, ..., e1: lane0, lane1, ...], group1[...]
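
The memory order described above implies a simple index formula. The sketch below is inferred from that description, not copied from the module source; `aosoaIndexSketch` is a hypothetical name, and the formula assumes lanes are contiguous within an element and elements are contiguous within a group.

```nim
# Sketch only: formula inferred from the documented memory order.
proc aosoaIndexSketch(outerIdx, innerIdx, elemIdx, elemsPerSite, nSitesInner: int): int =
  ## Lanes vary fastest, then elements, then vector groups.
  (outerIdx * elemsPerSite + elemIdx) * nSitesInner + innerIdx

# 3 elements per site, 8 SIMD lanes per group
echo aosoaIndexSketch(0, 0, 0, 3, 8)  # 0  (group 0, element 0, lane 0)
echo aosoaIndexSketch(0, 1, 0, 3, 8)  # 1  (next lane is adjacent in memory)
echo aosoaIndexSketch(0, 0, 1, 3, 8)  # 8  (next element starts after 8 lanes)
echo aosoaIndexSketch(1, 0, 0, 3, 8)  # 24 (next group starts after 3 * 8 values)
```

Keeping the lane index innermost means that one element across all lanes of a group occupies a contiguous run of nSitesInner values, which is what a single SIMD load or store needs.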

proc aosoaIndexFromLocal(localIdx, elemIdx, elemsPerSite: int;
                         layout: SimdLatticeLayout): int {.inline, ...raises: [],
    tags: [], forbids: [].}
Compute AoSoA memory index from local site index
proc computeLocalGeom(globalGeom, mpiGrid: openArray[int]): seq[int] {.
    ...raises: [], tags: [], forbids: [].}

Compute local geometry from global geometry and MPI grid

Parameters:

  • globalGeom: Global lattice dimensions
  • mpiGrid: MPI rank grid

Returns: Local geometry for each MPI rank: localGeom[d] = globalGeom[d] / mpiGrid[d]

proc computeProduct(geom: seq[int]): int {....raises: [], tags: [], forbids: [].}
Compute product of all dimensions
proc computeStrides(geom: seq[int]): seq[int] {....raises: [], tags: [],
    forbids: [].}
Compute lexicographic strides for a geometry: stride[d] = product of dimensions 0..<d
proc coordsToLexicographic(coords: seq[int]; strides: seq[int]): int {.
    ...raises: [], tags: [], forbids: [].}
Convert coordinates to lexicographic index
proc generateCoordTable(layout: SimdLatticeLayout): seq[seq[int]] {....raises: [],
    tags: [], forbids: [].}

Generate coordinate lookup table: coordTable[outerIdx][lane] = localSiteIdx

Pre-computes the mapping from (outerIdx, lane) to local site index for efficient vectorized iteration.

proc lexicographicToCoords(idx: int; strides: seq[int]; geom: seq[int]): seq[int] {.
    ...raises: [], tags: [], forbids: [].}
Convert lexicographic index to coordinates
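
The stride and index conversions documented above fit together as a round trip. The following is a minimal sketch assuming dimension 0 varies fastest, as the stride definition (product of dimensions 0..<d) implies; `stridesSketch`, `toLex`, and `toCoords` are hypothetical stand-ins for computeStrides, coordsToLexicographic, and lexicographicToCoords.

```nim
# Sketch only: hypothetical stand-ins for the module's stride and index procs.
proc stridesSketch(geom: openArray[int]): seq[int] =
  ## stride[d] = product of geom[0 ..< d]; dimension 0 varies fastest.
  result = newSeq[int](geom.len)
  var acc = 1
  for d in 0 ..< geom.len:
    result[d] = acc
    acc *= geom[d]

proc toLex(coords, strides: openArray[int]): int =
  ## Sum of coordinate times stride over all dimensions.
  for d in 0 ..< coords.len:
    result += coords[d] * strides[d]

proc toCoords(idx: int; strides, geom: openArray[int]): seq[int] =
  ## Recover each coordinate by dividing out its stride.
  result = newSeq[int](geom.len)
  for d in 0 ..< geom.len:
    result[d] = (idx div strides[d]) mod geom[d]

let geom = [8, 8, 8, 16]
let strides = stridesSketch(geom)
echo strides                        # @[1, 8, 64, 512]
let idx = toLex([3, 2, 1, 5], strides)
echo idx                            # 2643
echo toCoords(idx, strides, geom)   # @[3, 2, 1, 5]
```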
proc localToOuterInner(localIdx: int; layout: SimdLatticeLayout): tuple[
    outer, inner: int] {.inline, ...raises: [], tags: [], forbids: [].}

Convert local site index to (outerIdx, innerIdx) pair

Inverse of outerInnerToLocal.

proc newSimdLatticeLayout(localGeom: openArray[int]; simdGrid: openArray[int]): SimdLatticeLayout {.
    ...raises: [], tags: [], forbids: [].}

Create a new SIMD lattice layout

Parameters:

  • localGeom: Local lattice dimensions (e.g., [8, 8, 8, 16])
  • simdGrid: SIMD lane grid per dimension (e.g., [1, 2, 2, 2] for 8 lanes)

The simdGrid must evenly divide localGeom in each dimension. Total SIMD lanes = product of simdGrid elements.

Example:

  let layout = newSimdLatticeLayout([8, 8, 8, 16], [1, 2, 2, 2])
  # nSitesInner = 8 (SIMD width)
  # nSitesOuter = 1024 (outer loop iterations)

proc newSimdLatticeLayout(localGeom: openArray[int]; simdWidth: int): SimdLatticeLayout {.
    ...raises: [], tags: [], forbids: [].}

Create SIMD layout with automatic lane distribution

Automatically distributes simdWidth lanes across dimensions, prioritizing faster-varying (lower) dimensions.

Example:

  let layout = newSimdLatticeLayout([8, 8, 8, 16], 8)
  # Might produce simdGrid = [2, 2, 2, 1] or [1, 1, 2, 4] depending on divisibility
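
The exact distribution rule is not spelled out here, so the following is only one plausible strategy, not the module's algorithm. It assumes simdWidth is a power of two and hands out factors of 2 round-robin starting from dimension 0; `distributeLanes` is a hypothetical name.

```nim
# Sketch only: one possible greedy strategy, assuming simdWidth is a power of two.
proc distributeLanes(localGeom: openArray[int]; simdWidth: int): seq[int] =
  ## Give each dimension one factor of 2 per pass, lowest dimension first,
  ## as long as the remaining outer extent in that dimension is even.
  result = newSeq[int](localGeom.len)
  for d in 0 ..< result.len:
    result[d] = 1
  var remaining = simdWidth
  var progressed = true
  while remaining > 1 and progressed:
    progressed = false
    for d in 0 ..< localGeom.len:
      if remaining mod 2 == 0 and (localGeom[d] div result[d]) mod 2 == 0:
        result[d] *= 2
        remaining = remaining div 2
        progressed = true
  doAssert remaining == 1, "could not place all lanes"

echo distributeLanes([8, 8, 8, 16], 8)   # @[2, 2, 2, 1]
echo distributeLanes([3, 8, 8, 16], 8)   # @[1, 2, 2, 2]
```

Dimensions with an odd extent are skipped, which is why the second call leaves dimension 0 at one lane.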

proc outerInnerToLocal(outerIdx, innerIdx: int; layout: SimdLatticeLayout): int {.
    inline, ...raises: [], tags: [], forbids: [].}

Convert (outerIdx, innerIdx) pair to local site index

Given an outer index (vector group) and inner index (SIMD lane), returns the corresponding local site index.

Local coordinates: localCoord[d] = outerCoord[d] * innerGeom[d] + innerCoord[d]
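
The coordinate rule above can be turned into an index mapping as follows. This is a minimal sketch assuming lexicographic strides with dimension 0 fastest for the outer, inner, and local index spaces; `stridesOf` and `outerInnerToLocalSketch` are hypothetical names that take the geometries directly instead of a SimdLatticeLayout.

```nim
# Sketch only: hypothetical helpers, assuming dimension 0 varies fastest.
proc stridesOf(geom: openArray[int]): seq[int] =
  result = newSeq[int](geom.len)
  var acc = 1
  for d in 0 ..< geom.len:
    result[d] = acc
    acc *= geom[d]

proc outerInnerToLocalSketch(outerIdx, innerIdx: int;
                             localGeom, simdGrid: openArray[int]): int =
  ## Decode both indices into coordinates, combine them per dimension with
  ## localCoord = outerCoord * innerGeom + innerCoord, then re-encode.
  var outerGeom = newSeq[int](localGeom.len)
  for d in 0 ..< localGeom.len:
    outerGeom[d] = localGeom[d] div simdGrid[d]
  let
    outerStr = stridesOf(outerGeom)
    innerStr = stridesOf(simdGrid)
    localStr = stridesOf(localGeom)
  for d in 0 ..< localGeom.len:
    let outerCoord = (outerIdx div outerStr[d]) mod outerGeom[d]
    let innerCoord = (innerIdx div innerStr[d]) mod simdGrid[d]
    result += (outerCoord * simdGrid[d] + innerCoord) * localStr[d]

echo outerInnerToLocalSketch(0, 0, [8, 8, 8, 16], [1, 2, 2, 2])  # 0
echo outerInnerToLocalSketch(0, 1, [8, 8, 8, 16], [1, 2, 2, 2])  # 8
echo outerInnerToLocalSketch(1, 0, [8, 8, 8, 16], [1, 2, 2, 2])  # 1
```

Under this rule the lanes of one vector group are neighbouring sites: lane 1 of group 0 sits one step along dimension 1 from lane 0, which is local index 8 on this lattice.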

proc simdLanes(layout: SimdLatticeLayout): int {.inline, ...raises: [], tags: [],
    forbids: [].}
Return the number of SIMD lanes (sites per vector group)
proc validateSimdGrid(globalGeom, mpiGrid, simdGrid: openArray[int]): tuple[
    valid: bool, message: string] {....raises: [], tags: [], forbids: [].}

Validate that simdGrid is compatible with the MPI partitioned lattice

Parameters:

  • globalGeom: Global lattice dimensions (e.g., [16, 16, 16, 32])
  • mpiGrid: MPI rank grid (e.g., [2, 2, 2, 2])
  • simdGrid: Proposed SIMD lane grid (e.g., [1, 2, 2, 2])

Returns: (valid, message), where valid is true if simdGrid is compatible; otherwise valid is false and message holds the error message

Example:

  let (ok, msg) = validateSimdGrid([16, 16, 16, 32], [2, 2, 2, 2], [1, 2, 2, 2])
  if not ok:
    echo "Error: ", msg

proc vectorGroups(layout: SimdLatticeLayout): int {.inline, ...raises: [],
    tags: [], forbids: [].}
Return the number of vector groups (outer loop iterations)