openmp/ompdisp

SIMD-Vectorized Each Macro for TensorFieldView (OpenMP backend)

This module provides the each for-loop macro for SIMD-vectorized parallel loops over TensorFieldView objects. It uses OpenMP for thread parallelism and the SIMD infrastructure from simd/ (SimdVecDyn, AoSoA layout) for vectorization.

The macro analyzes the loop body at compile time to detect operation patterns (copy, add, subtract, scalar multiply, matrix multiply, etc.) and generates SIMD-vectorized code that:

  • Iterates over vector groups (outer loop, OpenMP parallelized)
  • Loads contiguous AoSoA lanes as SimdVecDyn vectors
  • Performs arithmetic on full SIMD vectors
  • Stores results back to AoSoA layout
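The four steps above can be illustrated with a minimal, library-free sketch. It does not use the real SimdVecDyn type or the module's load/store routines: `Lanes`, `Vec`, `load`, `store`, and `addFields` are hypothetical stand-ins, and the index formula is only one plausible AoSoA layout (sites grouped by lane count, with each element's lanes stored contiguously). The outer loop is left serial here; in the actual backend that is the loop OpenMP would parallelize.

```nim
const Lanes = 4                      # hypothetical SIMD width (sites per vector group)

type Vec = array[Lanes, float64]     # stand-in for a SIMD register, not SimdVecDyn

proc load(buf: seq[float64]; base: int): Vec =
  ## Read `Lanes` contiguous values starting at `base`.
  for l in 0 ..< Lanes: result[l] = buf[base + l]

proc store(buf: var seq[float64]; base: int; v: Vec) =
  ## Write `Lanes` contiguous values starting at `base`.
  for l in 0 ..< Lanes: buf[base + l] = v[l]

proc `+`(a, b: Vec): Vec =
  ## Lane-wise addition on the stand-in vector type.
  for l in 0 ..< Lanes: result[l] = a[l] + b[l]

proc addFields(c: var seq[float64]; a, b: seq[float64];
               numGroups, elemsPerSite: int) =
  ## c = a + b over an assumed AoSoA buffer.
  for g in 0 ..< numGroups:          # outer loop over vector groups
    for e in 0 ..< elemsPerSite:     # tensor elements within a site
      let base = (g * elemsPerSite + e) * Lanes
      store(c, base, load(a, base) + load(b, base))

# Small demonstration: 2 groups, 3 elements per site, 4 lanes = 24 values.
let numGroups = 2
let elemsPerSite = 3
let total = numGroups * elemsPerSite * Lanes
var a = newSeq[float64](total)
var b = newSeq[float64](total)
var c = newSeq[float64](total)
for i in 0 ..< total:
  a[i] = float64(i)
  b[i] = 10.0
addFields(c, a, b, numGroups, elemsPerSite)
```

Because every load and store touches `Lanes` adjacent values, a compiler (or explicit intrinsics) can map each `Vec` operation onto one hardware vector instruction, which is the point of storing the lanes contiguously.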

Usage:

    for n in each 0..<view.numSites():
      viewCn = viewAn + viewBn  # Vectorized via SIMD load/add/store

For LocalTensorField operations, see all in omplocal.nim.

Macros

macro each(forLoop: ForLoopStmt): untyped

SIMD-vectorized parallel each loop for TensorFieldView (OpenMP backend)

Analyzes the loop body at compile time to detect operation patterns, then generates SIMD-vectorized code using the AoSoA layout and SimdVecDyn load/store/arithmetic from simd/.

Recognized patterns (SIMD-vectorized):

    viewCn = viewAn            # Copy
    viewCn = viewAn + viewBn   # Addition
    viewCn = viewAn - viewBn   # Subtraction
    viewCn = 2.0 * viewAn      # Scalar multiply
    viewCn = viewAn + 3.0      # Scalar add
    viewCn = viewAn * viewBn   # Matrix multiply

Falls back to scalar per-site loop for:

  • Echo/print statements
  • Stencil neighbor access
  • Complex/unrecognized expressions
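As a rough illustration of compile-time pattern detection, the sketch below classifies a single statement by inspecting its syntax tree. It is not the module's implementation: `patternOf`, `classify`, `isLit`, and `isOperand` are hypothetical names, and the sketch works on an untyped body, so it can only tell scalars from views when the scalar is a literal. The real eachImpl receives a typed body and can presumably consult type information instead.

```nim
import std/macros

proc isLit(n: NimNode): bool =
  ## Integer or float literal such as 3 or 2.0.
  n.kind in {nnkIntLit, nnkFloatLit}

proc isOperand(n: NimNode): bool =
  ## A plain identifier or an indexed expression.
  n.kind in {nnkIdent, nnkBracketExpr}

proc classify(stmt: NimNode): string =
  ## Name the detected pattern; anything unrecognized falls back.
  result = "scalar fallback"
  if stmt.kind != nnkAsgn: return
  let rhs = stmt[1]
  if isOperand(rhs): return "copy"
  if rhs.kind != nnkInfix: return
  let op = rhs[0].strVal
  let (x, y) = (rhs[1], rhs[2])
  if isOperand(x) and isOperand(y):
    case op
    of "+": result = "add"
    of "-": result = "subtract"
    of "*": result = "matrix multiply"
    else: discard
  elif op == "*" and isLit(x) and isOperand(y):
    result = "scalar multiply"
  elif op == "+" and isOperand(x) and isLit(y):
    result = "scalar add"

macro patternOf(body: untyped): untyped =
  ## Expands to a string literal naming the pattern of a one-statement body.
  let stmt = if body.kind == nnkStmtList and body.len == 1: body[0] else: body
  result = newLit(classify(stmt))

# The identifiers are never resolved, so they need not be declared.
let pCopy = patternOf:
  viewCn = viewAn
let pAdd = patternOf:
  viewCn = viewAn + viewBn
let pScale = patternOf:
  viewCn = 2.0 * viewAn
let pMatMul = patternOf:
  viewCn = viewAn * viewBn
let pEcho = patternOf:
  echo viewCn
```

A production macro would emit vectorized code for each recognized pattern rather than a string, and would route the fallback case to an ordinary per-site loop.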

Usage:

    for n in each 0..<view.numSites():
      viewCn = viewAn + viewBn

macro eachImpl(loopVar: untyped; lo: typed; hi: typed; body: typed): untyped
Internal typed macro that receives full type information. Analyzes the expression pattern and generates SIMD-vectorized code using loadSimdVectorDyn/storeSimdVectorDyn on AoSoA data.
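The split between an untyped ForLoopStmt front end and a typed worker can be sketched as follows. This is an assumption-laden illustration, not the module's code: `eachSketch` is a hypothetical name, it assumes a recent Nim compiler in which for-loop macros are enabled by default, and it simply rebuilds a plain `for` loop where the real each would presumably pass the loop variable, bounds, and body on to eachImpl.

```nim
import std/macros

macro eachSketch(forLoop: ForLoopStmt): untyped =
  ## Unpack `for n in eachSketch(lo ..< hi): body` into its parts.
  let loopVar = forLoop[0]            # the loop variable, e.g. `n`
  let rangeExpr = forLoop[^2][^1]     # argument of the macro call: `lo ..< hi`
  let body = forLoop[^1]              # the loop body
  expectKind rangeExpr, nnkInfix
  let lo = rangeExpr[1]
  let hi = rangeExpr[2]
  # A real front end would forward (loopVar, lo, hi, body) to a typed macro;
  # this sketch just re-emits an ordinary serial loop over the same range.
  result = newTree(nnkForStmt, loopVar, infix(lo, "..<", hi), body)

var sum = 0
for n in eachSketch(0 ..< 4):
  sum += n
```

Handing the pieces to a macro with `typed` parameters is what lets the second stage see resolved symbols and types, which the untyped front end cannot.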