SIMD-Vectorized Each Macro for TensorFieldView (OpenMP backend)
This module provides the each macro for SIMD-vectorized parallel loops over TensorFieldView objects. It uses OpenMP for thread parallelism and the SIMD infrastructure from simd/ (SimdVecDyn and the AoSoA layout) for vectorization.
The macro analyzes the loop body at compile time to detect operation patterns (copy, add, subtract, scalar multiply, matrix multiply, etc.) and generates SIMD-vectorized code that:
- Iterates over vector groups (outer loop, OpenMP parallelized)
- Loads contiguous AoSoA lanes as SimdVecDyn vectors
- Performs arithmetic on full SIMD vectors
- Stores results back to AoSoA layout
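The four steps above can be made concrete with a minimal Nim sketch of AoSoA indexing and the group/lane loop for the addition pattern. This is not the module's code. VecLen, AoSoAField, initField, idx and addFields are hypothetical names. The [group][component][lane] ordering is an assumed layout. The inner lane loop stands in for one SimdVecDyn load/add/store. The outer group loop is left sequential here, although it is the loop the OpenMP backend parallelizes.

```nim
const VecLen = 4  # hypothetical SIMD width: lanes per vector group

type
  # Hypothetical AoSoA storage: `comps` scalars per site, laid out as
  # [group][component][lane] so the VecLen lanes of one component are contiguous.
  AoSoAField = object
    comps: int
    data: seq[float64]

proc initField(numSites, comps: int): AoSoAField =
  # Round up to whole vector groups; padding lanes stay zero.
  let groups = (numSites + VecLen - 1) div VecLen
  AoSoAField(comps: comps, data: newSeq[float64](groups * comps * VecLen))

proc idx(f: AoSoAField; site, comp: int): int =
  ## Flat offset of (site, comp): group = site div VecLen, lane = site mod VecLen.
  let group = site div VecLen
  let lane = site mod VecLen
  (group * f.comps + comp) * VecLen + lane

proc addFields(c: var AoSoAField; a, b: AoSoAField) =
  ## c = a + b, one vector group at a time (assumes all three fields share a shape).
  let groups = a.data.len div (a.comps * VecLen)
  for g in 0 ..< groups:            # outer loop: the one OpenMP would parallelize
    for comp in 0 ..< a.comps:
      let base = (g * a.comps + comp) * VecLen
      # Lanes base ..< base+VecLen are contiguous: this inner loop stands in
      # for one SIMD load of a and b, one vector add, and one store into c.
      for lane in 0 ..< VecLen:
        c.data[base + lane] = a.data[base + lane] + b.data[base + lane]
```

Keeping each component's lanes contiguous is what lets a single vector load pick up VecLen sites at once. Sites that do not fill the last group simply occupy zeroed padding lanes.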
Usage:

    for n in each 0..<view.numSites():
      viewC[n] = viewA[n] + viewB[n]  # Vectorized via SIMD load/add/store
For LocalTensorField operations, see the all loop in omplocal.nim.
Macros
macro each(forLoop: ForLoopStmt): untyped
SIMD-vectorized parallel each loop for TensorFieldView (OpenMP backend)
Analyzes the loop body at compile time to detect operation patterns, then generates SIMD-vectorized code using the AoSoA layout and SimdVecDyn load/store/arithmetic from simd/.
Recognized patterns (SIMD-vectorized):

    viewC[n] = viewA[n]              # Copy
    viewC[n] = viewA[n] + viewB[n]   # Addition
    viewC[n] = viewA[n] - viewB[n]   # Subtraction
    viewC[n] = 2.0 * viewA[n]        # Scalar multiply
    viewC[n] = viewA[n] + 3.0        # Scalar add
    viewC[n] = viewA[n] * viewB[n]   # Matrix multiply
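Pattern detection of this kind can be illustrated on untyped AST with std/macros. The classify helper and its Pattern enum below are hypothetical. The sketch recognizes only literal scalars and `field[n]` accesses indexed by the loop variable itself. The real eachImpl works on typed AST, so its matching rules may well differ.

```nim
import std/macros

type Pattern = enum
  pCopy, pAdd, pSub, pScalarMul, pScalarAdd, pMatMul, pFallback

proc isSiteAccess(n: NimNode; loopVar: string): bool =
  ## Matches `field[n]` where the index is exactly the loop variable.
  n.kind == nnkBracketExpr and n.len == 2 and
    n[1].kind == nnkIdent and n[1].eqIdent(loopVar)

proc isScalarLit(n: NimNode): bool =
  n.kind in {nnkIntLit, nnkFloatLit}

proc classify(asgn: NimNode; loopVar: string): Pattern =
  ## Classifies one statement of an untyped loop body.
  result = pFallback
  if asgn.kind != nnkAsgn or not isSiteAccess(asgn[0], loopVar):
    return                          # echo, declarations, etc.
  let rhs = asgn[1]
  if isSiteAccess(rhs, loopVar):
    return pCopy
  if rhs.kind == nnkInfix:
    let op = rhs[0].strVal
    let l = rhs[1]
    let r = rhs[2]
    if isSiteAccess(l, loopVar) and isSiteAccess(r, loopVar):
      if op == "+": return pAdd
      if op == "-": return pSub
      if op == "*": return pMatMul
    elif isScalarLit(l) and isSiteAccess(r, loopVar) and op == "*":
      return pScalarMul
    elif isSiteAccess(l, loopVar) and isScalarLit(r) and op == "+":
      return pScalarAdd
  # Anything else, e.g. a stencil access viewA[n + 1], stays pFallback.
```

Because these procs take NimNode, they run only at compile time, for example from inside a macro or a `static:` block.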
Falls back to a scalar per-site loop for:
- Echo/print statements
- Stencil neighbor access
- Complex/unrecognized expressions
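When no pattern matches, the loop reduces to an ordinary per-site loop. Below is a minimal sketch of that rewrite as a Nim for-loop macro taking a ForLoopStmt. It assumes the hypothetical name eachScalar and omits OpenMP entirely. The macro takes the range argument out of the call and re-emits a plain for loop around the original body.

```nim
import std/macros

# Hypothetical sketch: rewrites `for n in eachScalar(lo ..< hi): body`
# into `for n in lo ..< hi: body`, i.e. a plain sequential per-site loop.
macro eachScalar(forLoop: ForLoopStmt): untyped =
  expectLen(forLoop, 3)             # one loop variable, the call, the body
  let loopVar = forLoop[0]
  let call = forLoop[1]             # eachScalar(lo ..< hi), call or command form
  let rangeExpr = call[1]           # lo ..< hi
  let body = forLoop[2]
  result = nnkForStmt.newTree(loopVar, rangeExpr, body)
```

Since the body is passed through untouched, echo statements and stencil accesses behave exactly as they would in a normal loop.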
Usage:

    for n in each 0..<view.numSites():
      viewC[n] = viewA[n] + viewB[n]
macro eachImpl(loopVar: untyped; lo: typed; hi: typed; body: typed): untyped
Internal typed macro that receives full type information. Analyzes the expression pattern and generates SIMD-vectorized code using loadSimdVectorDyn/storeSimdVectorDyn on AoSoA data.
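A small, hypothetical example shows why type information matters. In `viewC[n] = alpha * viewA[n]`, untyped AST only shows an identifier alpha, which could hold a float or a per-site matrix. Only the resolved type tells a scalar multiply from a matrix multiply. The sketch below asks std/macros for the operand's type kind at compile time. isScalarOperand is an illustrative name, not part of this module. The sketch assumes the kind of check eachImpl might perform and is not its actual code.

```nim
import std/macros

# Hypothetical helper: expands to `true` if the operand's resolved type is a
# plain numeric scalar, `false` otherwise (objects, arrays, matrices, ...).
macro isScalarOperand(x: typed): untyped =
  let kind = x.getType.typeKind
  newLit(kind in {ntyInt, ntyFloat, ntyFloat32, ntyFloat64})
```

The answer is computed entirely at compile time, so a pattern matcher built on it adds no run-time cost to the generated loop.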