SIMD-Vectorized Each Macro for TensorFieldView (OpenMP backend)

This module provides the each macro for SIMD-vectorized parallel loops over TensorFieldView objects. It uses OpenMP for thread parallelism and the SIMD infrastructure (SimdVecDyn, AoSoA layout) from simd/ for vectorization.

The macro analyzes the loop body at compile time to detect operation patterns (copy, add, subtract, scalar multiply, matrix multiply, etc.) and generates SIMD-vectorized code that:

Iterates over vector groups (outer loop, OpenMP parallelized)
Loads contiguous AoSoA lanes as SimdVecDyn vectors
Performs arithmetic on full SIMD vectors
Stores results back to AoSoA layout
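The four steps above can be sketched in plain Nim. This is a minimal illustration under stated assumptions, not the module's code: VecWidth, ElemsPerSite, Lanes, laneBase, loadLanes, storeLanes and addFields are hypothetical names, the index formula is one assumed AoSoA convention, and a fixed-size array stands in for SimdVecDyn. The outer loop is written serially here, whereas the module parallelizes it with OpenMP.

```nim
const
  VecWidth = 4      # assumed number of SIMD lanes per vector group
  ElemsPerSite = 2  # assumed number of tensor elements stored per site

type Lanes = array[VecWidth, float64]  # stand-in for one SIMD vector

proc laneBase(group, elem: int): int =
  ## Offset of lane 0 for element `elem` of vector group `group`
  ## (assumed layout: group-major, then element, then lane).
  (group * ElemsPerSite + elem) * VecWidth

proc loadLanes(data: seq[float64]; base: int): Lanes =
  ## Read VecWidth contiguous values starting at `base`.
  for lane in 0 ..< VecWidth:
    result[lane] = data[base + lane]

proc storeLanes(data: var seq[float64]; base: int; v: Lanes) =
  ## Write VecWidth contiguous values starting at `base`.
  for lane in 0 ..< VecWidth:
    data[base + lane] = v[lane]

proc `+`(a, b: Lanes): Lanes =
  ## Lane-wise addition of two vectors.
  for lane in 0 ..< VecWidth:
    result[lane] = a[lane] + b[lane]

proc addFields(c: var seq[float64]; a, b: seq[float64]) =
  ## c = a + b, processed one vector group at a time.
  const groupSize = VecWidth * ElemsPerSite
  doAssert a.len == b.len and a.len == c.len
  doAssert a.len mod groupSize == 0
  let numGroups = a.len div groupSize
  for g in 0 ..< numGroups:          # outer loop over vector groups
    for e in 0 ..< ElemsPerSite:     # each tensor element of the group
      let base = laneBase(g, e)
      storeLanes(c, base, loadLanes(a, base) + loadLanes(b, base))

# Two vector groups of eight values each.
var a = newSeq[float64](16)
var b = newSeq[float64](16)
var c = newSeq[float64](16)
for i in 0 ..< 16:
  a[i] = float(i)
  b[i] = 10.0 * float(i)
addFields(c, a, b)
```

Because every lane of a group is contiguous in memory, each load and store touches one unbroken run of VecWidth values, which is what makes the layout convenient for SIMD.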

Usage:

    for n in each 0..<view.numSites():
      viewC[n] = viewA[n] + viewB[n]  # Vectorized via SIMD load/add/store

For LocalTensorField operations, see all in omplocal.nim.

Imports

ompbase, ompwrap, ../simd/simdtypes, ../tensor/sitetensor

Macros

macro each(forLoop: ForLoopStmt): untyped

SIMD-vectorized parallel each loop for TensorFieldView (OpenMP backend).

Analyzes the loop body at compile time to detect operation patterns, then generates SIMD-vectorized code using the AoSoA layout and SimdVecDyn load/store/arithmetic from simd/.

Recognized patterns (SIMD-vectorized):

    viewC[n] = viewA[n]              # Copy
    viewC[n] = viewA[n] + viewB[n]   # Addition
    viewC[n] = viewA[n] - viewB[n]   # Subtraction
    viewC[n] = 2.0 * viewA[n]        # Scalar multiply
    viewC[n] = viewA[n] + 3.0        # Scalar add
    viewC[n] = viewA[n] * viewB[n]   # Matrix multiply
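To illustrate what compile-time pattern detection can look like, here is a minimal sketch that classifies an assignment purely by the shape of its untyped AST. It is an assumption-laden toy, not the module's analysis: patternOf, classify and isSite are hypothetical names, and the real macro works on typed nodes and knows far more about its operands. Anything the sketch does not recognize is reported as "fallback".

```nim
import std/macros

proc isSite(n: NimNode): bool {.compileTime.} =
  ## True for a plain site access such as `a[n]` (index is a bare identifier).
  n.kind == nnkBracketExpr and n.len == 2 and n[1].kind == nnkIdent

proc classify(rhs: NimNode): string {.compileTime.} =
  ## Name the pattern of an assignment's right-hand side.
  const literals = {nnkIntLit..nnkFloat64Lit}
  if isSite(rhs):
    return "copy"
  if rhs.kind == nnkInfix and rhs.len == 3:
    let
      op = rhs[0].strVal
      left = rhs[1]
      right = rhs[2]
    if isSite(left) and isSite(right):
      if op == "+": return "add"
      if op == "-": return "subtract"
      if op == "*": return "matrix multiply"
    let oneScalar = (left.kind in literals and isSite(right)) or
                    (isSite(left) and right.kind in literals)
    if oneScalar:
      if op == "*": return "scalar multiply"
      if op == "+": return "scalar add"
  return "fallback"

macro patternOf(stmt: untyped): string =
  ## Expands to a string literal naming the detected pattern.
  var s = stmt
  if s.kind == nnkStmtList and s.len == 1:
    s = s[0]                      # unwrap the single statement of a block
  if s.kind != nnkAsgn:
    return newLit("fallback")
  result = newLit(classify(s[1]))  # s[1] is the right-hand side

# The identifiers below are never declared; the macro only inspects syntax.
let addPattern = patternOf:
  viewC[n] = viewA[n] + viewB[n]
let scalePattern = patternOf:
  viewC[n] = 2.0 * viewA[n]
echo addPattern
echo scalePattern
```

A real implementation would then branch on the detected pattern to emit the matching load/arithmetic/store sequence, and emit a per-site scalar loop for the fallback case.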

Falls back to scalar per-site loop for:

Echo/print statements

Stencil neighbor access

Complex/unrecognized expressions

Usage:

    for n in each 0..<view.numSites():
      viewC[n] = viewA[n] + viewB[n]

macro eachImpl(loopVar: untyped; lo: typed; hi: typed; body: typed): untyped

Internal typed macro that receives full type information. Analyzes the expression pattern and generates SIMD-vectorized code using loadSimdVectorDyn/storeSimdVectorDyn on AoSoA data.
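For readers unfamiliar with for-loop macros, the following sketch shows how a macro taking a ForLoopStmt can pull a loop apart into its loop variable, bounds and body, which is the information the signatures above pass along. It assumes a recent Nim release in which for-loop macros are enabled by default. eachSketch is a hypothetical name, and the sketch simply rebuilds an ordinary serial loop instead of forwarding to a typed macro or generating SIMD code.

```nim
import std/macros

macro eachSketch(forLoop: ForLoopStmt): untyped =
  ## Rewrites `for v in eachSketch lo ..< hi: body` into a plain for loop.
  let
    loopVar = forLoop[0]     # the loop variable, e.g. `n`
    call = forLoop[^2]       # the `eachSketch lo ..< hi` call or command
    body = forLoop[^1]       # the loop body
    rangeExpr = call[1]      # the argument, expected to be `lo ..< hi`
  expectKind rangeExpr, nnkInfix
  let
    lo = rangeExpr[1]
    hi = rangeExpr[2]
  result = quote do:
    for `loopVar` in `lo` ..< `hi`:
      `body`

var total = 0
for n in eachSketch 0 ..< 5:
  total += n
echo total
```

Splitting the work into an untyped front end and a typed back end, as each and eachImpl do, lets the front end accept the loop syntax while the back end sees resolved types for the operands.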

Exports

getNumThreads, omp_get_max_threads, getThreadId, initOpenMP, ompParallel, omp_get_thread_num

openmp/ompdisp
