"Documentation is a love letter that you write to your future self." — Damian Conway
ReliQ is an experimental lattice field theory (LFT) framework written in Nim, designed for user-friendliness, performance, reliability, and portability across current and future heterogeneous architectures.
```nim
import reliq

parallel:
  # Create a 4D lattice (8×8×8×16) with MPI decomposition
  let lat = newSimpleCubicLattice([8, 8, 8, 16])
  block:
    # Create tensor fields on the lattice
    var fieldA = lat.newTensorField([3, 3]): float64
    var fieldB = lat.newTensorField([3, 3]): float64
    var fieldC = lat.newTensorField([3, 3]): float64

    # Get local host-memory views
    var localA = fieldA.newLocalTensorField()
    var localB = fieldB.newLocalTensorField()
    var localC = fieldC.newLocalTensorField()

    # Host-side initialization with "all" loop
    for n in all 0..<localA.numSites():
      var siteA = localA.getSite(n)
      siteA[0, 0] = 1.0  # Set matrix element (0,0)

    # Create device views for backend dispatch
    var vA = localA.newTensorFieldView(iokRead)
    var vB = localB.newTensorFieldView(iokRead)
    var vC = localC.newTensorFieldView(iokWrite)

    # Device-side computation with "each" loop
    for n in each 0..<vA.numSites():
      vC[n] = vA[n] + vB[n]  # Matrix addition
      vC[n] = vA[n] * vB[n]  # Matrix multiplication
      vC[n] = 3.0 * vA[n]    # Scalar multiplication
```
ReliQ is organized into several layers; each layer has its own detailed guide page:
```
┌──────────────────────────────────────────────────────────┐
│                        User Code                         │
│          import reliq; for n in each ...: ...            │
├──────────────────────────────────────────────────────────┤
│                       Tensor Layer                       │
│   TensorField ─► LocalTensorField ─► TensorFieldView     │
│     (GA/MPI)      (direct GA ptr)    (device buffers)    │
├──────────────────────────────────────────────────────────┤
│      GlobalShifter · LatticeStencil · Transporter        │
│         discreteLaplacian · applyStencilShift            │
├──────────────────────────────────────────────────────────┤
│                     Backend Dispatch                     │
│   OpenCL (JIT)  │  SYCL (pre-compiled)  │  OpenMP        │
│    (cldisp)     │      (sycldisp)       │  (ompdisp)     │
├──────────────────────────────────────────────────────────┤
│                 Memory & Communication                   │
│  Global Arrays · MPI · AoSoA Layout · SIMD Intrinsics    │
└──────────────────────────────────────────────────────────┘
```
ReliQ supports three compute backends, selected at compile time:
| Backend | Flag | Requirements | Best For |
|---|---|---|---|
| OpenCL | (default) | OpenCL runtime (pocl, vendor SDK) | GPUs, FPGAs |
| SYCL | `BACKEND=sycl` | Intel oneAPI (icpx) or hipSYCL | Intel GPUs, modern CPUs |
| OpenMP | `BACKEND=openmp` | GCC/Clang with OpenMP | CPU-only, SIMD vectorization |
```shell
# OpenCL (default)
make tensorview

# OpenMP
make tensorview BACKEND=openmp

# SYCL (requires building the wrapper library first)
make sycl-lib
make tensorview BACKEND=sycl
```
```shell
# Run all tests across all backends
make test

# Run tests for a specific backend
make test-core    # Backend-agnostic core tests
make test-opencl  # OpenCL backend tests
make test-openmp  # OpenMP backend tests
make test-sycl    # SYCL backend tests
```
The `each` macro is the primary mechanism for expressing computations on lattice fields. It analyzes the loop body at compile time and generates optimized backend-specific code.
```nim
import reliq

parallel:
  let lat = newSimpleCubicLattice([8, 8, 8, 16])
  block:
    # Create distributed tensor fields
    var fieldA = lat.newTensorField([3, 3]): float64  # 3x3 matrix field
    var fieldB = lat.newTensorField([3, 3]): float64
    var fieldC = lat.newTensorField([3, 3]): float64

    # Get local host-memory views
    var localA = fieldA.newLocalTensorField()
    var localB = fieldB.newLocalTensorField()
    var localC = fieldC.newLocalTensorField()

    # Create device-side views
    var vA = localA.newTensorFieldView(iokRead)   # read-only
    var vB = localB.newTensorFieldView(iokRead)   # read-only
    var vC = localC.newTensorFieldView(iokWrite)  # write-only
```
```nim
# Vector/matrix copy
for n in each 0..<vC.numSites():
  vC[n] = vA[n]

# Scalar multiplication
for n in each 0..<vC.numSites():
  vC[n] = 3.0 * vA[n]

# Matrix multiplication
for n in each 0..<vC.numSites():
  vC[n] = vA[n] * vB[n]

# Addition / subtraction
for n in each 0..<vC.numSites():
  vC[n] = vA[n] + vB[n]

# Combined expressions
for n in each 0..<vC.numSites():
  vC[n] = vA[n] * vB[n] + vC[n]
```
```nim
let stencil = newLatticeStencil(nearestNeighborStencil[4](), lat)
for n in each 0..<vC.numSites():
  let fwd = stencil.fwd(n, 0)  # Forward x-neighbor
  let bwd = stencil.bwd(n, 0)  # Backward x-neighbor
  vC[n] = vA[fwd] + vA[bwd] - 2.0 * vA[n]
```
The `all` loop operates on `LocalTensorField` objects for host-side site-level operations using `LocalSiteProxy`:
```nim
# Initialize via site proxy
for n in all 0..<localA.numSites():
  var site = localA.getSite(n)
  site[0, 0] = 1.0  # Set matrix element (row, col)

# Arithmetic operations
for n in all 0..<localC.numSites():
  localC[n] = localA.getSite(n) + localB.getSite(n)  # add
  localC[n] = localA.getSite(n) * localB.getSite(n)  # multiply
  localC[n] = 2.5 * localA.getSite(n)                # scale
```
ReliQ uses an Array of Structures of Arrays (AoSoA) memory layout for optimal SIMD and GPU performance:
```
Traditional AoS:     [s0e0, s0e1, s1e0, s1e1, s2e0, s2e1, ...]

ReliQ AoSoA (VW=4):  [s0e0, s1e0, s2e0, s3e0,   ← element 0, group 0
                      s0e1, s1e1, s2e1, s3e1,   ← element 1, group 0
                      s4e0, s5e0, s6e0, s7e0,   ← element 0, group 1
                      s4e1, s5e1, s6e1, s7e1]   ← element 1, group 1
```
Index formula — for site `s` with element index `i`:

```
group = s div VW   # integer division
lane  = s mod VW
index = group * (elemsPerSite * VW) + i * VW + lane
```
Here `VW` (`VectorWidth`) is typically 8 on CPUs (matching AVX-512) and is configurable via `-d:VectorWidth=N`.
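The index formula can be sanity-checked with a short Python sketch (plain Python for illustration only; ReliQ computes these indices in Nim). It packs a toy field of 8 sites with 2 elements per site into the AoSoA order shown above, using VW = 4:

```python
def aosoa_index(s: int, i: int, elems_per_site: int, vw: int) -> int:
    """Flat buffer index for site s, element i, vector width vw."""
    group = s // vw  # which SIMD group the site belongs to
    lane = s % vw    # lane of the site within its group
    return group * (elems_per_site * vw) + i * vw + lane

VW, ELEMS, SITES = 4, 2, 8
buf = [None] * (SITES * ELEMS)
for s in range(SITES):
    for i in range(ELEMS):
        buf[aosoa_index(s, i, ELEMS, VW)] = f"s{s}e{i}"

# First group: element 0 of sites 0..3, then element 1 of sites 0..3
print(buf[:8])  # ['s0e0', 's1e0', 's2e0', 's3e0', 's0e1', 's1e1', 's2e1', 's3e1']
```

Within a group, consecutive sites occupy consecutive lanes for each element, which is exactly what lets one SIMD instruction (or one GPU warp) process `VW` sites of the same matrix element at once.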
For operations that span MPI boundaries at the `TensorField` level (before down-casting to views), `GlobalShifter` performs distributed nearest-neighbour shifts using Global Arrays ghost exchange.
```nim
parallel:
  let lat = newSimpleCubicLattice([8, 8, 8, 16], [1, 1, 1, 4], [1, 1, 1, 1])
  block:
    var src = lat.newTensorField([1, 1]): float64
    var dest = lat.newTensorField([1, 1]): float64
    # Fill src ...

    # Shift forward in the t-dimension (crosses MPI boundaries)
    let shifter = newGlobalShifter(src, dim=3, len=1)
    shifter.apply(src, dest)  # dest[x] = src[x + e_t]

    # Create shifters for all dimensions at once
    let fwd = newGlobalShifters(src, len=1)
    let bwd = newGlobalBackwardShifters(src, len=1)

    # Discrete Laplacian: sum_mu (f[x+mu] + f[x-mu]) - 2D * f[x]
    var lap = lat.newTensorField([1, 1]): float64
    var scratch = lat.newTensorField([1, 1]): float64
    discreteLaplacian(src, lap, scratch)
```
| Layer | Type | Communication |
|---|---|---|
| `GlobalShifter` | `TensorField` | GA ghost exchange (MPI) |
| `Shifter` / `Transporter` | `TensorFieldView` | Device-side halo buffers |
Use `GlobalShifter` when working directly with distributed tensor fields (e.g. setup, I/O, measurement code). Use `Shifter` / `Transporter` when the data is already on-device inside an `each` loop.
ReliQ supports standard lattice QCD file formats through the `io` module:
```nim
parallel:
  let lat = newSimpleCubicLattice([8, 8, 8, 16])
  block:
    # Read an ILDG gauge configuration
    var g0 = lat.newTensorField([3, 3]): Complex64
    var g1 = lat.newTensorField([3, 3]): Complex64
    var g2 = lat.newTensorField([3, 3]): Complex64
    var g3 = lat.newTensorField([3, 3]): Complex64
    var gaugeField = [g0, g1, g2, g3]
    readGaugeField(gaugeField, "config.ildg")

    # Write a tensor field to LIME/SciDAC format
    var field = lat.newTensorField([3, 3]): float64
    writeTensorField(field, "output.lime")
```
MIT License — Copyright (c) 2025 reliq-lft
See LICENSE for details.