MPI Support

Automatic MPI detection and coordination for parallel simulations.


Overview

The SDK automatically:

  • Detects MPI environments
  • Identifies master (rank 0) and worker processes
  • Only logs from master to avoid duplicates
  • Returns None from logging methods on workers

No code changes required — just run with mpirun.
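
In practice this means logging calls can be made unconditionally on every rank: on worker ranks they are no-ops that return None. A minimal sketch (the project and run names are just placeholders):

from optixlog import Optixlog

client = Optixlog(api_key="your_api_key")
run = client.project(name="Overview").run(name="noop_demo")

# On the master rank this logs and returns a value; on workers it is a no-op
value = run.log(step=0, loss=0.5)
if value is None:
    print(f"Rank {client.rank}: log() returned None (worker or logging skipped)")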


Supported Implementations

Implementation    Detection Method
OpenMPI           OMPI_COMM_WORLD_RANK env var
Intel MPI         PMI_RANK env var
MS-MPI            MPI_LOCALRANKID env var
mpi4py            MPI.COMM_WORLD.Get_rank()
Meep              meep.am_master()
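
To see which of these variables your launcher sets, you can inspect the environment directly (a quick check; the SDK does this for you automatically):

import os

# Print whichever MPI rank variables are present in this process's environment
for var in ("OMPI_COMM_WORLD_RANK", "PMI_RANK", "MPI_LOCALRANKID"):
    print(var, "=", os.environ.get(var))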

Basic Usage

Run with mpirun:

mpirun -n 4 python simulation.py

simulation.py:

from optixlog import Optixlog

# Initialize client - MPI detected automatically
client = Optixlog(api_key="your_api_key")

# MPI info available on client
print(f"Rank: {client.rank}, Size: {client.size}")
print(f"Is master: {client.is_master}")

# Create project and run
project = client.project(name="ParallelSimulations")
run = project.run(name="parallel_sim", config={"processes": client.size})

# Only master logs - workers return None
for step in range(100):
    result = compute(step)
    run.log(step=step, result=result)  # No-op on workers

Master vs Worker

Master Process (rank 0)

  • Creates API connection
  • Creates the run
  • Logs all data
  • Gets return values

Worker Processes (rank > 0)

  • Skip API connection
  • Skip logging (returns None)
  • Can access MPI info

Example:

from optixlog import Optixlog

client = Optixlog(api_key="your_api_key")

if client.is_master:
    print("I'm the master")
else:
    print(f"I'm worker {client.rank}")

# Create run (only master actually creates it)
project = client.project(name="MyProject")
run = project.run(name="experiment")

MPI Information

from optixlog import Optixlog

client = Optixlog(api_key="your_api_key")

# Properties on client
print(client.is_master)  # True/False
print(client.rank)       # 0, 1, 2, ...
print(client.size)       # Total processes

# Method for full info
info = client.get_mpi_info()
# {"is_master": True, "rank": 0, "size": 4, "has_mpi": True}

Synchronization

Barrier

Wait for all processes:

from optixlog import Optixlog

client = Optixlog(api_key="your_api_key")
project = client.project(name="SyncDemo")
run = project.run(name="barrier_example")

# All processes do work
do_parallel_work()

# Wait for all
client.barrier()

# Master logs summary
if client.is_master:
    run.log(step=0, message="All done")

Broadcast Run ID

Share run_id with workers:

from optixlog import Optixlog

client = Optixlog(api_key="your_api_key")
project = client.project(name="BroadcastDemo")
run = project.run(name="broadcast_example")

# broadcast_run_id() is collective: every process must call it.
# The master sends the run_id; workers receive it.
client.broadcast_run_id()

if not client.is_master:
    print(f"Worker got: {run.run_id}")

Meep Integration

from optixlog import Optixlog
import meep as mp

# Run: mpirun -n 4 python meep_sim.py

client = Optixlog(api_key="your_api_key")
project = client.project(name="MeepSimulations")
run = project.run(
    name="meep_parallel",
    config={
        "processes": client.size,
        "resolution": 30
    }
)

sim = mp.Simulation(
    cell_size=mp.Vector3(10, 5),
    resolution=30,
    boundary_layers=[mp.PML(1.0)]
)

for step in range(100):
    sim.run(until=1)
    
    # Only master logs
    if client.is_master and step % 10 == 0:
        field = sim.get_array(
            center=mp.Vector3(),
            size=mp.Vector3(10, 5),
            component=mp.Ez
        )
        run.log_array_as_image(f"field_{step}", field, cmap='RdBu')

Distributed Work

from optixlog import Optixlog

client = Optixlog(api_key="your_api_key")
project = client.project(name="Distributed")
run = project.run(name="distributed_work")

# Distribute iterations across processes
my_range = range(client.rank, 1000, client.size)

results = []
for i in my_range:
    results.append(expensive_compute(i))

# Synchronize all processes
client.barrier()

# Master collects and logs
if client.is_master:
    all_results = gather(results)  # Your gather function
    run.log(step=0, total=sum(all_results))
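
Note that an MPI gather is itself a collective operation, so it must run on every rank, not only inside the is_master branch. With mpi4py installed, the collection step in the example above could look like this (a minimal sketch; comm.gather returns the combined data on rank 0 and None on all other ranks):

from mpi4py import MPI

comm = MPI.COMM_WORLD

# Every rank participates in the gather; only rank 0 receives the data
gathered = comm.gather(results, root=0)

if client.is_master:
    # gathered is a list of per-rank result lists; flatten it before logging
    all_results = [item for chunk in gathered for item in chunk]
    run.log(step=0, total=sum(all_results))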

Troubleshooting

MPI Not Detected

Symptoms: is_master is True and rank is 0 on every process (the SDK has fallen back to single-process mode)

Check:

echo $OMPI_COMM_WORLD_RANK  # Should show rank
mpirun --version            # Should work

Fix:

pip install mpi4py
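
After installing, you can confirm that MPI ranks are visible independently of the SDK (a quick check; save it as check_mpi.py or any name you like):

# check_mpi.py - run with: mpirun -n 4 python check_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()}")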

All Processes Logging

Fix: This usually means MPI was not detected (see above); as a safeguard, guard logging calls with is_master:

if client.is_master:
    run.log(step=step, value=value)

Workers Hanging

Cause: Unbalanced barrier calls

Fix: All processes must call:

client.barrier()  # Every process must call this

run_id Not Available on Workers

Fix: Call broadcast_run_id() on every process (it is a collective call):

client.broadcast_run_id()
# Now run_id is available on workers

Detection Priority

The SDK checks the following, in order (see the sketch after this list):

  1. Environment variables (fastest)
  2. mpi4py library
  3. Meep's am_master()
  4. Fallback to single process
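
Conceptually, the detection logic looks like the sketch below. This is only an illustration of the order described above; the SDK's actual implementation and internal names may differ.

import os

def detect_rank():
    # 1. Environment variables set by the MPI launcher (fastest)
    for var in ("OMPI_COMM_WORLD_RANK", "PMI_RANK", "MPI_LOCALRANKID"):
        if var in os.environ:
            return int(os.environ[var])
    # 2. mpi4py, if installed
    try:
        from mpi4py import MPI
        return MPI.COMM_WORLD.Get_rank()
    except ImportError:
        pass
    # 3. Meep's am_master() - only distinguishes master from workers, not the exact rank
    try:
        import meep
        return 0 if meep.am_master() else 1
    except ImportError:
        pass
    # 4. Fallback: single process
    return 0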

Best Practices

  1. Always check is_master before logging
  2. Use barrier() before collective operations
  3. Broadcast run_id if workers need it
  4. Don't log in tight loops on all processes
  5. Install mpi4py for reliable detection

Complete Example

from optixlog import Optixlog
import numpy as np

# Initialize - MPI auto-detected
client = Optixlog(api_key="your_api_key")

print(f"Process {client.rank}/{client.size} starting...")

# Create project and run
project = client.project(name="ParallelCompute")
run = project.run(
    name="distributed_simulation",
    config={
        "total_processes": client.size,
        "iterations": 1000
    }
)

# Distribute work
iterations_per_process = 1000 // client.size
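# Note: assumes 1000 divides evenly by client.size; otherwise the remainder
# iterations are never assigned to any rank.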
my_start = client.rank * iterations_per_process
my_end = my_start + iterations_per_process

local_results = []
for i in range(my_start, my_end):
    result = expensive_computation(i)
    local_results.append(result)
    
    # Log progress (only master actually logs)
    if i % 100 == 0:
        run.log(step=i, partial_result=result)

# Sync before gathering
client.barrier()

# Master gathers and logs final results
if client.is_master:
    # Gather from all processes (using your MPI gather)
    all_results = mpi_gather(local_results)
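    # Caution: if mpi_gather wraps an MPI collective (e.g. comm.gather), every
    # rank must call it, not only the master - see the Distributed Work section.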
    
    run.log(
        step=1000,
        total_sum=sum(all_results),
        mean=np.mean(all_results),
        std=np.std(all_results)
    )
    
    print("✓ Simulation complete!")
