I built an open-source LLM inference telemetry suite that measures Tokens Per Joule — the energy efficiency metric, not just raw speed.
Current baseline is on an M1 Pro (32GB UMA):
- 2.42 Tokens/Joule on Qwen-3B Q4_K_M
- 22 t/s on Llama-3.1-8B Q8_0 at 8192 context (13.7 GB workload), drawing 35 W whole-SoC power measured via powermetrics
- Zero thermal throttling across 10+ minute sustained loads
Process memory is tracked via psutil per-PID RSS. Each config runs 10 times, and results are reported with 95% confidence intervals.
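For anyone curious what the headline metric means in practice, here is a minimal sketch of how Tokens/Joule and a 95% CI over 10 runs can be computed. The run data below is made up for illustration, the t-critical value assumes exactly n=10, and this is not the suite's actual implementation:

```python
import statistics

def tokens_per_joule(tokens: int, avg_power_w: float, elapsed_s: float) -> float:
    # Energy in joules = mean power (W) * wall-clock time (s)
    return tokens / (avg_power_w * elapsed_s)

def mean_ci95(samples: list[float]) -> tuple[float, float]:
    # Mean and 95% CI half-width; t-critical 2.262 assumes n=10 (df=9)
    n = len(samples)
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / n ** 0.5
    return mean, 2.262 * sem

# Hypothetical per-run measurements: (tokens generated, mean watts, seconds)
runs = [(512, 35.0 + 0.1 * i, 23.0 + 0.2 * i) for i in range(10)]
tj_samples = [tokens_per_joule(t, w, s) for t, w, s in runs]
mean, ci = mean_ci95(tj_samples)
print(f"{mean:.3f} T/J +/- {ci:.3f} (95% CI, n=10)")
```

The key point is that the denominator is whole-SoC energy over the full generation window, so idle draw and prefill cost count against the score, which is what makes it an honest efficiency number.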
The M5 Max/Ultra should have significantly higher memory bandwidth and GPU throughput. The open question is whether T/J scales
proportionally — or whether the higher power envelope eats into the efficiency gains. Only real telemetry will answer that.
Setup is straightforward (model download is ~25GB, so budget time for that):
Bash:
git clone https://github.com/dilberx/universal-llm-telemetry-suite
cd universal-llm-telemetry-suite
python3 -m venv venv && source venv/bin/activate
pip install -r requirements-apple-silicon.txt
sudo ./venv/bin/python src/orchestrator.py
Would be great to get M5 data points to map how T/J evolves across the Apple Silicon generations.
Repo: https://github.com/dilberx/universal-llm-telemetry-suite