Story Machine // NVIDIA H100 80GB // Latent Diffusion

REAL-TIME
VIDEO LIMITS

Evaluating the trade-off between framerate and visual fidelity in modern video diffusion pipelines. Every clip is exactly 20.0 s long and generated with forced cache wipes, so each measurement reflects a cold-start run.

Base Diffusion Output

Select any two outputs to run synchronized playback. Observe how higher step counts force framerates down but maintain rigid prompt adherence and temporal consistency.

Matrix Archive

01 / CYBERPUNK DETECTIVE

"A cinematic low-angle tracking shot of a cyberpunk detective walking down a neon-lit alleyway in the rain, 8k, photorealistic."
MemFlow 512 x 288
   2 steps: 49.3 FPS
   3 steps: 38.9 FPS
   4 steps: 36.7 FPS
   6 steps: 27.5 FPS
   8 steps: 22.1 FPS
  10 steps: 18.8 FPS
  12 steps: 16.3 FPS
MemFlow 848 x 480
   2 steps: 28.3 FPS
   3 steps: 24.3 FPS
   4 steps: 20.6 FPS
   6 steps: 16.5 FPS
   8 steps: 13.3 FPS
  10 steps: 11.3 FPS
  12 steps:  9.3 FPS

02 / DRONE METROPOLIS

"A dynamic drone shot sweeping over a futuristic sci-fi metropolis with flying cars and massive holographic advertisements, vivid colors, 4k."
MemFlow 512 x 288
   2 steps: 47.2 FPS
   3 steps: 40.5 FPS
   4 steps: 34.8 FPS
   6 steps: 27.1 FPS
   8 steps: 23.0 FPS
  10 steps: 19.3 FPS
  12 steps: 16.7 FPS
MemFlow 848 x 480
   2 steps: 28.7 FPS
   3 steps: 23.9 FPS
   4 steps: 20.5 FPS
   6 steps: 16.4 FPS
   8 steps: 13.3 FPS
  10 steps: 11.1 FPS
  12 steps:  9.4 FPS
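The tables above suggest that per-frame latency grows roughly linearly with denoising step count: each extra step adds a near-constant cost on top of a fixed overhead. A quick sanity check in Python, using the 512 x 288 measurements from scene 01:

```python
# Per-frame latency derived from the measured 512x288 FPS values (scene 01).
steps = [2, 3, 4, 6, 8, 10, 12]
fps = [49.3, 38.9, 36.7, 27.5, 22.1, 18.8, 16.3]

latency_ms = [1000.0 / f for f in fps]  # ms per frame

# Incremental cost of one extra denoising step, between successive rows.
per_step_ms = [
    (latency_ms[i + 1] - latency_ms[i]) / (steps[i + 1] - steps[i])
    for i in range(len(steps) - 1)
]

for s, l in zip(steps, latency_ms):
    print(f"{s:2d} steps: {l:5.1f} ms/frame")
print("marginal cost per step (ms):", [round(x, 1) for x in per_step_ms])
```

The marginal cost hovers around 4-5 ms per step, with roughly 12 ms of fixed per-frame overhead (VAE decode, scheduling, data movement), which is why halving the step count does not quite double the framerate.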

Further Enhancements & Future Plans

Frame Interpolation (RIFE)

Generating every single frame through a heavy diffusion network is computationally expensive. Modern real-time pipelines sidestep this by generating a low-framerate base video (e.g., 6-11 FPS) and applying RIFE (Real-Time Intermediate Flow Estimation).

RIFE synthesizes the "in-between" frames, lifting playback to a fluid 30+ FPS. This frees GPU compute to run more inference steps on the base frames, improving visual coherence and prompt adherence.
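Structurally, interpolation inserts one synthesized frame between each pair of base frames, roughly doubling the playback rate. The sketch below is a naive stand-in (real RIFE predicts optical flow and warps the neighbouring frames; plain averaging shown here only illustrates the data flow):

```python
# Naive stand-in for RIFE-style interpolation: insert one blended frame
# between each pair of base frames, roughly doubling the effective framerate.
# Real RIFE estimates bidirectional optical flow and warps the neighbours;
# the midpoint average below is only a structural sketch.

def interpolate_midpoints(frames):
    """frames: list of flat frames (equal-length lists of pixel values)."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])  # synthesized frame
    out.append(frames[-1])
    return out

base = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # 3 base frames (e.g., ~11 FPS)
smooth = interpolate_midpoints(base)         # 5 frames -> ~2x playback rate
print(len(base), "->", len(smooth))
```

Chaining the pass twice (or interpolating at multiple timesteps per pair, as RIFE supports) pushes an 11 FPS base clip past 40 FPS.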

Real-Time Pipeline (MemFlow & Helios)

After consulting with our multi-agent swarm (Architect & Scholar), we've implemented several critical optimizations to push real-time video generation limits:

  • Denoising Step Reduction: By dropping the MemFlow diffusion steps down to 4, we significantly reduce GPU workload, achieving a much faster native frame generation rate while preserving the core aesthetic.
  • Dynamic Audio Steering: Integrating the Lyria Music API via a persistent WebSocket thread. Instead of hard-restarting the audio track per scene, we dynamically steer the prompt so the cinematic underscore evolves seamlessly.
  • Context Cache Clearing: Actively managing the VACE (Context Enforcement) tensors to prevent PyTorch dimension mismatches while preparing the pipeline for continuous narrative generation.
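The step-reduction and cache-clearing points can be sketched as a single generation loop: few denoising steps per frame, plus a bounded context cache so conditioning tensors never grow past a fixed window. All names here (generate_latent, CONTEXT_WINDOW) are illustrative placeholders, not the actual MemFlow or VACE APIs:

```python
# Hypothetical sketch of the loop described above: 4 denoising steps per frame
# and a bounded context cache, so VACE-style conditioning tensors keep a fixed
# shape on long runs (avoiding dimension mismatches). Names are illustrative.

NUM_STEPS = 4        # reduced from 8+ to hit real-time framerates
CONTEXT_WINDOW = 16  # max cached context frames

def generate_latent(frame_idx, context, num_steps=NUM_STEPS):
    # Placeholder for the MemFlow denoising loop; the real pipeline would run
    # the diffusion network num_steps times per frame on the GPU.
    return f"latent_{frame_idx}_steps{num_steps}"

context_cache = []
for frame_idx in range(64):
    latent = generate_latent(frame_idx, context_cache)
    context_cache.append(latent)
    # Trim oldest entries so the cache (and its tensor dims) stay fixed-size.
    if len(context_cache) > CONTEXT_WINDOW:
        context_cache = context_cache[-CONTEXT_WINDOW:]

print(len(context_cache))
```

The fixed window is what makes continuous narrative generation possible: the pipeline can run indefinitely without the context tensors reallocating or drifting in shape.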

Plans for the Future

Our roadmap for achieving zero-latency cinematic real-time generation includes massive architectural upgrades to the Helios foundation models:

  • NVIDIA Blackwell (B200) Integration: Leveraging the 8 TB/s of HBM3e bandwidth and FlashAttention-4 to shatter current memory-bound diffusion limits.
  • 3-Stage Asynchronous Pipeline: Splitting the workload across multiple GPUs using NVLink—GPU 0 handles raw Latent generation, while GPU 1 processes VAE decoding and RIFE asynchronously.
  • FP8/NVFP4 Quantization via LightX2V: Doubling matrix multiplication throughput and halving the VRAM footprint so we can run high-step generation at low-step speeds.
  • FlowGRPO Reinforcement Learning: Using GenRL to fine-tune the model with Motion Quality and Aesthetic rewards, allowing low-step renders to look structurally flawless.
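The asynchronous pipeline in the roadmap is essentially a producer/consumer chain. The sketch below models it with plain threads and queues (string placeholders stand in for tensors); in the real design each stage would run on its own GPU over NVLink:

```python
# Structural sketch of the proposed 3-stage asynchronous pipeline: latent
# generation -> VAE decode -> RIFE interpolation, connected by FIFO queues.
# Threads and string placeholders stand in for per-GPU work over NVLink.
import queue
import threading

N_FRAMES = 8
latents, decoded = queue.Queue(), queue.Queue()
final = []

def stage_latent():  # GPU 0: raw latent generation
    for i in range(N_FRAMES):
        latents.put(f"latent_{i}")
    latents.put(None)  # sentinel: stream finished

def stage_decode():  # GPU 1: VAE decoding
    while (item := latents.get()) is not None:
        decoded.put(item.replace("latent", "frame"))
    decoded.put(None)

def stage_rife():  # GPU 1 (async): RIFE interpolation / output
    while (item := decoded.get()) is not None:
        final.append(item)

threads = [threading.Thread(target=f)
           for f in (stage_latent, stage_decode, stage_rife)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{len(final)} frames flowed through all three stages")
```

Because the stages overlap, steady-state throughput is bounded by the slowest stage rather than the sum of all three, which is what lets decode and interpolation hide behind latent generation.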