Tuesday, 21 April 2026

Automated Screenshot Testing for a Python Terminal Game

I've been building a terminal-based survival game in Python using the Textual framework. It runs as a TUI (Terminal User Interface) with Rich markup, threaded game logic, and SQLite for persistence. The game is called Moon Traveler Terminal.

The problem: I needed automated screenshots for documentation, GitHub Pages, and regression testing. Taking them manually every release was not sustainable. So I built a pilot script that plays the entire game, captures 27 screenshots at key moments, and validates the output.

Here's what I learned and the problems I had to solve.

The Architecture Challenge

The game has a split-thread design:

  • Worker thread - game logic, LLM inference, time.sleep() for animations
  • Main thread - Textual UI rendering, input handling

Every print() call in the game routes through a thread-safe bridge to Textual's RichLog widget:

class UIBridge:
    def print(self, *args, **kwargs):
        # Join arguments like the built-in print, then hand the text to
        # the UI thread; call_from_thread is Textual's thread-safe hook
        text = kwargs.get("sep", " ").join(str(a) for a in args)
        self._app.call_from_thread(self._log.write, text)

Each print is immediately visible in the UI. No buffering. This is great for the player but tricky for automated testing - you never know exactly when output finishes rendering.

Step 1: Textual's Auto-Pilot

Textual has a built-in test harness. You pass an auto_pilot coroutine function to app.run() and it drives the UI programmatically. No real terminal needed.

from src.tui_app import MoonTravelerApp

async def screenshot_pilot(pilot):
    app = pilot.app

    async def take(name, desc):
        # Let pending output settle, force a repaint, settle again,
        # then export whatever the viewport currently shows
        await pilot.pause(0.5)
        app.refresh()
        await pilot.pause(0.5)
        app.save_screenshot(f"assets/{name}.svg")
        print(f"  Saved: assets/{name}.svg — {desc}")

    async def send(text, wait=4.0):
        # Simulate typing a command and pressing Enter, then give
        # the worker thread time to process it
        app.command_queue.put(text)
        await pilot.pause(wait)

    # Take title screenshot
    await pilot.pause(3.0)
    await take("tui-title", "Title screen")

    # Play the game
    await send("look", wait=3.0)
    await take("tui-look", "Look at crash site")

    await send("scan", wait=3.0)
    await take("tui-scan", "Scan results")

app = MoonTravelerApp()
app.run(auto_pilot=screenshot_pilot)

Three ways to talk to the game:

  • command_queue.put(text) - injects a command (like typing + Enter)
  • bridge.push_response(text) - answers interactive prompts (y/n, menus)
  • wait_for_ask_mode() - polls until the game blocks on a prompt
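
The Step 2 snippets call a respond() helper that never appears in the pilot code above. Here is a minimal sketch of what such a helper might look like, with stand-ins for the bridge and pilot so it runs on its own (StubBridge, StubPilot, and make_respond are my names, not the script's):

```python
import asyncio

class StubBridge:
    """Stand-in for the game's UIBridge prompt channel (hypothetical)."""
    def __init__(self):
        self.pushed = []

    def push_response(self, text):
        self.pushed.append(text)

class StubPilot:
    """Stand-in for Textual's Pilot; the real pause() waits wall-clock time."""
    async def pause(self, delay=0.0):
        await asyncio.sleep(0)

def make_respond(bridge, pilot):
    async def respond(text, wait=2.0):
        # Answer the prompt the game is currently blocked on, then
        # give the worker thread `wait` seconds to act on it
        bridge.push_response(text)
        await pilot.pause(wait)
    return respond

# In the real pilot this would be: respond = make_respond(app.bridge, pilot)
bridge, pilot = StubBridge(), StubPilot()
asyncio.run(make_respond(bridge, pilot)("1", wait=0.0))
```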

Step 2: Handling Branching Game Flows

The game has branching prompts - new game vs load, difficulty selection, player name. You cannot just hardcode sleep timers. You have to wait for the game to actually ask a question.

async def wait_for_ask_mode(timeout=10.0):
    """Wait until the game blocks on a prompt."""
    elapsed = 0.0
    while elapsed < timeout:
        if app._ask_mode:
            return True
        await pilot.pause(0.3)
        elapsed += 0.3
    return False

# Navigate: New Game -> Easy -> Player name
if await wait_for_ask_mode(timeout=5.0):
    await respond("1", wait=2.0)      # "New Game"
if await wait_for_ask_mode(timeout=5.0):
    await respond("1", wait=2.0)      # "Easy" difficulty
if await wait_for_ask_mode(timeout=5.0):
    await respond("Screenshot", wait=3.0)  # Player name

This pattern made the script reliable across different game states - fresh install with no saves, existing saves, different model loading times.

Problem 1: Screenshots Only Capture the Viewport

This was the first real surprise. Textual's save_screenshot() exports what's currently visible in the viewport - about 24 lines. Content that scrolled off the top is gone from the SVG.

I built a narrative intro that's 18 lines long. By the time the boot sequence finishes, the heading at the top has scrolled away. My first validation checked for "FLIGHT RECORDER" in the screenshot - it failed because the heading was above the viewport.

The fix: Always validate against text near the bottom of each screen, not the top.

import re

def _svg_text(path):
    """Extract visible text from an SVG screenshot."""
    # Textual's SVG export encodes spaces as &#160; and splits text
    # across many elements, so grab every text node and normalise it
    with open(path) as f:
        return " ".join(
            t.replace("&#160;", " ").strip()
            for t in re.findall(r">([^<]+)<", f.read())
            if t.strip() and len(t.strip()) > 2  # skip tiny layout fragments
        )

# Validate bottom-of-screen content, not headers
validations = [
    ("tui-intro", "rescue", "Intro narrative visible"),
    ("tui-help", "drone", "Help shows commands"),
    ("tui-victory", "Grade", "Victory shows score"),
    ("tui-scores", "Ripley", "Leaderboard has entries"),
]

for name, expected, desc in validations:
    text = _svg_text(f"assets/{name}.svg")
    status = "PASS" if expected.lower() in text.lower() else "FAIL"
    print(f"  {status}: {desc}")
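
As written, the loop only prints PASS/FAIL; in CI you also want a nonzero exit code when a screenshot regresses. A sketch of that wrapper (run_validations and get_text are my names; in the real script get_text would wrap _svg_text, here dummy data stands in for the extracted SVG text):

```python
import sys

def run_validations(validations, get_text):
    """Run all checks; return the names of screenshots that failed."""
    failures = []
    for name, expected, desc in validations:
        ok = expected.lower() in get_text(name).lower()
        print(f"  {'PASS' if ok else 'FAIL'}: {desc}")
        if not ok:
            failures.append(name)
    return failures

# Dummy data in place of real SVG extraction
texts = {"tui-help": "commands: look scan travel drone talk repair"}
failures = run_validations(
    [("tui-help", "drone", "Help shows commands")],
    texts.get,
)
if failures:
    sys.exit(1)  # fail the CI job when any screenshot regresses
```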

Problem 2: SQLite Data Pollution

The screenshot script seeds fake leaderboard entries so the scores screenshot is not empty:

from src.save_load import record_score

record_score(820, "A", True, "short", 18, 1200, 3, 12345,
             player_name="Ripley")
record_score(650, "B", True, "medium", 35, 2400, 2, 67890,
             player_name="Dallas")
record_score(410, "C", False, "long", 12, 900, 1, 11111,
             player_name="Lambert")

The problem: these entries persisted across runs. After running the script 5 times, I had 15 fake entries in my real database mixed with actual play data.

The fix: Clean up test data on exit, keyed by player name:

import atexit
import sqlite3

def _cleanup_seeded_scores():
    with sqlite3.connect(str(db_path)) as conn:
        conn.execute(
            "DELETE FROM leaderboard WHERE player_name "
            "IN ('Ripley', 'Dallas', 'Lambert', 'Screenshot')"
        )

# Registered at startup, so the seeded rows are deleted even if
# the script crashes partway through a run
atexit.register(_cleanup_seeded_scores)

I also added state logging at every screenshot checkpoint - player location, inventory, repair progress, and full DB row counts. When a screenshot looks wrong, the debug log tells you exactly what the game state was at capture time:

def log_game_state(ctx, label=""):
    p = ctx.player
    log(f"[{label}] Loc={p.location_name}")
    log(f"[{label}] Food={p.food:.0f}% Water={p.water:.0f}%")
    log(f"[{label}] Inventory={dict(p.inventory)}")

def log_db_state(label=""):
    with sqlite3.connect(str(db_path)) as conn:
        for table in ["saves", "chat_history", "leaderboard"]:
            n = conn.execute(f"SELECT COUNT(*) FROM [{table}]").fetchone()[0]
            log(f"[DB {label}] {table}: {n} rows")

Problem 3: Animations Break Script Timing

I added ASCII frame animations to the game - scan sweeps, travel progress bars, hazard flashes. Each one adds 0.3 to 1.0 seconds of time.sleep() in the worker thread. The screenshot script's fixed await pilot.pause(3.0) durations were suddenly too tight.

The wrong fix would be to disable animations in the config file - that persists and would turn off animations for the user's next real play session.

The right fix: A runtime kill switch that only lasts for the current process:

# src/animations.py
_force_disabled = False

def force_disable():
    """Session-only. Does NOT persist to config."""
    global _force_disabled
    _force_disabled = True

def _enabled():
    if _force_disabled:
        return False
    from src.config import get_animations_enabled
    return get_animations_enabled()

The game's --super mode (used by test scripts) calls force_disable() at startup. Real players still get animations. Test scripts get deterministic timing.
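
For illustration, this is roughly how an animation helper can gate on the switch. Note that play_frames, its sink parameter, and the return value are my invention, not the game's actual API:

```python
import time

_force_disabled = False  # module-level flag, as in src/animations.py

def _enabled():
    # The real version also consults the persisted config;
    # the session-only switch always wins
    return not _force_disabled

def play_frames(frames, delay=0.3, sink=print):
    """Render each ASCII frame in turn; a no-op when disabled."""
    if not _enabled():
        return 0  # deterministic timing for test scripts
    for frame in frames:
        sink(frame)
        time.sleep(delay)
    return len(frames)

played = play_frames(["[*    ]", "[ *   ]", "[  *  ]"], delay=0.0)
```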

Problem 4: Capturing the Game Context

The screenshot script needs access to the live game state (player location, creatures, inventory) to make smart decisions - like finding a creature to talk to. But the game context only exists inside the worker thread.

Solution: monkey-patch the game loop to capture the context object:

import threading
from src import game

_game_ctx = None
_ctx_ready = threading.Event()
_original_game_loop = game.game_loop

def _patched_game_loop(ctx):
    global _game_ctx
    _game_ctx = ctx
    _ctx_ready.set()        # Signal that context is ready
    return _original_game_loop(ctx)

game.game_loop = _patched_game_loop

# Later in the pilot:
_ctx_ready.wait(timeout=30)
ctx = _game_ctx

# Now we can query live game state
creature_loc = None
for c in ctx.creatures:
    if c.location_name in ctx.player.known_locations:
        creature_loc = c.location_name
        break

The End Result

The full script plays an entire game: new game, explore, scan, travel to creatures, have LLM-powered conversations, escort allies back to the ship, repair and win. 27 screenshots, 10 validated, all in about 3 minutes.

$ uv run python scripts/tui_screenshots.py

Taking TUI screenshots...
  Saved: assets/tui-title.svg — Title screen
  Saved: assets/tui-intro.svg — Flight recorder narrative
  Saved: assets/tui-crash-site.svg — Crash site after boot
  Saved: assets/tui-look.svg — Look at crash site
  ...
  Saved: assets/tui-victory.svg — Victory screen
Validation: 10 passed, 0 failed
  Cleaned up seeded leaderboard entries
Done! Screenshots saved to assets/

Lessons Learned

  1. Textual's auto-pilot is powerful but you need polling patterns like wait_for_ask_mode() for branching flows. Fixed sleeps alone will not work.
  2. Viewport screenshots miss scrollback. Validate against content near the bottom of the screen, never the top.
  3. Clean up test data. If your script seeds a database, delete those rows on exit. Key by a known player name so you can always find them.
  4. Animations need a runtime kill switch for automated scripts. Never persist test-only config changes.
  5. Log game state at capture time. When a screenshot fails validation, you need the context - not just a failed assertion.
  6. Monkey-patching the game loop to capture the context object is ugly but effective. It lets the pilot script make decisions based on live game state.

The game and all the testing scripts are open source:

https://github.com/elephantatech/moon_traveler

https://elephantatech.github.io/moon_traveler/