
Write Your First Mobile E2E Test in YAML with FinalRun


Writing a mobile end-to-end test usually means learning a selector language, fighting flaky locators, and rebuilding the whole suite every time your UI shifts a pixel. FinalRun takes a different route: you describe the test in plain English inside a YAML file, and an AI agent drives the app on a real device or emulator — tapping, typing, swiping, and verifying state.

This post walks through installing FinalRun, writing your first test, and running it on Android — start to finish in about 10 minutes.

FinalRun is open source at github.com/final-run/finalrun-agent.

Why natural-language E2E tests?

Traditional mobile test frameworks — Appium, Espresso, XCUITest, even newer tools like Maestro — rely on brittle selectors (IDs, XPaths, accessibility labels). When a developer renames a button or restyles a screen, tests break even though the feature still works.

FinalRun tests describe intent, not implementation:

steps:
  - Tap the Get Started button.
  - Scroll to the bottom of the feed.

An AI model (Gemini, ChatGPT, or Claude — your choice, your API key) sees the screen and figures out which element matches "the Get Started button." The test keeps working even if you rename the button, move it, or restyle it.

Prerequisites

  • macOS or Linux

  • An Android emulator or iOS simulator running, or a real device connected. A standard Android Studio or Xcode installation sets up emulators and simulators for you.

  • An API key from one of: Google (Gemini), OpenAI, or Anthropic

Step 1: Install

curl -fsSL https://raw.githubusercontent.com/final-run/finalrun-agent/main/scripts/install.sh | bash

This installs Node.js (if needed), the finalrun CLI, and platform tools. Verify your setup:

finalrun doctor

doctor reports missing dependencies — Android SDK, Xcode command-line tools, ADB, and so on — with fix hints for each.

Step 2: Add your API key (BYOK)

Your tokens are billed directly by the AI provider; FinalRun never proxies them. Keep your API key in your project's .env file, or export it from your shell profile (e.g. ~/.zshrc).

echo "GOOGLE_API_KEY=your-key-here" > .env
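If you prefer a machine-wide key over a per-project .env, appending an export to your shell profile works too (shown here for zsh; adjust the file name for your shell):

```shell
# Append the key to your zsh profile so every new shell session has it.
# Replace your-key-here with your real key.
echo 'export GOOGLE_API_KEY=your-key-here' >> ~/.zshrc

# Load it into the current session without opening a new terminal.
source ~/.zshrc
```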

Provider                        Environment variable
google/gemini-3-flash-preview   GOOGLE_API_KEY
openai/gpt-5.4-mini             OPENAI_API_KEY
anthropic/claude-sonnet-4-6     ANTHROPIC_API_KEY

Step 3: Write your first test

Create .finalrun/tests/onboarding/launch.yaml:

name: launch_smoke
description: Verify the app launches and the home screen is reachable from onboarding.

steps:
  - Launch the app.
  - Tap the Get Started button.

expected_state:
  - The home screen is visible.
  - The bottom navigation bar shows the Home, Search, and Profile tabs.

A few things to notice:

  • steps are ordered, natural-language instructions. Write them the way you'd describe the test to a teammate.

  • expected_state is what the agent verifies at the end — the test fails if any assertion doesn't hold.

  • No selectors, no IDs, no setup files — just describe what a human would do and see.
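For contrast, here is a slightly richer spec that also exercises typing and swiping. The feature, tab, and field names are hypothetical; it just follows the same schema (name, description, steps, expected_state) as the example above:

```yaml
name: search_smoke
description: Verify that searching returns results.

steps:
  - Launch the app.
  - Tap the Search tab in the bottom navigation bar.
  - Type "coffee" into the search field.
  - Swipe up to scroll through the results.

expected_state:
  - At least one search result is visible.
```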

Step 4: Validate the workspace

finalrun check

check parses every spec, resolves placeholders, confirms your app identity is configured, and lists problems before you burn API tokens on a broken run.

Step 5: Run it

finalrun test onboarding/launch.yaml --platform android --model google/gemini-3-flash-preview

What happens next:

  1. FinalRun boots your emulator (or uses a connected device).

  2. The agent installs and launches your app.

  3. For each step, it captures the screen, asks the model what to do, and performs the tap/type/swipe.

  4. At the end, it verifies every expected_state assertion.

  5. You get a pass/fail report with video recording, device logs, and per-step screenshots.

Want to group tests into a suite? Create .finalrun/suites/auth_smoke.yaml:

name: auth_smoke
description: Covers the authentication smoke scenarios.
tests:
  - auth/login.yaml
  - auth/logout.yaml

Then:

finalrun suite auth_smoke.yaml --platform android --model google/gemini-3-flash-preview

Step 6: Let your AI coding agent write tests for you

FinalRun ships skills that plug into Claude Code, Cursor, or any AI coding agent. From your AI agent:

/finalrun-generate-test Generate tests for the authentication feature —
cover login with valid credentials, login with wrong password, and logout

Your agent reads your source code, infers the package name, proposes a test plan, writes the YAML, and validates the workspace — without you touching a selector.
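The output of that prompt might look something like the spec below (one of the three requested scenarios; the field names and error wording are hypothetical, and your agent's actual output will reflect your app's UI):

```yaml
name: login_wrong_password
description: Logging in with a wrong password shows an error and keeps the user on the login screen.

steps:
  - Launch the app.
  - Type "user@example.com" into the email field.
  - Type "not-my-password" into the password field.
  - Tap the Log In button.

expected_state:
  - An error message about invalid credentials is visible.
  - The login screen is still shown.
```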

Step 7: Close the loop — generate, run, diagnose, fix

Authoring a test is only half of the job. Most teams lose time after the red run — reading logs, deciding whether the bug is in the app or the test, applying a fix, and re-running. FinalRun ships a second skill, /finalrun-test-and-fix, that owns that whole loop.

From your AI agent:

/finalrun-test-and-fix Verify and fix the login feature end-to-end

Here's what it does on top of /finalrun-generate-test:

  1. Explores and generates tests for the feature (delegating to /finalrun-generate-test), while recording any suspicious code it sees as hypotheses instead of fixing it on the spot.

  2. Runs the narrow spec you just touched via /finalrun-use-cli, asking you first whether to rebuild the app or run against an existing artifact — so you never waste a run on a stale build.

  3. Triages on failure by reading the exact artifacts the CLI emits — result.json, the failing step's entry in actions/, the screenshot at that moment, device.log, and recording.* — instead of guessing from the YAML.

  4. Classifies the failure into buckets: app bug, over-strict test wording, assertion on an ephemeral toast, missing env/secret, or a real crash. Each bucket points at a specific fix target.

  5. Fixes the app first, the test second. The default hypothesis is that the app does not meet the acceptance criteria — the skill only edits the test when the requirement actually changed or the assertion was wrong. It will never relax an assertion just to force green.

  6. Re-runs until green or legitimately blocked. A missing secret, unavailable emulator, or explicit user opt-out are the only valid stopping conditions short of a passing run.
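Pulling together the artifact names from step 3, a run's output directory looks roughly like this (the layout is a sketch; only the file names come from the description above):

```
<run-output-dir>/
├── result.json    # machine-readable pass/fail with per-step results
├── actions/       # one entry per agent action, including the failing step
├── device.log     # device logs captured during the run
├── recording.*    # video recording of the run
└── per-step screenshots, including the moment of failure
```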

In practice this means a single chat message — "verify and fix the checkout flow" — can take you from no coverage to green test + fixed bug, with the AI agent doing the log-reading work that normally eats an afternoon.

Why this matters for your team

  • Tests survive refactors. Rename a button, move a screen — intent-level tests keep passing.

  • No selector-wrangling tax. Your team stops maintaining XPath cheat-sheets.

  • PRs ship with tests. Because an AI agent writes them from the same diff, coverage lands with the feature, not three sprints later.

  • Video, debug logs, and network logs on every run. Debugging a failure means watching a 30-second clip, not replaying a stack trace; or just hand the artifacts to Claude.

Try it

If you hit a snag, open an issue on GitHub or reach out to me (Ashish) on Slack.