March 20, 2026·10 min read

Mobile E2E: smoke tests that survive refactors

Full E2E coverage sounds noble until every UI tweak breaks selectors. I aim for a short smoke suite on real CI devices/emulators, then lean on integration tests for edge cases. The win is confidence on the paths that pay rent—login, paywall, core loop—not every permutation.

Pick stable selectors

Accessibility labels and testIDs beat text that marketing rewrites. Coordinate selector names with the component names in your design system so both change together.

Avoid sleeps; wait for conditions.
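To make "wait for conditions" concrete, here is a minimal sketch of a polling helper. The name `waitForCondition` and its options are hypothetical; most E2E frameworks ship an equivalent, and the built-in one is usually the better choice when it exists.

```typescript
// Hypothetical helper: poll a condition until it holds or a deadline passes.
type WaitOptions = { timeoutMs?: number; intervalMs?: number };

async function waitForCondition(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 100 }: WaitOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // Check first, so an already-true condition returns without any delay.
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Short pause between polls; the total wait is bounded by the deadline.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Unlike a fixed sleep, this returns as soon as the condition is true and fails with a clear message when it never becomes true.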

Auth in tests

Seed test accounts, mock backends, or short-lived tokens—whatever matches how flaky your staging is. Document login steps once, reuse helpers.

Don’t commit real passwords; CI secrets exist for a reason.

Cost vs value

Five reliable tests beat fifty flaky ones that everyone ignores. Run smoke on PR; run heavier nightly if needed.

Screenshots on failure save hours of reproduction.

What “smoke” should prove

App launches cold, primary navigation works, and one revenue- or trust-critical path completes—often sign-in and one “happy path” transaction or save. Anything more is optional until the basics stop flaking.

When a smoke test fails, triage: infra flake vs real regression. Track flake rate; if it is above a few percent, fix the harness before you add cases.
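One way to track flake rate is sketched below. It assumes a "flake" means a run that failed on the first attempt and passed on retry against the same commit; the `RunResult` shape and the 3% default threshold are illustrative assumptions, not figures from any particular CI system.

```typescript
// Hypothetical record of one test execution on CI.
type RunResult = { test: string; passedFirstTry: boolean; passedOnRetry: boolean };

// Share of runs that failed first and then passed on retry (assumed flake definition).
function flakeRate(runs: RunResult[]): number {
  if (runs.length === 0) return 0;
  const flaky = runs.filter((r) => !r.passedFirstTry && r.passedOnRetry).length;
  return flaky / runs.length;
}

// "A few percent" is modeled here as a 3% default; tune it to your own suite.
function shouldFixHarnessFirst(runs: RunResult[], threshold = 0.03): boolean {
  return flakeRate(runs) > threshold;
}
```

Runs that fail on both attempts are not counted as flakes here; those are more likely real regressions and belong in triage.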

Selecting smoke paths deliberately

Pick flows that cover authentication, primary navigation, and one revenue-critical action. Say no to forty-step scripts that break weekly. Stability beats coverage early; expand once smoke is green consistently. Involve product in choosing paths—they know what ‘must never break’ means commercially.

Selectors and testability hooks

Standardize on `testID` and accessibility labels; avoid text selectors tied to marketing copy. Coordinate naming with design systems. For web views, expose bridges for critical actions or avoid E2E on them in favor of integration tests at native boundaries. Document selector ownership so refactors update tests in the same PR.

Test data and environments

Seed accounts, reset state between runs, and isolate flaky network dependencies with mocks or dedicated staging stacks. Rotate credentials; never hardcode secrets. Make tests parallel-safe when possible—serial suites slow feedback. Clean app data on failure to prevent cascading flakes.

CI infrastructure

Run on real devices or reliable emulators, and know which of the two you trust. Pin OS versions and emulator images. Collect videos/screenshots on failure. Cache dependencies smartly without hiding issues. Parallelize jobs within budget; queue time versus flake risk is a tradeoff. Monitor CI costs as the suite grows.

Flake management

Track flake rate; set thresholds that block merges or require investigation. Quarantine consistently failing tests instead of ignoring them—visibility matters. Root causes often include timing, animations, and network—prefer deterministic waits. Occasionally retire tests that cost more than they save.

Balancing E2E with other tests

Unit and integration tests catch most logic bugs faster and cheaper. Use E2E for critical paths and user journeys that integration cannot approximate. Avoid duplicating coverage across layers without reason—maintenance tax accumulates.

Review cadence

Monthly, review failing tests, runtime trends, and new features lacking coverage. Update selectors when UI changes. Celebrate when flake rate drops—morale matters for test maintenance.

Shipping and reliability habits (1)

Testing onboarding changes with funnel metrics beats debating opinions. Segment by acquisition channel and platform; behavior differs across them. Skip paths must be genuine: dark patterns may win short-term metrics and destroy brand trust. Localization length tests prevent clipped CTAs in verbose languages.

Background execution policies change with OS updates, so revalidate after major iOS and Android releases. Misused background modes invite store rejection. Persist user work frequently; the OS can kill your app at any time after backgrounding. Uploads and timers should tolerate pause and resume without corrupting state.

Platform differences worth rehearsing (2)

Project structure should make ownership obvious: routes as backbone, feature folders for product areas, thin screens, and shared infrastructure that is deliberately named. Refactor in vertical slices with device-tested releases—big-bang rewrites without tests are how teams lose weeks.

Helper modules concentrate glue code—storage, navigation, permissions—so screens stay readable. Split helpers by topic before files become merge-conflict magnets, and document each module’s contract. Good helpers answer ‘where do we save tokens?’ in one glance—not ‘ask Sarah.’

Security, privacy, and data handling (3)

Internationalization is a product feature, not a string swap. Plural rules, RTL layout, and locale-aware formatting change behavior—not just copy length. Pseudolocale helps find clipping early, but real Arabic and German QA catches nuance. Avoid concatenating translated fragments; context matters. Document glossary terms so translators do not invent inconsistent product names.

Monorepos amplify both leverage and failure modes: duplicate React versions cause mysterious hook errors, and Metro misconfiguration blocks local packages from resolving. Invest in workspace discipline—single React version, documented `watchFolders`, and lint rules preventing packages from importing app navigators accidentally. CI must mirror local installs; ‘works on my laptop’ with different package managers is a time bomb.

Performance and measurement discipline (4)

Error boundaries catch render failures, not native crashes or async mistakes. Pair them with platform crash reporting and structured client logs. Fallback UI should include build identifiers and humane copy—never raw stack traces for end users. Test fallbacks with screen readers; a broken error screen is still broken UX.

Storage is not a database. AsyncStorage and MMKV excel at key-value preferences; SQLite or remote APIs belong elsewhere for relational data. Migrations should be incremental, logged, and non-blocking for UI. Secure tokens need secure storage when your model demands it—speed is not a substitute for correctness on auth material.
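As a rough illustration of "incremental and logged", the sketch below runs pending migrations in version order against a synchronous key-value interface. `KeyValueStore`, `Migration`, and the `schemaVersion` key are hypothetical names, and keeping the work off the UI's critical path is left to the caller.

```typescript
// Hypothetical minimal key-value interface (synchronous for brevity).
type KeyValueStore = {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
};

type Migration = {
  version: number;
  description: string;
  run(store: KeyValueStore): void;
};

// Applies every migration newer than the stored version, one step at a time.
function runMigrations(
  store: KeyValueStore,
  migrations: Migration[],
  log: (message: string) => void = console.log,
): number {
  const current = Number(store.get("schemaVersion") ?? "0");
  const pending = migrations
    .filter((m) => m.version > current)
    .sort((a, b) => a.version - b.version);
  for (const migration of pending) {
    log(`migrating to v${migration.version}: ${migration.description}`);
    migration.run(store);
    // Record progress after each step so a crash resumes from the last good version.
    store.set("schemaVersion", String(migration.version));
  }
  return pending.length;
}
```

Writing the version after every step, rather than once at the end, is what makes an interrupted run safe to repeat.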

Team process and long-term maintenance (5)

Type-safe navigation pays off when routes multiply. Keep param lists near navigators, validate external URLs, and avoid serializing non-JSON-safe values through params. Renaming routes is a cross-cutting change—update analytics, push payloads, and E2E selectors in the same release train.

FlatList performance is configuration as much as code. Stable keys, reasonable `windowSize`, and memoized rows beat switching to a different list primitive blindly. Nested virtualized lists are a last resort—redesign first. Profile with production-like data volumes; dev placeholders lie.

Shipping and reliability habits (6)

Expo SDK upgrades are integration projects: `npx expo-doctor` (the successor to the legacy `expo doctor` command), aligned community packages, regenerated native projects, and device smoke tests for camera, push, and IAP. Freeze unrelated native refactors during the upgrade window and keep rollback paths hot. Document surprises for the next upgrade while memory is fresh.

Hermes versus JSC is not a lifestyle choice—profile your app. Hermes usually wins on startup; some libraries still assume JSC quirks. Engine toggles are not substitutes for fixing quadratic renders in your own code. Upgrade notes matter: Intl support and debugging tooling evolve.

Platform differences worth rehearsing (7)

Design tokens and semantic colors make dark mode and rebrands feasible. Mixing three styling systems doubles migration cost—pick a primary approach and draw boundaries. Runtime CSS-in-JS can cost frame time on hot screens—profile before adopting wholesale.

ScrollView versus FlatList is a data-volume question. Small static content belongs in ScrollView; long feeds belong in virtualized lists. Nested scrollables need explicit height contracts—redesign beats fighting physics. Document intentional choices so future refactors do not ‘optimize’ blindly.

Security, privacy, and data handling (8)

Shipping React Native features is less about any single API and more about the system around it: typed boundaries, predictable navigation, and telemetry that tells you what broke in production. Prefer boring, explicit modules over clever metaprogramming that the next hire cannot grep. When platform vendors change behavior in point releases, your defense is automated smoke tests on real devices and a short internal changelog of native assumptions you rely on.

Performance work should start with measurement, not instinct. Watch the JS thread and the UI thread separately; they bottleneck differently. Lists, images, and animations dominate most regressions, so optimize those before micro-optimizing pure functions. Hermes, JSC, and bridge internals evolve; re-profile after every major upgrade instead of trusting last year's numbers. Battery and thermal throttling on mid-range devices reveal issues flagship phones hide.

Performance and measurement discipline (9)

Native modules are product decisions disguised as engineering tasks. You inherit Xcode and Gradle upgrades, store review scrutiny, and security obligations. Prefer maintained Expo modules and config plugins before writing JNI or Swift glue from scratch. When you must go native, budget pairing time with platform specialists and write runbooks for on-call—crashes in native code bypass many JS safeguards.

Deep links are a cross-team system: marketing URLs, hosted association files, entitlements, router params, and analytics query preservation. Debug with structured logging of raw URLs (scrub secrets) and reproduce cold-start races with auth hydration. Staging and production should be obviously separated—accidentally opening prod from a QA link erodes trust and pollutes data.
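For the "log raw URLs, scrub secrets" step, a minimal sketch might look like the following. The list of sensitive parameter names is an assumption to adapt to your own links, and the code assumes a spec-compliant `URL` implementation (some React Native versions may need a polyfill for that).

```typescript
// Assumed list of query parameters that must never reach logs.
const SENSITIVE_PARAMS = new Set(["token", "code", "password"]);

// Returns a loggable copy of the URL with sensitive query values redacted.
function scrubUrlForLogging(rawUrl: string): string {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    // Never echo an unparseable string back into logs; it may contain secrets.
    return "[unparseable url]";
  }
  // Collect keys first so the params are not mutated while iterating.
  const toRedact: string[] = [];
  parsed.searchParams.forEach((_value, key) => {
    if (SENSITIVE_PARAMS.has(key.toLowerCase())) toRedact.push(key);
  });
  for (const key of toRedact) parsed.searchParams.set(key, "REDACTED");
  return parsed.toString();
}
```

Non-sensitive parameters such as campaign tags pass through untouched, so the log line still supports analytics debugging.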

Team process and long-term maintenance (10)

WebViews are untrusted browsers inside your app. Validate `postMessage` payloads, lock navigation to expected hosts, and prefer system-browser auth flows when OAuth security demands it. Third-party JavaScript can change without your deploy—treat XSS in web as bridge compromise risk. Clear storage on logout and rate-limit message handlers.
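The two checks named above, validating message payloads and locking navigation to expected hosts, can be sketched as plain functions. The message types, the `orderId` length limit, and the host `app.example.com` are all hypothetical; a spec-compliant `URL` is assumed.

```typescript
// Hypothetical set of messages the web content is allowed to send.
type BridgeMessage =
  | { type: "checkout_complete"; orderId: string }
  | { type: "close" };

// Assumed allowlist; replace with the hosts you actually serve.
const ALLOWED_HOSTS = new Set(["app.example.com"]);

// Only https URLs on an allowlisted host may load inside the WebView.
function isAllowedNavigation(url: string): boolean {
  try {
    const parsed = new URL(url);
    return parsed.protocol === "https:" && ALLOWED_HOSTS.has(parsed.hostname);
  } catch {
    return false;
  }
}

// Parses an untrusted string and returns a message only if it matches a known shape.
function parseBridgeMessage(raw: string): BridgeMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;
  if (msg.type === "close") return { type: "close" };
  if (
    msg.type === "checkout_complete" &&
    typeof msg.orderId === "string" &&
    msg.orderId.length > 0 &&
    msg.orderId.length <= 64
  ) {
    // Rebuild the object so unexpected extra fields are dropped.
    return { type: "checkout_complete", orderId: msg.orderId };
  }
  return null;
}
```

With `react-native-webview`, these would typically be called from `onShouldStartLoadWithRequest` (with the request's `url`) and `onMessage` (with `event.nativeEvent.data`); anything that returns `false` or `null` is dropped.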

E2E tests should protect revenue paths, not every permutation. Stable selectors (`testID`) beat text that marketing rewrites weekly. Flake management is a feature: quarantine, fix root causes, and keep smoke suites green on CI devices.

Shipping and reliability habits (11)

Security and privacy expectations move faster than roadmaps. Treat analytics, crash, and attribution SDKs as part of your threat model: initialize them deliberately, document data flows, and verify ‘off’ truly stops network calls. Client-side secrets are public secrets—anything shipped in an APK or IPA should be assumed extractable. Pair mobile changes with backend policies so authorization remains consistent across platforms.

Platform differences worth rehearsing (12)

Push notifications walk a line between helpful and intrusive. Prime users with context, respect notification channels on Android, and measure opt-outs after campaigns—spikes mean copy or frequency problems. Payload design affects background behavior; test killed and locked-device states. Tokens belong server-side with rotation strategies; never treat the client as authoritative for subscription state.

Security, privacy, and data handling (13)

OTA updates are powerful and risky: runtime compatibility, rollback plans, and user-visible behavior changes need governance. Channels should map to release maturity—staging versus production—with access controls on publish credentials. Large assets over cellular need care; silent failures erode trust more than a frank ‘update failed, retry’ message.

Performance and measurement discipline (14)

JWT and session refresh flows need single-flight refresh, clear logout semantics, and secure storage for refresh tokens when appropriate. Parallel 401s should not stampede refresh endpoints. Clock skew and biometrics policies belong in explicit product decisions, not accidental implementation details.
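Single-flight refresh can be sketched as a wrapper that shares one in-progress promise among all callers. `createSingleFlightRefresh` and the `Tokens` shape are hypothetical names; the actual refresh call is injected, so nothing here assumes a particular HTTP client or backend.

```typescript
type Tokens = { accessToken: string; refreshToken: string };

// Wraps a refresh function so concurrent callers share one in-flight request.
function createSingleFlightRefresh(doRefresh: () => Promise<Tokens>): () => Promise<Tokens> {
  let inFlight: Promise<Tokens> | null = null;

  return function refresh(): Promise<Tokens> {
    if (inFlight !== null) return inFlight; // join the request already running

    const request = doRefresh();
    inFlight = request;
    // Clear the slot on success or failure so the next expiry triggers a new refresh.
    const clear = () => {
      if (inFlight === request) inFlight = null;
    };
    request.then(clear, clear);
    return request;
  };
}
```

When several requests receive a 401 at once, each calls `refresh()` and awaits the same promise, so the refresh endpoint sees one call instead of a stampede. A failed refresh rejects for every waiting caller, which is the natural place to hang logout logic.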

In-app purchases require server validation, restore flows, and support tooling that respects privacy. Sandbox quirks are normal—budget QA time. Subscriptions interact with family sharing, regional pricing, and refunds; engineering must stay aligned with finance and legal narratives users see in receipts.

Team process and long-term maintenance (15)

Analytics schema governance prevents warehouse disasters: version events, avoid high-cardinality strings, and align names across iOS, Android, and web. Consent gating must stop network calls, not just UI. Separate dev and prod projects to avoid polluting dashboards.
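A small sketch of consent gating at the network boundary, with versioned events: `createTracker`, `send`, and `hasConsent` are hypothetical names, and `send` stands in for whatever actually makes the network call.

```typescript
// Hypothetical event shape; schemaVersion lets the warehouse tell old and new payloads apart.
type AnalyticsEvent = {
  name: string;
  schemaVersion: number;
  props: Record<string, string | number | boolean>;
};

// Builds a track function that checks consent before anything is sent.
function createTracker(
  send: (event: AnalyticsEvent) => void,
  hasConsent: () => boolean,
): (event: AnalyticsEvent) => boolean {
  return function track(event: AnalyticsEvent): boolean {
    // Consent is read at call time, so a withdrawal takes effect immediately.
    if (!hasConsent()) return false;
    send(event);
    return true;
  };
}
```

Because the gate wraps `send` itself, hiding a toggle in the UI cannot accidentally leave events flowing.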

Shipping and reliability habits (16)

Environment variables should be classified: public-by-design, sensitive-with-mitigations, or never-on-device. `EXPO_PUBLIC_` values are extractable—treat them that way. Align env handling across EAS profiles and local dev; fail fast when keys are missing instead of shipping undefined behavior.
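Failing fast on missing keys could look like this sketch. `requireEnv` is a hypothetical helper that takes explicit values rather than reading `process.env` dynamically, on the assumption that your bundler inlines `EXPO_PUBLIC_` variables only when they are referenced statically.

```typescript
// Throws at startup if any expected value is missing or empty.
function requireEnv<K extends string>(
  values: Record<K, string | undefined>,
): Record<K, string> {
  const keys = Object.keys(values) as K[];
  const missing = keys.filter((key) => values[key] === undefined || values[key] === "");
  if (missing.length > 0) {
    // Name every missing key at once so a broken profile is fixed in one pass.
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const resolved = {} as Record<K, string>;
  for (const key of keys) resolved[key] = values[key] as string;
  return resolved;
}
```

A call site might read `requireEnv({ EXPO_PUBLIC_API_URL: process.env.EXPO_PUBLIC_API_URL })`, where the variable name is only an example. Running it once at app start turns a silent `undefined` into an immediate, descriptive crash in every build profile.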
