Live text from the camera
In a typical Expo document-scanning flow, the live preview shows roughly where printed text sits on the page, and a separate capture step runs a more thorough recognition pass whose transcript can be handed to the next screen or feature in your app. The wording stays product-agnostic so you can map it onto your own files.
Single-file reference (download)
Below is a teaching bundle that keeps every step in one React Native file: OCR helpers, image prep for the final pass, mapping text boxes onto the preview, the timed live loop, and the shutter handoff to whatever you do next with the text. It mirrors a flow that many teams split across modules, so treat it as a reference rather than a drop-in route. After you download it, rewire imports (`ScreenHeader`, `ROUTES`, theme, and any storage or navigation helpers) and add the `scan.*` / `a11y.*` translation keys your UI expects, or change those calls to your own strings.
Packages: `expo-camera`, `expo-text-extractor`, `expo-image-manipulator`, `expo-file-system`, `expo-router`, `react-i18next`, and your safe-area provider. The sample uses `navigate` from an Expo Router internal path; prefer your project’s supported navigation helper, because internal paths can change when you upgrade SDKs.
Demo screenshots
From a reference Expo app using on-device OCR (`expo-text-extractor` and `expo-camera`).


What people see
The camera fills the screen. Every so often, the app takes a quick, lightweight snapshot only to locate text regions. Those regions appear as simple corner brackets on the preview so you can tell the system is “seeing” the page. That pass is meant to guide alignment, not to be perfect. When you are ready, one deliberate capture runs a fuller read of the image; the extracted text can then open on a follow-up screen already filled in.
How the flow is usually wired
- The preview knows its width and height on screen so text boxes from the image can be scaled and positioned to match what you see.
- On a steady timer, the app saves a small, compressed photo, runs on-device text detection to get bounding boxes, maps them onto the preview, and deletes the temporary file so storage does not fill up.
- On the main capture action, the app saves a higher-quality photo, optionally shrinks very large images so recognition stays fast, runs full text extraction, stores the result for the next screen, and navigates there.
Where processing happens
In a typical Expo setup using native text extraction, recognition runs on the device. Nothing in this pattern requires sending the live preview frames to a server; check your own modules and privacy policy to confirm how your build behaves.
Situations to plan for
- In the browser, camera OCR is often unavailable or limited. It is reasonable to show a clear message and let people go back rather than pretending the feature works.
- Some devices or builds may not expose the native extractor. A short “not supported here” state is better than a silent failure.
- If no text is found after capture, the app can explain that in plain language instead of leaving a blank screen.
How the downloaded file is organized
Read top to bottom: shared helpers first, then the screen. Thrown errors use the string codes `OCR_WEB`, `OCR_UNSUPPORTED`, and `OCR_EMPTY` so the UI can branch to localized alerts.
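As a rough illustration of that branching, the sketch below maps a thrown value to a translation key. It assumes the helpers throw `Error` objects whose `message` is the code. The `ocrErrorToMessageKey` helper and the `scan.error*` key names are hypothetical; only the three code strings come from the file.

```typescript
// The three string codes thrown by the OCR helpers.
const OCR_ERROR_CODES = ["OCR_WEB", "OCR_UNSUPPORTED", "OCR_EMPTY"] as const;
type OcrErrorCode = (typeof OCR_ERROR_CODES)[number];

// Hypothetical translation keys; rename them to match your own `scan.*` keys.
const MESSAGE_KEYS: Record<OcrErrorCode, string> = {
  OCR_WEB: "scan.errorWeb",
  OCR_UNSUPPORTED: "scan.errorUnsupported",
  OCR_EMPTY: "scan.errorEmpty",
};

// Returns the translation key for a thrown value, or a generic fallback key
// when the value is not one of the known codes.
function ocrErrorToMessageKey(err: unknown): string {
  const message = err instanceof Error ? err.message : String(err);
  const code = OCR_ERROR_CODES.find((c) => c === message);
  return code ? MESSAGE_KEYS[code] : "scan.errorGeneric";
}
```

A screen could pass the returned key to its translation function before showing an alert.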
Text extraction (full text vs blocks)
`extractPrintedTextFromImageUri` checks web and `isSupported`, calls `extractTextFromImage`, joins trimmed segments with newlines, and throws `OCR_EMPTY` if nothing usable is left. `extractPrintedTextBlocksFromImageUri` only returns `extractTextBlocksFromImage` results—used for live rectangles, not for the final transcript.
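The checks and the join step could look roughly like the sketch below. The extractor is passed in as a parameter so the logic can be read and tested without the native module. The `TextExtractor` shape, the `isWeb` flag, and the choice to drop empty segments are assumptions, not the file's exact code.

```typescript
// Assumed shape of the native extractor; check your installed module's types.
interface TextExtractor {
  isSupported: boolean;
  extractTextFromImage(uri: string): Promise<string[]>;
}

// Trims each segment, drops empty ones (an assumption), and joins the rest
// with newlines. Throws OCR_EMPTY when nothing usable is left.
function joinOcrSegments(segments: readonly string[]): string {
  const text = segments
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .join("\n");
  if (text.length === 0) {
    throw new Error("OCR_EMPTY");
  }
  return text;
}

// Full-text path: platform check, support check, then extraction and join.
async function extractPrintedText(
  uri: string,
  extractor: TextExtractor,
  isWeb: boolean,
): Promise<string> {
  if (isWeb) throw new Error("OCR_WEB");
  if (!extractor.isSupported) throw new Error("OCR_UNSUPPORTED");
  return joinOcrSegments(await extractor.extractTextFromImage(uri));
}
```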
`prepareImageForOcr`
If the longest edge of the image exceeds 1600 px, the image is resized so that edge equals 1600 px, then saved as JPEG with `compress` 0.82. That bounds memory and work for the final OCR without changing the live preview logic.
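The size calculation behind that resize can be sketched as a pure function. The name `computeOcrResize` and the `null` return for images that need no resize are illustrative choices; the 1600 px limit comes from the description above.

```typescript
interface Size {
  width: number;
  height: number;
}

const MAX_OCR_EDGE = 1600;

// Returns the target size that brings the longest edge down to `maxEdge`
// while keeping the aspect ratio, or null when the image is small enough.
function computeOcrResize(size: Size, maxEdge: number = MAX_OCR_EDGE): Size | null {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxEdge) return null;
  const scale = maxEdge / longest;
  return {
    width: Math.round(size.width * scale),
    height: Math.round(size.height * scale),
  };
}
```

A non-null result would then feed a resize action in `expo-image-manipulator`, saved as JPEG with `compress` 0.82.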
`mapTextBlocksToPreview` (cover fit)
The preview and the photo show the same scene, but the view is a fixed rectangle whose aspect ratio may differ from the photo's. This helper scales the image with `Math.max` so it covers the view, centers it with offsets, multiplies normalized block coordinates (`x`, `y`, `width`, `height`) by the displayed size, and caps the result at `maxBoxes` (28) overlays.
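A minimal sketch of that cover-fit mapping follows. It assumes block coordinates are normalized to the 0..1 range with the origin at the top-left of the image; the type names and the function name `mapBlocksToPreview` are illustrative.

```typescript
interface Size {
  width: number;
  height: number;
}
interface NormalizedBlock {
  x: number;
  y: number;
  width: number;
  height: number;
}
interface PreviewRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Maps normalized image-space blocks to view-space rectangles, assuming the
// preview draws the image in "cover" mode (scaled to fill, then centered).
function mapBlocksToPreview(
  blocks: readonly NormalizedBlock[],
  image: Size,
  view: Size,
  maxBoxes: number = 28,
): PreviewRect[] {
  if (image.width <= 0 || image.height <= 0) return [];
  // Cover fit: take the larger ratio so the image fills both view dimensions.
  const scale = Math.max(view.width / image.width, view.height / image.height);
  const displayedWidth = image.width * scale;
  const displayedHeight = image.height * scale;
  // Offsets are zero or negative: the overflowing part is cropped evenly.
  const offsetX = (view.width - displayedWidth) / 2;
  const offsetY = (view.height - displayedHeight) / 2;
  return blocks.slice(0, maxBoxes).map((b) => ({
    left: offsetX + b.x * displayedWidth,
    top: offsetY + b.y * displayedHeight,
    width: b.width * displayedWidth,
    height: b.height * displayedHeight,
  }));
}
```

For an 800 × 1600 photo in a 400 × 400 view, the scale is 0.5, the displayed image is 400 × 800, and the vertical offset is -200, so a block centered in the photo stays centered in the view.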
`ScanTextRegionHighlight`
Each detected region gets a translucent fill, four L-shaped corner strokes sized from the rectangle, and a faint inner border using `accentToRgba` so the bracket color follows your theme accent without blocking the camera.
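The two small calculations behind that styling might look like the sketch below. The real `accentToRgba` signature is not shown on this page, so `accentToRgbaSketch` assumes a `#RRGGBB` accent string and a white fallback; the corner ratio (0.28) and the 6 to 18 px clamp in `cornerLength` are hypothetical values chosen for illustration.

```typescript
// Converts a "#RRGGBB" accent color to an rgba() string with the given alpha.
// Falls back to white when the input is not a six-digit hex color.
function accentToRgbaSketch(hex: string, alpha: number): string {
  const clean = hex.trim().replace(/^#/, "");
  if (!/^[0-9a-fA-F]{6}$/.test(clean)) {
    return `rgba(255, 255, 255, ${alpha})`;
  }
  const value = parseInt(clean, 16);
  const r = (value >> 16) & 0xff;
  const g = (value >> 8) & 0xff;
  const b = value & 0xff;
  return `rgba(${r}, ${g}, ${b}, ${alpha})`;
}

// Sizes each L-shaped corner stroke from the rectangle's shorter side,
// clamped so tiny boxes stay visible and large boxes stay subtle.
function cornerLength(width: number, height: number): number {
  const raw = Math.min(width, height) * 0.28;
  return Math.max(6, Math.min(18, raw));
}
```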
Live interval and cleanup
When the camera is ready, permission is granted, layout is known, and final OCR is not running (`busy` is false), a timer runs about every 820 ms. Each tick takes a low-quality picture (`quality` 0.22), extracts blocks, maps them to `highlights`, then deletes that URI with `FileSystem.deleteAsync`. `liveBusyRef` prevents stacking two live captures at once.
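The guard-and-cleanup shape of one tick can be sketched with the camera, extractor, and file system passed in as functions. The `LiveTickDeps` interface and `createLiveTick` are hypothetical names. In the real screen the dependencies would wrap the camera capture at `quality` 0.22, the block extractor, and `FileSystem.deleteAsync`. Swallowing errors in the live pass is an assumption.

```typescript
// Hypothetical dependency bundle so the loop logic stays platform-free.
interface LiveTickDeps {
  takeLowQualityPicture(): Promise<string>; // resolves to a temporary file URI
  extractBlocks(uri: string): Promise<unknown[]>;
  onBlocks(blocks: unknown[]): void; // e.g. map to preview and set highlights
  deleteFile(uri: string): Promise<void>;
}

// Builds a tick function that skips its run while a previous one is in flight
// and always deletes the temporary photo, even when extraction fails.
function createLiveTick(deps: LiveTickDeps, busyRef: { current: boolean }) {
  return async function tick(): Promise<void> {
    if (busyRef.current) return; // a live capture is already running
    busyRef.current = true;
    let uri: string | null = null;
    try {
      uri = await deps.takeLowQualityPicture();
      deps.onBlocks(await deps.extractBlocks(uri));
    } catch {
      // Assumption: the live pass is best-effort, so errors are ignored.
    } finally {
      if (uri !== null) {
        await deps.deleteFile(uri).catch(() => undefined);
      }
      busyRef.current = false;
    }
  };
}
```

Inside an effect, the returned function could be scheduled with `setInterval(tick, 820)` and cancelled with `clearInterval` when the camera is not ready, permission is missing, or `busy` becomes true.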
Capture button and full OCR
`onCapture` takes a higher-quality shot (`quality` 0.78), passes its dimensions into `prepareImageForOcr`, then calls `extractPrintedTextFromImageUri`. The follow-up handler trims and collapses whitespace, enforces a minimum length, persists the string for the next screen, and navigates using your own route. On web, the screen shows a minimal header and an alert on entry so users can go back.
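The text cleanup in the follow-up handler could be as small as the sketch below. It takes one reading of "collapses whitespace" (every run of whitespace, including newlines, becomes a single space), and the default minimum length of 3 is a hypothetical value, since the page does not state the real threshold.

```typescript
// Collapses whitespace runs to single spaces, trims the ends, and returns
// null when the result is shorter than `minLength` (hypothetical default).
function normalizeCapturedText(raw: string, minLength: number = 3): string | null {
  const text = raw.replace(/\s+/g, " ").trim();
  return text.length >= minLength ? text : null;
}
```

A `null` result is the point where the screen can show the "no text found" message instead of navigating.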
If you are implementing it
Names vary between projects; a split layout often looks like: a camera route screen, utilities for OCR strings and bounding boxes, a helper to resize images before the final pass, mapping normalized coordinates to preview pixels, and a small cache or store so the next screen can receive the extracted text. Native extraction is commonly provided by an Expo module in this family of apps.
Treat this page as a conceptual map. Align filenames and constants with your repository and product requirements.