App Features

Listen aloud & word highlight

Use expo-speech to read a passage aloud and highlight each word as it is spoken. The snippets below follow the conventions of the rest of these multilingual docs (UI strings go through `t()` translation keys). Copy them into your project, then adjust paths, styles, and translation keys to match. `src/utils/speechVoice.ts` is a typical location for the voice resolver.

Demo video

Short screen recording of the sample: play / pause / stop and word highlighting while the engine speaks.

Complete sample screen

A single file you can drop into an Expo app. It contains one paragraph, Play / Pause / Stop controls, per-word highlighting while speaking, and light/dark styles. expo-speech does not support pausing on Android, so on that platform the sample stops instead of pausing (see the comments in the file).

expo-speech — Install with `npx expo install expo-speech`.
@expo/vector-icons — Ships with Expo (Ionicons). No extra install.

Download ListenAloudScreen.tsx

Dependencies & files

expo-speech — `Speech.speak`, `Speech.stop`, `Speech.getAvailableVoicesAsync`. Install with your Expo toolchain (`npx expo install expo-speech`).
Your screen component — UI for the text you read aloud: segmentation helpers and `Speech.speak` bound to a trimmed string. Use whatever file fits your router (for example `app/article.tsx` or `src/screens/ReaderScreen.tsx`).
src/utils/speechVoice.ts — `resolveSpeechOptions`, Urdu routing, and picking an installed Urdu voice when the app language or script calls for it.

Word segmentation & boundary helpers

Segmentation uses the regex `(\S+)(\s*)`, so each token keeps its trailing spaces. `readBoundaryCharIndex` reads the numeric `charIndex` from native boundary events and from web synthesis payloads, and returns 0 when the event carries none. `wordIndexFromBoundaryCharIndex` maps the TTS character offset to a segment index; an offset that lands in whitespace maps to the following word.

type ReadableWordSeg = { word: string; sep: string; start: number };

function segmentReadableWords(text: string): ReadableWordSeg[] {
  const out: ReadableWordSeg[] = [];
  const re = /(\S+)(\s*)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push({ word: m[1], sep: m[2], start: m.index });
  }
  return out;
}

function wordIndexFromBoundaryCharIndex(text: string, charIndex: number): number {
  const segs = segmentReadableWords(text);
  if (segs.length === 0) return 0;
  const idx = Math.max(0, Math.min(charIndex, Math.max(0, text.length - 1)));
  for (let i = 0; i < segs.length; i++) {
    const { start, word } = segs[i];
    if (idx >= start && idx < start + word.length) return i;
  }
  for (let i = 0; i < segs.length - 1; i++) {
    const gapStart = segs[i].start + segs[i].word.length;
    const gapEnd = segs[i + 1].start;
    if (idx >= gapStart && idx < gapEnd) return i + 1;
  }
  return segs.length - 1;
}

function readBoundaryCharIndex(ev: unknown): number {
  if (
    ev &&
    typeof ev === 'object' &&
    'charIndex' in ev &&
    typeof (ev as { charIndex: unknown }).charIndex === 'number'
  ) {
    return (ev as { charIndex: number }).charIndex;
  }
  return 0;
}
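`wordIndexFromBoundaryCharIndex` re-segments the whole passage on every boundary event. For long passages, a variant that reuses the memoized segments and binary-searches them may be cheaper. The sketch below assumes the same `ReadableWordSeg` shape; `wordIndexAt` is a hypothetical name. For trimmed text it should return the same index as the helper above, including mapping an offset in trailing whitespace to the next word.

```typescript
type ReadableWordSeg = { word: string; sep: string; start: number };

function segmentReadableWords(text: string): ReadableWordSeg[] {
  const out: ReadableWordSeg[] = [];
  const re = /(\S+)(\s*)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push({ word: m[1], sep: m[2], start: m.index });
  }
  return out;
}

/**
 * Hypothetical variant of wordIndexFromBoundaryCharIndex that takes the
 * already-memoized segments and binary-searches them (O(log n) per event).
 */
function wordIndexAt(segs: readonly ReadableWordSeg[], charIndex: number): number {
  if (segs.length === 0) return 0;
  // Find the last segment whose start is <= charIndex (index 0 if charIndex precedes all).
  let lo = 0;
  let hi = segs.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (segs[mid].start <= charIndex) lo = mid;
    else hi = mid - 1;
  }
  const seg = segs[lo];
  // Offset falls in this word's trailing whitespace: advance to the next word.
  if (charIndex >= seg.start + seg.word.length && lo < segs.length - 1) return lo + 1;
  return lo;
}

const demoSegs = segmentReadableWords('Hello brave  new world');
console.log(wordIndexAt(demoSegs, 12)); // 2 ("new")
```

Inside `onBoundary` you would then call `wordIndexAt(wordSegments, ci)`; the handler's closure captures the `wordSegments` memoized for the same `textTrimmed` it speaks, so the offsets line up.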

Memoized segments & UI state

Memoize segments for the trimmed string so rendering and boundary mapping stay in sync when the text changes. Track whether speech is running and which word index to highlight.

const wordSegments = useMemo(
  () => segmentReadableWords(textTrimmed),
  [textTrimmed]
);

const [isSpeaking, setIsSpeaking] = useState(false);
const [speechHighlightWordIndex, setSpeechHighlightWordIndex] = useState<number | null>(null);
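Two separate `useState` calls can briefly disagree, for example when a late boundary event sets a highlight after `isSpeaking` has turned false. If you prefer a single state object, a reducer keeps both fields changing together. The sketch below is one option; `speechUiReducer`, its state type, and its action names are hypothetical, meant to be passed to React's `useReducer`.

```typescript
type SpeechUiState = { isSpeaking: boolean; highlight: number | null };

type SpeechUiAction =
  | { type: 'start' }
  | { type: 'boundary'; wordIndex: number }
  | { type: 'end' };

const initialSpeechUiState: SpeechUiState = { isSpeaking: false, highlight: null };

function speechUiReducer(state: SpeechUiState, action: SpeechUiAction): SpeechUiState {
  switch (action.type) {
    case 'start':
      // New utterance: highlight the first word right away.
      return { isSpeaking: true, highlight: 0 };
    case 'boundary':
      // Drop late boundary events that arrive after playback ended.
      return state.isSpeaking ? { ...state, highlight: action.wordIndex } : state;
    case 'end':
      // onDone / onStopped / onError and a manual stop all reset the same way.
      return initialSpeechUiState;
    default:
      return state;
  }
}
```

In the component this would read `const [speech, dispatch] = useReducer(speechUiReducer, initialSpeechUiState);`, with the speech callbacks dispatching `start`, `boundary`, and `end`.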

Stop speech & reset highlight

Centralize teardown in a callback: stop the engine and clear highlight state. Run cleanup on unmount and when the source text changes so playback and highlighting are not left over from previous content.

const stopReadAloud = useCallback(() => {
  Speech.stop();
  setIsSpeaking(false);
  setSpeechHighlightWordIndex(null);
}, []);

useEffect(() => {
  return () => {
    Speech.stop();
  };
}, []);

useEffect(() => {
  stopReadAloud();
}, [sourceText, stopReadAloud]);
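Stopping clears state, but it does not cancel work that is still pending. If the user taps stop while an async step such as the voice lookup is in flight, the `Speech.speak` call that follows would start audio the UI already shows as stopped. One way to guard against this is a run token. `createRunGuard` below is a hypothetical helper, not part of expo-speech.

```typescript
/** Hypothetical helper: hands out one token per playback run so stale work can be ignored. */
function createRunGuard() {
  let current = 0;
  return {
    /** Start a new run and invalidate every earlier token. */
    begin(): number {
      current += 1;
      return current;
    },
    /** Invalidate the active run without starting a new one (e.g. on stop). */
    cancel(): void {
      current += 1;
    },
    /** True only for the token returned by the most recent begin(). */
    isCurrent(token: number): boolean {
      return token === current;
    },
  };
}
```

In a component you might keep one guard in `useRef(createRunGuard())`, call `begin()` before `await resolveSpeechOptions(...)`, call `cancel()` inside `stopReadAloud`, and return early after the `await` when `isCurrent(token)` is false.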

Play/stop handler & Speech.speak

Typical flow:

Resolve `language` / `voice` with `resolveSpeechOptions(appLanguage, text)`.
Optionally show an alert when `missingUrduVoice` is true, so users know the fallback voice may not sound natural.
Spread only defined `language` / `voice`. On iOS with the Urdu route, set `useApplicationAudioSession: false` when that fixes routing to the system voice.
In `onBoundary`, read `charIndex`, map to a word index, and update highlight state; clear state in `onDone`, `onStopped`, and `onError`.

// import * as Speech from 'expo-speech';
// import { Alert, Platform } from 'react-native';
// import type { ResolvedSpeechOptions } from '@/utils/speechVoice';
// import { resolveSpeechOptions } from '@/utils/speechVoice';

const onToggleReadAloud = async () => {
  if (!hasText) return;

  // A second tap while speaking acts as "stop".
  if (isSpeaking) {
    stopReadAloud();
    return;
  }

  // Capture the text so boundary offsets map against exactly what was spoken.
  const text = textTrimmed;
  Speech.stop();
  setIsSpeaking(true);
  setSpeechHighlightWordIndex(0);

  // Pick language/voice from the current app language and the text's script.
  let resolved: ResolvedSpeechOptions;
  try {
    resolved = await resolveSpeechOptions(appLanguage, text);
  } catch {
    stopReadAloud();
    return;
  }

  // Warn that the fallback voice may not sound natural for Urdu.
  if (resolved.missingUrduVoice) {
    Alert.alert(t('alerts.urduVoiceTitle'), t('alerts.urduVoiceMessage'));
  }

  // Shared teardown for every terminal callback.
  const endPlayback = () => {
    setIsSpeaking(false);
    setSpeechHighlightWordIndex(null);
  };

  Speech.speak(text, {
    // Only pass language/voice when the resolver set them.
    ...(resolved.language != null ? { language: resolved.language } : {}),
    ...(resolved.voice != null ? { voice: resolved.voice } : {}),
    // iOS only: don't speak through the app's audio session on the Urdu route.
    ...(Platform.OS === 'ios' && resolved.urduRoute ? { useApplicationAudioSession: false } : {}),
    pitch: 1,
    rate: 0.96,
    onStart: () => setSpeechHighlightWordIndex(0),
    // Map the engine's character offset to a word index and move the highlight.
    onBoundary: (ev: unknown) => {
      const ci = readBoundaryCharIndex(ev);
      setSpeechHighlightWordIndex(wordIndexFromBoundaryCharIndex(text, ci));
    },
    onDone: endPlayback,
    onStopped: endPlayback,
    onError: endPlayback,
  });
};
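The option-building part of the handler (conditional `language` / `voice`, the iOS-only `useApplicationAudioSession` flag, and the fixed `pitch` / `rate`) can be pulled into a pure function so it is testable without a device. `buildSpeakOptions` and `SpeakOptionsSubset` are hypothetical names; the platform is passed in as a string (what `Platform.OS` returns) to keep the sketch free of React Native imports.

```typescript
// Local copy of the resolver's result shape (see speechVoice.ts).
type ResolvedSpeechOptions = {
  language?: string;
  voice?: string;
  urduRoute: boolean;
  missingUrduVoice?: boolean;
};

// Hypothetical subset of the expo-speech options this sketch sets.
type SpeakOptionsSubset = {
  language?: string;
  voice?: string;
  useApplicationAudioSession?: boolean;
  pitch: number;
  rate: number;
};

function buildSpeakOptions(resolved: ResolvedSpeechOptions, os: string): SpeakOptionsSubset {
  return {
    // Omit keys entirely when the resolver left them undefined.
    ...(resolved.language != null ? { language: resolved.language } : {}),
    ...(resolved.voice != null ? { voice: resolved.voice } : {}),
    // Only set on iOS, and only on the Urdu route.
    ...(os === 'ios' && resolved.urduRoute ? { useApplicationAudioSession: false } : {}),
    pitch: 1,
    rate: 0.96,
  };
}
```

The handler would then spread the result and add its callbacks: `Speech.speak(text, { ...buildSpeakOptions(resolved, Platform.OS), onBoundary: ..., onDone: ... })`.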

Voice & language resolution

Helpers in `speechVoice.ts` typically:

Treat as Urdu when the app language is `ur`, or when the text uses Arabic script unless the app is already `ar`.
On iOS with a picked Urdu voice, pass only `voice` (the identifier from `getAvailableVoicesAsync`).
On Android with a picked voice, pass `language: 'ur'` plus `voice`.
On web, pass `language` derived from the voice plus `voice` when available.
If no Urdu voice is listed, fall back to `language` only and set `missingUrduVoice: true` so the UI can warn.

import * as Speech from 'expo-speech';
import { Platform } from 'react-native';

// `string & {}` accepts any tag but keeps the literals in editor autocomplete
export type AppLanguage = 'en' | 'ur' | 'ar' | (string & {});

export type ResolvedSpeechOptions = {
  language?: string;
  voice?: string;
  urduRoute: boolean;
  missingUrduVoice?: boolean;
};

function hasArabicScript(text: string): boolean {
  return /[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]/.test(text);
}

export function shouldUseUrduSpeech(appLang: AppLanguage, text: string): boolean {
  if (appLang === 'ur') return true;
  if (appLang === 'ar') return false;
  return hasArabicScript(text);
}

function localeTagForAppLanguage(appLang: AppLanguage): string {
  const map: Record<string, string> = { en: 'en-US', ur: 'ur-PK', ar: 'ar' };
  return map[appLang] ?? appLang;
}

type VoiceInfo = { identifier: string; language: string; name?: string };

/** Rank available voices; tune for your product. */
export function pickUrduVoice(voices: VoiceInfo[]): VoiceInfo | null {
  const candidates = voices.filter(
    (v) =>
      /^ur/i.test(v.language) ||
      /urd/i.test(v.language) ||
      /urdu/i.test(v.name ?? '')
  );
  if (candidates.length === 0) return null;
  const rank = (lang: string) => {
    const l = lang.toLowerCase().replace(/_/g, '-');
    if (l.includes('pk')) return 3;
    if (l === 'ur' || l.startsWith('ur-')) return 2;
    return 1;
  };
  return [...candidates].sort((a, b) => rank(b.language) - rank(a.language))[0];
}

export async function resolveSpeechOptions(
  appLang: AppLanguage,
  text: string
): Promise<ResolvedSpeechOptions> {
  const defaultLang = localeTagForAppLanguage(appLang);

  if (!shouldUseUrduSpeech(appLang, text)) {
    return { language: defaultLang, urduRoute: false };
  }

  let missingUrduVoice = false;

  try {
    const voices = await Speech.getAvailableVoicesAsync();
    const picked = pickUrduVoice(voices);

    if (picked) {
      if (Platform.OS === 'ios') {
        return { voice: picked.identifier, urduRoute: true };
      }
      if (Platform.OS === 'android') {
        return { language: 'ur', voice: picked.identifier, urduRoute: true };
      }
      return { language: picked.language.replace(/_/g, '-'), voice: picked.identifier, urduRoute: true };
    }
    missingUrduVoice = true;
  } catch {
    missingUrduVoice = true;
  }

  return {
    language: Platform.OS === 'android' ? 'ur' : 'ur-PK',
    urduRoute: true,
    missingUrduVoice,
  };
}
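`pickUrduVoice` ranks regions by a `pk` substring, so a three-letter region such as `urd-PAK` (if an engine reports one) gets the lowest rank, and voice quality is not considered. A possible refinement, sketched below under those assumptions: score the base language and region separately, and break ties in favour of voices whose `quality` is `'Enhanced'` (the string value of expo-speech's `VoiceQuality.Enhanced`). `pickUrduVoiceRanked` and `VoiceLike` are hypothetical names; the fields mirror what `getAvailableVoicesAsync` returns.

```typescript
type VoiceLike = { identifier: string; language: string; name?: string; quality?: string };

// 3 = Urdu with a Pakistan region, 2 = other Urdu tags, 0 = not Urdu by language tag.
function localeScore(lang: string): number {
  const [base, region] = lang.toLowerCase().replace(/_/g, '-').split('-');
  if (base !== 'ur' && base !== 'urd') return 0;
  if (region === 'pk' || region === 'pak') return 3;
  return 2;
}

function pickUrduVoiceRanked(voices: VoiceLike[]): VoiceLike | null {
  let best: VoiceLike | null = null;
  let bestScore = 0;
  for (const v of voices) {
    let score = localeScore(v.language) * 10;
    // Fall back to a name match, ranked below any Urdu language tag.
    if (score === 0 && /urdu/i.test(v.name ?? '')) score = 10;
    if (score === 0) continue;
    // Prefer enhanced voices within the same locale tier.
    if (v.quality === 'Enhanced') score += 1;
    // Strict ">" keeps the first voice on ties.
    if (score > bestScore) {
      best = v;
      bestScore = score;
    }
  }
  return best;
}
```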

UI: play button & nested Text

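Place a round play/stop control next to the passage. In the snippet, `busy` stands for whatever flag your screen uses to disable the control (for example while content is loading). Render the text as nested `Text` children: the active segment gets your highlight style, so only the current word (and its trailing spaces) gains the background and bold weight.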

<Pressable
  onPress={onToggleReadAloud}
  style={({ pressed }) => [
    styles.readAloudButton,
    busy && styles.readAloudButtonDisabled,
    pressed && !busy && styles.readAloudButtonPressed,
  ]}
  disabled={busy}
  accessibilityRole="button"
  accessibilityState={{ busy: isSpeaking }}
  accessibilityLabel={
    isSpeaking ? t('a11y.stopSpeech') : t('a11y.listenAloud')
  }
>
  <Ionicons
    name={isSpeaking ? 'stop' : 'play'}
    size={22}
    color={colors.accentText}
  />
</Pressable>

<Text style={styles.bodyText}>
  {wordSegments.map((seg, i) => (
    <Text
      key={`${seg.start}-${i}`}
      style={
        isSpeaking && speechHighlightWordIndex === i
          ? styles.wordHighlighted
          : undefined
      }
    >
      {seg.word}
      {seg.sep}
    </Text>
  ))}
</Text>
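Rendering one nested `Text` per word means every boundary event reconciles every word of the passage, which may get costly for long text. A possible alternative, sketched below, collapses the passage into at most three runs: the text before the active word, the active word with its trailing spaces, and the rest. `toHighlightRuns` and `HighlightRun` are hypothetical names; the input is the segment shape produced by `segmentReadableWords`.

```typescript
type Seg = { word: string; sep: string; start: number };
type HighlightRun = { key: string; text: string; highlighted: boolean };

function toHighlightRuns(segs: readonly Seg[], active: number | null): HighlightRun[] {
  // Rebuild the original text for segments [from, to).
  const join = (from: number, to: number) =>
    segs.slice(from, to).map((s) => s.word + s.sep).join('');

  // Nothing highlighted: one plain run (or none for empty input).
  if (active == null || active < 0 || active >= segs.length) {
    return segs.length > 0 ? [{ key: 'all', text: join(0, segs.length), highlighted: false }] : [];
  }

  const runs: HighlightRun[] = [];
  const before = join(0, active);
  if (before) runs.push({ key: 'before', text: before, highlighted: false });
  // The active word keeps its trailing spaces, matching the per-word version.
  runs.push({ key: 'active', text: join(active, active + 1), highlighted: true });
  const after = join(active + 1, segs.length);
  if (after) runs.push({ key: 'after', text: after, highlighted: false });
  return runs;
}
```

Pass `isSpeaking ? speechHighlightWordIndex : null` as `active`, then render each run as one child: `runs.map((r) => <Text key={r.key} style={r.highlighted ? styles.wordHighlighted : undefined}>{r.text}</Text>)`.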

Highlight style

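// e.g. in your StyleSheet; `colors` and `Utility.SP_6` are your app's theme tokens
wordHighlighted: {
  backgroundColor: colors.accentMuted!,
  color: colors.textPrimary!,
  // borderRadius/overflow may be ignored on nested Text; backgroundColor still applies
  borderRadius: Utility.SP_6,
  overflow: 'hidden',
  fontWeight: '700',
},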

Limitations

Some Android TTS engines emit sparse or uneven `onBoundary` events; the highlight may lag or jump.
Word boundaries are whitespace-based; languages that do not separate words with spaces will not align cleanly.
Simulators often lack non-English voices; verify TTS on a device with system voices installed.
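For languages written without spaces, one option is to build the same segment array from `Intl.Segmenter` with `granularity: 'word'` instead of the whitespace regex. This assumes your JS engine ships `Intl.Segmenter` (Hermes support varies by version, so check before relying on it) and that your TypeScript `lib` includes ES2022. `segmentWithIntl` is a hypothetical name. The TTS engine's boundary offsets still follow its own word breaks, so alignment is improved but not guaranteed.

```typescript
type ReadableWordSeg = { word: string; sep: string; start: number };

/**
 * Sketch: locale-aware segmentation with Intl.Segmenter (ES2022).
 * Word-like segments become words; punctuation and spaces are folded into
 * the previous word's `sep`, so joining word + sep rebuilds the text verbatim.
 */
function segmentWithIntl(text: string, locale?: string): ReadableWordSeg[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const out: ReadableWordSeg[] = [];
  for (const { segment, index, isWordLike } of segmenter.segment(text)) {
    const last = out[out.length - 1];
    if (isWordLike || last === undefined) {
      // Start a new highlightable unit (leading punctuation also starts one).
      out.push({ word: segment, sep: '', start: index });
    } else {
      last.sep += segment;
    }
  }
  return out;
}
```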