OCR API Integration: Best Practices

Integrating commercial OCR APIs like Google Cloud Vision, AWS Textract, or Azure Computer Vision into your application offers managed OCR features without the operational burden of running your own OCR infrastructure. However, production integration requires careful attention to authentication, error handling, rate limiting, and cost optimization.

This guide provides practical patterns for OCR API integration that support reliability, performance, and cost control.

Provider selection starts with the document class. Printed-text OCR, handwritten text recognition (HTR), and mixed forms fail in different ways, so teams should first decide whether a workload needs OCR, HTR, or both before comparing authentication flows, pricing, or SDK ergonomics.

Choosing an OCR API Provider

Each major cloud provider offers OCR features with different strengths:

Google Cloud Vision API

Strengths: Multilingual support and document text detection, with handwriting support that should be validated on representative samples
Pricing: Check current Google Cloud Vision pricing before estimating production costs
Best for: Diverse language requirements, handwritten documents

AWS Textract

Strengths: Table extraction, form parsing, signature detection
Pricing: Check current AWS Textract pricing before estimating production costs
Best for: Structured documents, forms, invoices

Azure Computer Vision

Strengths: Layout analysis, batch processing, custom models
Pricing: Check current Azure AI Vision pricing before estimating production costs
Best for: Document layout understanding, batch operations

Microsoft Azure Form Recognizer (since renamed Azure AI Document Intelligence)

Strengths: Pre-built models for receipts, invoices, ID cards
Pricing: Pay-per-page with different tiers
Best for: Common document types with structured layouts

Handwriting Recognition API Requirements

A handwriting recognition API has a different risk profile from a printed-text OCR API. Printed documents often fail in predictable ways: skew, blur, low contrast, or table layout. Handwritten documents add writer variation, connected characters, abbreviations, and ambiguous line order. Your integration should make those uncertainties visible instead of hiding them behind a single confidence score.

Before selecting a provider, validate these requirements:

Requirement	Production Question	Failure Mode If Missing
Line-level text output	Can you inspect text line order before merging the page into one string?	Correct words appear in the wrong order
Word or line coordinates	Can reviewers jump from extracted text back to the source image?	Review becomes slow and error-prone
Confidence granularity	Are confidence values available per word, line, or region?	Low-quality regions are hard to route for review
Async job support	Can large pages or batches run without request timeouts?	Long handwriting jobs fail under load
Model or language hints	Can you pass language, script, or domain hints?	Names, abbreviations, and historical spelling degrade
Retention controls	Are uploads retained, used for training, or stored outside the required region?	Sensitive collections create governance risk
Evaluation exports	Can you export enough detail to calculate CER/WER?	You cannot compare providers fairly

Confidence Is Not Accuracy

Provider confidence scores are useful routing signals, but they are not a substitute for CER/WER measurement on your own handwriting samples. Use confidence to prioritize review; use ground truth to choose a provider.
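
To make the CER/WER comparison concrete, here is a minimal sketch of both metrics using a plain Levenshtein edit distance. The function names (`edit_distance`, `cer`, `wer`) are illustrative and not part of any provider SDK, and the sketch assumes you have already applied the same normalization (case, whitespace, punctuation) to both the ground truth and the provider output.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference item missing from hypothesis
                curr[j - 1] + 1,     # extra item in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by reference word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Run each candidate provider over the same ground-truth sample and compare the resulting rates; a provider that reports high confidence can still have a higher CER on your handwriting than one that reports lower confidence.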

Normalized Response Shape for Handwriting APIs

Normalize every provider into a response model that preserves text, geometry, confidence, and review status. This keeps your application independent from one vendor's response schema.

from dataclasses import dataclass
from typing import Literal

@dataclass
class TextRegion:
    text: str
    kind: Literal["printed", "handwritten", "unknown"]
    confidence: float | None
    bbox: tuple[float, float, float, float] | None
    page: int
    needs_review: bool

@dataclass
class RecognitionResult:
    provider: str
    document_id: str
    full_text: str
    regions: list[TextRegion]
    warnings: list[str]

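def review_flag(region: TextRegion, threshold: float = 0.82) -> bool:
    """Return True when a region should be routed to human review."""
    # No confidence reported by the provider: always review.
    if region.confidence is None:
        return True
    # Treat "unknown" like "handwritten" until validation shows it is safe.
    if region.kind != "printed" and region.confidence < threshold:
        return True
    return False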

For mixed documents, set kind from layout analysis or provider metadata where available. If the provider cannot distinguish printed text from handwriting, preserve unknown and route uncertain regions through review until your own validation shows the risk is acceptable.
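
Geometry and confidence also need normalizing, because providers report them on different scales. The sketch below assumes a `(left, top, right, bottom)` tuple in 0..1 page ratios as the common `bbox` convention, which is a choice made here for illustration and not something the providers mandate. The input shapes match the dictionaries produced by the provider classes later in this guide: pixel corners (`x1`, `y1`, `x2`, `y2`) from Google Cloud Vision, and ratio boxes (`left`, `top`, `width`, `height`) from AWS Textract. The helper names are hypothetical.

```python
from typing import Optional, Tuple

# Assumed convention: (left, top, right, bottom), each a 0..1 ratio of the page.
BBox = Tuple[float, float, float, float]


def bbox_from_pixel_corners(box: dict, page_width: int,
                            page_height: int) -> Optional[BBox]:
    """Convert {'x1','y1','x2','y2'} pixel corners into a normalized bbox."""
    if page_width <= 0 or page_height <= 0:
        return None  # Cannot normalize without page dimensions
    left, right = sorted((box['x1'], box['x2']))
    top, bottom = sorted((box['y1'], box['y2']))
    return (left / page_width, top / page_height,
            right / page_width, bottom / page_height)


def bbox_from_ratio_box(box: dict) -> BBox:
    """Convert {'left','top','width','height'} ratios into a normalized bbox."""
    return (box['left'], box['top'],
            box['left'] + box['width'], box['top'] + box['height'])


def normalize_confidence(value: float, scale_max: float) -> float:
    """Map a confidence on a 0..scale_max scale onto 0..1, clamped."""
    return max(0.0, min(value / scale_max, 1.0))
```

With one bbox convention and one confidence scale, a single review threshold such as the `0.82` default in `review_flag` means the same thing for every provider.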

Multi-Provider Architecture

For production systems, implement a multi-provider strategy for reliability and cost optimization:

# app/services/ocr_provider.py
from abc import ABC, abstractmethod
from typing import Dict, List, Optional
from enum import Enum
import structlog

logger = structlog.get_logger()

class OCRProvider(str, Enum):
    GOOGLE_VISION = "google_vision"
    AWS_TEXTRACT = "aws_textract"
    AZURE_VISION = "azure_vision"
    TESSERACT = "tesseract"  # Fallback

class OCRResult:
    """Normalized OCR result across providers."""

    def __init__(self, text: str, confidence: float,
                 words: List[Dict], metadata: Dict):
        self.text = text
        self.confidence = confidence
        self.words = words
        self.metadata = metadata

class BaseOCRProvider(ABC):
    """Abstract base class for OCR providers."""

    @abstractmethod
    async def process_image(self, image_bytes: bytes,
                          language: str = 'en') -> OCRResult:
        """Process image and return normalized results."""
        pass

    @abstractmethod
    def estimate_cost(self, image_count: int) -> float:
        """Estimate processing cost for given image count."""
        pass

    @abstractmethod
    async def health_check(self) -> bool:
        """Check if provider is available."""
        pass

Google Cloud Vision Integration

Implement Google Cloud Vision with proper authentication and error handling:

# app/services/google_vision_provider.py
from google.cloud import vision
from google.oauth2 import service_account
from google.api_core import retry, exceptions
import asyncio
from typing import Dict, List
import structlog

from app.services.ocr_provider import BaseOCRProvider, OCRResult

logger = structlog.get_logger()

class GoogleVisionProvider(BaseOCRProvider):
    """Google Cloud Vision API provider."""

    def __init__(self, credentials_path: str):
        """Initialize with service account credentials."""
        credentials = service_account.Credentials.from_service_account_file(
            credentials_path
        )
        self.client = vision.ImageAnnotatorClient(credentials=credentials)

    async def process_image(self, image_bytes: bytes,
                          language: str = 'en') -> OCRResult:
        """Process image using Google Cloud Vision."""
        try:
            # Create image object
            image = vision.Image(content=image_bytes)

            # Configure image context
            image_context = vision.ImageContext(
                language_hints=[self._map_language_code(language)]
            )

            # Call API with retry logic
            response = await self._call_with_retry(
                self.client.document_text_detection,
                image=image,
                image_context=image_context
            )

            if response.error.message:
                raise Exception(f"Vision API error: {response.error.message}")

            # Extract text
            text = response.full_text_annotation.text

            # Extract words with bounding boxes
            words = []
            for page in response.full_text_annotation.pages:
                for block in page.blocks:
                    for paragraph in block.paragraphs:
                        for word in paragraph.words:
                            word_text = ''.join([
                                symbol.text for symbol in word.symbols
                            ])
                            words.append({
                                'text': word_text,
                                'confidence': word.confidence,
                                'bounding_box': self._extract_bounds(word.bounding_box)
                            })

            # Calculate average confidence
            confidences = [w['confidence'] for w in words if w['confidence'] > 0]
            avg_confidence = sum(confidences) / len(confidences) if confidences else 0.0

            logger.info("google_vision_success",
                       word_count=len(words),
                       confidence=avg_confidence)

            return OCRResult(
                text=text,
                # Scale Vision's 0-1 confidence to 0-100 to match Textract's scale
                confidence=avg_confidence * 100,
                words=words,
                metadata={
                    'provider': 'google_vision',
                    'language': language
                }
            )

        except exceptions.GoogleAPIError as e:
            logger.error("google_vision_api_error", error=str(e))
            raise
        except Exception as e:
            logger.error("google_vision_error", error=str(e))
            raise

    async def _call_with_retry(self, func, **kwargs):
        """Call API function with exponential backoff retry."""
        retry_policy = retry.Retry(
            initial=1.0,
            maximum=60.0,
            multiplier=2.0,
            deadline=300.0,
            predicate=retry.if_exception_type(
                exceptions.ServiceUnavailable,
                exceptions.DeadlineExceeded,
                exceptions.ResourceExhausted
            )
        )

        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None,
            lambda: func(**kwargs, retry=retry_policy)
        )

    def _map_language_code(self, language: str) -> str:
        """Map ISO 639-1 to Google Vision language codes."""
        language_map = {
            'en': 'en',
            'es': 'es',
            'fr': 'fr',
            'de': 'de',
            'zh': 'zh',
            'ja': 'ja',
            'ar': 'ar'
        }
        return language_map.get(language, 'en')

    def _extract_bounds(self, bounding_box) -> Dict:
        """Extract bounding box coordinates."""
        vertices = bounding_box.vertices
        return {
            'x1': vertices[0].x,
            'y1': vertices[0].y,
            'x2': vertices[2].x,
            'y2': vertices[2].y
        }

    def estimate_cost(self, image_count: int, free_tier: int = 0, unit_price: float = 0.0) -> float:
        """Estimate cost from configured provider pricing."""
        billable_images = max(image_count - free_tier, 0)
        if billable_images == 0:
            return 0.0

        return billable_images * unit_price

    async def health_check(self) -> bool:
        """Check Google Vision API availability."""
        try:
            # Verify client is properly configured
            # Note: In production, use a minimal valid image or quota check
            return self.client is not None
        except Exception:
            return False

AWS Textract Integration

Implement AWS Textract with proper IAM authentication:

# app/services/aws_textract_provider.py
import boto3
from botocore.exceptions import ClientError, BotoCoreError
from botocore.config import Config
import asyncio
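from typing import Dict, List, Optional
import structlog

from app.services.ocr_provider import BaseOCRProvider, OCRResult
from app.services.ocr_manager import RateLimitError

logger = structlog.get_logger()

class AWSTextractProvider(BaseOCRProvider):
    """AWS Textract API provider."""

    def __init__(self, region: str = 'us-east-1',
                 access_key_id: Optional[str] = None,
                 secret_access_key: Optional[str] = None):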
        """Initialize AWS Textract client."""
        config = Config(
            region_name=region,
            retries={
                'max_attempts': 3,
                'mode': 'adaptive'
            }
        )

        self.client = boto3.client(
            'textract',
            config=config,
            aws_access_key_id=access_key_id,
            aws_secret_access_key=secret_access_key
        )

    async def process_image(self, image_bytes: bytes,
                          language: str = 'en') -> OCRResult:
        """Process image using AWS Textract."""
        try:
            loop = asyncio.get_running_loop()

            # Call Textract API
            response = await loop.run_in_executor(
                None,
                lambda: self.client.detect_document_text(
                    Document={'Bytes': image_bytes}
                )
            )

            # Extract text and words
            text_lines = []
            words = []

            for block in response['Blocks']:
                if block['BlockType'] == 'LINE':
                    text_lines.append(block['Text'])

                elif block['BlockType'] == 'WORD':
                    words.append({
                        'text': block['Text'],
                        'confidence': block['Confidence'],
                        'bounding_box': self._extract_bounds(block['Geometry'])
                    })

            # Combine text
            text = '\n'.join(text_lines)

            # Calculate average confidence
            confidences = [w['confidence'] for w in words]
            avg_confidence = sum(confidences) / len(confidences) if confidences else 0.0

            logger.info("textract_success",
                       word_count=len(words),
                       confidence=avg_confidence)

            return OCRResult(
                text=text,
                confidence=avg_confidence,
                words=words,
                metadata={
                    'provider': 'aws_textract',
                    'document_pages': response['DocumentMetadata']['Pages']
                }
            )

        except ClientError as e:
            error_code = e.response['Error']['Code']
            logger.error("textract_client_error",
                        error_code=error_code,
                        error=str(e))

            # Handle specific errors
            if error_code in ('ProvisionedThroughputExceededException',
                              'ThrottlingException'):
                raise RateLimitError("Textract rate limit exceeded") from e
            elif error_code == 'InvalidParameterException':
                raise ValueError(f"Invalid parameter: {str(e)}") from e
            else:
                raise

        except BotoCoreError as e:
            logger.error("textract_botocore_error", error=str(e))
            raise

    def _extract_bounds(self, geometry: Dict) -> Dict:
        """Extract bounding box from Textract geometry."""
        bbox = geometry['BoundingBox']
        return {
            'left': bbox['Left'],
            'top': bbox['Top'],
            'width': bbox['Width'],
            'height': bbox['Height']
        }

    def estimate_cost(self, image_count: int) -> float:
        """Estimate cost for AWS Textract."""
        return image_count * 0.0015  # USD 1.50 per 1,000 pages

    async def health_check(self) -> bool:
        """Check AWS Textract availability."""
        try:
            # This only confirms the client object exists; it makes no API call.
            # Note: In production, probe the service with a small known-good test image
            return self.client is not None
        except Exception:
            return False

Provider Manager with Fallback

Implement intelligent provider selection with fallback:

# app/services/ocr_manager.py
from typing import Dict, List, Optional
import structlog
from datetime import datetime, timedelta

from app.services.ocr_provider import BaseOCRProvider, OCRResult

logger = structlog.get_logger()

class RateLimitError(Exception):
    """Raised when rate limit is exceeded."""
    pass

class OCRManager:
    """Manages multiple OCR providers with fallback and cost optimization."""

    def __init__(self, providers: List[BaseOCRProvider],
                 cost_threshold: Optional[float] = None):
        """
        Initialize OCR manager.

        Args:
            providers: List of OCR providers in priority order
            cost_threshold: Maximum cost per 1,000 images
        """
        self.providers = providers
        self.cost_threshold = cost_threshold
        self.provider_stats = {}

        # Initialize stats for each provider
        for provider in providers:
            provider_name = provider.__class__.__name__
            self.provider_stats[provider_name] = {
                'success_count': 0,
                'error_count': 0,
                'total_cost': 0.0,
                'last_error': None,
                'circuit_open_until': None
            }

    async def process_image(self, image_bytes: bytes,
                          language: str = 'en',
                          preferred_provider: Optional[str] = None) -> OCRResult:
        """
        Process image with fallback logic.

        Args:
            image_bytes: Image data
            language: Language code
            preferred_provider: Preferred provider name (optional)

        Returns:
            OCR result
        """
        providers = self._get_provider_order(preferred_provider)

        last_error = None

        for provider in providers:
            provider_name = provider.__class__.__name__
            stats = self.provider_stats[provider_name]

            # Check circuit breaker
            if self._is_circuit_open(provider_name):
                logger.warning("circuit_breaker_open",
                             provider=provider_name)
                continue

            # Check cost threshold
            if self.cost_threshold:
                estimated_cost = provider.estimate_cost(1) * 1000
                if estimated_cost > self.cost_threshold:
                    logger.info("cost_threshold_exceeded",
                              provider=provider_name,
                              cost=estimated_cost)
                    continue

            try:
                logger.info("attempting_provider", provider=provider_name)

                result = await provider.process_image(image_bytes, language)

                # Update stats
                stats['success_count'] += 1
                stats['total_cost'] += provider.estimate_cost(1)

                logger.info("provider_success",
                           provider=provider_name,
                           confidence=result.confidence)

                return result

            except RateLimitError as e:
                logger.warning("rate_limit_exceeded",
                             provider=provider_name)
                self._open_circuit(provider_name, duration_minutes=5)
                last_error = e

            except Exception as e:
                logger.error("provider_error",
                           provider=provider_name,
                           error=str(e))

                stats['error_count'] += 1
                stats['last_error'] = str(e)

                # Open circuit breaker if error rate is high
                total_requests = stats['success_count'] + stats['error_count']
                if total_requests > 10:
                    error_rate = stats['error_count'] / total_requests
                    if error_rate > 0.5:
                        self._open_circuit(provider_name, duration_minutes=10)

                last_error = e

        # All providers failed
        raise Exception(f"All OCR providers failed. Last error: {last_error}")

    def _get_provider_order(self, preferred_provider: Optional[str]) -> List:
        """Get providers in execution order."""
        if preferred_provider:
            # Put preferred provider first
            providers = []
            for p in self.providers:
                if p.__class__.__name__ == preferred_provider:
                    providers.insert(0, p)
                else:
                    providers.append(p)
            return providers

        return self.providers

    def _is_circuit_open(self, provider_name: str) -> bool:
        """Check if circuit breaker is open for provider."""
        stats = self.provider_stats[provider_name]
        if stats['circuit_open_until']:
            if datetime.utcnow() < stats['circuit_open_until']:
                return True
            else:
                # Reset circuit breaker
                stats['circuit_open_until'] = None
                logger.info("circuit_breaker_closed", provider=provider_name)

        return False

    def _open_circuit(self, provider_name: str, duration_minutes: int):
        """Open circuit breaker for provider."""
        stats = self.provider_stats[provider_name]
        stats['circuit_open_until'] = datetime.utcnow() + timedelta(
            minutes=duration_minutes
        )
        logger.warning("circuit_breaker_opened",
                      provider=provider_name,
                      duration=duration_minutes)

    def get_stats(self) -> Dict:
        """Get statistics for all providers."""
        return self.provider_stats
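
The circuit breaker state kept inside `provider_stats` can also be isolated into a small class, which makes the cooldown behavior easy to unit test. The following is a minimal sketch of that idea, not a replacement for the manager above: the `CircuitBreaker` name and the injectable `clock` parameter are assumptions introduced here so that tests can control time without sleeping.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def _utc_now() -> datetime:
    """Default clock: timezone-aware current UTC time."""
    return datetime.now(timezone.utc)


class CircuitBreaker:
    """Time-based breaker: open for a cooldown period, then closed again."""

    def __init__(self, clock: Callable[[], datetime] = _utc_now):
        self._clock = clock
        self._open_until: Optional[datetime] = None

    def open(self, duration_minutes: int) -> None:
        """Stop sending traffic to the provider for the given cooldown."""
        self._open_until = self._clock() + timedelta(minutes=duration_minutes)

    def is_open(self) -> bool:
        """Return True while the cooldown is still running."""
        if self._open_until is None:
            return False
        if self._clock() < self._open_until:
            return True
        self._open_until = None  # Cooldown elapsed: close the circuit
        return False
```

Note that both this sketch and the manager above close the circuit fully once the cooldown ends. A half-open state that lets a single trial request through before restoring full traffic is a common refinement.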

Rate Limiting and Throttling

Implement client-side rate limiting:

# app/services/rate_limiter.py
from datetime import datetime, timedelta
from typing import Optional
import asyncio
import structlog

from app.services.ocr_provider import BaseOCRProvider, OCRResult

logger = structlog.get_logger()

class RateLimiter:
    """Token bucket rate limiter for API calls."""

    def __init__(self, requests_per_second: int,
                 burst_size: Optional[int] = None):
        """
        Initialize rate limiter.

        Args:
            requests_per_second: Sustained request rate
            burst_size: Maximum burst size (default: 2x sustained rate)
        """
        self.rate = requests_per_second
        self.burst = burst_size or (requests_per_second * 2)
        self.tokens = self.burst
        self.last_update = datetime.utcnow()
        self.lock = asyncio.Lock()

    async def acquire(self):
        """Acquire token, waiting if necessary."""
        async with self.lock:
            # Credit tokens earned since the last call before checking the balance
            self._add_tokens()
            while self.tokens < 1:
                # Calculate wait time
                wait_time = (1.0 - self.tokens) / self.rate
                logger.debug("rate_limit_wait", wait_time=wait_time)
                await asyncio.sleep(wait_time)
                self._add_tokens()

            self.tokens -= 1

    def _add_tokens(self):
        """Add tokens based on elapsed time."""
        now = datetime.utcnow()
        elapsed = (now - self.last_update).total_seconds()
        self.tokens = min(
            self.burst,
            self.tokens + (elapsed * self.rate)
        )
        self.last_update = now

# Usage in provider
class RateLimitedProvider:
    def __init__(self, provider: BaseOCRProvider,
                 requests_per_second: int):
        self.provider = provider
        self.limiter = RateLimiter(requests_per_second)

    async def process_image(self, image_bytes: bytes,
                          language: str = 'en') -> OCRResult:
        await self.limiter.acquire()
        return await self.provider.process_image(image_bytes, language)
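
The refill arithmetic is easier to verify in a synchronous form with an injectable clock. The sketch below is a hypothetical variant of the limiter above, not the same class: `TokenBucket`, `try_acquire`, and `seconds_until_available` are names introduced for illustration. It uses `time.monotonic` by default, which is not affected by wall-clock adjustments.

```python
import time
from typing import Callable


class TokenBucket:
    """Non-blocking token bucket with a replaceable clock for testing."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate      # Tokens added per second
        self.burst = burst    # Maximum tokens the bucket can hold
        self.tokens = burst   # Start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def _refill(self) -> None:
        """Credit tokens for the time elapsed since the last refill."""
        now = self._clock()
        self.tokens = min(self.burst,
                          self.tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def seconds_until_available(self) -> float:
        """How long a caller would need to wait for the next token."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)
```

A caller that must not block, such as a request handler, can use `try_acquire` and fall back to another provider or a queue when it returns `False`.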

Cost Optimization Strategies

Implement intelligent cost optimization:

# app/services/cost_optimizer.py
from datetime import datetime
from typing import Dict, List
import structlog

from app.services.ocr_provider import BaseOCRProvider
from app.services.google_vision_provider import GoogleVisionProvider

logger = structlog.get_logger()

class CostOptimizer:
    """Optimize OCR costs based on document characteristics."""

    def __init__(self):
        self.cost_history = []
        # Pages used per provider and month, keyed "<ProviderClass>_<YYYY-MM>".
        # Callers must increment these counters after each billed request.
        self.usage_tracker = {}

    def select_provider(self, image_info: Dict,
                       providers: List[BaseOCRProvider]) -> BaseOCRProvider:
        """
        Select optimal provider based on image characteristics.

        Args:
            image_info: Dictionary with image metadata
            providers: Available providers

        Returns:
            Optimal provider
        """
        # Use free tier when available
        for provider in providers:
            if self._is_free_tier_available(provider):
                logger.info("using_free_tier",
                           provider=provider.__class__.__name__)
                return provider

        # For simple documents, use cheaper provider
        if self._is_simple_document(image_info):
            cheapest = min(providers, key=lambda p: p.estimate_cost(1))
            logger.info("using_cheap_provider_for_simple_doc",
                       provider=cheapest.__class__.__name__)
            return cheapest

        # For complex documents, prefer the provider that scored best in your
        # own evaluation (often more expensive). This example assumes that is
        # Google Vision; validate the choice on representative samples.
        if self._is_complex_document(image_info):
            for provider in providers:
                if isinstance(provider, GoogleVisionProvider):
                    logger.info("using_premium_provider_for_complex_doc")
                    return provider

        # Default to first provider
        return providers[0]

    def _is_free_tier_available(self, provider) -> bool:
        """
        Check if provider's free tier quota is still available.

        Args:
            provider: OCR provider instance

        Returns:
            Boolean indicating if free tier is available
        """
        # Get current month's usage for this provider
        current_month = datetime.now().strftime('%Y-%m')
        provider_name = provider.__class__.__name__

        monthly_key = f"{provider_name}_{current_month}"
        current_usage = self.usage_tracker.get(monthly_key, 0)

        # Provider-specific free tier limits (verify against current pricing pages)
        free_tier_limits = {
            'GoogleVisionProvider': 1000,  # 1,000 images/month
            'AWSTextractProvider': 1000,    # AWS Free Tier: 1,000 pages/month for 3 months (Detect Document Text)
            'AzureVisionProvider': 5000,    # 5,000 images/month
        }

        free_tier_limit = free_tier_limits.get(provider_name, 0)

        return current_usage < free_tier_limit

    def _is_simple_document(self, image_info: Dict) -> bool:
        """Determine if document is simple (printed, high quality)."""
        return (
            image_info.get('quality', 0) > 80 and
            image_info.get('is_printed', True) and
            image_info.get('language') == 'en'
        )

    def _is_complex_document(self, image_info: Dict) -> bool:
        """Determine if document is complex (handwritten, low quality)."""
        return (
            image_info.get('is_handwritten', False) or
            image_info.get('quality', 100) < 60 or
            image_info.get('has_tables', False)
        )
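
The free-tier check only works if something records usage. As a minimal sketch of that missing piece, the class below keeps per-provider, per-month counters using the same `<ProviderClass>_<YYYY-MM>` key format as the optimizer. `MonthlyUsageTracker` and its methods are hypothetical names, the limits passed in are placeholders to be replaced with values from current pricing pages, and a production system would persist the counters in a database or cache instead of process memory.

```python
from datetime import date
from typing import Dict, Optional


class MonthlyUsageTracker:
    """In-memory page counter keyed by provider name and calendar month."""

    def __init__(self, free_tier_limits: Dict[str, int]):
        self._limits = free_tier_limits
        self._usage: Dict[str, int] = {}

    @staticmethod
    def _key(provider_name: str, day: date) -> str:
        # Same key format as the optimizer: "<ProviderClass>_<YYYY-MM>"
        return f"{provider_name}_{day.strftime('%Y-%m')}"

    def record(self, provider_name: str, pages: int = 1,
               day: Optional[date] = None) -> None:
        """Add processed pages to the month containing `day` (default: today)."""
        key = self._key(provider_name, day or date.today())
        self._usage[key] = self._usage.get(key, 0) + pages

    def remaining_free(self, provider_name: str,
                       day: Optional[date] = None) -> int:
        """Free-tier pages left this month; 0 for providers without a limit."""
        key = self._key(provider_name, day or date.today())
        limit = self._limits.get(provider_name, 0)
        return max(limit - self._usage.get(key, 0), 0)
```

Calling `record` after every successful provider call keeps the free-tier decision accurate; because the month is part of the key, counters reset naturally when the month changes.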

Error Handling and Retry Logic

Implement comprehensive error handling:

# app/services/error_handler.py
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type
)
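import structlog

from app.services.ocr_provider import BaseOCRProvider, OCRResult
from app.services.ocr_manager import RateLimitError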

logger = structlog.get_logger()

class RetryableError(Exception):
    """Errors that should be retried."""
    pass

class PermanentError(Exception):
    """Errors that should not be retried."""
    pass

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    retry=retry_if_exception_type(RetryableError),
    reraise=True
)
async def process_with_retry(provider: BaseOCRProvider,
                            image_bytes: bytes,
                            language: str) -> OCRResult:
    """Process image with automatic retry on transient errors."""
    try:
        return await provider.process_image(image_bytes, language)

    except RateLimitError as e:
        # Transient: the decorator retries RetryableError with backoff
        logger.warning("rate_limit_hit", provider=provider.__class__.__name__)
        raise RetryableError(f"Rate limit: {e}") from e

    except ConnectionError as e:
        logger.warning("connection_error", error=str(e))
        raise RetryableError(f"Connection failed: {e}") from e

    except ValueError as e:
        # Permanent: retrying the same invalid input cannot succeed
        logger.error("validation_error", error=str(e))
        raise PermanentError(f"Invalid input: {e}") from e

    except Exception as e:
        logger.error("unexpected_error", error=str(e), exc_info=True)
        raise PermanentError(f"Unexpected error: {e}") from e
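
To show what the retry policy is doing without depending on a library, here is a hand-rolled sketch of exponential backoff with full jitter. It is not how `tenacity` is implemented, and `backoff_delays`, `call_with_backoff`, and the injectable `sleep` parameter are names introduced for illustration. Only `RetryableError` is retried; any other exception propagates immediately, which matches the retryable versus permanent split above.

```python
import asyncio
import random
from typing import Awaitable, Callable, List


class RetryableError(Exception):
    """Transient failure worth retrying (mirrors the class defined above)."""


def backoff_delays(attempts: int, base: float = 1.0,
                   cap: float = 60.0) -> List[float]:
    """Delay ceilings before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts - 1)]


async def call_with_backoff(func: Callable[[], Awaitable],
                            attempts: int = 3,
                            base: float = 1.0,
                            cap: float = 60.0,
                            sleep: Callable[[float], Awaitable] = asyncio.sleep):
    """Await func(), retrying RetryableError up to `attempts` total tries."""
    delays = backoff_delays(attempts, base, cap)
    for attempt in range(attempts):
        try:
            return await func()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            # Full jitter: wait a random time up to the current ceiling
            await sleep(random.uniform(0, delays[attempt]))
```

Jitter matters when many workers hit a rate limit at the same moment: without it they all retry in lockstep and hit the limit again together.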

Conclusion

OCR API integration requires careful attention to provider selection, error handling, rate limiting, and cost optimization. The patterns presented here provide a practical foundation for production systems.

Key recommendations:

Implement multi-provider fallback for reliability
Use circuit breakers to avoid cascading failures
Apply rate limiting to respect API quotas
Optimize costs based on document complexity
Monitor provider performance and costs continuously

With these patterns in place, your OCR API integration is better positioned to stay reliable, cost-effective, and maintainable as volume grows.


This guide provides practical patterns for OCR API integration that support reliability, performance, and cost control.

Choosing an OCR API Provider

Each major cloud provider offers OCR features with different strengths:

Google Cloud Vision API

Strengths: Multilingual support and document text detection, with handwriting support that should be validated on representative samples
Pricing: Check current Google Cloud Vision pricing before estimating production costs
Best for: Diverse language requirements, handwritten documents

AWS Textract

Strengths: Table extraction, form parsing, signature detection
Pricing: Check current AWS Textract pricing before estimating production costs
Best for: Structured documents, forms, invoices

Azure Computer Vision

Strengths: Layout analysis, batch processing, custom models
Pricing: Check current Azure AI Vision pricing before estimating production costs
Best for: Document layout understanding, batch operations

Microsoft Azure Form Recognizer

Strengths: Pre-built models for receipts, invoices, ID cards
Pricing: Pay-per-page with different tiers
Best for: Common document types with structured layouts

Handwriting Recognition API Requirements

Before selecting a provider, validate these requirements:

Requirement	Production Question	Failure Mode If Missing
Line-level text output	Can you inspect text line order before merging the page into one string?	Correct words appear in the wrong order
Word or line coordinates	Can reviewers jump from extracted text back to the source image?	Review becomes slow and error-prone
Confidence granularity	Are confidence values available per word, line, or region?	Low-quality regions are hard to route for review
Async job support	Can large pages or batches run without request timeouts?	Long handwriting jobs fail under load
Model or language hints	Can you pass language, script, or domain hints?	Names, abbreviations, and historical spelling degrade
Retention controls	Are uploads retained, used for training, or stored outside the required region?	Sensitive collections create governance risk
Evaluation exports	Can you export enough detail to calculate CER/WER?	You cannot compare providers fairly

ℹ

Confidence Is Not Accuracy

Normalized Response Shape for Handwriting APIs

Normalize every provider into a response model that preserves text, geometry, confidence, and review status. This keeps your application independent from one vendor's response schema.

from dataclasses import dataclass
from typing import Literal

@dataclass
class TextRegion:
    text: str
    kind: Literal["printed", "handwritten", "unknown"]
    confidence: float | None
    bbox: tuple[float, float, float, float] | None
    page: int
    needs_review: bool

@dataclass
class RecognitionResult:
    provider: str
    document_id: str
    full_text: str
    regions: list[TextRegion]
    warnings: list[str]

def review_flag(region: TextRegion, threshold: float = 0.82) -> bool:
    if region.confidence is None:
        return True
    if region.kind == "handwritten" and region.confidence < threshold:
        return True
    return False
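
As a rough illustration of how a provider adapter might fill this model, the sketch below converts a raw line into a TextRegion and sets the review flag. The raw dictionary layout (`text`, `kind`, `score`, `box`) is hypothetical and does not match any specific vendor response. The dataclass and `review_flag` are repeated in compact form only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Tuple


@dataclass
class TextRegion:  # same fields as the model above
    text: str
    kind: Literal["printed", "handwritten", "unknown"]
    confidence: Optional[float]
    bbox: Optional[Tuple[float, float, float, float]]
    page: int
    needs_review: bool


def review_flag(region: TextRegion, threshold: float = 0.82) -> bool:
    if region.confidence is None:
        return True
    return region.kind == "handwritten" and region.confidence < threshold


def to_region(raw: dict, page: int) -> TextRegion:
    """Build a TextRegion from a hypothetical provider line dict."""
    score = raw.get("score")  # assumed to be 0-100, or missing
    region = TextRegion(
        text=raw["text"],
        kind=raw.get("kind", "unknown"),
        confidence=None if score is None else score / 100.0,
        bbox=raw.get("box"),
        page=page,
        needs_review=False,
    )
    # Decide review routing once, at normalization time.
    region.needs_review = review_flag(region)
    return region
```

Setting `needs_review` inside the adapter keeps the routing rule in one place, so downstream code only reads the flag.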

Multi-Provider Architecture

For production systems, implement a multi-provider strategy for reliability and cost optimization:

# app/services/ocr_provider.py
from abc import ABC, abstractmethod
from typing import Dict, List, Optional
from enum import Enum
import structlog

logger = structlog.get_logger()

class OCRProvider(str, Enum):
    GOOGLE_VISION = "google_vision"
    AWS_TEXTRACT = "aws_textract"
    AZURE_VISION = "azure_vision"
    TESSERACT = "tesseract"  # Fallback

class OCRResult:
    """Normalized OCR result across providers."""

    def __init__(self, text: str, confidence: float,
                 words: List[Dict], metadata: Dict):
        self.text = text
        self.confidence = confidence
        self.words = words
        self.metadata = metadata

class BaseOCRProvider(ABC):
    """Abstract base class for OCR providers."""

    @abstractmethod
    async def process_image(self, image_bytes: bytes,
                          language: str = 'en') -> OCRResult:
        """Process image and return normalized results."""
        pass

    @abstractmethod
    def estimate_cost(self, image_count: int) -> float:
        """Estimate processing cost for given image count."""
        pass

    @abstractmethod
    async def health_check(self) -> bool:
        """Check if provider is available."""
        pass

Google Cloud Vision Integration

Implement Google Cloud Vision with proper authentication and error handling:

# app/services/google_vision_provider.py
from google.cloud import vision
from google.oauth2 import service_account
from google.api_core import retry, exceptions
import asyncio
from typing import Dict, List
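import structlog

# Module path follows the file comment in the Multi-Provider Architecture listing
from app.services.ocr_provider import BaseOCRProvider, OCRResult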

logger = structlog.get_logger()

class GoogleVisionProvider(BaseOCRProvider):
    """Google Cloud Vision API provider."""

    def __init__(self, credentials_path: str):
        """Initialize with service account credentials."""
        credentials = service_account.Credentials.from_service_account_file(
            credentials_path
        )
        self.client = vision.ImageAnnotatorClient(credentials=credentials)

    async def process_image(self, image_bytes: bytes,
                          language: str = 'en') -> OCRResult:
        """Process image using Google Cloud Vision."""
        try:
            # Create image object
            image = vision.Image(content=image_bytes)

            # Configure image context
            image_context = vision.ImageContext(
                language_hints=[self._map_language_code(language)]
            )

            # Call API with retry logic
            response = await self._call_with_retry(
                self.client.document_text_detection,
                image=image,
                image_context=image_context
            )

            if response.error.message:
                raise Exception(f"Vision API error: {response.error.message}")

            # Extract text
            text = response.full_text_annotation.text

            # Extract words with bounding boxes
            words = []
            for page in response.full_text_annotation.pages:
                for block in page.blocks:
                    for paragraph in block.paragraphs:
                        for word in paragraph.words:
                            word_text = ''.join([
                                symbol.text for symbol in word.symbols
                            ])
                            words.append({
                                'text': word_text,
                                'confidence': word.confidence,
                                'bounding_box': self._extract_bounds(word.bounding_box)
                            })

            # Calculate average confidence
            confidences = [w['confidence'] for w in words if w['confidence'] > 0]
            avg_confidence = sum(confidences) / len(confidences) if confidences else 0.0

            logger.info("google_vision_success",
                       word_count=len(words),
                       confidence=avg_confidence)

            return OCRResult(
                text=text,
                # Vision reports confidence as 0-1; scale to 0-100 to match Textract
                confidence=avg_confidence * 100,
                words=words,
                metadata={
                    'provider': 'google_vision',
                    'language': language
                }
            )

        except exceptions.GoogleAPIError as e:
            logger.error("google_vision_api_error", error=str(e))
            raise
        except Exception as e:
            logger.error("google_vision_error", error=str(e))
            raise

    async def _call_with_retry(self, func, **kwargs):
        """Call API function with exponential backoff retry."""
        retry_policy = retry.Retry(
            initial=1.0,
            maximum=60.0,
            multiplier=2.0,
            deadline=300.0,
            predicate=retry.if_exception_type(
                exceptions.ServiceUnavailable,
                exceptions.DeadlineExceeded,
                exceptions.ResourceExhausted
            )
        )

        # get_running_loop() is the supported way to get the loop inside a coroutine
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None,
            lambda: func(**kwargs, retry=retry_policy)
        )

    def _map_language_code(self, language: str) -> str:
        """Map ISO 639-1 to Google Vision language codes."""
        language_map = {
            'en': 'en',
            'es': 'es',
            'fr': 'fr',
            'de': 'de',
            'zh': 'zh',
            'ja': 'ja',
            'ar': 'ar'
        }
        return language_map.get(language, 'en')

    def _extract_bounds(self, bounding_box) -> Dict:
        """Extract bounding box coordinates."""
        vertices = bounding_box.vertices
        return {
            'x1': vertices[0].x,
            'y1': vertices[0].y,
            'x2': vertices[2].x,
            'y2': vertices[2].y
        }

    def estimate_cost(self, image_count: int, free_tier: int = 0, unit_price: float = 0.0) -> float:
        """Estimate cost from configured provider pricing."""
        billable_images = max(image_count - free_tier, 0)
        if billable_images == 0:
            return 0.0

        return billable_images * unit_price

    async def health_check(self) -> bool:
        """Check Google Vision API availability."""
        try:
            # Verify client is properly configured
            # Note: In production, use a minimal valid image or quota check
            return self.client is not None
        except Exception:
            return False
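
To get a feel for what the retry settings in _call_with_retry allow, the sketch below lists the delays that fit inside the deadline for a given initial delay, multiplier, and maximum. It is plain arithmetic, not the google-api-core implementation: the library adds random jitter to each sleep, and time spent inside each request also counts against the deadline. The function name is hypothetical.

```python
from typing import List


def backoff_schedule(initial: float, multiplier: float,
                     maximum: float, deadline: float) -> List[float]:
    """Return the un-jittered delays whose running total fits in the deadline."""
    if initial <= 0 or multiplier < 1:
        raise ValueError("initial must be positive and multiplier at least 1")

    delays: List[float] = []
    delay, elapsed = initial, 0.0
    while elapsed + delay <= deadline:
        delays.append(delay)
        elapsed += delay
        # Grow the delay geometrically, capped at the configured maximum.
        delay = min(delay * multiplier, maximum)
    return delays


schedule = backoff_schedule(initial=1.0, multiplier=2.0,
                            maximum=60.0, deadline=300.0)
print(schedule, sum(schedule))
```

With the values used above (1 s initial, 2x multiplier, 60 s cap, 300 s deadline), this yields nine delays totalling 243 seconds, which is an upper bound on how many retries a single request can expect.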

AWS Textract Integration

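Implement AWS Textract with IAM credentials. When the access key arguments are left as None, boto3 falls back to its default credential chain (environment variables, shared config files, or an attached IAM role), which is usually preferable to passing keys in code: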

# app/services/aws_textract_provider.py
import boto3
from botocore.exceptions import ClientError, BotoCoreError
from botocore.config import Config
import asyncio
from typing import Dict, List
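import structlog

# Module paths follow the file comments in the other listings
from app.services.ocr_provider import BaseOCRProvider, OCRResult
from app.services.ocr_manager import RateLimitError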

logger = structlog.get_logger()

class AWSTextractProvider(BaseOCRProvider):
    """AWS Textract API provider."""

    def __init__(self, region: str = 'us-east-1',
                 access_key_id: str = None,
                 secret_access_key: str = None):
        """Initialize AWS Textract client."""
        config = Config(
            region_name=region,
            retries={
                'max_attempts': 3,
                'mode': 'adaptive'
            }
        )

        self.client = boto3.client(
            'textract',
            config=config,
            aws_access_key_id=access_key_id,
            aws_secret_access_key=secret_access_key
        )

    async def process_image(self, image_bytes: bytes,
                          language: str = 'en') -> OCRResult:
        """Process image using AWS Textract."""
        try:
            loop = asyncio.get_running_loop()

            # Call Textract API
            response = await loop.run_in_executor(
                None,
                lambda: self.client.detect_document_text(
                    Document={'Bytes': image_bytes}
                )
            )

            # Extract text and words
            text_lines = []
            words = []

            for block in response['Blocks']:
                if block['BlockType'] == 'LINE':
                    text_lines.append(block['Text'])

                elif block['BlockType'] == 'WORD':
                    words.append({
                        'text': block['Text'],
                        'confidence': block['Confidence'],
                        'bounding_box': self._extract_bounds(block['Geometry'])
                    })

            # Combine text
            text = '\n'.join(text_lines)

            # Calculate average confidence
            confidences = [w['confidence'] for w in words]
            avg_confidence = sum(confidences) / len(confidences) if confidences else 0.0

            logger.info("textract_success",
                       word_count=len(words),
                       confidence=avg_confidence)

            return OCRResult(
                text=text,
                confidence=avg_confidence,
                words=words,
                metadata={
                    'provider': 'aws_textract',
                    'document_pages': response['DocumentMetadata']['Pages']
                }
            )

        except ClientError as e:
            error_code = e.response['Error']['Code']
            logger.error("textract_client_error",
                        error_code=error_code,
                        error=str(e))

            # Handle specific errors
            # Textract signals throttling with either of these error codes
            if error_code in ('ProvisionedThroughputExceededException',
                              'ThrottlingException'):
                raise RateLimitError("Textract rate limit exceeded") from e
            elif error_code == 'InvalidParameterException':
                raise ValueError(f"Invalid parameter: {str(e)}")
            else:
                raise

        except BotoCoreError as e:
            logger.error("textract_botocore_error", error=str(e))
            raise

    def _extract_bounds(self, geometry: Dict) -> Dict:
        """Extract bounding box from Textract geometry."""
        bbox = geometry['BoundingBox']
        return {
            'left': bbox['Left'],
            'top': bbox['Top'],
            'width': bbox['Width'],
            'height': bbox['Height']
        }

    def estimate_cost(self, image_count: int) -> float:
        """Estimate cost for AWS Textract."""
        return image_count * 0.0015  # USD 1.50 per 1,000 pages

    async def health_check(self) -> bool:
        """Check AWS Textract availability."""
        try:
            # Verify client is properly configured
            # Note: this only confirms the client object exists. A production probe
            # needs a real request, for example detect_document_text on a small
            # known-good image.
            return self.client is not None
        except Exception:
            return False
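
As written, the two providers return word dictionaries on different scales: the Google Vision words carry confidence in the 0-1 range and pixel corner coordinates, while the Textract words carry confidence in the 0-100 range and a box expressed as ratios of the page size. The sketch below shows one way to bring both into a common shape before comparing or merging them. The helper names are hypothetical, and the page width and height are assumed to be supplied by the caller, for example from the source image.

```python
from typing import Dict


def google_word_to_common(word: Dict, page_width: int, page_height: int) -> Dict:
    """Convert a GoogleVisionProvider word dict to 0-1 confidence and a relative box."""
    b = word['bounding_box']
    return {
        'text': word['text'],
        'confidence': word['confidence'],  # already 0-1
        # Pixel corners divided by page size give ratios in the 0-1 range.
        'bbox': (b['x1'] / page_width, b['y1'] / page_height,
                 b['x2'] / page_width, b['y2'] / page_height),
    }


def textract_word_to_common(word: Dict) -> Dict:
    """Convert an AWSTextractProvider word dict to 0-1 confidence and a relative box."""
    b = word['bounding_box']
    return {
        'text': word['text'],
        'confidence': word['confidence'] / 100.0,  # Textract reports 0-100
        # Left/top/width/height are already ratios; convert to corner form.
        'bbox': (b['left'], b['top'],
                 b['left'] + b['width'], b['top'] + b['height']),
    }
```

Applying one of these at the end of each provider's process_image would let review thresholds and overlays use a single scale regardless of which provider handled the page.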

Provider Manager with Fallback

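The manager below tries providers in priority order, skips any provider whose circuit breaker is open or whose estimated cost exceeds the threshold, and falls back to the next provider on failure: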

# app/services/ocr_manager.py
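from typing import Dict, List, Optional
import structlog
from datetime import datetime, timedelta

# Module path follows the file comment in the Multi-Provider Architecture listing
from app.services.ocr_provider import BaseOCRProvider, OCRResult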

logger = structlog.get_logger()

class RateLimitError(Exception):
    """Raised when rate limit is exceeded."""
    pass

class OCRManager:
    """Manages multiple OCR providers with fallback and cost optimization."""

    def __init__(self, providers: List[BaseOCRProvider],
                 cost_threshold: Optional[float] = None):
        """
        Initialize OCR manager.

        Args:
            providers: List of OCR providers in priority order
            cost_threshold: Maximum cost per 1,000 images
        """
        self.providers = providers
        self.cost_threshold = cost_threshold
        self.provider_stats = {}

        # Initialize stats for each provider
        for provider in providers:
            provider_name = provider.__class__.__name__
            self.provider_stats[provider_name] = {
                'success_count': 0,
                'error_count': 0,
                'total_cost': 0.0,
                'last_error': None,
                'circuit_open_until': None
            }

    async def process_image(self, image_bytes: bytes,
                          language: str = 'en',
                          preferred_provider: Optional[str] = None) -> OCRResult:
        """
        Process image with fallback logic.

        Args:
            image_bytes: Image data
            language: Language code
            preferred_provider: Preferred provider name (optional)

        Returns:
            OCR result
        """
        providers = self._get_provider_order(preferred_provider)

        last_error = None

        for provider in providers:
            provider_name = provider.__class__.__name__
            stats = self.provider_stats[provider_name]

            # Check circuit breaker
            if self._is_circuit_open(provider_name):
                logger.warning("circuit_breaker_open",
                             provider=provider_name)
                continue

            # Check cost threshold
            if self.cost_threshold is not None:
                estimated_cost = provider.estimate_cost(1) * 1000
                if estimated_cost > self.cost_threshold:
                    logger.info("cost_threshold_exceeded",
                              provider=provider_name,
                              cost=estimated_cost)
                    continue

            try:
                logger.info("attempting_provider", provider=provider_name)

                result = await provider.process_image(image_bytes, language)

                # Update stats
                stats['success_count'] += 1
                stats['total_cost'] += provider.estimate_cost(1)

                logger.info("provider_success",
                           provider=provider_name,
                           confidence=result.confidence)

                return result

            except RateLimitError as e:
                logger.warning("rate_limit_exceeded",
                             provider=provider_name)
                self._open_circuit(provider_name, duration_minutes=5)
                last_error = e

            except Exception as e:
                logger.error("provider_error",
                           provider=provider_name,
                           error=str(e))

                stats['error_count'] += 1
                stats['last_error'] = str(e)

                # Open circuit breaker if error rate is high
                total_requests = stats['success_count'] + stats['error_count']
                if total_requests > 10:
                    error_rate = stats['error_count'] / total_requests
                    if error_rate > 0.5:
                        self._open_circuit(provider_name, duration_minutes=10)

                last_error = e

        # All providers failed
        raise Exception(f"All OCR providers failed. Last error: {last_error}")

    def _get_provider_order(self, preferred_provider: Optional[str]) -> List:
        """Get providers in execution order."""
        if preferred_provider:
            # Put preferred provider first
            providers = []
            for p in self.providers:
                if p.__class__.__name__ == preferred_provider:
                    providers.insert(0, p)
                else:
                    providers.append(p)
            return providers

        return self.providers

    def _is_circuit_open(self, provider_name: str) -> bool:
        """Check if circuit breaker is open for provider."""
        stats = self.provider_stats[provider_name]
        if stats['circuit_open_until']:
            if datetime.utcnow() < stats['circuit_open_until']:
                return True
            else:
                # Reset circuit breaker
                stats['circuit_open_until'] = None
                logger.info("circuit_breaker_closed", provider=provider_name)

        return False

    def _open_circuit(self, provider_name: str, duration_minutes: int):
        """Open circuit breaker for provider."""
        stats = self.provider_stats[provider_name]
        stats['circuit_open_until'] = datetime.utcnow() + timedelta(
            minutes=duration_minutes
        )
        logger.warning("circuit_breaker_opened",
                      provider=provider_name,
                      duration=duration_minutes)

    def get_stats(self) -> Dict:
        """Get statistics for all providers."""
        return self.provider_stats
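
The circuit-breaker logic is easier to test when it is separated from the manager. The sketch below is one possible standalone version, not the manager's implementation: it trips on consecutive failures rather than on a cumulative error rate, and it takes an injectable clock (defaulting to time.monotonic) so tests do not need to wait for the cool-down. The class and parameter names are hypothetical.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures, then closes after a cool-down."""

    def __init__(self, failure_threshold: int = 3,
                 cooldown_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.open_until: Optional[float] = None

    def allow_request(self) -> bool:
        """Return True when calls to the provider may proceed."""
        if self.open_until is None:
            return True
        if self.clock() >= self.open_until:
            # Cool-down elapsed: close the circuit and start counting afresh.
            self.open_until = None
            self.failures = 0
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.open_until = self.clock() + self.cooldown_seconds
```

Counting consecutive failures lets a provider recover quickly after a short outage, whereas a lifetime error rate can keep reopening the circuit long after the provider is healthy again.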

Rate Limiting and Throttling

Implement client-side rate limiting:

# app/services/rate_limiter.py
from datetime import datetime, timedelta
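from typing import Optional
import asyncio
import structlog

# Module path follows the file comment in the Multi-Provider Architecture listing
from app.services.ocr_provider import BaseOCRProvider, OCRResult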

logger = structlog.get_logger()

class RateLimiter:
    """Token bucket rate limiter for API calls."""

    def __init__(self, requests_per_second: int,
                 burst_size: Optional[int] = None):
        """
        Initialize rate limiter.

        Args:
            requests_per_second: Sustained request rate
            burst_size: Maximum burst size (default: 2x sustained rate)
        """
        self.rate = requests_per_second
        self.burst = burst_size or (requests_per_second * 2)
        self.tokens = self.burst
        self.last_update = datetime.utcnow()
        self.lock = asyncio.Lock()

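    async def acquire(self):
        """Acquire token, waiting if necessary."""
        async with self.lock:
            # Credit tokens for the time elapsed since the last call before
            # checking, so callers do not sleep when capacity has already refilled
            self._add_tokens()
            while self.tokens < 1:
                # Calculate wait time
                wait_time = (1.0 - self.tokens) / self.rate
                logger.debug("rate_limit_wait", wait_time=wait_time)
                await asyncio.sleep(wait_time)
                self._add_tokens()

            self.tokens -= 1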

    def _add_tokens(self):
        """Add tokens based on elapsed time."""
        now = datetime.utcnow()
        elapsed = (now - self.last_update).total_seconds()
        self.tokens = min(
            self.burst,
            self.tokens + (elapsed * self.rate)
        )
        self.last_update = now

# Usage in provider
class RateLimitedProvider:
    def __init__(self, provider: BaseOCRProvider,
                 requests_per_second: int):
        self.provider = provider
        self.limiter = RateLimiter(requests_per_second)

    async def process_image(self, image_bytes: bytes,
                          language: str = 'en') -> OCRResult:
        await self.limiter.acquire()
        return await self.provider.process_image(image_bytes, language)
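
The token-bucket arithmetic can be checked without waiting in real time. The sketch below is a non-blocking variant written for that purpose, assuming an injectable clock that defaults to time.monotonic (which, unlike wall-clock time, never jumps backwards). It is a simplified illustration with hypothetical names, not a drop-in replacement for the RateLimiter above.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket that reports availability instead of sleeping."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum tokens the bucket can hold
        self.clock = clock
        self.tokens = burst
        self.last_update = clock()

    def _refill(self) -> None:
        # Credit tokens for elapsed time, never exceeding the burst size.
        now = self.clock()
        elapsed = now - self.last_update
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last_update = now

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def seconds_until_available(self) -> float:
        """Time until the next token, or 0.0 if one is ready now."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)
```

A caller that gets False from try_acquire can either sleep for seconds_until_available() or route the request to another provider.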

Cost Optimization Strategies

Route each document to a provider based on remaining free-tier quota and document complexity:

# app/services/cost_optimizer.py
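from datetime import datetime
from typing import Dict, List
import structlog

# Module paths follow the file comments in the earlier listings
from app.services.ocr_provider import BaseOCRProvider
from app.services.google_vision_provider import GoogleVisionProvider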

logger = structlog.get_logger()

class CostOptimizer:
    """Optimize OCR costs based on document characteristics."""

    def __init__(self):
        self.cost_history = []
        # Pages processed per provider per month, keyed "<ProviderClass>_<YYYY-MM>"
        self.usage_tracker = {}

    def select_provider(self, image_info: Dict,
                       providers: List[BaseOCRProvider]) -> BaseOCRProvider:
        """
        Select optimal provider based on image characteristics.

        Args:
            image_info: Dictionary with image metadata
            providers: Available providers

        Returns:
            Optimal provider
        """
        # Use free tier when available
        for provider in providers:
            if self._is_free_tier_available(provider):
                logger.info("using_free_tier",
                           provider=provider.__class__.__name__)
                return provider

        # For simple documents, use cheaper provider
        if self._is_simple_document(image_info):
            cheapest = min(providers, key=lambda p: p.estimate_cost(1))
            logger.info("using_cheap_provider_for_simple_doc",
                       provider=cheapest.__class__.__name__)
            return cheapest

        # For complex documents, prefer the provider that scored best on your
        # own evaluation set. This example assumes that is Google Vision;
        # validate on representative samples before relying on it.
        if self._is_complex_document(image_info):
            for provider in providers:
                if isinstance(provider, GoogleVisionProvider):
                    logger.info("using_premium_provider_for_complex_doc")
                    return provider

        # Default to first provider
        return providers[0]

    def _monthly_key(self, provider: BaseOCRProvider) -> str:
        """Build the usage-tracker key for the current month."""
        current_month = datetime.now().strftime('%Y-%m')
        return f"{provider.__class__.__name__}_{current_month}"

    def record_usage(self, provider: BaseOCRProvider, pages: int = 1) -> None:
        """Count processed pages so the free-tier check reflects real usage."""
        key = self._monthly_key(provider)
        self.usage_tracker[key] = self.usage_tracker.get(key, 0) + pages

    def _is_free_tier_available(self, provider: BaseOCRProvider) -> bool:
        """
        Check if provider's free tier quota is still available.

        Args:
            provider: OCR provider instance

        Returns:
            Boolean indicating if free tier is available
        """
        # Get current month's usage for this provider
        current_usage = self.usage_tracker.get(self._monthly_key(provider), 0)

        # Provider-specific free tier limits. These are example values;
        # confirm them against each provider's current pricing page.
        free_tier_limits = {
            'GoogleVisionProvider': 1000,  # 1,000 images/month
            'AWSTextractProvider': 1000,    # AWS Free Tier: 1,000 pages/month for 3 months (Detect Document Text)
            'AzureVisionProvider': 5000,    # 5,000 images/month
        }

        free_tier_limit = free_tier_limits.get(provider.__class__.__name__, 0)

        return current_usage < free_tier_limit

    def _is_simple_document(self, image_info: Dict) -> bool:
        """Determine if document is simple (printed, high quality)."""
        return (
            image_info.get('quality', 0) > 80 and
            image_info.get('is_printed', True) and
            image_info.get('language') == 'en'
        )

    def _is_complex_document(self, image_info: Dict) -> bool:
        """Determine if document is complex (handwritten, low quality)."""
        return (
            image_info.get('is_handwritten', False) or
            image_info.get('quality', 100) < 60 or
            image_info.get('has_tables', False)
        )
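
The cost comparison behind this routing is simple arithmetic: subtract the free allowance, then multiply the remainder by the per-page price. The sketch below works through it for two made-up providers. The names and prices in PRICING are hypothetical placeholders, not quotes from any vendor, and should be replaced with current published pricing.

```python
from typing import Dict, Tuple

# Hypothetical placeholder pricing: (free pages per month, USD per page).
PRICING: Dict[str, Tuple[int, float]] = {
    "provider_a": (1000, 0.0015),
    "provider_b": (5000, 0.0020),
}


def monthly_cost(pages: int, free_tier: int, unit_price: float) -> float:
    """Cost of a month's volume after the free allowance is used up."""
    billable = max(pages - free_tier, 0)
    return billable * unit_price


def cheapest_provider(pages: int,
                      pricing: Dict[str, Tuple[int, float]]) -> Tuple[str, float]:
    """Return the provider name and cost with the lowest monthly total."""
    costs = {
        name: monthly_cost(pages, free, price)
        for name, (free, price) in pricing.items()
    }
    name = min(costs, key=costs.get)
    return name, costs[name]


for volume in (4_000, 100_000):
    print(volume, cheapest_provider(volume, PRICING))
```

With these placeholder numbers the larger free allowance wins at low volume and the lower unit price wins at high volume, which is why the choice should be re-evaluated as monthly volume changes.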

Error Handling and Retry Logic

Separate transient errors, which are worth retrying, from permanent errors, which are not:

# app/services/error_handler.py
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type
)
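import structlog

# Module paths follow the file comments in the earlier listings
from app.services.ocr_provider import BaseOCRProvider, OCRResult
from app.services.ocr_manager import RateLimitError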

logger = structlog.get_logger()

class RetryableError(Exception):
    """Errors that should be retried."""
    pass

class PermanentError(Exception):
    """Errors that should not be retried."""
    pass

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    retry=retry_if_exception_type(RetryableError),
    reraise=True
)
async def process_with_retry(provider: BaseOCRProvider,
                            image_bytes: bytes,
                            language: str) -> OCRResult:
    """Process image with automatic retry on transient errors."""
    try:
        return await provider.process_image(image_bytes, language)

    except RateLimitError as e:
        logger.warning("rate_limit_hit", provider=provider.__class__.__name__)
        raise RetryableError(f"Rate limit: {e}")

    except ConnectionError as e:
        logger.warning("connection_error", error=str(e))
        raise RetryableError(f"Connection failed: {e}")

    except ValueError as e:
        logger.error("validation_error", error=str(e))
        raise PermanentError(f"Invalid input: {e}")

    except Exception as e:
        logger.error("unexpected_error", error=str(e), exc_info=True)
        raise PermanentError(f"Unexpected error: {e}")
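
For readers who want to see the retry loop itself, the sketch below expresses the same idea in plain asyncio without tenacity. It retries only a designated transient error type, doubles the delay after each failed attempt up to a cap, and lets every other exception propagate immediately. It approximates the decorator above rather than reproducing tenacity's exact wait schedule, and the function name and the injectable `sleep` parameter are assumptions made for testability.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Transient failure that is worth retrying."""


async def retry_async(operation: Callable[[], Awaitable[T]],
                      attempts: int = 3,
                      base_delay: float = 4.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], Awaitable[None]] = asyncio.sleep) -> T:
    """Run operation, retrying RetryableError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except RetryableError:
            if attempt == attempts:
                raise  # out of attempts: surface the last transient error
            # Double the delay on each failed attempt, capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            await sleep(delay)
    raise RuntimeError("attempts must be at least 1")
```

Passing a fake `sleep` in tests makes it possible to assert on the backoff delays without slowing the test suite down.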

Conclusion

Key recommendations:

Implement multi-provider fallback for reliability
Use circuit breakers to avoid cascading failures
Apply rate limiting to respect API quotas
Optimize costs based on document complexity
Monitor provider performance and costs continuously

With these patterns in place, your OCR API integration is better positioned to stay reliable, cost-effective, and maintainable as volume grows.

