MLOps for Azure Document Intelligence

AUTHOR

Juan Alejandro Arguello

Senior Software Developer

https://www.linkedin.com/in/juanarguello/

MLOps for Azure Document Intelligence: Orchestrating the Iterative Development of Custom Models

Introduction: The Context of Document Extraction

Azure AI Document Intelligence provides a robust framework of prebuilt and custom models for extracting structured data from documents at scale.

For standardized document types such as invoices, receipts, and identity documents, prebuilt models achieve high accuracy with minimal setup. However, enterprise systems quickly move beyond a single schema, ingesting documents from multiple sources with evolving layouts.

As soon as new document formats are introduced, complexity shifts from model accuracy to orchestration.

The Iterative Development Journey: From Extractor to Composed Models

The technical implementation of an extraction system is not a single event, but an iterative process.

Document extraction systems typically begin with a single extractor. As new document layouts appear, multiple custom extractors are introduced. Routing logic then becomes difficult to maintain, leading to the adoption of a classifier to identify document type.

Finally, composed models re-establish a single integration surface by combining the classifier and extractors behind one endpoint.
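As a rough illustration of that single integration surface, the sketch below models what a composed model does conceptually: one call classifies the document and then dispatches it to the extractor mapped to that type. `ComposedModel`, `toy_classifier`, and the field values are hypothetical. In Azure, classification and dispatch run inside the service behind one composed model ID, not in client code.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical signatures standing in for service calls.
Classifier = Callable[[bytes], str]            # returns a document type label
Extractor = Callable[[bytes], Dict[str, str]]  # returns extracted field values


@dataclass
class ComposedModel:
    """Illustrative stand-in for a composed model: a single entry point that
    classifies a document and dispatches it to the extractor for that type."""
    classifier: Classifier
    extractors: Dict[str, Extractor]

    def analyze(self, document: bytes) -> Dict[str, object]:
        doc_type = self.classifier(document)
        extractor = self.extractors.get(doc_type)
        if extractor is None:
            # No extractor is mapped to this type: report it rather than guess.
            return {"doc_type": doc_type, "fields": None}
        return {"doc_type": doc_type, "fields": extractor(document)}


def toy_classifier(document: bytes) -> str:
    # Hypothetical rule standing in for a trained classifier.
    return "invoice" if document.startswith(b"INV") else "receipt"


model = ComposedModel(
    classifier=toy_classifier,
    extractors={
        "invoice": lambda doc: {"total": "100.00"},
        "receipt": lambda doc: {"merchant": "Contoso"},
    },
)
print(model.analyze(b"INV-001"))  # {'doc_type': 'invoice', 'fields': {'total': '100.00'}}
```

The caller depends only on `analyze`; adding a new document type changes the mapping, not the calling code.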

Related documentation: Document Intelligence composed custom models.

Figure: Phases of Complexity in Document Ingestion Systems

The Problem: Handling the Limits of Confidence

No model is 100% accurate forever. As documents drift, confidence scores drop. Without an explicit management layer, low-confidence outputs either block pipelines or silently corrupt downstream systems.

At this point, confidence is no longer a metric to observe, but a decision the system must make.

A confidence gate makes extraction failures explicit by defining how they are detected, how they are contained, and how they are reused for iterative improvement.
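A minimal sketch of such a gate, assuming the extraction result exposes a document-level confidence and per-field confidences (as Document Intelligence analysis results do). The function name `confidence_gate` and the default thresholds of 0.80 and 0.85 are illustrative assumptions; suitable values depend on the document type and on the cost of a wrong value downstream.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GateDecision:
    passed: bool
    reasons: List[str] = field(default_factory=list)


def confidence_gate(
    doc_confidence: float,
    field_confidences: Dict[str, float],
    required_fields: List[str],
    doc_threshold: float = 0.80,    # assumed threshold; tune per document type
    field_threshold: float = 0.85,  # assumed threshold; tune per field
) -> GateDecision:
    """Turn confidence scores into an explicit pass/fail decision with reasons."""
    reasons: List[str] = []
    if doc_confidence < doc_threshold:
        reasons.append(f"document confidence {doc_confidence:.2f} < {doc_threshold:.2f}")
    for name in required_fields:
        score = field_confidences.get(name)
        if score is None:
            reasons.append(f"required field '{name}' missing")
        elif score < field_threshold:
            reasons.append(f"field '{name}' confidence {score:.2f} < {field_threshold:.2f}")
    # The gate passes only when no reason to doubt the result was found.
    return GateDecision(passed=not reasons, reasons=reasons)


decision = confidence_gate(0.95, {"total": 0.97, "date": 0.62}, ["total", "date"])
print(decision.passed, decision.reasons)
# False ["field 'date' confidence 0.62 < 0.85"]
```

Returning reasons rather than a bare boolean keeps the failure explicit for whoever reviews the document next.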

Related documentation: Interpret and improve accuracy and confidence scores

Figure: Confidence-based decision flow for routing low-reliability extractions into human review and retraining

The "Review Candidate" Pattern

Rather than embedding failure handling directly into ingestion logic, extraction results are evaluated against a confidence gate.

Documents that fail this gate are flagged as Review Candidates.
The main system notifies the Model Administration Sidecar via an API call, triggering a Human-in-the-Loop (HITL) workflow.
A Document Reviewer, typically from operations or finance, validates the business meaning of the document and determines whether it belongs to the system.
Documents that do not match existing types can be flagged for escalation, where a decision is made on whether a new document class should be supported.
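The review flow above can be sketched as a small state machine. This is one possible shape under stated assumptions: `ReviewCandidate`, `ReviewStatus`, the transition table, and the payload field names are hypothetical and not taken from the reference sidecar. Recording the model version that produced the low-confidence result keeps each review traceable to a specific model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ReviewStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"    # belongs to a known type; eligible for retraining
    REJECTED = "rejected"    # does not belong to the system
    ESCALATED = "escalated"  # may justify supporting a new document class


# Assumed transitions: a reviewer decides once; escalations are resolved later.
ALLOWED = {
    ReviewStatus.PENDING: {ReviewStatus.APPROVED, ReviewStatus.REJECTED, ReviewStatus.ESCALATED},
    ReviewStatus.ESCALATED: {ReviewStatus.APPROVED, ReviewStatus.REJECTED},
    ReviewStatus.APPROVED: set(),
    ReviewStatus.REJECTED: set(),
}


@dataclass
class ReviewCandidate:
    document_id: str
    model_id: str  # hypothetical: model version that produced the flagged result
    reasons: List[str]
    status: ReviewStatus = ReviewStatus.PENDING
    history: List[str] = field(default_factory=list)

    def transition(self, new_status: ReviewStatus, reviewer: str) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value} to {new_status.value}")
        self.history.append(f"{reviewer}: {self.status.value} -> {new_status.value}")
        self.status = new_status

    def to_payload(self) -> dict:
        """Hypothetical body the ingestion system might send to the sidecar."""
        return {"documentId": self.document_id, "modelId": self.model_id,
                "reasons": self.reasons, "status": self.status.value}


candidate = ReviewCandidate("doc-42", "composed-v3", ["field 'date' confidence 0.62 < 0.85"])
candidate.transition(ReviewStatus.ESCALATED, "finance-reviewer")
```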

Once documents are reviewed and validated, the focus shifts from decision-making to preparing data for reliable retraining.

Training Requirements: Data Staging

Retraining requires more than approved documents; it requires reproducible data lineage.

Reviewed documents must be staged together with their OCR outputs and label files so that retraining can be repeated and audited.

Maintaining this lineage is a prerequisite for any iterative training process.
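One way to make staging reproducible is to write a manifest that hashes every staged document together with its OCR and label files. The sketch assumes the `<document>.ocr.json` and `<document>.labels.json` naming used by Document Intelligence Studio labeling projects; `build_staging_manifest` and the manifest layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List

DOC_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_staging_manifest(staging_dir: Path) -> Dict[str, object]:
    """Record each staged document with its OCR and label files and their hashes.

    Raises FileNotFoundError if a document lacks its OCR or label file, so an
    incomplete dataset is never handed to training."""
    entries: List[Dict[str, str]] = []
    docs = sorted(p for p in staging_dir.iterdir() if p.suffix.lower() in DOC_SUFFIXES)
    for doc in docs:
        ocr = doc.with_name(doc.name + ".ocr.json")        # assumed naming convention
        labels = doc.with_name(doc.name + ".labels.json")  # assumed naming convention
        missing = [p.name for p in (ocr, labels) if not p.exists()]
        if missing:
            raise FileNotFoundError(f"{doc.name}: missing {', '.join(missing)}")
        entries.append({
            "document": doc.name,
            "document_sha256": sha256_of(doc),
            "ocr_sha256": sha256_of(ocr),
            "labels_sha256": sha256_of(labels),
        })
    # One fingerprint for the whole dataset: any changed file changes it.
    fingerprint = hashlib.sha256(json.dumps(entries, sort_keys=True).encode()).hexdigest()
    return {"documents": entries, "dataset_sha256": fingerprint}
```

Storing the fingerprint next to each trained model ID makes it possible to check later that a retraining run used exactly the same inputs.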

Related documentation: Best practices: generating labeled datasets

Model Lineage

In practice, extractors, classifiers, and composed models evolve independently.

Improvements are rarely applied across all models at once, making coordination between model versions a central MLOps challenge.
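Selective retraining can be driven by how many reviewed samples have accumulated per document type. In this sketch, `plan_retraining` and the threshold of five new samples are assumptions for illustration, as is the rule that the classifier is retrained only when a new document class is added.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple, Union


def plan_retraining(
    approved: Iterable[Tuple[str, str]],  # (document_id, document_type) approved since last training
    known_types: Set[str],                # types that already have an extractor
    min_new_samples: int = 5,             # assumed threshold, not a service limit
) -> Dict[str, Union[List[str], bool]]:
    counts = Counter(doc_type for _, doc_type in approved)
    ready = {t for t, n in counts.items() if n >= min_new_samples}
    new_classes = sorted(ready - known_types)
    return {
        # Only extractors with enough fresh, reviewed samples are retrained.
        "extractors": sorted(ready & known_types),
        # Escalated types that reached the threshold need a new extractor.
        "new_classes": new_classes,
        # Assumption: the classifier is retrained only when its label set changes.
        "retrain_classifier": bool(new_classes),
    }


approved = ([(f"inv-{i}", "invoice") for i in range(6)]
            + [(f"rcp-{i}", "receipt") for i in range(2)]
            + [(f"bol-{i}", "bill_of_lading") for i in range(5)])
print(plan_retraining(approved, {"invoice", "receipt"}))
```

Here only the invoice extractor is retrained; the receipt extractor keeps its current version until enough new samples arrive.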


Orchestrating Model Iteration and Lineage

Once confidence handling is in place, the primary challenge becomes orchestrating how models evolve over time, not how individual documents are extracted.

Each component in the system—the classifier, individual extractors, and the composed model—has an independent lifecycle and version lineage, yet must be coordinated deliberately. In our experience, retraining rarely happens atomically across all models; improvements are selective and incremental.
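Independent lifecycles stay manageable if every composed version pins the exact classifier and extractor versions it was built from. The sketch below (hypothetical `ComposedVersion` and `derive_composed`, with made-up model IDs) derives a new composed version when only one extractor changes, leaving earlier versions untouched so they remain reproducible.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ComposedVersion:
    """Pins the exact component versions behind one composed model."""
    version: int
    classifier_id: str
    extractor_ids: Dict[str, str] = field(default_factory=dict)  # document type -> extractor model ID
    parent: Optional[int] = None  # composed version this one was derived from


def derive_composed(
    previous: ComposedVersion,
    classifier_id: Optional[str] = None,
    extractor_updates: Optional[Dict[str, str]] = None,
) -> ComposedVersion:
    """Create the next composed version, swapping only the components that changed."""
    extractors = dict(previous.extractor_ids)  # copy so the previous version stays intact
    extractors.update(extractor_updates or {})
    return ComposedVersion(
        version=previous.version + 1,
        classifier_id=classifier_id or previous.classifier_id,
        extractor_ids=extractors,
        parent=previous.version,
    )


v1 = ComposedVersion(1, "classifier-v1", {"invoice": "invoice-v1", "receipt": "receipt-v1"})
v2 = derive_composed(v1, extractor_updates={"invoice": "invoice-v2"})
print(v2.extractor_ids)  # {'invoice': 'invoice-v2', 'receipt': 'receipt-v1'}
```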

To manage this complexity, the architecture introduces a model-iteration sidecar that acts as a control plane for review tracking, dataset curation, retraining, and composed model assembly. Rather than maintaining static training snapshots, the sidecar builds training datasets incrementally, so they reflect the evolving document landscape and preserve continuity across iterations.
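An incremental dataset can be represented as an append-only chain of changes instead of full copies. In this sketch, `IncrementalDataset` and its support for removals (for example, a document rejected after an earlier approval) are assumptions; any past version can be rebuilt by replaying its chain.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class DatasetVersion:
    version: int
    added: List[str]
    removed: List[str] = field(default_factory=list)
    parent: Optional[int] = None


class IncrementalDataset:
    """Append-only history of training-set changes for one document type."""

    def __init__(self, doc_type: str) -> None:
        self.doc_type = doc_type
        self.versions: Dict[int, DatasetVersion] = {}

    def commit(self, added: Iterable[str], removed: Iterable[str] = ()) -> DatasetVersion:
        parent = max(self.versions) if self.versions else None
        version = DatasetVersion(
            version=(parent or 0) + 1,
            added=sorted(added),
            removed=sorted(removed),
            parent=parent,
        )
        self.versions[version.version] = version
        return version

    def materialize(self, version: int) -> Set[str]:
        """Replay the chain of changes from the first version up to `version`."""
        chain: List[DatasetVersion] = []
        current: Optional[int] = version
        while current is not None:
            step = self.versions[current]
            chain.append(step)
            current = step.parent
        documents: Set[str] = set()
        for step in reversed(chain):
            documents |= set(step.added)
            documents -= set(step.removed)
        return documents


invoices = IncrementalDataset("invoice")
invoices.commit(["doc-1", "doc-2"])
invoices.commit(["doc-3"])
invoices.commit(["doc-4"], removed=["doc-1"])
print(sorted(invoices.materialize(3)))  # ['doc-2', 'doc-3', 'doc-4']
```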

The ingestion application remains decoupled from the specific versions of the models. It queries a Model Registry within the Administration system to identify the current "Active" Composed Model ID.
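The decoupling can be sketched with an in-memory registry. `ModelRegistry`, `ingest`, the status names, and the `analyze` callback (which would wrap the actual Document Intelligence analyze call) are hypothetical; a real registry would live in the Administration system's storage.

```python
import threading
from typing import Callable, Dict, Optional


class ModelRegistry:
    """In-memory stand-in for the registry the ingestion app queries."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._status: Dict[str, str] = {}  # model ID -> "candidate" | "active" | "retired"
        self._active: Optional[str] = None

    def register(self, model_id: str) -> None:
        with self._lock:
            self._status.setdefault(model_id, "candidate")

    def promote(self, model_id: str) -> None:
        """Make model_id active; the previously active model is retired, not deleted."""
        with self._lock:
            if model_id not in self._status:
                raise KeyError(model_id)
            if self._active is not None:
                self._status[self._active] = "retired"
            self._status[model_id] = "active"
            self._active = model_id

    def active_model_id(self) -> str:
        with self._lock:
            if self._active is None:
                raise LookupError("no active composed model")
            return self._active

    def status(self, model_id: str) -> str:
        with self._lock:
            return self._status[model_id]


def ingest(document: bytes, registry: ModelRegistry,
           analyze: Callable[[str, bytes], dict]) -> dict:
    # Resolve the model ID on each call instead of hard-coding it in configuration.
    model_id = registry.active_model_id()
    return {"model_id": model_id, "result": analyze(model_id, document)}


registry = ModelRegistry()
registry.register("composed-v1")
registry.promote("composed-v1")
registry.register("composed-v2")  # trained and composed, not yet promoted
print(ingest(b"...", registry, lambda model_id, doc: {})["model_id"])  # composed-v1
```

Because a retired model stays registered, rolling back is another `promote` call. Caching the active ID for a short interval would reduce lookups; that choice is left out of the sketch.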

A stable runtime is fed by an evolving set of models whose versions are curated and promoted via a

MLOps Sidecar Implementation

These orchestration concepts are implemented in an open-source reference sidecar that manages review candidates, incremental datasets, staging, retraining, and model composition.

MLOps sidecar implementation: Document Intelligence Model Administration.

The repository focuses on control-plane logic and intentionally omits application-specific ingestion concerns.

Expansion Layers: Document Intelligence and Content Understanding

Content Understanding becomes relevant when document variability exceeds the practical limits of layout-based extraction. It complements Document Intelligence rather than replacing it.

Related documentation: Choose the right Foundry tool for document processing: Document Intelligence, Content Understanding, or Foundry models.