What you’ll learn (learning objectives)

- Understand what a Knowledge Store is and where it fits in an enrichment pipeline. Learn how the knowledge store captures outputs from enrichment—such as extracted text, entities, key-value pairs, tables, metadata, and OCR output—and persists them in structured forms.
- Understand why a Knowledge Store is essential for persisting structured information. See how storing enriched outputs enables auditability, reuse, and downstream analytics without re-running costly enrichment steps.
- Organize enriched data using projections (table and file projections). Learn common projection types (relational-style tables and file formats like JSON and Parquet) and when to use each for querying, analytics, or bulk processing.
- Configure a Knowledge Store within your enrichment pipeline and access its output. Walk through configuring the store so enriched artifacts are written to persistent targets, and learn how to consume those outputs from tools like Power BI, Azure Synapse, and Logic Apps.
A knowledge store is not just storage — it’s the bridge between enrichment (extracting meaning from documents) and downstream systems that need structured, queryable, and analyzable data.
What is a Knowledge Store?
A Knowledge Store is a structured persistence layer that receives enrichment outputs and writes them into formats suitable for downstream consumption. Instead of only storing raw documents, a knowledge store preserves the enriched artifacts—entities, relationships, tables, and metadata—so you can query, audit, and analyze results without re-running enrichment.

Key benefits:
- Persistent, auditable enriched outputs
- Reuse across analytics, reporting, and automation
- Reduced compute cost by avoiding repeated enrichment runs
- Flexible outputs for BI tools, data warehouses, and data lakes
How a Knowledge Store fits into the enrichment pipeline
An enrichment pipeline takes raw documents (PDFs, images, Office files, etc.), applies extractors and cognitive skills, and emits enriched artifacts. The knowledge store captures those artifacts and projects them into target structures (tables, JSON, Parquet, or file hierarchies) for long-term use.

Typical flow:
1. Ingest documents into the enrichment pipeline.
2. Run cognitive skills to extract text, OCR, entities, key-value pairs, and tables.
3. Persist enriched outputs to the knowledge store.
4. Consume persisted outputs from BI or analytics tools.
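If the pipeline is Azure AI Search, the knowledge store is declared inside the skillset definition. The sketch below expresses that REST payload as a Python dict; the connection string, table and container names, and source paths are placeholders, and exact field support can vary by API version.

```python
# Hedged sketch of the "knowledgeStore" section of an Azure AI Search
# skillset definition, expressed as a Python dict mirroring the REST payload.
# All names, paths, and the connection string below are placeholders.
import json

knowledge_store = {
    "storageConnectionString": "<azure-storage-connection-string>",  # placeholder
    "projections": [
        {
            # Table projections: relational rows for Power BI / SQL-style queries
            "tables": [
                {
                    "tableName": "docEntities",
                    "generatedKeyName": "entityId",
                    "source": "/document/entities/*",
                }
            ],
            # Object projections: enriched documents persisted as JSON blobs
            "objects": [
                {
                    "storageContainer": "enriched-docs",
                    "source": "/document/enrichedShape",
                }
            ],
            # File projections: binary artifacts such as extracted images
            "files": [
                {
                    "storageContainer": "extracted-images",
                    "source": "/document/normalized_images/*",
                }
            ],
        }
    ],
}

# The full skillset definition would embed this under its "knowledgeStore" key.
skillset_fragment = {"name": "demo-skillset", "knowledgeStore": knowledge_store}
print(json.dumps(skillset_fragment, indent=2))
```

A projection group can mix tables, objects, and files, so one enrichment run can feed relational reporting, JSON-based ETL, and raw-artifact archiving at the same time.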
Common projection types and when to use them
Choosing the right projection type affects query, analytics, and storage goals. Use the following table to decide which projection best fits your use case.

| Projection Type | Use Case | Best for |
|---|---|---|
| Table projection | Relational queries and reporting | Power BI, SQL-based analysis, Azure Synapse |
| JSON file projection | Flexible, nested document storage and event-driven processing | Data lakes, downstream ETL, ad-hoc processing |
| Parquet file projection | Columnar, compressed storage for large-scale analytics | Batch analytics, Spark, Synapse Analytics |
| File projection (raw artifacts) | Store original enriched artifacts and metadata | Auditability, backup, document-level replay |
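To make the trade-off concrete, here is a sketch of how one enriched document might look in each form: flat rows for a table projection versus a single nested object for a JSON file projection. The field names (`docId`, `entities`, and so on) are illustrative, not a fixed schema.

```python
import json

# Illustrative enriched output for one document (field names are hypothetical).
enriched = {
    "docId": "doc-001",
    "entities": [
        {"text": "Contoso", "category": "Organization"},
        {"text": "Seattle", "category": "Location"},
    ],
}

# Table projection: flatten nested entities into relational rows,
# one row per entity, keyed back to the parent document.
table_rows = [
    {"docId": enriched["docId"], "entityText": e["text"], "entityCategory": e["category"]}
    for e in enriched["entities"]
]

# JSON file projection: persist the whole nested structure as one object,
# preserving hierarchy for downstream ETL or ad-hoc processing.
json_blob = json.dumps(enriched, indent=2)

print(len(table_rows))  # → 2 (one row per extracted entity)
```

The flattened rows are easy to join and chart in SQL or Power BI; the JSON object keeps the nesting intact for consumers that want the whole document.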
Configuring a Knowledge Store
When configuring a knowledge store:
- Define projections that map enrichment outputs to target formats (tables, JSON, Parquet, or file hierarchies).
- Map fields and relationships so relational projections preserve entity context.
- Select storage targets (Azure Storage accounts, Data Lake, or Synapse).
- Configure access and retention policies to support audit and compliance needs.

A typical configuration workflow:
1. Identify the outputs you need (entities, tables, key-value pairs).
2. Choose projection types for each output based on query and processing needs.
3. Configure mappings and storage targets in the enrichment pipeline.
4. Validate persisted data using sample queries or BI tooling.
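The validation step can be as simple as reading back a persisted artifact and checking for the fields downstream consumers rely on. The sketch below uses a local JSON file to stand in for an object projection in the storage target; the paths, file layout, and required field set are illustrative.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a JSON object projection written by the pipeline.
sample = {
    "docId": "doc-001",
    "entities": [{"text": "Contoso", "category": "Organization"}],
}

with tempfile.TemporaryDirectory() as tmp:
    # Mimic a blob path like <container>/<docId>.json (layout is hypothetical).
    blob_path = Path(tmp) / "enriched-docs" / "doc-001.json"
    blob_path.parent.mkdir(parents=True)
    blob_path.write_text(json.dumps(sample))

    # Validation: every persisted object should carry the required fields.
    # Adjust the required set to match your own projection schema.
    required = {"docId", "entities"}
    record = json.loads(blob_path.read_text())
    missing = required - record.keys()
    assert not missing, f"missing fields: {missing}"
    print("validated", record["docId"])
```

In practice the same check would run against the real storage target (for example, by listing and downloading blobs), but the field-level assertion is the part that catches broken mappings early.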
Consuming Knowledge Store outputs
Persisted outputs can be consumed by a variety of tools and services:

- Power BI: Connect to table projections or Synapse for visualization and dashboards.
- Azure Synapse: Query Parquet or table projections for large-scale analytics.
- Logic Apps / Power Automate: Trigger workflows based on new files or metadata writes.
- Custom apps: Use SDKs or REST endpoints to query or download persisted artifacts.
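As a stand-in for what a BI dashboard or custom app would compute over a table projection, here is a short aggregation sketch. The rows are hard-coded to keep the example self-contained; in a real consumer they would come from the storage target via SQL, Synapse, or an SDK, and the row shape is hypothetical.

```python
from collections import Counter

# Hypothetical rows read from a table projection; hard-coded here so the
# sketch runs without any storage dependency.
rows = [
    {"docId": "doc-001", "entityCategory": "Organization"},
    {"docId": "doc-001", "entityCategory": "Location"},
    {"docId": "doc-002", "entityCategory": "Organization"},
]

# A typical dashboard aggregate: entity counts per category.
counts = Counter(r["entityCategory"] for r in rows)
print(counts["Organization"])  # → 2
```

Because the projection already flattened entities into rows, this kind of aggregate needs no joins or re-enrichment, which is exactly the payoff of persisting enriched outputs.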
Design considerations and best practices
- Model projections for common query patterns to avoid expensive joins or transformations at query time.
- Prefer Parquet for large-scale analytical workloads due to columnar storage and compression.
- Use table projections when you need relational querying with BI tools.
- Keep raw enriched artifacts for auditability and reprocessing scenarios.
- Apply retention policies and role-based access control to secure persisted outputs.
Next steps
- Review your enrichment pipeline outputs and decide which artifacts to persist.
- Map those artifacts to projection types that match your analytics and reporting needs.
- Configure a Knowledge Store in your pipeline and validate with sample data.
- Connect Power BI, Synapse, or other services to begin consuming and visualizing your enriched data.