Tip of the Day · 2026-06-06 · ≈ 9 min read · Amazon Bedrock · Knowledge Bases

Bedrock Knowledge Bases.

Yesterday's Evaluations tip was about measuring an agent. Today is about feeding it. Amazon Bedrock Knowledge Bases is AWS's managed Retrieval-Augmented Generation (RAG) service: point it at your data, pick an embedding model and a vector store, and it runs the parse → chunk → embed → index pipeline and the query-time retrieval — so your app gets relevant context, with citations, instead of a model guessing from training data alone.

$ aws bedrock-agent-runtime retrieve-and-generate …  — retrieve + generate, with citations

01 What Knowledge Bases actually is

From the overview: RAG "uses information from data sources to improve the relevancy and accuracy of generated responses." Knowledge Bases is AWS's out-of-the-box implementation — it "abstracts from the heavy lifting of building pipelines" so you don't hand-roll the ingestion and retrieval layer, and it removes "the need to continually train your model to be able to use your private data."

Standing one up is four steps:

The shift

You stop owning the RAG plumbing — chunkers, embedding jobs, a vector index, citation tracking — and start owning two decisions: how to chunk, and which retrieval API to call.

02 The ingestion pipeline: parse → chunk → embed → index

For unstructured data, ingestion converts each document to text, splits it into chunks, converts each chunk into a vector embedding, and writes those embeddings to a vector index "while maintaining a mapping to the original document." Those vectors are what make semantic search work: at query time the user's question is embedded too, and the index returns the chunks whose vectors sit closest to it.
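To make the pipeline concrete, here is a toy, in-memory version of parse → chunk → embed → index → query. It is a sketch only: `embed()` is a hypothetical bag-of-words stand-in for a real embedding model, `chunk()` splits on a fixed word count rather than model tokens, and the "index" is a plain list scanned with cosine similarity. Each entry keeps its source document id, which is the mapping that makes citations possible.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for an embedding model: a bag-of-words count vector.
# A real knowledge base calls a model such as Titan or Cohere Embed instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(text: str, size: int = 8) -> list:
    # Naive fixed-size split on whitespace words (not real tokens).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(docs: dict) -> list:
    # Each index entry keeps a mapping back to the original document.
    index = []
    for doc_id, text in docs.items():
        for piece in chunk(text):
            index.append((embed(piece), piece, doc_id))
    return index

def search(index: list, question: str, k: int = 2) -> list:
    # Embed the question too, then return the k closest chunks.
    q = embed(question)
    scored = [(cosine(q, vec), piece, doc_id) for vec, piece, doc_id in index]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]

if __name__ == "__main__":
    docs = {
        "refunds.txt": "Refunds are issued within 14 days of a return request.",
        "shipping.txt": "Orders ship from Oregon and arrive in three to five days.",
    }
    index = build_index(docs)
    for score, piece, source in search(index, "How long do refunds take?", k=1):
        print(f"{score:.2f}  {source}: {piece}")
```

Swapping the stand-ins for a real model and a real vector store is exactly the plumbing Knowledge Bases takes off your hands.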

Before chunking, you pick a parser. The default parser reads text; for documents heavy with tables, figures, or scanned pages you can use a foundation-model parser — the docs list the Claude vision, Nova vision, and Llama 4 vision model families — or the Bedrock Data Automation (BDA) parser (in preview, US West (Oregon) only, subject to change).
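If you create data sources programmatically, the parser choice lands in the data source's `vectorIngestionConfiguration`. The helper below is a minimal sketch that builds only the parsing part for boto3's `bedrock-agent` `create_data_source` call. It assumes the field names from the CreateDataSource API reference as of writing; `"DEFAULT"` is a hypothetical label of this sketch (in the API you simply omit the block), and any model ARN you pass is your own.

```python
from typing import Optional

def parsing_configuration(strategy: str = "DEFAULT",
                          model_arn: Optional[str] = None,
                          prompt: Optional[str] = None) -> Optional[dict]:
    """Build the parsingConfiguration block of a data source (sketch).

    "DEFAULT" is a hypothetical label, not an API value: returning None means
    the caller omits parsingConfiguration and gets the default text parser.
    """
    if strategy == "DEFAULT":
        return None
    if strategy == "BEDROCK_FOUNDATION_MODEL":
        if not model_arn:
            raise ValueError("a foundation-model parser needs a model ARN")
        fm_config = {"modelArn": model_arn}
        if prompt:
            # Optional custom instructions for how the model should parse pages.
            fm_config["parsingPrompt"] = {"parsingPromptText": prompt}
        return {"parsingStrategy": strategy,
                "bedrockFoundationModelConfiguration": fm_config}
    if strategy == "BEDROCK_DATA_AUTOMATION":
        # BDA parser (preview at the time of writing); it may accept extra settings.
        return {"parsingStrategy": strategy}
    raise ValueError(f"unknown parsing strategy: {strategy}")
```

The returned dict would sit next to a `chunkingConfiguration` inside `vectorIngestionConfiguration`.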

03 Four chunking strategies

Chunking is the single most consequential ingestion choice. For text, Bedrock offers a default plus four other strategies:

| Strategy | What it does |
|---|---|
| Fixed-size | You set a maximum number of tokens per chunk and an overlap percentage between consecutive chunks. |
| Default | Splits into chunks of "approximately 300 tokens," honoring sentence boundaries so complete sentences stay intact. |
| No chunking (NONE) | Each document becomes a single chunk, so pre-split your files first. You lose page numbers in citations and the x-amz-bedrock-kb-document-page-number metadata filter. |
| Hierarchical | Nested parent/child chunks. Retrieval pulls precise child chunks, then "replaces them with broader parent chunks" for more context. You set parent size, child size, and overlap tokens. |
| Semantic | Splits on meaning, not syntax. Three knobs: max tokens, buffer size (surrounding sentences embedded together), and a breakpoint percentile threshold (higher = fewer, larger chunks). |
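Via the API, these choices map onto a data source's `chunkingConfiguration`. Here is a sketch that builds that block for each strategy, assuming the field names from the CreateDataSource API reference as of writing. The numeric defaults are illustrative values, not Bedrock's own, and `"DEFAULT"` is a hypothetical label meaning "omit the block".

```python
from typing import Optional

def chunking_configuration(strategy: str, **opts) -> Optional[dict]:
    """Build a data source chunkingConfiguration (sketch; defaults are illustrative)."""
    if strategy == "DEFAULT":
        # Hypothetical label: omit chunkingConfiguration to get default chunking.
        return None
    if strategy == "NONE":
        # One chunk per document; pre-split files before syncing.
        return {"chunkingStrategy": "NONE"}
    if strategy == "FIXED_SIZE":
        return {"chunkingStrategy": "FIXED_SIZE",
                "fixedSizeChunkingConfiguration": {
                    "maxTokens": opts.get("max_tokens", 300),
                    "overlapPercentage": opts.get("overlap_percentage", 20)}}
    if strategy == "HIERARCHICAL":
        return {"chunkingStrategy": "HIERARCHICAL",
                "hierarchicalChunkingConfiguration": {
                    # First level = parent chunks, second level = child chunks.
                    "levelConfigurations": [
                        {"maxTokens": opts.get("parent_tokens", 1500)},
                        {"maxTokens": opts.get("child_tokens", 300)}],
                    "overlapTokens": opts.get("overlap_tokens", 60)}}
    if strategy == "SEMANTIC":
        return {"chunkingStrategy": "SEMANTIC",
                "semanticChunkingConfiguration": {
                    "maxTokens": opts.get("max_tokens", 300),
                    # Neighbouring sentences embedded together with each sentence.
                    "bufferSize": opts.get("buffer_size", 1),
                    # Higher percentile = fewer, larger chunks.
                    "breakpointPercentileThreshold": opts.get("breakpoint_percentile", 95)}}
    raise ValueError(f"unknown chunking strategy: {strategy}")
```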
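To see what the fixed-size and hierarchical knobs actually do, here is a small sketch of both. It assumes whitespace-separated words as tokens, converts the overlap percentage into a token count, and reuses one overlap value for both levels of the hierarchy as a simplification; Bedrock's tokenizer and boundary handling will differ. Retrieval scores the child chunks, then swaps each hit for its parent, de-duplicated.

```python
def fixed_size_chunks(tokens: list, max_tokens: int, overlap_tokens: int) -> list:
    """Split tokens into windows of max_tokens, each sharing overlap_tokens with the previous one."""
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("overlap must be smaller than the chunk size")
    step = max_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

def overlap_from_percentage(max_tokens: int, overlap_percentage: int) -> int:
    # Fixed-size chunking is configured as a percentage; convert it to tokens.
    return max_tokens * overlap_percentage // 100

def hierarchical_chunks(tokens: list, parent_tokens: int, child_tokens: int, overlap_tokens: int):
    """Return (parents, children); each child records the index of its parent."""
    parents = fixed_size_chunks(tokens, parent_tokens, overlap_tokens)
    children = []
    for parent_idx, parent in enumerate(parents):
        for child in fixed_size_chunks(parent, child_tokens, overlap_tokens):
            children.append((child, parent_idx))
    return parents, children

def retrieve_with_parents(parents: list, children: list, score, k: int = 1) -> list:
    """Rank child chunks by score(child), then replace each hit with its parent."""
    ranked = sorted(children, key=lambda c: score(c[0]), reverse=True)[:k]
    seen, results = set(), []
    for _, parent_idx in ranked:
        if parent_idx not in seen:
            seen.add(parent_idx)
            results.append(parents[parent_idx])
    return results

if __name__ == "__main__":
    words = [f"w{i}" for i in range(12)]
    print(fixed_size_chunks(words, 4, overlap_from_percentage(4, 25)))
    parents, children = hierarchical_chunks(words, 6, 3, 1)
    # A toy "relevance" score: does the child chunk contain the word w8?
    print(retrieve_with_parents(parents, children, lambda c: "w8" in c))
```

The design trade-off is visible in the last call: the match is found on a small, precise child, but the prompt receives the larger parent around it.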
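Semantic chunking is the least obvious of the strategies, so here is one plausible reading of its knobs as code: embed each sentence together with `buffer_size` neighbours on each side, measure the cosine distance between consecutive windows, and start a new chunk wherever that distance reaches the breakpoint percentile of all distances. This is a sketch under assumptions, not Bedrock's implementation: `embed()` is a hypothetical bag-of-words stand-in, the percentile uses the nearest-rank method, and the max-tokens cap is left out for brevity.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def percentile(values: list, pct: float) -> float:
    # Nearest-rank: the smallest value with at least pct% of values at or below it.
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def semantic_chunks(sentences: list, buffer_size: int = 1,
                    breakpoint_percentile: float = 95) -> list:
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    # Embed each sentence together with buffer_size neighbours on each side.
    windows = [" ".join(sentences[max(0, i - buffer_size):i + buffer_size + 1])
               for i in range(len(sentences))]
    vectors = [embed(w) for w in windows]
    # Distance between consecutive windows; large jumps suggest a topic change.
    gaps = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(gaps, breakpoint_percentile)
    chunks, current = [], [sentences[0]]
    for i, gap in enumerate(gaps):
        if gap >= threshold:  # breakpoint: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks
```

Raising `breakpoint_percentile` raises the threshold, so fewer gaps qualify as breakpoints and the chunks get larger, matching the "higher = fewer, larger chunks" rule in the table.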

Multimodal content follows different rules: with Nova multimodal embeddings, chunking happens at the embedding-model level — audio and video chunk duration is configurable from 1–30 seconds (default 5) — and the text strategies above apply only to text documents. The BDA parser instead converts audio/video to transcripts and scene summaries first, then applies the text strategies.
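The audio and video numbers are easy to reason about with a little arithmetic. This sketch assumes segments are contiguous and non-overlapping, with a shorter final segment; the source only gives the 1 to 30 second range and the 5 second default, so treat the segmentation rule itself as an assumption.

```python
import math

MIN_CHUNK_S, MAX_CHUNK_S, DEFAULT_CHUNK_S = 1, 30, 5  # range and default from the docs

def segment_bounds(duration_s: float, chunk_s: int = DEFAULT_CHUNK_S) -> list:
    """(start, end) pairs in seconds, assuming back-to-back segments."""
    if not MIN_CHUNK_S <= chunk_s <= MAX_CHUNK_S:
        raise ValueError("chunk duration must be between 1 and 30 seconds")
    count = math.ceil(duration_s / chunk_s)
    return [(i * chunk_s, min((i + 1) * chunk_s, duration_s)) for i in range(count)]

if __name__ == "__main__":
    for chunk_s in (1, 5, 30):
        print(chunk_s, "s ->", len(segment_bounds(120, chunk_s)), "segments for a 2-minute clip")
```

Under this assumption a 2-minute clip becomes 120, 24, or 4 segments at 1, 5, or 30 seconds: shorter segments give finer-grained hits, at the cost of more vectors to embed and store.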

04 Where the vectors live

Knowledge Bases indexes into a range of vector stores. Either let the console spin up an OpenSearch Serverless collection, or bring your own:

05 Querying: Retrieve vs RetrieveAndGenerate vs GenerateQuery

Three runtime APIs, each a level up in how much AWS does for you:

RetrieveAndGenerate is the combined action: under the hood it uses GenerateQuery (for structured stores), Retrieve, and InvokeModel. Because Retrieve is exposed on its own, you "have the flexibility to decouple the steps in RAG and customize them." With either retrieval API you can add a reranking model to re-order results by relevance before they reach the prompt.
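Here is a sketch of what the two retrieval calls look like from Python. The helpers only build keyword arguments and read a response, so they run without AWS credentials; you would pass the dicts to the `retrieve()` and `retrieve_and_generate()` methods of boto3's `bedrock-agent-runtime` client. Field names follow the Retrieve and RetrieveAndGenerate API references as of writing (the reranking block in particular is worth double-checking), the citation reader only handles S3 locations, and all IDs and ARNs are yours to supply.

```python
from typing import Optional

PAGE_KEY = "x-amz-bedrock-kb-document-page-number"

def retrieve_kwargs(kb_id: str, query: str, top_k: int = 5,
                    page: Optional[int] = None,
                    reranker_arn: Optional[str] = None) -> dict:
    """Keyword arguments for bedrock-agent-runtime retrieve() (sketch)."""
    vector_cfg = {"numberOfResults": top_k}
    if page is not None:
        # Metadata filter on the page-number attribute (unavailable with NONE chunking).
        vector_cfg["filter"] = {"equals": {"key": PAGE_KEY, "value": page}}
    if reranker_arn:
        # Optional reranking model to re-order results before they reach the prompt.
        vector_cfg["rerankingConfiguration"] = {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {"modelConfiguration": {"modelArn": reranker_arn}},
        }
    return {"knowledgeBaseId": kb_id,
            "retrievalQuery": {"text": query},
            "retrievalConfiguration": {"vectorSearchConfiguration": vector_cfg}}

def retrieve_and_generate_kwargs(kb_id: str, model_arn: str, question: str) -> dict:
    """Keyword arguments for bedrock-agent-runtime retrieve_and_generate() (sketch)."""
    return {"input": {"text": question},
            "retrieveAndGenerateConfiguration": {
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {"knowledgeBaseId": kb_id, "modelArn": model_arn}}}

def answer_with_sources(response: dict) -> tuple:
    """Pull the generated text and the cited S3 URIs out of a RetrieveAndGenerate response."""
    uris = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return response["output"]["text"], uris
```

Wired up, that would be `boto3.client("bedrock-agent-runtime").retrieve_and_generate(**retrieve_and_generate_kwargs(kb_id, model_arn, question))`, followed by `answer_with_sources()` on the result.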
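Decoupling is where Retrieve earns its keep: you take its `retrievalResults`, apply your own filtering, and assemble the prompt yourself before calling a model. The function below is one hypothetical way to do that middle step; `min_score`, `max_chunks`, and the prompt wording are illustrative choices, and the input shape (`content.text`, `score`) follows the Retrieve response as documented at the time of writing.

```python
def build_prompt(question: str, retrieval_results: list,
                 min_score: float = 0.4, max_chunks: int = 3) -> str:
    """Turn Retrieve results into a numbered-source prompt (hypothetical helper)."""
    # Drop weak matches, then keep the best-scoring chunks.
    kept = [r for r in retrieval_results if r.get("score", 0.0) >= min_score]
    kept.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    kept = kept[:max_chunks]
    sources = "\n\n".join(
        f"[{i}] {r['content']['text']}" for i, r in enumerate(kept, start=1))
    return ("Answer using only the numbered sources below and cite them as [n].\n\n"
            f"{sources}\n\nQuestion: {question}")
```

The resulting string would then go to a model through InvokeModel or the Converse API, which is the step RetrieveAndGenerate otherwise performs for you.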

06 Embeddings, multimodal, and structured data

The embedding model turns text into the vectors the index compares. Supported models and their vector types:

| Model | Vector type | Dimensions |
|---|---|---|
| Titan Embeddings G1 – Text | Floating-point | 1536 |
| Titan Text Embeddings V2 | Floating-point or binary | 256 / 512 / 1024 |
| Cohere Embed English v3 / Multilingual v3 | Floating-point or binary | 1024 |
| Titan Multimodal G1 / Cohere Embed v3 (Multimodal) | Image and text input | 1024 |
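If you create the knowledge base through the API, the model and vector type from this table end up in its `knowledgeBaseConfiguration`. A minimal sketch, assuming the field names from the CreateKnowledgeBase API reference and the standard foundation-model ARN format; the allowed dimensions and types below are transcribed from the table (text models only), so re-check them against the Supported models page before use.

```python
EMBEDDING_MODELS = {
    # model ID: (allowed dimensions, allowed embeddingDataType values)
    "amazon.titan-embed-text-v1": ((1536,), ("FLOAT32",)),
    "amazon.titan-embed-text-v2:0": ((256, 512, 1024), ("FLOAT32", "BINARY")),
    "cohere.embed-english-v3": ((1024,), ("FLOAT32", "BINARY")),
    "cohere.embed-multilingual-v3": ((1024,), ("FLOAT32", "BINARY")),
}

def vector_kb_configuration(region: str, model_id: str, dimensions: int,
                            data_type: str = "FLOAT32") -> dict:
    """knowledgeBaseConfiguration for a vector knowledge base (sketch)."""
    if model_id not in EMBEDDING_MODELS:
        raise ValueError(f"model not in this sketch's table: {model_id}")
    allowed_dims, allowed_types = EMBEDDING_MODELS[model_id]
    if dimensions not in allowed_dims:
        raise ValueError(f"{model_id} supports dimensions {allowed_dims}")
    if data_type not in allowed_types:
        raise ValueError(f"{model_id} supports vector types {allowed_types}")
    return {
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": f"arn:aws:bedrock:{region}::foundation-model/{model_id}",
            "embeddingModelConfiguration": {
                "bedrockEmbeddingModelConfiguration": {
                    "dimensions": dimensions,
                    "embeddingDataType": data_type,
                }
            },
        },
    }
```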

Binary vectors use 1 bit per dimension instead of 32, so they're far cheaper to store. They are also less precise, and they require both a model and a vector store that support binary.

Beyond plain text, Knowledge Bases can extract and retrieve images from visually rich documents, accept images as queries, convert natural language to SQL against structured stores, build on an Amazon Kendra GenAI index or Neptune Analytics graphs, and plug into an Amazon Bedrock Agents workflow. It also supports inference profiles for cross-Region inference to raise throughput on parsing and generation.
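The storage saving from binary vectors is simple arithmetic, and the comparison changes too: bit vectors are typically compared by counting differing bits (Hamming distance). The sketch below shows both. The sign-based `binarize()` is a hypothetical illustration only; the supported models return binary embeddings themselves, and the docs cited here don't describe how the vector store compares them.

```python
def storage_bytes(num_vectors: int, dims: int, binary: bool) -> int:
    """Raw vector payload size: 32 bits per dimension for float32, 1 bit for binary."""
    bits_per_dim = 1 if binary else 32
    return num_vectors * dims * bits_per_dim // 8

def binarize(vector: list) -> int:
    """Hypothetical sign-based quantizer: bit i is set when component i is positive."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming_similarity(a: int, b: int, dims: int) -> float:
    """1.0 for identical bit vectors, 0.0 when every bit differs."""
    return 1 - bin(a ^ b).count("1") / dims

if __name__ == "__main__":
    n, dims = 1_000_000, 1024
    print(storage_bytes(n, dims, binary=False))  # -> 4096000000
    print(storage_bytes(n, dims, binary=True))   # -> 128000000
```

For a million 1024-dimension vectors that is roughly 4.1 GB of float32 against 128 MB of bits, a 32x reduction, paid for with coarser similarity scores.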

07 Limits worth knowing

08 Try it in five minutes

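Once that loop feels natural, try hierarchical or semantic chunking in place of the default and see how retrieval quality shifts on your own corpus. The chunking strategy is set when a data source is created and can't be changed afterwards, so add a second data source with the new strategy and sync it rather than editing the existing one.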

Tomorrow: Bedrock Guardrails meets Knowledge Bases, with contextual grounding and relevance checks that score a generated answer against the chunks it was supposed to be based on.

Verified against the official AWS docs on 2026-06-06.
Sources: Retrieve data and generate AI responses with Knowledge Bases, How knowledge bases work, Retrieving information from data sources, How content chunking works, Supported models and Regions.
If the docs change, this tip is a snapshot of that day — check the sources for current behaviour.
Heads up: this tip is from 2026-06-06. AWS services move fast, so cross-check the Knowledge Bases developer guide before relying on specifics.

This page — research, writing, verification, and deployment — was built by Claude Cowork. No human touched the prose, the layout, or the upload pipeline. The tip was generated this morning, cross-checked against the official AWS docs by an independent verification pass, and published to Cloudflare R2 on a schedule.

A daily experiment by Monty van Emmerik · vanemmerik.ai