---
title: "Amazon Bedrock Knowledge Bases — managed RAG, four chunking strategies, and the Retrieve vs RetrieveAndGenerate split"
date: 2026-06-06
service: "Amazon Bedrock"
component: "Knowledge Bases"
tags: [bedrock, knowledge-bases, rag, retrieve, retrieve-and-generate, generate-query, chunking, hierarchical-chunking, semantic-chunking, embeddings, vector-store, opensearch-serverless, s3-vectors, reranking, citations, multimodal]
source: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html
verified_on: 2026-06-06
url: https://vanemmerik.ai/aws-ai/2026-06-06.html
---

# AWS Bedrock & AgentCore · Tip of the Day · 2026-06-06

## Amazon Bedrock Knowledge Bases — managed RAG without the plumbing

Yesterday was about *measuring* an agent. Today is about *feeding*
it. **Amazon Bedrock Knowledge Bases** is the managed
Retrieval-Augmented Generation (RAG) service: point it at your data,
pick an embedding model and a vector store, and it handles the
parse → chunk → embed → index pipeline and the query-time retrieval —
so your application gets relevant context (with citations) instead of
a model guessing from training data alone.

    aws bedrock-agent-runtime retrieve-and-generate \
      --input '{"text":"What is our refund window?"}' \
      --retrieve-and-generate-configuration '{...knowledgeBaseId...}'

≈ 9 min read · Amazon Bedrock · Knowledge Bases

## 01 · What Knowledge Bases actually is

From the [overview](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html):
RAG "uses information from data sources to improve the relevancy and
accuracy of generated responses." Knowledge Bases is AWS's
out-of-the-box implementation — it "abstracts from the heavy lifting
of building pipelines" so you don't hand-roll the ingestion and
retrieval layer, and it removes "the need to continually train your
model to be able to use your private data."

Setting one up is four steps:

1. (Optional) Stand up a supported vector store — or let the console
   create an Amazon OpenSearch Serverless store for you.
2. Connect the knowledge base to an unstructured or structured data
   source.
3. Sync the data source so it's ingested into the index.
4. Query it from your application or agent — return raw sources,
   generate a natural-language answer, or transform the question into
   a structured query (e.g. SQL).
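Step 3 ("Sync") is an ingestion job on the `bedrock-agent` control-plane client; querying (step 4) happens on the separate `bedrock-agent-runtime` client. Here is a minimal sketch of step 3. It assumes you already have a knowledge base ID and a data source ID. The function accepts any object exposing `start_ingestion_job` and `get_ingestion_job`, so you can exercise it with a stub. The terminal-status set and the polling interval are assumptions; check them against the ingestion-job API reference.

```python
import time
from typing import Any, Callable

# Statuses after which polling stops (assumed set; verify against the
# GetIngestionJob API reference for your SDK version).
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(
    agent_client: Any,
    kb_id: str,
    ds_id: str,
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Start an ingestion job (the console's "Sync") and poll until it ends.

    agent_client is expected to be boto3.client("bedrock-agent"), or any
    object exposing the same two methods (handy for tests).
    """
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=ds_id
    )["ingestionJob"]
    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=ds_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job
```

Against a real account this would be called as `sync_data_source(boto3.client("bedrock-agent"), "KB_ID", "DS_ID")`, with your own IDs substituted for the placeholders. Wait for a `COMPLETE` status before you start querying.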

## 02 · The ingestion pipeline: parse → chunk → embed → index

For unstructured data, ingestion converts each document into text,
splits it into chunks, converts each chunk to a vector embedding, and
writes those embeddings to a vector index "while maintaining a mapping
to the original document." Those vectors are what make semantic search
possible: at query time the user's question is embedded too, and the
index returns the chunks whose vectors sit closest to it.
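The sketch below walks through chunk → embed → index → search on a deliberately tiny local scale. It makes three simplifying assumptions. A bag-of-words counter stands in for the embedding model, a Python list stands in for the vector store, and chunks are fixed word counts rather than tokens. The part that carries over to the real service is the shape: every index entry keeps a pointer back to its source document, and that pointer is what lets a citation name the file a chunk came from.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk(text: str, words_per_chunk: int) -> list[str]:
    words = text.split()
    return [
        " ".join(words[i : i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def build_index(docs: dict[str, str], words_per_chunk: int = 8) -> list[tuple[Counter, str, str]]:
    # Each entry is (vector, chunk text, source document ID): the
    # "mapping to the original document" that citations rely on.
    return [
        (embed(c), c, doc_id)
        for doc_id, text in docs.items()
        for c in chunk(text, words_per_chunk)
    ]


def search(index: list[tuple[Counter, str, str]], query: str, k: int = 1) -> list[tuple[str, str]]:
    # Embed the query the same way, then rank chunks by vector similarity.
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[0]), reverse=True)
    return [(text, doc_id) for _, text, doc_id in ranked[:k]]
```

With one document about refunds and one about shipping, the query "how many days for refunds" returns the refunds chunk, tagged with its source document ID.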

Before chunking, you can choose a **parser**. The default parser reads
text; for documents heavy with tables, figures, or scanned pages you
can use a foundation-model parser — the docs list **Claude vision**,
**Nova vision**, and **Llama 4 vision** model families — or the
**Bedrock Data Automation (BDA) parser** (in preview, US West (Oregon)
only, subject to change).
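To choose a foundation-model parser in code, you set the `parsingConfiguration` inside the data source's `vectorIngestionConfiguration`. This is a minimal sketch that builds that block. The helper name is hypothetical, the model ARN in the test is a placeholder, and the optional parsing prompt is illustrative. The BDA parser uses a different parsing strategy and isn't modeled here.

```python
from typing import Optional


def foundation_model_parsing(model_arn: str, prompt: Optional[str] = None) -> dict:
    """Build a parsingConfiguration that routes documents through a
    foundation-model parser (e.g. a Claude, Nova, or Llama 4 vision model)."""
    fm_config: dict = {"modelArn": model_arn}
    if prompt is not None:
        # Optional instructions telling the parser model how to read each page.
        fm_config["parsingPrompt"] = {"parsingPromptText": prompt}
    return {
        "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
        "bedrockFoundationModelConfiguration": fm_config,
    }
```

To use it, pass the result as `vectorIngestionConfiguration={"parsingConfiguration": ...}` when you create the data source.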

## 03 · Four chunking strategies

Chunking is the single most consequential ingestion choice. Bedrock
supports four text strategies (fixed-size, no chunking, hierarchical,
semantic) plus a **default** setting:

- **Fixed-size** — you set a maximum tokens-per-chunk and an overlap
  percentage between consecutive chunks.
- **Default** — splits into chunks of "approximately 300 tokens,"
  honoring sentence boundaries so complete sentences stay intact.
- **No chunking** (`NONE`) — each document becomes a single chunk.
  Pre-split your files first if you go this route. The catch: you
  "cannot view page number in citation or filter by the
  *x-amz-bedrock-kb-document-page-number* metadata field."
- **Hierarchical** — nested parent and child chunks. Retrieval pulls
  the precise child chunks, then "replaces them with broader parent
  chunks" to hand the model more context. You set parent size, child
  size, and the overlap token count.
- **Semantic** — splits on meaning, not syntax, using three
  hyperparameters: maximum tokens, **buffer size** (surrounding
  sentences pulled in for each embedding — a buffer of 1 embeds the
  previous, current, and next sentence together), and a **breakpoint
  percentile threshold** (a higher threshold means fewer, larger
  chunks).
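In API form, each strategy maps to a `chunkingConfiguration` dict inside the `vectorIngestionConfiguration` you pass to `bedrock-agent`'s `create_data_source`. The sketch below covers the four named strategies and leaves out the default setting. The helper and its keyword options are hypothetical; the dict keys follow the Bedrock API as I understand it. The numbers in the tests are illustrative, not recommended settings.

```python
def chunking_configuration(strategy: str, **opts: int) -> dict:
    """Build the chunkingConfiguration block of a data source's
    vectorIngestionConfiguration.

    Option names (max_tokens, overlap_percentage, ...) belong to this sketch;
    the returned keys follow the Bedrock API.
    """
    if strategy == "FIXED_SIZE":
        return {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": opts["max_tokens"],
                "overlapPercentage": opts["overlap_percentage"],
            },
        }
    if strategy == "NONE":
        # One chunk per document: no page-number citations or page filters.
        return {"chunkingStrategy": "NONE"}
    if strategy == "HIERARCHICAL":
        return {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                # Two levels: parent first, then child.
                "levelConfigurations": [
                    {"maxTokens": opts["parent_max_tokens"]},
                    {"maxTokens": opts["child_max_tokens"]},
                ],
                "overlapTokens": opts["overlap_tokens"],
            },
        }
    if strategy == "SEMANTIC":
        return {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "maxTokens": opts["max_tokens"],
                "bufferSize": opts["buffer_size"],
                "breakpointPercentileThreshold": opts["breakpoint_percentile_threshold"],
            },
        }
    raise ValueError(f"unknown chunking strategy: {strategy}")
```

For example, `vectorIngestionConfiguration={"chunkingConfiguration": chunking_configuration("SEMANTIC", max_tokens=300, buffer_size=1, breakpoint_percentile_threshold=95)}` asks for semantic chunks that each embed one neighbouring sentence on either side.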

For multimodal content the rules differ: with **Nova multimodal
embeddings**, chunking happens at the embedding-model level — audio
and video chunk duration is configurable from **1–30 seconds**
(default 5) — and text chunking strategies apply only to text
documents. The BDA parser instead converts audio/video to transcripts
and scene summaries first, then applies the text strategies.

## 04 · Where the vectors live

Knowledge Bases can index into a range of vector stores. You either
let the console spin up an OpenSearch Serverless collection, or bring
your own:

- **Amazon OpenSearch Serverless** (console can auto-create it)
- **Amazon OpenSearch Managed Clusters**
- **Amazon S3 Vectors** (cost-optimized vector storage for RAG)
- **Amazon Aurora PostgreSQL** (pgvector)
- **Amazon Neptune Analytics** (for graph-backed retrieval)
- **Pinecone**, **Redis Enterprise Cloud**, **MongoDB Atlas**
  (credentials brokered through AWS Secrets Manager)
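If you bring your own store, you describe it in the `storageConfiguration` passed to `create_knowledge_base`. This sketch builds that block for two of the options above: OpenSearch Serverless, and Pinecone, where credentials are referenced through a Secrets Manager ARN. The helper names and the default field names (`embedding`, `text`, `metadata`) are hypothetical and must match the index you actually created. Check the key names against the CreateKnowledgeBase API reference before you rely on them.

```python
from typing import Optional


def opensearch_serverless_storage(
    collection_arn: str,
    index_name: str,
    vector_field: str = "embedding",
    text_field: str = "text",
    metadata_field: str = "metadata",
) -> dict:
    # Field names must match the mappings of your existing vector index.
    return {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": collection_arn,
            "vectorIndexName": index_name,
            "fieldMapping": {
                "vectorField": vector_field,
                "textField": text_field,
                "metadataField": metadata_field,
            },
        },
    }


def pinecone_storage(
    connection_string: str,
    credentials_secret_arn: str,
    text_field: str = "text",
    metadata_field: str = "metadata",
    namespace: Optional[str] = None,
) -> dict:
    # The API key lives in Secrets Manager; only the secret's ARN is passed.
    config: dict = {
        "connectionString": connection_string,
        "credentialsSecretArn": credentials_secret_arn,
        "fieldMapping": {"textField": text_field, "metadataField": metadata_field},
    }
    if namespace is not None:
        config["namespace"] = namespace
    return {"type": "PINECONE", "pineconeConfiguration": config}
```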

## 05 · Querying: Retrieve vs RetrieveAndGenerate vs GenerateQuery

Three runtime APIs, each covering a different part of the RAG loop:

- **`Retrieve`** — returns the source chunks (or images) most relevant
  to the query as an array. You own the prompt assembly and generation.
- **`RetrieveAndGenerate`** — joins `Retrieve` with `InvokeModel` to do
  the whole RAG loop: retrieve the chunks, generate a natural-language
  answer, and attach **citations** to the specific source chunks. If
  the source includes visual elements, the model can use insights from
  those images and attribute them.
- **`GenerateQuery`** — converts a natural-language question into a
  query suited to a structured data store (e.g. SQL).
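With `Retrieve`, assembling the prompt is your job. This is a minimal sketch of both halves. The first function builds the request for `boto3.client("bedrock-agent-runtime").retrieve(...)`, with an optional filter on the page-number metadata field (which isn't available with `NONE` chunking). The second turns the returned `retrievalResults` into a numbered, source-tagged prompt. The helper names and the prompt wording are hypothetical; the request and response keys follow the Retrieve API as I understand it.

```python
from typing import Optional


def retrieve_request(kb_id: str, question: str, k: int = 5, page: Optional[int] = None) -> dict:
    search: dict = {"numberOfResults": k}
    if page is not None:
        # Filter on the page-number metadata Bedrock attaches to chunks.
        search["filter"] = {
            "equals": {"key": "x-amz-bedrock-kb-document-page-number", "value": page}
        }
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {"vectorSearchConfiguration": search},
    }


def build_prompt(response: dict, question: str) -> str:
    # Number each retrieved chunk and tag it with its S3 source URI.
    blocks = []
    for i, result in enumerate(response["retrievalResults"], start=1):
        uri = result.get("location", {}).get("s3Location", {}).get("uri", "unknown")
        blocks.append(f"[{i}] ({uri})\n{result['content']['text']}")
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        + "\n\n".join(blocks)
        + f"\n\nQuestion: {question}"
    )
```

In use: `resp = runtime.retrieve(**retrieve_request("KB_ID", q))`, then send `build_prompt(resp, q)` to whichever model you like. That split is the decoupling `Retrieve` buys you.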

`RetrieveAndGenerate` is the combined action: under the hood it uses
`GenerateQuery` (for structured stores), `Retrieve`, and `InvokeModel`.
Because `Retrieve` is exposed on its own, you "have the flexibility to
decouple the steps in RAG and customize them." With both `Retrieve` and
`RetrieveAndGenerate` you can add a **reranking model** that re-orders
the retrieved results by relevance before they reach the prompt.
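The same refund question as in the intro's CLI call, expressed as a `retrieve_and_generate` request body with an optional reranker, plus a helper that pairs each cited span of the answer with the source URIs behind it. This is a sketch: the helper names and the default counts (10 retrieved, 5 kept after reranking) are assumptions, and every ARN or ID you pass in is a placeholder for your own resources.

```python
from typing import Optional


def rag_request(
    kb_id: str,
    model_arn: str,
    question: str,
    rerank_model_arn: Optional[str] = None,
    k: int = 10,
    reranked: int = 5,
) -> dict:
    search: dict = {"numberOfResults": k}
    if rerank_model_arn is not None:
        # Re-score the k retrieved chunks and keep the top `reranked`.
        search["rerankingConfiguration"] = {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "modelConfiguration": {"modelArn": rerank_model_arn},
                "numberOfRerankedResults": reranked,
            },
        }
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {"vectorSearchConfiguration": search},
            },
        },
    }


def citation_map(response: dict) -> list[tuple[str, list]]:
    # Pair each generated text span with the S3 URIs of the chunks it cites.
    pairs = []
    for citation in response.get("citations", []):
        span = citation["generatedResponsePart"]["textResponsePart"]["text"]
        uris = [
            ref.get("location", {}).get("s3Location", {}).get("uri")
            for ref in citation.get("retrievedReferences", [])
        ]
        pairs.append((span, uris))
    return pairs
```

In use: `resp = runtime.retrieve_and_generate(**rag_request(...))`. The answer is in `resp["output"]["text"]`, and `citation_map(resp)` lets you check that every claim traces back to a source.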

## 06 · Embeddings, multimodal, and structured data

The embedding model turns text into the vectors the index compares.
Supported models and their vector types:

- **Titan Embeddings G1 – Text** — floating-point, 1536 dimensions.
- **Titan Text Embeddings V2** — floating-point or **binary**;
  256 / 512 / 1024 dimensions.
- **Cohere Embed English v3** / **Multilingual v3** — floating-point
  or binary, 1024 dimensions.
- **Titan Multimodal Embeddings G1** and **Cohere Embed v3
  (Multimodal)** — 1024 dimensions, for image and text.

Binary vectors use 1 bit per dimension instead of 32, so they're far
cheaper to store. The trade-offs: they are less precise, and they
require both a model and a vector store that support binary.

Beyond plain text, Knowledge Bases can extract and retrieve images from
visually rich documents, accept images as queries, convert natural
language to SQL against structured stores, build on an **Amazon Kendra
GenAI index** or **Neptune Analytics graphs**, and plug into an
**Amazon Bedrock Agents** workflow. It also supports **inference
profiles** for cross-Region inference to raise throughput on the
parsing and generation models.
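To put "far cheaper" in numbers, the sketch below computes raw vector payload sizes (FLOAT32 at 32 bits per dimension versus BINARY at 1 bit). It also builds the `embeddingModelConfiguration` block that selects the data type. Two assumptions apply: the figures ignore index overhead and chunk metadata, which real stores add on top, and the 1,000,000-chunk corpus is hypothetical.

```python
def vector_bytes(dimensions: int, binary: bool) -> int:
    # Raw payload of one embedding, ignoring index overhead and metadata.
    bits_per_dimension = 1 if binary else 32  # BINARY vs FLOAT32
    return (dimensions * bits_per_dimension + 7) // 8  # round up to whole bytes


def index_size_gib(num_chunks: int, dimensions: int, binary: bool) -> float:
    return num_chunks * vector_bytes(dimensions, binary) / 2**30


def embedding_model_configuration(dimensions: int, binary: bool) -> dict:
    # Goes in vectorKnowledgeBaseConfiguration next to embeddingModelArn.
    return {
        "bedrockEmbeddingModelConfiguration": {
            "dimensions": dimensions,
            "embeddingDataType": "BINARY" if binary else "FLOAT32",
        }
    }


for binary in (False, True):
    label = "binary" if binary else "float32"
    print(label, round(index_size_gib(1_000_000, 1024, binary), 2), "GiB")
```

At 1024 dimensions this prints roughly 3.81 GiB for float32 and 0.12 GiB for binary: a 32× reduction, paid for in retrieval precision.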

## Limits worth knowing

- **`NONE` chunking drops page-number citations.** No
  `x-amz-bedrock-kb-document-page-number` filter, no page numbers in
  citations — pre-split your files if granularity matters.
- **Hierarchical chunking isn't recommended with an S3 vector
  bucket.** Combined parent+child token counts over ~8,000 can hit
  metadata size limits. Hierarchical retrieval can also return *fewer*
  results than requested, since child chunks get replaced by parents.
- **Semantic chunking costs extra.** It invokes a foundation model
  during ingestion, billed on top of embedding cost.
- **Cross-Region inference shares data across Regions.** The docs flag
  this explicitly — factor it into your data-residency rules before
  enabling inference profiles.
- **Custom and SageMaker models need explicit prompts.** If you bring
  your own generation model, you must supply the orchestration and
  generation prompt templates with the required input variables.
- **The BDA parser is in preview and Oregon-only.** It's available only
  in US West (Oregon); treat it as subject to change.
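The "fewer results than requested" effect of hierarchical chunking is easier to see with a toy. This is a simplified model, not Bedrock's implementation. It assumes each retrieved child is swapped for its parent in rank order, and that a parent shared by several children is returned only once. The chunk IDs are hypothetical.

```python
def hierarchical_results(child_hits: list[str], parent_of: dict[str, str]) -> list[str]:
    """Replace each retrieved child chunk with its parent chunk, keeping
    rank order and dropping parents that were already returned."""
    seen: set[str] = set()
    parents: list[str] = []
    for child in child_hits:
        parent = parent_of[child]
        if parent not in seen:
            seen.add(parent)
            parents.append(parent)
    return parents
```

Ask for five results when three of the five matching children share one parent, and only three parents come back. That is more context per result, but fewer results.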

## Try it in five minutes

1. In the Bedrock console, create a knowledge base; let it create an
   **OpenSearch Serverless** store so you skip the vector-store setup.
2. Point a data source at an S3 prefix of PDFs or Markdown. Start with
   **default chunking** (≈300 tokens) — tune later.
3. Pick **Titan Text Embeddings V2** as the embedding model and
   **Sync**. Wait for ingestion to finish.
4. Open **Test knowledge base**, toggle *Generate responses* on, ask a
   question, and inspect the **citations** — every claim should trace
   to a source chunk.
5. When you need control, drop to the API: call `Retrieve` for raw
   chunks, or `RetrieveAndGenerate` for an answer-with-citations in one
   request. Add a reranker if the top results look noisy.

Once that loop feels natural, try **hierarchical** or **semantic**
chunking on a fresh data source (the chunking strategy is fixed once a
data source is created), sync it, and compare how retrieval quality
shifts on your own corpus.

## Verified against the official AWS docs on 2026-06-06

Sources:

- Retrieve data and generate AI responses with Amazon Bedrock Knowledge Bases — https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html
- How Amazon Bedrock knowledge bases work — https://docs.aws.amazon.com/bedrock/latest/userguide/kb-how-it-works.html
- Retrieving information from data sources — https://docs.aws.amazon.com/bedrock/latest/userguide/kb-how-retrieval.html
- How content chunking works — https://docs.aws.amazon.com/bedrock/latest/userguide/kb-chunking-parsing.html
- Supported models and Regions — https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-supported.html

If the docs change, this tip is a snapshot of that day — check the sources for current behaviour.

---

*This page — research, writing, verification, and deployment — was built by Claude Cowork. The tip was generated this morning, cross-checked against the official AWS docs, and published to Cloudflare R2 on a schedule. A daily experiment by Monty van Emmerik · vanemmerik.ai*