Choosing a Bedrock Knowledge Base Vector Store

April 10, 2028 · 14 min read

The situation

An internal-tools team is building a Bedrock Knowledge Base to power a company-wide documentation chatbot. The ingest side: ~8,000 docs across a documentation site, a wiki, and an S3 bucket of archived PDFs, re-ingested nightly. Documents are split into ~500-token chunks, giving ~40,000 chunks total. The base embedding model is amazon.titan-embed-text-v2 (1024 dimensions), so ~40k × 1024 four-byte floats = ~160MB of raw vectors plus the chunk text and metadata.
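
The sizing arithmetic can be checked in a few lines. This is a rough sketch that assumes each dimension is stored as a 32-bit float; the function name is made up for illustration, and real stores add index and metadata overhead on top.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as 32-bit floats


def raw_vector_bytes(num_chunks: int, dimensions: int) -> int:
    """Raw size of the embeddings alone, ignoring index and metadata overhead."""
    return num_chunks * dimensions * BYTES_PER_FLOAT32


size = raw_vector_bytes(num_chunks=40_000, dimensions=1024)
print(f"{size / 1e6:.0f} MB of raw vectors")
```

The result lands just above 160MB, which is why the whole dataset fits in memory on even the smallest tier of every store discussed here.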

Query side: ~2,000 queries/day during business hours, peaking at ~10 QPS. Per query: a top-K=5 retrieval plus the LLM call. Latency budget: retrieval under 300ms end-to-end so the overall p95 stays under 3 seconds.

Bedrock Knowledge Bases supports five vector store back-ends. The team needs to pick one:

Amazon OpenSearch Serverless (vector search collection)
Amazon Aurora PostgreSQL Serverless v2 (with pgvector)
Pinecone (managed SaaS, AWS-region-matched)
Redis Enterprise Cloud (managed SaaS, AWS-region-matched)
Amazon Neptune Analytics (for graph-RAG, different use case)

All five can store embeddings, all five can do approximate nearest neighbour (ANN) search, all five work as Bedrock Knowledge Base back-ends. The question is which one fits this team’s workload, budget, and operational model.

What actually matters

Before comparing products, it helps to agree on what a vector store actually does. The job has three parts:

Storage: embedding vectors + the chunk text + metadata (doc URL, version, permission tags). Size is a function of chunk count and embedding dimensionality; ours is ~160MB of vectors plus some amount of chunk text and metadata, call it 500MB total. Small by database standards.

Indexing: some flavour of ANN index over the vectors. HNSW (Hierarchical Navigable Small World) is the dominant choice for modern vector search; IVF variants are still around for specific scale/quality trade-offs. The index lets top-K-similar lookup complete in sub-linear time over the dataset. Building the index is a one-off cost at ingest time; maintaining it means handling inserts, deletes, and updates.

Query: given a query vector, return top-K similar chunks plus their text + metadata. Usually combined with a metadata filter (“only docs with access:public tag”) and sometimes with a lexical-search hybrid (BM25 + vector, so-called “hybrid search”).
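
To make the query step concrete, here is a minimal brute-force sketch of what every store does logically: filter on metadata, score by cosine similarity, return the top K. It is not how any of the five products implement it (they use an ANN index such as HNSW rather than scanning every row), and the Chunk class, tag names, and toy 3-dimensional vectors are made up for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    tags: set[str] = field(default_factory=set)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=5, required_tag=None):
    """Exact (linear-scan) top-K search with an optional metadata filter."""
    candidates = [c for c in chunks if required_tag is None or required_tag in c.tags]
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(query, c.embedding),
        reverse=True,
    )
    return ranked[:k]


chunks = [
    Chunk("VPN setup guide", [0.9, 0.1, 0.0], {"access:public"}),
    Chunk("Payroll runbook", [0.8, 0.2, 0.1], {"access:hr"}),
    Chunk("Holiday policy", [0.1, 0.9, 0.2], {"access:public"}),
]
results = top_k([1.0, 0.0, 0.0], chunks, k=2, required_tag="access:public")
print([c.text for c in results])
```

The payroll chunk is close to the query but never appears in the results because the filter removes it before ranking. At 40,000 chunks even this linear scan would be workable; the ANN index matters as the dataset grows.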

What varies between stores:

The underlying primitive: a search engine (OpenSearch), a relational database (Aurora), a purpose-built vector DB (Pinecone), or an in-memory datastore (Redis).
Scaling model: provisioned per-resource (Aurora nodes, Redis instances, Pinecone pods), serverless capacity units (OpenSearch Serverless OCUs, Aurora Serverless v2 ACUs), or fully managed elastic (Pinecone’s newer serverless tier).
Cost shape: hourly minimums vs per-request billing; whether there’s a fixed floor that kicks in regardless of traffic.
Operational ownership: AWS-managed, third-party SaaS, self-managed.
Integrations beyond vectors: Does the store also do lexical search (OpenSearch), or relational joins (Aurora), or session state (Redis)? Some workloads benefit from collocating vectors with other data.

For 40,000 chunks and 10 QPS, the vector store is not the binding constraint. Any of the five will deliver sub-100ms retrieval comfortably. The binding constraints are cost at this scale and operational burden.

What we’ll filter on

Five filters for picking a backing store:

Cost floor: what’s the minimum monthly bill even with zero traffic?
Scaling model: how does cost grow as index size and QPS grow?
Operational model: AWS-managed, SaaS, or self-tuned?
Adjacent capabilities: lexical search, relational queries, other uses?
Region / networking: in-Region, in-VPC, cross-cloud?

The vector-store landscape

1. Amazon OpenSearch Serverless (vector collection). Pay-per-OCU (OpenSearch Compute Unit) capacity. Minimum is 2 OCUs for indexing + 2 OCUs for search, each ~$0.24/OCU-hour, roughly $700/month floor even at zero traffic. Extremely capable at scale: native HNSW, handles millions of vectors without blinking, lexical+vector hybrid search, fine-grained ACLs. Heavyweight for small RAG projects; excellent for multi-tenant or high-QPS shared retrieval services. Fully AWS-managed, in-VPC option available.

2. Amazon Aurora PostgreSQL with pgvector. A regular Aurora cluster; the pgvector extension provides vector storage and ANN indexes (HNSW and IVFFlat). Aurora Serverless v2 scales compute from 0.5 ACUs (~$0.12/ACU-hour) up to 128 ACUs; minimum is ~$45/month at the 0.5-ACU floor if always-on, or lower still with Aurora Serverless v2’s auto-pause (at the cost of a resume delay on the first query after a pause). Pairs well with teams that already run Postgres: the embeddings live beside the application data, joins work, transactions work. Less optimised for extreme ANN scale than purpose-built stores; fine for up to low millions of vectors with HNSW.

3. Pinecone (SaaS). Purpose-built vector database, mature ecosystem, multi-region. Two tiers: pods (older, provisioned) and serverless (newer, pay per storage + per query). Serverless starts at ~$0.03/GB-month storage + per-1000-query pricing; at 500MB index + ~60k queries/month, the bill is in single-digit dollars plus a fixed $40-$70 minimum depending on the tier. Fast, purpose-built, low-op. Not AWS-native; PrivateLink available but the data plane crosses into Pinecone’s AWS account.

4. Redis Enterprise Cloud with Redis Search/VSS. Managed Redis with vector search. Pay per GB of data + per-shard throughput. Small workloads fit on the smallest managed Redis tier; minimum ~$25/month. Ultra-low-latency retrieval (sub-10ms is typical), because it’s in-memory. Useful when the retrieval latency budget is tight. Redis Enterprise Cloud is SaaS (AWS-region-matched); the data plane lives in Redis’s account.

5. Amazon Neptune Analytics. Graph database with built-in vector support, intended for graph-RAG patterns (where the retrieval walks entity relationships in addition to similarity). Different shape of RAG; not the doc-chunk pattern this team is building. Worth naming.
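
The cost floor for any hourly-billed capacity unit is simple multiplication. As a sketch, the OpenSearch Serverless figure quoted above can be reproduced from its minimum OCU count and hourly rate; the 730-hour month and the function name are assumptions made for this example.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_floor(units: float, rate_per_unit_hour: float) -> float:
    """Monthly cost of capacity that runs around the clock, regardless of traffic."""
    return units * rate_per_unit_hour * HOURS_PER_MONTH


# 2 indexing OCUs + 2 search OCUs at ~$0.24 per OCU-hour
opensearch_floor = monthly_floor(units=2 + 2, rate_per_unit_hour=0.24)
print(f"OpenSearch Serverless floor: ~${opensearch_floor:.0f}/month")
```

The same function applies to any store billed per capacity-unit-hour; stores billed per GB or per query have no floor of this kind beyond whatever plan minimum the vendor sets.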

Side by side

Store	Cost floor	Cost growth	Operational model	Adjacent capabilities	Network
OpenSearch Serverless	~$700/mo	OCU-hours for indexing + search	AWS-managed	Lexical + hybrid search	VPC-native
Aurora pgvector	~$45/mo (0.5 ACU)	ACU-hours	AWS-managed, some DBA	Relational joins, transactions	VPC-native
Pinecone	~$40-70/mo (entry)	Per GB + per 1000 queries	SaaS	Purpose-built vector ops	PrivateLink available
Redis Enterprise	~$25/mo (smallest)	Per GB + shard throughput	SaaS	Low-latency cache, session store	PrivateLink available
Neptune Analytics	Provisioned instance	Per instance-hour	AWS-managed	Graph queries + vectors	VPC-native

Reading the table against the 40k-chunk / 10-QPS workload:

OpenSearch Serverless is overkill. At this scale the team would be paying $700/mo for capacity that sits at 5% utilisation. Great at a million chunks; expensive at forty thousand.
Aurora pgvector is the sweet spot if the team already runs Aurora. ~$45/month floor, scales cleanly, vectors live beside application data, familiar operational model.
Pinecone is the sweet spot if the team wants the lowest-touch SaaS experience and is comfortable with a non-AWS data plane. Entry pricing is comparable to Aurora; the purpose-built APIs are pleasant to work with.
Redis Enterprise wins on retrieval latency if the team needs sub-10ms retrieval; for chatbot RAG with a 300ms retrieval budget, this is headroom the team doesn’t need.
Neptune Analytics is the wrong shape for doc-chunk RAG; it’s the correct shape for graph-walking RAG.

For this team: Aurora pgvector if the team already uses Postgres for application data; Pinecone if not; OpenSearch Serverless later if the chunk count grows past a few million or QPS jumps 10x.
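
The recommendation reads naturally as a small decision function. This is only a sketch of the rule of thumb in this post: the thresholds are assumptions that turn “a few million” into 3,000,000 chunks and “QPS jumps 10x” into 100 QPS, and the function name is hypothetical.

```python
# Assumed thresholds: "a few million chunks" and "10x the current 10 QPS peak".
CHUNK_THRESHOLD = 3_000_000
QPS_THRESHOLD = 100


def pick_store(chunk_count: int, peak_qps: float, already_runs_postgres: bool) -> str:
    """Rule-of-thumb pick for doc-chunk RAG; ignores graph-RAG and ultra-low-latency cases."""
    if chunk_count > CHUNK_THRESHOLD or peak_qps >= QPS_THRESHOLD:
        return "OpenSearch Serverless"
    if already_runs_postgres:
        return "Aurora pgvector"
    return "Pinecone"


print(pick_store(chunk_count=40_000, peak_qps=10, already_runs_postgres=True))
```

Redis Enterprise and Neptune Analytics are deliberately absent: they answer different questions (latency and graph traversal) rather than sitting on the same cost-versus-scale axis.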

Five stores, one use case

All five can serve this workload. The discriminators are the cost floor (how much you pay before a query is made) and the operational touch (how much of your attention the store demands). For 40k chunks and 10 QPS, the fit region is the lower-left quadrant: low floor, low touch.

The pick in depth

Aurora pgvector. The team already runs Aurora for application data; adding a vector table beside it means no new service to operate, no new IAM surface, no new network path. The setup:

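CREATE EXTENSION IF NOT EXISTS vector;

-- One row per chunk; Bedrock writes the columns named in the field mapping.
CREATE TABLE doc_chunks (
    chunk_id     uuid PRIMARY KEY,        -- Bedrock Knowledge Bases expects a UUID key
    doc_url      text NOT NULL,
    doc_version  text NOT NULL,
    chunk_text   text NOT NULL,
    embedding    vector(1024) NOT NULL,   -- must match the embedding model's dimensions
    meta         json,                    -- target of "metadataField" in the field mapping
    access_tags  text[] NOT NULL DEFAULT '{}',
    updated_at   timestamptz NOT NULL DEFAULT now()
);

-- ANN index for cosine-distance search.
CREATE INDEX doc_chunks_emb_hnsw
    ON doc_chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Supporting indexes for per-document lookups and tag filters.
CREATE INDEX doc_chunks_url ON doc_chunks (doc_url);
CREATE INDEX doc_chunks_tags ON doc_chunks USING gin (access_tags);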

HNSW parameters: m=16 controls the graph connectivity (higher = better recall, more memory), ef_construction=64 controls build quality. For 40k vectors, defaults are fine; tune upwards for recall if evaluation demands.
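
For teams that also query the table directly (outside Bedrock), a top-K lookup with a tag filter might look like the sketch below. It only builds the SQL and parameters, so it runs without a database; the %(name)s placeholders assume a psycopg-style driver, the function names are made up, and <=> is pgvector’s cosine-distance operator, which is what the vector_cosine_ops index accelerates.

```python
TOP_K_SQL = """
SELECT chunk_id, doc_url, chunk_text,
       1 - (embedding <=> %(query)s::vector) AS cosine_similarity
FROM doc_chunks
WHERE access_tags && %(tags)s          -- array overlap: any allowed tag matches
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def to_vector_literal(embedding):
    """Format a Python list as pgvector's text representation, e.g. [0.1,0.2]."""
    return "[" + ",".join(f"{x:.6f}" for x in embedding) + "]"


def build_top_k_query(embedding, allowed_tags, k=5):
    """Return (sql, params) ready to pass to cursor.execute() in a psycopg-style driver."""
    params = {
        "query": to_vector_literal(embedding),
        "tags": list(allowed_tags),
        "k": k,
    }
    return TOP_K_SQL, params


sql, params = build_top_k_query([0.1, 0.2], ["access:public"])
print(params["query"])
```

One caveat worth testing for: with an HNSW index the tag filter is applied to the candidates the index returns, so a very selective filter can yield fewer than K rows. Raising hnsw.ef_search for the session widens the candidate set at some latency cost.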

Bedrock Knowledge Base pointing at Aurora:

import boto3

# Control-plane client for Knowledge Bases; the Region matches the ARNs below.
bedrock = boto3.client("bedrock-agent", region_name="eu-west-1")

kb = bedrock.create_knowledge_base(
    name="docs-kb",
    roleArn="arn:aws:iam::111122223333:role/bedrock-kb-role",
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:eu-west-1::foundation-model/amazon.titan-embed-text-v2:0",
        },
    },
    storageConfiguration={
        "type": "RDS",
        "rdsConfiguration": {
            "resourceArn": "arn:aws:rds:eu-west-1:111122223333:cluster:docs-kb",
            "credentialsSecretArn": "arn:aws:secretsmanager:...:secret:docs-kb-creds",
            "databaseName": "kb",
            "tableName": "doc_chunks",
            # Each value must name an existing column in the table.
            "fieldMapping": {
                "primaryKeyField": "chunk_id",
                "textField": "chunk_text",
                "metadataField": "meta",
                "vectorField": "embedding",
            },
        },
    },
)

# The ID is needed later for data sources, ingestion jobs, and queries.
print(kb["knowledgeBase"]["knowledgeBaseId"])
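
Once the Knowledge Base exists, the top-K=5 retrieval from the workload description goes through the bedrock-agent-runtime client’s retrieve call. The sketch below separates building the request from sending it so the request shape can be checked offline; the helper names and the placeholder Knowledge Base ID are hypothetical.

```python
def build_retrieve_request(kb_id: str, question: str, top_k: int = 5) -> dict:
    """Keyword arguments for the bedrock-agent-runtime retrieve() call."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k},
        },
    }


def retrieve_chunks(kb_id: str, question: str, region: str = "eu-west-1"):
    """Send the request and return (score, text) pairs. Needs AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.retrieve(**build_retrieve_request(kb_id, question))
    return [
        (result.get("score"), result["content"]["text"])
        for result in response["retrievalResults"]
    ]


# "KBID123456" is a placeholder, not a real Knowledge Base ID.
request = build_retrieve_request("KBID123456", "How do I reset my VPN token?")
print(request["retrievalConfiguration"])
```

Calling retrieve on its own, without the generation step, is a convenient way to measure retrieval latency against the 300ms budget.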

Aurora Serverless v2 auto-scales between 0.5 ACUs and a configured ceiling; for this workload, a 1-ACU peak is plenty. Cost is ~$45/month if the cluster idles at the 0.5-ACU floor and ~$90/month if it sits at 1 ACU around the clock; auto-pause can take it lower.

When Pinecone is the clearer pick. A team with no existing AWS database, where the ingestion pipeline is a Lambda pulling from a GitHub repo, and operational bandwidth is low: Pinecone’s serverless tier offers a faster path to production. Pay ~$50/month and forget about it. Add PrivateLink to keep traffic off the public internet.

When OpenSearch Serverless is the correct answer. A documentation platform serving multiple teams, with tens of millions of chunks, hundreds of QPS, and a need for hybrid lexical-plus-vector search. The $700/month floor disappears in the noise; the scalability ceiling is where this store’s value shows up.

When Redis Enterprise is the correct answer. A latency-sensitive pattern: a code-completion service where sub-10ms retrieval is non-negotiable, or a session-scoped RAG where the chunks are read from memory next to session state. The SaaS simplicity is a plus; the memory-first architecture is the actual differentiator.

When Neptune Analytics is the correct answer. Graph-RAG. The team’s domain is inherently relational (knowledge graph of entities and relationships), and retrieval needs to walk the graph in addition to doing vector similarity. Different shape of RAG; rare enough that most teams don’t need it, but when the shape fits, nothing else fits as well.

A worked setup

The internal-tools team picks Aurora pgvector. One-time setup takes an afternoon:

Provision Aurora Serverless v2 with the RDS Data API enabled (Bedrock Knowledge Bases connects through it). Smallest ACU floor (0.5), cap at 4.
Install pgvector via CREATE EXTENSION vector;.
Create the table and HNSW index (SQL above).
Store credentials in Secrets Manager; grant the Bedrock KB service role rds:DescribeDBClusters, rds-data:ExecuteStatement, rds-data:BatchExecuteStatement, and secretsmanager:GetSecretValue on the secret.
Create the Knowledge Base via the SDK (above) or the Bedrock console.
Create a data source pointing at the S3 bucket holding the source docs; start the first ingestion job. Bedrock chunks the docs, calls Titan for embeddings, writes to Aurora.
Wait: ingestion for 8,000 docs / 40,000 chunks takes ~20-30 minutes (Titan rate-limited).
Test via RetrieveAndGenerate API. Verify retrieval returns sensible chunks; verify the end-to-end LLM answer cites them correctly.
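
The final test step can be sketched as a request builder plus a small helper that pulls the cited source URIs out of a RetrieveAndGenerate response. The helper names, the placeholder IDs, and the hand-written sample response are all made up for illustration; the sample only mirrors the parts of the response shape the helper reads.

```python
def build_rag_request(kb_id: str, model_arn: str, question: str) -> dict:
    """Keyword arguments for bedrock-agent-runtime retrieve_and_generate()."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def cited_sources(response: dict) -> list:
    """Collect the distinct S3 URIs the answer cites, in order of appearance."""
    uris = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris


# Hand-written stand-in for a real response, for offline checking only.
sample_response = {
    "output": {"text": "Reset the token from the VPN portal."},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "To reset a VPN token..."},
                    "location": {
                        "type": "S3",
                        "s3Location": {"uri": "s3://example-docs/vpn.pdf"},
                    },
                }
            ]
        }
    ],
}
print(cited_sources(sample_response))
```

In a real check, pass build_rag_request(...) to a boto3 bedrock-agent-runtime client’s retrieve_and_generate call and assert that cited_sources returns the documents a human would expect for each test question.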

Month-one bill: ~$50 (Aurora) + Bedrock Titan embedding calls (pennies) + Bedrock LLM calls (a few dollars at 2k queries/day × ~1500 input tokens). Compare against an OpenSearch Serverless alternative at ~$700/month minimum; the savings fund a year of the chatbot team’s coffee budget.

What’s worth remembering

Bedrock Knowledge Bases can use any of five back-ends. OpenSearch Serverless, Aurora pgvector, Pinecone, Redis Enterprise, Neptune Analytics. All work; all have different cost and operational shapes.
Cost floor dominates small workloads. OpenSearch Serverless’s ~$700/month floor is heavy for a 40k-chunk project; Aurora’s ~$45/month or Pinecone’s ~$50/month are proportional.
Scaling ceiling dominates large workloads. OpenSearch Serverless handles millions of chunks and hundreds of QPS without breaking a sweat; Aurora pgvector starts to strain in the tens-of-millions range.
Aurora pgvector wins on “vectors beside application data.” If the team already runs Postgres, the vectors live in the same cluster, joins work, transactions work.
Pinecone wins on “lowest operational touch.” SaaS, purpose-built, no tuning required.
Redis wins on retrieval latency. Sub-10ms is routine; useful when retrieval is on the critical path of sub-50ms responses.
Neptune wins on graph-RAG. Not the doc-chunk pattern; the entity-relationship pattern.
The pick is not permanent. Bedrock Knowledge Bases let you swap back-ends by creating a new KB and re-ingesting; the embeddings don’t care where they live.

“Where the vectors live” is not a technical question at 40,000 chunks. It’s a cost and operational-model question: pick the store whose floor and touch match the team’s scale today, knowing the pick can change if the scale does.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.