Vector Databases Explained: When You Actually Need One (And When pgvector Is Enough)

Vector databases are the most over-purchased AI infrastructure category of 2024 and 2025. Across the enterprise AI investment cycle, organizations have provisioned dedicated vector database services for workloads that PostgreSQL with the pgvector extension could handle at a fraction of the cost and operational complexity. The result is a category of infrastructure spend with no measurable return, maintained by engineering teams that cannot articulate why they chose a dedicated vector store over the database they already operate.

This is not an argument against vector databases. For the workloads they are designed for, at the scale those workloads actually require specialized infrastructure, dedicated vector stores offer genuine advantages in retrieval speed, filtering performance, and horizontal scalability. The problem is that most organizations have not reached that scale and will not reach it within the planning horizon of their current AI initiatives.

Executives approving AI infrastructure budgets need to understand exactly what problem vector databases solve, at what scale pgvector becomes insufficient, what the total cost of ownership looks like across different operational models, and how to make the buy-vs-build-vs-extend decision without relying on vendor claims.

[Figure: Scale cutoff curve]

What Problem Vector Databases Actually Solve

A vector database stores and retrieves high-dimensional numeric representations of content, called embeddings. An embedding is a list of numbers, typically between 768 and 3,072 values, produced by an embedding model from a piece of text, an image, an audio clip, or a structured data record. Vectors that represent semantically similar content cluster near each other in the high-dimensional space. A vector database finds the nearest neighbors to a query vector efficiently, without requiring exact string or keyword matching.
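To make the mechanics concrete, here is a minimal sketch of nearest-neighbor search over embeddings in plain Python. The 3-dimensional vectors and document names are toy assumptions for illustration; production embeddings have hundreds or thousands of dimensions, and a vector database replaces this brute-force scan with an approximate index.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, corpus, k=2):
    # Exact (brute-force) k-nearest-neighbor search; vector databases
    # approximate this so it stays fast at millions of vectors.
    scored = sorted(corpus.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-D "embeddings"; real models emit 768 to 3,072 dimensions.
corpus = {
    "refund-policy":  [0.9, 0.1, 0.0],
    "cancel-account": [0.8, 0.2, 0.1],
    "release-notes":  [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # e.g. an embedded user question about billing
print(nearest_neighbors(query, corpus))
```

The point of the sketch is the shape of the operation, not the arithmetic: similar content scores high regardless of shared keywords, which is what the three workload patterns below all depend on.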

This capability is fundamental to three production AI workload patterns.

Retrieval-augmented generation (RAG). When a large language model answers questions based on a proprietary document corpus, it cannot hold thousands of documents in its context window simultaneously. The retrieval step finds the most semantically relevant documents to include in the context for each query. A vector database handles this retrieval at scale, returning relevant chunks even when the user's question uses different terminology than the source documents.

Semantic search. Unlike keyword search, which requires the query's terms (or close lexical variants) to appear in the matched text, semantic search returns results based on meaning. A search for "subscription cancellation steps" should return relevant account management content even if that content never contains the phrase "cancellation steps." Semantic search is where vector similarity search first found product-market fit, particularly in enterprise search and customer support knowledge base applications.

Recommendation systems. Product recommendations, content recommendations, and similar features work by finding items whose embedding vectors are similar to items a user has engaged with previously. Collaborative filtering approaches, which dominated the field before the embedding era, are increasingly being replaced or augmented by vector similarity approaches.

All three workloads require a system that can execute approximate nearest-neighbor (ANN) search efficiently across a corpus of vectors. PostgreSQL with pgvector can execute this search. The architectural question is at what scale the tradeoffs shift decisively toward a specialized store.

[Figure: Hybrid search architecture]

When pgvector Is Enough

pgvector is a PostgreSQL extension that adds vector storage and similarity search to a standard relational database. For most organizations building their first AI applications, and a substantial share of organizations in full production deployment, it is the correct infrastructure choice.

The case for pgvector rests on four concrete advantages.

Operational simplicity. pgvector runs inside the PostgreSQL instance the engineering team already operates. There is no additional service to provision, monitor, scale, debug, or integrate with the existing authentication and authorization model. The operational complexity of the AI stack does not increase when pgvector is adopted.

Transactional consistency. Vectors and the metadata they describe live in the same database. An update to a document can atomically update both the relational record and the corresponding embedding vector. There are no consistency windows or eventual-consistency behaviors to reason about between separate services.

Hybrid queries without cross-service complexity. Filtering vector search results by metadata, such as returning only documents belonging to a specific user, within a specific date range, or tagged with a specific category, is a standard SQL WHERE clause. Hybrid queries that combine vector similarity with structured filters are natural in SQL.
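The in-memory sketch below mimics that SQL pattern: a metadata filter followed by ordering on embedding distance, which in pgvector would be a WHERE clause plus an ORDER BY on the distance operator. The rows, field names (user_id, modified), and 2-D vectors are illustrative assumptions, not a real schema.

```python
from datetime import date

# Stand-in rows for a documents table with an embedding column.
rows = [
    {"id": 1, "user_id": 7, "modified": date(2025, 3, 5), "emb": [0.9, 0.1]},
    {"id": 2, "user_id": 7, "modified": date(2024, 1, 2), "emb": [0.8, 0.2]},
    {"id": 3, "user_id": 9, "modified": date(2025, 4, 1), "emb": [0.9, 0.1]},
]

def sq_l2(a, b):
    # Squared Euclidean distance; pgvector's <-> operator uses Euclidean
    # distance, and squaring preserves the ranking order.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hybrid_query(query_emb, user_id, since, k=5):
    # SQL equivalent (sketch):
    #   SELECT id FROM documents
    #   WHERE user_id = ... AND modified >= ...
    #   ORDER BY emb <-> query LIMIT k
    eligible = [r for r in rows
                if r["user_id"] == user_id and r["modified"] >= since]
    eligible.sort(key=lambda r: sq_l2(query_emb, r["emb"]))
    return [r["id"] for r in eligible[:k]]

print(hybrid_query([0.9, 0.1], user_id=7, since=date(2025, 1, 1)))
```

Because both the filter and the similarity ordering run in one system, there is no cross-service coordination to get wrong; that is the advantage this paragraph describes.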

Cost. pgvector adds no marginal cost beyond the PostgreSQL instance the organization already runs. For organizations on managed PostgreSQL services on major cloud providers, enabling pgvector is a configuration change, not a new budget line.

For a broader view of how vector storage fits into the data platform, see Data Engineering Explained and What Is a Data Lakehouse.

[Figure: TCO across three tiers]

Where pgvector Breaks

pgvector's performance degrades predictably as corpus size grows and query patterns become more complex. Engineering teams commonly report that query latency becomes difficult to manage at corpus sizes approaching 10 million vectors when combined with high filter selectivity and heavy concurrent query loads. Below that threshold, with HNSW indexing properly configured and appropriate database instance sizing, pgvector latency is acceptable for most applications.

The specific failure modes are distinct.

Index size and memory pressure. HNSW indexes, which deliver the best query latency in pgvector, must reside in memory for optimal performance. At very large corpus sizes, the index alone can exceed available memory on a standard database instance. When the index spills to disk, query latency collapses by an order of magnitude or more.
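Back-of-envelope arithmetic shows why this happens well before corpus sizes sound "large." The sketch below is a rough lower-bound estimate, assuming 4-byte floats and the common approximation of about 2m graph links per element at the base layer; real index overhead varies by implementation and parameters.

```python
def hnsw_memory_gb(num_vectors, dims, m=16,
                   bytes_per_float=4, bytes_per_link=4):
    # Rough lower bound: raw vectors plus HNSW graph links.
    # m is the HNSW "connections per element" parameter
    # (pgvector's default is m=16); real indexes carry further
    # per-element overhead on top of this.
    vector_bytes = num_vectors * dims * bytes_per_float
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return (vector_bytes + link_bytes) / 1e9

# 10 million vectors at 1,536 dimensions (a common embedding size):
print(round(hnsw_memory_gb(10_000_000, 1536), 1))
```

At roughly 63 GB for the vectors and links alone, a 10-million-vector index already exceeds the memory of many standard database instances, which is exactly the point at which the disk-spill latency collapse described above begins.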

Pre-filtered ANN search. Executing an approximate nearest-neighbor search and then filtering by metadata is not the same as executing a filtered nearest-neighbor search directly. pgvector performs ANN search first and filters afterward. When filters are highly selective, eliminating most of the ANN results, the returned result set may be empty or may not contain the most relevant documents for the user's actual query. This recall degradation is a fundamental architectural constraint, not a configuration problem.

Concurrent write throughput. Corpus updates require rebuilding parts of the vector index. At high update rates, such as an application that continuously ingests new documents, index updates create contention that degrades both write and read latency on the shared PostgreSQL instance.

Multi-tenancy at scale. In SaaS applications where each customer has a distinct document corpus, a single pgvector instance requires careful schema design to isolate customer vectors and prevent cross-tenant result contamination. Purpose-built vector stores often have native namespace isolation that simplifies this architecture.

The Purpose-Built Vector Store Landscape

When a team's workload has clearly exceeded pgvector's practical limits, several purpose-built options are available across a spectrum of operational models.

Pinecone is the most widely deployed managed vector database service. It handles all infrastructure operations and exposes a straightforward API. Pinecone's operational simplicity makes it the lowest-friction option for teams that have outgrown pgvector and want to minimize infrastructure engineering effort. The trade-off is cost: managed services at scale carry a significant premium over self-hosted alternatives, and the always-on reserved capacity model can create substantial idle spend.

Weaviate is an open-source vector database deployable on managed cloud (Weaviate Cloud Services) or self-hosted. It supports multi-modal vectors covering text, images, and audio, and has strong support for hybrid search combining vector similarity with BM25 keyword ranking. Well suited to organizations with infrastructure engineering capacity who want to avoid managed service cost at scale.

Qdrant is an open-source vector database written in Rust, built for high performance and memory efficiency. It supports sophisticated payload filtering at the index level, directly addressing the pre-filtering limitation of pgvector. Popular with engineering teams that prioritize raw query performance at moderate corpus sizes on their own infrastructure.

Milvus is an open-source distributed vector database designed for very large-scale deployments, originally developed at Zilliz and donated to the Linux Foundation AI. It supports horizontal scaling across many nodes and is designed to handle multi-billion-vector workloads. Best suited to large enterprises with dedicated data infrastructure teams and corpus sizes that exceed what single-node systems can serve.

Turbopuffer is a newer entrant built around cloud object storage (S3-compatible backends). By storing vectors in object storage and using local caching for frequently accessed data, it achieves lower cost at scale than services that require always-on compute capacity for the full corpus. Well suited to workloads where query patterns are uneven across the corpus and idle capacity costs are a primary concern.

For context on how vector infrastructure fits into a broader architectural decision, see Microservices vs. Monoliths and Cloud Computing Explained.

Hybrid Search: The Architecture Decision That Matters More Than Vendor Choice

Most RAG applications and semantic search implementations in production require both vector similarity and structured filtering. "Find documents semantically relevant to this query that belong to workspace ID 7890 and were modified after March 1, 2025" is a hybrid query: part vector similarity, part relational filter.

Two architectures handle hybrid queries, with meaningfully different recall characteristics.

Post-filtering. Execute ANN search first across the full corpus, then apply metadata filters to the result set. This is how pgvector works and how naive implementations of any vector store operate. When the filter is moderately selective, post-filtering is acceptable. When the filter is highly selective, eliminating 95 percent of ANN results, the returned set may be empty even though highly relevant documents satisfying the filter criteria exist in the corpus. The ANN search never reached them because it prioritized similarity before scope.

Pre-filtering (filtered ANN). Apply metadata filters first to identify the eligible subset of documents, then run ANN search within that subset. This delivers better recall when filters are selective, but requires index structures that support filtered traversal without full corpus scans. Qdrant, Weaviate, Milvus, and the major managed services all support pre-filtering. pgvector does not.

For teams whose primary use case involves heavy per-query filtering, such as a RAG system where every query is scoped to a specific customer's private documents, the pre-filtering capability of purpose-built stores is a real technical advantage over pgvector, independent of scale.

The Cost Trap of Always-On Managed Services

The total cost of ownership comparison between pgvector and managed vector stores is frequently distorted by a pattern common in initial deployments: organizations over-provision managed vector services because those services cannot scale to zero.

Managed vector database services charge for reserved capacity, a minimum index size and query throughput tier, regardless of whether queries are actually executing. An enterprise that provisions a managed vector service for a RAG application used primarily during business hours pays for 24 hours of daily capacity while consuming perhaps eight. At low-to-moderate traffic volumes, this idle capacity cost is the dominant line item in the database infrastructure budget for that workload.

For related context on controlling AI infrastructure spend, see Executive Guide to Cloud Costs.

A practical TCO framework at three scale tiers illustrates the tradeoffs.

Tier 1: Under 1 million vectors, moderate query volume. pgvector on an existing managed PostgreSQL instance adds essentially zero marginal cost. A dedicated managed vector service adds hundreds to potentially thousands of dollars per month in reserved capacity fees, plus engineering time for integration and ongoing maintenance. The correct choice at this tier is almost always pgvector, with a documented migration plan for when corpus growth triggers a reassessment.

Tier 2: 1 million to 50 million vectors, moderate query volume. pgvector remains viable with careful HNSW index tuning and appropriately sized database instances. A self-hosted Qdrant or Weaviate deployment may offer better filtering performance at comparable or lower cost if the team has infrastructure capacity to operate it. Managed services become cost-competitive primarily when engineering time is the tighter constraint, not cloud spend.

Tier 3: Over 50 million vectors or high concurrent query volume. pgvector is not the right tool at this scale. The choice shifts to self-hosted Milvus or Weaviate for cost efficiency, or managed Pinecone or Weaviate Cloud for operational simplicity. Which path is correct depends on whether infrastructure engineering capacity or cloud spend is the binding constraint for the specific team.
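The idle-capacity arithmetic behind these tiers can be written down directly. The rates below are hypothetical placeholders, not any vendor's actual pricing; the function only illustrates how reserved capacity and duty cycle interact.

```python
def monthly_costs(reserved_hourly_rate, busy_hours_per_day,
                  pg_marginal_monthly=0.0, hours_per_month=730):
    # Illustrative arithmetic only: the hourly rate is a made-up
    # placeholder, not real vendor pricing.
    managed_total = reserved_hourly_rate * hours_per_month
    idle_share = 1 - busy_hours_per_day / 24
    return {
        "managed_total": round(managed_total, 2),
        "managed_idle": round(managed_total * idle_share, 2),
        "pgvector_marginal": pg_marginal_monthly,
    }

# A service reserved at a hypothetical $1.50/hour, queried ~8 hours/day:
print(monthly_costs(1.50, busy_hours_per_day=8))
```

Under these made-up numbers, two thirds of the managed-service bill is idle capacity, while pgvector's marginal cost on an existing instance stays near zero; the specific figures will differ, but the structure of the comparison does not.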

A Decision Flowchart

Work through the following questions in sequence to identify the correct architecture.

Question 1. Is the use case one of semantic search, RAG retrieval, or vector-based recommendations? If no to all three, vector infrastructure of any kind is likely unnecessary.

Question 2. Does the corpus size exceed 1 million vectors, or does the application require pre-filtered ANN search with high filter selectivity? If no to both, start with pgvector.

Question 3. Is the team already operating PostgreSQL? If yes, the marginal cost of pgvector is minimal. If no, the operational cost of adding pgvector is higher and a lightweight managed option warrants consideration.

Question 4. Does the workload require scaling beyond 10 million vectors with high filter selectivity, or more than several hundred concurrent queries per second with strict latency requirements? If no, properly tuned pgvector on appropriately sized hardware is likely sufficient.

Question 5. Is the primary constraint engineering time or cloud spend? If engineering time is the binding constraint, managed services reduce operational overhead at the cost of higher ongoing spend. If cloud spend is the binding constraint, self-hosted open-source solutions are more cost-effective at scale.
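The five questions above can be encoded literally as a function, which also makes the decision auditable in an architecture review. The thresholds mirror the figures used in this article and are heuristics, not hard limits; the parameter names are my own shorthand for the questions.

```python
def choose_vector_architecture(
    is_semantic_workload: bool,       # Q1: semantic search, RAG, or recs?
    corpus_vectors: int,              # current corpus size
    needs_selective_prefilter: bool,  # Q2/Q4: pre-filtered ANN, high selectivity
    already_runs_postgres: bool,      # Q3
    qps_over_hundreds: bool,          # Q4: heavy concurrent query load
    constraint: str,                  # Q5: "engineering_time" or "cloud_spend"
) -> str:
    # A literal encoding of the five-question sequence above.
    if not is_semantic_workload:
        return "no vector infrastructure needed"
    if corpus_vectors < 1_000_000 and not needs_selective_prefilter:
        return ("pgvector" if already_runs_postgres
                else "lightweight managed service")
    if (corpus_vectors <= 10_000_000 and not needs_selective_prefilter
            and not qps_over_hundreds):
        return "pgvector, tuned, on rightsized hardware"
    return ("managed vector service" if constraint == "engineering_time"
            else "self-hosted open-source vector store")

print(choose_vector_architecture(True, 200_000, False, True, False, "cloud_spend"))
```

Running the sketch for a 200,000-vector corpus on an existing PostgreSQL instance lands on pgvector, which is the outcome the flowchart predicts for the majority of first AI deployments.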

Key Takeaways

  • Vector databases solve semantic similarity search at scale; they are not general-purpose AI infrastructure and should not be the default starting point for new AI projects.
  • pgvector is the correct starting point for most teams: it runs inside existing PostgreSQL, adds zero marginal cost, and handles workloads up to roughly 10 million vectors with appropriate tuning and indexing.
  • pgvector's practical limits are pre-filtered ANN search with high filter selectivity, very large corpus sizes that exceed available memory for HNSW indexes, and high concurrent write throughput.
  • Purpose-built vector stores differ primarily on operational model (managed vs. self-hosted), pre-filtering support, horizontal scalability, and cost structure at scale.
  • The cost trap in managed vector services is always-on reserved capacity charges that apply regardless of actual utilization.
  • The buy-vs-build-vs-extend decision turns on corpus scale, filtering requirements, and whether engineering time or cloud spend is the tighter constraint.

FAQ

Should every RAG application use a vector database?

No. Most enterprise RAG applications handling fewer than 1 million documents should start with pgvector. A purpose-built vector store only makes sense when the workload's scale or filtering requirements have clearly exceeded what pgvector can handle with reasonable index tuning and hardware sizing.

What is an embedding, and why does it need a specialized database?

An embedding is a list of numbers, typically hundreds or thousands of values, that represents the semantic meaning of a piece of content. Standard relational databases are not designed for the similarity computations required to find semantically related embeddings efficiently at scale. pgvector adds this capability to PostgreSQL; purpose-built vector stores build their entire architecture around it.

Can a vector database replace a search engine like Elasticsearch or Solr?

Not in most cases. Inverted-index systems such as Elasticsearch are superior for exact keyword matching, phrase search, field-boosted ranking, and faceted navigation. Vector stores are superior for semantic similarity. Hybrid architectures that combine both approaches typically outperform either alone for general search workloads, and many production implementations run both systems in parallel.

How do vector databases handle updates when source documents change?

Updated documents require new embeddings, which must replace the old vectors in the store. This requires coordination between the document ingestion pipeline and the vector store's update process. Purpose-built stores generally handle high-frequency updates more gracefully than pgvector at scale, but all vector stores face the same fundamental challenge: re-embedding and re-indexing at high update rates is computationally expensive.

What should a CTO ask before approving a vector database purchase?

Three questions: What is the current and projected vector count for this use case? Has the team validated that pgvector is insufficient at current scale with appropriate tuning? What is the total cost of ownership including idle capacity fees, and how does it compare to pgvector on a rightsized PostgreSQL instance?