How pg_duckdb Brings Columnar Analytics Inside PostgreSQL

In the world of databases, developers and analysts often face a tension:

  • PostgreSQL shines at transactional workloads (OLTP): rapid inserts, updates, and complex multi-user business applications.
  • Columnar engines like DuckDB excel at analytics (OLAP): large scans, aggregations, data exploration, and reporting.

What if you could combine the strengths of both?

Enter pg_duckdb — a PostgreSQL extension that merges DuckDB’s columnar, vectorized analytics engine directly into PostgreSQL. It’s not just “another extension” — it fundamentally expands what PostgreSQL can do without moving data out of your database.

What Is pg_duckdb?

pg_duckdb is an open-source PostgreSQL extension that embeds the DuckDB analytics engine within PostgreSQL itself. Instead of exporting your data to another analytics database, you can run advanced analytical queries inside PostgreSQL using DuckDB’s high-performance execution.

In practical terms, this means:

  • Your OLTP data stays in PostgreSQL
  • You get DuckDB-style performance for analytics
  • You don’t need heavy ETL pipelines or separate data warehouses

Why It Matters

Traditionally, PostgreSQL stores data in a row-oriented format. That’s optimized for transactions — fetching one row at a time, updating frequently, and ensuring consistency. But analytical workloads — like scanning millions of rows, aggregating, or summarizing — benefit from columnar storage, where only relevant columns are read into memory. Columnar systems also compress data better and execute operations in vectorized batches.

DuckDB shines because of:

  • Columnar storage — efficient for analytical workloads
  • Vectorized execution — high CPU throughput
  • Minimal dependencies — lightweight and fast for embedded use cases

pg_duckdb brings these abilities into PostgreSQL. The result? Analytical SQL runs much faster — especially on large tables — without moving data out of your production database.

How pg_duckdb Actually Works

Unlike traditional PostgreSQL extensions that add helper functions or indexing structures, pg_duckdb embeds an entire analytics engine inside the PostgreSQL server process. When enabled, certain queries — especially large analytical scans — are automatically routed through DuckDB’s engine instead of PostgreSQL’s planner and executor.

Here’s the magic:

  1. DuckDB’s engine lives inside PostgreSQL — not as an external service.
  2. You can query PostgreSQL tables directly — with no data export.
  3. pg_duckdb decides when analytics queries should be executed by DuckDB.
  4. You get faster scans, aggregations, joins, and heavy SQL transformations.

From the official GitHub README:

pg_duckdb “integrates DuckDB’s columnar-vectorized analytics engine into PostgreSQL, enabling high-performance analytics and data-intensive applications.”

Querying PostgreSQL Data With DuckDB

Here’s the beauty: you do not need to change your existing SQL. You write standard SELECT statements, and if pg_duckdb is configured (e.g., with duckdb.force_execution = true), DuckDB’s engine will run the query.

For example:

SET duckdb.force_execution = true;
SELECT
    date_trunc('month', created_at) AS month,
    COUNT(*) AS user_count,
    SUM(amount) AS revenue
FROM
    orders
GROUP BY
    month;

Behind the scenes, DuckDB executes this using columnar internals, while PostgreSQL handles access to the tables. No CSV exports, no ETL — just SQL.

Real-World Use Cases

Here’s where pg_duckdb shines:

  • Large-scale Analytics

Aggregations over millions of rows become much faster than classic PostgreSQL plans.

  • Data Lakes

Query Parquet files or Iceberg tables directly — no ETL required.

  • Mixed Workloads

Run OLTP and OLAP from the same database instance.

  • Just-In-Time Analytics

Final stage reporting, dashboards, time-based aggregations.

Not a Replacement for PostgreSQL

It’s important to set expectations.

  • pg_duckdb accelerates analytical queries
  • It does not turn PostgreSQL into a full columnar database
  • It’s not replacing PostgreSQL indexes, WAL, or core transactional internals

The idea is to combine strengths, not replace one with another.

Setting Up pg_duckdb

Step 1 — Clone the Repository

git clone https://github.com/duckdb/pg_duckdb.git
cd pg_duckdb

Step 2 — Build and Install

make
make install

Ensure pg_config from your PostgreSQL installation is available in your PATH.

Step 3 — Enable in PostgreSQL

CREATE EXTENSION pg_duckdb;

To force DuckDB execution for compatible analytical queries:

SET duckdb.force_execution = true;

pg_duckdb is a game changer for analytics inside PostgreSQL. It’s not a buzzword — it’s a practical extension backed by DuckDB’s battle-tested vectorized engine. You keep your transactional database, and gain analytical power without building separate data pipelines.

If your workflows include:

  • Dashboarding
  • Reporting
  • Large aggregations
  • Data lake queries

Then pg_duckdb is absolutely worth experimenting with.

As analytics workloads continue to grow, having a tool that bridges OLTP and OLAP within a single PostgreSQL playground is powerful. And best of all — you do it with straight SQL.

whatsapp_icon
location

Calicut

Cybrosys Technologies Pvt. Ltd.
Neospace, Kinfra Techno Park
Kakkancherry, Calicut
Kerala, India - 673635

location

Kochi

Cybrosys Technologies Pvt. Ltd.
1st Floor, Thapasya Building,
Infopark, Kakkanad,
Kochi, India - 682030.

location

Bangalore

Cybrosys Techno Solutions
The Estate, 8th Floor,
Dickenson Road,
Bangalore, India - 560042

Send Us A Message