Overview of Buffer Access Strategies in PostgreSQL

PostgreSQL’s buffer manager is the unsung engine of database performance, tasked with keeping the most relevant data pages in memory (shared_buffers). However, standard Least Recently Used (LRU) or Clock-Sweep algorithms have a fatal weakness: massive sequential operations. Without intervention, a single full-table scan or a routine VACUUM could evict the entire working set of your database, causing severe cache thrashing and plunging live application performance.

To counteract this, PostgreSQL employs a mechanism called the Buffer Access Strategy (BAS). In this blog, we will walk through PostgreSQL's internal C source code to understand how these strategies are defined, allocated, and enforced to protect the shared buffer cache.

The Core Concept: Ring Buffers

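Instead of allowing a bulk operation to flood the main shared_buffers pool, PostgreSQL confines it to a small, private "ring buffer": a fixed-size list of buffer slots, still located inside shared_buffers, that the operation recycles. As the operation progresses, it fills the ring up to a predefined limit. Once the ring is full, it wraps around and reuses its own oldest buffer instead of asking the global clock sweep for a new one. If that buffer is pinned or has been used by another backend in the meantime, the operation falls back to the normal allocation path and puts the newly obtained buffer into the ring in its place.

This ensures that bulk operations recycle only a small, isolated footprint of memory, leaving the rest of the database cache largely untouched.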
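
The reuse rule can be sketched as a tiny single-threaded model. Everything here is hypothetical (SimRing, SimBuffer, ring_get_buffer and global_allocate are illustration-only names); it loosely follows the rule used by GetBufferFromRing() in freelist.c, where a ring buffer is reused only if it is unpinned and its usage count is 0 or 1, and it replaces the real clock sweep, locking and I/O with a simple counter.

```c
#include <assert.h>

/* Hypothetical, simplified model of a strategy ring (not PostgreSQL code). */
#define RING_SIZE 4   /* slots in the ring */
#define POOL_SIZE 64  /* buffers in the simulated shared pool */

typedef struct
{
    int refcount;     /* > 0 means some backend has it pinned */
    int usage_count;  /* raised by accesses */
} SimBuffer;

typedef struct
{
    int slots[RING_SIZE]; /* buffer ids, -1 = empty slot */
    int current;          /* index of the most recently used slot */
} SimRing;

static SimBuffer pool[POOL_SIZE];
static int next_free = 0; /* stands in for the global clock sweep */

static void ring_init(SimRing *ring)
{
    for (int i = 0; i < RING_SIZE; i++)
        ring->slots[i] = -1;
    ring->current = RING_SIZE - 1; /* so the first request uses slot 0 */
}

/* Stand-in for the normal allocation path over all of shared_buffers. */
static int global_allocate(void)
{
    assert(next_free < POOL_SIZE);
    return next_free++;
}

/* Returns the buffer id used for the next page of a bulk operation. */
static int ring_get_buffer(SimRing *ring)
{
    ring->current = (ring->current + 1) % RING_SIZE;
    int id = ring->slots[ring->current];

    /* Reuse our own old buffer if nobody else is using it. */
    if (id >= 0 && pool[id].refcount == 0 && pool[id].usage_count <= 1)
    {
        pool[id].usage_count = 1;
        return id;
    }

    /* Empty slot, or buffer busy elsewhere: take one from the shared
       pool and remember it in this slot for the next lap. */
    id = global_allocate();
    pool[id].usage_count = 1;
    ring->slots[ring->current] = id;
    return id;
}
```

After four requests fill the ring, a fifth request gets buffer 0 back; if buffer 1 is then pinned elsewhere, the next request falls back to the shared pool and receives a fresh buffer instead.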

Note: Ring buffer sizes are capped to prevent any single operation from consuming an outsized share of memory:

  • Every strategy is capped at NBuffers / 8 (1/8th of shared_buffers) via a Min() call in GetAccessStrategyWithSize(), where the ring size in kilobytes is converted into a number of buffers.
  • BAS_BULKREAD additionally applies a more nuanced cap based on GetPinLimit() and the system's async I/O configuration - detailed in its section below.

On systems with small shared_buffers, these caps can silently reduce the effective ring size below its nominal value.
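
The cap arithmetic is easy to check by hand. A minimal sketch, assuming the ring size is held in kilobytes, converted to buffers by dividing by BLCKSZ / 1024, and then limited with the NBuffers / 8 cap; ring_buffers_for is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>

/* Hypothetical helper: converts a ring size in KB into a buffer count
   and applies the NBuffers / 8 cap. Returns 0 when no ring would be
   created at all. */
static int ring_buffers_for(int ring_size_kb, int nbuffers, int blcksz)
{
    assert(ring_size_kb >= 0);
    assert(blcksz >= 1024 && blcksz % 1024 == 0);

    int ring_buffers = ring_size_kb / (blcksz / 1024);
    if (ring_buffers == 0)
        return 0;                /* a zero-sized ring means "no ring" */

    int cap = nbuffers / 8;      /* 1/8th of shared_buffers */
    return ring_buffers < cap ? ring_buffers : cap;
}
```

With shared_buffers = 16MB (2048 buffers of 8 KB), the nominal 16 MB BAS_BULKWRITE ring (2048 buffers) is cut to 256 buffers, i.e. 2 MB.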

Strategy Types in the Architecture

The architecture defines four distinct access strategies in src/include/storage/bufmgr.h under the BufferAccessStrategyType enum. Each is engineered with specific memory allocations and behavioral rules tailored to different internal database operations.

/*
 * Possible arguments for GetAccessStrategy().
 * If adding a new BufferAccessStrategyType, also add a new IOContext so
 * IO statistics using this strategy are tracked.
 */
typedef enum BufferAccessStrategyType
{
    BAS_NORMAL,     /* Normal random access */
    BAS_BULKREAD,   /* Large read-only scan (hint bit updates are
                     * ok) */
    BAS_BULKWRITE,  /* Large multi-block write (e.g. COPY IN) */
    BAS_VACUUM,     /* VACUUM */
} BufferAccessStrategyType;

1. BAS_NORMAL (Standard Access)

This is the default operational state of PostgreSQL. When the strategy pointer is NULL, which is exactly what a request for BAS_NORMAL produces, the engine assumes the query is executing standard transactional logic - like fetching a specific record via an index scan or performing a targeted update.

  • It entirely bypasses the ring buffer logic. Buffer requests are routed directly to the global shared_buffers pool using PostgreSQL's standard clock-sweep (LRU approximation) algorithm.
  • Pages loaded under this strategy are pinned while in use, have their usage count raised on each access, and remain in the main cache until the clock sweep has decremented that count to zero and evicts them.

Notably, GetAccessStrategy() returns NULL for BAS_NORMAL - there is no strategy object allocated at all:

    case BAS_NORMAL:
        /* if someone asks for NORMAL, just give 'em a "default" object */
        return NULL;
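
For contrast, the clock sweep that BAS_NORMAL relies on can be sketched as follows. This is a heavily simplified, single-threaded model loosely based on StrategyGetBuffer() in freelist.c; the names (Buf, buffers, clock_hand, clock_sweep) are hypothetical, and the real code uses atomic operations and buffer header locks and raises an error when no unpinned buffer can be found, modeled here as returning -1.

```c
#include <assert.h>

#define NBUF 4

typedef struct
{
    int refcount;    /* pins currently held */
    int usage_count; /* raised on access, lowered by the sweep */
} Buf;

static Buf buffers[NBUF];
static int clock_hand = 0;

/* Hypothetical sketch: returns a victim buffer id, or -1 if all pinned. */
static int clock_sweep(void)
{
    int trycounter = NBUF;

    for (;;)
    {
        int id = clock_hand;
        clock_hand = (clock_hand + 1) % NBUF;

        if (buffers[id].refcount == 0)
        {
            if (buffers[id].usage_count > 0)
            {
                buffers[id].usage_count--; /* spare it for one more lap */
                trycounter = NBUF;
            }
            else
                return id;                 /* unpinned and cold: evict */
        }
        else if (--trycounter == 0)
            return -1;                     /* a full lap of pinned buffers */
    }
}
```

Buffers that keep being accessed keep their usage count above zero and survive each lap, which is exactly why a large scan under BAS_NORMAL can displace them: every newly read page enters this same pool.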

2. BAS_BULKREAD (Bulk Read Protection)

When PostgreSQL executes a massive sequential scan, it risks loading gigabytes of data that will likely only be read once, displacing hot transactional data. To prevent this cache poisoning, the engine allocates a BAS_BULKREAD strategy.

The decision is made when the scan starts, not by the query planner. In initscan() (heapam.c), if a sequential scan targets a non-temporary relation whose physical size exceeds 1/4 of the configured shared_buffers (i.e., rs_nblocks > NBuffers / 4), and the scan permits a strategy, the engine automatically wraps the scan in a BAS_BULKREAD strategy.

/* Simplified from src/backend/access/heap/heapam.c, initscan() */
if (!RelationUsesLocalBuffers(scan->rs_base.rs_rd) &&
    scan->rs_nblocks > NBuffers / 4)
    allow_strat = (scan->rs_base.rs_flags & SO_ALLOW_STRAT) != 0;
else
    allow_strat = false;

if (allow_strat)
{
    /* During a rescan, keep the previous strategy object. */
    if (scan->rs_strategy == NULL)
        scan->rs_strategy = GetAccessStrategy(BAS_BULKREAD);
}
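
The threshold test can be restated as a standalone predicate. This is a hypothetical helper (wants_bulkread_strategy is not a PostgreSQL function) that takes the relation size in blocks, NBuffers, and whether the relation uses local (temporary-table) buffers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the size test in initscan(). */
static bool wants_bulkread_strategy(long rel_nblocks, int nbuffers,
                                    bool uses_local_buffers)
{
    assert(nbuffers > 0 && rel_nblocks >= 0);

    /* Temporary tables use local buffers, so the ring never applies. */
    if (uses_local_buffers)
        return false;

    /* Strictly larger than a quarter of shared_buffers. */
    return rel_nblocks > nbuffers / 4;
}
```

With shared_buffers = 128MB (16384 buffers of 8 KB), a table needs more than 4096 blocks, roughly 32 MB, before its sequential scans get a BAS_BULKREAD ring.
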

  • Ring Size: dynamic, not a fixed 256 KB. Unlike the other strategies, BAS_BULKREAD (as of PostgreSQL 18) does not use a hardcoded ring size. The sizing logic in GetAccessStrategy() in src/backend/storage/buffer/freelist.c is:

    case BAS_BULKREAD:
        {
            int ring_max_kb;

            ring_size_kb = 256;
            ring_max_kb = GetPinLimit() * (BLCKSZ / 1024);
            ring_max_kb = Max(ring_size_kb, ring_max_kb);
            ring_size_kb += (BLCKSZ / 1024) *
                io_combine_limit * effective_io_concurrency;
            if (ring_size_kb > ring_max_kb)
                ring_size_kb = ring_max_kb;
            break;
        }

The algorithm proceeds in four steps:

  • Step 1 - Start with a 256 KB base. This was the entire BAS_BULKREAD ring before PostgreSQL 18, and it keeps enough distance between handing a buffer to the scan and reusing it that the scan's own pin does not block reuse.
  • Step 2 - Compute an upper bound from GetPinLimit(). In addition to the global NBuffers / 8 ceiling that every strategy receives later, BAS_BULKREAD derives a bound from GetPinLimit() - a backend-level function that reflects how many buffers a single process is permitted to hold pinned at once, since a ring larger than that would be pointless. Multiplying that pin ceiling by BLCKSZ / 1024 converts it into kilobytes, and the result is floored at 256 KB to guarantee a usable minimum even on very constrained systems.
  • Step 3 - Grow the ring for the async I/O pipeline. The ring is expanded by:

additional_kb = (BLCKSZ / 1024) × io_combine_limit × effective_io_concurrency

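This accounts for PostgreSQL's newer I/O infrastructure: io_combine_limit arrived with the read stream machinery in PostgreSQL 17, and the asynchronous I/O subsystem followed in PostgreSQL 18. With async I/O, several read requests can be in flight at once; each can cover up to io_combine_limit blocks, and up to effective_io_concurrency of them may be started. Buffers that are still being read in cannot be reused, so the ring must be large enough to hold all in-flight blocks; otherwise the scan would keep wrapping around to buffers it cannot recycle yet. Setting effective_io_concurrency to 0 adds nothing to the ring.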

For example, with the standard 8 KB block size (BLCKSZ / 1024 = 8), io_combine_limit = 16 and effective_io_concurrency = 8:

additional_kb = 8 (BLCKSZ / 1024) × 16 (io_combine_limit) × 8 (effective_io_concurrency) = 1024 KB

This makes the effective ring 256 + 1024 = 1280 KB before clamping.

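  • Step 4 - Clamp to the cap. If the grown size exceeds ring_max_kb, it is clamped down. The kilobyte result is then converted into buffers and, like every other strategy, capped at NBuffers / 8.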

The two GUCs that directly influence this calculation at runtime are io_combine_limit and effective_io_concurrency.

In practice, on modern hardware with async I/O enabled, the BAS_BULKREAD ring will be considerably larger than 256 KB - its exact size adapts to how aggressively your system is configured for parallel I/O.
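
The four steps can be replayed outside PostgreSQL. A minimal sketch, assuming the GUCs and the result of GetPinLimit() are passed in as plain integers (bulkread_ring_kb and pin_limit are hypothetical names); it stops at the kilobyte value and does not model the later conversion to buffers.

```c
#include <assert.h>

/* Hypothetical restatement of the BAS_BULKREAD sizing steps. */
static int bulkread_ring_kb(int blcksz, int pin_limit,
                            int io_combine_limit, int effective_io_concurrency)
{
    assert(blcksz >= 1024 && blcksz % 1024 == 0);
    int blk_kb = blcksz / 1024;

    /* Step 1: 256 KB base */
    int ring_size_kb = 256;

    /* Step 2: bound from the pin limit, never below the base */
    int ring_max_kb = pin_limit * blk_kb;
    if (ring_max_kb < ring_size_kb)
        ring_max_kb = ring_size_kb;

    /* Step 3: room for in-flight reads (0 concurrency adds nothing) */
    ring_size_kb += blk_kb * io_combine_limit * effective_io_concurrency;

    /* Step 4: clamp to the bound */
    if (ring_size_kb > ring_max_kb)
        ring_size_kb = ring_max_kb;
    return ring_size_kb;
}
```

With 8 KB blocks, io_combine_limit = 16 and effective_io_concurrency = 8, a generous pin limit yields the 1280 KB computed above, while a pin limit of 64 buffers clamps it to 512 KB.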

3. BAS_BULKWRITE (Heavy Ingestion Handling)

Bulk ingestion commands like COPY FROM, CREATE TABLE AS, CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, and table-rewriting forms of ALTER TABLE generate a tremendous volume of dirty (modified) pages. If those pages were allowed to pile up in the main cache, they would push out the working set and leave a large backlog of dirty buffers for other backends and the background writer to flush, causing I/O bottlenecks.

BAS_BULKWRITE allocates a nominal ring buffer of 16 MB, subject to the global NBuffers / 8 cap. The case stores the size in kilobytes - the conversion to buffer blocks happens in a single place, GetAccessStrategyWithSize(), after the switch statement, not inside each case:

    case BAS_BULKWRITE:
        ring_size_kb = 16 * 1024;
        break;

This 16 MB ring acts as a shock absorber, giving the operation room to batch its writes. When the ring fills up and the internal pointer loops back around to the oldest buffer, the backend itself writes that dirty page to disk (flushing WAL up to the page's LSN first, as every data-page write requires) before reusing the buffer. Because the bulk-loading backend pays for its own writes, this localized flushing keeps the background writer and other backends from being overwhelmed.
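
The write-back cost of a full ring can be illustrated with a small model. It is a hypothetical sketch (BulkWriteRing, bwr_init and bwr_write_page are illustrative names) that ignores pinning, checkpoints, WAL and the background writer, and simply counts how often the backend must write a dirty page before reusing its slot.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 2048 /* a 16 MB ring of 8 KB pages */

/* Hypothetical model: each new page dirties one ring slot. */
typedef struct
{
    int  nslots;
    int  next;              /* slot for the next page */
    bool dirty[MAX_SLOTS];
    long backend_writes;    /* pages this backend flushed itself */
} BulkWriteRing;

static void bwr_init(BulkWriteRing *r, int nslots)
{
    assert(nslots > 0 && nslots <= MAX_SLOTS);
    r->nslots = nslots;
    r->next = 0;
    r->backend_writes = 0;
    for (int i = 0; i < nslots; i++)
        r->dirty[i] = false;
}

static void bwr_write_page(BulkWriteRing *r)
{
    /* Reusing a slot that still holds a dirty page: write it out first. */
    if (r->dirty[r->next])
        r->backend_writes++;

    r->dirty[r->next] = true; /* the new page is now dirty in this slot */
    r->next = (r->next + 1) % r->nslots;
}
```

Loading 10000 pages through a 2048-slot ring makes the backend write 10000 - 2048 = 7952 pages itself during the load; the last 2048 dirty pages are still in the ring when it finishes.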

4. BAS_VACUUM (Background Maintenance Isolation)

PostgreSQL's Multi-Version Concurrency Control (MVCC) requires frequent garbage collection to remove dead tuples. VACUUM, and ANALYZE (which uses the same strategy), can read large portions of a table, making them prime candidates for cache disruption if left unchecked.

    case BAS_VACUUM:
        ring_size_kb = 2048;
        break;

The value 2048 is 2 MB expressed in kilobytes, hardcoded as a literal in this switch case; as with the other strategies, the conversion to buffer blocks happens after the switch statement. The VACUUM and ANALYZE commands, however, resolve their ring size before creating the strategy: they take the per-command BUFFER_USAGE_LIMIT option or the vacuum_buffer_usage_limit GUC (2 MB by default since PostgreSQL 17) and pass that size to GetAccessStrategyWithSize() directly. Both settings were introduced in PostgreSQL 16 and can be used to change the ring size for VACUUM and ANALYZE, as shown below. A value of 0 disables the ring entirely, and larger values are still capped at 1/8 of shared_buffers.

-- Set the vacuum ring size at the session level
SET vacuum_buffer_usage_limit = '4MB';
-- Or per-command
VACUUM (BUFFER_USAGE_LIMIT '8MB') my_large_table;
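
The precedence between the two settings, and the accepted size range, can be sketched in C. This is a hypothetical illustration, not PostgreSQL's GUC or option parser: parse_ring_size_kb and effective_vacuum_ring_kb are invented names, only the kB, MB and GB units are handled, and the 0 or 128 kB to 16 GB bounds follow the documented range of vacuum_buffer_usage_limit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for sizes such as "256kB", "4MB" or "1GB";
   a bare number is read as kilobytes. Returns -1 for malformed or
   out-of-range input. */
static long parse_ring_size_kb(const char *s)
{
    assert(s != NULL);

    char *end;
    long n = strtol(s, &end, 10);
    if (end == s || n < 0)
        return -1;

    long kb;
    if (*end == '\0' || strcmp(end, "kB") == 0)
        kb = n;
    else if (strcmp(end, "MB") == 0)
        kb = n * 1024L;
    else if (strcmp(end, "GB") == 0)
        kb = n * 1024L * 1024L;
    else
        return -1;

    /* 0 disables the ring; otherwise 128 kB .. 16 GB is accepted. */
    if (kb != 0 && (kb < 128 || kb > 16L * 1024 * 1024))
        return -1;
    return kb;
}

/* A per-command BUFFER_USAGE_LIMIT (NULL when absent) overrides the
   session's vacuum_buffer_usage_limit value. */
static long effective_vacuum_ring_kb(const char *option, const char *guc)
{
    return parse_ring_size_kb(option != NULL ? option : guc);
}
```

With vacuum_buffer_usage_limit = '4MB' in the session, VACUUM (BUFFER_USAGE_LIMIT '8MB') would use an 8192 kB ring, while a plain VACUUM would use 4096 kB.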

The vacuum process reads pages into this ring, inspects them, and modifies them if dead tuples are removed (dirtying the page). By trapping the vacuum process inside this limited memory area, PostgreSQL ensures that heavy background maintenance never starves live application queries of cache space. The vacuum process is forced to clean up after itself, flushing its own modified pages to disk as it cycles through the ring.

Larger values of vacuum_buffer_usage_limit allow VACUUM to run faster (fewer repeated reads of evicted pages), but at the cost of displacing more potentially useful pages from shared_buffers.

The Buffer Access Strategy is a masterclass in workload isolation. By understanding the interplay between the memory manager (freelist.c, bufmgr.c) and the access methods (heapam.c), database engineers can better appreciate why PostgreSQL remains highly performant even under the extreme stress of analytical scans or heavy background maintenance. For developers building specialized database forks or optimizing enterprise instances, tweaking these ring buffer thresholds in the source code remains a highly effective method for tuning high-throughput workloads.
