Data pages are the backbone of how a Pega application reads, caches, and reuses data, yet they are also one of the most common sources of slow interactions, runaway database load, and confusing stale-data bugs. When a data page is misconfigured, the symptoms rarely point at the data page itself — you see slow screens, high requestor counts, or a connector that hammers a downstream system. This guide walks through how scope, load management, refresh strategy, and node-level caching actually behave, and how to diagnose and fix the bottlenecks that surface in real Pega Infinity applications.
How data page scope drives caching behavior
The single most important property on a data page is its scope, because scope determines where the cached page lives and how long it survives. Choosing the wrong scope is the root cause of a surprising number of both performance and correctness problems.
- Thread scope — the page is cached for the life of the current thread (roughly, the current case or work item context). A new page is loaded per thread, so two users — or even two cases for the same user — never share it. Use this for data that is specific to the work being done right now.
- Requestor scope — the page is cached for the entire user session across threads. It survives navigation between cases. Use this for per-user reference data that does not change during a session, such as the operator's preferences or their list of work queues.
- Node scope — the page is cached once per JVM node and shared across every requestor on that node. This is the most powerful cache for performance because dozens or hundreds of users reuse a single load, but it is also the most dangerous for staleness because invalidation must be coordinated.
The mental model that prevents most mistakes: the broader the scope, the fewer the loads but the harder the invalidation. A node-scoped page may load once an hour and serve thousands of reads, but if the underlying data changes, every node holds its own stale copy until the cache expires or is explicitly cleared.
| Scope | Cache lifetime | Shared across | Best for | Main risk |
|---|---|---|---|---|
| Thread | Current thread/case | Nothing | Case-specific working data | Reloads too often if reused widely |
| Requestor | User session | Threads of one user | Per-user reference data | Memory per session |
| Node | Until expiry/invalidation | All requestors on node | Read-mostly shared reference data | Cluster-wide staleness |
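The trade-off in the table can be sketched as a toy model in plain Java. This is not Pega code and uses no Pega API. Each scope is just a map that lives as long as its owner: one per case for thread, one per user session for requestor, one per JVM for node. The page name D_Reference and the workload of two reads per case are assumptions for illustration. The method counts how often the source is actually hit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model of data page scope: a cache is a map that lives as long as its owner.
final class ScopeSimulation {

    // Counts source loads when `users` users each open `casesPerUser` cases
    // and every case reads the same reference page twice.
    static int totalLoads(String scope, int users, int casesPerUser) {
        int[] sourceHits = {0};
        Supplier<Object> source = () -> {
            sourceHits[0]++;          // every call here is a trip to the source
            return "reference data";
        };

        Map<String, Object> nodeCache = new HashMap<>();          // one per JVM
        for (int u = 0; u < users; u++) {
            Map<String, Object> requestorCache = new HashMap<>();  // one per session
            for (int c = 0; c < casesPerUser; c++) {
                Map<String, Object> threadCache = new HashMap<>(); // one per case
                Map<String, Object> cache;
                if (scope.equals("Node")) {
                    cache = nodeCache;
                } else if (scope.equals("Requestor")) {
                    cache = requestorCache;
                } else {
                    cache = threadCache;
                }
                // Two reads per case; only a cache miss reaches the source.
                cache.computeIfAbsent("D_Reference", k -> source.get());
                cache.computeIfAbsent("D_Reference", k -> source.get());
            }
        }
        return sourceHits[0];
    }
}
```

With 10 users opening 5 cases each, the model hits the source 50 times at thread scope, 10 times at requestor scope, and once at node scope. The flip side is that the single node copy is also the hardest one to refresh.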
Load management: declarative reference vs. explicit load
There are two fundamentally different ways a data page enters memory, and mixing them up leads to subtle bugs.
Declarative page reference is the lazy, automatic path. When a UI control, a property reference, or an expression touches D_CustomerList, Pega loads it on first access and then serves the cached copy. This is the default and the right choice for most read paths because the platform handles the lifecycle for you.
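Explicit load uses the Load-DataPage method in an activity (or the pxRetrieveDataPage / pxLoadDataPage APIs). You reach for this when you need precise control, for example to start a load early so it runs while other steps execute, or to pass parameters computed at runtime. Load-DataPage starts the load asynchronously, so if a later step depends on the result, wait for it with Connect-Wait (or simply reference the page, which loads it synchronously on first access).
```
// Activity step: start an explicit load (Load-DataPage runs asynchronously)
Load-DataPage
    DataPage: D_CustomerByID
    PoolID:   CustomerLoad    // illustrative pool name
    // Parameters tab
    CustomerID = .CustomerID

// Later step: block until the pooled load finishes
Connect-Wait
    PoolID: CustomerLoad

// Java-callable API when you need it in a function or utility
// (confirm the exact method and signature in your version's Engine API docs)
ClipboardPage cust = tools.getDataPage("D_CustomerByID", paramPage);
```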
A common anti-pattern is calling Load-DataPage in a loop. If you find yourself iterating a list and loading a single-record data page per iteration, you have created an N+1 access pattern that will dominate your response time. The fix is almost always a keyed (indexed) list data page that you load once and then look up in memory.
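The cost difference is easy to see in a small plain-Java model where a fake source counts round trips. The classes are hypothetical and are not Pega APIs. The first method mimics a single-record load per iteration; the second mimics one keyed list load followed by in-memory lookups, which is the role a keyed data page plays.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Fake data source that counts every round trip made to it.
final class CustomerSource {
    private final Map<Integer, String> table = new HashMap<>();
    int roundTrips = 0;

    CustomerSource(int rows) {
        for (int id = 1; id <= rows; id++) {
            table.put(id, "Customer-" + id);
        }
    }

    // One round trip per single-record fetch (a per-item load).
    String fetchById(int id) {
        roundTrips++;
        return table.get(id);
    }

    // One round trip for the whole list (a single list load).
    Map<Integer, String> fetchAll() {
        roundTrips++;
        return new HashMap<>(table);
    }
}

final class NPlusOneDemo {
    // N+1 shape: one source hit per ID in the loop.
    static List<String> perItemLoads(CustomerSource src, List<Integer> ids) {
        List<String> out = new ArrayList<>();
        for (int id : ids) {
            out.add(src.fetchById(id));
        }
        return out;
    }

    // Keyed shape: load once, then look up each ID in memory.
    static List<String> keyedLookup(CustomerSource src, List<Integer> ids) {
        Map<Integer, String> byId = src.fetchAll();
        List<String> out = new ArrayList<>();
        for (int id : ids) {
            out.add(byId.get(id));
        }
        return out;
    }
}
```

For 50 IDs the per-item version makes 50 round trips and the keyed version makes one, with identical results.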
Choosing the right data source
A data page's source defines where the data physically comes from, and each source has a different performance profile:
- Report definition — the most common and usually the fastest for Pega-managed tables, because it pushes filtering, paging, and column selection down into SQL. Always prefer a report definition over an activity for relational reads.
- Connector — REST, SOAP, or other integration. Network latency dominates, so caching scope and retry/timeout settings matter enormously.
- Activity — maximum flexibility, minimum guardrails. Easy to write inefficient procedural data access. Reserve for genuinely complex orchestration.
- Data transform — for synthesizing or shaping data already in memory, not for fetching.
- Lookup — a single instance fetched by key; lightweight and direct.
For sourced lists, use a response data transform to map the raw source structure (especially a connector response) into your application's page-list shape. Keeping the mapping in a response data transform — rather than baking it into the connector or a post-activity — keeps the data page declarative and testable.
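In plain Java terms, a response data transform plays the role of a pure mapping function between the raw source shape and the application's list shape. The sketch below is only an analogy: CustomerRow and the raw field names cust_id, first, and last are hypothetical. Because the function depends only on its input, it can be tested without calling the connector, which is the same property that keeps a response data transform testable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical application-side shape of one list entry.
final class CustomerRow {
    final String id;
    final String displayName;

    CustomerRow(String id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }
}

final class ResponseMapping {
    // Pure mapping from raw response records to the application's list shape.
    // The raw keys "cust_id", "first", and "last" are hypothetical.
    static List<CustomerRow> map(List<Map<String, String>> raw) {
        List<CustomerRow> rows = new ArrayList<>();
        for (Map<String, String> r : raw) {
            rows.add(new CustomerRow(r.get("cust_id"), r.get("first") + " " + r.get("last")));
        }
        return rows;
    }
}
```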
Refresh strategy: the difference between fast and stale
The refresh settings on a data page decide whether you re-hit the source or serve the cache. Getting this right is the line between a snappy screen and either a slow one or a stale one.
- Reload once per interaction: reloads the page at most once per user interaction (per HTTP request batch). A good default for data that should be fresh on each screen submit but not re-fetched multiple times within the same interaction.
- Do not reload when: keeps the cached page as long as a when rule evaluates true. This is the workhorse for conditional refresh: cache aggressively, but invalidate when a meaningful condition changes (for example, the selected account ID differs from the cached one).
- Reload if older than: a time-based expiry, essential for node-scoped reference data that changes slowly.
```
Data page: D_AccountSummary (Thread scope)
Refresh strategy:
  [x] Do not reload when: AccountUnchanged
        where when rule AccountUnchanged =
          Param.AccountID == pyDataPageParameters.AccountID
  [ ] Reload once per interaction
```
The most frequent refresh mistake is leaving a node-scoped reference page on a too-aggressive reload setting, so it reloads far more often than the data actually changes. The second most frequent is the opposite — a thread page that never reloads when it should, producing "I updated the record but the screen still shows the old value" tickets.
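A toy model of the three refresh options, written in plain Java and not reflecting Pega's engine internals, shows how each one decides between serving the cache and reloading. The class, the policy names, and the millisecond clock are assumptions. Each policy is applied on its own; how a real data page combines several options is not modeled here.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Toy model of refresh checks on one cached page (not Pega's engine).
final class RefreshingPage<T> {
    // Decides whether an existing cached copy must be reloaded on this access.
    interface Policy {
        boolean shouldReload(long ageMillis, boolean newInteraction);
    }

    // "Reload if older than": time-based expiry.
    static Policy reloadIfOlderThan(long maxAgeMillis) {
        return (age, newInteraction) -> age > maxAgeMillis;
    }

    // "Reload once per interaction": first access in each new interaction reloads.
    static Policy reloadOncePerInteraction() {
        return (age, newInteraction) -> newInteraction;
    }

    // "Do not reload when": keep the cache while the condition holds.
    static Policy doNotReloadWhen(BooleanSupplier condition) {
        return (age, newInteraction) -> !condition.getAsBoolean();
    }

    private final Supplier<T> source;
    private final Policy policy;
    private T cached;
    private long loadedAt;
    private long lastInteraction = Long.MIN_VALUE;
    private int loads;

    RefreshingPage(Supplier<T> source, Policy policy) {
        this.source = source;
        this.policy = policy;
    }

    T get(long nowMillis, long interactionId) {
        boolean newInteraction = interactionId != lastInteraction;
        if (loads == 0 || policy.shouldReload(nowMillis - loadedAt, newInteraction)) {
            cached = source.get();   // trip to the source
            loadedAt = nowMillis;
            loads++;
        }
        lastInteraction = interactionId;
        return cached;
    }

    int loads() {
        return loads;
    }
}
```

For example, with reloadIfOlderThan(60_000), reads at 0 s, 30 s, and 61 s cause two loads; with reloadOncePerInteraction(), two reads in interaction 1 and one in interaction 2 also cause two loads.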
Node-level pages and cluster cache invalidation
Node-scoped pages deserve special attention because the cache is per-JVM and Pega runs as a cluster. A write to the underlying data does not automatically refresh any node's copy: every node, including the one that performed the write, keeps serving its cached page until that copy expires or is explicitly invalidated.
For data that changes during the day, do not rely solely on a long time-based expiry. Combine a sensible Reload if older than window with an explicit invalidation when you know the data changed. The platform exposes APIs to invalidate data pages, and you can invoke them after a successful commit so the next access reloads:
```
// Pseudocode: after committing a change that affects a node-scoped
// reference page, invalidate it so the next read reloads fresh data.
DataPageInvalidation invalidateAPI = ... ; // pxInvalidateCachedPages family
invalidateAPI.invalidate("D_ProductCatalog");
// In a cluster, invalidation must reach every node. Verify your
// platform version's behavior and use a node-aware approach so all
// JVMs drop the stale copy, not just the one that ran the activity.
```
The practical rule: node scope is for read-mostly data with a clear invalidation trigger or an acceptable staleness window. If you cannot tolerate cross-node staleness and cannot reliably invalidate every node, drop to requestor or thread scope and accept more loads.
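The cross-node staleness problem can be reproduced in a few lines of plain Java. This is a conceptual model only: NodeCache and Cluster are hypothetical classes, the in-memory map stands in for the database, and the "everywhere" method uses a simple loop where a real cluster would need a broadcast mechanism.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One JVM's private copy of node-scoped pages.
final class NodeCache {
    private final Map<String, String> pages = new HashMap<>();

    // Serve the cached copy, or load from the shared store on a miss.
    String read(String page, Map<String, String> db) {
        return pages.computeIfAbsent(page, db::get);
    }

    void invalidate(String page) {
        pages.remove(page);
    }
}

// Several nodes sharing one database but not one cache.
final class Cluster {
    final List<NodeCache> nodes = new ArrayList<>();
    final Map<String, String> db = new HashMap<>();

    Cluster(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new NodeCache());
        }
    }

    String read(int node, String page) {
        return nodes.get(node).read(page, db);
    }

    // Only the writing node drops its copy; the others stay stale.
    void writeAndInvalidateLocally(int writer, String page, String value) {
        db.put(page, value);
        nodes.get(writer).invalidate(page);
    }

    // Every node drops its copy. A real cluster needs a broadcast for this;
    // the loop stands in for that mechanism.
    void writeAndInvalidateEverywhere(String page, String value) {
        db.put(page, value);
        for (NodeCache n : nodes) {
            n.invalidate(page);
        }
    }
}
```

If every node has already cached the page, then after writeAndInvalidateLocally node 0 reads the new value while nodes 1 and 2 keep returning the old one; after writeAndInvalidateEverywhere, all three agree.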
Parameterized and keyed data pages
Two features dramatically improve cache hit rates when used correctly:
- Parameterized data pages cache a separate instance per unique parameter combination. D_CustomerByID with CustomerID=42 and with CustomerID=99 are two separate cached pages. This is exactly what you want for single-record lookups, but be aware that a high-cardinality parameter (like a free-text search term) can balloon the cache.
- Keyed (indexed) data pages load a list once and then let you retrieve a single item by key in memory. This is the canonical fix for the N+1 loop problem: load D_AllActiveCustomers once, then look up by CustomerID without another source hit.
```xml
<!-- Keyed access pattern (conceptual): one load, many in-memory lookups -->
<DataPage name="D_AllActiveCustomers" structure="List" scope="Node">
  <Keys>
    <Key>CustomerID</Key>
  </Keys>
</DataPage>
<!-- Retrieve a single record by key without re-hitting the source -->
<!-- D_AllActiveCustomers[CustomerID:42] -->
```
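Parameterized caching can be modeled as one map entry per distinct parameter value. The plain-Java sketch below uses a hypothetical class, not the Pega engine, to make the cardinality risk visible: alternating between two customer IDs keeps two cached instances, while a distinct search term per request creates a new instance, and a new source load, every time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model: one cached instance per distinct parameter value.
final class ParameterizedPage<P, T> {
    private final Map<P, T> instances = new HashMap<>();
    private final Function<P, T> source;
    private int loads;

    ParameterizedPage(Function<P, T> source) {
        this.source = source;
    }

    // A new parameter value always misses and triggers a source load.
    T get(P param) {
        return instances.computeIfAbsent(param, p -> {
            loads++;
            return source.apply(p);
        });
    }

    int cachedInstances() {
        return instances.size();
    }

    int loads() {
        return loads;
    }
}
```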
Common bottlenecks and how to spot them
Most data page performance problems fall into a handful of recurring shapes:
- Large unbounded list pages that pull entire tables into memory with no max records cap.
- Connector calls inside loops (N+1): one network round trip per item instead of one batched call.
- Unfiltered report definitions that select all columns and all rows, then filter in the UI.
- Reloading node pages too often, throwing away the very benefit node scope provides.
- Over-parameterized pages that defeat caching with high-cardinality keys.
Diagnosing with the right tools
You cannot tune what you cannot measure. Use these in combination:
- PAL (Performance Analyzer): take a reading before and after the slow interaction. Watch DataPage Loads, elapsed time, and DB count deltas. A spike in load count points straight at a refresh-strategy or scope problem.
- Tracer: enable the Data Page event and filter to your page. You will see whether it loaded from source or served the cache, and how long the source took.
- DB Trace: captures the actual SQL. This is where N+1 patterns and unfiltered SELECT * queries become obvious; you will see the same query repeated dozens of times.
- Data Pages landing page (Records Explorer / Application landing pages): shows configured data pages, their scope, and runtime statistics so you can spot the heavy hitters across the application.
A reliable workflow: reproduce the slow interaction with PAL running, read the DataPage Load count and DB count, then confirm the root cause in DB Trace before changing anything.
Concrete fixes
Once you have located the offending page, the remedies are straightforward:
- Prune columns: in the report definition, select only the fields the UI actually uses.
- Paginate: set a max records cap and use server-side paging for lists; never load everything and then scroll.
- Filter server-side: push every filter into the report definition or connector request, not the clipboard.
- Defer or async-load: for data not needed on first paint, load it asynchronously or on demand so it never blocks the interaction.
- Right-size scope and caching: move read-mostly reference data to node scope with a sane expiry; tighten thread pages that reload needlessly.
- Replace N+1 with a keyed list: load once, look up by key in memory.
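Several of these fixes (server-side filtering, a max records cap with paging, and pruning to the needed columns) can be compared in a plain-Java model where a fake source counts the rows it hands back. OrderRow, OrderSource, and the rowsTransferred counter are hypothetical stand-ins; in a real application the equivalent work happens in the report definition's SQL or the connector request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical row with a wide column the UI never shows.
final class OrderRow {
    final int id;
    final String status;
    final String notes;

    OrderRow(int id, String status, String notes) {
        this.id = id;
        this.status = status;
        this.notes = notes;
    }
}

// Fake source that counts how many rows it hands back to the caller.
final class OrderSource {
    final List<OrderRow> table = new ArrayList<>();
    int rowsTransferred;

    OrderSource(int rows) {
        for (int id = 1; id <= rows; id++) {
            table.add(new OrderRow(id, id % 10 == 0 ? "OPEN" : "CLOSED", "long free-text notes"));
        }
    }

    // Anti-pattern: return every row and column, filter later in memory.
    List<OrderRow> loadAll() {
        rowsTransferred += table.size();
        return new ArrayList<>(table);
    }

    // Filter at the source, skip `offset` matches, cap at `maxRecords`,
    // and return only the ID column.
    List<Integer> loadIds(Predicate<OrderRow> filter, int offset, int maxRecords) {
        List<Integer> page = new ArrayList<>();
        int matched = 0;
        for (OrderRow r : table) {
            if (!filter.test(r)) {
                continue;
            }
            if (matched++ < offset) {
                continue;   // rows before the requested page
            }
            if (page.size() == maxRecords) {
                break;      // cap reached
            }
            page.add(r.id);
        }
        rowsTransferred += page.size();
        return page;
    }
}
```

Loading everything moves all 1,000 rows to find 100 open orders; asking the source for the first two pages of 20 open-order IDs moves 40 values and never transfers the notes column.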
If you want a structured way to build these diagnostic and tuning skills with real cases, our Pega mentorship program and hands-on Pega training walk through PAL and DB Trace investigations end to end.
Key takeaways
- Scope decides caching. Thread, requestor, and node trade fewer loads for harder invalidation — pick deliberately.
- Node scope is per-JVM. Plan cluster-wide invalidation or accept a staleness window; never assume one write clears every node.
- Refresh strategy is correctness, not just speed. Use the Do not reload when setting for conditional refresh and time-based expiry for slow-changing reference data.
- Kill N+1 with keyed data pages. Load a list once, look up by key in memory instead of loading per item.
- Measure first. PAL for load counts, Tracer for cache-vs-source, DB Trace for the actual SQL, and the Data Pages landing page for the application-wide view.
- Fix at the source. Prune columns, paginate, filter server-side, and defer non-critical loads.
Struggling with a specific slow screen or a stubborn stale-cache bug? Bring your Tracer and PAL output to a focused working session — reach out through our contact page or explore one-on-one Pega mentorship to debug it together and build the tuning instincts that prevent the next one.