Materialized Views: Precompute with Postgres

Materialized views represent the precomputed, stored results of an SQL query based on a single table or multiple tables. They are separate database objects that occupy their own storage space in a relational database management system (RDBMS) such as PostgreSQL.

Materialized views can speed up query performance in analytics by storing precomputed results of complex queries, eliminating expensive computations on the underlying tables for faster response times. They make it easier to get insights from data because the subsequent reporting queries can run directly on these resultant views.

In this article, you'll learn about materialized views, how they differ from regular views, and some best practices for using them.

Star our GitHub repo to support current and future open source contributions!

How Do Materialized Views Differ from Regular Views?

An SQL view, whether regular or materialized, allows you to see a specific subset of data from one or more tables. This subset is defined by a SELECT query.

Though they both represent the results of a SELECT query, there are key differences between them in terms of data storage and freshness, maintenance overhead, and performance.

Data Storage and Freshness

Regular views are virtual tables defined by SELECT statements. These views do not store data themselves and are dynamic. This means that they always reflect the most up-to-date data from the underlying tables. The database engine parses any queries executed on a regular view and converts them into more intricate queries, involving the view's definition query, that operate on the underlying tables.

On the other hand, a materialized view stores the precomputed results in a separate physical object in PostgreSQL. Any queries on materialized views run directly without any parsing or conversion, making them much faster. However, these views must be refreshed to get up-to-date data from the underlying tables.

Maintenance Overhead

Once created, regular views do not need any maintenance because the data they reflect is always up-to-date. Materialized views, however, need periodic maintenance and refreshing to ensure that their data is up-to-date.

Performance

Regular views are slower than materialized views for complex queries that fetch a large amount of data. This is because the PostgreSQL engine needs to parse and convert every time a query is run on a regular view. Materialized views are faster because queries run directly on them, just as they would on the physical tables in PostgreSQL.

The table below summarizes the differences between these two views:

Aspect	Regular view	Materialized view
Data storage	Not required	Required
Data freshness	Always up-to-date data	Stale data possible
Maintenance	Not required	Required
Performance	Slower	Faster

How to Use Materialized Views

Despite their storage and maintenance requirements, materialized views are often preferred for their performance alone. The following sections explain how they are created, refreshed, and deleted and provide some tips for using them effectively.

CREATING A MATERIALIZED VIEW

A materialized view is created in PostgreSQL with a SELECT statement that may use one or multiple tables, as shown in the example below:

CREATE MATERIALIZED VIEW IF NOT EXISTS mat_view AS
    SELECT col1, col2, SUM(col3) AS "Total"
    FROM table_xyz
    GROUP BY col1, col2
    WITH DATA;

In this example, a materialized view named mat_view is created based on a query that fetches data from the table_xyz table. The aggregate function SUM() is used to calculate the total value for col3.

The parameter IF NOT EXISTS prevents errors from occurring when creating a materialized view with the same name (although the definition may differ) as an existing one. Instead of throwing an error, PostgreSQL will display a notice and skip the creation of the view. This is useful when these CREATE statements are executed programmatically.

The clause WITH DATA ensures that the materialized view is populated with the SQL query results when it's created. You can omit this if you only want to define the materialized view and refresh it later. However, an unpopulated materialized view cannot be queried unless you use REFRESH MATERIALIZED VIEW.

Once a materialized view is created, you can query it like any other PostgreSQL table. Similarly, you can also alter a materialized view if needed.

REFRESHING A MATERIALIZED VIEW

A materialized view must be refreshed to fetch the most up-to-date results of the SELECT query used in its definition. PostgreSQL allows you (as the owner) to update the data in your materialized views to reflect the changes in the underlying tables using the following command:

REFRESH MATERIALIZED VIEW mat_view;

REFRESH MATERIALIZED VIEW mat_view;

By default, PostgreSQL locks the materialized view when it's being refreshed, which prevents other queries from accessing it during the refresh process. To avoid this behavior, you can use the CONCURRENTLY option to ensure that other queries can access the materialized view while it's being refreshed:

REFRESH MATERIALIZED VIEW CONCURRENTLY mat_view;

‍
Note that to use this option, a unique column index must be present in the materialized view. Moreover, the CONCURRENTLY option can lead to a slower refresh process for a large number of rows. This is because it creates new data in a temporary tablespace and needs to carry out a comparison with the older data using a diff operation.

In both cases, all the data in the materialized view is replaced.

DELETING A MATERIALIZED VIEW

A materialized view can be deleted when you no longer need it. However, you must be the owner of that materialized view in order to delete (drop) it. You can delete the view with the following command:

DROP MATERIALIZED VIEW IF EXISTS mat_view;

‍
Dropping a materialized view deletes the physical database object that stores the data for the view. Any data stored in that materialized view will be lost. However, it does not affect the underlying tables.

As with a database table, if the materialized view is referenced by any other objects, such as other materialized views, you need to use the CASCADE option to forcefully delete it and all of its dependent objects.

The IF EXISTS parameter can be used to prevent errors and instead receive a notice.

Getting the Most Out of Materialized Views

So, when should you typically use materialized views?

You can use them for frequently run, complex queries on large data sets that take a longer time to execute; these are typically expensive aggregation queries that run on specific groups of rows on multiple tables. A materialized view offers great performance improvements for such queries by storing their precomputed results in a physical database object. Materialized views are also helpful when the data is relatively static since the refresh overheads to update their precomputed results are negligible. Lastly, they are useful when only specific data needs to be exposed via views for reports or dashboards.

The following pro tips can help you get the most out of materialized views.

ADD APPROPRIATE INDEXES

Unlike regular views, materialized views are actual physical objects in the database, so you can add indexes to the materialized views to significantly boost query performance.

Consider these best practices:

Use relevant indexes: Only add indexes on the columns that are frequently used in the queries for your materialized views. Avoid excessive indexing because it adds storage overhead and makes it slower to refresh.
Use correct index types: PostgreSQL supports multiple types of indexes, including B-tree, hash, GiST, and GIN. The default B-tree indexes are best for range queries, while hash indexes are good for equality queries. You should choose the correct index types for your use case.
Add an index to the primary key of the materialized view: This helps to maintain consistency with the underlying tables. However, just like with the concurrent refresh mentioned earlier, you must ensure that the data for this column is always unique.
Use a composite or multicolumn index: A composite index is established on more than one column of a table. This can be useful when you anticipate that queries will frequently use multiple columns together in their WHERE or ORDER BY clauses. It results in fewer index lookups because the queries can efficiently join, filter, or sort multiple columns together.
Test and refine indexes: Lastly, you should test and fine-tune your indexes. Use EXPLAIN and ANALYZE to evaluate the query plans for your queries. Then, based on the results, adjust your indexes as needed to optimize performance.

Examples of Materialized Views

The examples in this section demonstrate how to effectively use materialized views with two simplified tables: product_details and product_sales. These tables are shown in the following entity-relationship diagram, and for the purposes of discussion, you can assume that they are populated with relevant data:

Entity-relationship diagram for product_details and product_sales

The following SQL snippet creates a materialized view on these two tables:

CREATE MATERIALIZED VIEW IF NOT EXISTS starbooks_sales_reports AS
  SELECT
    sales.pid AS id,
    prod.name AS product_name
    SUM(quantity) as total_quantity,
    SUM(sales.amount) as product_sales
  FROM product_details prod
  INNER JOIN product_sales sales ON prod.pid = sales.pid
  WHERE
    merchant_id = 'starbooks'
    AND timestamp >= CURRENT_DATE - interval '1' day
  GROUP BY
    sales.pid, prod.name;

The next snippet then adds a unique index to this materialized view:

CREATE UNIQUE INDEX IF NOT EXISTS mat_view_pid
  ON starbooks_sales_reports ("ID");

This materialized view, starbooks_sales_reports, presents sales data categorized by product for a specific merchant named starbooks. Its SELECT query is based on two tables with an inner join on pid. The timestamp >= CURRENT_DATE - interval '1' day condition ensures that the view gets results for the previous day whenever it is refreshed.

This view is built on an aggregation query, SUM(), with the GROUP BY clause. As mentioned, aggregation queries are expensive and take longer to execute. Having the results stored in a materialized view ensures that such expensive operations are not done repeatedly, and analytics queries can run directly on this materialized view. The data for other merchants from the underlying tables is also not exposed in this view. Similar views can be created for other merchants as required.

The following query returns the top five products with the highest sales:

SELECT * FROM starbooks_sales_reports order by product_sales DESC LIMIT 5;

This one gives all the products with sales over $1,500:

SELECT * FROM starbooks_sales_reports where product_sales > 1500;

Most importantly, the execution of both queries is efficient due to PostgreSQL's ability to directly access the precomputed results available in the materialized view starbooks_sales_reports. This eliminates the need for PostgreSQL to compute the aggregate SUM() query on the underlying tables, resulting in faster query processing.

The following materialized view shows the previous day's sales for each merchant whenever it's refreshed:

CREATE MATERIALIZED VIEW IF NOT EXISTS merchant_sales_reports AS
  SELECT
    prod.merchant_id,
    sum(sales.amount) as total_sales
  FROM product_details prod
  INNER JOIN product_sales sales
    ON prod.pid = sales.pid
  WHERE
    timestamp >= CURRENT_DATE - interval '1' day
  GROUP BY
    prod.merchant_id;

You can run a simple query on this materialized view to generate a sales report for all merchants, sorted by sales in descending order:

SELECT * FROM merchant_sales_reports ORDER BY total_sales DESC;

You can also run additional reporting queries directly on this materialized view. These views can be refreshed daily with the REFRESH MATERIALIZED VIEW command. This enables the execution of analytics queries to create daily sales reports, dashboards, and more. You can design different view definitions and refresh strategies to cater to different use cases.

Conclusion

A materialized view contains the precomputed results of an SQL query. Unlike a regular views, materialized views actually store the results in separate database objects. Although materialized views require periodic maintenance, the benefit of faster query execution outweighs that cost. They are especially useful for speeding up complex, expensive queries that are run frequently on a database.

To get the most out of materialized views, you should choose the right indexes, refresh them as needed, and consider incremental updates for frequently changing data. These best practices can greatly improve your query performance.

Hydra is an open source alternative to enterprise data warehouses, offering advanced capabilities by implementing a columnar engine within Postgres. It delivers significant query performance gains, improved cache hit rates, and enhanced scalability compared to standard Postgres.

Star our GitHub repo to join the Hydra community.

‍

Share this post

Manish Hatwalne