Hydra 1.0 Generally Available

‍

‍Hydra is an open-source extension that adds columnar tables to Postgres for efficient analytical reporting. With Hydra, you can analyze billions of rows instantly without changing code.

Hydra augments Postgres’ existing row-based tables, enabling developers to tailor Postgres to their application’s custom transactional and analytical needs. Hydra combines a columnar format, vectorized execution, and parallelism to supercharge modern applications that aren’t wholly transactional or fully analytical, such as real-time dashboards, IOT, geospatial & logistics apps, and time-series workloads.

🔔

Try the Hydra Free Tier to create a column-oriented Postgres instance. Then connect to it with your preferred Postgres client (psql, dbeaver, etc). Alternatively, you can run Hydra locally.

Tailor Postgres to your modern, real-time apps

👸 Columnar tables = OLAP

OLAP (Online Analytical Processing) is designed to support analytical workloads, such as data mining, reporting, and business intelligence. OLAP systems typically use a multidimensional data model, which allows users to analyze data from multiple perspectives and at different levels of detail. OLAP systems are often used in decision support applications, where users need to quickly and easily analyze large amounts of data.

🤴 Row tables (heap) = OLTP

OLTP (Online Transactional Processing) is optimized for a large number of small, frequent transactions that insert, update, delete, and retrieve data from a database. This type of system manages real-time data processing for record lookups, fast writes, high concurrency, and useful for order entry, sales, financial applications, and more.

👸🤝🤴 Row + Columnar tables = HTAP

HTAP (Hybrid Transactional Analytical Processing) combines the strengths of OLTP and OLAP into a single system. When transactions occur, they are instantly accessible for analytics and machine learning. HTAP is commonly used when reporting latency must be low, such as financial analysis, IOT alerting, fraud detection, supply chain management, customer-facing dashboards, and applications with real-time decision making.

Columnar tables 101

Columnar tables are organized transversely from row tables. For example, take the following table stored in row format:

Joe	156789	Blue	Platinum
Felisha	119988	Red	Gold
Sam	232667	Green	Silver

The order of the data would be: | Joe | 156789 | Blue | Platinum | Felisha | 119988 | Red | Gold | Sam | 232667 | Green | Silver |

The same data stored in columnar can be visualized as follows:

Joe	Felisha	Sam
156789	119988	232667
Blue	Red	Green
Platinum	Gold	Silver

As columnar, the order of the data would be: | Joe | Felisha | Sam | 156789 | 119988 | 232667 | Blue | Red | Green | Platinum | Gold | Silver |

Learn more in our docs.

Using Hydra

Hydra makes use of tableam (table access method API), which was added in Postgres 12 released in 2019. Using tableam engineers can flexibly define when a table is row (heap) or column-oriented in their Postgres database.

It’s super simple to swap table format:

CREATE TABLE my_table (i INT8) USING heap; -- convert to columnar SELECT columnar.alter_table_set_access_method('my_table', 'columnar'); -- convert back to row (heap) SELECT columnar.alter_table_set_access_method('my_table', 'heap');

🔔

Please note: When creating tables in Hydra the table access method has been set to USING columnar by default. For example, when data is first loaded into a Hydra database, all tables are formatted as column-oriented.

Benchmarks

Hydra columnar tables enable the fastest Postgres aggregates on earth.

Review Clickbench for comprehensive results and the list of 42 queries tested.

This benchmark represents typical workload in the following areas: clickstream and traffic analysis, web analytics, machine-generated data, structured logs, and events data.

Benchmarks were run on a c6a.4xlarge (16 vCPU, 32 GB RAM) with 500 GB of GP2 storage.

For our continuous benchmark results, see BENCHMARKS.

Release Notes

Aggregate queries are over 60% faster compared to Hydra 1.0 beta release. Spatial indexes and pg_hint_plan are now enabled for performance optimization.

Please refer to Hydra 1.0 beta release notes here.

Aggregate vectorization

We added vectorization of integer and date data that is stored in a columnar table. Vectorization happens automatically whenever applicable. The following aggregate functions are vectorized:

MIN
MAX
COUNT
SUM
AVG

‍

Vectorization can result in aggregate queries being over 60% faster. If vectorized aggregate is not found or execution plan is not suitable, Hydra falls back to standard Postgres execution.

Hydra uses automatic vectorization by the compiler for vectorization. We have enabled -O3 optimizations in our compilation process to maximize this effect.

This optimization is for Postgres 14+ only.

Spatial index types and pg_hint_plan

After testing, we have enabled gin, gist, spgist, and rum indexes on columnar tables, enabling indexes for geospatial queries. We have also added pg_hint_plan to enable you to experiment with the query plan to maximize performance. Recommendations for indexing and using pg_hint_plan are available in our documentation.

Developer Changelog

View or full CHANGELOG on Github.

Limitations

Hydra Columnar currently does not support the following Postgres features. These features are available on heap tables for transactional workloads.

Upsert (ON CONFLICT statements, except DO NOTHING actions with no target specified)
- We are targeting upsert for version 1.1.
Tuple locks (SELECT ... FOR SHARE, SELECT ... FOR UPDATE)
SERIALIZABLE isolation level
Foreign keys
Logical decoding
AFTER ... FOR EACH ROW triggers
UNLOGGED columnar tables

‍

Share this post

Simon Wijckmans

Product Lead at Hydra