When I started developing software, the database world was all about OLTP, or Online Transactional Processing. Those were the days when databases primarily handled CRUD operations: Create, Read, Update, and Delete. SQL was king, universally used despite subtle variations across platforms.
Back then, only large corporations could afford Online Analytical Processing (OLAP). They used expensive data warehouses to ingest and analyze historical data, which was rarely updated and typically deleted in bulk. Since there was no standard language for OLAP, each system often invented its own.
But things have changed over the past ten years. The database landscape exploded: most OLAP systems adopted SQL as their query language, computation became much cheaper, and now almost every database is either open source or offers a generous free tier. So today the question isn't whether my OLAP database supports SQL or whether I can afford it; it's whether it supports my business use case and how its performance stacks up against the money invested in infrastructure.
In this article, I'll show you how SQL is used across various OLAP databases for time-series analytics. You'll see how some databases have adapted their SQL extensions specifically for time-series scenarios. We'll dive into query comparisons across QuestDB, Timescale, DuckDB, and Clickhouse, and I'm also including PostgreSQL in the mix to offer a perspective on how it compares to these more specialized databases.
Time-series queries in a nutshell
Time-series analytics involve handling large data volumes with some typical patterns:
- Recent individual rows may provide specific insights, whereas older data is typically more useful when aggregated to reveal broader trends.
- Filtering by time intervals is a common requirement, and it's often necessary to compare the same intervals over different date ranges.
- Resampling data at various time resolutions is a frequent task: readjusting the timeframe of the data, aligning points within the same table or across different tables, and addressing any data gaps.
Let's see how some common time-series queries can be executed on the different databases.
Latest Record Query
Imagine you're ingesting financial tick data, tracking various trading symbols across multiple markets and time zones. A typical challenge is retrieving the most recent row for each symbol, a quintessential time-series query.
Below you can see the differences across the database engines.
- QuestDB
- DuckDB
- PostgreSQL, Timescale, and Clickhouse
-- QuestDB implements the LATEST ON ... PARTITION BY extension
SELECT * FROM trades
LATEST ON timestamp PARTITION BY symbol, side;
-- DuckDB does not have an extension for this, but we can use a window
-- function and the QUALIFY filter
SELECT *,
  row_number() OVER (
    PARTITION BY symbol, side
    ORDER BY timestamp DESC
  ) AS rownum
FROM trades
QUALIFY rownum = 1;
-- For this query, there are no specific extensions in PostgreSQL, Timescale,
-- or Clickhouse. We can use standard SQL with window functions and a CTE
WITH numbered AS (
  SELECT *,
    row_number() OVER (
      PARTITION BY symbol, side
      ORDER BY timestamp DESC
    ) AS rownum
  FROM trades
)
SELECT * FROM numbered WHERE rownum = 1;
Interestingly, only QuestDB has a specific SQL extension for this use case. In the other databases, we can employ a Window Function to number the rows for each symbol and side, filtering only the first one. An alternative might be a GROUP BY clause with an aggregate that picks each column's value at the latest timestamp, such as Clickhouse's argMax; however, that approach becomes less readable as more columns are involved. Notably, the standard SQL queries consume more resources than QuestDB's optimized implementation, which fetches only the most recent row for each unique combination of the provided columns.
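To illustrate that GROUP BY alternative, here is a minimal sketch in Clickhouse syntax using the argMax aggregate, which returns the value of one column at the row where another column is greatest. Every extra column needs its own argMax call, which is exactly why this style becomes hard to read on wide tables:
-- A GROUP BY alternative (Clickhouse): argMax picks each column's value
-- at the latest timestamp within the group
SELECT
  symbol,
  side,
  argMax(price, timestamp) AS price,
  argMax(amount, timestamp) AS amount,
  max(timestamp) AS timestamp
FROM trades
GROUP BY symbol, side;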
Time-Interval Filtering
A fundamental aspect of exploring time-series data is applying time filters. Typically, analysts start by examining a broad time interval and then progressively focus on more granular timestamps, or sometimes they do the reverse. It's also often crucial to examine consistent time slices across different date ranges. Let's see five queries around this topic.
- QuestDB
- PostgreSQL and Timescale
- DuckDB and Clickhouse
-- QuestDB offers the IN extension for time-intervals
SELECT * FROM trades WHERE timestamp in '2023'; -- whole year
SELECT * FROM trades WHERE timestamp in '2023-12'; -- whole month
SELECT * FROM trades WHERE timestamp in '2023-12-20'; -- whole day
-- The whole day, extending 15s into the next day
SELECT * FROM trades WHERE timestamp in '2023-12-20;15s';
-- For the past 7 days, 2 seconds before and after midnight
SELECT * FROM trades WHERE timestamp in '2023-12-20T23:59:58;4s;-1d;7';
-- PostgreSQL and Timescale can use the EXTRACT function
-- and then filter by parts of a date
-- whole year
SELECT * FROM trades WHERE EXTRACT(YEAR FROM timestamp) = 2023;
-- whole month
SELECT * FROM trades WHERE EXTRACT(YEAR FROM timestamp) = 2023 AND
EXTRACT(MONTH FROM timestamp) = 12;
-- whole day
SELECT * FROM trades WHERE EXTRACT(YEAR FROM timestamp) = 2023 AND
EXTRACT(MONTH FROM timestamp) = 12 AND
EXTRACT(DAY FROM timestamp) = 20;
-- The whole day, extending 15s into the next day
SELECT * FROM trades
WHERE timestamp between '2023-12-20' AND '2023-12-21T00:00:15';
-- For the past 7 days, 2 seconds before and after midnight
SELECT * FROM trades
WHERE timestamp between '2023-12-14' AND '2023-12-21T00:00:02' AND
(
(
EXTRACT(HOUR FROM timestamp)=23 AND
EXTRACT(MINUTE FROM timestamp)=59 AND
EXTRACT(SECOND FROM timestamp)>=58
) OR
(
EXTRACT(HOUR FROM timestamp)=0 AND
EXTRACT(MINUTE FROM timestamp)=0 AND
EXTRACT(SECOND FROM timestamp)<2
)
);
-- DuckDB and Clickhouse can use datetime functions
-- and then filter by parts of a date
-- whole year
SELECT * FROM trades WHERE year(timestamp)=2023;
-- whole month
SELECT * FROM trades WHERE year(timestamp)=2023 AND
month(timestamp)=12;
-- whole day
SELECT * FROM trades WHERE year(timestamp)=2023 AND
month(timestamp)=12 AND day(timestamp)=20;
-- The whole day, extending 15s into the next day
SELECT * FROM trades
WHERE timestamp between '2023-12-20' AND '2023-12-21T00:00:15';
-- For the past 7 days, 2 seconds before and after midnight
SELECT * FROM trades
WHERE timestamp between '2023-12-14' AND '2023-12-21T00:00:02' AND
(
(
hour(timestamp)=23 AND minute(timestamp)=59 AND second(timestamp)>=58
) OR
(
hour(timestamp)=0 AND minute(timestamp)=0 AND second(timestamp)<2
)
);
You can see how having native extensions for time-range filters can simplify your SQL. In databases without specific extensions, we could use date-difference and interval functions instead, but the resulting SQL would still be more complex than a specialized operator.
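For instance, here is a minimal sketch of the "whole day extending 15 seconds" filter rewritten with date and interval arithmetic, shown in DuckDB syntax; Clickhouse accepts very similar arithmetic on dates and intervals:
-- The whole day of 2023-12-20, extending 15s into the next day,
-- expressed with interval arithmetic instead of literal bounds (DuckDB)
SELECT * FROM trades
WHERE timestamp >= DATE '2023-12-20'
  AND timestamp < DATE '2023-12-20' + INTERVAL 1 DAY + INTERVAL 15 SECOND;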
Joining Tables by Approximate Time
In analytics, while we often work with denormalized tables, joining them with other tables can yield insightful results. Joining on common columns is straightforward, but joining rows based on time is trickier: events seldom occur at precisely the same microsecond. Issues like clock calibration discrepancies or network latencies can lead to slightly different timestamps in different tables. Moreover, one table might log data every second while another does so every 15 minutes, yet you still need to join them to construct a coherent snapshot of a specific moment in time.
Most of the analytical databases we are examining support the ASOF JOIN. This join type, for a given row in one table, finds a matching row in another table that occurred at the same exact moment or, if not available, the closest preceding entry. For databases lacking native ASOF JOIN support, a similar result can be achieved using a lateral join.
- QuestDB
- PostgreSQL and Timescale
- DuckDB
- Clickhouse
-- QuestDB provides ASOF JOIN. Since table definition includes
-- the designated timestamp column, no time condition is needed
SELECT t.*, n.*
FROM trades ASOF JOIN news ON (symbol);
-- Postgresql and Timescale are the only ones with no ASOF JOIN support,
-- but we can use a LEFT JOIN LATERAL that picks the closest preceding
-- row for each trade
SELECT t.*, n.*
FROM trades t
LEFT JOIN LATERAL (
  SELECT *
  FROM news n1
  WHERE n1.symbol = t.symbol
    AND n1.timestamp <= t.timestamp
  ORDER BY n1.timestamp DESC
  LIMIT 1
) n ON true;
-- DuckDB provides the ASOF JOIN extension
SELECT t.*, n.*
FROM trades t ASOF JOIN news n
ON t.symbol = n.symbol AND t.timestamp >= n.timestamp;
-- Clickhouse provides the ASOF JOIN extension
SELECT t.*, n.*
FROM trades t ASOF JOIN news n
ON t.symbol = n.symbol AND n.timestamp <= t.timestamp;
In this scenario, QuestDB, DuckDB, and Clickhouse stand out for their developer-friendliness, extending SQL to accommodate this common requirement. However, there's a nuance with Clickhouse: its ASOF JOIN requires at least one equality condition, so the timestamp column cannot be the sole join key. This limitation doesn't affect our example, but it might pose challenges if you need to match rows in one table with those in another based solely on approximate timestamps.
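For reference, this is what a timestamp-only approximate join looks like where it is supported, as a minimal sketch over the same trades and news tables. QuestDB joins on the designated timestamps when no ON clause is given, and DuckDB accepts the time inequality as the only condition:
-- QuestDB: ASOF JOIN on the designated timestamps alone
SELECT t.*, n.* FROM trades t ASOF JOIN news n;
-- DuckDB: the time inequality can be the sole join condition
SELECT t.*, n.*
FROM trades t ASOF JOIN news n ON t.timestamp >= n.timestamp;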
Time Interval Grouping/Downsampling
Running aggregations over specific time intervals is another fundamental query type in time-series analytics. For instance, analysts might ask:
- How many rows are there per second?
- Are there any trends in trading volume per minute over the last few hours?
- What's the average order amount per country and category on a monthly basis over the last year?
Let's calculate the total price and volume in our trading table at 15-minute intervals.
- QuestDB
- PostgreSQL
- Timescale
- DuckDB
- Clickhouse
-- QuestDB implements the SAMPLE BY extension
-- which accepts granularity from years to microseconds.
-- GROUP BY and ORDER BY are implicit and not needed
SELECT
timestamp, symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
SAMPLE BY 15m;
-- Postgresql supports date_bin, whose bucket stride cannot use month
-- or year units, so an alternative is needed for longer intervals.
-- We then GROUP BY and ORDER BY
SELECT
  date_bin(
    '15 minutes', timestamp, date_trunc('day', timestamp)
  ) AS time_bucket,
  symbol,
  sum(price) AS price,
  sum(amount) AS volume
FROM trades
GROUP BY time_bucket, symbol
ORDER BY time_bucket;
-- Timescale supports time_bucket, then GROUP BY and ORDER
SELECT time_bucket('15 minute', timestamp) AS bucket,
symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
GROUP BY bucket, symbol
ORDER BY bucket ASC;
-- DuckDB supports time_bucket, then GROUP BY and ORDER
SELECT time_bucket(Interval '15 minutes', timestamp) AS bucket,
symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
GROUP BY bucket, symbol
ORDER BY bucket ASC;
-- Clickhouse supports toStartOfInterval, then GROUP BY and ORDER
SELECT
toStartOfInterval(timestamp, INTERVAL 15 MINUTE) AS i,
symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
GROUP BY i, symbol
ORDER BY i ASC;
All databases include some support for classifying a timestamp into a time bucket. Most of them then use the standard GROUP BY and ORDER BY so you can execute aggregations. One thing I like about the QuestDB SAMPLE BY extension is that grouping and sorting by timestamp are implicit, making my SQL a bit easier to write.
Time Interval Downsampling with Linear Interpolation
Wouldn't life be great if our data always arrived at regular, predictable intervals? Reality often presents us with irregular time intervals, complete with gaps. In such cases, it's crucial not only to identify these gaps but also to fill them, for example with linear interpolation. This approach ensures a uniform dataset without gaps, which is critical when training machine learning models or displaying results on a business dashboard.
- QuestDB
- PostgreSQL
- Timescale
- DuckDB
- Clickhouse
-- QuestDB SAMPLE BY accepts FILL with different strategies,
-- including LINEAR, PREV (previous row value), NULL, or a constant value
SELECT
timestamp,
sum(price) AS price,
sum(amount) AS volume
FROM trades
SAMPLE BY 1s FILL(LINEAR);
-- Postgresql. No support for interpolation, so we use window functions.
-- We first generate a virtual table with one row per second and per symbol,
-- LEFT JOIN it with our aggregated data, and then use the previous and next
-- known values to interpolate the gaps
WITH time_series AS (
  SELECT generate_series(
    min(date_trunc('day', timestamp)),
    max(timestamp),
    interval '1 second'
  ) AS time_bucket
  FROM trades
),
symbols AS (
  SELECT DISTINCT symbol FROM trades
),
aggregated_data AS (
  SELECT
    date_bin(
      '1 second', timestamp, date_trunc('day', timestamp)
    ) AS time_bucket,
    symbol,
    sum(price) AS price,
    sum(amount) AS volume
  FROM trades
  GROUP BY time_bucket, symbol
)
SELECT
  ts.time_bucket,
  s.symbol,
  COALESCE(
    ad.price,
    lag(ad.price) OVER w
    + (lead(ad.price) OVER w - lag(ad.price) OVER w)
    * EXTRACT(
        epoch FROM ts.time_bucket - lag(ts.time_bucket) OVER w
      )
    / NULLIF(
        EXTRACT(
          epoch FROM lead(ts.time_bucket) OVER w
          - lag(ts.time_bucket) OVER w
        ), 0
      )
  ) AS interpolated_price,
  COALESCE(
    ad.volume,
    lag(ad.volume) OVER w
    + (lead(ad.volume) OVER w - lag(ad.volume) OVER w)
    * EXTRACT(
        epoch FROM ts.time_bucket - lag(ts.time_bucket) OVER w
      )
    / NULLIF(
        EXTRACT(
          epoch FROM lead(ts.time_bucket) OVER w
          - lag(ts.time_bucket) OVER w
        ), 0
      )
  ) AS interpolated_volume
FROM time_series ts
CROSS JOIN symbols s
LEFT JOIN aggregated_data ad
  ON ts.time_bucket = ad.time_bucket AND s.symbol = ad.symbol
WINDOW w AS (
  PARTITION BY s.symbol
  ORDER BY ts.time_bucket
)
ORDER BY ts.time_bucket, s.symbol;
-- Timescale offers the time_bucket_gapfill function which,
-- together with interpolate, allows for different gap filling
-- strategies. Gapfilling needs an explicit time range, which we
-- provide in the WHERE clause
SELECT time_bucket_gapfill('1 second', timestamp) AS bucket,
  symbol,
  interpolate(sum(price)) AS price,
  interpolate(sum(amount)) AS volume
FROM trades
WHERE timestamp >= '2023-12-20' AND timestamp < '2023-12-21'
GROUP BY bucket, symbol
ORDER BY bucket ASC;
-- Duckdb. No support for interpolation, so we need to use window functions.
-- We first generate a virtual table with one row per second and per symbol,
-- LEFT JOIN it with our aggregated data, and then use the previous and next
-- known values to interpolate the gaps
WITH time_series AS (
  SELECT unnest(generate_series(
    min(date_trunc('second', timestamp)),
    max(timestamp),
    interval '1 second'
  )) AS time_bucket
  FROM trades
),
symbols AS (
  SELECT DISTINCT symbol FROM trades
),
aggregated_data AS (
  SELECT time_bucket(INTERVAL '1 second', timestamp) AS bucket,
    symbol,
    sum(price) AS price,
    sum(amount) AS volume
  FROM trades
  GROUP BY bucket, symbol
)
SELECT
  ts.time_bucket,
  s.symbol,
  ad.price,
  ad.volume,
  COALESCE(
    ad.price,
    lag(ad.price) OVER w
    + (lead(ad.price) OVER w - lag(ad.price) OVER w)
    * EXTRACT(
        epoch FROM ts.time_bucket - lag(ts.time_bucket) OVER w
      )
    / NULLIF(
        EXTRACT(
          epoch FROM lead(ts.time_bucket) OVER w
          - lag(ts.time_bucket) OVER w
        ), 0
      )
  ) AS interpolated_price,
  COALESCE(
    ad.volume,
    lag(ad.volume) OVER w
    + (lead(ad.volume) OVER w - lag(ad.volume) OVER w)
    * EXTRACT(
        epoch FROM ts.time_bucket - lag(ts.time_bucket) OVER w
      )
    / NULLIF(
        EXTRACT(
          epoch FROM lead(ts.time_bucket) OVER w
          - lag(ts.time_bucket) OVER w
        ), 0
      )
  ) AS interpolated_volume
FROM time_series ts
CROSS JOIN symbols s
LEFT JOIN aggregated_data ad
  ON ts.time_bucket = ad.bucket AND s.symbol = ad.symbol
WINDOW w AS (
  PARTITION BY s.symbol
  ORDER BY ts.time_bucket
)
ORDER BY ts.time_bucket, s.symbol;
-- Clickhouse. It supports the WITH FILL STEP extension, but
-- interpolation is based on last row value only, so no
-- easy method for linear interpolation.
-- We first fill rows, then use window functions to get results similar
-- to linear interpolation. We could make the query more complex to get
-- better results, but this is probably enough for this post comparison
WITH interpolated AS (
SELECT
toStartOfInterval(timestamp, INTERVAL 1 SECOND) AS i,
symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
GROUP BY i, symbol
ORDER BY i ASC
WITH FILL STEP Interval 1 SECOND INTERPOLATE(symbol)
), prev_and_next AS (
SELECT i, symbol, price, volume,
last_value(volume) OVER prev_w as prev_vol,
first_value(volume) OVER next_w as next_vol,
last_value(price) OVER prev_w as prev_price,
first_value(price) OVER next_w as next_price
FROM interpolated
WINDOW prev_w AS (
PARTITION BY symbol ORDER BY i
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
),
next_w AS (
PARTITION BY symbol ORDER BY i
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
)
)
SELECT i, symbol,
coalesce(price, (prev_price + next_price)/2) as price,
coalesce(volume,(prev_vol+next_vol)/2) as volume
FROM prev_and_next;
QuestDB and Timescale both offer extensions for linear interpolation, leading to queries that are quite similar to those in the previous section—a convenient consistency. On the other hand, DuckDB, Clickhouse, and PostgreSQL, not being specifically designed for time-series, lack such extensions. For PostgreSQL and DuckDB, one must first create a time-series virtual table, then perform a join, and use Window Functions for the interpolation calculations. Clickhouse does provide some native interpolation capabilities, but these are limited to values based on the previous row only, necessitating the use of Window Functions for more complex calculations.
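To show what that previous-row limitation looks like in practice, here is a minimal sketch of Clickhouse's native gap filling, aggregating across all symbols to keep it short; the INTERPOLATE clause can only derive each filled value from the previous row, so it carries values forward rather than interpolating linearly:
-- Clickhouse native gap filling: filled rows reuse the previous row's values
SELECT
  toStartOfInterval(timestamp, INTERVAL 1 SECOND) AS i,
  sum(price) AS price,
  sum(amount) AS volume
FROM trades
GROUP BY i
ORDER BY i ASC
WITH FILL STEP INTERVAL 1 SECOND
INTERPOLATE (price, volume);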
Conclusion
As we have explored, SQL is not a one-size-fits-all language; its effectiveness can vary significantly depending on the database and its specific extensions. A database like QuestDB, with tailored SQL extensions for particular use cases, notably improves the developer experience by reducing friction. This ease of use translates into greater productivity, less frustration, and more time for analysts and developers to engage in interactive exploration. Such an environment fosters deeper insights and, ultimately, can lead to more informed business decisions.
Moreover, QuestDB's SQL extensions are not just about syntax convenience; they are backed by optimizations specifically designed for the unique characteristics and volumes of time-series data. This focus on efficiency is a key factor in QuestDB's consistent top performance in industry benchmarks.
Closing Notes
For those already familiar with SQL, adapting to QuestDB's extensions should be straightforward, especially once you grasp the fundamental concepts underpinning QuestDB. If you're new to SQL and eager to dive into time-series analytics, starting with QuestDB offers a smoother learning curve. You'll spend less time grappling with complex syntax and more time acquiring practical skills.
You can try QuestDB's SQL extensions on our public demo or by downloading QuestDB Open Source.