When I started developing software, the database world was all about OLTP, or Online Transactional Processing. Those were the days when databases primarily handled CRUD operations: Create, Read, Update, and Delete. SQL was king, universally used despite subtle variations across platforms.
Back then, only large corporations could afford Online Analytical Processing (OLAP). They used expensive data warehouses to ingest and analyze historical data, which was rarely updated and typically deleted in bulk. Since there was no standard language for OLAP, each system often invented its own.
But things have changed over the past ten years. The database landscape exploded: most OLAP systems adopted SQL as their query language, computation became much cheaper, and now almost every database is either open source or offers a generous free tier. So today the question isn't whether my OLAP database supports SQL or whether I can afford it; it's whether it supports my business use case and how its performance stacks up against the money invested in infrastructure.
In this article, I'll show you how SQL is used across various OLAP databases for time-series analytics. You'll see how some databases have adapted their SQL extensions specifically for time-series scenarios. We'll dive into query comparisons across QuestDB, Timescale, DuckDB, and Clickhouse, and I'm also including PostgreSQL in the mix to offer a perspective on how it compares to these more specialized databases.
Time-series queries in a nutshell
Time-series analytics involve handling large data volumes with some typical patterns:
- Recent individual rows may provide specific insights, whereas older data is typically more useful when aggregated to reveal broader trends.
- Filtering by time intervals is a common requirement, and it's often necessary to compare the same intervals over different date ranges.
- Resampling data at various time resolutions is a frequent task: readjusting the timeframe of the data, aligning points within the same table or across different tables, and addressing any data gaps.
Let's see how some common time-series queries can be executed on the different databases.
Latest Record Query
Imagine you're ingesting financial tick data, tracking various trading symbols across multiple markets and time zones. A typical challenge is retrieving the most recent row for each symbol, a quintessential time-series query.
Below you can see the differences across the database engines.
- QuestDB
- DuckDB
- PostgreSQL, Timescale, and Clickhouse
-- QuestDB implements the LATEST ON ... PARTITION BY extension
SELECT * FROM trades
LATEST ON timestamp PARTITION BY symbol, side;
-- DuckDB does not have an extension for this, but we can use a window
-- function and the QUALIFY filter
SELECT *,
  row_number() OVER (
    PARTITION BY symbol, side
    ORDER BY timestamp DESC
  ) AS rownum
FROM trades
QUALIFY rownum = 1;
-- For this query, there are no specific extensions in PostgreSQL, Timescale,
-- or Clickhouse. We can use standard SQL with window functions and a CTE
WITH numbered AS (
  SELECT *,
    row_number() OVER (
      PARTITION BY symbol, side
      ORDER BY timestamp DESC
    ) AS rownum
  FROM trades
)
SELECT * FROM numbered WHERE rownum = 1;
Interestingly, only QuestDB has a specific SQL extension for this use case. In the other databases, we can employ a Window Function to number the rows for each symbol and side, filtering only the first one. An alternative might be a GROUP BY clause with an aggregate that picks each column's value at the latest timestamp, such as Clickhouse's argMax; however, that approach becomes less readable as more columns are involved. Notably, the standard SQL queries consume more resources than QuestDB's optimized implementation, which fetches only the most recent row for each unique combination of the provided columns.
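To illustrate that GROUP BY alternative, here is a minimal sketch in Clickhouse syntax using the argMax aggregate, which returns the value of one column at the row where another column is greatest. Every extra column needs its own argMax call, which is exactly why this style becomes hard to read on wide tables:
-- A GROUP BY alternative (Clickhouse): argMax picks each column's value
-- at the latest timestamp within the group
SELECT
  symbol,
  side,
  argMax(price, timestamp) AS price,
  argMax(amount, timestamp) AS amount,
  max(timestamp) AS timestamp
FROM trades
GROUP BY symbol, side;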
Time-Interval Filtering
A fundamental aspect of exploring time-series data is applying time filters. Typically, analysts start by examining a broad time interval and then progressively focus on more granular timestamps, or sometimes they do the reverse. It's also often crucial to examine consistent time slices across different date ranges. Let's see five queries around this topic.
- QuestDB
- PostgreSQL and Timescale
- DuckDB and Clickhouse
-- QuestDB offers the IN extension for time-intervals
SELECT * FROM trades WHERE timestamp in '2023'; -- whole year
SELECT * FROM trades WHERE timestamp in '2023-12'; -- whole month
SELECT * FROM trades WHERE timestamp in '2023-12-20'; -- whole day
-- The whole day, extending 15s into the next day
SELECT * FROM trades WHERE timestamp in '2023-12-20;15s';
-- For the past 7 days, 2 seconds before and after midnight
SELECT * FROM trades WHERE timestamp in '2023-12-20T23:59:58;4s;-1d;7';
-- PostgreSQL and Timescale can use the EXTRACT function
-- and then filter by parts of a date
-- whole year
SELECT * FROM trades WHERE EXTRACT(YEAR FROM timestamp) = 2023;
-- whole month
SELECT * FROM trades WHERE EXTRACT(YEAR FROM timestamp) = 2023 AND
EXTRACT(MONTH FROM timestamp) = 12;
-- whole day
SELECT * FROM trades WHERE EXTRACT(YEAR FROM timestamp) = 2023 AND
EXTRACT(MONTH FROM timestamp) = 12 AND
EXTRACT(DAY FROM timestamp) = 20;
-- The whole day, extending 15s into the next day
SELECT * FROM trades
WHERE timestamp between '2023-12-20' AND '2023-12-21T00:00:15';
-- For the past 7 days, 2 seconds before and after midnight
SELECT * FROM trades
WHERE timestamp between '2023-12-14' AND '2023-12-21T00:00:02' AND
(
(
EXTRACT(HOUR FROM timestamp)=23 AND
EXTRACT(MINUTE FROM timestamp)=59 AND
EXTRACT(SECOND FROM timestamp)>=58
) OR
(
EXTRACT(HOUR FROM timestamp)=0 AND
EXTRACT(MINUTE FROM timestamp)=0 AND
EXTRACT(SECOND FROM timestamp)<2
)
);
-- DuckDB and Clickhouse can use datetime functions
-- and then filter by parts of a date
-- whole year
SELECT * FROM trades WHERE year(timestamp)=2023;
-- whole month
SELECT * FROM trades WHERE year(timestamp)=2023 AND
month(timestamp)=12;
-- whole day
SELECT * FROM trades WHERE year(timestamp)=2023 AND
month(timestamp)=12 AND day(timestamp)=20;
-- The whole day, extending 15s into the next day
SELECT * FROM trades
WHERE timestamp between '2023-12-20' AND '2023-12-21T00:00:15';
-- For the past 7 days, 2 seconds before and after midnight
SELECT * FROM trades
WHERE timestamp between '2023-12-14' AND '2023-12-21T00:00:02' AND
(
(
hour(timestamp)=23 AND minute(timestamp)=59 AND second(timestamp)>=58
) OR
(
hour(timestamp)=0 AND minute(timestamp)=0 AND second(timestamp)<2
)
);
You can see how having native extensions for time-range filters can simplify your SQL. In databases without specific extensions, we could use date-difference and interval functions instead, but the resulting SQL would still be more complex than a specialized operator.
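For instance, here is a minimal sketch of the "whole day extending 15 seconds" filter rewritten with date and interval arithmetic, shown in DuckDB syntax; Clickhouse accepts very similar arithmetic on dates and intervals:
-- The whole day of 2023-12-20, extending 15s into the next day,
-- expressed with interval arithmetic instead of literal bounds (DuckDB)
SELECT * FROM trades
WHERE timestamp >= DATE '2023-12-20'
  AND timestamp < DATE '2023-12-20' + INTERVAL 1 DAY + INTERVAL 15 SECOND;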
Joining Tables by Approximate Time
In analytics, while we often work with denormalized tables, joining them with other tables can yield insightful results. Joining on common columns is straightforward, but joining rows based on time is trickier: events seldom occur at precisely the same microsecond. Issues like clock calibration discrepancies or network latencies can lead to slightly different timestamps in different tables. Moreover, one table might log data every second while another does so every 15 minutes, yet you still need to join them to construct a coherent snapshot of a specific moment in time.
Most of the analytical databases we are examining support the ASOF JOIN. This join type, for a given row in one table, finds a matching row in another table that occurred at the same exact moment or, if not available, the closest preceding entry. For databases lacking native ASOF JOIN support, a similar result can be achieved using a lateral join.
- QuestDB
- PostgreSQL and Timescale
- DuckDB
- Clickhouse
-- QuestDB provides ASOF JOIN. Since table definition includes
-- the designated timestamp column, no time condition is needed
SELECT t.*, n.*
FROM trades ASOF JOIN news ON (symbol);
-- Postgresql and Timescale are the only ones with no ASOF JOIN support,
-- but we can use a LEFT JOIN LATERAL that picks the closest preceding
-- row for each trade
SELECT t.*, n.*
FROM trades t
LEFT JOIN LATERAL (
  SELECT *
  FROM news n1
  WHERE n1.symbol = t.symbol
    AND n1.timestamp <= t.timestamp
  ORDER BY n1.timestamp DESC
  LIMIT 1
) n ON true;
-- DuckDB provides the ASOF JOIN extension
SELECT t.*, n.*
FROM trades t ASOF JOIN news n
ON t.symbol = n.symbol AND t.timestamp >= n.timestamp;
-- Clickhouse provides the ASOF JOIN extension
SELECT t.*, n.*
FROM trades t ASOF JOIN news n
ON t.symbol = n.symbol AND n.timestamp <= t.timestamp;
In this scenario, QuestDB, DuckDB, and Clickhouse stand out for their developer-friendliness, extending SQL to accommodate this common requirement. However, there's a nuance with Clickhouse: its ASOF JOIN requires at least one equality condition, so the timestamp column cannot be the sole join key. This limitation doesn't affect our example, but it might pose challenges if you need to match rows in one table with those in another based solely on approximate timestamps.
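For reference, this is what a timestamp-only approximate join looks like where it is supported, as a minimal sketch over the same trades and news tables. QuestDB joins on the designated timestamps when no ON clause is given, and DuckDB accepts the time inequality as the only condition:
-- QuestDB: ASOF JOIN on the designated timestamps alone
SELECT t.*, n.* FROM trades t ASOF JOIN news n;
-- DuckDB: the time inequality can be the sole join condition
SELECT t.*, n.*
FROM trades t ASOF JOIN news n ON t.timestamp >= n.timestamp;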
Time Interval Grouping/Downsampling
Running aggregations over specific time intervals is another fundamental query type in time-series analytics. For instance, analysts might ask:
- How many rows are there per second?
- Are there any trends in trading volume per minute over the last few hours?
- What's the average order amount per country and category on a monthly basis over the last year?
Let's calculate the total price and volume in our trading table at 15-minute intervals.
- QuestDB
- PostgreSQL
- Timescale
- DuckDB
- Clickhouse
-- QuestDB implements the SAMPLE BY extension
-- which accepts granularity from years to microseconds.
-- GROUP BY and ORDER BY are implicit and not needed
SELECT
timestamp, symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
SAMPLE BY 15m;
-- Postgresql supports date_bin, whose bucket stride cannot use month
-- or year units, so an alternative is needed for longer intervals.
-- We then GROUP BY and ORDER BY
SELECT
  date_bin(
    '15 minutes', timestamp, date_trunc('day', timestamp)
  ) AS time_bucket,
  symbol,
  sum(price) AS price,
  sum(amount) AS volume
FROM trades
GROUP BY time_bucket, symbol
ORDER BY time_bucket;
-- Timescale supports time_bucket, then GROUP BY and ORDER
SELECT time_bucket('15 minute', timestamp) AS bucket,
symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
GROUP BY bucket, symbol
ORDER BY bucket ASC;
-- DuckDB supports time_bucket, then GROUP BY and ORDER
SELECT time_bucket(Interval '15 minutes', timestamp) AS bucket,
symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
GROUP BY bucket, symbol
ORDER BY bucket ASC;
-- Clickhouse supports toStartOfInterval, then GROUP BY and ORDER
SELECT
toStartOfInterval(timestamp, INTERVAL 15 MINUTE) AS i,
symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
GROUP BY i, symbol
ORDER BY i ASC;
All databases include some support for classifying a timestamp into a time bucket. Most of them then use the standard GROUP BY and ORDER BY so you can execute aggregations. One thing I like about the QuestDB SAMPLE BY extension is that grouping and sorting by timestamp are implicit, making my SQL a bit easier to write.
Time Interval Downsampling with Linear Interpolation
Wouldn't life be great if our data always arrived at regular, predictable intervals? Reality often presents us with irregular time intervals, complete with gaps. In such cases, it's crucial not only to identify these gaps but also to fill them, for example with linear interpolation. This approach ensures a uniform dataset without gaps, which is critical when training machine learning models or displaying results on a business dashboard.
- QuestDB
- PostgreSQL
- Timescale
- DuckDB
- Clickhouse
-- QuestDB SAMPLE BY accepts FILL with different strategies,
-- including LINEAR, PREV (previous row value), NULL, or a constant value
SELECT
timestamp,
sum(price) AS price,
sum(amount) AS volume
FROM trades
SAMPLE BY 1s FILL(LINEAR);
-- Postgresql. No support for interpolation, so we use window functions.
-- We first generate a virtual table with one row per second and per symbol,
-- LEFT JOIN it with our aggregated data, and then use the previous and next
-- known values to interpolate the gaps
WITH time_series AS (
  SELECT generate_series(
    min(date_trunc('day', timestamp)),
    max(timestamp),
    interval '1 second'
  ) AS time_bucket
  FROM trades
),
symbols AS (
  SELECT DISTINCT symbol FROM trades
),
aggregated_data AS (
  SELECT
    date_bin(
      '1 second', timestamp, date_trunc('day', timestamp)
    ) AS time_bucket,
    symbol,
    sum(price) AS price,
    sum(amount) AS volume
  FROM trades
  GROUP BY time_bucket, symbol
)
SELECT
  ts.time_bucket,
  s.symbol,
  COALESCE(
    ad.price,
    lag(ad.price) OVER w
    + (lead(ad.price) OVER w - lag(ad.price) OVER w)
    * EXTRACT(
        epoch FROM ts.time_bucket - lag(ts.time_bucket) OVER w
      )
    / NULLIF(
        EXTRACT(
          epoch FROM lead(ts.time_bucket) OVER w
          - lag(ts.time_bucket) OVER w
        ), 0
      )
  ) AS interpolated_price,
  COALESCE(
    ad.volume,
    lag(ad.volume) OVER w
    + (lead(ad.volume) OVER w - lag(ad.volume) OVER w)
    * EXTRACT(
        epoch FROM ts.time_bucket - lag(ts.time_bucket) OVER w
      )
    / NULLIF(
        EXTRACT(
          epoch FROM lead(ts.time_bucket) OVER w
          - lag(ts.time_bucket) OVER w
        ), 0
      )
  ) AS interpolated_volume
FROM time_series ts
CROSS JOIN symbols s
LEFT JOIN aggregated_data ad
  ON ts.time_bucket = ad.time_bucket AND s.symbol = ad.symbol
WINDOW w AS (
  PARTITION BY s.symbol
  ORDER BY ts.time_bucket
)
ORDER BY ts.time_bucket, s.symbol;
-- Timescale offers the time_bucket_gapfill function which,
-- together with interpolate, allows for different gap filling
-- strategies. Gapfilling needs an explicit time range, which we
-- provide in the WHERE clause
SELECT time_bucket_gapfill('1 second', timestamp) AS bucket,
  symbol,
  interpolate(sum(price)) AS price,
  interpolate(sum(amount)) AS volume
FROM trades
WHERE timestamp >= '2023-12-20' AND timestamp < '2023-12-21'
GROUP BY bucket, symbol
ORDER BY bucket ASC;
-- Duckdb. No support for interpolation, so we need to use window functions.
-- We first generate a virtual table with one row per second and per symbol,
-- LEFT JOIN it with our aggregated data, and then use the previous and next
-- known values to interpolate the gaps
WITH time_series AS (
  SELECT unnest(generate_series(
    min(date_trunc('second', timestamp)),
    max(timestamp),
    interval '1 second'
  )) AS time_bucket
  FROM trades
),
symbols AS (
  SELECT DISTINCT symbol FROM trades
),
aggregated_data AS (
  SELECT time_bucket(INTERVAL '1 second', timestamp) AS bucket,
    symbol,
    sum(price) AS price,
    sum(amount) AS volume
  FROM trades
  GROUP BY bucket, symbol
)
SELECT
  ts.time_bucket,
  s.symbol,
  ad.price,
  ad.volume,
  COALESCE(
    ad.price,
    lag(ad.price) OVER w
    + (lead(ad.price) OVER w - lag(ad.price) OVER w)
    * EXTRACT(
        epoch FROM ts.time_bucket - lag(ts.time_bucket) OVER w
      )
    / NULLIF(
        EXTRACT(
          epoch FROM lead(ts.time_bucket) OVER w
          - lag(ts.time_bucket) OVER w
        ), 0
      )
  ) AS interpolated_price,
  COALESCE(
    ad.volume,
    lag(ad.volume) OVER w
    + (lead(ad.volume) OVER w - lag(ad.volume) OVER w)
    * EXTRACT(
        epoch FROM ts.time_bucket - lag(ts.time_bucket) OVER w
      )
    / NULLIF(
        EXTRACT(
          epoch FROM lead(ts.time_bucket) OVER w
          - lag(ts.time_bucket) OVER w
        ), 0
      )
  ) AS interpolated_volume
FROM time_series ts
CROSS JOIN symbols s
LEFT JOIN aggregated_data ad
  ON ts.time_bucket = ad.bucket AND s.symbol = ad.symbol
WINDOW w AS (
  PARTITION BY s.symbol
  ORDER BY ts.time_bucket
)
ORDER BY ts.time_bucket, s.symbol;
-- Clickhouse. It supports the WITH FILL STEP extension, but
-- interpolation is based on last row value only, so no
-- easy method for linear interpolation.
-- We first fill rows, then use window functions to get results similar
-- to linear interpolation. We could make the query more complex to get
-- better results, but this is probably enough for this post comparison
WITH interpolated AS (
SELECT
toStartOfInterval(timestamp, INTERVAL 1 SECOND) AS i,
symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
GROUP BY i, symbol
ORDER BY i ASC
WITH FILL STEP Interval 1 SECOND INTERPOLATE(symbol)
), prev_and_next AS (
SELECT i, symbol, price, volume,
last_value(volume) OVER prev_w as prev_vol,
first_value(volume) OVER next_w as next_vol,
last_value(price) OVER prev_w as prev_price,
first_value(price) OVER next_w as next_price
FROM interpolated
WINDOW prev_w AS (
PARTITION BY symbol ORDER BY i
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
),
next_w AS (
PARTITION BY symbol ORDER BY i
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
)
)
SELECT i, symbol,
coalesce(price, (prev_price + next_price)/2) as price,
coalesce(volume,(prev_vol+next_vol)/2) as volume
FROM prev_and_next;
QuestDB and Timescale both offer extensions for linear interpolation, leading to queries that are quite similar to those in the previous section—a convenient consistency. On the other hand, DuckDB, Clickhouse, and PostgreSQL, not being specifically designed for time-series, lack such extensions. For PostgreSQL and DuckDB, one must first create a time-series virtual table, then perform a join, and use Window Functions for the interpolation calculations. Clickhouse does provide some native interpolation capabilities, but these are limited to values based on the previous row only, necessitating the use of Window Functions for more complex calculations.
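To show what that previous-row limitation looks like in practice, here is a minimal sketch of Clickhouse's native gap filling, aggregating across all symbols to keep it short; the INTERPOLATE clause can only derive each filled value from the previous row, so it carries values forward rather than interpolating linearly:
-- Clickhouse native gap filling: filled rows reuse the previous row's values
SELECT
  toStartOfInterval(timestamp, INTERVAL 1 SECOND) AS i,
  sum(price) AS price,
  sum(amount) AS volume
FROM trades
GROUP BY i
ORDER BY i ASC
WITH FILL STEP INTERVAL 1 SECOND
INTERPOLATE (price, volume);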
Conclusion
As we have explored, SQL is not a one-size-fits-all language; its effectiveness can vary significantly depending on the database and its specific extensions. A database like QuestDB, with tailored SQL extensions for particular use cases, notably improves the developer experience by reducing friction. This ease of use translates into greater productivity, less frustration, and more time for analysts and developers to engage in interactive exploration. Such an environment fosters deeper insights and, ultimately, can lead to more informed business decisions.
Moreover, QuestDB's SQL extensions are not just about syntax convenience; they are backed by optimizations specifically designed for the unique characteristics and volumes of time-series data. This focus on efficiency is a key factor in QuestDB's consistent top performance in industry benchmarks.
Closing Notes
For those already familiar with SQL, adapting to QuestDB's extensions should be straightforward, especially once you grasp the fundamental concepts underpinning QuestDB. If you're new to SQL and eager to dive into time-series analytics, starting with QuestDB offers a smoother learning curve. You'll spend less time grappling with complex syntax and more time acquiring practical skills.
You can try QuestDB's SQL extensions on our public demo or by downloading QuestDB Open Source.