Advanced templates

This document shows and advanced usage of our datafile system when using the CLI. It’s also related to how to work with query parameters.

Reusing templates

When developing several use cases, it’s very common to find the need of reusing certain parts or steps of the analysis, like same data filters or similar table operations. We’re going to use for this purpose the following repository:

Clone demo
git clone https://github.com/tinybirdco/ecommerce_data_project_advanced.git
cd ecommerce_data_project_advanced
File structure
ecommerce_data_project/
    datasources/
        events.datasource
        mv_top_per_day.datasource
        products.datasource
        fixtures/
            events.csv
            products.csv
    endpoints/
        sales.pipe
        top_products_between_dates.pipe
        top_products_last_week.pipe
    includes/
        only_by_events.incl
        top_products.incl
    pipes/
        top_product_per_day.pipe

Let’s take a look first to the sales.pipe endpoint and the top_product_per_day.pipe pipe that materializes to a ‘mv_top_per_day’ datasource. They both make use of the same node: only_buy_events:

includes/only_buy_events.incl
NODE only_buy_events
SQL >
    SELECT
        toDate(timestamp) date,
        product,
        joinGet('products_join_by_id', 'color', product) as color,
        JSONExtractFloat(json, 'price') as price
    FROM events
    where action = 'buy'
endpoints/sales.pipes
INCLUDE "./includes/only_buy_events.incl"

NODE endpoint
DESCRIPTION >
    return sales for a product with color filter
SQL >
    %
    select date, sum(price) total_sales
    from only_buy_events
    where color in {{Array(colors, 'black')}}
    group by date
pipes/top_product_per_day.pipe
INCLUDE "./includes/only_buy_events.incl"

NODE top_per_day
SQL >
  SELECT date,
          topKState(10)(product) top_10,
          sumState(price) total_sales
  from only_buy_events
  group by date

TYPE materialized
DATASOURCE mv_top_per_day
ENGINE AggregatingMergeTree
SORTING_KEY date

Include variables

Using variables

However, we’ve the possibility of sending variables to an include. The main reason to do that is to have a very similar node or nodes that can be reused, but with slightly differences. For instance, in our example, we want to have two endpoints to display the 10 top products, but filtered by different date intervals:

includes/top_products.incl
NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    select
        date,
        topKMerge(10)(top_10) as top_10
    from top_product_per_day
    {% if '$DATE_FILTER' = 'last_week' %}
        where date > today() - interval 7 day
    {% else %}
        where date between {{Date(start)}} and {{Date(end)}}
    {% end %}
    group by date
endpoints/top_products_last_week.pipe
INCLUDE "./includes/top_products.incl" "DATE_FILTER=last_week"
endpoints/top_products_between_dates.pipe
INCLUDE "./includes/top_products.incl" "DATE_FILTER=between_dates"

As we can see, we’re sending the variable DATE_FILTER to the top_products include, where we check the variable content using $ as a prefix to retrieve the variable content.

It’s also possible to assign an array of values to an include variable, but then it’s needed to parse it properly from the template using function templates, as it’s explained in the following section.

Variables vs parameters

Please note the difference between variables and parameters. Parameters are indeed variables whose value can be changed by the user through the endpoint. Variables only live in the template and can be set when declaring the INCLUDE or with the set template util:

Using ‘set’ to declare a variable
{% set my_var = 'default' %}

By default, variables will be interpreted as parameters. In order to avoid variables or private parameters to appear in the public endpoint page of your endpoint, they need to start with _. Example:

Define private variables
%
SELECT
  date
FROM my_table
WHERE a > 10
{% if defined(_private_param) %}
  and b = {{Int32(_private_param)}}
{% end %}

This is also needed when using variables in template functions.

Template functions

This is the list of the available functions that can be used in a template:

  • defined(param): check if a variable is defined

defined function
%
SELECT
  date
FROM my_table
{% if defined(param) %}
  WHERE ...
{% end %}
  • column(name): get the column by its name from a variable

column function
%
{% set var_1 = 'name' %}
SELECT
  {{column(var_1)}}
FROM my_table
  • columns(names): get columns by their name from a variable

columns function
%
{% set var_1 = 'name,age,address' %}
SELECT
  {{columns(var_1)}}
FROM my_table
  • day_diff(date_1, date_2, default): get the difference in days between two dates

day_diff function
%
SELECT
  date
FROM my_table
{% if day_diff(date_end, date_start) < 7 %}
  WHERE ...
{% end %}
  • symbol(x, quote): get the value of a variable

symbol function
%
SELECT
  *
FROM {{symbol(table_name)}}
  • split_to_array(arr, default): splits comma separated values into an array

split_to_array function
%
SELECT
  arrayJoin(arrayMap(x -> toInt32(x), {{split_to_array(code, '')}})) as codes
FROM my_table
  • enumerate_with_last(arr, default): creates an iterable array, returning a boolean value that allows to check if the current element is the last element in the array. It can be used along with the split_to_array function.

enumerate_with_last function
%
SELECT
    {% for _last, _x in enumerate_with_last(split_to_array(attr, 'amount')) %}
        sum({{symbol(_x)}}) as {{symbol(_x)}}
        {% if not _last %}, {% end %}
    {% end %}
FROM my_table

Transform types functions

  • Boolean(x)

  • DateTime(x)

  • Date(x)

  • Float32(x)

  • Float64(x)

  • Int8(x)

  • Int16(x)

  • Int32(x)

  • Int64(x)

  • UInt8(x)

  • UInt16(x)

  • UInt32(x)

  • UInt64(x)

  • String(x)

  • Array(x)