Global Settings

When the API starts, it first reads a global settings file from ./settings.yaml. You can override this location by setting the environment variable PROTARI_SETTINGS_PATH.

The settings file tells the API where to find the data and the dataset configuration files, which parameters to use for perturbations, and how to authorize users. The file may be in either yaml or json format.

An example is shown below. Parts of this listing were lost; key names marked "assumed" are reconstructed from the descriptions that follow and may differ from the actual schema:

dataset_config_path: ./dataset_config
logging_config_path: ./logging_config.yaml
validate_on_startup: true
logged_request_attributes:  # assumed name; attributes of Flask's request object to log
  - remote_addr
logged_request_headers:  # assumed name; request headers to log
  - X-Forwarded-For
organization:  # assumed name
  name: Sample
  title: Sample organization
description: A description to include in every json-formatted API output.
terms: Optional terms and conditions to include in every json-formatted API output.
query_class_definitions:
  aggregation:
    max_functions: 1
    allowed_functions:  # assumed name
      - sum
      - mean
      - count
transform_definitions:  # assumed placement of the perturbation data path
  perturbation:
    parameters:
      path: ./perturbation_data
auth_interface:  # assumed name
  reference: protari.auth_interface.db_auth.DatabaseAuthInterface
  # The enclosing keys for the following fragment were lost; see the
  # auth and rate-limiting sections below for the full structure:
  #     limit: limit
  #     reference: protari_api.sql_query_limiter.SqlQueryLimiter
  #       table_name: query_limit
  #       - 10/minute
includes:
  - sql_tbe_pipeline
  - sql_laplace_pipeline

The schema for the settings file is provided in the source code.

Environment variables

Settings values can reference environment variables as ${ENV_VAR}. A typical use-case is to include passwords without the need to put them directly in the file, eg:
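For instance, a database connection string can embed the password via such a reference (the connection string below is hypothetical; only the ${DATABASE_PWD} reference matters):

```
      url: postgresql://protari_user:${DATABASE_PWD}@dbhost/protari
```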


together with the environment variable DATABASE_PWD.

You could alternatively put the entire database URL into an environment variable, eg:

      url: ${DATABASE_URL}
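
The substitution itself can be sketched in Python as follows. This is an illustration of the ${ENV_VAR} mechanism only, not Protari's actual implementation:

```python
import os
import re

def substitute_env_vars(value):
    """Replace ${VAR} references in a settings value with the
    corresponding environment variable (illustrative sketch only)."""
    return re.sub(
        r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}",
        lambda match: os.environ[match.group(1)],
        value,
    )

os.environ["DATABASE_PWD"] = "s3cret"
print(substitute_env_vars("postgresql://user:${DATABASE_PWD}@host/db"))
# → postgresql://user:s3cret@host/db
```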


The path to your dataset configuration files. All json files found under this path will be read by the API and interpreted as dataset configuration files. This path is relative to the location of the settings file.


The path to your logging configuration file. This path is relative to the location of the settings file.


If true (the default), all datasets will be put through a number of validation checks when the API starts up.


If true (the default is false), all output is validated explicitly against the API's swagger spec.


An optional list of attributes of Flask's request object that you wish to include in the API's logs, eg. to log the user's IP address (when it is not obscured by a proxy layer), include remote_addr.


An optional list of headers you wish to include in the API's logs, eg. User-Agent or X-Forwarded-For. The X-Forwarded-For header can be useful when a user's IP address is not available via the attribute remote_addr.


The API's base url. It must include the swagger basePath. If provided, it is returned by the API in the aggregation response; it is not used internally by Protari.

So that your users don't get confused, this base url should match the one actually served by the API, which is set independently in protari-api, eg:

app.add_api(swagger_spec, base_path='/api/v1', ...)


This is information for the user, and is returned in the relevant API outputs.


This is information for the user, and is returned in all json-formatted API output (not sdmx-json).


This is intended for user interfaces to display terms and conditions of using the API. It is returned in all json-formatted API output (not sdmx-json).


By default, only the 'aggregation' query class is defined, but you have the flexibility to add your own.


The names of the functions allowed for this query class, eg:

        - sum
        - mean
        - count

By default, only 'count' is allowed.


The maximum number of functions that a user can request in a single query. Default 1.


References to custom function types defined in python can be added here.



All the operations that are needed to take a query from the user and return output to them are called "transforms". They include, but are not limited to, reading the data from its source, perturbing the results, and formatting the output.

The available transforms must be listed in the settings file. In the example above, the included sql_tbe_pipeline does this: it provides the references to the functions which perform each of the above tasks, along with some sensible default parameters for them. These parameters and references can be supplemented (and/or overridden) in the settings file, and also in each dataset configuration file. For example, if all your datasets' data sit on the same SQL database, it makes sense to provide a single global connection URL to this database in the settings file, rather than repeat it for each dataset. On the other hand, perturbation parameters may differ between datasets, so they should be defined there.
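As a rough illustration of this layering (the parameter names below are hypothetical, not Protari's actual schema), a dataset configuration supplements and overrides the global transform parameters:

```python
# Hypothetical illustration of settings layering: global transform
# parameters are supplemented/overridden by each dataset's configuration.
global_params = {
    "url": "postgresql:///shared_db",  # shared by all datasets
    "epsilon": 1.0,                    # hypothetical perturbation parameter
}
dataset_params = {"epsilon": 0.5}      # this dataset perturbs differently

# Later keys win, so dataset settings override the global ones.
effective = {**global_params, **dataset_params}
print(effective)
# → {'url': 'postgresql:///shared_db', 'epsilon': 0.5}
```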

You will need to provide additional settings to use the transforms defined in the SQL TBE pipeline:

While it is possible to define the numerical TBE perturbation parameters in the settings file too, don't do this in settings.yaml! That's because the yaml and json file readers treat floating point numbers slightly differently: eg. the json reader treats 0.55 as the exact decimal 0.55, while the yaml reader represents it to machine precision, which is accurate to about 10^-17. For this reason, yaml dataset configuration files are not supported either.
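The difference can be demonstrated in Python, assuming (as an illustration) a json reader configured to keep exact decimals:

```python
import json
from decimal import Decimal

# A json reader can preserve 0.55 as the exact decimal 0.55:
exact = json.loads("0.55", parse_float=Decimal)

# A reader that stores it as a binary float only gets machine precision:
approx = float("0.55")

print(exact)            # exactly 0.55
print(Decimal(approx))  # the binary approximation, slightly above 0.55
print(exact == Decimal(approx))
# → False
```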


Requires a database "url", which you should set to the connection string of the database containing the data.

This interface uses SQLAlchemy to communicate with the SQL database, which works with most SQL dialects, including Oracle, MS-SQL, MySQL, sqlite and Postgres.

The API issues SQL "SELECT" statements to read from the database. It requires the FROM, AS, WHERE, GROUP BY, and ORDER BY clauses, and the COUNT and SUM functions. For mean and sum perturbations using the TBE algorithm, it also requires ROW_NUMBER, PARTITION BY and OVER, which are available in Postgres and Oracle, but not in older versions of sqlite (window functions were added in sqlite 3.25).

You can also define max_custom_groups to override the default 160; see numeric data fields for more information.


The TBE perturbation algorithm requires large matrices to perform its perturbation, which should be stored as csv files. Set path to the path of the directory containing these matrix files.
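Reading one of these matrix files could look like the following sketch (the helper and the file name are hypothetical):

```python
import csv

def load_perturbation_matrix(path):
    """Read a csv matrix file into a list of rows of floats (sketch only)."""
    with open(path, newline="") as f:
        return [[float(cell) for cell in row] for row in csv.reader(f)]

# eg. matrix = load_perturbation_matrix("./perturbation_data/matrix_1.csv")
```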


There are three auth interfaces provided by Protari.


NoAuthInterface is the default interface, used if none is provided in the settings file.

It treats all non-empty Authorization headers and key URL parameters as invalid.


This is a simple database key lookup, with reference protari.auth_interface.db_auth.DatabaseAuthInterface. To use this, specify the database connection string in url, using the same approach as for the SQL interface.

In addition, you can optionally specify further parameters.

The API user supplies an Authorization header, with Protari key= preceding the key, eg:

curl -X GET --header 'Accept: application/json' --header 'Authorization: Protari key=abc123' 'http://localhost:8080/v1/datasets/'

Currently, you can alternatively provide the auth key using the key query parameter, eg. ?key=abc123, but this is deprecated and may be removed in a future version.
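Extracting the key from such a header could be sketched as follows (an illustration only, not Protari's actual code):

```python
def extract_protari_key(auth_header):
    """Pull the key out of an 'Authorization: Protari key=...' value."""
    scheme, _, param = auth_header.partition(" ")
    if scheme != "Protari" or not param.startswith("key="):
        return None  # not a Protari key header
    return param[len("key="):]

print(extract_protari_key("Protari key=abc123"))  # → abc123
print(extract_protari_key("Bearer xyz"))          # → None
```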

Note that Protari does not provide any mechanisms for generating, maintaining or updating auth keys.

It is not recommended to use this method in production.


This verifies a signed JSON web token (JWT), using the OpenID Connect standard. The auth interface has reference protari.auth_interface.jwt_auth.JWTAuthInterface.

You should specify the following parameters, described further below: json_web_key_set (or jwks_url), permissions_name, and permission_string_keys.

You should configure your OpenID Connect identity provider to return permissions in the claim with the name given in permissions_name. These permissions should be a list, with each element being either:

  1. A colon-separated string, eg. <op>:<dataset>:<limit>, with the meaning of each part given by permission_string_keys (see above). Eg. *:secret:50.
  2. An object with op and dataset keys at least, and optional keyword arguments for the permissions post-processor, eg. {"op": "*", "dataset": "secret", "limit": 50}.

In each case, op gives the query class (eg. aggregation), or * to match all query classes, or an empty string "" to match only metadata queries.
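Expanding the string form into the object form, using permission_string_keys, can be sketched as (illustrative only):

```python
# Sketch of expanding token permissions into objects, using the
# permission_string_keys from the example above (illustrative only).
permission_string_keys = ["op", "dataset", "limit"]

def parse_permission(perm):
    """Accept either the colon-separated string form or the object form."""
    if isinstance(perm, dict):
        return perm
    return dict(zip(permission_string_keys, perm.split(":")))

print(parse_permission("*:secret:50"))
# → {'op': '*', 'dataset': 'secret', 'limit': '50'}
print(parse_permission({"op": "*", "dataset": "secret", "limit": 50}))
# → {'op': '*', 'dataset': 'secret', 'limit': 50}
```

Note that in the string form the limit arrives as a string; a permissions post-processor would be expected to interpret it.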

A sample value for json_web_key_set would be (with the <...>s replaced with long strings):

json_web_key_set:
  keys:
    - alg: RS256
      kty: RSA
      use: sig
      n: <...>
      e: AQAB
      kid: <...>

Here is an example that uses a jwks_url, a permissions post-processor to limit the number of queries each user can ever make, and a permission_string_keys setting so that these limits can be given in the token in the string format <op>:<dataset>:<limit>. Parts of this listing were lost; key names marked "assumed" are reconstructed and may differ from the actual schema:

auth_interface:  # assumed name
  reference: protari.auth_interface.jwt_auth.JWTAuthInterface
  parameters:  # assumed nesting
    jwks_url: https://example.com/.well-known/jwks.json  # hypothetical URL
    algorithms:  # assumed name
      - RS256
    permission_string_keys:
      - op
      - dataset
      - limit
    permissions_post_processors:  # assumed name
      - reference: protari_api.sql_query_limiter.SqlQueryLimiter
        parameters:  # assumed nesting
          url: postgresql:///protari_demo
          table_name: query_limit

For more details see the JWT standard.


The SqlQueryLimiter permissions post-processor can take the following parameters:

      - reference: protari_api.sql_query_limiter.SqlQueryLimiter
        parameters:  # assumed nesting; the enclosing key was lost in this fragment
          url: postgresql:///protari_demo
          exceeded_limit_message: You have reached the maximum number of queries allowed on this dataset  # the default
          column_name_mapping:  # the default is shown here; the values of each property are the database table columns
            user_id: user
            query_class_name: query_class
            dataset_name: dataset
            value: value
          table_name: query_limit
          engine_parameters:  # extra parameters to send to the sql engine
            echo: false

The possible engine parameters are described in the SQLAlchemy create_engine documentation.


The API's aggregation-related endpoints can be rate limited per authenticated user, per dataset. Datasets that can be queried without authentication cannot be rate limited in this way, as they do not require users to identify themselves.

The format for this specification is shown below. Parts of this listing were lost; the key structure is reconstructed from the description that follows, and the grouping of the example limits among public, default and semicolon is assumed:

rate_limits:  # assumed name
  storage_uri: "memory://"  # the default
  headers_enabled: false    # the default
  strategy: fixed-window    # the default
  limits:
    aggregation:
      public:
        - 100/second
      default:
        - 5/60 seconds
        - 8/2 minutes
      semicolon:
        - 3/60 seconds

Rate limiting is handled by the flask-limiter library - see its documentation for further information. In particular, redis (redis://host:port) or memcached (memcached://host:port) can be used to store usage data.
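The fixed-window strategy can be sketched as follows. This is a minimal illustration of the strategy only; Protari delegates real rate limiting to flask-limiter:

```python
import time

# Minimal fixed-window rate limiter sketch (illustrative only).
class FixedWindowLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.window_start = None
        self.count = 0

    def allow(self, now=None):
        """Return True if a request at time `now` is within the limit."""
        now = time.monotonic() if now is None else now
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start, self.count = now, 0  # start a fresh window
        self.count += 1
        return self.count <= self.limit

# "3/60 seconds" means at most 3 requests in each 60-second window:
limiter = FixedWindowLimiter(limit=3, window_seconds=60)
print([limiter.allow(now=t) for t in (0, 1, 2, 3)])  # → [True, True, True, False]
print(limiter.allow(now=61))                         # → True (new window)
```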

The limits under limits.aggregation.public are applied to datasets that do not require permission to query. If no public limit is specified, Protari uses 1 million queries per second as the limit. This limit applies across all unauthenticated users combined (and per user for authenticated users).

All aggregation queries that require authentication are rate-limited by the list under default. The limit applies per user for each dataset, and applies across all the aggregation endpoints (ie. /aggregation, /aggregation/csv and /aggregation/sdmx-json). In addition, any query containing a semicolon is further rate limited by the limits listed under semicolon.

Note: semicolon is an experimental option and may be refined in a future release.


The example above finishes by including settings from other files, sql_tbe_pipeline and sql_laplace_pipeline. If no path is provided, these are assumed to be yaml files located in Protari's protari/settings/includes/ directory.

The SQL TBE pipeline is discussed further under transforms above.

Using the Protari library without the API

The above description applies to settings files used by the Protari API. The underlying Protari library only recognizes query_class_definitions, transform_definitions and includes.