The Standard SQL TBE Pipeline

If you are using TBE perturbation with a SQL database, it is recommended that you simply include sql_tbe_pipeline in your settings file.

This is the same as adding the following definitions to your settings file:

transform_definitions:
  sql_tbe_pipeline:
    reference: protari.transform.Compose
    parameters:
      transform_sequence:
      - parse_aggregation_query
      - check_query_size
      - check_field_exclusions
      - tbe_replace
      - get_aggregated_sql_data
      - get_sql_top_rows
      - check_sparsity
      - pad
      - tbe_perturb
      - finish
  parse_aggregation_query:
    reference: protari.transform.ParseAggregationQuery
  get_aggregated_sql_data:
    reference: protari.transform.GetAggregatedSqlData
  pad_if_requested:
    reference: protari.transform.Pad
    parameters:
      allow_query_override: true
  tbe_perturb:
    reference: protari.transform.TBEPerturb
  get_sql_top_rows:
    reference: protari.transform.AddFunctionData
    parameters:
      function_types:
      - sum
      - mean
      - sum_product
      - mean_product
      reference: protari.transform.get_aggregated_data.get_aggregated_flat_sql_data.get_top_rows_by_abs_value
      parameters:
        record_key_name: "{tbe_perturb.record_key_name}"
        value_tiebreaker_name: "{tbe_perturb.record_key_name}"
  check_query_size:
    reference: protari.transform.CheckQuerySize
  check_field_exclusions:
    reference: protari.transform.CheckFieldExclusions
  check_sparsity:
    reference: protari.transform.CheckSparsity
  tbe_replace:
    reference: protari.transform.ReplaceFunction
    parameters:
      function_mapping:
        count:
        - count {}
        - unweighted_count {}         # Required for TBE perturbation and sparsity check.
        - unweighted_sum {tbe_perturb.record_key_name}  # Required for TBE perturbation.
        sum:
        - sum {}                      # Could use either {} or {0}, since only one argument.
        - count {}                    # Required for TBE perturbation.
        - unweighted_count {}         # Required for TBE perturbation and sparsity check.
        - unweighted_sum {tbe_perturb.record_key_name}  # Required for TBE perturbation.
        mean:
        - sum {}                      # Required for TBE perturbation.
        - count {}                    # Required for TBE perturbation.
        - unweighted_count {}         # Required for TBE perturbation and sparsity check.
        - unweighted_sum {tbe_perturb.record_key_name}  # Required for TBE perturbation.
includes:
- finish

You can see that finish is another standard pipeline provided and used by sql_tbe_pipeline. If you define your own custom aggregation pipeline, you should still finish it with finish. This removes hidden fields and rows with missing perturbed values, rounds the perturbed results, adds totals if requested, and formats the results as json, csv or SDMX-JSON.

transform_definitions:
  finish:
    reference: protari.transform.Compose
    parameters:
      transform_sequence:
      - round
      - remove_hidden
      - remove_none
      - format_rows
  remove_hidden:
    reference: protari.transform.RemoveHidden
  remove_none:
    reference: protari.transform.RemoveNone
  round:
    reference: protari.transform.Round
  format_rows:
    reference: protari.transform.FormatRows

Key points

There's a lot to the pipeline. In particular, the TBE algorithm needs to know:

The remove_hidden and remove_none transforms are especially important, so that no unperturbed data is returned to the user.

Transforms can refer to the parameters of other transforms via the {transform_name.parameter_name} syntax, eg. {tbe_perturb.record_key_name}.

The names given to each transform by this pipeline are arbitrary, but your settings and dataset configuration files need to use these names to change or add parameters to them.