The Standard SQL TBE Pipeline
If you are using TBE perturbation with a SQL database,
it is recommended that you simply include sql_tbe_pipeline
in your settings file.
This is the same as adding the following definitions to your settings file:
transform_definitions:
sql_tbe_pipeline:
reference: protari.transform.Compose
parameters:
transform_sequence:
- parse_aggregation_query
- check_query_size
- check_field_exclusions
- get_sql_top_rows
- tbe_replace
- get_aggregated_sql_data
- check_sparsity
- pad_if_requested
- tbe_perturb
- finish
parse_aggregation_query:
reference: protari.transform.ParseAggregationQuery
get_aggregated_sql_data:
reference: protari.transform.GetAggregatedSqlData
pad_if_requested:
reference: protari.transform.Pad
parameters:
allow_query_override: true
tbe_perturb:
reference: protari.transform.TBEPerturb
get_sql_top_rows:
reference: protari.transform.AddFunctionData
parameters:
function_types:
- sum
- mean
- sum_product
- mean_product
reference: protari.transform.get_aggregated_data.get_aggregated_flat_sql_data.get_top_rows_by_abs_value
parameters:
record_key_name: "{tbe_perturb.record_key_name}"
value_tiebreaker_name: "{tbe_perturb.record_key_name}"
check_query_size:
reference: protari.transform.CheckQuerySize
check_field_exclusions:
reference: protari.transform.CheckFieldExclusions
check_sparsity:
reference: protari.transform.CheckSparsity
tbe_replace:
reference: protari.transform.Replace
parameters:
function_mapping:
count:
- count {}
- unweighted_count {} # Required for TBE perturbation and sparsity check.
- unweighted_sum {tbe_perturb.record_key_name} # Required for TBE perturbation.
sum:
- sum {} # Could use either {} or {0}, since only one argument.
- count {} # Required for TBE perturbation.
- unweighted_count {} # Required for TBE perturbation and sparsity check.
- unweighted_sum {tbe_perturb.record_key_name} # Required for TBE perturbation.
mean:
- sum {} # Required for TBE perturbation.
- count {} # Required for TBE perturbation.
- unweighted_count {} # Required for TBE perturbation and sparsity check.
- unweighted_sum {tbe_perturb.record_key_name} # Required for TBE perturbation.
includes:
- finish
You can see that finish
is another standard pipeline provided and used by sql_tbe_pipeline
.
If you define your own custom aggregation pipeline, you should still finish it with finish
.
This removes hidden fields and rows with missing perturbed values, rounds the perturbed results,
adds totals if requested, and formats the results as json, csv or SDMX-JSON.
transform_definitions:
finish:
reference: protari.transform.Compose
parameters:
transform_sequence:
- round
- remove_hidden
- remove_none
- add_totals
- format_rows
remove_hidden:
reference: protari.transform.RemoveHidden
remove_none:
reference: protari.transform.RemoveNone
round:
reference: protari.transform.Round
add_totals:
reference: protari.transform.AddTotals
format_rows:
reference: protari.transform.FormatRows
Key points
There's a lot to the pipeline. In particular, the TBE algorithm needs to know:
- The record key sum and unweighted counts – achieved by the
tbe_replace
transform. - The weighted sum and count, even for means – again achieved by
tbe_replace
. - The top contributing rows to perturb means and sums – achieved by
sql_top_rows
.
The remove_hidden
and remove_none
transforms are especially important, so that no unperturbed data is
returned to the user.
Transforms can refer to the parameters of other transforms via the {transform_name.parameter_name}
syntax,
eg. {tbe_perturb.record_key_name}
.
The names given to each transform by this pipeline are arbitrary, but your settings and dataset configuration files need to use these names to change or add parameters to them.