Guide to Using the API

The best way to explore the capabilities of the API is to use the automatically-generated OpenAPI interface at http://localhost:8080/v1/ui/. (In this section, all links will assume the API has been deployed at the URL http://localhost:8080, which is the default during development.)

Some more specific examples based on the sample datasets provided by protari-sample are described below.

All requests can include an Authorization header to access datasets with authorization requirements, eg. with the database auth interface:

curl -X GET --header 'Accept: application/json' --header 'authorization: Protari key=abc123' 'http://localhost:8080/v1/datasets/'

To use Protari with OpenID Connect, you will need a token from the authentication provider, and provide it in place of TOKEN in the following:

curl -X GET --header 'Accept: application/json' --header 'authorization: Bearer TOKEN' 'http://localhost:8080/v1/datasets/'

The sample datasets do not require any authorization.

/about

The http://localhost:8080/v1/about returns just the "header" property that is included at every endpoint:

{
  "header": {
    "api_version": "1.2.0",
    "base_url": "https://protari.example.com/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T03:15:16.868564Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  }
}

/datasets/

The http://localhost:8080/v1/datasets/ endpoint returns a json-formatted list of the available datasets, with high-level metadata, that are visible to all users.

Eg.

{
  "datasets": [
    {
      "description": "A sample household-level dataset with <a href='http://www.example.com/joined' target='_blank'>joined</a> tables",
      "main_resource_name": "household",
      "name": "joined_household",
      "resources": [
        {
          "description": "Some detail about the transaction level data",
          "name": "transaction",
          "notes": "And some notes about the transaction level data",
          "title": "Transaction",
          "unit": {
            "plural": "transactions",
            "singular": "transaction"
          }
        },
        {
          "description": "Some detail about the person level data",
          "name": "person",
          "title": "Person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        },
        {
          "description": "Some detail about the item level data",
          "name": "item",
          "title": "Item",
          "unit": {
            "plural": "items",
            "singular": "item"
          }
        },
        {
          "description": "Some detail about the [household](http://www.example.com/household) level data",
          "name": "household",
          "title": "Household",
          "unit": {
            "plural": "households",
            "singular": "household"
          }
        }
      ],
      "title": "Sample household dataset with joined data",
      "unit": {
        "plural": "households",
        "singular": "household"
      }
    },
    {
      "description": "A sample person-level dataset with <a href='http://www.example.com/joined' target='_blank'>joined</a> tables",
      "main_resource_name": "person",
      "name": "joined_person",
      "resources": [
        {
          "description": "Some detail about the transaction level data",
          "name": "transaction",
          "notes": "And some notes about the transaction level data",
          "title": "Transaction",
          "unit": {
            "plural": "transactions",
            "singular": "transaction"
          }
        },
        {
          "description": "Some detail about the person level data",
          "name": "person",
          "title": "Person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        },
        {
          "description": "Some detail about the item level data",
          "name": "item",
          "title": "Item",
          "unit": {
            "plural": "items",
            "singular": "item"
          }
        },
        {
          "description": "Some detail about the [household](http://www.example.com/household) level data",
          "name": "household",
          "title": "Household",
          "unit": {
            "plural": "households",
            "singular": "household"
          }
        }
      ],
      "title": "Sample person dataset with joined data",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    },
    {
      "description": "This dataset contains sample longitudinal data.",
      "name": "sample_longitudinal",
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ],
      "title": "Sample longitudinal dataset",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    },
    {
      "description": "This dataset contains sample data, perturbed using Laplace-distributed noise.",
      "name": "sample_laplace",
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ],
      "title": "Sample dataset with Laplace noise",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    },
    {
      "description": "A high-level description is available [here](http://www.example.com/sample)",
      "name": "sample",
      "notes": "For detailed notes on this dataset, see http://www.example.com/ or <a href='http://www.example.com/notes'>here</a>",
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ],
      "title": "Sample dataset",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    }
  ],
  "header": {
    "api_version": "1.2.0",
    "base_url": "http://localhost:8080/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T03:18:11.702742Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  }
}

The name fields above are the names you can use to refer to datasets in subsequent API calls.

/datasets/{dataset_name}

Place a specific dataset name at the end of the http://localhost:8080/v1/datasets/ endpoint to request the full metadata (including the accessible fields and values) for that dataset.

The output for http://localhost:8080/v1/datasets/sample is:

{
  "description": "A high-level description is available [here](http://www.example.com/sample)",
  "header": {
    "api_version": "1.3.0",
    "base_url": "http://localhost:8080/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T03:29:23.236696Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  },
  "main_resource_name": "person",
  "name": "sample",
  "notes": "For detailed notes on this dataset, see http://www.example.com/ or <a href='http://www.example.com/notes'>here</a>",
  "query_classes": {
    "aggregation": {
      "allow_disjunctions_and_custom_groups": true,
      "allowed_functions": [
        {
          "name": "count",
          "parameter_types": [
            {
              "description": "A field whose sentinel values should be excluded from the count.",
              "is_required": false
            }
          ]
        },
        {
          "name": "mean",
          "parameter_types": [
            {
              "description": "The field whose non-sentinel values are to be averaged.",
              "is_required": true,
              "requires_numeric_data": true
            }
          ]
        },
        {
          "name": "sum",
          "parameter_types": [
            {
              "description": "The field whose non-sentinel values are to be summed.",
              "is_required": true,
              "requires_numeric_data": true
            }
          ]
        }
      ],
      "field_exclusion_rules": [
        {
          "limit": 4
        },
        {
          "fields": [
            {
              "name": "AMOUNT"
            },
            {
              "name": "COUNTRY"
            },
            {
              "name": "POSTCODE"
            }
          ],
          "limit": 2
        }
      ],
      "max_functions": 2,
      "maximum_output_values_length": 2000
    }
  },
  "relationships": [],
  "resources": [
    {
      "fields": [
        {
          "name": "SEX",
          "title": "Sex",
          "type": "string",
          "values": [
            {
              "name": "F",
              "title": "Female"
            },
            {
              "name": "M",
              "title": "Male"
            },
            {
              "is_sentinel": true,
              "name": "",
              "title": "Not stated"
            }
          ]
        },
        {
          "name": "COUNTRY",
          "title": "Country of Birth",
          "type": "integer",
          "values": [
            {
              "name": "1000",
              "title": "Oceania",
              "values": [
                {
                  "name": "1100",
                  "title": "Australia (including External Territories)",
                  "values": [
                    {
                      "name": "1101",
                      "title": "Australia"
                    },
                    {
                      "name": "1102",
                      "title": "Norfolk Island"
                    },
                    {
                      "name": "1199",
                      "title": "Australian External Territories"
                    }
                  ]
                },
                {
                  "name": "1201",
                  "title": "New Zealand"
                },
                {
                  "name": "1300",
                  "title": "Melanesia",
                  "values": [
                    {
                      "name": "1301",
                      "title": "New Caledonia"
                    },
                    {
                      "name": "1302",
                      "title": "Papua New Guinea"
                    },
                    {
                      "name": "1303",
                      "title": "Solomon Islands"
                    },
                    {
                      "name": "1304",
                      "title": "Vanuatu"
                    }
                  ]
                }
              ]
            },
            {
              "name": "9999",
              "title": "Other"
            },
            {
              "is_sentinel": true,
              "name": "null",
              "title": "Not stated"
            }
          ]
        },
        {
          "fields": [
            {
              "description": "Type of dwelling of usual residence",
              "name": "DWELL",
              "title": "Dwelling Type",
              "type": "string",
              "values": [
                {
                  "name": "001",
                  "title": "House"
                },
                {
                  "name": "002",
                  "title": "Apartment"
                },
                {
                  "name": "003",
                  "title": "Other"
                }
              ]
            },
            {
              "can_be_above_maximum": true,
              "can_be_below_minimum": false,
              "exclusive_maximum": 8,
              "has_numeric_data": true,
              "has_range_data": true,
              "maximum": 7,
              "minimum": 0,
              "name": "ROOMS",
              "title": "Number of Bedrooms",
              "type": "integer",
              "values": [
                {
                  "is_sentinel": true,
                  "name": "null",
                  "title": "Not stated"
                },
                {
                  "is_sentinel": true,
                  "name": "999",
                  "title": "Not applicable"
                },
                {
                  "name": "0",
                  "title": "None (including bedsits)",
                  "value": 0
                },
                {
                  "name": "1",
                  "title": "1",
                  "value": 1
                },
                {
                  "name": "2",
                  "title": "2",
                  "value": 2
                },
                {
                  "name": "3",
                  "title": "3",
                  "value": 3
                },
                {
                  "name": "4",
                  "title": "4",
                  "value": 4
                },
                {
                  "name": "5",
                  "title": "5",
                  "value": 5
                },
                {
                  "name": "6",
                  "title": "6",
                  "value": 6
                },
                {
                  "name": "7",
                  "title": "7",
                  "value": 7
                },
                {
                  "exclusions": [
                    999
                  ],
                  "minimum": 8,
                  "name": ">=8",
                  "title": ">=8"
                }
              ]
            }
          ],
          "title": "Dwelling"
        },
        {
          "description": "Postcode of usual residence",
          "name": "POSTCODE",
          "region_type": "poa",
          "title": "Postcode",
          "type": "string",
          "values": [
            {
              "name": "6722"
            },
            {
              "name": "6725"
            },
            {
              "name": "0872"
            },
            {
              "name": "6762"
            },
            {
              "name": "6642"
            },
            {
              "name": "6710"
            },
            {
              "name": "6429"
            }
          ]
        },
        {
          "can_be_above_maximum": true,
          "can_be_below_minimum": true,
          "exclusive_maximum": 100,
          "has_numeric_data": true,
          "has_range_data": true,
          "maximum": 99.99,
          "minimum": 0,
          "name": "AMOUNT",
          "precision": 0.01,
          "title": "Dollar Amount",
          "type": "number",
          "values": [
            {
              "exclusive_maximum": 0,
              "maximum": -0.01,
              "name": "<0",
              "title": "<0"
            },
            {
              "exclusive_maximum": 10,
              "maximum": 9.99,
              "minimum": 0,
              "name": "0-9.99",
              "title": "0-9.99"
            },
            {
              "exclusive_maximum": 20,
              "maximum": 19.99,
              "minimum": 10,
              "name": "10-19.99",
              "title": "10-19.99"
            },
            {
              "exclusive_maximum": 30,
              "maximum": 29.99,
              "minimum": 20,
              "name": "20-29.99",
              "title": "20-29.99"
            },
            {
              "exclusive_maximum": 40,
              "maximum": 39.99,
              "minimum": 30,
              "name": "30-39.99",
              "title": "30-39.99"
            },
            {
              "exclusive_maximum": 50,
              "maximum": 49.99,
              "minimum": 40,
              "name": "40-49.99",
              "title": "40-49.99"
            },
            {
              "exclusive_maximum": 60,
              "maximum": 59.99,
              "minimum": 50,
              "name": "50-59.99",
              "title": "50-59.99"
            },
            {
              "exclusive_maximum": 70,
              "maximum": 69.99,
              "minimum": 60,
              "name": "60-69.99",
              "title": "60-69.99"
            },
            {
              "exclusive_maximum": 80,
              "maximum": 79.99,
              "minimum": 70,
              "name": "70-79.99",
              "title": "70-79.99"
            },
            {
              "exclusive_maximum": 90,
              "maximum": 89.99,
              "minimum": 80,
              "name": "80-89.99",
              "title": "80-89.99"
            },
            {
              "exclusive_maximum": 100,
              "maximum": 99.99,
              "minimum": 90,
              "name": "90-99.99",
              "title": "90-99.99"
            },
            {
              "minimum": 100,
              "name": ">=100",
              "title": ">=100"
            }
          ]
        },
        {
          "can_be_above_maximum": true,
          "can_be_below_minimum": false,
          "exclusive_maximum": "2021-01-01",
          "has_range_data": true,
          "maximum": "2020-12-31",
          "minimum": "1970-01-01",
          "name": "EXPIRY",
          "precision": "P1D",
          "title": "Expiry date",
          "type": "date",
          "values": [
            {
              "is_sentinel": true,
              "name": "null",
              "title": "Unknown"
            },
            {
              "exclusive_maximum": "1975-01-01",
              "maximum": "1974-12-31",
              "minimum": "1970-01-01",
              "name": "1970-01-01/1974-12-31",
              "title": "1970-01-01 to 1974-12-31"
            },
            {
              "exclusive_maximum": "1980-01-01",
              "maximum": "1979-12-31",
              "minimum": "1975-01-01",
              "name": "1975-01-01/1979-12-31",
              "title": "1975-01-01 to 1979-12-31"
            },
            {
              "exclusive_maximum": "1985-01-01",
              "maximum": "1984-12-31",
              "minimum": "1980-01-01",
              "name": "1980-01-01/1984-12-31",
              "title": "1980-01-01 to 1984-12-31"
            },
            {
              "exclusive_maximum": "1990-01-01",
              "maximum": "1989-12-31",
              "minimum": "1985-01-01",
              "name": "1985-01-01/1989-12-31",
              "title": "1985-01-01 to 1989-12-31"
            },
            {
              "exclusive_maximum": "1995-01-01",
              "maximum": "1994-12-31",
              "minimum": "1990-01-01",
              "name": "1990-01-01/1994-12-31",
              "title": "1990-01-01 to 1994-12-31"
            },
            {
              "exclusive_maximum": "2000-01-01",
              "maximum": "1999-12-31",
              "minimum": "1995-01-01",
              "name": "1995-01-01/1999-12-31",
              "title": "1995-01-01 to 1999-12-31"
            },
            {
              "exclusive_maximum": "2005-01-01",
              "maximum": "2004-12-31",
              "minimum": "2000-01-01",
              "name": "2000-01-01/2004-12-31",
              "title": "2000-01-01 to 2004-12-31"
            },
            {
              "exclusive_maximum": "2010-01-01",
              "maximum": "2009-12-31",
              "minimum": "2005-01-01",
              "name": "2005-01-01/2009-12-31",
              "title": "2005-01-01 to 2009-12-31"
            },
            {
              "exclusive_maximum": "2015-01-01",
              "maximum": "2014-12-31",
              "minimum": "2010-01-01",
              "name": "2010-01-01/2014-12-31",
              "title": "2010-01-01 to 2014-12-31"
            },
            {
              "exclusive_maximum": "2020-01-01",
              "maximum": "2019-12-31",
              "minimum": "2015-01-01",
              "name": "2015-01-01/2019-12-31",
              "title": "2015-01-01 to 2019-12-31"
            },
            {
              "exclusive_maximum": "2021-01-01",
              "maximum": "2020-12-31",
              "minimum": "2020-01-01",
              "name": "2020-01-01/2020-12-31",
              "title": "2020-01-01 to 2020-12-31"
            },
            {
              "minimum": "2021-01-01",
              "name": ">=2021-01-01",
              "title": ">=2021-01-01"
            }
          ]
        }
      ],
      "name": "person",
      "title": "person",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    }
  ],
  "title": "Sample dataset",
  "unit": {
    "plural": "people",
    "singular": "person"
  }
}

The field names and value names are the names you can use to refer to fields and values in subsequent API calls.

You can flatten the fields hierarchy, the value hierarchy, or both, by passing flatten=fields, values or all respectively.

To suppress all values from the output, pass show_values=false. (In this case flatten=values has no effect.)

Relationships for Multi-Table Datasets

When a dataset has multiple tables, the relationships field will be populated with an array of relationship objects specifying how resources may be joined. The structure of this object is:

You can use a .-separated list of relationship names to connect the main resource to another resource. Eg. the joined_person dataset in the protari-sample repository has main_resource person, with the following relationships defined:

{
  ...
  "relationships": [
    {
      "host_resource_name": "person",
      "name": "buyer_transactions",
      "target_resource_name": "transaction",
      "title": "Buyer Transactions"
    },
    {
      "host_resource_name": "person",
      "name": "seller_transactions",
      "target_resource_name": "transaction",
      "title": "Seller Transactions"
    },
    {
      "host_resource_name": "person",
      "name": "household",
      "target_resource_name": "household"
    },
    {
      "host_resource_name": "transaction",
      "name": "item",
      "target_resource_name": "item"
    }
  ],
  ....
}

As a result:

Note the joined_household dataset presents the same data at the household level instead of the person level.

/datasets/{dataset_name}/fields/{field_name}

Use this endpoint to get metadata (including the allowed values) about a specific field. You can optionally pass flatten=values as for the previous endpoint.

For datasets with more than one data resource, you can retrieve metadata about a field that is not on the main resource by specifying the fields name as {resource name}.{field name}. For example a field with the name "type" on a data resource with the name "transaction" would be specified as type.transaction. The names of fields on the main resource should not include the resource name even when a dataset has multiple data resources.

Eg. The output for http://localhost:8080/v1/datasets/sample/fields/SEX is:

{
  "field": {
    "name": "SEX",
    "title": "Sex",
    "type": "string",
    "values": [
      {
        "name": "F",
        "title": "Female"
      },
      {
        "name": "M",
        "title": "Male"
      },
      {
        "is_sentinel": true,
        "name": "",
        "title": "Not stated"
      }
    ]
  },
  "header": {
    "api_version": "1.2.0",
    "base_url": "http://localhost:8080/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T03:32:18.331244Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  }
}

/datasets/{dataset_name}/aggregation

The http://localhost:8080/v1/datasets/{dataset_name}/aggregation URL is the workhorse of the API, letting you ask for aggregated data from the dataset.

With no further parameters, at this URL the API returns a perturbed count of all fields in the dataset in json format, broken down by year, eg. http://localhost:8080/v1/datasets/sample_longitudinal/aggregation:

{
  "fields": [
    {
      "as_string": "YEAR",
      "is_longitudinal": true,
      "name": "YEAR",
      "title": "Year"
    },
    {
      "as_string": "perturbed count",
      "function": {
        "name": "count"
      },
      "is_perturbed": true,
      "is_result": true,
      "name": "perturbed count",
      "title": "perturbed count"
    }
  ],
  "header": {
    "api_version": "1.2.0",
    "base_url": "http://localhost:8080/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T03:32:46.010943Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  },
  "query": {
    "as_string": "function=count",
    "dataset": {
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ]
    },
    "function": [
      {
        "as_string": "count",
        "type": {
          "name": "count",
          "parameter_types": [
            {
              "description": "A field whose sentinel values should be excluded from the count.",
              "is_required": false
            }
          ]
        }
      }
    ]
  },
  "values": [
    [
      "2000",
      204
    ],
    [
      "2005",
      177
    ]
  ]
}

You can request this in csv format by appending /csv, with the result:

YEAR,perturbed count
2000,204
2005,177

The SDMX-JSON format is also available by appending /sdmx-json.

To make more specific queries, you simply add query parameters to the URL, starting with ? and separating them with &, eg.

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?group_by=DWELL&where=SEX=M

The results are confidentialised according to the dataset's configuration.

group by

Use this to show the results broken down according to the value in a particular field (or fields). Supply as a comma-separated list of field names, eg.

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation/csv?group_by=DWELL,SEX

returns:

DWELL,SEX,YEAR,perturbed count
001,F,2000,18
001,F,2005,24
001,M,2000,18
001,M,2005,21
001,,2000,15
001,,2005,16
002,F,2000,20
002,F,2005,20
002,M,2000,23
002,M,2005,23
002,,2000,22
002,,2005,16
003,F,2000,27
003,F,2005,17
003,M,2000,25
003,M,2005,11
003,,2000,16
003,,2005,23

The first line tells you there were approximately 18 females dwelling in houses (DWELL "001") in 2000, etc.

Notice that the API has automatically grouped by the longitudinal field, YEAR.

You can also group by numeric fields, eg.

http://localhost:8080/v1/datasets/sample/aggregation/csv?group_by=AMOUNT

returns:

AMOUNT,perturbed count
<0,0
0-9.99,31
10-19.99,14
20-29.99,29
30-39.99,17
40-49.99,12
50-59.99,24
60-69.99,20
70-79.99,11
80-89.99,23
90-99.99,19
>=100,0

and

http://localhost:8080/v1/datasets/sample/aggregation/csv?group_by=ROOMS

returns:

ROOMS,perturbed count
null,5
999,12
0,7
1,15
2,15
3,25
4,46
5,30
6,9
7,5
>=8,30

Custom groups

Custom groups are only available if the allow_disjunctions_and_custom_groups flag has been set to true (it is false by default). When they are available, you can specify custom groups for group-by fields, by using colons to separate groups and semi-colons to separate values within the groups. Eg:

http://localhost:8080/v1/datasets/sample/aggregation?group_by=POSTCODE:6722;6725:6429;6710

This returns three perturbed counts: those in postcodes 6722 or 6725; those in postcodes 6429 or 6710; and those who live in any other postcode:

{
  ...
  "values": [
    [
      "6722;6725",
      70
    ],
    [
      "6429;6710",
      57
    ],
    [
      "other",
      75
    ]
  ]
}

For integer categorical fields, you can replace sequences such as 0;1;2;3 with 0-3. The COUNTRY field on the sample dataset is a good example:

http://localhost:8080/v1/datasets/sample/aggregation?group_by=COUNTRY:1101-1201

From the output, you will see that Protari has treated this as the same as 1101;1102;1199;1201:

{
  "fields": [
    {
      "as_string": "COUNTRY:1101-1201",
      "custom_groups": {
        "else": {
          "name": "other"
        },
        "groups": [
          {
            "name": "1101-1201",
            "values": [
              {
                "as_string": "1101"
              },
              {
                "as_string": "1102"
              },
              {
                "as_string": "1199"
              },
              {
                "as_string": "1201"
              }
            ]
          }
        ]
      },
      "name": "COUNTRY",
      "title": "Country of Birth"
    },
    {
      "as_string": "perturbed count",
      "function": {
        "name": "count"
      },
      "is_perturbed": true,
      "is_result": true,
      "name": "perturbed count",
      "title": "perturbed count"
    }
  ],
  "values": [
    [
      "1101-1201",
      127
    ],
    [
      "other",
      69
    ]
  ],
  ...
}

You can even refer to hierarchical values in custom groups, eg. group_by=COUNTRY:1100:1201;1300.

On range data fields, you can use the full range syntax to define the custom groups, eg:

http://localhost:8080/v1/datasets/sample/aggregation?group_by=ROOMS:0:1;2:3-6:>=7

returns:

{
  "fields": [
    {
      "as_string": "ROOMS:0:1;2:3-6:>=7",
      "custom_groups": {
        "else": {
          "name": "other"
        },
        "groups": [
          {
            "name": "0",
            "values": [
              {
                "as_string": "0",
                "value": 0
              }
            ]
          },
          {
            "name": "1;2",
            "values": [
              {
                "as_string": "1",
                "value": 1
              },
              {
                "as_string": "2",
                "value": 2
              }
            ]
          },
          {
            "name": "3-6",
            "values": [
              {
                "as_string": "3-6",
                "exclusive_maximum": 7,
                "maximum": 6,
                "minimum": 3
              }
            ]
          },
          {
            "name": ">=7",
            "values": [
              {
                "as_string": ">=7",
                "exclusions": [
                  999
                ],
                "minimum": 7
              }
            ]
          }
        ]
      },
      "name": "ROOMS",
      "title": "Number of Bedrooms"
    },
...
  "values": [
    [
      "0",
      7
    ],
    [
      "1;2",
      30
    ],
    [
      "3-6",
      108
    ],
    [
      ">=7",
      36
    ],
    [
      "other",
      10
    ]
  ]
}

Custom groups cannot overlap (eg. group_by=ROOMS:3-6:>=5 will raise an error).

The API allows you to specify up to 100 custom groups, each of which can be at most 1000 characters long, eg. RENT:a;b;c:d;e has 2 custom groups, of character lengths 5 and 3 respectively.

Using group by with multi-table datasets

With multi-table datasets, you can group by fields on other resources. The results of such query apply to those units on the main resource (eg. people) who had at least one entry in the joined resource (eg. people with at least one transaction).

For example,

http://localhost:8080/v1/datasets/joined_person/aggregation/csv?group_by=buyer_transactions.type

returns:

type,perturbed count
A,48
B,51
C,47
D,57
E,54
F,64
G,46

This is the (perturbed) count of the number of people who had at least one transaction where they were a buyer, by transaction type.

where

You can restrict the query to records meeting a condition with the "where" parameter. For string fields, only equality is supported (you can include more than one value by separating them with semi-colons). For range data fields (ie. numeric data fields and dates), equality and ranges are both supported. You can include multiple conditions.

Eg. to count the total number of males, and return a csv file:

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation/csv?where=SEX=M

which returns:

YEAR,perturbed count
2000,71
2005,60

You can request records with a null or empty value as follows:

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?where=SEX=

Some examples of requesting ranges are:

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?where=ROOMS>=8
http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?where=AMOUNT=10-19.99
http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?where=AMOUNT>=100
http://localhost:8080/v1/datasets/sample/aggregation?where=EXPIRY=2010-01-01/2014-12-31

Ranges are inclusive of both ends.

Note that date fields use / instead of - to separate the start and end of the range, since - is already used in the date format.

If disjunctions are allowed (ie. disjunctions_and_custom_groups is true), you can combine different values on the same field into the one query. Eg:

http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=1;3-5;>=7
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT=10-39.99
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT>=40
http://localhost:8080/v1/datasets/sample/aggregation?where=EXPIRY=2010-01-01/2019-12-31
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS>=4
http://localhost:8080/v1/datasets/sample/aggregation?where=SEX=M;F
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=1-5,DWELL=001

The where clause in the first example above is returned with the result as follows:

    "where": [
      {
        "as_string": "ROOMS=1;3-5;>=7",
        "condition_values": {
          "equals": [
            {
              "name": "1",
              "value": 1
            },
            {
              "as_string": "3-5",
              "component_ranges": [
                {
                  "name": "3",
                  "value": 3
                },
                {
                  "name": "4",
                  "value": 4
                },
                {
                  "name": "5",
                  "value": 5
                }
              ],
              "exclusive_maximum": 6,
              "maximum": 5,
              "minimum": 3
            },
            {
              "as_string": ">=7",
              "component_ranges": [
                {
                  "name": "7",
                  "value": 7
                },
                {
                  "exclusions": [
                    999
                  ],
                  "minimum": 8,
                  "name": ">=8"
                }
              ],
              "exclusions": [
                999
              ],
              "minimum": 7
            }
          ]
        },
        "name": "ROOMS",
        "title": "Number of Bedrooms",
        "values_as_received": {
          "equals": [
            {
              "as_string": "1"
            },
            {
              "as_string": "3-5"
            },
            {
              "as_string": ">=7"
            }
          ]
        }
      }
    ]

If disjunctions are not allowed (ie. disjunctions_and_custom_groups is false), you can still request different values on the same field as long as you also group by the same field, eg.

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?group_by=ROOMS&where=ROOMS=3-5,YEAR=2000

This returns the values:

  "values": [
    [
      "3",
      "2000",
      28
    ],
    [
      "4",
      "2000",
      44
    ],
    [
      "5",
      "2000",
      27
    ]
  ]

In this case, totals=true will not total across this field.

If you use a range that is not aligned with the allowed intervals, eg.

http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT=0-13

the result is an error with a suggestion for an allowed query:

{
  "detail": "Upper bound 13 not allowed - try 9.99 or 19.99 instead.",
  "status": 400,
  "title": "",
  "type": "about:blank"
}

For integer categorical fields, you can also replace sequences such as 0;1;2;3 with 0-3 (as noted earlier for custom group). The COUNTRY field on the sample dataset is a good example:

http://localhost:8080/v1/datasets/sample/aggregation?where=COUNTRY=1101-1201

This returns (note the 1101-1201 has been expanded to the equivalent 1101;1102;1199;1201):

{
  ...
  "query": {
    ...
    "where": [
      {
        "as_string": "COUNTRY=1101-1201",
        "condition_values": {
          "equals": [
            {
              "as_string": "1101"
            },
            {
              "as_string": "1102"
            },
            {
              "as_string": "1199"
            },
            {
              "as_string": "1201"
            }
          ]
        },
        "name": "COUNTRY",
        "title": "Country of Birth",
        "values_as_received": {
          "equals": [
            {
              "as_string": "1101-1201"
            }
          ]
        }
      }
    ]
  },
  "values": [
    [
      127
    ]
  ]
}

The API allows you to specify a maximum of 60 alternative values on each condition field, eg. RENT=a;b;c;d;e,SEX=M has 5 alternative values for RENT, and 1 for SEX.

Using where with hierarchical values

When you use a hierarchical value in a where clause, the API will expand it into its child values. To see this in action, try this query:

http://localhost:8080/v1/datasets/sample/aggregation?where=COUNTRY=1100

You will receive a result like:

{
  "fields": [
    {
      "as_string": "perturbed count",
      "function": {
        "name": "count"
      },
      "is_perturbed": true,
      "is_result": true,
      "name": "perturbed count",
      "title": "perturbed count"
    }
  ],
  "header": {
    "api_version": "1.2.0",
    "base_url": "http://localhost:8080/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T04:12:42.083285Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  },
  "query": {
    "as_string": "function=count&where=COUNTRY=1100",
    "dataset": {
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ]
    },
    "function": [
      {
        "as_string": "count",
        "type": {
          "name": "count",
          "parameter_types": [
            {
              "description": "A field whose sentinel values should be excluded from the count.",
              "is_required": false
            }
          ]
        }
      }
    ],
    "where": [
      {
        "as_string": "COUNTRY=1100",
        "condition_values": {
          "equals": [
            {
              "as_string": "1101"
            },
            {
              "as_string": "1102"
            },
            {
              "as_string": "1199"
            }
          ]
        },
        "name": "COUNTRY",
        "title": "Country of Birth",
        "values_as_received": {
          "equals": [
            {
              "as_string": "1100"
            }
          ]
        }
      }
    ]
  },
  "values": [
    [
      105
    ]
  ]
}

In the returned where clause, you can see that the API has converted the field value 1100 into its child values 1101, 1102 and 1199. That's because 1100 is configured as a "parent" value. The received query values (here, only 1100) are also returned in values_as_received.

Using where with multi-table datasets

This is very similar to using group by with multi-table datasets, eg.

http://localhost:8080/v1/datasets/joined_person/aggregation/csv?where=buyer_transactions.type=A

returns:

perturbed count
48

This is the count of all people who had at least one transaction of type A where they were a buyer.

function

So far, all the requests have returned counts. You can also request means and sums by using the function parameter, eg.

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation/csv?function=mean ROOMS

which returns

YEAR,perturbed mean of ROOMS
2000,4.37
2005,4.36

In this result, sentinel and "null" values of ROOMS have been excluded. (Eg. 999 is a "sentinel" value because it corresponds to an unknown number of rooms. Including it in the mean would skew the results.)

You can see the list of excluded values (sentinels and null values) in the json output:

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?function=mean ROOMS
  "query": {
    "as_string": "function=mean ROOMS",
    "dataset": {
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ]
    },
    "function": [
      {
        "as_string": "mean ROOMS",
        "parameters": [
          {
            "as_string": "ROOMS",
            "condition_values": {
              "not_equals": [
                {
                  "as_string": "null"
                },
                {
                  "as_string": "999"
                }
              ]
            },
            "name": "ROOMS",
            "title": "Number of Bedrooms"
          }
        ],
        "type": {
          "name": "mean",
          "parameter_types": [
            {
              "description": "The field whose non-sentinel values are to be averaged.",
              "is_required": true,
              "requires_numeric_data": true
            }
          ]
        }
      }
    ]
    ...
  }

The result has also been rounded to an appropriate number of decimal places (in this case because ROOMS is an integer, but more generally based on the field's decimal_places).

You can also exclude a field's sentinel and null values from counts, by including the field as an argument to count, eg.:

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?function=count ROOMS
http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?function=count SEX

giving counts of 177 in 2000 and 151 in 2005 for people with valid ROOMS values, and counts of 137 in 2000 and 128 in 2005 for people with a stated SEX. Recall the total count is 199 in each year.

Requesting Multiple Functions

Optionally, the data custodian may allow users to request more than one function at a time (by default you cannot). Be aware that in this case, sentinel and null values for all function fields are excluded for all the results.

So, for example:

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation/csv?function=mean ROOMS,count SEX

returns

YEAR,perturbed count of SEX,perturbed mean of ROOMS
2000,123,4.32
2005,114,4.38

ie. 123 for the count in 2000 and 114 in 2005 - these are the records that have both a valid ROOMS value and a stated SEX.

Because this is potentially misleading, max_functions defaults to 1 if it is not included in the global settings.

Using function with multi-table datasets

You can only request functions of fields from the main resource.

Eg.

http://localhost:8080/v1/datasets/joined_person/aggregation?function=mean household.RENT

returns

{
    "detail": "Cannot aggregate on quantities from household.",
    "name": "InvalidQueryError",
    "status": 400,
    "title": "Invalid query"
}

totals

Add totals=true to retrieve all the cross-totals in the one request. This is only available with the standard json format. Eg.

http://localhost:8080/v1/datasets/sample_longitudinal/aggregation?group_by=DWELL,SEX&totals=true

also returns the results of grouping by DWELL on its own, SEX on its own, and the result with no group by parameter. If you think of the result as a two-dimensional table with DWELL values down and SEX values across, with each entry in the table being a count, then the first totals are the column totals, the second are row totals and the final is the grand total in the bottom right corner.