Guide to Using the API

The best way to explore the capabilities of the API is to use the automatically-generated OpenAPI interface at /v1/ui/ (eg. https://protari.example.com/v1/u1/, if the API was hosted at https://protari.example.com/).

Some specific examples of calls to the API are described below. Note you will need to replace https://protari.example.com/ with the URL you have been given, and change references to datasets and fields to those appropriate to the data you have access to.

Furthermore, data from Protari is usually only accessible to authorized users. Unfortunately, users cannot easily provide the necessary authorization information by simply pointing their web browser to an endpoint, so it may be necessary to use a tool like curl to supply this

/about

The https://protari.example.com/v1/about endpoint returns just the "header" property (which is included at every endpoint):

{
  "header": {
    "api_version": "1.4.0",
    "base_url": "https://protari.example.com/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T03:15:16.868564Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  }
}

/datasets/

The https://protari.example.com/v1/datasets/ endpoint returns a JSON-formatted list of the available datasets, with high-level metadata, that are visible to all users.

Eg.

{
  "datasets": [
    {
      "description": "A sample household-level dataset with <a href='http://www.example.com/joined' target='_blank'>joined</a> tables",
      "main_resource_name": "household",
      "name": "joined_household",
      "resources": [
        {
          "description": "Some detail about the transaction level data",
          "name": "transaction",
          "notes": "And some notes about the transaction level data",
          "title": "Transaction",
          "unit": {
            "plural": "transactions",
            "singular": "transaction"
          }
        },
        {
          "description": "Some detail about the person level data",
          "name": "person",
          "title": "Person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        },
        {
          "description": "Some detail about the item level data",
          "name": "item",
          "title": "Item",
          "unit": {
            "plural": "items",
            "singular": "item"
          }
        },
        {
          "description": "Some detail about the [household](http://www.example.com/household) level data",
          "name": "household",
          "title": "Household",
          "unit": {
            "plural": "households",
            "singular": "household"
          }
        }
      ],
      "title": "Sample household dataset with joined data",
      "unit": {
        "plural": "households",
        "singular": "household"
      }
    },
    {
      "description": "A sample person-level dataset with <a href='http://www.example.com/joined' target='_blank'>joined</a> tables",
      "main_resource_name": "person",
      "name": "joined_person",
      "resources": [
        {
          "description": "Some detail about the transaction level data",
          "name": "transaction",
          "notes": "And some notes about the transaction level data",
          "title": "Transaction",
          "unit": {
            "plural": "transactions",
            "singular": "transaction"
          }
        },
        {
          "description": "Some detail about the person level data",
          "name": "person",
          "title": "Person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        },
        {
          "description": "Some detail about the item level data",
          "name": "item",
          "title": "Item",
          "unit": {
            "plural": "items",
            "singular": "item"
          }
        },
        {
          "description": "Some detail about the [household](http://www.example.com/household) level data",
          "name": "household",
          "title": "Household",
          "unit": {
            "plural": "households",
            "singular": "household"
          }
        }
      ],
      "title": "Sample person dataset with joined data",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    },
    {
      "description": "This dataset contains sample longitudinal data.",
      "name": "sample_longitudinal",
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ],
      "title": "Sample longitudinal dataset",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    },
    {
      "description": "This dataset contains sample data, perturbed using Laplace-distributed noise.",
      "name": "sample_laplace",
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ],
      "title": "Sample dataset with Laplace noise",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    },
    {
      "description": "A high-level description is available [here](http://www.example.com/sample)",
      "name": "sample",
      "notes": "For detailed notes on this dataset, see http://www.example.com/ or <a href='http://www.example.com/notes'>here</a>",
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ],
      "title": "Sample dataset",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    }
  ],
  "header": {
    "api_version": "1.4.0",
    "base_url": "https://protari.example.com/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T03:18:11.702742Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  }
}

The name fields above are the names you can use to refer to datasets in subsequent API calls.

/datasets/{dataset_name}

Place a specific dataset name at the end of the https://protari.example.com/v1/datasets/ endpoint to request the full metadata (including the accessible fields and values) for that dataset.

The output for https://protari.example.com/v1/datasets/sample is:

{
  "description": "A high-level description is available [here](http://www.example.com/sample)",
  "header": {
    "api_version": "1.7.0",
    "base_url": "https://protari.example.com/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2020-02-04T02:24:22.774965Z",
    "terms": "All data is for sample purposes only."
  },
  "main_resource_name": "person",
  "name": "sample",
  "notes": "For detailed notes on this dataset, see http://www.example.com/ or <a href='http://www.example.com/notes'>here</a>",
  "query_classes": {
    "aggregation": {
      "allow_disjunctions_and_custom_groups": true,
      "allowed_functions": [
        {
          "name": "count",
          "parameter_types": [
            {
              "is_required": false,
              "title": "exclude any sentinel (eg. missing) values of"
            }
          ],
          "title": "Count"
        },
        {
          "name": "mean",
          "parameter_types": [
            {
              "is_required": true,
              "notes": "excludes any sentinel values",
              "requires_numeric_data": true,
              "title": "of"
            }
          ],
          "title": "Average"
        },
        {
          "name": "sum",
          "parameter_types": [
            {
              "is_required": true,
              "notes": "excludes any sentinel values",
              "requires_numeric_data": true,
              "title": "of"
            }
          ],
          "title": "Sum"
        }
      ],
      "field_exclusion_rules": [
        {
          "limit": 4
        },
        {
          "fields": [
            {
              "name": "COUNTRY"
            },
            {
              "name": "AMOUNT"
            },
            {
              "name": "POSTCODE"
            }
          ],
          "limit": 2
        }
      ],
      "max_functions": 2,
      "maximum_output_values_length": 2000
    }
  },
  "relationships": [],
  "resources": [
    {
      "fields": [
        {
          "name": "SEX",
          "title": "Sex",
          "type": "string",
          "values": [
            {
              "name": "F",
              "title": "Female"
            },
            {
              "name": "M",
              "title": "Male"
            },
            {
              "is_sentinel": true,
              "name": "",
              "title": "Not stated"
            }
          ]
        },
        {
          "name": "COUNTRY",
          "title": "Country of Birth",
          "type": "integer",
          "values": [
            {
              "name": "1000",
              "title": "Oceania",
              "values": [
                {
                  "name": "1100",
                  "title": "Australia (including External Territories)",
                  "values": [
                    {
                      "description": "Mainland Australia (ie. excluding Norfolk Island and other external territories)",
                      "name": "1101",
                      "title": "Australia"
                    },
                    {
                      "name": "1102",
                      "title": "Norfolk Island"
                    },
                    {
                      "name": "1199",
                      "title": "Australian External Territories"
                    }
                  ]
                },
                {
                  "name": "1201",
                  "title": "New Zealand"
                },
                {
                  "name": "1300",
                  "title": "Melanesia",
                  "values": [
                    {
                      "name": "1301",
                      "title": "New Caledonia"
                    },
                    {
                      "name": "1302",
                      "title": "Papua New Guinea"
                    },
                    {
                      "name": "1303",
                      "title": "Solomon Islands"
                    },
                    {
                      "name": "1304",
                      "title": "Vanuatu"
                    }
                  ]
                }
              ]
            },
            {
              "name": "9999",
              "title": "Other"
            },
            {
              "is_sentinel": true,
              "name": "null",
              "title": "Not stated"
            }
          ]
        },
        {
          "fields": [
            {
              "description": "Type of dwelling of usual residence",
              "name": "DWELL",
              "title": "Dwelling Type",
              "type": "string",
              "values": [
                {
                  "description": "A separate or semi-detached house",
                  "name": "001",
                  "title": "House"
                },
                {
                  "name": "002",
                  "title": "Apartment"
                },
                {
                  "name": "003",
                  "title": "Other"
                },
                {
                  "is_sentinel": true,
                  "name": "not known",
                  "title": "Not known"
                }
              ]
            },
            {
              "can_be_above_maximum": true,
              "can_be_below_minimum": false,
              "exclusive_maximum": 8,
              "has_numeric_data": true,
              "is_ordered": true,
              "maximum": 7,
              "minimum": 0,
              "name": "ROOMS",
              "title": "Number of Bedrooms",
              "type": "integer",
              "values": [
                {
                  "exclusive_maximum": 1,
                  "maximum": 0,
                  "minimum": 0,
                  "name": "0",
                  "title": "None (including bedsits)"
                },
                {
                  "is_sentinel": true,
                  "name": "null",
                  "title": "Not stated"
                },
                {
                  "is_sentinel": true,
                  "name": "999",
                  "title": "Not applicable"
                },
                {
                  "exclusive_maximum": 2,
                  "maximum": 1,
                  "minimum": 1,
                  "name": "1",
                  "title": "1"
                },
                {
                  "exclusive_maximum": 3,
                  "maximum": 2,
                  "minimum": 2,
                  "name": "2",
                  "title": "2"
                },
                {
                  "exclusive_maximum": 4,
                  "maximum": 3,
                  "minimum": 3,
                  "name": "3",
                  "title": "3"
                },
                {
                  "exclusive_maximum": 5,
                  "maximum": 4,
                  "minimum": 4,
                  "name": "4",
                  "title": "4"
                },
                {
                  "exclusive_maximum": 6,
                  "maximum": 5,
                  "minimum": 5,
                  "name": "5",
                  "title": "5"
                },
                {
                  "exclusive_maximum": 7,
                  "maximum": 6,
                  "minimum": 6,
                  "name": "6",
                  "title": "6"
                },
                {
                  "exclusive_maximum": 8,
                  "maximum": 7,
                  "minimum": 7,
                  "name": "7",
                  "title": "7"
                },
                {
                  "minimum": 8,
                  "name": ">=8",
                  "title": ">=8"
                }
              ]
            }
          ],
          "title": "Dwelling"
        },
        {
          "description": "Postcode of usual residence",
          "name": "POSTCODE",
          "region_type": "poa",
          "title": "Postcode",
          "type": "string",
          "values": [
            {
              "name": "6722"
            },
            {
              "name": "6725"
            },
            {
              "name": "0872"
            },
            {
              "name": "6762"
            },
            {
              "name": "6642"
            },
            {
              "name": "6710"
            },
            {
              "name": "6429"
            }
          ]
        },
        {
          "can_be_above_maximum": true,
          "can_be_below_minimum": true,
          "exclusive_maximum": 100,
          "has_numeric_data": true,
          "is_ordered": true,
          "maximum": 99.99,
          "minimum": 0,
          "name": "AMOUNT",
          "precision": 0.01,
          "title": "Dollar Amount",
          "type": "number",
          "values": [
            {
              "exclusive_maximum": 0,
              "name": "<0",
              "title": "<0"
            },
            {
              "exclusive_maximum": 10,
              "maximum": 9.99,
              "minimum": 0,
              "name": "0-9.99",
              "title": "0-9.99"
            },
            {
              "exclusive_maximum": 20,
              "maximum": 19.99,
              "minimum": 10,
              "name": "10-19.99",
              "title": "10-19.99"
            },
            {
              "exclusive_maximum": 30,
              "maximum": 29.99,
              "minimum": 20,
              "name": "20-29.99",
              "title": "20-29.99"
            },
            {
              "exclusive_maximum": 40,
              "maximum": 39.99,
              "minimum": 30,
              "name": "30-39.99",
              "title": "30-39.99"
            },
            {
              "exclusive_maximum": 50,
              "maximum": 49.99,
              "minimum": 40,
              "name": "40-49.99",
              "title": "40-49.99"
            },
            {
              "exclusive_maximum": 60,
              "maximum": 59.99,
              "minimum": 50,
              "name": "50-59.99",
              "title": "50-59.99"
            },
            {
              "exclusive_maximum": 70,
              "maximum": 69.99,
              "minimum": 60,
              "name": "60-69.99",
              "title": "60-69.99"
            },
            {
              "exclusive_maximum": 80,
              "maximum": 79.99,
              "minimum": 70,
              "name": "70-79.99",
              "title": "70-79.99"
            },
            {
              "exclusive_maximum": 90,
              "maximum": 89.99,
              "minimum": 80,
              "name": "80-89.99",
              "title": "80-89.99"
            },
            {
              "exclusive_maximum": 100,
              "maximum": 99.99,
              "minimum": 90,
              "name": "90-99.99",
              "title": "90-99.99"
            },
            {
              "minimum": 100,
              "name": ">=100",
              "title": ">=100"
            }
          ]
        },
        {
          "can_be_above_maximum": true,
          "can_be_below_minimum": false,
          "exclusive_maximum": "2021-01-01",
          "is_ordered": true,
          "maximum": "2020-12-31",
          "minimum": "1970-01-01",
          "name": "EXPIRY",
          "precision": "P1D",
          "title": "Expiry date",
          "type": "date",
          "values": [
            {
              "is_sentinel": true,
              "name": "null",
              "title": "Unknown"
            },
            {
              "exclusive_maximum": "1975-01-01",
              "maximum": "1974-12-31",
              "minimum": "1970-01-01",
              "name": "1970-01-01/1974-12-31",
              "title": "1970-01-01 to 1974-12-31"
            },
            {
              "exclusive_maximum": "1980-01-01",
              "maximum": "1979-12-31",
              "minimum": "1975-01-01",
              "name": "1975-01-01/1979-12-31",
              "title": "1975-01-01 to 1979-12-31"
            },
            {
              "exclusive_maximum": "1985-01-01",
              "maximum": "1984-12-31",
              "minimum": "1980-01-01",
              "name": "1980-01-01/1984-12-31",
              "title": "1980-01-01 to 1984-12-31"
            },
            {
              "exclusive_maximum": "1990-01-01",
              "maximum": "1989-12-31",
              "minimum": "1985-01-01",
              "name": "1985-01-01/1989-12-31",
              "title": "1985-01-01 to 1989-12-31"
            },
            {
              "exclusive_maximum": "1995-01-01",
              "maximum": "1994-12-31",
              "minimum": "1990-01-01",
              "name": "1990-01-01/1994-12-31",
              "title": "1990-01-01 to 1994-12-31"
            },
            {
              "exclusive_maximum": "2000-01-01",
              "maximum": "1999-12-31",
              "minimum": "1995-01-01",
              "name": "1995-01-01/1999-12-31",
              "title": "1995-01-01 to 1999-12-31"
            },
            {
              "exclusive_maximum": "2005-01-01",
              "maximum": "2004-12-31",
              "minimum": "2000-01-01",
              "name": "2000-01-01/2004-12-31",
              "title": "2000-01-01 to 2004-12-31"
            },
            {
              "exclusive_maximum": "2010-01-01",
              "maximum": "2009-12-31",
              "minimum": "2005-01-01",
              "name": "2005-01-01/2009-12-31",
              "title": "2005-01-01 to 2009-12-31"
            },
            {
              "exclusive_maximum": "2015-01-01",
              "maximum": "2014-12-31",
              "minimum": "2010-01-01",
              "name": "2010-01-01/2014-12-31",
              "title": "2010-01-01 to 2014-12-31"
            },
            {
              "exclusive_maximum": "2020-01-01",
              "maximum": "2019-12-31",
              "minimum": "2015-01-01",
              "name": "2015-01-01/2019-12-31",
              "title": "2015-01-01 to 2019-12-31"
            },
            {
              "exclusive_maximum": "2021-01-01",
              "maximum": "2020-12-31",
              "minimum": "2020-01-01",
              "name": "2020-01-01/2020-12-31",
              "title": "2020-01-01 to 2020-12-31"
            },
            {
              "minimum": "2021-01-01",
              "name": ">=2021-01-01",
              "title": ">=2021-01-01"
            }
          ]
        },
        {
          "description": "State of usual residence",
          "name": "state",
          "region_type": "ste",
          "title": "State",
          "type": "string",
          "values": [
            {
              "name": "1",
              "title": "New South Wales"
            },
            {
              "name": "2",
              "title": "Victoria"
            },
            {
              "name": "3",
              "title": "Queensland"
            },
            {
              "name": "4",
              "title": "South Australia"
            },
            {
              "name": "5",
              "title": "Western Australia"
            },
            {
              "name": "6",
              "title": "Tasmania"
            },
            {
              "name": "7",
              "title": "Northern Territory"
            },
            {
              "name": "8",
              "title": "Australian Capital Territory"
            },
            {
              "name": "9",
              "title": "Other Territories"
            },
            {
              "is_sentinel": true,
              "name": "&",
              "title": "Missing"
            }
          ]
        }
      ],
      "name": "person",
      "title": "person",
      "unit": {
        "plural": "people",
        "singular": "person"
      }
    }
  ],
  "title": "Sample dataset",
  "unit": {
    "plural": "people",
    "singular": "person"
  }
}

The field names and value names are the names you can use to refer to fields and values in subsequent API calls.

You can flatten the fields hierarchy, the value hierarchy, or both, by passing flatten=fields, values or all respectively.

To suppress all values from the output, pass show_values=false. (In this case flatten=values has no effect.)

Relationships for Multi-Table Datasets

When a dataset has multiple tables, the relationships field will be populated with an array of relationship objects specifying how resources may be joined. The structure of this object is:

You can use a .-separated list of relationship names to connect the main resource to another resource. Eg. the joined_person dataset in the protari-sample repository has main_resource person, with the following relationships defined:

{
  ...
  "relationships": [
    {
      "host_resource_name": "person",
      "name": "buyer_transactions",
      "target_resource_name": "transaction",
      "title": "Buyer Transactions"
    },
    {
      "host_resource_name": "person",
      "name": "seller_transactions",
      "target_resource_name": "transaction",
      "title": "Seller Transactions"
    },
    {
      "host_resource_name": "person",
      "name": "household",
      "target_resource_name": "household"
    },
    {
      "host_resource_name": "transaction",
      "name": "item",
      "target_resource_name": "item"
    }
  ],
  ...
}

As a result:

Note that the joined_household dataset presents the same data at the household level instead of the person level.

/datasets/{dataset_name}/fields/{field_name}

Use this endpoint to get metadata (including the allowed values) about a specific field. You can optionally pass flatten=values as for the previous endpoint.

For datasets with more than one data resource, you can retrieve metadata about a field that is not on the main resource by specifying the fields name as {resource name}.{field name}. For example a field with the name "type" on a data resource with the name "transaction" would be specified as type.transaction. The names of fields on the main resource should not include the resource name even when a dataset has multiple data resources.

Eg. The output for https://protari.example.com/v1/datasets/sample/fields/SEX is:

{
  "field": {
    "name": "SEX",
    "title": "Sex",
    "type": "string",
    "values": [
      {
        "name": "F",
        "title": "Female"
      },
      {
        "name": "M",
        "title": "Male"
      },
      {
        "is_sentinel": true,
        "name": "",
        "title": "Not stated"
      }
    ]
  },
  "header": {
    "api_version": "1.4.0",
    "base_url": "https://protari.example.com/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T03:32:18.331244Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  }
}

/datasets/{dataset_name}/aggregation

The https://protari.example.com/v1/datasets/{dataset_name}/aggregation URL is the workhorse of the API, letting you ask for aggregated data from the dataset.

Note that query limits may apply to this endpoint.

With no further parameters, at this URL the API returns a perturbed count of the units in the dataset in JSON format. For example, the query https://protari.example.com/v1/datasets/sample/aggregation returns

{
  "fields": [
    {
      "as_string": "perturbed count",
      "function": {
        "name": "count"
      },
      "is_perturbed": true,
      "is_result": true,
      "name": "perturbed count",
      "title": "perturbed Count"
    }
  ],
  "header": {
    "api_version": "1.7.0",
    "base_url": "http://localhost:8080/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2020-02-21T02:20:59.100593Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  },
  "query": {
    "as_string": "function=count",
    "dataset": {
      "name": "sample",
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ]
    },
    "function": [
      {
        "as_string": "count",
        "type": {
          "name": "count",
          "parameter_types": [
            {
              "is_required": false,
              "title": "exclude any sentinel (eg. missing) values of"
            }
          ],
          "title": "Count"
        }
      }
    ]
  },
  "values": [
    [
      204
    ]
  ]
}

This output indicates that the perturbed count of people in the sample dataset is 204.

If the dataset has longitudinal fields, the API returns, for each combination of values of the longitudinal fields, the perturbed count of the number of units in the dataset having that combination of values for the longitudinal fields. For example, the query https://protari.example.com/v1/datasets/sample_longitudinal/aggregation returns

{
  "fields": [
    {
      "as_string": "YEAR",
      "is_longitudinal": true,
      "name": "YEAR",
      "title": "Year"
    },
    {
      "as_string": "perturbed count",
      "function": {
        "name": "count"
      },
      "is_perturbed": true,
      "is_result": true,
      "name": "perturbed count",
      "title": "perturbed count"
    }
  ],
  "header": {
    "api_version": "1.4.0",
    "base_url": "https://protari.example.com/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T03:32:46.010943Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  },
  "query": {
    "as_string": "function=count",
    "dataset": {
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ]
    },
    "function": [
      {
        "as_string": "count",
        "type": {
          "name": "count",
          "parameter_types": [
            {
              "description": "A field whose sentinel values should be excluded from the count.",
              "is_required": false
            }
          ]
        }
      }
    ]
  },
  "values": [
    [
      "2000",
      204
    ],
    [
      "2005",
      177
    ]
  ]
}

Note that, since the sample_longitudinal dataset includes a longitudinal YEAR field (and no other longitudinal fields), the perturbed count of people in the dataset has been broken down according to the two values of the YEAR field (namely, 2000 and 2005).

You can request the result of a query in CSV format by appending /csv. For example, the query https://protari.example.com/v1/datasets/sample_longitudinal/aggregation/csv returns

YEAR,perturbed count
2000,204
2005,177

The SDMX-JSON format is also available by appending /sdmx-json.

To make more specific queries, you simply add query parameters to the URL, starting with ? and separating them with &, eg.

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation?group_by=DWELL&where=SEX=M

The results are confidentialised according to the dataset's configuration.

group by

Use this to show the results broken down according to the value in a particular field (or fields). Supply as a comma-separated list of field names, eg.

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation/csv?group_by=DWELL,SEX

returns:

DWELL,SEX,YEAR,perturbed count
001,F,2000,18
001,F,2005,24
001,M,2000,18
001,M,2005,21
001,,2000,15
001,,2005,16
002,F,2000,20
002,F,2005,20
002,M,2000,23
002,M,2005,23
002,,2000,22
002,,2005,16
003,F,2000,27
003,F,2005,17
003,M,2000,25
003,M,2005,11
003,,2000,16
003,,2005,23

The first line tells you there were approximately 18 females dwelling in houses (DWELL "001") in 2000, etc.

Note that the API has automatically grouped by the longitudinal field, YEAR.

Group-by clauses in Protari queries resemble group-by clauses in queries expressed in SQL, a very popular language for the reading and writing of data in relational databases. Indeed, as this guide will make clear, Protari queries share several basic features of SQL queries. Although it is by no means necessary for users of Protari to be familiar with SQL, considering that it is likely that many readers of this guide will be better able to readily understand the meaning of any given example Protari query if they are presented with the SQL statement that corresponds to that query, the guide includes several such SQL statements. Readers who are not familiar with SQL can safely ignore these statements.

The last-mentioned example Protari query corresponds to the following SQL statement:

SELECT count(*)
FROM sample_longitudinal 
GROUP BY
    CASE
        WHEN (sample_longitudinal."DWELL" = '001') THEN '001'
        WHEN (sample_longitudinal."DWELL" = '002') THEN '002'
        WHEN (sample_longitudinal."DWELL" = '003') THEN '003'
        ELSE 'out-of-range'
    END,
    CASE
        WHEN (sample_longitudinal."SEX" = 'F') THEN 'F'
        WHEN (sample_longitudinal."SEX" = 'M') THEN 'M'
        WHEN (sample_longitudinal."SEX" = '') THEN ''
        ELSE 'out-of-range' END,
    CASE
        WHEN (sample_longitudinal."YEAR" = '2000') THEN '2000'
        WHEN (sample_longitudinal."YEAR" = '2005') THEN '2005'
        ELSE 'out-of-range'
    END
ORDER BY
    CASE
        WHEN (sample_longitudinal."DWELL" = '001') THEN '001'
        WHEN (sample_longitudinal."DWELL" = '002') THEN '002'
        WHEN (sample_longitudinal."DWELL" = '003') THEN '003'
        ELSE 'out-of-range'
    END,
    CASE
        WHEN (sample_longitudinal."SEX" = 'F') THEN 'F'
        WHEN (sample_longitudinal."SEX" = 'M') THEN 'M'
        WHEN (sample_longitudinal."SEX" = '') THEN ''
        ELSE 'out-of-range'
    END,
    CASE
        WHEN (sample_longitudinal."YEAR" = '2000') THEN '2000'
        WHEN (sample_longitudinal."YEAR" = '2005') THEN '2005'
        ELSE 'out-of-range'
    END

As this example illustrates, the SQL statement corresponding to a group-by query always includes an ORDER BY clause. However, since such a clause is often long, is important only for the presentation of the result of the query, and has exactly the same content as the GROUP BY clause, in the example SQL statements given in the remainder of this guide, any ORDER BY clauses are omitted.

You can also group by numeric fields, eg.

https://protari.example.com/v1/datasets/sample/aggregation/csv?group_by=AMOUNT

returns:

AMOUNT,perturbed count
<0,0
0-9.99,31
10-19.99,14
20-29.99,29
30-39.99,17
40-49.99,12
50-59.99,24
60-69.99,20
70-79.99,11
80-89.99,23
90-99.99,19
>=100,0

and

https://protari.example.com/v1/datasets/sample/aggregation/csv?group_by=ROOMS

returns:

ROOMS,perturbed count
null,5
999,12
0,7
1,15
2,15
3,25
4,46
5,30
6,9
7,5
>=8,30

It is possible to group by decimal fields, because apart from null and sentinel values, the values of such fields are always ranges. That is, like non-categorical integer fields and date fields, decimal fields are range data fields.

The two queries just mentioned respectively correspond to the following two SQL statements:

SELECT count(*)
FROM sample
GROUP BY
    CASE
        WHEN (sample."AMOUNT" < 0) THEN '<0'
        WHEN (sample."AMOUNT" >= 0 AND sample."AMOUNT" < 10) THEN '0-9.99'
        WHEN (sample."AMOUNT" >= 10 AND sample."AMOUNT" < 20) THEN '10-19.99'
        WHEN (sample."AMOUNT" >= 20 AND sample."AMOUNT" < 30) THEN '20-29.99'
        WHEN (sample."AMOUNT" >= 30 AND sample."AMOUNT" < 40) THEN '30-39.99'
        WHEN (sample."AMOUNT" >= 40 AND sample."AMOUNT" < 50) THEN '40-49.99'
        WHEN (sample."AMOUNT" >= 50 AND sample."AMOUNT" < 60) THEN '50-59.99'
        WHEN (sample."AMOUNT" >= 60 AND sample."AMOUNT" < 70) THEN '60-69.99'
        WHEN (sample."AMOUNT" >= 70 AND sample."AMOUNT" < 80) THEN '70-79.99'
        WHEN (sample."AMOUNT" >= 80 AND sample."AMOUNT" < 90) THEN '80-89.99'
        WHEN (sample."AMOUNT" >= 90 AND sample."AMOUNT" < 100) THEN '90-99.99'
        WHEN (sample."AMOUNT" >= 100) THEN '>=100'
        ELSE 'out-of-range'
    END
SELECT count(*)
FROM sample
GROUP BY
    CASE
        WHEN (sample."ROOMS" = 0) THEN '0'
        WHEN (sample."ROOMS" IS NULL) THEN 'null'
        WHEN (sample."ROOMS" = '999') THEN '999'
        WHEN (sample."ROOMS" = 1) THEN '1'
        WHEN (sample."ROOMS" = 2) THEN '2'
        WHEN (sample."ROOMS" = 3) THEN '3'
        WHEN (sample."ROOMS" = 4) THEN '4'
        WHEN (sample."ROOMS" = 5) THEN '5'
        WHEN (sample."ROOMS" = 6) THEN '6'
        WHEN (sample."ROOMS" = 7) THEN '7'
        WHEN (sample."ROOMS" >= 8 AND sample."ROOMS" IS NOT NULL AND sample."ROOMS" != '999') THEN '>=8'
        ELSE 'out-of-range'
    END

Grouping by longitudinal fields

A longitudinal dataset is characterised by the fact that it includes multiple records for each of its units. In particular, the records of a unit correspond to some (or all) of the possible combinations of values of the longitudinal fields of the dataset. In the common scenario where there is only one longitudinal field, the records of a unit correspond to some (or all) of the values of that field.

The sample_longitudinal dataset, which has the YEAR field as its only longitudinal field, is an example of a longitudinal dataset. In this dataset, because the YEAR field has the years 2000 and 2005 as its values, each person has data recorded for the year 2000 and/or the year 2005. Thus, you can think of the sample_longitudinal dataset as being equivalent to two non-longitudinal datasets - one containing all the records from the year 2000, and another containing all the records from the year 2005.

When querying a longitudinal dataset, you can refer to the fields of its (implicit) associated non-longitudinal datasets by means of an @ operator. Intuitively, by using this operator, you can refer to a non-longitudinal field as it is recorded in one of the associated non-longitudinal datasets (or, in other words, refer to the value of a non-longitudinal field "at" a particular value of a longitudinal field). For example, the query

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation/csv?group_by=DWELL@2000

asks for the number of people, broken down according to every combination of a type of dwelling lived in by a person in the year 2000 and a year. Years are included in the breakdown because Protari automatically groups by longitudinal fields. This query returns

DWELL2000,YEAR,perturbed count
001,2000,51
001,2005,41
002,2000,74
002,2005,59
003,2000,75
003,2005,60

This output indicates that

The query is equivalent to the following SQL statement:

SELECT count(*)
FROM sample_longitudinal JOIN (SELECT sample_longitudinal."DWELL" AS "DWELL", sample_longitudinal."UID" AS "UID" 
                               FROM sample_longitudinal
                               WHERE sample_longitudinal."YEAR" = '2000') AS _derived_0 ON _derived_0."UID" = sample_longitudinal."UID" 
GROUP BY
    CASE
        WHEN (_derived_0."DWELL" = '001') THEN '001'
        WHEN (_derived_0."DWELL" = '002') THEN '002'
        WHEN (_derived_0."DWELL" = '003') THEN '003'
        ELSE 'out-of-range'
    END,
    CASE
        WHEN (sample_longitudinal."YEAR" = '2000') THEN '2000'
        WHEN (sample_longitudinal."YEAR" = '2005') THEN '2005'
        ELSE 'out-of-range'
    END

Here "UID" refers to a hidden field (ie. a field not exposed to users of Protari) whose values are identifiers for the units of the sample_longitudinal dataset.

More formally, Protari considers any longitudinal dataset to be equivalent, with respect to the information it contains, to a non-longitudinal dataset that is defined as follows:

For example, in the non-longitudinal dataset that is equivalent to the longitudinal sample_longitudinal dataset, there is a DWELL2000 field, which represents the type of the dwelling in which a person resided in the year 2000. For a given person, if there is a record in the sample_longitudinal dataset associated with the person such that the value of the longitudinal field YEAR in it is 2000, then the value of the DWELL2000 field is the value of the DWELL field in that record. Otherwise, the value of the DWELL2000 field is undefined (ie. null).

Custom groups

Custom groups are only available if the allow_disjunctions_and_custom_groups flag has been set to true (it is false by default). When they are available, you can specify custom groups for group-by fields, by using colons to separate groups and semi-colons to separate values within the groups. Eg:

https://protari.example.com/v1/datasets/sample/aggregation?group_by=POSTCODE:6722;6725:6429;6710

This returns three perturbed counts: those in postcodes 6722 or 6725; those in postcodes 6429 or 6710; and those who live in any other postcode:

{
  ...
  "values": [
    [
      "6722;6725",
      70
    ],
    [
      "6429;6710",
      57
    ],
    [
      "other",
      75
    ]
  ]
}

The query just mentioned corresponds to the following SQL statement:

SELECT count(*)
FROM sample
GROUP BY
    CASE
        WHEN (sample."POSTCODE" = '6722' OR sample."POSTCODE" = '6725') THEN '6722;6725'
        WHEN (sample."POSTCODE" = '6429' OR sample."POSTCODE" = '6710') THEN '6429;6710'
        ELSE 'other'
    END

In custom groups defined for fields marked is_ordered: true, such as the ROOMS field of the sample dataset, you can replace sequences (eg. 1;2;3) with ranges (eg. 1-3):

https://protari.example.com/v1/datasets/sample/aggregation?group_by=ROOMS:1-3

From the output, you will see that Protari has treated 1-3 as the same as 1;2;3:

{
  "fields": [
    {
      "as_string": "ROOMS:1-3",
      "custom_groups": {
        "else": {
          "name": "other"
        },
        "groups": [
          {
            "name": "1-3",
            "values": [
              {
                "exclusive_maximum": 2,
                "maximum": 1,
                "minimum": 1,
                "name": "1",
                "title": "1"
              },
              {
                "exclusive_maximum": 3,
                "maximum": 2,
                "minimum": 2,
                "name": "2",
                "title": "2"
              },
              {
                "exclusive_maximum": 4,
                "maximum": 3,
                "minimum": 3,
                "name": "3",
                "title": "3"
              }
            ]
          }
        ]
      },
      "name": "ROOMS",
      "title": "Number of Bedrooms"
    },
  ...
  "values": [
    [
      "1-3",
      55
    ],
    [
      "other",
      151
    ]
  ]
}

The query just mentioned corresponds to the following SQL statement:

SELECT count(*)
FROM sample
GROUP BY
    CASE
        WHEN (sample."ROOMS" = 1 OR sample."ROOMS" = 2 OR sample."ROOMS" = 3) THEN '1-3'
        ELSE 'other'
    END

You can even refer to hierarchical values in custom groups, eg. group_by=COUNTRY:1100:1201;1300. If you do this, Protari will convert each hierarchical value into a disjunction of its child values.

On range data fields, you can use the full range syntax to define the custom groups, eg:

https://protari.example.com/v1/datasets/sample/aggregation?group_by=ROOMS:0:1;2:3-6:>=7

returns:

{
  "fields": [
    {
      "as_string": "ROOMS:0:1;2:3-6:>=7",
      "custom_groups": {
        "else": {
          "name": "other"
        },
        "groups": [
          {
            "name": "0",
            "values": [
              {
                "exclusive_maximum": 1,
                "maximum": 0,
                "minimum": 0,
                "name": "0",
                "title": "None (including bedsits)"
              }
            ]
          },
          {
            "name": "1;2",
            "values": [
              {
                "exclusive_maximum": 2,
                "maximum": 1,
                "minimum": 1,
                "name": "1",
                "title": "1"
              },
              {
                "exclusive_maximum": 3,
                "maximum": 2,
                "minimum": 2,
                "name": "2",
                "title": "2"
              }
            ]
          },
          {
            "name": "3-6",
            "values": [
              {
                "exclusive_maximum": 4,
                "maximum": 3,
                "minimum": 3,
                "name": "3",
                "title": "3"
              },
              {
                "exclusive_maximum": 5,
                "maximum": 4,
                "minimum": 4,
                "name": "4",
                "title": "4"
              },
              {
                "exclusive_maximum": 6,
                "maximum": 5,
                "minimum": 5,
                "name": "5",
                "title": "5"
              },
              {
                "exclusive_maximum": 7,
                "maximum": 6,
                "minimum": 6,
                "name": "6",
                "title": "6"
              }
            ]
          },
          {
            "name": ">=7",
            "values": [
              {
                "exclusive_maximum": 8,
                "maximum": 7,
                "minimum": 7,
                "name": "7",
                "title": "7"
              },
              {
                "minimum": 8,
                "name": ">=8",
                "title": ">=8"
              }
            ]
          }
        ]
      },
      "name": "ROOMS",
      "title": "Number of Bedrooms"
    },
...
  "values": [
    [
      "0",
      14
    ],
    [
      "1;2",
      30
    ],
    [
      "3-6",
      112
    ],
    [
      ">=7",
      34
    ],
    [
      "other",
      10
    ]
  ]
}

The query just mentioned corresponds to the following SQL statement:

SELECT count(*)
FROM sample
GROUP BY
    CASE
        WHEN (sample."ROOMS" = 0) THEN '0'
        WHEN (sample."ROOMS" = 1 OR sample."ROOMS" = 2) THEN '1;2'
        WHEN (sample."ROOMS" = 3 OR sample."ROOMS" = 4 OR sample."ROOMS" = 5 OR sample."ROOMS" = 6) THEN '3-6'
        WHEN (sample."ROOMS" = 7 OR sample."ROOMS" >= 8 AND sample."ROOMS" IS NOT NULL AND sample."ROOMS" != '999') THEN '>=7'
        ELSE 'other'
    END

As it consists of a range of values, and ranges cannot include null or sentinel values, the >=7 group does not include the null value or the value 999.

Custom groups cannot overlap (eg. group_by=ROOMS:3-6:>=5 will raise an error).

The API allows you to specify up to 100 custom groups, each of which can be at most 1000 characters long, eg. RENT:a;b;c:d;e has 2 custom groups, of character lengths 5 and 3 respectively.

Using group by with multi-table datasets

On a multi-table dataset, not only can you group by fields from the main resource of the dataset, you can group by fields from resources related (ie. joined) to the main resource. The result of a query on a multi-table dataset that groups by a field from a resource other than the main resource applies to those units on the main resource (eg. people) who have at least one record in the joined resource (eg. people with at least one transaction).

For example,

https://protari.example.com/v1/datasets/joined_person/aggregation/csv?group_by=buyer_transactions.type

returns

buyer_transactions.type,perturbed count
A,48
B,51
C,47
D,57
E,54
F,64
G,46

This output indicates, for each type of transaction, the (perturbed) number of people who were a buyer in a transaction of that type.

The query corresponds to the following SQL statement:

SELECT count(*)
FROM
    (SELECT joined_person._id AS _id, transaction.type AS _internal_name_0
     FROM joined_person JOIN transaction ON transaction.person_id = joined_person._id 
     GROUP BY joined_person._id, transaction.type) AS multi_table_sql_joined_data
GROUP BY
    CASE
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'A') THEN 'A'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'B') THEN 'B'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'C') THEN 'C'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'D') THEN 'D'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'E') THEN 'E'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'F') THEN 'F'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'G') THEN 'G'
        ELSE 'out-of-range'
    END

Note that the SQL statement involves a join. This join corresponds to the buyer_transactions relationship from the person resource to the transaction resource.

Another example of a group-by query on a multi-table dataset is the following:

https://protari.example.com/v1/datasets/joined_person/aggregation/csv?group_by=seller_transactions.item.item_name

This query returns

seller_transactions.item.item_name,perturbed count
A,11
B,16
C,15
D,16
NF,10
NP,15

This output indicates, for each item name, the (perturbed) number of people who were a seller in a transaction where the item transacted has that name.

The query corresponds to the following SQL statement:

SELECT count(*)
FROM
    (SELECT joined_person._id AS _id, item.item_name AS _internal_name_0
     FROM joined_person JOIN transaction ON transaction.provider_id = joined_person._id JOIN item ON transaction.item_id = item._id
     GROUP BY joined_person._id, item.item_name) AS multi_table_sql_joined_data 
GROUP BY
    CASE
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'A') THEN 'A'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'B') THEN 'B'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'C') THEN 'C'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'D') THEN 'D'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'NF') THEN 'NF'
        WHEN (multi_table_sql_joined_data._internal_name_0 = 'NP') THEN 'NP'
        ELSE 'out-of-range'
    END

Note that the SQL statement involves two joins: the first corresponds to the seller_transactions relationship from the person resource to the transaction resource, and the second corresponds to the item relationship from the transaction resource to the item resource.

where

You can restrict the query to records meeting a condition with the "where" parameter. Protari where clauses resemble SQL where clauses, but (for reasons of privacy protection) the types of conditions allowed in the former are fewer than those allowed in the latter. For string fields, only equality is supported (you can include more than one value by separating them with semi-colons). For range data fields (ie. numeric data fields and dates), equality and ranges are both supported. You can include multiple conditions (provided they are on different fields). If there are multiple conditions, they are treated as a single conjunctive condition (ie. a condition that is satisfied if and only if each of the multiple conditions is satisfied).

Eg. to count the total number of males, and return a CSV file:

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation/csv?where=SEX=M

which returns:

YEAR,perturbed count
2000,71
2005,60

This query corresponds to the following SQL statement:

SELECT count(*)
FROM sample_longitudinal
WHERE sample_longitudinal."SEX" = 'M'
GROUP BY
    CASE
        WHEN (sample_longitudinal."YEAR" = '2000') THEN '2000'
        WHEN (sample_longitudinal."YEAR" = '2005') THEN '2005'
        ELSE 'out-of-range'
    END

You can request records with a null or empty value as follows:

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation?where=SEX=

You can also request ranges on any fields marked is_ordered: true, using the <, >= and - operators. The operators <= and > are not supported.

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation?where=ROOMS>=8
https://protari.example.com/v1/datasets/sample_longitudinal/aggregation?where=AMOUNT=10-19.99
https://protari.example.com/v1/datasets/sample_longitudinal/aggregation?where=AMOUNT>=100
https://protari.example.com/v1/datasets/sample/aggregation?where=EXPIRY=2010-01-01/2014-12-31

Ranges are inclusive of both ends.

Note that date fields use / instead of - to separate the start and end of the range, since - is already used in the date format.

The first and last of the previous four queries respectively correspond to the following two SQL statements:

SELECT count(*)
FROM sample_longitudinal
WHERE sample_longitudinal."ROOMS" >= 8 AND sample_longitudinal."ROOMS" IS NOT NULL AND sample_longitudinal."ROOMS" != '999'
GROUP BY
    CASE
        WHEN (sample_longitudinal."YEAR" = '2000') THEN '2000'
        WHEN (sample_longitudinal."YEAR" = '2005') THEN '2005'
        ELSE 'out-of-range'
    END
SELECT count(*)
FROM sample
WHERE sample."EXPIRY" >= 2010-01-01 AND sample."EXPIRY" < 2015-01-01 AND sample."EXPIRY" IS NOT NULL

If disjunctions are allowed (ie. disjunctions_and_custom_groups is true), you can combine different values on the same field into the one query. Eg:

https://protari.example.com/v1/datasets/sample/aggregation?where=ROOMS=1;3-5;>=7
https://protari.example.com/v1/datasets/sample/aggregation?where=AMOUNT=10-39.99
https://protari.example.com/v1/datasets/sample/aggregation?where=AMOUNT>=40
https://protari.example.com/v1/datasets/sample/aggregation?where=EXPIRY=2010-01-01/2019-12-31
https://protari.example.com/v1/datasets/sample/aggregation?where=ROOMS>=4
https://protari.example.com/v1/datasets/sample/aggregation?where=SEX=M;F
https://protari.example.com/v1/datasets/sample/aggregation?where=ROOMS=1-5,DWELL=001

The where clause in the first example above is returned with the result as follows:

    "where": [
      {
        "as_string": "ROOMS=1;3-5;>=7",
        "condition_values": {
          "equals": [
            {
              "exclusive_maximum": 2,
              "maximum": 1,
              "minimum": 1,
              "name": "1",
              "title": "1"
            },
            {
              "exclusive_maximum": 4,
              "maximum": 3,
              "minimum": 3,
              "name": "3",
              "title": "3"
            },
            {
              "exclusive_maximum": 5,
              "maximum": 4,
              "minimum": 4,
              "name": "4",
              "title": "4"
            },
            {
              "exclusive_maximum": 6,
              "maximum": 5,
              "minimum": 5,
              "name": "5",
              "title": "5"
            },
            {
              "exclusive_maximum": 8,
              "maximum": 7,
              "minimum": 7,
              "name": "7",
              "title": "7"
            },
            {
              "minimum": 8,
              "name": ">=8",
              "title": ">=8"
            }
          ]
        },
        "name": "ROOMS",
        "title": "Number of Bedrooms",
        "values_as_received": {
          "equals": [
            {
              "as_string": "1"
            },
            {
              "as_string": "3-5"
            },
            {
              "as_string": ">=7"
            }
          ]
        }
      }
    ]

The query currently under consideration corresponds to the following SQL statement:

SELECT count(*)
FROM sample
WHERE sample."ROOMS" = 1 OR sample."ROOMS" = 3 OR sample."ROOMS" = 4 OR sample."ROOMS" = 5 OR sample."ROOMS" = 7 OR sample."ROOMS" >= 8 AND sample."ROOMS" IS NOT NULL AND sample."ROOMS" != '999'

The last query in the above list of example queries, which is reproduced below, illustrates the inclusion of multiple conditions in a where clause:

https://protari.example.com/v1/datasets/sample/aggregation?where=ROOMS=1-5,DWELL=001

The where clause for this query is returned as follows:

    "where": [
      {
        "as_string": "ROOMS=1-5",
        "condition_values": {
          "equals": [
            {
              "exclusive_maximum": 2,
              "maximum": 1,
              "minimum": 1,
              "name": "1",
              "title": "1"
            },
            {
              "exclusive_maximum": 3,
              "maximum": 2,
              "minimum": 2,
              "name": "2",
              "title": "2"
            },
            {
              "exclusive_maximum": 4,
              "maximum": 3,
              "minimum": 3,
              "name": "3",
              "title": "3"
            },
            {
              "exclusive_maximum": 5,
              "maximum": 4,
              "minimum": 4,
              "name": "4",
              "title": "4"
            },
            {
              "exclusive_maximum": 6,
              "maximum": 5,
              "minimum": 5,
              "name": "5",
              "title": "5"
            }
          ]
        },
        "name": "ROOMS",
        "title": "Number of Bedrooms",
        "values_as_received": {
          "equals": [
            {
              "as_string": "1-5"
            }
          ]
        }
      },
      {
        "as_string": "DWELL=001",
        "condition_values": {
          "equals": [
            {
              "description": "A separate or semi-detached house",
              "name": "001",
              "title": "House"
            }
          ]
        },
        "description": "Type of dwelling of usual residence",
        "name": "DWELL",
        "title": "Dwelling Type",
        "values_as_received": {
          "equals": [
            {
              "as_string": "001"
            }
          ]
        }
      }
    ]
  },
  "values": [
    [
      44
    ]
  ]
}

The query currently under consideration corresponds to the following SQL statement:

SELECT count(*)
FROM sample
WHERE (sample."ROOMS" = 1 OR sample."ROOMS" = 2 OR sample."ROOMS" = 3 OR sample."ROOMS" = 4 OR sample."ROOMS" = 5) AND sample."DWELL" = '001'

Thus the query returns the count of people who have between 1 and 5 rooms (inclusive) and have a dwelling of type "001".

If disjunctions are not allowed (ie. disjunctions_and_custom_groups is false), you can still request different values on the same field as long as you also group by the same field, eg.

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation?group_by=ROOMS&where=ROOMS=3-5,YEAR=2000

This query corresponds to the following SQL statement:

SELECT count(*)
FROM sample_longitudinal
WHERE (sample_longitudinal."ROOMS" = 3 OR sample_longitudinal."ROOMS" = 4 OR sample_longitudinal."ROOMS" = 5) AND sample_longitudinal."YEAR" = '2000'
GROUP BY
    CASE
        WHEN (sample_longitudinal."ROOMS" = 3) THEN '3'
        WHEN (sample_longitudinal."ROOMS" = 4) THEN '4'
        WHEN (sample_longitudinal."ROOMS" = 5) THEN '5'
        ELSE 'out-of-range'
    END,
    CASE
        WHEN (sample_longitudinal."YEAR" = '2000') THEN '2000'
        WHEN (sample_longitudinal."YEAR" = '2005') THEN '2005'
        ELSE 'out-of-range'
    END

It returns the values:

  "values": [
    [
      "3",
      "2000",
      28
    ],
    [
      "4",
      "2000",
      44
    ],
    [
      "5",
      "2000",
      27
    ]
  ]

In this case, totals=true will not total across this field.

If you use a range that is not aligned with the allowed intervals, eg.

https://protari.example.com/v1/datasets/sample/aggregation?where=AMOUNT=0-13

the result is an error with a suggestion for an allowed query:

{
  "detail": "Upper bound 13 not allowed - try 9.99 or 19.99 instead.",
  "status": 400,
  "title": "",
  "type": "about:blank"
}

On fields marked is_ordered: true, you can also replace sequences such as 0;1;2;3 with 0-3 (as noted earlier for custom groups). The COUNTRY field of the sample dataset is a good example:

https://protari.example.com/v1/datasets/sample/aggregation?where=COUNTRY=1101-1201

This returns (note the 1101-1201 has been expanded to the equivalent 1101;1102;1199;1201):

{
  ...
  "query": {
    ...
    "where": [
      {
        "as_string": "COUNTRY=1101-1201",
        "condition_values": {
          "equals": [
            {
              "description": "Mainland Australia (ie. excluding Norfolk Island and other external territories)",
              "name": "1101",
              "title": "Australia"
            },
            {
              "name": "1102",
              "title": "Norfolk Island"
            },
            {
              "name": "1199",
              "title": "Australian External Territories"
            },
            {
              "name": "1201",
              "title": "New Zealand"
            }
          ]
        },
        "name": "COUNTRY",
        "title": "Country of Birth",
        "values_as_received": {
          "equals": [
            {
              "as_string": "1101-1201"
            }
          ]
        }
      }
    ]
  },
  "values": [
    [
      128
    ]
  ]
}

The API allows you to specify a maximum of 60 alternative values on each condition field, eg. RENT=a;b;c;d;e,SEX=M has 5 alternative values for RENT, and 1 for SEX.

Using where with hierarchical values

When you use a hierarchical value in a where clause, the API will expand it into its child values. To see this in action, try this query:

https://protari.example.com/v1/datasets/sample/aggregation?where=COUNTRY=1100

You will receive a result like:

{
  "fields": [
    {
      "as_string": "perturbed count",
      "function": {
        "name": "count"
      },
      "is_perturbed": true,
      "is_result": true,
      "name": "perturbed count",
      "title": "perturbed count"
    }
  ],
  "header": {
    "api_version": "1.4.0",
    "base_url": "https://protari.example.com/v1",
    "data_terms": "This data was generated using Protari. You can share it according to these [terms and conditions](http://www.example.com/terms-of-sharing-data). Protari perturbs (slightly adjusts) data outputs to protect privacy. No reliance should be placed on data cells with small values.",
    "organization": {
      "name": "Sample",
      "title": "Sample organization"
    },
    "prepared_at": "2019-02-14T04:12:42.083285Z",
    "terms": "By using Protari you acknowledge that you are authorised to do so and your use will comply with these [terms and conditions](http://www.example.com/terms-of-use). All activity is logged and may be audited."
  },
  "query": {
    "as_string": "function=count&where=COUNTRY=1100",
    "dataset": {
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ]
    },
    "function": [
      {
        "as_string": "count",
        "type": {
          "name": "count",
          "parameter_types": [
            {
              "description": "A field whose sentinel values should be excluded from the count.",
              "is_required": false
            }
          ]
        }
      }
    ],
    "where": [
      {
        "as_string": "COUNTRY=1100",
        "condition_values": {
          "equals": [
            {
              "description": "Mainland Australia (ie. excluding Norfolk Island and other external territories)",
              "name": "1101",
              "title": "Australia"
            },
            {
              "name": "1102",
              "title": "Norfolk Island"
            },
            {
              "name": "1199",
              "title": "Australian External Territories"
            }
          ]
        },
        "name": "COUNTRY",
        "title": "Country of Birth",
        "values_as_received": {
          "equals": [
            {
              "as_string": "1100"
            }
          ]
        }
      }
    ]
  },
  "values": [
    [
      107
    ]
  ]
}

In the returned where clause, you can see that the API has converted the field value 1100 into its child values 1101, 1102 and 1199. That's because 1100 is configured as a "parent" value. The received query values (here, only 1100) are also returned in values_as_received.

The conversion of hierarchical values into child values is reflected in the corresponding SQL statement:

SELECT count(*)
FROM sample
WHERE sample."COUNTRY" = '1101' OR sample."COUNTRY" = '1102' OR sample."COUNTRY" = '1199'

Note that you cannot use hierarchical values in a range query, even if the field is marked is_ordered: true. For example, the query:

https://protari.example.com/v1/datasets/sample/aggregation?where=COUNTRY<1100

Yields the error:

{
  "detail": "Unable to use hierachical value 1100 in a range query",
  "name": "InvalidValueError",
  "status": 400,
  "title": "Unknown value"
}

Using where with longitudinal values

As discussed in the section on group-by queries, you can use the @ operator to refer to the value of a non-longitudinal field "at" a particular value of a longitudinal field. This operator can also be used in where clauses. For example, the query

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation/csv?where=DWELL@2000=003

asks for the number of people whose dwelling was of type "003" in the year 2000, broken down according to year (since Protari automatically groups by longitudinal fields). It returns

YEAR,perturbed count
2000,75
2005,60

This result indicates that

The query is equivalent to the following SQL statement:

SELECT count(*)
FROM sample_longitudinal JOIN (SELECT sample_longitudinal."DWELL" AS "DWELL", sample_longitudinal."UID" AS "UID"
                               FROM sample_longitudinal
                               WHERE sample_longitudinal."YEAR" = '2000') AS _derived_0 ON _derived_0."UID" = sample_longitudinal."UID" 
WHERE _derived_0."DWELL" = '003'
GROUP BY
    CASE
        WHEN (sample_longitudinal."YEAR" = '2000') THEN '2000'
        WHEN (sample_longitudinal."YEAR" = '2005') THEN '2005'
        ELSE 'out-of-range'
    END

Using where with multi-table datasets

This is very similar to using group by with multi-table datasets, eg.

https://protari.example.com/v1/datasets/joined_person/aggregation/csv?where=buyer_transactions.type=A

returns:

perturbed count
48

This example query, whose corresponding SQL statement is given below, is the count of all people who had at least one transaction of type A where they were a buyer.

SELECT count(*)
FROM
    (SELECT joined_household._id AS _id
     FROM joined_household JOIN joined_person ON joined_person.household_id = joined_household._id JOIN transaction ON transaction.person_id = joined_person._id 
     WHERE transaction.type = 'A'
     GROUP BY joined_household._id) AS multi_table_sql_joined_data

function

So far, all the requests have returned counts. You can also request means and sums on integer and decimal fields by using the function parameter, eg.

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation/csv?function=mean ROOMS

which returns

YEAR,perturbed mean of ROOMS
2000,4.37
2005,4.36

In this result, sentinel and "null" values of ROOMS have been excluded. (Eg. 999 is a "sentinel" value because it corresponds to an unknown number of rooms. Including it in the mean would skew the results.)

You can see the list of excluded values (sentinels and null values) in the JSON output:

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation?function=mean ROOMS
  "query": {
    "as_string": "function=mean ROOMS",
    "dataset": {
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ]
    },
    "function": [
      {
        "as_string": "mean ROOMS",
        "parameters": [
          {
            "as_string": "ROOMS",
            "condition_values": {
              "not_equals": [
                {
                  "as_string": "null"
                },
                {
                  "as_string": "999"
                }
              ]
            },
            "name": "ROOMS",
            "title": "Number of Bedrooms"
          }
        ],
        "type": {
          "name": "mean",
          "parameter_types": [
            {
              "description": "The field whose non-sentinel values are to be averaged.",
              "is_required": true,
              "requires_numeric_data": true
            }
          ]
        }
      }
    ]
    ...
  }

The result has also been rounded to an appropriate number of decimal places (in this case because ROOMS is an integer field, but more generally based on the field's decimal_places).

The SQL statement that corresponds to the example query is given below:

SELECT avg(sample_longitudinal."ROOMS")
FROM sample_longitudinal
WHERE NOT (sample_longitudinal."ROOMS" IS NULL OR sample_longitudinal."ROOMS" = '999')
GROUP BY
    CASE
        WHEN (sample_longitudinal."YEAR" = '2000') THEN '2000'
        WHEN (sample_longitudinal."YEAR" = '2005') THEN '2005'
        ELSE 'out-of-range'
    END

The sum query

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation/csv?function=sum AMOUNT

which returns

YEAR,perturbed sum of AMOUNT
2000,9515.1
2005,8028.88

corresponds to the following SQL statement:

SELECT sum(sample_longitudinal."AMOUNT") 
FROM sample_longitudinal 
GROUP BY
    CASE
        WHEN (sample_longitudinal."YEAR" = '2000') THEN '2000'
        WHEN (sample_longitudinal."YEAR" = '2005') THEN '2005'
        ELSE 'out-of-range'
    END

Since the AMOUNT field does not have any null or sentinel values, all values of the field are included in the summation.

You can also exclude a field's sentinel and null values from counts, by including the field as an argument to an explicit count function. For example, the query

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation?function=count ROOMS

returns counts of 177 in 2000 and 151 in 2005 for people with valid ROOMS values, and the query

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation?function=count ROOMS

returns counts of 137 in 2000 and 128 in 2005 for people with a stated SEX. Recall that the total count is 199 for each year.

Requesting Multiple Functions

Optionally, the data custodian may allow users to request more than one function at a time (by default you cannot). Be aware that in this case, sentinel and null values for all function fields are excluded for all the results.

So, for example:

https://protari.example.com/v1/datasets/sample_longitudinal/aggregation/csv?function=mean ROOMS,count SEX

returns

YEAR,perturbed count of SEX,perturbed mean of ROOMS
2000,123,4.32
2005,114,4.38

ie. 123 for the count in 2000 and 114 in 2005 - these are the records that have both a valid ROOMS value and a stated SEX.

Because this is potentially misleading, max_functions defaults to 1 if it is not included in the global settings.

Using function with multi-table datasets

On a multi-table dataset, you can only request functions of fields from the main resource. For example, the query

https://protari.example.com/v1/datasets/joined_person/aggregation?function=mean household.RENT

on the joined_person dataset returns

{
    "detail": "Cannot aggregate on quantities from household.",
    "name": "InvalidQueryError",
    "status": 400,
    "title": "Invalid query"
}

because the household resource is not the main resource of the dataset.

totals

Add totals=true to retrieve all the cross-totals in the one request. This is only available with the standard JSON format. For example, not only does the query

https://protari.example.com/v1/datasets/sample/aggregation?group_by=DWELL,SEX&totals=true

return the result of grouping by both DWELL and SEX, it also returns the result of grouping by DWELL on its own, the result of grouping by SEX on its own, and the result of a simple count query without grouping by any field. Protari obtains these results by implicitly performing several queries, one for each result. The response to the example query is shown below:

{
  "fields": [
    {
      "as_string": "DWELL",
      "description": "Type of dwelling of usual residence",
      "name": "DWELL",
      "title": "Dwelling Type"
    },
    {
      "as_string": "SEX",
      "name": "SEX",
      "title": "Sex"
    },
    {
      "as_string": "perturbed count",
      "function": {
        "name": "count"
      },
      "is_perturbed": true,
      "is_result": true,
      "name": "perturbed count",
      "title": "perturbed Count"
    }
  ],
  ...
  "query": {
    "as_string": "function=count&group_by=DWELL,SEX",
    "dataset": {
      "name": "sample",
      "resources": [
        {
          "name": "person",
          "title": "person",
          "unit": {
            "plural": "people",
            "singular": "person"
          }
        }
      ]
    },
    "function": [
      {
        "as_string": "count",
        "type": {
          "name": "count",
          "parameter_types": [
            {
              "is_required": false,
              "title": "exclude any sentinel (eg. missing) values of"
            }
          ],
          "title": "Count"
        }
      }
    ],
    "group_by": [
      {
        "as_string": "DWELL",
        "description": "Type of dwelling of usual residence",
        "name": "DWELL",
        "title": "Dwelling Type"
      },
      {
        "as_string": "SEX",
        "name": "SEX",
        "title": "Sex"
      }
    ]
  },
  "totals": [
    {
      "fields": [
        {
          "as_string": "perturbed count",
          "function": {
            "name": "count"
          },
          "is_perturbed": true,
          "is_result": true,
          "name": "perturbed count",
          "title": "perturbed Count"
        }
      ],
      "query": {
        "as_string": "function=count",
        "dataset": {
          "name": "sample",
          "resources": [
            {
              "name": "person",
              "title": "person",
              "unit": {
                "plural": "people",
                "singular": "person"
              }
            }
          ]
        },
        "function": [
          {
            "as_string": "count",
            "type": {
              "name": "count",
              "parameter_types": [
                {
                  "is_required": false,
                  "title": "exclude any sentinel (eg. missing) values of"
                }
              ],
              "title": "Count"
            }
          }
        ]
      },
      "values": [
        [
          204
        ]
      ]
    },
    {
      "fields": [
        {
          "as_string": "DWELL",
          "description": "Type of dwelling of usual residence",
          "name": "DWELL",
          "title": "Dwelling Type"
        },
        {
          "as_string": "perturbed count",
          "function": {
            "name": "count"
          },
          "is_perturbed": true,
          "is_result": true,
          "name": "perturbed count",
          "title": "perturbed Count"
        }
      ],
      "query": {
        "as_string": "function=count&group_by=DWELL",
        "dataset": {
          "name": "sample",
          "resources": [
            {
              "name": "person",
              "title": "person",
              "unit": {
                "plural": "people",
                "singular": "person"
              }
            }
          ]
        },
        "function": [
          {
            "as_string": "count",
            "type": {
              "name": "count",
              "parameter_types": [
                {
                  "is_required": false,
                  "title": "exclude any sentinel (eg. missing) values of"
                }
              ],
              "title": "Count"
            }
          }
        ],
        "group_by": [
          {
            "as_string": "DWELL",
            "description": "Type of dwelling of usual residence",
            "name": "DWELL",
            "title": "Dwelling Type"
          }
        ]
      },
      "values": [
        [
          "001",
          60
        ],
        [
          "002",
          76
        ],
        [
          "003",
          63
        ]
      ]
    },
    {
      "fields": [
        {
          "as_string": "SEX",
          "name": "SEX",
          "title": "Sex"
        },
        {
          "as_string": "perturbed count",
          "function": {
            "name": "count"
          },
          "is_perturbed": true,
          "is_result": true,
          "name": "perturbed count",
          "title": "perturbed Count"
        }
      ],
      "query": {
        "as_string": "function=count&group_by=SEX",
        "dataset": {
          "name": "sample",
          "resources": [
            {
              "name": "person",
              "title": "person",
              "unit": {
                "plural": "people",
                "singular": "person"
              }
            }
          ]
        },
        "function": [
          {
            "as_string": "count",
            "type": {
              "name": "count",
              "parameter_types": [
                {
                  "is_required": false,
                  "title": "exclude any sentinel (eg. missing) values of"
                }
              ],
              "title": "Count"
            }
          }
        ],
        "group_by": [
          {
            "as_string": "SEX",
            "name": "SEX",
            "title": "Sex"
          }
        ]
      },
      "values": [
        [
          "F",
          64
        ],
        [
          "M",
          78
        ],
        [
          "",
          67
        ]
      ]
    }
  ],
  "values": [
    [
      "001",
      "F",
      22
    ],
    [
      "001",
      "M",
      24
    ],
    [
      "001",
      "",
      19
    ],
    [
      "002",
      "F",
      25
    ],
    [
      "002",
      "M",
      33
    ],
    [
      "002",
      "",
      22
    ],
    [
      "003",
      "F",
      15
    ],
    [
      "003",
      "M",
      26
    ],
    [
      "003",
      "",
      27
    ]
  ]
}

This response indicates that the total number of people is 204, of which

Two relatively complex queries

To illustrate simultaneous usage of the various possible components of a query, this section discusses a couple of examples of relatively complex queries.

First, suppose that you wish to query the multi-table joined_person dataset to find out the (perturbed) number of people who have a stated sex and an income between $40,000 and $49,999 per year, broken down by the postcodes of their households. You can determine the number in question by issuing the following query:

https://protari.example.com/v1/datasets/joined_person/aggregation/csv?group_by=household.POSTCODE,SEX&where=SEX=M;F,INCOME=40-49.999

This query, which includes

returns

household.POSTCODE,SEX,perturbed count
6722,M,0
6722,F,7
6725,M,0
6725,F,0
0872,M,0
0872,F,0
6762,M,0
6762,F,0
6642,M,0
6642,F,0
6710,M,0
6710,F,0
6429,M,0
6429,F,0

This result indicates that there are 7 females earning between $40,000 and $49,999 per year whose household has postcode 6722 and, moreover, that if there are any other people in that income bracket, they must be of unstated sex. (The fact that there are many zero perturbed counts is a consequence of the perturbation algorithm used by Protari, which perturbs all very low true counts to zero.)

The query corresponds to the following SQL statement:

SELECT count(*)
FROM
    (SELECT joined_person._id AS _id, joined_household."POSTCODE" AS _internal_name_0, joined_person."SEX" AS _internal_name_1
     FROM joined_person JOIN joined_household ON joined_person.household_id = joined_household._id
     WHERE (joined_person."SEX" = 'M' OR joined_person."SEX" = 'F') AND joined_person."INCOME" >= 40 AND joined_person."INCOME" < 50
     GROUP BY joined_person._id, joined_household."POSTCODE", joined_person."SEX", joined_person."RECORD_KEY") AS multi_table_sql_joined_data 
GROUP BY
    CASE
        WHEN (multi_table_sql_joined_data._internal_name_0 = '6722') THEN '6722'
        WHEN (multi_table_sql_joined_data._internal_name_0 = '6725') THEN '6725'
        WHEN (multi_table_sql_joined_data._internal_name_0 = '0872') THEN '0872'
        WHEN (multi_table_sql_joined_data._internal_name_0 = '6762') THEN '6762'
        WHEN (multi_table_sql_joined_data._internal_name_0 = '6642') THEN '6642'
        WHEN (multi_table_sql_joined_data._internal_name_0 = '6710') THEN '6710'
        WHEN (multi_table_sql_joined_data._internal_name_0 = '6429') THEN '6429'
        ELSE 'out-of-range'
    END,
    CASE
        WHEN (multi_table_sql_joined_data._internal_name_1 = 'F') THEN 'F'
        WHEN (multi_table_sql_joined_data._internal_name_1 = 'M') THEN 'M'
        WHEN (multi_table_sql_joined_data._internal_name_1 = '') THEN ''
        ELSE 'out-of-range'
    END

Note that the condition on the INCOME field refers to a single component range (ie. a single value) of the field. If one wished to allow for a larger range of incomes - say the range of incomes between $20,000 and $69,999 per year - it is not sufficient to simply replace INCOME=40-49.999 with INCOME=20-69.999 in the where clause of the query. One would also need to include the INCOME field in the group-by clause (since the new condition on the INCOME field would involve multiple values). Note also that attention is restricted to people with stated sex by means of a disjunctive condition on the SEX field involving the values "M" and "F". Such restriction cannot be accomplished by means of a condition that is satisfied if and only if the value of the SEX field is not "" (ie. the empty string, which represents an unstated sex), because Protari does not include a negation operator (in contrast to SQL, which includes the NOT operator). Thus conditions involving negations are possible for fields having more than two values only if the allow_disjunctions_and_custom_groups flag for the dataset is set to true.

Second, suppose that you wish to query the sample dataset to find out the (perturbed) number of people who reside in Australia (including Norfolk Island and Australian External Territories) or Melanesia (excluding New Caledonia) and whose residence has between 0 and 4 bedrooms, broken down according to their countries of residence and whether or not their dollar amounts are "extreme" (where a dollar amount is considered to be extreme if it is less than 10,000 or greater than or equal to 90,000). You can determine the number in question by issuing the following query:

https://protari.example.com/v1/datasets/sample/aggregation/csv?group_by=COUNTRY,AMOUNT:10-89.99&where=COUNTRY=1100;1302;1303;1304,ROOMS=0-4

This query, which includes

returns

COUNTRY,AMOUNT,perturbed count
1101,10-89.99,31
1101,other,9
1102,10-89.99,4
1102,other,0
1199,10-89.99,3
1199,other,5
1302,10-89.99,0
1302,other,0
1303,10-89.99,6
1303,other,0
1304,10-89.99,0
1304,other,5

and corresponds to the following SQL statement:

SELECT count(*)
FROM sample
WHERE (sample."COUNTRY" = '1101' OR sample."COUNTRY" = '1102' OR sample."COUNTRY" = '1199' OR sample."COUNTRY" = '1302' OR sample."COUNTRY" = '1303' OR sample."COUNTRY" = '1304') AND (sample."ROOMS" = 0 OR sample."ROOMS" = 1 OR sample."ROOMS" = 2 OR sample."ROOMS" = 3 OR sample."ROOMS" = 4)
GROUP BY
    CASE 
        WHEN (sample."COUNTRY" = '1000') THEN '1000'
        WHEN (sample."COUNTRY" = '1100') THEN '1100'
        WHEN (sample."COUNTRY" = '1101') THEN '1101'
        WHEN (sample."COUNTRY" = '1102') THEN '1102'
        WHEN (sample."COUNTRY" = '1199') THEN '1199'
        WHEN (sample."COUNTRY" = '1201') THEN '1201'
        WHEN (sample."COUNTRY" = '1300') THEN '1300'
        WHEN (sample."COUNTRY" = '1301') THEN '1301'
        WHEN (sample."COUNTRY" = '1302') THEN '1302'
        WHEN (sample."COUNTRY" = '1303') THEN '1303'
        WHEN (sample."COUNTRY" = '1304') THEN '1304'
        WHEN (sample."COUNTRY" = '9999') THEN '9999'
        WHEN (sample."COUNTRY" IS NULL) THEN 'null'
        ELSE 'out-of-range'
    END,
    CASE
        WHEN (sample."AMOUNT" >= 10 AND sample."AMOUNT" < 20 OR sample."AMOUNT" >= 20 AND sample."AMOUNT" < 30 OR sample."AMOUNT" >= 30 AND sample."AMOUNT" < 40 OR sample."AMOUNT" >= 40 AND sample."AMOUNT" < 50 OR sample."AMOUNT" >= 50 AND sample."AMOUNT" < 60 OR sample."AMOUNT" >= 60 AND sample."AMOUNT" < 70 OR sample."AMOUNT" >= 70 AND sample."AMOUNT" < 80 OR sample."AMOUNT" >= 80 AND sample."AMOUNT" < 90) THEN '10-89.99'
        ELSE 'other'
    END

The condition on the COUNTRY field in the where clause of the query ensures that all countries other than those in Australia (including Norfolk Island and Australian External Territories) or Melanesia (excluding New Caledonia) are excluded from the grouping for the COUNTRY field. The "other" group for the AMOUNT field corresponds to extreme dollar amounts.

Providing an Authorization Header

All requests can include an Authorization header to access datasets with authorization requirements, using a tool like curl. For example, when using OpenID Connect, you will need to get a token from the authentication provider, and provide it in the request header. Specifically, replace TOKEN in the following with the actual token:

curl -X GET --header 'Accept: application/json' --header 'authorization: Bearer TOKEN' 'https://protari.example.com/v1/datasets/'