Range and Numeric Data Fields

Range data fields are fields that can only be queried in allowed ranges. This includes both fields with numeric data (eg. with an exact weekly rent amount, an integer age in years), and date fields (eg. a transaction date). It does not include integer fields where the values are categorical (eg. a numeric country code or postcode).

Fields with type "number" or "date" are always considered to have range data; fields with type "integer" may or may not.

A field is considered to have numeric range data if and only if:

In the first case, the type defaults to number, if not already explicitly defined. In the remaining cases, the type defaults to integer, if not already explicitly defined.

Hierarchical values are not allowed on range fields.

Numeric fields must meet the following conditions or else Protari will issue an error and not start:

The only sentinel value allowed on date fields is "null".

Further, if the "db-values" validation check is performed (the default), then:

You can configure a range field so that Protari restricts the range over which it can be queried. Eg. You can restrict ages above 100 from being queried, except as a group.

If an integer field is not numeric, it is categorical. You can still query categorical integer fields using the range notation, eg. the COUNTRY field in the sample dataset:

http://localhost:8080/v1/datasets/sample/aggregation?where=COUNTRY=1000-1110

Note that range notation - whether on categorical or range data fields - does not expand into nested values; in the example above, the returned count is 100, whereas the query http://localhost:8080/v1/datasets/sample/aggregation?where=COUNTRY=1000 counts the number of people from Oceania (1000), which includes all the values to at least 1304, and returns 178.

The following sections give examples of configuring numeric data fields.

A numeric integer field

This is the "ROOMS" field from the sample dataset:

  {
    "fields": [
      {
        "can_be_below_minimum": false,
        "maximum": 7,
        "minimum": 0,
        "name": "ROOMS",
        "title": "Number of Bedrooms",
        "type": "integer",
        "values": [
          {
            "name": 0,
            "title": "None (including bedsits)"
          },
          {
            "is_sentinel": true,
            "name": null,
            "title": "Not stated"
          },
          {
            "is_sentinel": true,
            "name": 999,
            "title": "Not applicable"
          }
        ]
      }
    ]
  }

Note:

http://localhost:8080/v1/datasets/sample presents this as:

        {
          "can_be_above_maximum": true,
          "can_be_below_minimum": false,
          "exclusive_maximum": 8,
          "has_numeric_data": true,
          "has_range_data": true,
          "maximum": 7,
          "minimum": 0,
          "name": "ROOMS",
          "title": "Number of Bedrooms",
          "type": "integer",
          "values": [
            {
              "is_sentinel": true,
              "name": "null",
              "title": "Not stated"
            },
            {
              "is_sentinel": true,
              "name": "999",
              "title": "Not applicable"
            },
            {
              "name": "0",
              "title": "None (including bedsits)",
              "value": 0
            },
            {
              "name": "1",
              "title": "1",
              "value": 1
            },
            {
              "name": "2",
              "title": "2",
              "value": 2
            },
            {
              "name": "3",
              "title": "3",
              "value": 3
            },
            {
              "name": "4",
              "title": "4",
              "value": 4
            },
            {
              "name": "5",
              "title": "5",
              "value": 5
            },
            {
              "name": "6",
              "title": "6",
              "value": 6
            },
            {
              "name": "7",
              "title": "7",
              "value": 7
            },
            {
              "exclusions": [
                999
              ],
              "minimum": 8,
              "name": ">=8",
              "title": ">=8"
            }
          ]
        }

Here are some valid queries on ROOMS:

http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=3
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=0
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=null
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=3-6
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=2;4-6
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS>=4
http://localhost:8080/v1/datasets/sample/aggregation?group_by=ROOMS

The last query returns the record count for each value of ROOMS, eg:

  "values": [
    [
      "0",
      12
    ],
    [
      "1",
      15
    ],
    [
      "2",
      18
    ],
    [
      "3",
      25
    ],
    [
      "4",
      44
    ],
    [
      "5",
      37
    ],
    [
      "6",
      9
    ],
    [
      "7",
      12
    ],
    [
      "999",
      9
    ],
    [
      ">=8",
      24
    ],
    [
      "null",
      6
    ]
  ]

Note the field value names are all returned as strings, even though some or all of them correspond to numbers.

For comparison, these queries are not allowed, because can_be_below_minimum is false:

http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS<0
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS<4

A numeric field with a custom interval size

If the following configuration line is added to "ROOMS":

    "interval_size": 4,

and the following lines are removed from its values:

  {
    "name": 0,
    "title": "None (including bedsits)"
  },

then http://localhost:8080/v1/datasets/sample presents this as:

    {
      "can_be_above_maximum": true,
      "can_be_below_minimum": false,
      "exclusive_maximum": 8,
      "has_numeric_data": true,
      "has_range_data": true,
      "maximum": 7,
      "minimum": 0,
      "name": "ROOMS",
      "title": "Number of Bedrooms",
      "type": "integer",
      "values": [
        {
          "is_sentinel": true,
          "name": "null",
          "title": "Not stated"
        },
        {
          "is_sentinel": true,
          "name": "999",
          "title": "Not applicable"
        },
        {
          "exclusive_maximum": 4,
          "maximum": 3,
          "minimum": 0,
          "name": "0-3"
        },
        {
          "exclusive_maximum": 8,
          "maximum": 7,
          "minimum": 4,
          "name": "4-7"
        },
        {
          "minimum": 8,
          "name": ">=8"
        }
      ]
    }

Single values of ROOMS can no longer be queried, only queries over ranges whose ends match the intervals, eg.

http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=0-3
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=0-7
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS>=4

A (floating point) number field

This is the "AMOUNT" field from the sample dataset:

  {
    "fields": [
      {
        "precision": 0.01,
        "exclusive_maximum": 100,
        "interval_size": 10,
        "minimum": 0,
        "name": "AMOUNT",
        "title": "Dollar Amount",
        "type": "number"
      }
    ]
  }

Note:

http://localhost:8080/v1/datasets/sample presents this as:

as

{
          {
      "can_be_above_maximum": true,
      "can_be_below_minimum": true,
      "decimal_places": 2,
      "exclusive_maximum": 100,
      "has_numeric_data": true,
      "has_range_data": true,
      "maximum": 99.99,
      "minimum": 0,
      "name": "AMOUNT",
      "precision": 0.01,
      "title": "Dollar Amount",
      "type": "number",
      "values": [
        {
          "exclusive_maximum": 0,
          "maximum": -0.01,
          "name": "<0",
          "title": "<0"
        },
        {
          "exclusive_maximum": 10,
          "maximum": 9.99,
          "minimum": 0,
          "name": "0-9.99",
          "title": "0-9.99"
        },
        {
          "exclusive_maximum": 20,
          "maximum": 19.99,
          "minimum": 10,
          "name": "10-19.99",
          "title": "10-19.99"
        },
        {
          "exclusive_maximum": 30,
          "maximum": 29.99,
          "minimum": 20,
          "name": "20-29.99",
          "title": "20-29.99"
        },
        {
          "exclusive_maximum": 40,
          "maximum": 39.99,
          "minimum": 30,
          "name": "30-39.99",
          "title": "30-39.99"
        },
        {
          "exclusive_maximum": 50,
          "maximum": 49.99,
          "minimum": 40,
          "name": "40-49.99",
          "title": "40-49.99"
        },
        {
          "exclusive_maximum": 60,
          "maximum": 59.99,
          "minimum": 50,
          "name": "50-59.99",
          "title": "50-59.99"
        },
        {
          "exclusive_maximum": 70,
          "maximum": 69.99,
          "minimum": 60,
          "name": "60-69.99",
          "title": "60-69.99"
        },
        {
          "exclusive_maximum": 80,
          "maximum": 79.99,
          "minimum": 70,
          "name": "70-79.99",
          "title": "70-79.99"
        },
        {
          "exclusive_maximum": 90,
          "maximum": 89.99,
          "minimum": 80,
          "name": "80-89.99",
          "title": "80-89.99"
        },
        {
          "exclusive_maximum": 100,
          "maximum": 99.99,
          "minimum": 90,
          "name": "90-99.99",
          "title": "90-99.99"
        },
        {
          "minimum": 100,
          "name": ">=100",
          "title": ">=100"
        }
      ]
    }

Here are some valid queries on AMOUNT:

http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT<10
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT=10-19.99
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT=10-29.99
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT<10;50-69.99
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT>=70

Here is how the third of these queries is returned:

    "where": [
      {
        "as_string": "AMOUNT=10-29.99",
        "condition_values": {
          "equals": [
            {
              "as_string": "10-29.99",
              "exclusive_maximum": 30.0,
              "maximum": 29.99,
              "minimum": 10
            }
          ]
        },
        "name": "AMOUNT",
        "title": "Dollar Amount",
        "values_as_received": {
          "equals": [
            {
              "as_string": "10-29.99",
              "exclusive_maximum": 30.0,
              "maximum": 29.99,
              "minimum": 10
            }
          ]
        }
      }
    ]

Note this contains:

Here is a "group by" query:

http://localhost:8080/v1/datasets/sample/aggregation?group_by=AMOUNT

This returns the record count in each of the specified intervals, eg:

  "values": [
    [
      "0-9.99",
      31
    ],
    [
      "10-19.99",
      15
    ],
    [
      "20-29.99",
      25
    ],
    [
      "30-39.99",
      21
    ],
    [
      "40-49.99",
      15
    ],
    [
      "50-59.99",
      25
    ],
    [
      "60-69.99",
      18
    ],
    [
      "70-79.99",
      15
    ],
    [
      "80-89.99",
      18
    ],
    [
      "90-99.99",
      21
    ]
  ]

Note that the ends of the ranges are specified using inclusive values, eg. 10-19.99. In fact, the query executed by Protari is done using the corresponding exclusive maximum, ie. on values greater than or equal to 10 and less than 20. This sleight of hand is necessary so that any values in the data between 19.99 and 20 are captured in one of the groups, but without introducing a potentially confusing syntax into Protari.

A field with range boundaries

Suppose "AMOUNT" ranged from 0 to 1 million, and we want to present intervals of increasing size as the amounts get bigger. Instead of defining an interval size, a minimum and maximum, we could define the field using range_boundaries, eg:

  {
    "fields": [
      {
        "decimal_places": 2,
        "range_boundaries": [0, 1000, 20000, 100000, 1000000],
        "name": "AMOUNT",
        "title": "Dollar Amount",
        "type": "number"
      }
    ]
  }

http://localhost:8080/v1/datasets/sample presents this as:

  {
      "can_be_above_maximum": true,
      "can_be_below_minimum": true,
      "decimal_places": 2,
      "exclusive_maximum": 1000000,
      "has_numeric_data": true,
      "has_range_data": true,
      "maximum": 999999.99,
      "minimum": 0,
      "name": "AMOUNT",
      "precision": 0.01,
      "title": "Dollar Amount",
      "type": "number",
      "values": [
        {
          "exclusive_maximum": 0,
          "maximum": -0.01,
          "name": "<0"
        },
        {
          "exclusive_maximum": 1000,
          "maximum": 999.99,
          "minimum": 0,
          "name": "0-999.99"
        },
        {
          "exclusive_maximum": 20000,
          "maximum": 19999.99,
          "minimum": 1000,
          "name": "1000-19999.99"
        },
        {
          "exclusive_maximum": 100000,
          "maximum": 99999.99,
          "minimum": 20000,
          "name": "20000-99999.99"
        },
        {
          "exclusive_maximum": 1000000,
          "maximum": 999999.99,
          "minimum": 100000,
          "name": "100000-999999.99"
        },
        {
          "minimum": 1000000,
          "name": ">=1000000"
        }
    }

A date field

This is the "EXPIRY" field from the sample dataset:

  {
    "fields": [
      {
        "can_be_below_minimum": false,
        "exclusive_maximum": "2021-01-01",
        "interval_size": "P5Y",
        "minimum": "1970-01-01",
        "name": "EXPIRY",
        "precision": "P1D",
        "title": "Expiry date",
        "type": "date",
        "values": [
          {
            "is_sentinel": true,
            "name": null,
            "title": "Unknown"
          }
        ]
      }
    ]
  }

Note:

http://localhost:8080/v1/datasets/sample presents this as:

as

    {
      "can_be_above_maximum": true,
      "can_be_below_minimum": false,
      "exclusive_maximum": "2021-01-01",
      "has_numeric_data": false,
      "has_range_data": true,
      "maximum": "2020-12-31",
      "minimum": "1970-01-01",
      "name": "EXPIRY",
      "precision": "P1D",
      "title": "Expiry date",
      "type": "date",
      "values": [
        {
          "is_sentinel": true,
          "name": "null",
          "title": "Unknown"
        },
        {
          "exclusive_maximum": "1975-01-01",
          "maximum": "1974-12-31",
          "minimum": "1970-01-01",
          "name": "1970-01-01/1974-12-31",
          "title": "1970-01-01 to 1974-12-31"
        },
        {
          "exclusive_maximum": "1980-01-01",
          "maximum": "1979-12-31",
          "minimum": "1975-01-01",
          "name": "1975-01-01/1979-12-31",
          "title": "1975-01-01 to 1979-12-31"
        },
        {
          "exclusive_maximum": "1985-01-01",
          "maximum": "1984-12-31",
          "minimum": "1980-01-01",
          "name": "1980-01-01/1984-12-31",
          "title": "1980-01-01 to 1984-12-31"
        },
        {
          "exclusive_maximum": "1990-01-01",
          "maximum": "1989-12-31",
          "minimum": "1985-01-01",
          "name": "1985-01-01/1989-12-31",
          "title": "1985-01-01 to 1989-12-31"
        },
        {
          "exclusive_maximum": "1995-01-01",
          "maximum": "1994-12-31",
          "minimum": "1990-01-01",
          "name": "1990-01-01/1994-12-31",
          "title": "1990-01-01 to 1994-12-31"
        },
        {
          "exclusive_maximum": "2000-01-01",
          "maximum": "1999-12-31",
          "minimum": "1995-01-01",
          "name": "1995-01-01/1999-12-31",
          "title": "1995-01-01 to 1999-12-31"
        },
        {
          "exclusive_maximum": "2005-01-01",
          "maximum": "2004-12-31",
          "minimum": "2000-01-01",
          "name": "2000-01-01/2004-12-31",
          "title": "2000-01-01 to 2004-12-31"
        },
        {
          "exclusive_maximum": "2010-01-01",
          "maximum": "2009-12-31",
          "minimum": "2005-01-01",
          "name": "2005-01-01/2009-12-31",
          "title": "2005-01-01 to 2009-12-31"
        },
        {
          "exclusive_maximum": "2015-01-01",
          "maximum": "2014-12-31",
          "minimum": "2010-01-01",
          "name": "2010-01-01/2014-12-31",
          "title": "2010-01-01 to 2014-12-31"
        },
        {
          "exclusive_maximum": "2020-01-01",
          "maximum": "2019-12-31",
          "minimum": "2015-01-01",
          "name": "2015-01-01/2019-12-31",
          "title": "2015-01-01 to 2019-12-31"
        },
        {
          "exclusive_maximum": "2021-01-01",
          "maximum": "2020-12-31",
          "minimum": "2020-01-01",
          "name": "2020-01-01/2020-12-31",
          "title": "2020-01-01 to 2020-12-31"
        },
        {
          "minimum": "2021-01-01",
          "name": ">=2021-01-01",
          "title": ">=2021-01-01"
        }
      ]
    }

Here are some valid queries on EXPIRY:

http://localhost:8080/v1/datasets/sample/aggregation?where=EXPIRY=2010-01-01/2014-12-31
http://localhost:8080/v1/datasets/sample/aggregation?where=EXPIRY=2010-01-01/2019-12-31
http://localhost:8080/v1/datasets/sample/aggregation?where=EXPIRY>=2010-01-01
http://localhost:8080/v1/datasets/sample/aggregation?where=EXPIRY=2000-01-01/2009-12-31;>=2015-01-01

Note the query syntax uses ISO8601's "start/end" syntax.

Here is how the second of these queries is returned:

    "where": [
      {
        "as_string": "EXPIRY=2010-01-01/2019-12-31",
        "condition_values": {
          "equals": [
            {
              "as_string": "2010-01-01/2019-12-31",
              "component_ranges": [
                {
                  "exclusive_maximum": "2015-01-01",
                  "maximum": "2014-12-31",
                  "minimum": "2010-01-01",
                  "name": "2010-01-01/2014-12-31"
                },
                {
                  "exclusive_maximum": "2020-01-01",
                  "maximum": "2019-12-31",
                  "minimum": "2015-01-01",
                  "name": "2015-01-01/2019-12-31"
                }
              ],
              "exclusive_maximum": "2020-01-01",
              "maximum": "2019-12-31",
              "minimum": "2010-01-01"
            }
          ]
        },
        "name": "EXPIRY",
        "title": "Expiry date",
        "values_as_received": {
          "equals": [
            {
              "as_string": "2010-01-01/2019-12-31"
            }
          ]
        }
      }
    ]

Note this contains:

Here is a "group by" query:

http://localhost:8080/v1/datasets/sample/aggregation?group_by=EXPIRY

This returns the record count in each of the specified intervals, eg:

  "values": [
    [
      "null",
      12
    ],
    [
      "1970-01-01/1974-12-31",
      12
    ],
    [
      "1975-01-01/1979-12-31",
      27
    ],
    [
      "1980-01-01/1984-12-31",
      12
    ],
    [
      "1985-01-01/1989-12-31",
      15
    ],
    [
      "1990-01-01/1994-12-31",
      18
    ],
    [
      "1995-01-01/1999-12-31",
      18
    ],
    [
      "2000-01-01/2004-12-31",
      18
    ],
    [
      "2005-01-01/2009-12-31",
      18
    ],
    [
      "2010-01-01/2014-12-31",
      27
    ],
    [
      "2015-01-01/2019-12-31",
      27
    ],
    [
      "2020-01-01/2020-12-31",
      0
    ],
    [
      ">=2021-01-01",
      0
    ]
  ]

Note that the ends of the ranges are specified using inclusive values, as for numeric fields.

Other properties for fields with numeric data

has_fixed_decimal

A boolean flag for whether the number of decimal places reported on means should be held fixed at the field's number of decimal places, or increased with the count.

Eg. The average of 10 numbers, each of which is accurate to 2 decimal places, is accurate to 3 decimal places. By default, Protari shows the result of such a calculation to 3 decimal places. If the value is a dollar amount, however, it may still be better to show the result to 2 decimal places.

Limitations

The SQL data interface sets a default maximum of 160 groupings allowed on a numeric field, to accommodate a range of SQL varieties.

This means by default a numeric integer field cannot have more than 158 distinct values, including sentinel values. In order to allow more, set both can_be_below_minimum and can_be_above_maximum to false on the field.

You can change this by setting max_custom_groups as an argument of get_aggregated_sql_data in the global settings.