Numeric Data Fields

Fields with type "number" are always considered to have numeric data. Fields with type "integer" may or may not; integer fields are often used for categorical data as well (eg. for a numeric country code).

A field is considered to have numeric data if and only if:

In the first case, the type defaults to number, if not already explicitly defined. In the remaining cases, the type defaults to integer, if not already explicitly defined.

Numeric fields must meet the following conditions or else Protari will issue an error and not start:

Further, if the "db-values" validation check is performed (the default), then:

You can configure a numeric field so that Protari restricts the range over which it can be queried. Eg. You can restrict ages above 100 from being queried, except as a group.

If an integer field is not numeric, it is categorical. You can still query categorical integer fields using the range notation, eg. the COUNTRY field in the sample dataset:

http://localhost:8080/v1/datasets/sample/aggregation?where=COUNTRY=1000-1110

Note that range notation - whether on categorical or numeric data fields - does not expand into nested values; in the example above, the returned count is 100, whereas the query http://localhost:8080/v1/datasets/sample/aggregation?where=COUNTRY=1000 counts the number of people from Oceania (1000), which includes all the values to at least 1304, and returns 178.

The following sections give examples of configuring numeric data fields.

A numeric integer field

This is the "ROOMS" field from the sample dataset:

  {
    "fields": [
      {
        "can_be_below_minimum": false,
        "maximum": 7,
        "minimum": 0,
        "name": "ROOMS",
        "title": "Number of Bedrooms",
        "type": "integer",
        "values": [
          {
            "name": 0,
            "title": "None (including bedsits)"
          },
          {
            "is_sentinel": true,
            "name": null,
            "title": "Not stated"
          },
          {
            "is_sentinel": true,
            "name": 999,
            "title": "Not applicable"
          }
        ]
      }
    ]
  }

Note:

http://localhost:8080/v1/datasets/sample presents this as:

        {
          "can_be_above_maximum": true,
          "can_be_below_minimum": false,
          "exclusive_maximum": 8,
          "has_numeric_data": true,
          "maximum": 7,
          "minimum": 0,
          "name": "ROOMS",
          "title": "Number of Bedrooms",
          "type": "integer",
          "values": [
            {
              "is_sentinel": true,
              "name": "null",
              "title": "Not stated"
            },
            {
              "is_sentinel": true,
              "name": "999",
              "title": "Not applicable"
            },
            {
              "is_numeric": true,
              "name": "0",
              "title": "None (including bedsits)",
              "value": 0
            },
            {
              "is_numeric": true,
              "name": "1",
              "title": "1",
              "value": 1
            },
            {
              "is_numeric": true,
              "name": "2",
              "title": "2",
              "value": 2
            },
            {
              "is_numeric": true,
              "name": "3",
              "title": "3",
              "value": 3
            },
            {
              "is_numeric": true,
              "name": "4",
              "title": "4",
              "value": 4
            },
            {
              "is_numeric": true,
              "name": "5",
              "title": "5",
              "value": 5
            },
            {
              "is_numeric": true,
              "name": "6",
              "title": "6",
              "value": 6
            },
            {
              "is_numeric": true,
              "name": "7",
              "title": "7",
              "value": 7
            },
            {
              "exclusions": [
                999
              ],
              "is_numeric": true,
              "minimum": 8,
              "name": ">=8",
              "title": ">=8"
            }
          ]
        }

Here are some valid queries on ROOMS:

http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=3
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=0
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=null
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=3-6
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=2;4-6
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS>=4
http://localhost:8080/v1/datasets/sample/aggregation?group_by=ROOMS

The last query returns the record count for each value of ROOMS, eg:

  "values": [
    [
      "0",
      12
    ],
    [
      "1",
      15
    ],
    [
      "2",
      18
    ],
    [
      "3",
      25
    ],
    [
      "4",
      44
    ],
    [
      "5",
      37
    ],
    [
      "6",
      9
    ],
    [
      "7",
      12
    ],
    [
      "999",
      9
    ],
    [
      ">=8",
      24
    ],
    [
      "null",
      6
    ]
  ]

Note the field value names are all returned as strings, even though some or all of them correspond to numbers.

For comparison, these queries are not allowed, because can_be_below_minimum is false:

http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS<0
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS<4

A numeric field with a custom interval size

If the following configuration line is added to "ROOMS":

    "interval_size": 4,

and the following lines are removed from its values:

  {
    "name": 0,
    "title": "None (including bedsits)"
  },

then http://localhost:8080/v1/datasets/sample presents this as:

    {
      "can_be_above_maximum": true,
      "can_be_below_minimum": false,
      "exclusive_maximum": 8,
      "has_numeric_data": true,
      "maximum": 7,
      "minimum": 0,
      "name": "ROOMS",
      "title": "Number of Bedrooms",
      "type": "integer",
      "values": [
        {
          "is_sentinel": true,
          "name": "null",
          "title": "Not stated"
        },
        {
          "is_sentinel": true,
          "name": "999",
          "title": "Not applicable"
        },
        {
          "exclusive_maximum": 4,
          "is_numeric": true,
          "maximum": 3,
          "minimum": 0,
          "name": "0-3"
        },
        {
          "exclusive_maximum": 8,
          "is_numeric": true,
          "maximum": 7,
          "minimum": 4,
          "name": "4-7"
        },
        {
          "is_numeric": true,
          "minimum": 8,
          "name": ">=8"
        }
      ]
    }

Single values of ROOMS can no longer be queried, only queries over ranges whose ends match the intervals, eg.

http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=0-3
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS=0-7
http://localhost:8080/v1/datasets/sample/aggregation?where=ROOMS>=4

A (floating point) number field

This is the "AMOUNT" field from the sample dataset:

  {
    "fields": [
      {
        "decimal_places": 2,
        "exclusive_maximum": 100,
        "interval_size": 10,
        "minimum": 0,
        "name": "AMOUNT",
        "title": "Dollar Amount",
        "type": "number"
      }
    ]
  }

Note:

http://localhost:8080/v1/datasets/sample presents this as:

as

{
      "can_be_above_maximum": true,
      "can_be_below_minimum": true,
      "decimal_places": 2,
      "exclusive_maximum": 100,
      "has_numeric_data": true,
      "maximum": 99.99,
      "minimum": 0,
      "name": "AMOUNT",
      "precision": 0.01,
      "title": "Dollar Amount",
      "type": "number",
      "values": [
        {
          "exclusive_maximum": 0,
          "is_numeric": true,
          "maximum": -0.01,
          "name": "<0"
        },
        {
          "exclusive_maximum": 10,
          "is_numeric": true,
          "maximum": 9.99,
          "minimum": 0,
          "name": "0-9.99"
        },
        {
          "exclusive_maximum": 20,
          "is_numeric": true,
          "maximum": 19.99,
          "minimum": 10,
          "name": "10-19.99"
        },
        {
          "exclusive_maximum": 30,
          "is_numeric": true,
          "maximum": 29.99,
          "minimum": 20,
          "name": "20-29.99"
        },
        {
          "exclusive_maximum": 40,
          "is_numeric": true,
          "maximum": 39.99,
          "minimum": 30,
          "name": "30-39.99"
        },
        {
          "exclusive_maximum": 50,
          "is_numeric": true,
          "maximum": 49.99,
          "minimum": 40,
          "name": "40-49.99"
        },
        {
          "exclusive_maximum": 60,
          "is_numeric": true,
          "maximum": 59.99,
          "minimum": 50,
          "name": "50-59.99"
        },
        {
          "exclusive_maximum": 70,
          "is_numeric": true,
          "maximum": 69.99,
          "minimum": 60,
          "name": "60-69.99"
        },
        {
          "exclusive_maximum": 80,
          "is_numeric": true,
          "maximum": 79.99,
          "minimum": 70,
          "name": "70-79.99"
        },
        {
          "exclusive_maximum": 90,
          "is_numeric": true,
          "maximum": 89.99,
          "minimum": 80,
          "name": "80-89.99"
        },
        {
          "exclusive_maximum": 100,
          "is_numeric": true,
          "maximum": 99.99,
          "minimum": 90,
          "name": "90-99.99"
        },
        {
          "is_numeric": true,
          "minimum": 100,
          "name": ">=100"
        }
      ]
    }

Here are some valid queries on AMOUNT:

http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT<10
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT=10-19.99
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT=10-29.99
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT<10;50-69.99
http://localhost:8080/v1/datasets/sample/aggregation?where=AMOUNT>=70

Here is how the third of these queries is returned:

    "where": [
      {
        "as_string": "AMOUNT=10-29.99",
        "condition_values": {
          "equals": [
            {
              "as_string": "10-29.99",
              "exclusive_maximum": 30.0,
              "maximum": 29.99,
              "minimum": 10
            }
          ]
        },
        "name": "AMOUNT",
        "title": "Dollar Amount",
        "values_as_received": {
          "equals": [
            {
              "as_string": "10-29.99",
              "exclusive_maximum": 30.0,
              "maximum": 29.99,
              "minimum": 10
            }
          ]
        }
      }
    ]

Note this contains:

Here is a "group by" query:

http://localhost:8080/v1/datasets/sample/aggregation?group_by=AMOUNT

This returns the record count in each of the specified intervals, eg:

  "values": [
    [
      "0-9.99",
      31
    ],
    [
      "10-19.99",
      15
    ],
    [
      "20-29.99",
      25
    ],
    [
      "30-39.99",
      21
    ],
    [
      "40-49.99",
      15
    ],
    [
      "50-59.99",
      25
    ],
    [
      "60-69.99",
      18
    ],
    [
      "70-79.99",
      15
    ],
    [
      "80-89.99",
      18
    ],
    [
      "90-99.99",
      21
    ]
  ]

Note that the ends of the ranges are specified using inclusive values, eg. 10-19.99. In fact, the query executed by Protari is done using the corresponding exclusive maximum, ie. on values greater than or equal to 10 and less than 20. This sleight of hand is necessary so that any values in the data between 19.99 and 20 are captured in one of the groups, but without introducing a potentially confusing syntax into Protari.

A field with range boundaries

Suppose "AMOUNT" ranged from 0 to 1 million, and we want to present intervals of increasing size as the amounts get bigger. Instead of defining an interval size, a minimum and maximum, we could define the field using range_boundaries, eg:

  {
    "fields": [
      {
        "decimal_places": 2,
        "range_boundaries": [0, 1000, 20000, 100000, 1000000],
        "name": "AMOUNT",
        "title": "Dollar Amount",
        "type": "number"
      }
    ]
  }

http://localhost:8080/v1/datasets/sample presents this as:

  {
      "can_be_above_maximum": true,
      "can_be_below_minimum": true,
      "decimal_places": 2,
      "exclusive_maximum": 1000000,
      "has_numeric_data": true,
      "maximum": 999999.99,
      "minimum": 0,
      "name": "AMOUNT",
      "precision": 0.01,
      "title": "Dollar Amount",
      "type": "number",
      "values": [
        {
          "exclusive_maximum": 0,
          "is_numeric": true,
          "maximum": -0.01,
          "name": "<0"
        },
        {
          "exclusive_maximum": 1000,
          "is_numeric": true,
          "maximum": 999.99,
          "minimum": 0,
          "name": "0-999.99"
        },
        {
          "exclusive_maximum": 20000,
          "is_numeric": true,
          "maximum": 19999.99,
          "minimum": 1000,
          "name": "1000-19999.99"
        },
        {
          "exclusive_maximum": 100000,
          "is_numeric": true,
          "maximum": 99999.99,
          "minimum": 20000,
          "name": "20000-99999.99"
        },
        {
          "exclusive_maximum": 1000000,
          "is_numeric": true,
          "maximum": 999999.99,
          "minimum": 100000,
          "name": "100000-999999.99"
        },
        {
          "is_numeric": true,
          "minimum": 1000000,
          "name": ">=1000000"
        }
    }

Other properties for fields with numeric data

has_fixed_decimal

A boolean flag for whether the number of decimal places reported on means should be held fixed at the field's number of decimal places, or increased with the count.

Eg. The average of 10 numbers, each of which is accurate to 2 decimal places, is accurate to 3 decimal places. By default, Protari shows the result of such a calculation to 3 decimal places. If the value is a dollar amount, however, it may still be better to show the result to 2 decimal places.

Limitations

The SQL data interface sets a default maximum of 160 groupings allowed on a numeric field, to accommodate a range of SQL varieties.

This means by default a numeric integer field cannot have more than 158 distinct values, including sentinel values. In order to allow more, set both can_be_below_minimum and can_be_above_maximum to false on the field.

You can change this by setting max_custom_groups as an argument to the data interface in the global settings.