Configuring Multi-Table Datasets

Introduction

It is now possible to use Protari with a dataset stored as a SQL database consisting of several tables. This document describes the configuration for such a deployment. It should be read in conjunction with the general documentation on configuring a dataset here.

Protari joins data resources using inner joins, eg. a query such as group_by=buyer_transactions.type should be interpreted as: how many people had at least one transaction of each type, in which they were the buyer?

Protari does not currently support using more than one relationship to the same resource within one query. Attempts to submit queries with this structure will return an error.

Data Resources

In a multi-table dataset, there are several sets of fields, one for each table.

In order to accommodate this, a resources array has been added to the dataset configuration. Each data resource element within this array contains the metadata for one table in the dataset, including how that table is related to the others.

(This format is an extension of the Data Package specification. Each data resource is itself based on an extension of the Data Resource specification, using a schema that conforms to an extension of the Table Schema specification.)

Each data resource includes these additional fields:

There are also two additional fields that at the top level of dataset_config:

Relationships

Each data resource's foreign_keys are used to establish the relationships between all the resources, including the path back to the main resource. The foreign key's relationship_names are used to specify fields when querying the dataset, by prefixing them to the data resource's field names, eg. buyer_transactions.item.

The following examples should make this clearer. They are taken from the sample_joined dataset, whose complete configuration file is provided at the end.

A "One-to-Many" Example

The example below demonstrates a case where each unit on the main resource (eg. each person) has many entries in the joined resource (eg. one person can have many transactions).

The foreign_keys are always defined on the data resource which can have many entries per unit. In this example, that's the transaction data resource.

This example defines two relationships back to the main person data resource. The first links the person_id column in the transaction table to the _id column in the person table, and makes this link accessible through the API as buyer_transactions. The second links the provider_id column in the transaction table to the _id column in the person table, and makes this link accessible through the API as seller_transactions.

This means that the type field of the transaction data resource is accessible as either buyer_transactions.type or seller_transactions.type, depending on the relationship of interest.

{
  "resources": [
    {
      "name": "transaction",
      "foreign_keys": [
        {
          "fields": [
            "person_id"
          ],
          "forward_relationship_name": "buyer",
          "reference": {
            "fields": [
              "_id"
            ],
            "resource": "person",
            "reverse_relationship_name": "buyer_transactions",
            "reverse_relationship_title": "Buyer Transactions"
          }
        },
        {
          "fields": [
            "provider_id"
          ],
          "forward_relationship_name": "seller",
          "reference": {
            "fields": [
              "_id"
            ],
            "resource": "person",
            "reverse_relationship_name": "seller_transactions",
            "reverse_relationship_title": "Seller Transactions"
          }
        },
        ...
      ],
    ...
    },
    ...
  ],
  ...
}

In the above example, queries use the "reverse" relationship names to refer to the transaction table, because when moving from the main resource ('person') to the transaction resource, you are moving against the direction of the foreign key (which is defined on the transaction table).

The "forward" relationship names, buyer and seller, are not used in this example. See below for an example that uses the forward names.

A "Many-to-One" Foreign Key Example

The example below demonstrates a case where multiple units on the main resource (eg. people) can have a single entry in the joined resource (eg. many people can live in one household). (This example is taken from the sample_joined dataset.)

As noted before, the foreign_keys are always defined on the data resource which can have many entries per unit. In this example, that's the main person data resource.

This example defines a single relationship to the household data resource, which links the household_id column in the person table to the _id column in the household table, and makes this link accessible through the API as household.

This means that the postcode field of the household data resource is accessible as household.postcode.

  "resources": [
    {
      "name": "person",
      "foreign_keys": [
        {
          "fields": [
            "household_id"
          ],
          "forward_relationship_name": "household",
          "reference": {
            "fields": [
              "_id"
            ],
            "resource": "household",
            "reverse_relationship_name": "people"
          }
        }
      ],
    ...
    },
    ...
  ],
  ...
}

In this case, it is the forward_relationship_name (household) that is used to refer to the relationship, because the foreign key is defined on people (the main resource) and points towards household.

A Sample Multi-Table Configuration

The below provides a complete example of how a multiple table dataset might be configured. In this case, there are 4 data resources, corresponding to 4 tables, in the dataset: person, household, transaction and item. The main resource is person which means that all queries should be interpreted in terms of the unit of that data resource, people.

Imagine this as:

The foreign_keys effectively encode the clauses in the above list.

{
  "description": "A sample dataset with joined tables",
  "main_resource_name": "person",
  "name": "sample_joined",
  "query_class_definitions": {
    "aggregation": {
      "allowed_functions": [
        "count",
        "mean",
        "sum"
      ],
      "run": "sql_tbe_multi_table_pipeline",
      "transform_definitions": {
        "check_field_exclusions": {
          "parameters": {
            "field_exclusion_rules": [
              {
                "limit": 4
              }
            ]
          }
        },
        "get_sql_top_rows": {
          "parameters": {
            "parameters": {
              "number": 4,
              "record_key_name": "{tbe_perturb.record_key_name}",
              "value_tiebreaker_name": "{tbe_perturb.record_key_name}"
            }
          }
        },
        "tbe_perturb": {
          "parameters": {
            "bigN": 1766976779,
            "m": [
              0.5,
              0.3,
              0.15,
              0.05
            ],
            "p_filename": "pmatrix1",
            "record_key_name": "RECORD_KEY",
            "s_filename": "smatrix1",
            "smallC": 5,
            "smallN": 7
          }
        }
      }
    }
  },
  "requires_permission_to_query": false,
  "requires_permission_to_see": false,
  "resources": [
    {
      "fields": [
        {
          "name": "type",
          "title": "Transaction Type",
          "type": "string",
          "values": [
            {
              "name": "A",
              "title": "avocado"
            },
            {
              "name": "B",
              "title": "banana"
            },
            {
              "name": "C",
              "title": "cucumber"
            },
            {
              "name": "D",
              "title": "dragonfruit"
            },
            {
              "name": "E",
              "title": "eggplant"
            },
            {
              "name": "F",
              "title": "fig"
            },
            {
              "name": "G",
              "title": "ginger"
            }
          ]
        },
        {
          "can_be_above_maximum": false,
          "can_be_below_minimum": false,
          "decimal_places": 2,
          "exclusive_maximum": 150,
          "interval_size": 5,
          "minimum": 0,
          "name": "value",
          "title": "Transaction Value",
          "type": "number",
          "values": [
            {
              "is_sentinel": true,
              "name": null,
              "title": "unknown"
            }
          ]
        }
      ],
      "foreign_keys": [
        {
          "fields": [
            "person_id"
          ],
          "forward_relationship_name": "buyer",
          "reference": {
            "fields": [
              "_id"
            ],
            "resource": "person",
            "reverse_relationship_name": "buyer_transactions",
            "reverse_relationship_title": "Buyer Transactions"
          }
        },
        {
          "fields": [
            "provider_id"
          ],
          "forward_relationship_name": "seller",
          "reference": {
            "fields": [
              "_id"
            ],
            "resource": "person",
            "reverse_relationship_name": "seller_transactions",
            "reverse_relationship_title": "Seller Transactions"
          }
        },
        {
          "fields": [
            "item_id"
          ],
          "forward_relationship_name": "item",
          "reference": {
            "fields": [
              "_id"
            ],
            "resource": "item",
            "reverse_relationship_name": "transactions"
          }
        }
      ],
      "name": "transaction",
      "table_name": "transaction",
      "unit": {
        "plural": "transaction",
        "singular": "transactions"
      },
      "unit_id_name": "_id"
    },
    {
      "fields": [
        {
          "fields": [
            {
              "description": "Type of dwelling of usual residence",
              "name": "DWELL",
              "title": "Dwelling Type",
              "type": "string",
              "values": [
                {
                  "name": "001",
                  "title": "House"
                },
                {
                  "name": "002",
                  "title": "Apartment"
                },
                {
                  "name": "003",
                  "title": "Other"
                }
              ]
            },
            {
              "can_be_below_minimum": false,
              "exclusive_maximum": 8,
              "minimum": 0,
              "name": "ROOMS",
              "title": "Number of Bedrooms",
              "type": "integer",
              "values": [
                {
                  "name": 0,
                  "title": "None (including bedsits)"
                },
                {
                  "is_sentinel": true,
                  "name": null,
                  "title": "Not stated"
                },
                {
                  "is_sentinel": true,
                  "name": 999,
                  "title": "Not applicable"
                }
              ]
            }
          ],
          "title": "Dwelling"
        },
        {
          "description": "Postcode of usual residence",
          "name": "POSTCODE",
          "region_type": "poa",
          "title": "Postcode",
          "type": "string",
          "values": [
            {
              "name": "6722"
            },
            {
              "name": "6725"
            },
            {
              "name": "0872"
            },
            {
              "name": "6762"
            },
            {
              "name": "6642"
            },
            {
              "name": "6710"
            },
            {
              "name": "6429"
            }
          ]
        },
        {
          "can_be_above_maximum": true,
          "can_be_below_minimum": false,
          "decimal_places": -1,
          "exclusive_maximum": 8000,
          "interval_size": 1000,
          "minimum": 0,
          "name": "RENT",
          "title": "Weekly rent in dollars",
          "type": "number"
        }
      ],
      "name": "household",
      "table_name": "household",
      "unit": {
        "plural": "household",
        "singular": "households"
      },
      "unit_id_name": "_id"
    },
    {
      "fields": [
        {
          "name": "SEX",
          "title": "Sex",
          "type": "string",
          "values": [
            {
              "name": "F",
              "title": "Female"
            },
            {
              "name": "M",
              "title": "Male"
            },
            {
              "is_sentinel": true,
              "name": "",
              "title": "Not stated"
            }
          ]
        },
        {
          "name": "COUNTRY",
          "title": "Country of Birth",
          "type": "integer",
          "values": [
            {
              "name": 1000,
              "title": "Oceania",
              "values": [
                {
                  "name": 1100,
                  "title": "Australia (including External Territories)",
                  "values": [
                    {
                      "name": 1101,
                      "title": "Australia"
                    },
                    {
                      "name": 1102,
                      "title": "Norfolk Island"
                    },
                    {
                      "name": 1199,
                      "title": "Australian External Territories"
                    }
                  ]
                },
                {
                  "name": 1201,
                  "title": "New Zealand"
                },
                {
                  "name": 1300,
                  "title": "Melanesia",
                  "values": [
                    {
                      "name": 1301,
                      "title": "New Caledonia"
                    },
                    {
                      "name": 1302,
                      "title": "Papua New Guinea"
                    },
                    {
                      "name": 1303,
                      "title": "Solomon Islands"
                    },
                    {
                      "name": 1304,
                      "title": "Vanuatu"
                    }
                  ]
                }
              ]
            },
            {
              "name": 9999,
              "title": "Other"
            },
            {
              "is_sentinel": true,
              "name": null,
              "title": "Not stated"
            }
          ]
        }
      ],
      "foreign_keys": [
        {
          "fields": [
            "household_id"
          ],
          "forward_relationship_name": "household",
          "reference": {
            "fields": [
              "_id"
            ],
            "resource": "household",
            "reverse_relationship_name": "people"
          }
        }
      ],
      "name": "person",
      "table_name": "sample_joined",
      "unit": {
        "plural": "people",
        "singular": "person"
      },
      "unit_id_name": "_id"
    },
    {
      "fields": [
        {
          "description": "Name of the Item",
          "name": "item_name",
          "type": "string",
          "values": [
            {
              "name": "A",
              "title": "Accordion"
            },
            {
              "name": "B",
              "title": "Bugle"
            },
            {
              "name": "C",
              "title": "Clarinet"
            },
            {
              "name": "D",
              "title": "Didgeridoo"
            },
            {
              "is_sentinel": true,
              "name": "NF",
              "title": "Not Found"
            },
            {
              "is_sentinel": true,
              "name": "NP",
              "title": "Not Provided"
            }
          ]
        }
      ],
      "name": "item",
      "table_name": "item",
      "unit": {
        "plural": "item",
        "singular": "items"
      },
      "unit_id_name": "_id"
    }
  ],
  "title": "Sample joined"
}