Naming
There are many words we could use to describe the API's purpose.
Unfortunately, different groups have adopted conflicting terminologies, so each of these words risks sowing confusion:
- De-identification. The Australian Privacy Act says "personal information is de-identified if the information is no longer about an identifiable individual or an individual who is reasonably identifiable". However, "de-identification" is frequently understood to mean only the removal of identifying information from the raw data, not including additional data processing such as aggregation and perturbation.
- Anonymisation. This is the term used by the Information Commissioner's Office in the UK: "Anonymisation is the process of turning data into a form which does not identify individuals and where identification is not likely to take place". However, it is not in widespread use in all countries, and in some jurisdictions it can legally mean something closer to de-identification.
- Privacy. Computer scientists and researchers see the aim of privacy as ensuring that the individuals in a data set do not have their personal identifying data exposed. However, in common and legal usage, privacy tends to refer to people rather than data, e.g. privacy is not being observed or disturbed by others. When applied to data, it often has a broader meaning incorporating trust and control, e.g. the choice of what information to collect and how it is used. StatsNZ describes this interpretation on their site.
- Confidentiality. Colloquially and legally, keeping your personal information or data confidential means not sharing it inappropriately, e.g. ensuring "that the risk of individual participants or the site being identified is minimised as much as possible" (quote from the Victoria State Government). Computer scientists often take a stronger interpretation: to keep data confidential, it must not be interpretable even if it is intercepted by a third party. In this view, confidentiality typically involves encryption.
As a result, we have chosen to avoid all of these terms, and instead use the longer description of the API's purpose as "protection against re-identification".