Skip to main content

Data Platform Data Products

Here you will find documentation describing the concept of a Data Product within the Data Platform.

NOTE

these pages are primarily for internal audiences rather than users of the Data Platform; we will host user-facing documentation separately.

Definition

Put simply, a Data Product is a dataset or group of datasets together with metadata that describes that data. This will include descriptors and guidance on the usage of the data, as well as governance information regarding the ownership and retention period of the data within the platform.

Purpose

A Data Product exists to help customers effectively find, understand, and responsibly use data, as well as help administration and management of that data within the platform.

A Data Product is created and owned by a Data Product Owner, a person with comprehensive domain knowledge. The Data Product Owner is not part of the Data Platform team, they are people that operate on different teams and they have a system that contains data that they would like to share. This is because it is essential to know a domain before creating a Data Product, and it would be impossible for the Data Platform teams to have deep knowledge of every domain that uses the platform.

Goals

Our goals are:

  • Make data easily discoverable by users who wish to use that data. We do this by encouraging the owners of Data Products to supply high quality metadata
  • Make it easier for us to provide governance for data, for example by identfying owners, protective markings and retention periods.
  • Make data more usable, whatever the purpose, by applying product thinking to our data, and clearly describing the lineage and transformations of our Data Products

Defining a Data Product

IMPORTANT

The structure and usage of these metadata definitions are likely to change substantially throughout development

A Data Product will have a unique name, and is defined in code: YAML or JSON (we show YAML in these docs, as it is a more human-readable format). The code representation of the Data Product is referred to as Data Product Metadata (data about the data), and is currently captured in a single file.

This file contains the following sections:

Section name Purpose Documentation
specification Aids data discoverability by providing a name, description and tags for a Data Product. It also contains contact details of the Data Product owner. Data Product specification
governance Contains protective marking, retention period, expected update frequency and data lineage. Defining product governance
data-dictionary Contains the column defintions comprising the Data Product, along with type information and user-friendly names. Data dictionary guidance

Sections of metadata that we plan to capture within a Data Product, but are not yet implemented:

File name Purpose Documentation
transformations Describes the cleaning and transformation data will undergo before it is made available to consumers. Cleaning and transformation

Example Data Product

Example Data Product

Using a Data Product

Altering a Data Product

Over time you may need to update your Data Product. This may come though adding data assets to the Product, removing depricated data assets, changes to the data assets themselves (such as column changes or additions in structured data), or altering the descriptive metadata of the Data Product itself.

To communicate these changes in the Data Product to your users, and to maintain ongoing compatibility, changes to your Data Product (or the data assets associated with it) will automatically increment the version of your Data Product.

In general, backwards-compatible changes like adding a data asset will mean a minor increment in the version number (i.e. v1.0 -> v1.1), and backwards-incompatible changes like deleting a data asset will mean a major increment in the version number (i.e. v1.3 -> v2.0). More information can be found on the Data Product Versioning page.

For structured data, each major version will be associated with its own database. i.e. example_data_product_v1.example_table would be used for all minor versions v1.x, and example_data_product_v1.example_table would be used for all minor versions v2.x.

Users of your data will need to update their pipelines if they wish to use the latest version of a Data Product after a major version increment.

This page was last reviewed on 2 November 2023. It needs to be reviewed again on 2 May 2024 by the page owner #data-platform-notifications .
This page was set to be reviewed before 2 May 2024 by the page owner #data-platform-notifications. This might mean the content is out of date.