TR 41-2015
Data quality metrics

Standard No.
TR 41-2015
Published By
SG-SPRING SG1
Latest
TR 41-2015
Scope
This set of guidelines articulates and defines a common set of domain-agnostic data quality metrics for structured and machine-readable datasets.

The data may include:
- Historic data containing past information (e.g. library book loans, transaction records);
- Live data containing current information (e.g. library book availability).

The data may be made available as:
- Spot data, which is collected or recorded from time to time at discrete time intervals; or
- A data stream, which consists of a continuous, steady stream or sequence of information. Examples include stock prices, market data feeds, sensory feeds and video feeds.

Data quality metrics for unstructured datasets are currently out of the scope of this document, and the suggested guidelines may or may not be applicable to unstructured datasets.

Industry agnosticism and generality are fundamental concerns in selecting metrics for inclusion in the base set of quality metrics. Other metrics may exemplify data quality in datasets used by some industries, but if they cannot easily be applied across the board, they are not included in these guidelines. Data providers are, however, encouraged to adopt the methodology described in 4.2, The Goal-Question-Metric methodology, to develop additional metrics that convey particular aspects of data quality and help potential buyers assess the dataset on offer.

The following are outside of the scope of the Technical Reference:
- Metrics that are subject to interpretation or that address concerns forming part of the buyer's evaluation process.
- Metrics that are derived from more than one basic metric, where the method of calculation or derivation may vary according to the needs of the user. For example, the ratio of non-empty records to the maximum possible number of records can give an indication of the completeness or extensiveness of a dataset. However, since there may not be a maximum or expected number of records for certain types of datasets, it is up to the users evaluating the dataset to contextualise the published metrics against their requirements and expectations.
- Recommendations on ways to apply the published metrics in order to answer higher-order questions that pertain to data quality.

It is noted that some of the metrics included are not intrinsic to the data, e.g. access cost and support. However, they have been included as part of the guidelines because they are important considerations and provide useful indicators of the feasibility of a dataset for the user.
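The scope cites the ratio of non-empty records to the maximum possible number of records as an example of a derived metric that is left to the user. The Python sketch below illustrates why: the ratio can only be computed once the evaluator supplies an expected maximum. It is not taken from the Technical Reference. The function names, the rule for what counts as a "non-empty" record, and the sample loan records are all assumptions made for illustration.

```python
from typing import Any, Mapping, Optional, Sequence


def is_non_empty(record: Mapping[str, Any]) -> bool:
    # Assumption: a record is "non-empty" if at least one field holds a value.
    # The scope text does not define this; a user may choose a stricter rule.
    return any(value is not None and value != "" for value in record.values())


def completeness_ratio(
    records: Sequence[Mapping[str, Any]],
    expected_max: Optional[int],
) -> Optional[float]:
    # Without a known maximum (or expected) record count the ratio is
    # undefined, which is the reason the scope leaves it to the user.
    if expected_max is None or expected_max <= 0:
        return None
    non_empty = sum(1 for record in records if is_non_empty(record))
    return non_empty / expected_max


# Hypothetical library loan records; the second one carries no values.
loans = [
    {"book_id": "B001", "loan_date": "2015-03-01"},
    {"book_id": "", "loan_date": None},
    {"book_id": "B002", "loan_date": "2015-03-04"},
]

print(completeness_ratio(loans, expected_max=4))     # 2 non-empty of 4 expected
print(completeness_ratio(loans, expected_max=None))  # no expectation supplied
```

Here the expected maximum comes from the evaluator rather than from the dataset, so the same published record counts can yield different ratios for different users.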
