Category: Bulk data
BETA: The functionality described on this page is currently in beta, and the mechanism may still change.
We release incremental update files (delta patches) for most datasets, allowing users to replicate the database state in a downstream system without re-importing the full data after each update.
Note: Delta updates are available only in the FollowTheMoney entity format, not in any of the other export formats. We also maintain a set of scripts for producing CSV-based deltas that are better suited for human consumption. We're not currently planning to make deltas available for any of the nested export formats.
Incremental update data is designed to help a consumer application (i.e. a downstream tool using the OpenSanctions data) update its copy of the database to match the current state of the published data.
It's important to realise that the changes included in such delta files can stem either from the upstream data publisher (e.g. the US government publishes a new set of designations), or from changes in the way that OpenSanctions processes and cleans the upstream data (e.g. our data parsers are updated). For most data consumers, the distinction is irrelevant, but it is sometimes helpful to know that an entity can change slightly in the data without having changed at the source.
Each dataset published on this site has many versions, i.e. export builds created at a certain time. The basic process for performing an incremental update is therefore to:
1. Determine the version of the dataset loaded into the data consumer application (you need to store the latest version you've updated to in your own system).
2. Fetch and apply the delta files for all versions that have been published since then.
The current version of a published dataset is contained in its metadata index, available from this URL (replace sanctions with the name of the relevant dataset):
https://data.opensanctions.org/datasets/latest/sanctions/index.json
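As a rough sketch of this lookup, the following Python builds the index URL for a dataset and reads the two fields of interest from it. The helper names (index_url, read_index, fetch_index) are illustrative and not part of any OpenSanctions library; only the URL pattern and the version and delta_url field names are taken from this page.

```python
import json
from urllib.request import urlopen

# URL pattern from this page; the dataset name is substituted in.
INDEX_URL = "https://data.opensanctions.org/datasets/latest/{dataset}/index.json"


def index_url(dataset: str) -> str:
    """Build the metadata index URL for a dataset name (hypothetical helper)."""
    return INDEX_URL.format(dataset=dataset)


def read_index(raw: str) -> tuple:
    """Return (version, delta_url) from the text of an index.json file."""
    index = json.loads(raw)
    return index["version"], index["delta_url"]


def fetch_index(dataset: str) -> tuple:
    """Download the metadata index and return (version, delta_url)."""
    with urlopen(index_url(dataset), timeout=30) as response:
        return read_index(response.read().decode("utf-8"))


# Example call (performs a network request):
# latest_version, delta_url = fetch_index("sanctions")
```

Keeping the parsing step separate from the download makes it possible to check the logic against a saved copy of the index without network access.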
The version field in this file tells you the latest build ID, and the delta_url field will point you to an index of the last 100 versions of the dataset, and the delta data file URLs for each of these:
{
  "versions": {
    "20240611014702-tcx": "https://data.opensanctions.org/artifacts/sanctions/20240611014702-tcx/entities.delta.json",
    "20240610194702-edl": "https://data.opensanctions.org/artifacts/sanctions/20240610194702-edl/entities.delta.json",
    "20240610160433-oiz": "https://data.opensanctions.org/artifacts/sanctions/20240610160433-oiz/entities.delta.json"
  }
}
If the version loaded into your consumer application is older than the oldest version listed in this index file, you should conduct a full reload of the dataset. Otherwise, fetch the delta files for each version newer than yours, in ascending order (i.e. oldest first), and parse their contents.
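The decision between a full reload and an incremental update could be sketched as follows. This assumes that version IDs sort chronologically when compared as strings, which holds for the timestamp-prefixed IDs in the sample above but is not stated as a guarantee on this page. The function name pending_deltas is hypothetical, and its versions argument is the mapping found under the "versions" key of the delta index.

```python
from typing import Dict, List, Optional


def pending_deltas(versions: Dict[str, str], loaded: str) -> Optional[List[str]]:
    """Return the delta URLs to apply, oldest first.

    Returns None when the loaded version predates the oldest listed
    version, which signals that a full reload is needed instead.
    Assumption: version IDs sort chronologically as plain strings.
    """
    ordered = sorted(versions)
    if not ordered or loaded < ordered[0]:
        return None
    # Only versions published after the loaded one still need applying.
    return [versions[version] for version in ordered if version > loaded]
```

An empty list means the consumer is already up to date, while None means the stored version has dropped out of the index window.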
Delta data files contain FtM entities in the same structure as the entities.ftm.json file format (they are also line-based JSON files). However, each entity is wrapped in an envelope that indicates the type of operation (one of ADD, MOD, or DEL) applicable to the entity:
{"op": "ADD", "entity": {"id": "...", "schema": "...", "properties": {...}}}
{"op": "MOD", "entity": {"id": "...", "schema": "...", "properties": {...}}}
{"op": "DEL", "entity": {"id": "..."}}
While ADD (create a new entity) and MOD (change an existing entity) operations contain the full entity data, the DEL operation will only indicate the entity ID, and not contain the (now removed) data.
One aspect to note here is that delta files also reflect the ongoing entity de-duplication/integration, which may cause entities to change their identifier. If two entities in the dataset have been merged, for example, this would cause two DEL operations to be triggered (one for each of the old source IDs), and one ADD operation (for the new, combined ID).
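A minimal sketch of applying one delta file to a local copy is shown below, assuming the consumer keeps entities in a plain dictionary keyed by entity ID. A real application would more likely write to a database. The function name apply_delta and the entity IDs in the usage example are hypothetical; the envelope fields (op, entity) follow the format described above.

```python
import json
from typing import Dict, Iterable


def apply_delta(store: Dict[str, dict], lines: Iterable[str]) -> None:
    """Apply line-based JSON delta records to an in-memory entity store."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        op, entity = record["op"], record["entity"]
        if op in ("ADD", "MOD"):
            # Both operations carry the full entity, so an upsert covers them.
            store[entity["id"]] = entity
        elif op == "DEL":
            # DEL carries only the ID; a missing ID is ignored.
            store.pop(entity["id"], None)
        else:
            raise ValueError(f"Unknown delta operation: {op!r}")


# Usage: a merge of two entities appears as two DELs and one ADD.
store = {"old-1": {"id": "old-1"}, "old-2": {"id": "old-2"}}
delta_lines = [
    '{"op": "DEL", "entity": {"id": "old-1"}}',
    '{"op": "DEL", "entity": {"id": "old-2"}}',
    '{"op": "ADD", "entity": {"id": "merged-1", "schema": "Person", "properties": {"name": ["Jane Doe"]}}}',
]
apply_delta(store, delta_lines)
```

Treating ADD and MOD identically keeps the update idempotent, so re-applying a delta file after an interrupted run should leave the store in the same state.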