Frequently asked questions

#80: How can I receive incremental updates of the data?

Category: Bulk data

BETA: The functionality described on this page is currently in beta stage and the mechanism may still change.

We release incremental update files (delta patches) for most datasets, allowing users to replicate the database state in a downstream system without re-importing the full data after each update.

Note: Delta updates are available only in the FollowTheMoney entity format, not in any of the other export formats. We also maintain a set of scripts that produce CSV-based deltas better suited to human consumption. We're not currently planning to make deltas available for any of the nested export formats.

What's a delta?

Incremental update data is designed to help a consumer application (i.e. a downstream tool using the OpenSanctions data) update its copy of the database to match the current state of the published data.

It's important to realise that the changes included in such delta files can stem either from the upstream data publisher (e.g. the US government publishes a new set of designations), or from changes in the way that OpenSanctions processes and cleans the upstream data (e.g. our data parsers are updated). For most data consumers, the distinction is irrelevant, but it can help to know that an entity may change slightly in the data without having changed at the source.

How to access delta files

Each dataset published on this site has many versions, i.e. export builds created at a certain time. The basic process for performing an incremental update is therefore to:

  1. Identify the current version of the dataset loaded into the data consumer application (you need to store the latest version you've updated to in your own system).
  2. Retrieve a list of all versions that have been published since then.
  3. Retrieve delta files for each of these versions in the order in which they were released.
  4. Apply the change commands contained in each of the delta data files to the database of the consuming application.

The current version of a published dataset is contained in its metadata index, available from this URL (replace sanctions with the name of the relevant dataset):

https://data.opensanctions.org/datasets/latest/sanctions/index.json

The version field in this file tells you the latest build ID. The delta_url field points to an index of the last 100 versions of the dataset, with the URL of the delta data file for each version:

{
    "versions": {
        "20240611014702-tcx": "https://data.opensanctions.org/artifacts/sanctions/20240611014702-tcx/entities.delta.json",
        "20240610194702-edl": "https://data.opensanctions.org/artifacts/sanctions/20240610194702-edl/entities.delta.json",
        "20240610160433-oiz": "https://data.opensanctions.org/artifacts/sanctions/20240610160433-oiz/entities.delta.json"
    }
}

If the version loaded into your consumer application is older than every version listed in this index file, you should perform a full reload of the dataset. Otherwise, fetch the delta files for all versions newer than yours in ascending order (i.e. oldest first) and parse their contents.
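
The lookup steps above could be sketched in Python roughly as follows. This is a minimal illustration, not an official client: the function names are made up for this example, and it assumes that version IDs sort chronologically when compared as plain strings (which holds for the timestamp-prefixed IDs shown above) and that an empty delta index should also trigger a full reload.

```python
import json
from urllib.request import urlopen

# URL pattern taken from the text above; the dataset name is substituted in.
INDEX_URL = "https://data.opensanctions.org/datasets/latest/{dataset}/index.json"


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def pending_deltas(versions, current_version):
    """Return (version, delta_url) pairs newer than current_version, oldest first.

    `versions` is the "versions" mapping from the delta index file.
    Returns None when a full reload is needed, i.e. when current_version
    is older than every listed version.
    ASSUMPTION: version IDs sort chronologically as plain strings.
    ASSUMPTION: an empty index is treated as "full reload needed".
    """
    if not versions or current_version < min(versions):
        return None
    newer = sorted(v for v in versions if v > current_version)
    return [(v, versions[v]) for v in newer]


def plan_update(dataset, current_version):
    """Fetch the metadata index and the delta index, then pick the deltas to apply."""
    index = fetch_json(INDEX_URL.format(dataset=dataset))
    delta_index = fetch_json(index["delta_url"])
    return pending_deltas(delta_index["versions"], current_version)
```

Calling plan_update("sanctions", stored_version) would then yield either None (do a full reload), an empty list (already up to date), or the delta files to process in order. After applying each file, the consumer would record that file's version as its new current version.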

Understanding delta data files

Delta data files contain FtM (FollowTheMoney) entities in the same structure as the entities.ftm.json file format (they are also line-based JSON files). However, each entity is wrapped in an envelope that indicates the type of operation (one of ADD, MOD, or DEL) applicable to the entity:

{"op": "ADD", "entity": {"id": "...", "schema": "...", "properties": {...}}}
{"op": "MOD", "entity": {"id": "...", "schema": "...", "properties": {...}}}
{"op": "DEL", "entity": {"id": "..."}}

While ADD (create a new entity) and MOD (change an existing entity) operations contain the full entity data, the DEL operation will only indicate the entity ID, and not contain the (now removed) data.
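
As a rough sketch of how a consumer might apply these operations, the following Python function replays the lines of a delta file against an in-memory dictionary keyed by entity ID. The dictionary stands in for whatever database the consuming application actually uses, and the function name is hypothetical. Because ADD and MOD both carry the full entity, the sketch treats both as an upsert.

```python
import json


def apply_delta(lines, store):
    """Apply delta operations, in file order, to a dict keyed by entity ID."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        op = record["op"]
        entity = record["entity"]
        if op in ("ADD", "MOD"):
            # Both operations contain the full entity data, so overwrite.
            store[entity["id"]] = entity
        elif op == "DEL":
            # DEL carries only the ID; ignore IDs we never stored.
            store.pop(entity["id"], None)
        else:
            raise ValueError(f"Unknown delta operation: {op!r}")
    return store
```

Since the function only iterates over lines, it can be fed an open file object or a streamed HTTP response, so the delta file never has to be held in memory as a whole.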

One aspect to note here is that delta files also reflect the ongoing entity de-duplication/integration, which may cause entities to change their identifier. If two entities in the dataset have been merged, for example, this would produce two DEL operations (one for each of the old source IDs) and one ADD operation (for the new, combined ID).
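
A consumer that keeps its own records keyed by entity ID (for example review notes) may want to see which IDs appear and disappear in a given delta file. The sketch below groups entity IDs by operation. The function name and the IDs in the usage example are hypothetical, and since the text above does not describe any explicit link between a DEL and the ADD that replaces it, the sketch only reports the IDs per operation.

```python
import json
from collections import defaultdict


def ids_by_operation(lines):
    """Group the entity IDs in a delta file by operation type."""
    grouped = defaultdict(set)
    for line in lines:
        if line.strip():
            record = json.loads(line)
            grouped[record["op"]].add(record["entity"]["id"])
    return dict(grouped)


# Hypothetical delta lines for a merge of two entities into one.
merge_lines = [
    '{"op": "DEL", "entity": {"id": "source-id-1"}}',
    '{"op": "DEL", "entity": {"id": "source-id-2"}}',
    '{"op": "ADD", "entity": {"id": "merged-id", "schema": "Person", "properties": {}}}',
]
summary = ids_by_operation(merge_lines)
```

Here summary["DEL"] holds the two old source IDs and summary["ADD"] holds the new combined ID.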
