Tracking sanctions over time

Targets are added to, modified on, and removed from public sanctions lists every day. You can track these changes using the dated data extracts in the OpenSanctions archive.

Representing data changes over time is a wickedly hard problem. While the approach presented below works well, it is only an interim step before the data pipeline can be made more precise in its change detection and tracking capabilities.

Delta work in progress: https://github.com/opensanctions/opensanctions/issues/238

Using delta_targets.py to compute changes

The script delta_targets.py can be found in the contrib/delta folder of the OpenSanctions main code repository. Running it does not require installing the full OpenSanctions toolchain, only the Python libraries listed in requirements.txt in the same folder.
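For example, you could set up the dependencies like this (assuming you have cloned the repository and have Python and pip available):

$ cd opensanctions/contrib/delta
$ pip install -r requirements.txt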

The tool can be used to download two dated snapshots of any dataset or collection in the archive and identify entities which have been added, removed, modified or merged in the interim. For example, you can invoke it like this:

$ python delta_targets.py sanctions -p 20240101 -c 20240301

This will generate a delta report between the previous (-p) and current (-c) timestamps of the data for all sources included in the sanctions collection. The output will be two files - one in CSV and one in JSON format - listing each changed target entity and the type of change it has experienced.
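If you want to post-process a report programmatically, the JSON output is the easier format to consume. The snippet below is only a sketch: the output file name and the "change" field name are assumptions here, so inspect a generated report to confirm the actual schema before relying on it.

import json
from collections import Counter

# Hypothetical file name - check which files the script actually writes.
with open("sanctions-delta.json", "r") as fh:
    records = json.load(fh)

# Tally the change types (ADDED, REMOVED, MODIFIED, MERGED). The field
# name "change" is an assumption - confirm it against a real report.
counts = Counter(rec["change"] for rec in records)
for change_type, num in counts.most_common():
    print(change_type, num)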

You could also generate a delta report for a specific data source like this:

$ python delta_targets.py eu_fsf -p 20240301 -c 20240304
$ python delta_targets.py us_ofac_sdn -p 20240301 -c 20240304

(Hint: the dataset name - e.g. us_ofac_sdn - is always the last part of the URL of the dataset profile page on the website: https://www.opensanctions.org/datasets/us_ofac_sdn/)
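For example, you could derive the dataset name from a profile URL in Python like this:

url = "https://www.opensanctions.org/datasets/us_ofac_sdn/"
dataset = url.rstrip("/").rsplit("/", 1)[-1]  # -> "us_ofac_sdn"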

Several caveats apply to this process:

  • Entities that are marked MODIFIED can be modified for two reasons: either the data source changed its representation of the record (source change), or the OpenSanctions data pipeline was modified to better reflect the data (parser change). For example, a data source may use an unusual name to describe a country, and we then adapt our pipeline to handle that new country name properly. This does not apply to ADDED or REMOVED entities - those changes are always the result of source modifications.
  • Entities marked MERGED are always also MODIFIED. Merging entities takes place as part of the de-duplication process. Read more about the resulting identifier system in the OpenSanctions documentation.

Picking the right dated extracts

The dated extracts for data sources are updated each time the source is crawled. For example, the OFAC SDN list is crawled and updated every two hours, so the dated extract for a given day reflects the last crawl of that day. Therefore, if you specify the dated version of us_ofac_sdn as 20240301, you're using an extract generated late in the day, just before midnight.

This means that if you want to compute the changes to that list between March 3 and March 6, 2024, you'd want to specify a start date one day earlier, 20240302, using the data generated just before midnight on March 2nd. That way, any changes made during the day on March 3 are already reflected.
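Putting this together, the invocation for that window would be:

$ python delta_targets.py us_ofac_sdn -p 20240302 -c 20240306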

A less common case is source crawlers that do not run every day. You can see if this is the case by inspecting the frequency badge on the dataset profile page, under Last updated. Dated extracts will not exist for dates on which the crawler did not execute. In those cases, you may have to experiment with the date stamps a bit to find the closest date on which a run was executed.
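If you'd rather not experiment by hand, you can probe for the closest available extract programmatically. This is only a sketch: the archive URL pattern used below is an assumption, so verify it against a real archive URL before relying on it.

from datetime import date, timedelta

import requests

# Assumed archive layout - verify against a real archive URL first:
TEMPLATE = "https://data.opensanctions.org/datasets/%s/%s/index.json"

def closest_run(dataset, day, max_days=14):
    # Step backwards from the requested day until a dated extract exists:
    for _ in range(max_days):
        stamp = day.strftime("%Y%m%d")
        if requests.head(TEMPLATE % (stamp, dataset)).status_code == 200:
            return stamp
        day -= timedelta(days=1)
    return None

print(closest_run("eu_fsf", date(2024, 3, 1)))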