Frequently asked questions

#150: How can I download the bulk data?

Category: Bulk data

You can download bulk data extracts of the database directly from this website, without any login or API key. While the bulk data files are free to use for non-commercial users, commercial use of the data requires a data license.

Picking a Data Collection

To download bulk data, start by visiting the OpenSanctions datasets page, where you can select the dataset or collection that matches your requirements.

While there is a data collection for politically exposed persons (PEPs), we recommend downloading the complete database for PEP screening.

Choosing the Best Data Format

Once you've chosen a dataset or collection, select the data format that best suits your needs. We offer data in multiple formats, including the FollowTheMoney entity format (entities.ftm.json) and a simplified CSV table (targets.simple.csv). Specialized formats are also available.

Power users may also want to explore the statement-based CSV format, which records lineage and language metadata for each individual value.

After selecting the desired format, click on the relevant download link on the dataset page to start the download process.

Understanding Data Updates

We update our data regularly so that current sanctions and PEP lists are available. To keep your local copy up to date, you can use one of the following methods:

  1. Re-fetching on a Schedule:

    • Re-fetch the latest version of the desired bulk data snapshot at scheduled intervals (e.g., every six hours or daily).
    • Use the URL format: https://data.opensanctions.org/datasets/latest/<dataset>/<format>
      • Replace <dataset> with the name of the dataset or collection.
      • Replace <format> with the file name of the format (e.g., entities.ftm.json or targets.simple.csv).
    • Note: We cannot predict the specific time at which new exports are published. Export cadences are available on the dataset overview page.
  2. Metadata Index Checking:

    • Fetch the dataset metadata index to check for new dataset versions.
    • Access metadata at: https://data.opensanctions.org/datasets/latest/<dataset>/index.json
    • Use the version ID or last_export timestamp to determine if an update is needed.
    • Use the SHA1 checksums in the dataset.resources section to detect whether an export differs from the file that was previously published.
    • Recommended frequency: Every 30 minutes for frequent updates.
  3. Delta Update Mechanism:

    • Use the delta update mechanism to retrieve incremental update files.
    • These files describe additions, modifications, and removals of entities between data export versions.
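The metadata check in method 2 can be combined with the URL pattern from method 1 in a short Python script. This is a minimal sketch, not an official client: it assumes index.json is a JSON object whose `resources` list (at the top level or under a `dataset` key) holds entries with `name` and `checksum` (a SHA1 hex digest) fields, so inspect a real index file before relying on it. The helper names (`index_url`, `find_checksum`, `needs_update`, `sync`) are hypothetical.

```python
import hashlib
import json
import os
import urllib.request
from typing import Optional

# Same URL pattern as method 1: .../datasets/latest/<dataset>/<file>
BASE_URL = "https://data.opensanctions.org/datasets/latest"


def index_url(dataset: str) -> str:
    return f"{BASE_URL}/{dataset}/index.json"


def resource_url(dataset: str, file_name: str) -> str:
    return f"{BASE_URL}/{dataset}/{file_name}"


def find_checksum(index: dict, file_name: str) -> Optional[str]:
    # ASSUMPTION: resources sit at the top level or under a "dataset" key,
    # each shaped like {"name": "entities.ftm.json", "checksum": "<sha1>"}.
    resources = index.get("resources") or index.get("dataset", {}).get("resources", [])
    for resource in resources:
        if resource.get("name") == file_name:
            return resource.get("checksum")
    return None


def sha1_of_file(path: str) -> str:
    # Hash in 1 MiB chunks so large exports need not fit in memory.
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_update(index: dict, file_name: str, local_sha1: Optional[str]) -> bool:
    remote_sha1 = find_checksum(index, file_name)
    if remote_sha1 is None or local_sha1 is None:
        # Nothing to compare against: re-fetch to be safe.
        return True
    return remote_sha1.lower() != local_sha1.lower()


def sync(dataset: str, file_name: str, local_path: str) -> bool:
    """Download file_name only if its published checksum changed; True if downloaded."""
    with urllib.request.urlopen(index_url(dataset)) as resp:
        index = json.load(resp)
    try:
        local_sha1 = sha1_of_file(local_path)
    except FileNotFoundError:
        local_sha1 = None
    if not needs_update(index, file_name, local_sha1):
        return False
    tmp_path = local_path + ".part"
    urllib.request.urlretrieve(resource_url(dataset, file_name), tmp_path)
    os.replace(tmp_path, local_path)  # swap in the new file in one step
    return True
```

For example, a scheduler could call `sync("default", "entities.ftm.json", "entities.ftm.json")` every 30 minutes, matching the recommended frequency. Downloading to a `.part` file first means a failed transfer never overwrites the last good copy.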

Accessing Historical Data

If you require historical data, you can access past versions of each dataset or collection by specifying the desired date (in YYYYMMDD format) in the download URL. For example:

https://data.opensanctions.org/datasets/20231001/default/entities.ftm.json

This URL fetches a version of the dataset published on October 1, 2023. Historical data is available for most core sanctions lists from around July 2021. Check the "Date Added" on the dataset profile page for specific availability details.

A shortcut is available to download the latest published version of a dataset:

https://data.opensanctions.org/datasets/latest/default/entities.ftm.json
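Both URL forms differ only in the version segment, so one small helper can build either. This sketch reproduces only the URL pattern shown in the two examples above; `snapshot_url` is a hypothetical name, and the function does not check whether a snapshot was actually published on the chosen date.

```python
from datetime import date
from typing import Optional

BASE_URL = "https://data.opensanctions.org/datasets"


def snapshot_url(dataset: str, file_name: str, on: Optional[date] = None) -> str:
    """Return the URL of a dated snapshot, or of the `latest` shortcut if `on` is None."""
    # Dated snapshots use a YYYYMMDD version segment, e.g. 20231001.
    version = "latest" if on is None else on.strftime("%Y%m%d")
    return f"{BASE_URL}/{version}/{dataset}/{file_name}"


print(snapshot_url("default", "entities.ftm.json", date(2023, 10, 1)))
# https://data.opensanctions.org/datasets/20231001/default/entities.ftm.json
print(snapshot_url("default", "entities.ftm.json"))
# https://data.opensanctions.org/datasets/latest/default/entities.ftm.json
```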

Handling Entity Data Deletions and ID Changes

When an entity is no longer present in the source data, it will not appear in subsequent updates, and the total count of entities in entities.ftm.json will decrease accordingly. We do not provide explicit markers for deleted entities in the export files.

Entity IDs can change for several reasons:

  • Merging Duplicates: Entities from multiple sources may be merged, resulting in a new cluster ID.
  • Source Updates: Changes in source data or processing methods may alter entity IDs.

To manage these changes, it's advisable to track both the entity's primary ID and the referents list to maintain consistency. This approach helps you avoid duplicate alerts and ensures you're referencing the correct entities. For detailed guidance, refer to our Identifiers Documentation.
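Because deletions are not marked explicitly, one way to apply this advice is to rebuild an ID lookup from each new export and check your stored IDs against it. The sketch below assumes entities.ftm.json holds one JSON entity per line with an `id` field and an optional `referents` list; `load_id_map` and `reconcile` are hypothetical names. An ID that no longer resolves has dropped out of the export, so treat it as a candidate for review rather than as proof of deletion.

```python
import json
from typing import Dict, Iterable, Set, Tuple


def load_id_map(lines: Iterable[str]) -> Dict[str, str]:
    """Map each primary ID and each referent ID to the entity's current primary ID."""
    id_map: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entity = json.loads(line)
        primary_id = entity["id"]
        id_map[primary_id] = primary_id
        # ASSUMPTION: IDs merged into this entity are listed under "referents".
        for referent in entity.get("referents", []):
            id_map[referent] = primary_id
    return id_map


def reconcile(stored_ids: Set[str], id_map: Dict[str, str]) -> Tuple[Dict[str, str], Set[str]]:
    """Split stored IDs into (old ID -> current ID) and IDs absent from the export."""
    resolved: Dict[str, str] = {}
    missing: Set[str] = set()
    for old_id in stored_ids:
        current_id = id_map.get(old_id)
        if current_id is None:
            missing.add(old_id)
        else:
            resolved[old_id] = current_id
    return resolved, missing
```

With a downloaded export, the map can be built with `with open("entities.ftm.json", encoding="utf-8") as fh: id_map = load_id_map(fh)`. Resolving stored IDs to the current primary ID before alerting also means that two IDs merged into one cluster point to the same entity, which helps avoid duplicate alerts.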

Additional Information

  • Data Completeness: Even if no changes have occurred at the source, data files are re-exported to ensure you have the most recent confirmed state.
  • Data Deletion Policy: We reflect deletions from source data, typically within one week of the change.
  • Data Formats and Documentation: For more details on data formats and how to work with them, refer to the bulk data documentation.
