Using the data

The datasets published by the project are made available in multiple formats, suitable for different purposes.

If you would like to see another file format or slice of the data included in this project, please get in touch to discuss your idea.

Please also refer to the FAQ about API access and the data structure reference.

Formats

Bulk data is made available in the following formats for each data source and collection we maintain:

Simplified CSV (comma-separated values) table

The tabular data export is targeted at analysts who wish to access the OpenSanctions data in a spreadsheet application like Microsoft Excel (help: open CSV in Excel).

Unfortunately, data about persons of interest does not easily lend itself to a simple tabular form. For example, a person might have multiple nationalities, or have been a member of several political parties over their career.

The "Simplified CSV" format addresses this by presenting a highly limited view of the data, in which only a select set of key columns is provided. These include:

  • id: the unique identifier of the given entity.
  • schema: the entity type.
  • name: the display name of the given entity.
  • aliases: any alias names (e.g. other scripts, nom de guerre) provided by the data sources.
  • birth_date: for people, their birth date.
  • countries: a list of countries linked to this entity. Includes countries of residence, nationalities and corporate jurisdictions.
  • addresses: a list of known addresses for the entity.
  • identifiers: identifiers such as corporate registrations, passport numbers or tax identifiers linked to the entity.
  • sanctions: details regarding the sanctions designation, if any.
  • phones: a list of phone numbers in E.164 format.
  • emails: a list of email addresses linked to the entity.
  • dataset: the dataset this entity is in.
  • last_seen: the last time this entity was observed in source data.
  • first_seen: the earliest date on which OpenSanctions observed this entity.

Further technical notes:

  • The CSV uses , (comma) as its delimiter and is encoded as UTF-8.
  • Some fields in the CSV can contain multiple values in a single cell. Such cells are themselves formatted as nested CSV, using ; (semicolon) as the delimiter.
  • The export contains only targeted entities, not all entities in the dataset.
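Because multi-valued cells are nested CSV rather than plain ;-joined strings, it is safer to split them with a CSV parser than with str.split(";"), since a value that itself contains a semicolon would be quoted. The sketch below shows one way to do this with Python's standard csv module. It assumes that the list-valued columns are the ones described above as lists or in the plural (aliases, countries, addresses, identifiers, sanctions, phones, emails); SAMPLE, MULTI_VALUED and the helper names are hypothetical, for illustration only.

```python
import csv
import io
from typing import Iterable

# Hypothetical two-row sample using a subset of the columns described above.
SAMPLE = (
    "id,schema,name,aliases,countries\n"
    'ex-1,Person,Jane Doe,"J. Doe;Jane D.",us;gb\n'
    "ex-2,Company,Acme Ltd,,\n"
)

# Assumption: these are the columns that may hold several values per cell.
MULTI_VALUED = {
    "aliases", "countries", "addresses", "identifiers",
    "sanctions", "phones", "emails",
}


def split_nested(cell: str) -> list[str]:
    """Split a multi-valued cell stored as nested CSV with ';' as delimiter."""
    if not cell:
        return []
    return next(csv.reader([cell], delimiter=";"))


def read_targets(lines: Iterable[str]) -> list[dict]:
    """Parse the outer comma-separated table, expanding multi-valued cells."""
    rows = []
    for row in csv.DictReader(lines):
        for column in MULTI_VALUED:
            if column in row:
                row[column] = split_nested(row[column])
        rows.append(row)
    return rows


targets = read_targets(io.StringIO(SAMPLE))
print(targets[0]["aliases"])    # ['J. Doe', 'Jane D.']
print(targets[1]["countries"])  # []
```

For a downloaded file, open it with open(path, newline="", encoding="utf-8") and pass the file object to read_targets.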

Names-only text file

The simplest format we publish is a plain text file listing the names of all persons and companies targeted in each dataset, one name per line. The format can be used for:

The plain text files are encoded in UTF-8. If non-Latin names don't display correctly in your application, make sure you have opened the file with the correct encoding.
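As a rough illustration, the sketch below loads such a names file into a set and performs an exact, normalised lookup. It assumes that Unicode NFC normalisation plus case-folding is an acceptable notion of "same name" for your use case. It is only exact matching; real screening usually needs fuzzier comparison. The function names are hypothetical.

```python
import unicodedata
from typing import Iterable


def normalize(name: str) -> str:
    """NFC-normalise and case-fold, so 'MÜLLER' and 'Müller' compare equal."""
    return unicodedata.normalize("NFC", name.strip()).casefold()


def load_names(lines: Iterable[str]) -> set[str]:
    """Build a lookup set from a names-only file (one name per line)."""
    return {normalize(line) for line in lines if line.strip()}


def is_listed(name: str, names: set[str]) -> bool:
    """Exact match after normalisation; no fuzzy matching is attempted."""
    return normalize(name) in names


# Usage with a downloaded file (path is a placeholder):
# with open(path, encoding="utf-8") as fh:
#     names = load_names(fh)
```

Passing encoding="utf-8" explicitly avoids relying on the platform default encoding, which is a common cause of garbled non-Latin names.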

JSON/FollowTheMoney-based formats

We offer two JSON-based export formats, both based on the FollowTheMoney (FtM) data model. They closely represent the internal data structure of OpenSanctions. The nested JSON format should be the preferred import method for software-based data consumers.

You can get a basic overview of the entity data structure in the FtM documentation, and browse the data dictionary to see details regarding the properties used by OpenSanctions.

Both formats use line-delimited JSON: each line of the exported files is a separate JSON object. The FollowTheMoney entities export (entities.ftm.json) contains one entity per line, while the nested JSON format (targets.nested.json) contains one line per target, with adjacent entities (e.g. addresses, sanctions) nested inside the properties section of the data structure.
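Since both files are line-delimited, they can be streamed one line at a time instead of being loaded as a single JSON document. The minimal sketch below assumes the usual FtM layout, where "properties" maps each property name to a list of values. It also assumes that, in the nested format, adjacent entities appear as JSON objects inside those lists. The sample record (including its use of the addressEntity property) and the helper names are illustrative only.

```python
import json
from typing import Iterable, Iterator

# Hypothetical one-line sample in the nested format.
SAMPLE = [
    '{"id": "ex-1", "schema": "Person", "properties": {"name": ["Jane Doe"], '
    '"addressEntity": [{"id": "ex-addr", "schema": "Address", '
    '"properties": {"full": ["1 Main St"]}}]}}',
]


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Parse line-delimited JSON, one object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def nested_entities(entity: dict) -> Iterator[dict]:
    """Yield adjacent entities embedded in a target's properties."""
    for values in entity.get("properties", {}).values():
        for value in values:
            if isinstance(value, dict):  # plain values are strings
                yield value


for target in iter_entities(SAMPLE):
    print(target["id"], [e["schema"] for e in nested_entities(target)])
# ex-1 ['Address']
```

For a real export, pass an open file handle instead of SAMPLE, e.g. open(path, encoding="utf-8"). Iterating line by line keeps memory use flat, even for large files.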

The nested format and some of the provided metadata (dataset, first_seen, last_seen) are not part of FtM, but are extensions developed for OpenSanctions.

Some further documentation regarding FtM tooling: