Frequently asked questions

#73: Can I select a specific subset of datasets to match against?

Category: API

In some cases, you may want to restrict the set of datasets that a query searches, e.g. to match against only a subset of relevant datasets.

You can use the include_dataset argument to limit a query to specific datasets, repeating the argument once per dataset: /match/default?include_dataset=us_ofac_sdn&include_dataset=us_ofac_cons.

Conversely, you can exclude a specific dataset from a collection query like this: /match/default?exclude_dataset=everypolitician. Note: this hides every entity that exists in the excluded dataset from the query results, even if that entity combines data from multiple datasets and the other datasets are not excluded.
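As a minimal sketch of how a client might assemble these URLs, the Python helper below repeats the include_dataset or exclude_dataset argument once per dataset name. The function name match_url and the base URL https://api.example.org are hypothetical placeholders, not part of the API; only the path and the two query arguments come from the examples above.

```python
from urllib.parse import urlencode


def match_url(base_url, collection="default", include=(), exclude=()):
    """Build a /match URL with repeated include/exclude dataset arguments.

    `match_url` and `base_url` are illustrative; only the path layout and the
    `include_dataset` / `exclude_dataset` argument names come from the FAQ.
    """
    # A list of (key, value) pairs lets the same key appear more than once.
    params = [("include_dataset", name) for name in include]
    params += [("exclude_dataset", name) for name in exclude]
    url = f"{base_url.rstrip('/')}/match/{collection}"
    return f"{url}?{urlencode(params)}" if params else url


# Placeholder host; substitute the address of the service you actually query.
print(match_url("https://api.example.org", include=["us_ofac_sdn", "us_ofac_cons"]))
print(match_url("https://api.example.org", exclude=["everypolitician"]))
```

Passing the arguments as a list of pairs rather than a dictionary is what allows the same argument name to appear several times in the query string.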

Using a manifest

When using the on-premise version of yente, you can also achieve this by defining a custom dataset collection in a manifest file like this:

catalogs:
  - url: "https://data.opensanctions.org/datasets/latest/index.json"
    # Limit the dataset scope of the entities which will be indexed into yente. Useful
    # values include `default`, `sanctions` or `peps`. This will speed up the update
    # process in which data is re-indexed.
    scope: sanctions
    resource_name: entities.ftm.json
datasets:
  - name: europe
    title: European datasets
    datasets:
      - eu_fsf
      - eu_travel_bans
      - eu_sanctions_map
      - be_fod_sanctions
      - fr_tresor_gels_avoir
      # - gb_hmt_sanctions

This will create a new dataset collection named europe, which can be used in query endpoints, e.g. /match/europe and /search/europe. Please note that the other datasets included in the sanctions collection will still be stored in the index, and attributes originating from those sources will still be included in the person and company profiles. If your use case requires building a completely custom dataset, please contact us.
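To illustrate querying the custom collection, the sketch below prepares (but does not send) a request to /match/europe using only the Python standard library. Several things here are assumptions and should be checked against the API documentation for your yente version: the local address http://localhost:8000, the POST body with a queries object, and the helper name build_match_request. The name "Jane Doe" is a placeholder.

```python
import json
from urllib.request import Request


def build_match_request(base_url, collection, name):
    """Prepare a match request against a dataset collection.

    Assumption: the match endpoint accepts a JSON POST body of the form
    {"queries": {<id>: {"schema": ..., "properties": {...}}}}.
    """
    payload = {
        "queries": {
            # "q1" is an arbitrary query identifier chosen for this example.
            "q1": {"schema": "Person", "properties": {"name": [name]}},
        }
    }
    return Request(
        f"{base_url.rstrip('/')}/match/{collection}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local yente address; adjust to your deployment.
request = build_match_request("http://localhost:8000", "europe", "Jane Doe")
print(request.get_method(), request.full_url)
```

The prepared request can then be sent with urllib.request.urlopen(request) once a yente instance indexed with the manifest above is running.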
