The database includes hundreds of data sources - government watchlists, research databases, contextual resources. Not all screening processes will need to query the whole dataset, and picking your scope carefully is key to reducing false positive alerts.
The database has various filtering mechanisms that expose technical and legal criteria which can be used to limit the scope of a query. Filtering occurs at two levels: the filtering of data sources, and the filtering of classes of entities described within those data sources. Keep in mind that a single entity can be sourced from multiple data sources.
- Entity types (schemata) such as `Person`, `Company`, `Vessel`, `CryptoWallet`. If you are screening a list that includes both organizations and natural persons, use `LegalEntity` – it's an umbrella term for both. Schemata are explained in the data dictionary.
- Data sources are grouped into collections: `default` is the broad collection, while `sanctions` is a subset of sources limited to government-issued sanctions lists. Inside of that, `us_sanctions` limits the scope to only US (federal) watchlists, and `eu_sanctions` combines EU and member state watchlists. Additional collections are listed here.
- An individual data source (e.g. `us_ofac_sdn`) can also be a filter. For example, you may wish to query all `sanctions` lists, except those published by China (`cn_sanctions`) and Russia (`ru_mfa_sanctions`).
- Entities are tagged with topics: a person might be `role.pep` or `sanction`, a company might be `sanction.linked`, `reg.warn` etc.
`Sanction` entities close that gap: they're linked to companies and people, and detail the name of the sanctioning authority, the reason, time span, and measures imposed.

Using the `default` collection endpoint (`/match/default`) is a good place to start. Pick relevant topics (e.g. `sanction`, `sanction.linked` for sanctions screening; add `role.pep`, `role.rca` and `reg.action` for basic AML checks), and run some experiments.
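As a starting point for such experiments, the sketch below assembles a request for the `/match/default` endpoint restricted to a set of topics. The base URL, the `topics` query parameter name, and the layout of the JSON body are assumptions made for illustration and should be checked against the API documentation; the entity name is made up. Nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Assumed base URL; an on-premise yente instance would use its own host.
API_BASE = "https://api.opensanctions.org"

SANCTIONS_TOPICS = ["sanction", "sanction.linked"]
AML_TOPICS = SANCTIONS_TOPICS + ["role.pep", "role.rca", "reg.action"]


def build_match_request(name, schema="LegalEntity", topics=SANCTIONS_TOPICS):
    """Return (url, body) for a single-entity match query.

    Assumption: topics are passed as a repeated `topics` query parameter
    and the body uses a {"queries": {key: {...}}} layout.
    """
    query = urlencode({"topics": list(topics)}, doseq=True)
    url = f"{API_BASE}/match/default?{query}"
    body = {"queries": {"q1": {"schema": schema, "properties": {"name": [name]}}}}
    return url, json.dumps(body)


url, body = build_match_request("Acme Trading Ltd", topics=AML_TOPICS)
print(url)
print(body)
```

Keeping the topic lists as named constants makes it easy to compare alert volumes between a sanctions-only run and a broader AML run on the same input list.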
Then, use the `include_dataset` and `exclude_dataset` parameters to either pick a custom set of sources, or exclude sources that don't have regulatory relevance and produce false positives. Use the `include_dataset` argument to pick only a select set of datasets: `/match/default?include_dataset=us_ofac_sdn&include_dataset=us_ofac_cons`, and use `exclude_dataset` to filter a specific dataset from a collection query like this: `/match/default?exclude_dataset=iq_aml_list`.
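Because both parameters can be repeated, building these paths by hand gets error-prone. A minimal sketch of a helper that does it is shown here; `scoped_match_path` is a hypothetical name, and passing `cn_sanctions` and `ru_mfa_sanctions` to `exclude_dataset` in the last call is an assumption about how the exclusion described earlier would be expressed.

```python
from urllib.parse import urlencode


def scoped_match_path(collection, include=(), exclude=()):
    """Build a /match path with repeated include_dataset / exclude_dataset params."""
    params = [("include_dataset", name) for name in include]
    params += [("exclude_dataset", name) for name in exclude]
    path = f"/match/{collection}"
    return f"{path}?{urlencode(params)}" if params else path


print(scoped_match_path("default", include=["us_ofac_sdn", "us_ofac_cons"]))
print(scoped_match_path("default", exclude=["iq_aml_list"]))
print(scoped_match_path("sanctions", exclude=["cn_sanctions", "ru_mfa_sanctions"]))
```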
Avoid using the `peps` collection; instead, filter for the relevant topics (`role.pep`, `role.rca`, and `poi`) and consider implementing country filters.
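One way to read "country filters" is as a client-side check applied to the results that come back. The sketch below takes that reading; the result records are made-up sample data, and the assumption that topics and countries appear as lists under `properties` should be verified against real responses.

```python
PEP_TOPICS = {"role.pep", "role.rca", "poi"}


def keep_result(entity, countries):
    """Keep a result if it carries a PEP-related topic and a country of interest."""
    props = entity.get("properties", {})
    topics = set(props.get("topics", []))
    entity_countries = set(props.get("country", []))
    return bool(topics & PEP_TOPICS) and bool(entity_countries & countries)


# Hypothetical results, shaped for illustration only.
results = [
    {"id": "ex-1", "properties": {"topics": ["role.pep"], "country": ["de"]}},
    {"id": "ex-2", "properties": {"topics": ["role.pep"], "country": ["br"]}},
    {"id": "ex-3", "properties": {"topics": ["sanction"], "country": ["de"]}},
]
relevant = [r["id"] for r in results if keep_result(r, countries={"de", "fr"})]
print(relevant)  # ['ex-1']
```

Note that this version drops entities with no country recorded; whether that is acceptable depends on your risk policy.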
When using the on-premise version of `yente`, you can also use the custom datasets function to define custom collections. Do this by adding a manifest file like this:
    catalogs:
      - url: "https://data.opensanctions.org/datasets/latest/index.json"
        # Limit the dataset scope of the entities which will be indexed into yente. Useful
        # values include `default`, `sanctions` or `peps`. This will speed up the update
        # process in which data is re-indexed.
        scope: sanctions
        resource_name: entities.ftm.json
    datasets:
      - name: europe
        title: European datasets
        datasets:
          - eu_fsf
          - eu_travel_bans
          - eu_sanctions_map
          - be_fod_sanctions
          - fr_tresor_gels_avoir
          # - gb_hmt_sanctions
This will create a new dataset collection named `europe`, which can be used in query endpoints, e.g. `/match/europe` and `/search/europe`.
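Once the manifest is loaded, the custom collection can be queried like any other. The following sketch prepares a POST request against `/match/europe` without sending it; the `localhost:8000` address, the JSON body layout, and the example name are assumptions for illustration.

```python
import json
from urllib.request import Request

# Assumed address of a local yente instance; adjust to your deployment.
YENTE_URL = "http://localhost:8000"


def europe_match_request(name, schema="Person"):
    """Prepare (but do not send) a match query against the custom collection."""
    payload = {"queries": {"q1": {"schema": schema, "properties": {"name": [name]}}}}
    return Request(
        f"{YENTE_URL}/match/europe",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = europe_match_request("Jane Example")
print(req.get_method(), req.full_url)
# To send it, pass `req` to urllib.request.urlopen against a running instance.
```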
`sanction` topic.