The database includes hundreds of data sources - government watchlists, research databases, contextual resources. Not all screening processes will need to query the whole dataset, and picking your scope carefully is key to reducing false positive alerts.
The database has various filtering mechanisms that expose technical and legal criteria which can be used to limit the scope of a query. Filtering occurs at two levels: the filtering of data sources, and the filtering of classes of entities described within those data sources. Keep in mind that a single entity can be sourced from multiple data sources.
[...] Person, Company, Vessel, CryptoWallet. If you are screening a list that includes both organizations and natural persons, use LegalEntity – it's an umbrella term for both. Schemata are explained in the data dictionary.
[...] default collection, while sanctions is a subset of sources limited to government-issued sanctions lists. Inside of that, us_sanctions limits the scope to only US (federal) watchlists, and eu_sanctions combines EU and member state watchlists. Additional collections are listed here.
[...] us_ofac_sdn) can also be a filter. For example, you may wish to query all sanctions lists, except those published by China (cn_sanctions) and Russia (ru_mfa_sanctions).
[...] role.pep, sanction, a company might be sanction.linked, reg.warn etc.
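The schema advice above can be sketched as a small payload builder. This is a minimal illustration, not the reference client: the request body shape (a `queries` mapping whose entries carry `schema` and `properties`) is an assumption to be checked against the API reference, and the helper names and sample names are hypothetical.

```python
import json

# Schema names mentioned in the text above.
KNOWN_SCHEMATA = {"Person", "Company", "Vessel", "CryptoWallet", "LegalEntity"}


def build_query(name, schema="LegalEntity"):
    """Build one query entry; LegalEntity covers both organizations and natural persons."""
    if schema not in KNOWN_SCHEMATA:
        raise ValueError(f"Unknown schema: {schema}")
    return {"schema": schema, "properties": {"name": [name]}}


def build_batch(names, schema="LegalEntity"):
    """Wrap several queries into one request body (assumed body shape)."""
    return {
        "queries": {
            f"q{i}": build_query(name, schema)
            for i, name in enumerate(names, start=1)
        }
    }


# A mixed list of companies and people is screened under LegalEntity.
payload = build_batch(["Acme Holdings Ltd", "Jane Doe"])
print(json.dumps(payload, indent=2))
```

If the input list is known to contain only people or only companies, passing the narrower schema (for example `schema="Person"`) keeps unrelated entity types out of the results.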
Sanction entities close that gap: they're linked to companies and people, and detail the name of the sanctioning authority, the reason, time span, and measures imposed.
Using the default collection endpoint (/match/default) is a good place to start. Pick relevant topics (e.g. sanction and sanction.linked for sanctions screening; add role.pep, role.rca and reg.action for basic AML checks), and run some experiments.
Then, use the include_dataset and exclude_dataset parameters to either pick a custom set of sources, or exclude sources that don't have regulatory relevance and produce false positives. Use the include_dataset argument to pick only a select set of datasets: /match/default?include_dataset=us_ofac_sdn&include_dataset=us_ofac_cons, and use exclude_dataset to filter a specific dataset from a collection query like this: /match/default?exclude_dataset=iq_aml_list.
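The query strings above can be assembled programmatically rather than by hand. The sketch below only builds URLs and sends nothing. `include_dataset` and `exclude_dataset` come from the text; the `topics` parameter name and the `BASE_URL` value are assumptions that should be verified against the API reference for your deployment, and `match_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: base URL of the API; replace with your own yente host if self-hosting.
BASE_URL = "https://api.opensanctions.org"


def match_url(collection="default", topics=(), include=(), exclude=()):
    """Build a /match URL with repeated filter parameters."""
    params = []
    # Assumption: topic filters are passed as repeated `topics` parameters.
    params += [("topics", topic) for topic in topics]
    params += [("include_dataset", name) for name in include]
    params += [("exclude_dataset", name) for name in exclude]
    path = f"{BASE_URL}/match/{collection}"
    query = urlencode(params)
    return f"{path}?{query}" if query else path


# Only two specific OFAC lists:
print(match_url(include=["us_ofac_sdn", "us_ofac_cons"]))

# The default collection minus one dataset, limited to sanctions topics:
print(match_url(topics=["sanction", "sanction.linked"], exclude=["iq_aml_list"]))
```

Passing a list of pairs to `urlencode` keeps repeated keys in order, which matches the repeated-parameter style shown in the examples above.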
Avoid using the peps collection; instead, filter for the relevant topics (role.pep, role.rca, and poi) and consider implementing country filters.
When using the on-premise version of yente, you can also use the custom datasets function to define custom collections. To do this, add a manifest file like this:
catalogs:
  - url: "https://data.opensanctions.org/datasets/latest/index.json"
    # Limit the dataset scope of the entities which will be indexed into yente. Useful
    # values include `default`, `sanctions` or `peps`. This will speed up the update
    # process in which data is re-indexed.
    scope: sanctions
    resource_name: entities.ftm.json
datasets:
  - name: europe
    title: European datasets
    datasets:
      - eu_fsf
      - eu_travel_bans
      - eu_sanctions_map
      - be_fod_sanctions
      - fr_tresor_gels_avoir
      # - gb_hmt_sanctions
This will create a new dataset collection named europe, which can be used in query endpoints, e.g. /match/europe and /search/europe.
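As a rough illustration of addressing the custom collection, the sketch below derives the two endpoint URLs for a self-hosted instance. The host and port in `YENTE_URL` and the `q` search parameter are assumptions about a typical local setup, and both helper functions are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumption: address of a locally running yente instance.
YENTE_URL = "http://localhost:8000"


def collection_endpoints(collection):
    """Return the match and search endpoint URLs for a collection name."""
    name = quote(collection, safe="")
    return {
        "match": f"{YENTE_URL}/match/{name}",
        "search": f"{YENTE_URL}/search/{name}",
    }


def search_url(collection, text):
    """Build a search URL; the `q` parameter name is an assumption."""
    return f"{collection_endpoints(collection)['search']}?{urlencode({'q': text})}"


endpoints = collection_endpoints("europe")
print(endpoints["match"])
print(search_url("europe", "Jane Doe"))
```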