Using the matching API to build a screening process

Screening checks are a different challenge to normal text searches: your query is supposed to describe a person or company in some detail to allow the OpenSanctions API to check if that entity (or a similar one) is flagged.

OpenSanctions' API server is a powerful way to query and access the entities in our database. The simplest way to use the API is to sign up for an API key and start screening your counterparties (customers, suppliers, etc.) against international watchlists. Commercial license customers can also use an on-premise deployment to perform the same process within their own infrastructure.

The most basic way to do those bulk searches might be running simple text queries against the /search endpoint - but this will lead to imprecise and incomplete results. Instead, this guide will show you how to use the /match endpoint to get more precise screening results using query-by-example to do multi-attribute lookups. You can experiment with the /match endpoint via the advanced screening search.

Step 1: Speak the language

Let's say, for example, that you have a customers dataset that specifies the name, birth date, nationality and perhaps a national ID number for each person you want to check.

The first step would then be to implement a piece of code that formats each of these entries to conform to the entity format used by OpenSanctions, assigning each of the columns in your source data to one of the fields specified in the data dictionary. (This, of course, works not just for people, but also for companies, vessels, even crypto wallets.)

Here's an example entity in JSON format:

{
    "schema": "Person",
    "properties": {
        "firstName": ["Arkadii"],
        "fatherName": ["Romanovich"],
        "lastName": ["Rotenberg", "Ротенберг"],
        "birthDate": ["1951"],
        "nationality": ["Russia"]
    }
}

A few things to note:

  • The schema defines the type of entities to match this example against. Of course, the schema could also be Company, or a LegalEntity (a more general entity type that matches both people and companies!).
  • You can specify a list of property values, rather than a single value - for example, different variations of the name, or different addresses and identification numbers.
  • The API internally uses standardised formats for country codes, dates, phone numbers etc., but you can just supply a country name and the API will attempt to identify the correct country code (in this case: ru) for the entity.

Generating this JSON form of your records should be a simple exercise. Do not worry too much about details like whether a country name should live in the country or jurisdiction properties: the matching happens by data type (in this case: country), not precise field name.
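
As a rough illustration of this formatting step, the sketch below turns one customer record into a query entity. The input column names (first_name, last_name, birth_date, nationality, id_number) are hypothetical and should be replaced with whatever your own dataset uses; the target property names should be checked against the data dictionary.

```python
from typing import Dict, List

# Hypothetical source columns mapped to entity properties. Adjust the keys
# to your own data and verify the property names in the data dictionary.
COLUMN_TO_PROPERTY = {
    "first_name": "firstName",
    "last_name": "lastName",
    "birth_date": "birthDate",
    "nationality": "nationality",
    "id_number": "idNumber",
}


def customer_to_entity(row: Dict[str, str]) -> Dict[str, object]:
    """Convert one customer record into a query-by-example entity."""
    properties: Dict[str, List[str]] = {}
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = (row.get(column) or "").strip()
        if value:
            # Every property is a list, so more values can be appended later.
            properties[prop] = [value]
    return {"schema": "Person", "properties": properties}
```

Empty columns are skipped rather than sent as blank strings, so a record without an ID number simply produces a query with fewer properties.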

Step 2: Choose where to look

OpenSanctions combines watchlists from hundreds of different data sources - some are sanctions lists, others databases of national politicians, even entities involved in crime. For this introduction, we'll query the whole database by using the default collection endpoint. This will produce results across all available data categories. In order to reduce false alarm rates, you will need to apply specific query filters later.

What data sources and collections will be queried is determined by the URL of the matching endpoint used in your integration, e.g. https://api.opensanctions.org/match/default.
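
Because the collection is part of the URL, it can help to keep it as a single configurable value. A minimal sketch, where the helper name match_url is purely illustrative:

```python
API_BASE = "https://api.opensanctions.org"


def match_url(collection: str = "default") -> str:
    """Build the matching endpoint URL for a given collection name."""
    return f"{API_BASE}/match/{collection}"
```

Calling match_url() yields the default collection endpoint shown above, while match_url("sanctions") narrows the check to the sanctions collection.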

Step 3: Chunk your lookups into batches

In order to avoid the overhead of sending thousands upon thousands of HTTP requests, you can group the entities to be matched into batches, sending a few of them at a time. A good batch size is 20 or 50, not 5000.
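
One way to do this chunking is sketched below. The function name batch_queries and the q0, q1, ... query IDs are assumptions made for this example; any IDs that let you map results back to your records will do.

```python
from typing import Dict, Iterable, Iterator


def batch_queries(
    entities: Iterable[dict], size: int = 20
) -> Iterator[Dict[str, Dict[str, dict]]]:
    """Yield request bodies that each hold at most `size` query entities."""
    batch: Dict[str, dict] = {}
    for index, entity in enumerate(entities):
        # The running index keeps query IDs unique across all batches.
        batch[f"q{index}"] = entity
        if len(batch) >= size:
            yield {"queries": batch}
            batch = {}
    if batch:
        # Emit the final, partially filled batch.
        yield {"queries": batch}
```

Each yielded dictionary can be sent as the JSON body of one request to the matching endpoint.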

And now, the code

Below is an example Python script that demonstrates how to use the matching API. Note that when running this for your own data, you'll need to add a data source, and a place to store the highest-scoring matches for analyst review.

Note: This example uses authentication to access the hosted OpenSanctions API. If you're running the yente application, you can remove the API key header.

import os
from pprint import pprint

import requests

# The OpenSanctions service API. This endpoint will only do sanctions checks.
# The scoring algorithm is chosen via PARAMS below, not in the URL.
URL = "https://api.opensanctions.org/match/sanctions"

# Read an environment variable to get the API key:
API_KEY = os.environ.get("OPENSANCTIONS_API_KEY")

# Create an HTTP session which manages connections and defines shared header configuration:
session = requests.Session()
session.headers['Authorization'] = f"ApiKey {API_KEY}"

# A query for a person by name and birth date. Person names can be given as name parts 
# (ideal - shown here), or as a single "name" value (see company search below, less precise). 
EXAMPLE_1 = {
    "schema": "Person",
    "properties": {
        "firstName": ["Arkadii"],
        "fatherName": ["Romanovich"],
        "lastName": ["Rotenberg", "Ротенберг"],
        "birthDate": ["1951"],
    },
}

# Similarly, a company search using just a name and jurisdiction.
EXAMPLE_2 = {
    "schema": "Company",
    "properties": {
        "name": ["Stroygazmontazh"],
        "jurisdiction": ["Russia"],
    },
}

# We put both of these queries into a matching batch, giving each of them an
# ID that we can recognize it by later:
BATCH = {"queries": {"q1": EXAMPLE_1, "q2": EXAMPLE_2}}

# This configures the scoring system.
PARAMS = {"algorithm": "best"}

# Send the batch off to the API and raise an exception for a non-OK response code.
response = session.post(URL, json=BATCH, params=PARAMS)
response.raise_for_status()

responses = response.json().get("responses")

# The responses will include a set of results for each entity, and a parsed version of
# the original query:
example_1_response = responses.get("q1")
example_2_response = responses.get("q2")

# You can use the returned query to debug if the API correctly parsed and interpreted 
# the queries you provided. If any of the fields or values are missing, it's an
# indication their format wasn't accepted by the system.
pprint(example_2_response["query"])

# The results are a list of entities, formatted using the same structure as your
# query examples. By default, the API will at most return five potential matches.
for result in example_2_response['results']:
    pprint(result)

Understanding the results

The results returned by the /match API contain basic information about each candidate or matching entity. For sanctioned entities, take note of the programId property: it describes the sanctions program under which the designation was made.

If you want to retrieve additional details regarding an entity you can use the /entities/<id> endpoint to retrieve a nested representation that includes details about family and business relationships, and detailed sanctions designations. Each sanction-designated entity (Person, Company, Vessel etc.) can be tied to several Sanction objects. A Sanction object describes details about the sanctions imposed by an authority against an entity: the start and end dates, name and country of the authority, and the programId, which can be expanded into additional details on the relevant policy regime.
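
A minimal sketch of such a follow-up lookup is shown here. It uses only the Python standard library instead of requests so that it has no dependencies; the function names are illustrative, and the Authorization header mirrors the one used in the script above.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.opensanctions.org"


def entity_url(entity_id: str) -> str:
    """Build the /entities/<id> URL, escaping the ID for use in a path."""
    return f"{API_BASE}/entities/{urllib.parse.quote(entity_id, safe='')}"


def fetch_entity(entity_id: str, api_key: str) -> dict:
    """Fetch the nested representation of one entity (performs an HTTP call)."""
    request = urllib.request.Request(
        entity_url(entity_id),
        headers={"Authorization": f"ApiKey {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The entity_id argument would be the id value of one of the results returned by the /match endpoint.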

Of course, you can also view the OpenSanctions entity page (https://opensanctions.org/entities/<id>) for each result to see their documented connections to other items.

Reducing false alerts

If one of your queries returns a result, this is not immediately cause for alarm: the database for politically exposed persons in particular contains many individuals with common names, and matches will be fairly frequent. Instead, you should invest time to fine-tune the configuration of the matching system, and eventually also set up a process for human review.

The following strategies can be used to reduce error rates in results returned from the API:

  • Define good query filters to avoid screening against data sources or risk factors that are irrelevant to your use case.
  • Configure the scoring algorithm to define alerting thresholds relevant to your use case. A sanctions screening system may need to be more aggressive in matching names than a PEP screening process. Your specific data profile might benefit from up- or down-ranking one of the matching features we use.
  • Provide as much detail in your queries as possible. Explore the data dictionary to see which properties might be used to quickly discard a match before it triggers an alarm.
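
To illustrate the idea of an alerting threshold, the sketch below keeps only the candidates above a cut-off and orders them for analyst review. It assumes each result carries a numeric score field; the 0.7 default is an arbitrary placeholder, not a recommended value, and should be tuned against your own data.

```python
from typing import List


def select_alerts(results: List[dict], threshold: float = 0.7) -> List[dict]:
    """Return results scoring at or above the threshold, best match first."""
    # Results without a score are treated as 0.0 and therefore dropped.
    alerts = [r for r in results if r.get("score", 0.0) >= threshold]
    return sorted(alerts, key=lambda r: r["score"], reverse=True)
```

A sanctions screening process might run this with a lower threshold than a PEP screening process, trading more manual review for fewer missed matches.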