Data enrichment

Sanctions lists and PEP registers tell you who or what is designated. They rarely tell you which companies a designated party owns, who sits on its board, or which subsidiaries it controls. That context lives in company registries, financial databases, and public knowledge bases, none of which are risk-focused datasets.

We bridge that gap by cross-matching entities in our database against reference datasets and pulling in the relevant fragments: directors, shareholders, subsidiaries, family members, financial identifiers, and corporate relationships. The result is a more connected entity graph that supports compliance workflows beyond simple name screening.

What enrichment adds

Corporate hierarchies — directors, shareholders, beneficial owners, and subsidiaries drawn from official company registries.
Personal networks — family members, associates, and public roles from biographical databases.
Authoritative identifiers — LEI codes, PermID, FIGI, and OpenCorporates links for downstream data integration.
Investigative context — ICIJ Offshore Leaks records, OFAC enforcement actions, and port state control inspections.

A screening system limited to published lists can only match names and identifiers authorities have chosen to include. Enrichment surfaces the rest.

How enrichment works

The enrichment pipeline uses nomenklatura to match entities in our database against reference datasets. When it finds a match, it imports the matched entity and its immediate network into an enrichment dataset.

Risk-focused vs. reference datasets

Risk-focused datasets — sanctions lists, debarment registers, PEP databases — contain only entities that a government or authority has explicitly flagged. We ingest these in full.

Reference datasets are general-purpose databases with no inherent risk focus. A national company registry can contain millions of entities, with the vast majority having no sanctions relevance.

We use reference datasets selectively: we match entities already in our database against a reference dataset and import only the fragments connected to an entity with a risk topic. A national registry becomes relevant when, for example, a designated entity appears in it as a shareholder. At that point, we pull in the company, its other directors, and its subsidiaries.

The enrichment cycle

Step 1: Matching

For each entity in the input — a sanctions list, a PEP register, or a prior enrichment result — the system searches for it in the reference dataset. For full-registry sources like company registers, this uses an entity resolution algorithm that compares names, dates of birth, registration numbers, and other identifying features. For API-based sources like Wikidata, a direct name search is used.
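The feature comparison described above can be sketched as follows. This is a toy illustration, not the algorithm nomenklatura actually uses: the `Candidate` record, the `match_score` function, and the weighting (an exact registration number is decisive, a conflicting birth date halves the score) are all assumptions made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical, simplified record (not the real entity model)."""
    name: str
    birth_date: Optional[str] = None
    registration_number: Optional[str] = None


def normalize(name: str) -> str:
    # Lowercase and collapse whitespace so trivial differences don't matter.
    return " ".join(name.lower().split())


def match_score(query: Candidate, result: Candidate) -> float:
    """Return a score in [0, 1]; higher means more likely the same entity."""
    # Assumption: a shared registration number is treated as decisive.
    if query.registration_number and query.registration_number == result.registration_number:
        return 1.0
    score = SequenceMatcher(None, normalize(query.name), normalize(result.name)).ratio()
    # Assumption: conflicting birth dates halve the name-based score.
    if query.birth_date and result.birth_date and query.birth_date != result.birth_date:
        score *= 0.5
    return score


listed = Candidate("John Smith", birth_date="1970-01-01")
same_name_other_person = Candidate("john  smith", birth_date="1980-05-05")
print(match_score(listed, same_name_other_person))
```

The point of the sketch is that a name alone is weak evidence: two records with identical names but different birth dates score well below a record that shares a registration number.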

Step 2: Review

Confident matches are imported immediately. Uncertain candidates — multiple "John Smith" results, for example — enter a review queue where a human analyst or an automated matching algorithm verifies them before inclusion.
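A minimal sketch of that triage, assuming each candidate already carries a match score between 0 and 1. The thresholds, the `triage` function, and the rule that several strong candidates for the same entity all go to review are assumptions for illustration, not documented behaviour of the pipeline.

```python
from typing import List, Tuple


def triage(
    scored: List[Tuple[str, float]],
    auto_accept: float = 0.9,
    review_floor: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Split scored candidates into (imported, review_queue).

    Candidates scoring below review_floor are dropped (an assumption).
    """
    confident = [cid for cid, score in scored if score >= auto_accept]
    # Only a single strong candidate is unambiguous; several strong
    # candidates (the "John Smith" case) are all sent to review instead.
    imported = confident if len(confident) == 1 else []
    review = [
        cid for cid, score in scored
        if score >= review_floor and cid not in imported
    ]
    return imported, review


print(triage([("reg-1", 0.95), ("reg-2", 0.60), ("reg-3", 0.20)]))
print(triage([("reg-1", 0.95), ("reg-2", 0.93)]))
```

In the first call one candidate is imported and one is queued; in the second, two near-identical strong candidates both land in the review queue.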

Step 3: Expansion

For confirmed matches, the system traverses the reference dataset to pull in related entities, representing relationships as interstitial entities (Ownership, Directorship, etc). If a sanctioned person is confirmed as a director of a company in a national registry, the system imports that company's other directors, its shareholders, and its subsidiaries.
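The traversal can be pictured as a bounded breadth-first walk over relationship records. The sketch below uses invented IDs and plain dictionaries rather than the real entity model, and the two-hop limit is an assumption chosen to reproduce the director example above.

```python
from collections import deque

# Interstitial entities: each relationship is its own record linking two IDs.
LINKS = [
    {"id": "dir-1", "schema": "Directorship", "director": "person-x", "organization": "company-a"},
    {"id": "dir-2", "schema": "Directorship", "director": "person-y", "organization": "company-a"},
    {"id": "own-1", "schema": "Ownership", "owner": "person-z", "asset": "company-a"},
    {"id": "own-2", "schema": "Ownership", "owner": "company-a", "asset": "company-b"},
    {"id": "own-3", "schema": "Ownership", "owner": "company-b", "asset": "company-c"},
]

# Which two fields hold the endpoints for each relationship type.
ENDPOINTS = {
    "Directorship": ("director", "organization"),
    "Ownership": ("owner", "asset"),
}


def expand(seed, links, max_hops=2):
    """Return (entity IDs, relationship IDs) reachable within max_hops of seed."""
    seen = {seed}
    imported_links = set()
    queue = deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for link in links:
            a, b = (link[key] for key in ENDPOINTS[link["schema"]])
            if node not in (a, b):
                continue
            imported_links.add(link["id"])
            other = b if node == a else a
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return seen, imported_links


entities, relationships = expand("person-x", LINKS)
print(sorted(entities))
print(sorted(relationships))
```

Starting from the confirmed director `person-x`, the walk picks up `company-a`, its other director and shareholder, and its subsidiary `company-b`, but stops short of `company-c`.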

Step 4: Enrichment datasets

Results are stored in dedicated enrichment datasets (often named ext_<source>, e.g. ext_ru_egrul). These contain only the subset of the reference dataset for which a link to an entity in our database has been confirmed.

Provenance and traceability

Every property value in the database carries full provenance. Each entity record includes a datasets array, listing every data source that contributed to it — an entity appearing in both us_ofac_sdn and ext_ru_egrul, for example, would list both. At the statement level, individual property values carry the data source, timestamps, and the raw value before normalization.
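As a rough picture of what that means in data terms, the sketch below models a statement as a small record and derives the entity-level datasets array from it. The field names and sample values are a simplified assumption for illustration, not the exact schema of the published data.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Statement:
    """One property value plus its provenance (simplified, assumed fields)."""
    entity_id: str
    prop: str
    value: str           # normalized value
    original_value: str  # raw value before normalization
    dataset: str         # data source that contributed the value
    first_seen: str      # ISO timestamps
    last_seen: str


def datasets_for(entity_id: str, statements: Iterable[Statement]) -> List[str]:
    """Derive the entity-level datasets array from its statements."""
    return sorted({s.dataset for s in statements if s.entity_id == entity_id})


# Invented sample data for one entity seen in two sources.
statements = [
    Statement("ent-1", "birthDate", "1970-03-03", "3 Mar 1970",
              "us_ofac_sdn", "2023-01-10T00:00:00", "2024-06-01T00:00:00"),
    Statement("ent-1", "name", "Example Person", "EXAMPLE PERSON",
              "ext_ru_egrul", "2023-02-14T00:00:00", "2024-06-01T00:00:00"),
]
print(datasets_for("ent-1", statements))
```

Keeping provenance at the statement level means any single value can be traced back to its source and raw form, while the datasets array is just a summary of those sources.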

Risk propagation

On its own, enrichment only locates entities that are already in our database within reference datasets. Risk propagation extends this by annotating newly discovered entities with risk topics, so that they become eligible for further enrichment.

In practice:

A designated person appears on a sanctions list (topic: sanction).
The enrichment pipeline finds them as a shareholder of Company A in a national registry and imports Company A.
An analysis step examines Company A's relationship to the sanctioned person and tags it sanction.linked.
On the next enrichment run, Company A — now carrying a risk topic — is itself matched against the registry. The system discovers that Company A owns 100% of Company B and imports it.
Company B is tagged sanction.linked, and the cycle continues.

Registry enrichers are computationally expensive and run weekly, so each iteration adds roughly a week of processing time. Plan for three to four weeks for a three-tier corporate hierarchy to fully materialize.
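The one-tier-per-run behaviour can be simulated in a few lines. This is a sketch under stated assumptions: the ownership chain is invented, and the analysis step is reduced to tagging every newly imported company as sanction.linked, which is simpler than whatever rules the real analysis applies.

```python
# Hypothetical registry content: owner -> companies it holds shares in.
OWNS = {
    "person-p": ["company-a"],
    "company-a": ["company-b"],
    "company-b": ["company-c"],
}


def run_enrichment(topics):
    """One enrichment run followed by the analysis step.

    `topics` maps entity ID -> set of risk topics and is updated in place.
    Returns the entities discovered in this run.
    """
    # Snapshot: only entities that carried a risk topic before this run
    # are matched, so each run reaches exactly one tier further.
    eligible = [entity for entity, tags in topics.items() if tags]
    discovered = []
    for entity in eligible:
        for asset in OWNS.get(entity, []):
            if asset not in topics and asset not in discovered:
                discovered.append(asset)
    # Analysis step (simplified): tag everything newly imported.
    for asset in discovered:
        topics[asset] = {"sanction.linked"}
    return discovered


topics = {"person-p": {"sanction"}}
history = []
for _ in range(10):  # upper bound on runs, to keep the loop finite
    new = run_enrichment(topics)
    if not new:
        break
    history.append(new)
print(history)
```

With a weekly schedule, the three entries in `history` correspond to three consecutive weekly runs, one per tier of the hierarchy.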

Risk propagation also powers the identification of relatives and close associates (RCAs). When a Wikidata enrichment run discovers that a politically exposed person (PEP) has a spouse, the analysis step tags the spouse as role.rca. That person then becomes eligible for enrichment, surfacing any companies they direct or other public roles they hold.

Topics that trigger enrichment

Not every entity in the database is enriched. Enrichers filter by risk topic to focus on entities that warrant deeper investigation:

Sanctions and export controls — sanction, sanction.linked, asset.frozen, export.control, export.control.linked, export.risk
Politically exposed persons — role.pep, role.rca
Regulatory and enforcement — debarment, reg.action, reg.warn
Other risk categories — poi (persons of interest), gov.soe (state-owned enterprises)

This filtering keeps the system tractable. Enriching every entity against every reference dataset would be computationally prohibitive and produce more noise than signal.
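In code, such a filter amounts to a set intersection. The topic names below are the ones listed above; the `is_eligible` helper and the placeholder topic in the last call are hypothetical.

```python
ENRICHMENT_TOPICS = frozenset({
    # Sanctions and export controls
    "sanction", "sanction.linked", "asset.frozen",
    "export.control", "export.control.linked", "export.risk",
    # Politically exposed persons
    "role.pep", "role.rca",
    # Regulatory and enforcement
    "debarment", "reg.action", "reg.warn",
    # Other risk categories
    "poi", "gov.soe",
})


def is_eligible(entity_topics):
    """True if the entity carries at least one enrichment-triggering topic."""
    return not ENRICHMENT_TOPICS.isdisjoint(entity_topics)


print(is_eligible({"role.rca"}))
print(is_eligible(set()))
print(is_eligible({"placeholder.topic"}))  # made-up topic, not in the list
```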

Beneficial ownership

Several major sanctions frameworks impose obligations that extend beyond the published lists. The most prominent is the OFAC 50 Percent Rule: any entity owned 50% or more, directly or indirectly, by one or more designated persons must be treated as blocked — even if it doesn't appear on the SDN list. The EU and UK apply similar beneficial ownership tests.

These rules can't be met by screening against watchlists alone. They require ownership graph data: knowing who owns what, through which intermediaries, and at what percentages.
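To make the data requirement concrete, here is a sketch of how an aggregate ownership test could be evaluated if percentage stakes were available. It follows one simplified reading of the rule: an entity is treated as blocked when blocked owners together hold 50% or more, and that status then cascades to whatever the entity owns. The stakes, entity names, and the `blocked_entities` function are invented for illustration and do not describe our pipeline.

```python
def blocked_entities(designated, stakes, threshold=50.0):
    """Return all entities treated as blocked under the simplified rule.

    `stakes` is a list of (owner, asset, percent) tuples.
    """
    blocked = set(designated)
    changed = True
    while changed:  # repeat until no new entity crosses the threshold
        changed = False
        totals = {}
        for owner, asset, percent in stakes:
            if owner in blocked:
                totals[asset] = totals.get(asset, 0.0) + percent
        for asset, total in totals.items():
            if total >= threshold and asset not in blocked:
                blocked.add(asset)
                changed = True
    return blocked


# Hypothetical ownership data.
stakes = [
    ("person-1", "company-a", 30.0),
    ("person-2", "company-a", 25.0),   # 30 + 25 = 55: company-a is blocked
    ("company-a", "company-b", 50.0),  # held 50% by a blocked entity
    ("person-1", "company-c", 40.0),   # below threshold
    ("investor", "company-c", 60.0),   # owner is not designated
    ("company-b", "company-d", 49.0),  # below threshold
]
print(sorted(blocked_entities({"person-1", "person-2"}, stakes)))
```

Every step of this calculation depends on a percentage figure, which is why missing or unreliable stake data makes the test impossible to evaluate from structural links alone.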

Our enrichment pipeline contributes to this, but doesn't solve it. Where we have company registry coverage — currently about a dozen jurisdictions, mostly in Europe and the post-Soviet space — we can trace structural ownership links between designated persons and companies, and expand those through multiple tiers of corporate hierarchy. We also surface offshore connections through the ICIJ Offshore Leaks database (Panama Papers, Pandora Papers, and others), beneficial ownership disclosures via Open Ownership, and financial identifiers through GLEIF.

That said, the coverage is partial and the gaps are significant:

No ownership percentages. Most registries don't reliably report percentage stakes, so we identify structural links (X is a shareholder of Y) rather than calculating aggregate ownership fractions. We can't tell you whether someone holds 50% or 5%.
Limited geographic reach. Registry enrichment covers Russia, Ukraine, the UK, and a handful of other European and post-Soviet jurisdictions. It doesn't cover China, the US (no federal company registry), India, the Gulf states, or most offshore centers (BVI, Cayman, Panama).
No nominee or trust transparency. Beneficial ownership structures that use nominees, trusts, or layered offshore vehicles are invisible to registry-based enrichment.

Enrichment gives you a meaningful head start on ownership-linked sanctions compliance, but it doesn't produce an exhaustive global database of entities affected by the 50 Percent and similar rules. No single data source does.

It's also worth naming the structural problem: gathering intelligence on the opaque networks associated with designated persons and companies is a task for which governments are far better equipped than the private sector and civil society. Governments have investigative bodies, intelligence agencies, access to beneficial ownership registers, and the suspicious activity reports filed by financial institutions. The 50 Percent Rule, while well-intended, must also be considered a failure of government to properly perform its own role in the design and implementation of effective sanctions regimes.

Reference datasets

All reference datasets are listed in the enrichers collection. The categories below describe how each group contributes to the entity graph.

Company registries are the core of ownership enrichment. We cross-match against full national company registries to build out corporate hierarchies: directors, shareholders, subsidiaries, and beneficial owners.

Currently integrated: Russia (EGRUL), Ukraine (EDR), UK (Companies House PSC), Cyprus, Czech Republic, Estonia, Georgia, Kazakhstan, Latvia, Moldova, and Bosnia and Herzegovina.

Financial and corporate identifiers attach authoritative reference numbers to entities already in the database. These don't expand the graph — they make existing entities easier to match and integrate in downstream systems.

Sources: GLEIF (Legal Entity Identifiers), SWIFT BIC (bank identifiers), ESMA FIRDS and FCA FIRDS (financial instruments), LSEG PermID, OpenFIGI, OpenCorporates, and the US IRS FATCA FFI List.

Biographical and investigative databases surface personal networks, public roles, and investigative findings that add context to listed persons.

Sources: Wikidata (family members, associates, and public roles — persons only) and ICIJ Offshore Leaks (Panama Papers, Pandora Papers, and other leak investigations).

Shipping and transport records link vessels to owning and operating companies, useful for mapping networks around sanctions-linked shipping entities.

Sources: Abuja MoU (West and Central Africa) and Tokyo MoU (Asia-Pacific).

Enforcement documentation links official enforcement records to designated entities, providing a dated record of the underlying action.

Sources: OFAC press releases.
