Frequently asked questions

#158: What are the known limitations of the matching API?

Category: API

Matching entities from multiple databases is a complex problem. The matchers included in yente (and the OpenSanctions hosted API) provide solutions that have several known limitations. These limitations are most visible in scenarios where the query data provided by the API consumer is extremely limited (e.g., name-only matching).
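
To make the difference between name-only and richer queries concrete, here is a minimal sketch that builds two query entities for a matching request. It only constructs the request body and sends nothing. The helper `build_match_query`, the placeholder name and the sample values are hypothetical, and the property names `birthDate` and `nationality` are assumed to follow the FollowTheMoney `Person` schema; check the API documentation for the exact request shape before relying on it.

```python
import json


def build_match_query(name, schema="Person", **extra_properties):
    """Build one query entity (hypothetical helper, not part of yente).

    Every keyword argument becomes an additional property with one value.
    """
    properties = {"name": [name]}
    for key, value in extra_properties.items():
        if value is not None:
            properties[key] = [value]
    return {"schema": schema, "properties": properties}


# Name-only query: the scenario in which the limitations are most visible.
name_only = build_match_query("Jane Doe")

# The same person with additional attributes the matcher can compare.
detailed = build_match_query(
    "Jane Doe",
    birthDate="1970-01-01",  # assumed property name and sample value
    nationality="gb",        # assumed property name and sample value
)

# Assumed request body: a mapping from caller-chosen keys to query entities.
payload = {"queries": {"q1": name_only, "q2": detailed}}
print(json.dumps(payload, indent=2))
```

The second query gives the matcher more evidence to weigh than the name alone, which is why supplying every attribute you hold tends to reduce the impact of the limitations listed here.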

Some known limitations:

  • Non-Western Writing Systems: Name matching is less precise for names written in systems other than Western-style alphabets. In particular, fuzzy comparison of names across different writing systems produces higher error rates. Affected writing systems include Arabic/Farsi, Burmese, those used in China, Japan, and Korea, and those of many Indian languages.
  • Phonetic Matching Limitations: Phonetic matching (Soundex, Metaphone) does not support non-alphabetic writing systems (e.g., abugidas, logographic scripts, and syllabaries).
  • Company Name Variations: The company name matching mechanism is particularly vulnerable to misspellings in the legal type parts of company names (e.g., Lymited vs. Limited).
  • Name Aliases and Nicknames: Some name comparisons require dictionary alias approaches (e.g., matching Alexander and Sasha). Such dictionaries are not currently included in the OpenSanctions matching system.
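
The phonetic and alias limitations above can be illustrated with a textbook American Soundex. This is a sketch for illustration only and is not the implementation used in yente or the hosted API; the function name `soundex` and the sample names are assumptions. It shows why an alphabet-based phonetic code has nothing to work with when the input contains no Latin letters.

```python
import unicodedata

# Letter-to-digit table of classic American Soundex.
SOUNDEX_CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code, or "" if no Latin letters remain."""
    # Decompose accented characters, then keep only the letters A-Z.
    decomposed = unicodedata.normalize("NFKD", name)
    letters = [c for c in decomposed.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != previous:
            encoded += code
        if c not in "HW":  # H and W do not separate repeated codes
            previous = code
    return (encoded + "000")[:4]


print(soundex("Mohammed"), soundex("Muhammad"))  # same code for both spellings
print(repr(soundex("محمد")))                     # empty: nothing to encode
print(soundex("Alexander"), soundex("Sasha"))    # unrelated codes
```

In this sketch the two Latin spellings share a code, the Arabic-script name yields an empty string, and Alexander and Sasha produce unrelated codes, which is why nicknames need a dictionary-based approach rather than phonetic comparison.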

Several vendors of advanced entity matching technology have integrated OpenSanctions data into their solutions. We're happy to put you in touch with those vendors.
