Language Intelligence

Overview

As a global Business Verification provider, we consider handling language complexities fundamental to our success and to giving our clients the easiest possible interaction with our various KYB Services.

We want to take the complexity of language requirements out of the client's flow and make it part of our own process and intelligence layer.

There are two main points in the verification process at which we apply this language intelligence: before we make a request to each source, and once we’ve received a response.

Each step has a specific purpose that we’ll outline below. 

Pre Source Request

Each of our sources behaves differently per country, depending on the structure of the underlying data. Some sources store data in the local native language, while others standardize and store it in English. Because of this, the best way to send requests to our sources varies greatly.

As an example, a single source may provide optimal results in country X when queried in English, but provide optimal results in country Y when queried in Simplified Chinese. Because of this high level of variation, we’ve developed the ability to define the language of our requests to each source by country and language. This is done by creating a rule, or set of rules, that can be adjusted as needed. A sample rule could read like this:

“Source X in Japan, for ‘BusinessName’ input: IF the detected input language is Japanese, translate or transliterate it to English prior to making the request.”

This indicates that Source X provides optimal results in Japan when queried in English.

This capability gives us the highest likelihood of finding and selecting the right company within a vendor’s result list.
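A rule like the sample above could be represented as data and looked up before each request. The following is a minimal sketch under stated assumptions, not a description of the production rule engine: the `LanguageRule` fields, the `RULES` table, the `resolve_request_language` helper, and the use of ISO country and language codes are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LanguageRule:
    # Hypothetical rule shape; field names are illustrative only.
    source: str             # e.g. "SourceX"
    country: str            # ISO 3166-1 alpha-2 code, e.g. "JP" (assumed)
    input_field: str        # e.g. "BusinessName"
    detected_language: str  # ISO 639-1 code of the detected input language
    request_language: str   # language the request should be sent in


# The sample rule from the text: Source X in Japan, Japanese input -> English.
RULES: List[LanguageRule] = [
    LanguageRule("SourceX", "JP", "BusinessName", "ja", "en"),
]


def resolve_request_language(
    source: str,
    country: str,
    input_field: str,
    detected_language: str,
    rules: List[LanguageRule] = RULES,
) -> str:
    """Return the language to query in.

    Falls back to the detected language when no rule applies, which is an
    assumption of this sketch rather than documented behavior.
    """
    key = (source, country, input_field, detected_language)
    for rule in rules:
        if (rule.source, rule.country, rule.input_field, rule.detected_language) == key:
            return rule.request_language
    return detected_language
```

Keeping rules as plain data, rather than hard-coded branches, is what allows them to be adjusted as needed without changing the request logic.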

Currently Configured Countries:

Japan

Korea

Taiwan

Thailand

More to be added on a regular basis

Post Source Response

The second part of our language intelligence program deals with the data once we’ve received it from the data source. A source may return business names in different languages, even within the same country. This makes it difficult to match client input against a response.

To solve this problem, we’ve introduced a method that makes multiple matching attempts using a combination of languages:

1. We first try to match the client’s exact input against the vendor’s exact response.

2. If step 1 results in a no-match, we attempt to detect the language of both the input and the returned business name. If the two languages differ, we translate or transliterate either the input or the returned value so that both are in the same language, and then attempt the match again.

We’ve found this approach effective: on average, it changes an original ‘no-match’ to a ‘match’ ~24% of the time.
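The two matching attempts described above can be sketched as follows. This is an illustration only: `names_match` is a simplified stand-in for whatever comparison the real matcher uses, `detect_language` and `convert` are hypothetical callables supplied by the caller (for language detection and for translation or transliteration), and trying the input conversion before the returned-name conversion is an assumption of this sketch. The label `ReturnBusinessNameTranslated` is likewise illustrative.

```python
from typing import Callable, Optional, Tuple


def names_match(a: str, b: str) -> bool:
    # Simplified comparison: ignore case and collapse whitespace.
    return " ".join(a.split()).casefold() == " ".join(b.split()).casefold()


def match_names(
    input_name: str,
    returned_name: str,
    detect_language: Callable[[str], str],
    convert: Callable[[str, str], str],
) -> Optional[Tuple[str, str]]:
    """Return labels for the (input, returned) pairing that matched, or None."""
    # Attempt 1: the client's input against the vendor's response, as given.
    if names_match(input_name, returned_name):
        return ("InputBusinessName", "ReturnedBusinessName")

    # Attempt 2: only applies when the two names are in different languages.
    input_lang = detect_language(input_name)
    returned_lang = detect_language(returned_name)
    if input_lang == returned_lang:
        return None

    # Convert the input into the returned name's language and retry.
    if names_match(convert(input_name, returned_lang), returned_name):
        return ("TranslatedInputName", "ReturnedBusinessName")

    # Otherwise convert the returned name into the input's language and retry.
    if names_match(input_name, convert(returned_name, input_lang)):
        return ("InputBusinessName", "ReturnBusinessNameTranslated")

    return None
```

The original `input_name` is never modified; converted values exist only as separate variants, and the returned labels record which pairing produced the match.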

Note that we never change the original input. Instead, each variant of the input and returned business names is clearly defined in its own separate data field:

Input Business Name Translated

Input Business Name Transliterated

Return Business Name Translated

Return Business Name Transliterated 

Within the ‘Standardized Business Names’ object, we’ll also indicate the name pairing that led to the ‘match’ result for a given verification.

In the example below, you can see that the pairing that produced the match was the TranslatedInputName and the ReturnedBusinessName.

In NAPI, this may look like the following:

"StandardizedBusinessNames": [
  {
    "Name": "ABC Widgets",
    "Type": "InputBusinessName",
    "usedForMatching": false
  },
  {
    "Name": "ABCWIDGETSINCHINESE",
    "Type": "TranslatedInputName",
    "usedForMatching": true
  },
  {
    "Name": "ABC Holdings",
    "Type": "ReturnedAlternateName",
    "usedForMatching": false
  },
  {
    "Name": "12351 Holdings",
    "Type": "AlternateName",
    "usedForMatching": false
  },
  {
    "Name": "ABCWIDGETSINCHINESE",
    "Type": "ReturnedBusinessName",
    "usedForMatching": true
  }
]
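A client reading this response could pick out the matched pairing by filtering on the usedForMatching flag. The snippet below is a hedged sketch: it assumes the response is delivered as a JSON object with StandardizedBusinessNames as a top-level key, and `matched_types` is a hypothetical helper name, not part of NAPI.

```python
import json
from typing import List

# Abbreviated copy of the example response, wrapped in an object (assumed shape).
SAMPLE = """
{
  "StandardizedBusinessNames": [
    {"Name": "ABC Widgets", "Type": "InputBusinessName", "usedForMatching": false},
    {"Name": "ABCWIDGETSINCHINESE", "Type": "TranslatedInputName", "usedForMatching": true},
    {"Name": "ABC Holdings", "Type": "ReturnedAlternateName", "usedForMatching": false},
    {"Name": "12351 Holdings", "Type": "AlternateName", "usedForMatching": false},
    {"Name": "ABCWIDGETSINCHINESE", "Type": "ReturnedBusinessName", "usedForMatching": true}
  ]
}
"""


def matched_types(payload: dict) -> List[str]:
    """Return the Type of every name entry flagged as used for matching."""
    names = payload.get("StandardizedBusinessNames", [])
    return [entry["Type"] for entry in names if entry.get("usedForMatching")]


if __name__ == "__main__":
    print(matched_types(json.loads(SAMPLE)))
```

For the example data, the two flagged entries are the TranslatedInputName and the ReturnedBusinessName, which is the pairing described in the text.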