How DueDil solves the 3 big challenges of processing and aggregating officer data

With over half a million new businesses incorporated every year and about the same number ceasing to trade, turbulence in the SME market is a given.


This volatility poses challenges for banks and insurance firms looking to provide financial services or cover to the sector. How can you gauge the true risk profile of an organisation and know who or what to cover or deliver services to?

This challenge is made even harder by the somewhat opaque world of officer data - a critical component of understanding the profile of an organisation and its true nature.

In this article, we take a look at the challenges facing banks and insurers in uncovering the true UBO or officer behind an organisation.

For each challenge, we will show how our Business Information Graph (B.I.G.)™ has been designed to tackle them, provide tremendous value and give the information needed to build a thorough picture of the organisation and its officers before any decisions have to be made.


Untangling the complexity of officer identity

Understanding who an officer is (the true human entity) is vital to assessing risk. The problem is that officer identities are extremely complex.

Unlike organisations, officers are human and as such are much more fluid and difficult to pin down.

Officers and UBOs have no unique ID reference numbers (other than intensely personal and private documents such as passports), so there is no centralised, publicly available system against which to analyse them.

And even when such identifiers are available, they aren’t perfect: people with more than one nationality hold a different ID in each country despite being the same underlying entity.

Layer on top of this how interconnected these entities are and the scale of the problem comes sharply into focus.

For example, the same officer entity can be a director at multiple companies, and the digital records of those appointments do not necessarily reflect the physical truth.

How to assess risk effectively

Attributes that enrich information about an officer typically come from disparate datasets such as FCA or PEP data. With companies, joining these datasets is easy: every company has a unique identifier, so all you need to do is match on it.

But when it comes to untangling the spaghetti-like officer connections, it takes some serious computational thinking.

There are three fundamental problems when processing and aggregating officer data in order to create a complete and accurate picture of risk:


  1. Determining whether two unkeyed officers are the same entity (conflation)

  2. Assigning a unique key to the resolved entity, where the key exhibits desirable properties such as stability - a particularly significant challenge since the data comes from multiple sources and is highly fragmented

  3. Collecting data that best represents the officer entity from the various conflated profiles


In years gone by, we focused on a single source as the backbone of our officer data. This worked in principle, but it meant we didn’t have total control over the keys assigned to officers. 

This left us exposed: whenever the keys changed (which they did, frequently), we had to decode what was different and recode our data output.

It also raised challenges with Newly Incorporated Companies (NICs), since we were unable to surface officers properly until they appeared in our source’s data - many days later.

All of these factors precipitated the change we made recently to a more robust, extensible and generalisable approach. 

Building a ‘data lake’

Recognising that depending on one data source for officers was not enough, we began building a more comprehensive set of data sources to pool information together. 

As well as keeping our previous suppliers, we improved our ability to scrape data from Companies House (CoHo). This enabled us to use CoHo as a first-class data supplier for officer entities.

What we then did was clean and normalise all officer datasets and put them together in a ‘data lake’ where, although we track provenance for traceability, there is no difference between officers based on their source. 

Generating this structure and applying our intelligent machine learning and AI tools provides us with a significant competitive advantage and an unrivalled capacity to tackle the challenges of officer identification.

Figure: Fragmented data from Companies House

Figure: Unified data from DueDil

How we solve the challenge of conflation

With all this data swimming in the lake, what criteria should we use to determine whether two officers are in fact the same entity? 

Due to the sensitivity of officer data (asserting that two entities are the same is a strong statement and should be explainable), we decided to go with a comprehensive rules-based system.

Outlined here are the rules that we have adopted. 

Note that these rules are applied as a chain, not individually. The name component in each rule is the ‘normalised fully qualified name’, shortened here to fqname: the fully qualified name (forename + middle name + surname) after normalisation (whitespace removed, capitalisation normalised, etc).
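To make the fqname idea concrete, here is a minimal sketch of what such a normalisation could look like. The function name and the exact steps (Unicode folding, case folding, collapsing runs of whitespace to a single space) are assumptions for illustration, not a description of DueDil's actual pipeline.

```python
import re
import unicodedata


def normalise_fqname(forename: str, middle_name: str, surname: str) -> str:
    """Build an illustrative fqname from the three name parts.

    Hypothetical helper: the real normalisation rules are not published.
    """
    # Join only the parts that are present (middle names are often missing).
    raw = " ".join(part for part in (forename, middle_name, surname) if part)
    # Fold compatibility characters and normalise capitalisation.
    folded = unicodedata.normalize("NFKC", raw).casefold()
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", folded).strip()


print(normalise_fqname("  John", "Paul ", "SMITH"))
```

Two source records that differ only in spacing or capitalisation end up with the same fqname, which is what lets the rules below compare names with a simple equality check.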

1. Fqname/birth date

Conflate officers if they have the same:

  • Fqname
  • Birth year
  • Birth month

We specifically exclude the day of birth, since this information is often either unavailable because of privacy restrictions or incorrectly filled in as the first of the month, which is usually untrue.
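The first rule amounts to grouping records on a three-part key. The sketch below shows one way to do that; the `OfficerRecord` type and its field names are hypothetical, and the choice to skip records with a missing birth year or month is an assumption.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class OfficerRecord(NamedTuple):
    # Hypothetical record shape, for illustration only.
    record_id: str
    fqname: str
    birth_year: Optional[int]
    birth_month: Optional[int]


def group_by_name_and_birth(records: List[OfficerRecord]) -> List[List[str]]:
    """Return groups of record IDs sharing fqname, birth year and birth month."""
    groups = defaultdict(list)
    for rec in records:
        # Assumption: without a birth year and month this rule cannot apply.
        if rec.birth_year is None or rec.birth_month is None:
            continue
        groups[(rec.fqname, rec.birth_year, rec.birth_month)].append(rec.record_id)
    # Only groups with more than one record represent a conflation.
    return [ids for ids in groups.values() if len(ids) > 1]
```

The day of birth never enters the key, which reflects the exclusion described above.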

2. Fqname/corporate match

Conflate officers if they have the same:

  • Fqname
  • Corporate match ID

A separate upstream job attempts to generate matches between officers that are corporate entities and our companies dataset. This job performs the matching based on a special name parameter in corporate officers and performs further normalisation.


3. Name/appointment set intersection

Directors are first grouped by name. Within each name group, we look at the appointment set of each officer, where an appointment set is the set of company IDs at which the officer is appointed. Officers whose appointment sets overlap are conflated.

What does that mean? Well, here’s an example (all the officers have the same fqname).


officer_a: {company_a, company_b}

officer_b: {company_a, company_c, company_d}

officer_c: {company_c}

officer_d: {company_z}


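In the example above, officer_a and officer_b share company_a, and officer_b and officer_c share company_c. After performing set intersection conflation, the result is that officer_a, officer_b, and officer_c are the same person (officer_a and officer_c are linked through officer_b), and officer_d, who shares no company with the others, is a separate individual.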
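The example can be reproduced with a short sketch. It uses a textbook union-find (disjoint-set) structure to merge officers that share at least one company; this is a swapped-in, generic technique for illustration and not DueDil's proprietary algorithm. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def conflate_by_appointments(appointments: Dict[str, Set[str]]) -> List[List[str]]:
    """Cluster officers (all with the same fqname) whose appointment sets overlap."""
    parent = {officer: officer for officer in appointments}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first officer seen at each company; any later officer
    # appointed at the same company is merged into that officer's cluster.
    first_seen: Dict[str, str] = {}
    for officer, companies in appointments.items():
        for company in companies:
            if company in first_seen:
                union(first_seen[company], officer)
            else:
                first_seen[company] = officer

    clusters = defaultdict(set)
    for officer in appointments:
        clusters[find(officer)].add(officer)
    return sorted(sorted(members) for members in clusters.values())


example = {
    "officer_a": {"company_a", "company_b"},
    "officer_b": {"company_a", "company_c", "company_d"},
    "officer_c": {"company_c"},
    "officer_d": {"company_z"},
}
print(conflate_by_appointments(example))
```

Indexing by company avoids comparing every pair of officers within a name group, which matters once a common name has thousands of records.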


The reasons why this is market leading and a DueDil USP are varied:

  1. Volume of data

We can perform this kind of analysis because we have large volumes of data at rest. This would be extremely difficult to do on the fly by querying, say, the Companies House API.

We have over 10 million officer profiles (and counting as we backfill some historical data) stored in our databases from CoHo and other sources. 

CoHo doesn’t currently offer an up-to-date bulk product and has restrictive rate limits, meaning it would be extremely difficult for a competitor or customer to build an equivalent dataset and close the gap.


  2. Cloud infrastructure

We are fully cloud native, meaning the computationally intensive process of calculating set intersections is easy to run on high performance, scalable hardware. 

Furthermore, we have a proprietary, efficient algorithm developed in house to allow us to perform this calculation at a significantly lower cost.


  3. Nobody else does it

To the best of our knowledge, there is simply nobody else able to offer this level of officer conflation. Other providers don’t even do the basic conflations based on birthday data!

Why is this important?

Well, around 10% (~2M) of officers have fragmented profiles at source, which means at least 1 in 10 officer profiles in the CoHo universe is a duplicate. We reconcile them so they are correctly identified as a single officer.


Creating a keying system that works

Once we’ve grouped the officer data, we need to assign the officer a key so clients can refer to the officer entity.

This key can be used for a variety of things, such as keeping internal CRM data up to date or performing in-life KYB checks on the officer entity to see if anything has changed (e.g. disqualifications).

DueDil deploys a proprietary strategy for keying officers. This strategy derives the key from core officer properties that are unlikely to change over time, which makes the resulting keys far more stable.
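The keying strategy itself is proprietary, so the following is only a sketch of the general idea of deriving a key from stable properties. The choice of properties (fqname, birth year, birth month), the SHA-256 hash, and the 16-character truncation are all assumptions made for illustration.

```python
import hashlib


def officer_key(fqname: str, birth_year: int, birth_month: int) -> str:
    """Derive a deterministic key from properties assumed to be stable.

    Hypothetical scheme: not DueDil's actual keying strategy.
    """
    # A fixed field order and separator keep the payload unambiguous.
    payload = f"{fqname}|{birth_year:04d}|{birth_month:02d}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Truncated for readability; a real system would weigh collision risk.
    return digest[:16]


print(officer_key("john paul smith", 1970, 5))
```

Because the key depends only on the officer's own properties, reprocessing the same data yields the same key regardless of which supplier a record came from.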

There are of course still the occasional cases where a key can change, but DueDil is ahead of the competition here, too. We have implemented an officer ID redirection system which keeps track of all keys associated with an officer.

It records enough information for us to intelligently redirect a user to the new key when they make a request through the API with an old one.
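A redirection system of this kind can be pictured as a mapping from superseded keys to their replacements. The class below is a minimal in-memory sketch under that assumption; the class and method names are hypothetical and say nothing about how DueDil stores or serves redirects.

```python
from typing import Dict


class KeyRedirects:
    """Track superseded officer keys and resolve them to the current key."""

    def __init__(self) -> None:
        self._redirects: Dict[str, str] = {}

    def record_change(self, old_key: str, new_key: str) -> None:
        # Refuse a redirect that would lead back to the old key (a loop).
        if self.resolve(new_key) == old_key:
            raise ValueError("redirect would create a cycle")
        self._redirects[old_key] = new_key

    def resolve(self, key: str) -> str:
        # Follow the chain of redirects until reaching a key with no successor.
        while key in self._redirects:
            key = self._redirects[key]
        return key


redirects = KeyRedirects()
redirects.record_change("key_v1", "key_v2")
redirects.record_change("key_v2", "key_v3")
print(redirects.resolve("key_v1"))
```

A client that stored the original key in its CRM still reaches the current profile, even after the key has changed more than once.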


Selecting the right record

And finally, once we merge officer records, we will have multiple digital versions of the same officer entity. How do we pick the correct representation? 

For example, an officer entity with several digital profiles will have his/her nationality specified multiple times, but we only want to surface a single nationality field in our reconciled entity.

We currently try to select the data from whichever digital record is the freshest. There are some limitations to this approach, but so far it has performed more than acceptably.
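As a sketch of the freshest-record rule applied to the nationality example, assuming each profile carries a last-updated date: the `Profile` type, its field names, and the choice to ignore profiles where the field is empty are illustrative assumptions.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Profile(NamedTuple):
    # Hypothetical profile shape, for illustration only.
    source: str
    last_updated: date
    nationality: Optional[str]


def pick_nationality(profiles: List[Profile]) -> Optional[str]:
    """Return the nationality from the most recently updated profile."""
    # Assumption: a profile with no nationality cannot supply the value,
    # even if it is the freshest record overall.
    candidates = [p for p in profiles if p.nationality]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.last_updated).nationality


profiles = [
    Profile("supplier_x", date(2018, 3, 1), "British"),
    Profile("coho", date(2019, 6, 15), "Irish"),
    Profile("supplier_y", date(2019, 9, 1), None),
]
print(pick_nationality(profiles))
```

One limitation is visible even here: the freshest record is not guaranteed to be the most accurate, only the most recent.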

The value to our clients

Everything that we do is focused on delivering value to our clients, using their feedback to fuel our innovation and deliver a product that we know is more than fit for purpose. In this instance, the client value is potentially huge in avoiding fraud or high risk organisations.

Reconciled, non-fragmented officer profiles give our clients a better view of the people behind the businesses. Extremely rich profiles are gathered in one place, complete with data around disqualifications or gazette notices.

Our Business Information Graph (B.I.G.)™ benefits from this activity too by becoming more easily traversable, enabling users to hop between companies that an officer is associated with.

Clients are then also able to track the officer using our proprietary keying system and see if core details about the officer change over time, for example if they become involved with companies that have a poor financial track record.


To find out more about how our B.I.G.™ works and how we support our clients in their regulatory endeavours, check out our technology page, or to see it in action, book a demonstration with our product experts and data analysts.

