Article

We Found €1.2M in Revenue Leakage Hiding in 3 Disconnected Databases

A data audit revealed €1.2M in annual revenue being misattributed across 3 disconnected databases with no shared key. Here's exactly what we found and how we fixed it.

Author

Pavel Siddique

Published

21 May 2026

Reading time

9 min read

Topics

data-engineering, data-platform, enterprise

The revenue was there. The customers were there. The transactions were recorded. But €1.2M per year was being attributed to the wrong acquisition channels because three databases (CRM, billing, and product) had no shared key linking them. Nobody had noticed because every individual system was working correctly. The problem only existed in the space between them.

€1.2M: Annual revenue misattributed

3: Disconnected databases

47: Data quality issues found

6 wks: Time to resolve root cause

What "Disconnected Databases" Actually Means in Practice

When people hear "disconnected databases" they imagine broken systems or missing data. The reality is more subtle and more dangerous. Each of the three databases was functioning correctly. The CRM held customer acquisition data. The billing system held revenue. The product database held usage, cohort, and churn data. The problem was that there was no reliable shared identifier linking all three — no single customer_id that meant the same thing across systems.

The CRM used its own internal ID. Billing used a contract number. The product database used an email address as the primary key — which changed when customers updated their contact details, and sometimes duplicated when a single company had multiple seat-holders. When the analytics team tried to answer "which acquisition channel generates our highest-value customers," they were joining tables on best-guess fuzzy matches. The joins were technically valid. The conclusions were not.

The marketing team had been doubling down on a paid search channel for 14 months based on this data. The channel appeared to generate customers with 40% higher LTV. The true picture: that channel attracted customers in a specific industry segment who happened to be high-value. The acquisition channel itself wasn't the driver. The company had been misallocating marketing budget for over a year.
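The channel-versus-segment effect described above can be illustrated with a small sketch. The rows below are toy values invented for illustration, not the client's data; the names `customers` and `mean_ltv` are hypothetical. The point is only that a channel can look better on average while showing no difference once customers are compared within the same industry segment.

```python
from collections import defaultdict
from statistics import mean

# Toy rows: (channel, industry segment, LTV in EUR). Purely illustrative values.
customers = [
    ("paid_search", "segment_a", 15000),
    ("paid_search", "segment_a", 15000),
    ("paid_search", "segment_a", 15000),
    ("paid_search", "segment_b", 9000),
    ("referral", "segment_a", 15000),
    ("referral", "segment_b", 9000),
    ("referral", "segment_b", 9000),
    ("referral", "segment_b", 9000),
]


def mean_ltv(rows, key_fn):
    """Group rows by key_fn(row) and return the mean LTV per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row[2])
    return {key: mean(values) for key, values in groups.items()}


# Naive view: average LTV per channel only.
by_channel = mean_ltv(customers, lambda r: r[0])

# Stratified view: average LTV per (segment, channel) pair.
by_segment_channel = mean_ltv(customers, lambda r: (r[1], r[0]))

for key, value in sorted(by_channel.items()):
    print(key, value)
for key, value in sorted(by_segment_channel.items()):
    print(key, value)
```

In this toy data the per-channel averages differ only because the channels have a different segment mix. Within each segment the two channels are identical.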

How the Audit Found It

A data audit doesn't start with databases. It starts with the questions the business is trying to answer and works backwards to whether the data actually supports those answers. Our first session with this client's CTO and Head of Data produced a list of eight business questions they relied on for strategy. We then asked: "Show us the query that answers each of these." Four of the eight either didn't have a query or had a query that depended on join logic nobody had documented.

The join between CRM and billing was done on company name, with LOWER() and TRIM() applied to normalise casing and whitespace. That works until a company rebrands, is acquired, or has a slightly different legal name in the two systems. We found 340 customer records where the join silently found no match and returned NULL for the acquisition channel, which the analytics layer treated as "unattributed" and excluded from channel calculations. Those 340 records contained the most valuable customers in the cohort.
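A minimal sketch of this failure mode, assuming made-up rows and hypothetical field names (`company`, `channel`, `contract_no`, `annual_revenue`). It mimics a left join on the normalised company name and then counts what the analytics layer would have dropped as unattributed.

```python
def normalize(name: str) -> str:
    """Mimic SQL LOWER(TRIM(name))."""
    return name.strip().lower()


# Hypothetical sample rows, not the client's data.
crm_rows = [
    {"crm_id": 1, "company": "Acme AB", "channel": "paid_search"},
    {"crm_id": 2, "company": "  Nordlys Tech ", "channel": "referral"},
    {"crm_id": 3, "company": "Borealis Group", "channel": "direct"},
]
billing_rows = [
    {"contract_no": "C-100", "company": "acme ab", "annual_revenue": 12000},
    {"contract_no": "C-101", "company": "Nordlys Tech", "annual_revenue": 30000},
    # Same customer as crm_id 3, but billed under a different legal name.
    {"contract_no": "C-102", "company": "Borealis Holding AB", "annual_revenue": 90000},
]


def left_join_on_name(billing, crm):
    """Left join billing to CRM on normalised name; no match yields channel=None."""
    crm_by_name = {normalize(r["company"]): r for r in crm}
    joined = []
    for b in billing:
        match = crm_by_name.get(normalize(b["company"]))
        joined.append({**b, "channel": match["channel"] if match else None})
    return joined


joined = left_join_on_name(billing_rows, crm_rows)
unmatched = [r for r in joined if r["channel"] is None]
unmatched_revenue = sum(r["annual_revenue"] for r in unmatched)
total_revenue = sum(r["annual_revenue"] for r in joined)

print(f"{len(unmatched)} of {len(joined)} billing rows have no channel")
print(f"{unmatched_revenue / total_revenue:.0%} of revenue is unattributed")
```

Counting unmatched rows and their share of revenue, rather than filtering them out, is what makes the silent exclusion visible.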

The join between billing and the product database used email. Email changed for 18% of customers over a 24-month period — account migrations, domain changes, role changes. Each email change broke the historical join. We found five cohorts of customers who were treated as new customers in the product analytics but had 18+ months of billing history. The churn model was trained on this data. It consistently underestimated churn risk for a specific customer type because half of their historical signals were invisible to the model.
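To show how an email change splits one customer's history, here is a small sketch with invented events. The mapping `email_to_customer` and the key `CUST-001` are hypothetical stand-ins for any stable identifier; they are not taken from the client's systems.

```python
from datetime import date

# Hypothetical product-usage events keyed by email, as in the product database above.
usage_events = [
    {"email": "anna@oldco.example", "day": date(2024, 1, 10)},
    {"email": "anna@oldco.example", "day": date(2024, 9, 2)},
    # Same person after a domain change.
    {"email": "anna@newco.example", "day": date(2025, 6, 15)},
]

# Hypothetical mapping from every known email to one stable customer key.
email_to_customer = {
    "anna@oldco.example": "CUST-001",
    "anna@newco.example": "CUST-001",
}


def first_seen(events, key_fn):
    """Return the earliest event date per key."""
    first = {}
    for event in events:
        key = key_fn(event)
        if key not in first or event["day"] < first[key]:
            first[key] = event["day"]
    return first


# Keyed by email: the domain change creates a second, apparently new customer.
by_email = first_seen(usage_events, lambda e: e["email"])

# Keyed by the stable identifier: one customer with the full history.
by_customer = first_seen(usage_events, lambda e: email_to_customer[e["email"]])

print(len(by_email), "customers when keyed by email")
print(len(by_customer), "customer when keyed by stable ID")
```

Any model trained on the email-keyed view sees the 2025 activity as a brand-new account with no prior signals.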

"Every system was accurate in isolation. The €1.2M problem didn't exist in any single database — it existed in the assumption that you could join them. That assumption was wrong." — Pavel Siddique, CEO, Indpro AB

The 47 Issues and How They Were Prioritised

The full audit produced 47 data quality issues across the three databases. Not all of them were revenue-critical. Triaging the 47 into actionable tiers took one working day. We scored each issue on two dimensions: revenue impact (how much money was at stake if left unfixed) and fix complexity (how long to resolve). The resulting 2×2 matrix was clear: three issues sat in the high-impact quadrant with a fix complexity of two to three weeks, and together they were causing the €1.2M misattribution. Those three went first.

Issue Tier	Count	Revenue Impact	Fix Complexity	Priority
Critical	3	€1.2M+/year	Medium (2–3 weeks)	Immediate
High	9	€50K–200K/year	Low–Medium	Sprint 2
Medium	18	Operational impact	Medium	Backlog
Low	17	Reporting accuracy	Low	Hygiene
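The two-dimension scoring can be sketched roughly as follows. The thresholds, the `Issue` fields, and the three sample issues are assumptions made up for illustration; the audit's actual scoring rules are not given in this article.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    revenue_impact_eur: int  # estimated annual money at stake if left unfixed
    fix_weeks: float         # estimated time to resolve


# Illustrative cut-offs (assumptions, not the audit's real thresholds).
HIGH_IMPACT_EUR = 50_000
LOW_COMPLEXITY_WEEKS = 3


def quadrant(issue: Issue) -> str:
    """Place an issue in one cell of the 2x2 impact/complexity matrix."""
    impact = "high-impact" if issue.revenue_impact_eur >= HIGH_IMPACT_EUR else "low-impact"
    complexity = "low-complexity" if issue.fix_weeks <= LOW_COMPLEXITY_WEEKS else "high-complexity"
    return f"{impact}/{complexity}"


def triage(issues):
    """Order issues by largest revenue impact first, then by shortest fix."""
    return sorted(issues, key=lambda i: (-i.revenue_impact_eur, i.fix_weeks))


# Hypothetical issues for demonstration only.
issues = [
    Issue("dashboard label typo", 0, 0.5),
    Issue("name-based CRM join", 400_000, 3),
    Issue("legacy currency rounding", 60_000, 8),
]

for issue in triage(issues):
    print(f"{issue.name}: {quadrant(issue)}")
```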

The three critical issues all traced back to the same root cause: no canonical customer identifier agreed upon at system design time. Each database had been built independently over four years. The CRM was bought off-the-shelf. Billing was custom-built by a contractor. The product database was built by the in-house team. Nobody owned the cross-system data model — because nobody thought cross-system joining would be a core analytics requirement until it was.


The Fix: A Canonical Customer Identifier

The solution was not technically complex. The work was. We designed a global_cust_id: a company-level identifier that existed outside all three systems, maintained in a lightweight reference table, and propagated to each system via a nightly sync job. Every new customer record gets a global_cust_id at creation. Every historical record was reconciled using a four-pass matching algorithm: an exact match on tax ID (found in both CRM and billing), then a fuzzy match on company name, then human review of fuzzy matches above 85% confidence, then manual resolution for the 40 records below that threshold.
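A rough sketch of how the passes listed above could be routed in code. It uses Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity score; the article does not say which fuzzy-matching method was actually used, and the field names and sample rows are hypothetical.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85  # the 85% confidence cut-off mentioned above


def norm(name: str) -> str:
    """Lowercase and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def name_score(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in metric."""
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def reconcile(crm_rows, billing_rows):
    """Return (auto-linked pairs, pairs for human review, contracts for manual work)."""
    linked, review, manual = [], [], []
    crm_by_tax = {r["tax_id"]: r for r in crm_rows if r.get("tax_id")}
    for b in billing_rows:
        # Exact match on tax ID: link automatically.
        hit = crm_by_tax.get(b.get("tax_id"))
        if hit:
            linked.append((hit["crm_id"], b["contract_no"]))
            continue
        # Fuzzy match on company name: take the best-scoring CRM candidate.
        best = max(crm_rows, key=lambda c: name_score(c["company"], b["company"]), default=None)
        if best and name_score(best["company"], b["company"]) > FUZZY_THRESHOLD:
            review.append((best["crm_id"], b["contract_no"]))  # a human confirms these
        else:
            manual.append(b["contract_no"])  # resolved by hand
    return linked, review, manual


# Hypothetical sample rows, not the client's data.
crm_rows = [
    {"crm_id": 1, "company": "Acme AB", "tax_id": "SE111"},
    {"crm_id": 2, "company": "Nordlys Tech AB", "tax_id": None},
    {"crm_id": 3, "company": "Borealis Group", "tax_id": None},
]
billing_rows = [
    {"contract_no": "C-100", "company": "ACME Aktiebolag", "tax_id": "SE111"},
    {"contract_no": "C-101", "company": "Nordlys Tech", "tax_id": None},
    {"contract_no": "C-102", "company": "Aurora Holding", "tax_id": None},
]

linked, review, manual = reconcile(crm_rows, billing_rows)
print("linked:", linked)
print("review:", review)
print("manual:", manual)
```

Keeping the three outcomes in separate lists means nothing is linked on a fuzzy score alone: only tax-ID matches are automatic, and everything else passes through a person.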

The reconciliation took four weeks. Two engineers and one analyst. It was not glamorous work. It was also the most commercially valuable four weeks this team had spent in two years. Once the shared key existed, every join became exact, every cohort became accurate, and the churn model got retrained on clean data. The paid search channel that had looked like a high-LTV driver dropped to average. The channel they'd been under-investing in — direct and referral from a specific industry — was actually generating 60% of their highest-LTV customers.

The marketing team redirected 30% of paid search budget to referral programmes and direct outreach in the identified industry segment. That's the real value of fixing data: not the report that looks different. It's the decision that changes.

What to Check in Your Own Stack

You don't need a €1.2M problem to make this worth examining. If your business runs on more than two operational databases that were built at different times by different teams, the question is not whether you have cross-system join issues — it's how severe they are. The fastest diagnostic is this: pick your most important business metric. Find the query that produces it. Count the join conditions. If any join uses a field that can change over the customer lifecycle (email, company name, phone number), you have exposure.
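The diagnostic above can be partly automated. The sketch below is a regex heuristic, not a SQL parser: it assumes join columns are written as `alias.column`, and the list of mutable field names in `MUTABLE_FIELDS` is an assumption you would adapt to your own schema.

```python
import re

# Column names treated as mutable over the customer lifecycle (adapt to your schema).
MUTABLE_FIELDS = {"email", "company_name", "name", "phone", "phone_number"}

# Capture the text of each ON clause, up to the next JOIN/WHERE/GROUP/ORDER or end of query.
JOIN_ON = re.compile(
    r"\bJOIN\b.*?\bON\b(.*?)(?=\bJOIN\b|\bWHERE\b|\bGROUP\b|\bORDER\b|$)",
    re.IGNORECASE | re.DOTALL,
)
# Match alias.column references and capture the column part.
COLUMN = re.compile(r"\b\w+\.(\w+)\b")


def mutable_join_columns(sql: str) -> list[str]:
    """Return the sorted mutable column names used in any JOIN ... ON condition."""
    flagged = set()
    for condition in JOIN_ON.findall(sql):
        for column in COLUMN.findall(condition):
            if column.lower() in MUTABLE_FIELDS:
                flagged.add(column.lower())
    return sorted(flagged)


# Hypothetical metric query for demonstration.
query = """
SELECT c.channel, SUM(b.amount)
FROM billing b
LEFT JOIN crm c ON LOWER(TRIM(c.company_name)) = LOWER(TRIM(b.company_name))
LEFT JOIN product p ON p.email = b.email
GROUP BY c.channel
"""

print(mutable_join_columns(query))
```

A query that joins only on a stable key, such as `a.global_cust_id = b.global_cust_id`, returns an empty list.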

A shared primary key, maintained in a reference table that sits outside all operational systems, is the correct fix. Building it retroactively takes 3–6 weeks depending on data volume and system access. It's not the kind of project that generates a ticket or gets prioritised in a roadmap sprint, which is exactly why it rarely gets done until someone finds the €1.2M.

The diagnostic question: Pick your most important revenue metric. Find the query. If any JOIN uses a mutable field — email, name, phone — you have the same exposure. Fix the key, not the query.

Trade-offs and Honest Limitations

A canonical identifier solves the cross-system join problem but doesn't fix data quality within each individual system. Of the 47 issues we found, 44 required additional work beyond the shared key project. Budget for both: typically 4–6 weeks for the key reconciliation, then a second phase for the remaining quality issues in priority order. Don't expect the first phase to solve everything — but do expect it to make everything else measurable and fixable.

The other limitation: this kind of audit requires access. Access to query all three systems, access to the people who built them, and access to the business stakeholders who can confirm what "correct" looks like for each metric. If your data team is siloed from the commercial team, the audit produces findings but not resolution. The fix is a cross-functional project, not a data engineering project.

Frequently Asked Questions

How do you identify which data quality issues are revenue-critical vs. operational?

We start with business questions, not database tables. For each strategic metric the business relies on, we trace the query and identify every assumption in the join logic. Issues that affect a query used for resource allocation or investment decisions are automatically high-priority, regardless of their technical severity.

How long does a typical data audit take?

The audit itself (scoping, access, querying, issue identification, and triage) takes two to three weeks for a three-system environment. The reconciliation work that follows depends on data volume and system accessibility, but typically runs four to eight weeks. See our full breakdown in the 47-issue audit post.

Is €1.2M a typical finding, or was this an unusually large problem?

It's at the high end of what we typically find, but not unusual for a company with 3–5 years of independent system growth and no dedicated data engineering function. The €200K range is more common for smaller companies. The pattern — no shared key, joins on mutable fields, silent NULL exclusions — appears in roughly 70% of the companies we audit.

Pavel Siddique

CEO & Co-Founder

Pavel founded Indpro in 2010 with a vision to bridge Nordic engineering culture with India's deep tech talent pool. Based in Stockholm, he oversees strategy and client relationships.

