A practical framework for evaluating whether to build a custom data platform or adopt a modern data stack with off-the-shelf tools — from a team that has helped 20+ Nordic companies answer this question.
Author
Tom Bergström
Published
28 April 2026
Reading time
7 min read
Topics
data-platform, architecture, nordic-tech, build-vs-buy
The build-vs-buy decision in data platform architecture is one of the most consequential infrastructure choices a mid-market company can make. It is also one of the most poorly framed.
The framing that gets companies into trouble goes something like this: we can buy a managed platform and be operational faster, or we can build something custom and own it long-term. Which is cheaper?
That is not the right question. The right question is: which option will we still be able to operate, extend, and trust in three years — given the team we have, the data volumes we expect, and the strategic outcomes we are actually trying to achieve?
Managed data platforms — Snowflake, Databricks, Azure Synapse, and their equivalents — have become genuinely good. They abstract away infrastructure management, scale horizontally, and offer connectors to most of the systems mid-market companies run. The case for them is real.
But 'buy' in data infrastructure is not the same as buying software. It is more like renting a city. You have a lot of freedom inside the walls. The walls still exist.
What managed platforms do well
What they do less well
Egress costs compound quickly at mid-market data volumes. This is the number that surprises most buyers 18 months in.
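To make the compounding concrete, here is a rough back-of-the-envelope model. The per-GB rate and growth figures below are illustrative assumptions, not vendor quotes; check your provider's current transfer pricing before relying on any number.

```python
# Rough egress cost model: what moving data out of a managed platform
# costs per month, and how that grows. All rates and volumes below are
# illustrative assumptions, not vendor pricing.

def monthly_egress_cost(gb_out_per_month: float, rate_per_gb: float = 0.09) -> float:
    """Cost of data leaving the platform at a flat per-GB rate."""
    return gb_out_per_month * rate_per_gb

def compounded_volume(initial_gb: float, monthly_growth: float, months: int) -> float:
    """Egress volume after `months` of steady compound growth."""
    return initial_gb * (1 + monthly_growth) ** months

# A team starting at 200 GB/month of egress, growing 8% per month:
month_0 = monthly_egress_cost(200)
month_18 = monthly_egress_cost(compounded_volume(200, 0.08, 18))

print(f"month 0:  ${month_0:,.2f}/month")
print(f"month 18: ${month_18:,.2f}/month")
```

At 8% monthly growth the volume roughly quadruples over 18 months, which is why a line item that looked negligible at signing becomes the surprise on the month-18 invoice.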
Building a custom data platform does not mean starting from a blank terminal. It means assembling a stack from components — orchestration layer, storage layer, transformation layer, serving layer — and making deliberate choices at each level.
The modern open-source data stack (dbt, Apache Airflow or Prefect, Delta Lake or Iceberg, with AWS or Azure as the compute and storage backbone) is mature enough that a well-architected build is no longer the multi-year project it was five years ago. The component selection is still consequential, but the components themselves are battle-tested.
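The layer separation above is the core architectural idea, and it can be shown without any of those tools installed. The toy pipeline below is a dependency-free sketch of the four layers as swappable functions; in a real build these would be Airflow or Prefect tasks, object-store writes, and dbt models. All names and figures (including the exchange rate) are illustrative.

```python
# Toy end-to-end pipeline showing the four layers as separate, swappable
# functions. Stand-ins only: RAW plays the storage layer (lake / object
# store), SERVED plays the serving layer (warehouse / API).

RAW: list[dict] = []
SERVED: dict = {}

def ingest(records: list[dict]) -> None:
    """Ingestion into the storage layer."""
    RAW.extend(records)

def transform() -> list[dict]:
    """Transformation layer: clean and enrich raw records (dbt's role)."""
    return [{**r, "amount_sek": r["amount_eur"] * 11.3} for r in RAW]

def serve(rows: list[dict]) -> None:
    """Serving layer: publish aggregates for BI or API consumers."""
    SERVED["total_sek"] = sum(r["amount_sek"] for r in rows)

def run_pipeline(records: list[dict]) -> None:
    """Orchestration layer: explicit, inspectable task ordering."""
    ingest(records)
    serve(transform())

run_pipeline([{"id": 1, "amount_eur": 100.0}, {"id": 2, "amount_eur": 50.0}])
print(SERVED)
```

The point of the sketch is the seams: because each layer is a separate component, any one of them can be replaced (Airflow for Prefect, Delta Lake for Iceberg) without rewriting the others. That seam placement is the consequential part of component selection.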
What a well-built platform does well
What it requires
Rather than a binary, the useful model is a set of questions that reveal which direction fits your situation.
| Condition | Direction |
|---|---|
| Your data volumes are under 500 GB/month | Managed platform is likely justified — build overhead exceeds benefit |
| You have in-house data engineers | Build is viable. Without them, managed is safer short-term |
| You need results in 90 days | Managed wins on speed. No close contest |
| You have 5+ years of platform horizon | Build economics improve significantly at this timeline |
| Your data is your product | Build. You cannot outsource the core of your product to a vendor roadmap |
| Compliance is primary (healthcare, finance) | Evaluate managed platforms with strong compliance tooling first |
| Cost predictability matters more than speed | Build, with proper architecture. Managed pricing models are variable by design |
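The table can be read as a checklist. The sketch below encodes its conditions in code; the conditions mirror the rows above, but the tallying heuristic is ours and illustrative only — it is a conversation starter, not a substitute for an actual architecture review.

```python
# Illustrative encoding of the decision table as a checklist. Thresholds
# come from the table; the scoring heuristic is an assumption of this
# sketch, not a formal methodology.
from dataclasses import dataclass

@dataclass
class Situation:
    gb_per_month: float
    has_data_engineers: bool
    need_results_in_90_days: bool
    horizon_years: int
    data_is_product: bool
    compliance_primary: bool
    cost_predictability_first: bool

def recommend(s: Situation) -> str:
    if s.data_is_product:
        return "build"  # you cannot outsource the core of your product
    buy = build = 0
    if s.gb_per_month < 500:
        buy += 1        # build overhead exceeds benefit at low volumes
    if s.has_data_engineers:
        build += 1
    else:
        buy += 1        # without them, managed is safer short-term
    if s.need_results_in_90_days:
        buy += 1        # managed wins on speed
    if s.horizon_years >= 5:
        build += 1      # build economics improve at this timeline
    if s.compliance_primary:
        buy += 1        # evaluate compliance-strong managed platforms first
    if s.cost_predictability_first:
        build += 1      # managed pricing models are variable by design
    if buy > build:
        return "buy"
    if build > buy:
        return "build"
    return "hybrid"

print(recommend(Situation(300, False, True, 2, False, False, False)))
```

A small team with moderate volumes, no data engineers, and a 90-day deadline scores cleanly toward "buy"; flip the horizon, the engineering capability, and the product-centrality of the data, and the same checklist points toward "build".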
The single most important thing we have learned from building data platforms is that the architecture decision cannot be deferred. This is true for both build and buy — but the consequences of getting it wrong differ.
With a managed platform, a poor architecture means rework inside the vendor's constraints. You can fix it, but you pay in migration cost and accumulated technical debt. With a custom build, a poor architecture means rework at the infrastructure level. The cost is higher and the timeline longer.
In either case, the structure has to be decided before a single pipeline runs. Requirements need to be understood deeply — not just the immediate use cases, but the likely evolution of those use cases over the next three to five years. Data schemas change. Business logic changes. Query patterns change. An architecture that cannot absorb those changes gracefully will not survive them.
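One concrete way an architecture "absorbs" schema change gracefully is a compatibility gate: additive changes pass, destructive ones are caught before they break downstream consumers. The sketch below is a minimal version of that idea; the field names and type strings are illustrative, and real lakehouse formats (Delta Lake, Iceberg) ship richer schema-evolution machinery than this.

```python
# Hedged sketch of a backward-compatibility check for schema changes --
# the kind of guard that lets pipelines absorb evolving source schemas.
# Column names and type labels are illustrative.

def is_backward_compatible(old: dict[str, str], new: dict[str, str]) -> bool:
    """Additive changes pass; dropped or retyped columns fail."""
    return all(col in new and new[col] == typ for col, typ in old.items())

v1 = {"order_id": "int", "amount": "float"}
v2 = {"order_id": "int", "amount": "float", "currency": "str"}  # column added
v3 = {"order_id": "int", "amount": "str"}                       # type changed

print(is_backward_compatible(v1, v2))  # True: additive change
print(is_backward_compatible(v1, v3))  # False: breaking change
```

Running a check like this in CI, before any deploy, is a cheap way to turn "schemas change" from an outage into a review comment.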
Data platform architecture cannot be evaluated without considering how AI workloads will interact with it. This changes the analysis in a few important ways.
Managed platforms have invested heavily in AI-ready infrastructure — vector stores, embedding pipelines, model serving integrations. If AI workloads are likely to be significant within two years, the managed platform's head start matters more than it would have three years ago.
But the same caution applies: if your data is the substrate for your AI models, vendor lock-in on the data layer means vendor dependency on your model quality. That is a strategic exposure that some companies are comfortable with and some are not.
The principle that applies to AI tooling generally applies here: adopt what is stable and proven. The companies that integrated agentic AI workflows into their data platforms without architectural controls are the ones with the most expensive technical debt to unwind right now.
Buy if: you need speed, your volumes are moderate, your team does not have deep data engineering capability, and three to five years from now is not your primary planning horizon.
Build if: data is central to your product, you have the engineering capability to operate it (or can acquire it), you are planning for five-plus years, and cost predictability at scale matters.
Hybrid if: you are at mid-scale, building fast, and want to de-risk the lock-in question. Start with a managed platform for the use cases where it excels; build the components where you need ownership. This is increasingly the architecture that makes sense — but it requires a clear overall design, or you end up with neither the advantages of managed nor the freedom of custom.