Building ML Infrastructure Costs 3-5x More Than Most Marketing Teams Expect

ML infrastructure requires pipelines, serving, and monitoring — not just a model

Most teams think "building ML" means training a model. A 2015 NeurIPS paper by Sculley et al. found that ML code accounts for roughly 5% of the code in a real-world ML system 1. The other 95% is infrastructure: data pipelines, feature stores, serving layers, monitoring, and configuration.

For a marketing use case like propensity scoring, here is what that infrastructure actually looks like:

Data pipelines pull behavioral events from your site, clean them, join them with CRM or transaction data, and deliver them to a feature store. This runs continuously as visitors interact with your site.

Serving infrastructure takes a trained model and returns predictions in real time, while the visitor is still on your site. This requires low-latency endpoints, load balancing, and failover handling.

Monitoring tracks model accuracy in production. User behavior shifts constantly: a model trained on Q4 holiday traffic will degrade in Q1. Without automated drift detection and retraining triggers, prediction accuracy degrades silently.

Retraining pipelines retrain models on fresh data on a schedule or when monitoring flags degradation.

If any one of these layers is missing, the model never reaches production.
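The monitoring layer can start as something as simple as a drift statistic computed on a schedule. Here is a minimal sketch using the Population Stability Index (PSI), a common drift signal, written with the standard library only; the bucket count, the 0.2 alert threshold, and the sample scores are conventional illustrations, not tuned values.

```python
import math

def psi(baseline, current, buckets=10):
    """Population Stability Index between two score distributions.

    Rule of thumb: PSI < 0.1 is stable, 0.1-0.2 warrants watching,
    and > 0.2 suggests retraining. These thresholds are conventions,
    not guarantees.
    """
    lo, hi = min(baseline), max(baseline)
    step = (hi - lo) / buckets or 1.0

    def frac(values):
        counts = [0] * buckets
        for v in values:
            i = min(int((v - lo) / step), buckets - 1)
            counts[max(i, 0)] += 1
        # Small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = frac(baseline), frac(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

# Scores from the training window vs. scores seen this week (toy data).
train_scores = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8]
live_scores  = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]

if psi(train_scores, live_scores) > 0.2:
    print("drift detected: trigger retraining")
```

In production this check would run against the feature store or the serving logs on a schedule, with the alert wired to a retraining pipeline rather than a print statement.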

Building in-house means staffing for pipelines, serving, monitoring, and retraining

Each of those infrastructure layers requires dedicated staffing. These are full-time roles — not part-time assignments from existing engineering staff or short-term consulting engagements.

Start with data pipelines. Someone needs to build and maintain the ingestion layer that pulls clickstream events, cleans malformed records, joins behavioral data with your CRM, and writes it to a feature store. When a schema changes upstream, someone needs to fix the pipeline before stale features start degrading predictions. This is not a one-time build. It is ongoing work.
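To make the pipeline work concrete, here is a toy batch step covering the clean-and-join portion described above: validate raw clickstream events, enrich them with CRM records, and emit feature rows. The field names (user_id, page, plan) are illustrative assumptions, not a real schema.

```python
raw_events = [
    {"user_id": "u1", "page": "/pricing", "ts": 1700000000},
    {"user_id": "u2", "page": "/blog",    "ts": 1700000050},
    {"user_id": None, "page": "/pricing", "ts": 1700000060},  # malformed record
]
crm = {"u1": {"plan": "pro"}, "u2": {"plan": "free"}}

def build_features(events, crm):
    rows = []
    for e in events:
        # Clean: drop records that fail basic validation.
        if not e.get("user_id") or e["user_id"] not in crm:
            continue
        # Join: enrich the behavioral event with CRM attributes.
        rows.append({
            "user_id": e["user_id"],
            "visited_pricing": e["page"] == "/pricing",
            "plan": crm[e["user_id"]]["plan"],
        })
    return rows

features = build_features(raw_events, crm)
# In a real pipeline this batch would be written to the feature store;
# here we just inspect it.
print(features)
```

The upstream-schema problem shows up exactly here: if the event payload renames a field, the validation silently drops every record, which is why this code needs an owner, not just an author.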

Serving infrastructure needs its own expertise. Low-latency prediction endpoints, load balancing, failover, and autoscaling are backend engineering problems, not data science problems. The engineer who trained your propensity model is probably not the engineer who should be managing Kubernetes clusters and optimizing inference latency.

Monitoring and retraining add another layer of staffing. Someone has to build drift detection, define retraining triggers, validate new model versions against holdout data, and manage staged rollouts. One case study found that model updates consumed 22% of total budget, while performance monitoring alone took 15% of technical resources 2. Continuous retraining used 22% more resources than the initial deployment 2.

In practice, industry reports suggest that maintaining ML systems requires roughly 3 dedicated engineers per 50 developers using the platform 2. A small team of 2–3 ML and MLOps specialists represents over $500,000 annually in U.S. personnel costs alone 3. MLOps expertise — the people who actually know how to run models in production — commands a 15–30% salary premium over standard ML engineers 4. In major U.S. tech markets, a mid-level ML engineer runs $187,000–$220,000; a senior, $220,000–$275,000 4. These are base figures before benefits, equity, or recruiting costs.

And staffing is not the only cost that surprises teams. A healthcare organization found that 63% of its ML expenses came from data pipeline optimization and GPU cluster management — not the model itself 2. Industry analyses suggest that subscription and tooling costs represent less than 40% of actual expenses for most ML implementations; the rest is engineering, infrastructure, and operational overhead 2.

Proof-of-concept models often ship in weeks. Production systems take 6–12 months to stabilize. Organizations that fail to account for these comprehensive costs risk budget overruns of 30–40% in their first year 3. One analysis found 87% of models are never deployed 5; another puts the figure at 90% 6.

Total cost comparison for a marketing ML stack

Here is a concrete cost comparison for a mid-size marketing team running four predictive models: propensity to purchase, churn risk, LTV estimation, and send-time optimization.

Build: annual cost for a 50K-customer base

Line item                            | Low estimate | High estimate
Cloud compute (training + inference) | $36,000      | $72,000
Data storage and pipelines           | $12,000      | $24,000
Feature store / vector DB            | $6,000       | $18,000
Monitoring and observability tools   | $4,800       | $12,000
CI/CD and MLOps tooling              | $3,600       | $9,600
Infrastructure subtotal              | $62,400      | $135,600

This table covers infrastructure only. It excludes the engineering salaries detailed in the previous section. Factor in that staffing cost, and the fully loaded annual total reaches $562,000–$936,000 before any model produces a prediction in production.
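The fully loaded range is just the sum of the two component ranges; a quick sanity check, using the figures quoted in this article:

```python
# Sanity-check the fully loaded build totals quoted above.
infra = (62_400, 135_600)    # infrastructure subtotal, low/high
staff = (500_000, 800_000)   # engineering staffing, low/high

total_low = infra[0] + staff[0]
total_high = infra[1] + staff[1]
print(f"${total_low:,} to ${total_high:,}")  # prints "$562,400 to $935,600"
```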

Industry reports show that cloud and tooling subscription fees represent less than 40% of actual expenses when building in-house 2. The remaining 60%+ comes from integration work, debugging data pipelines, retraining schedules, and incident response — tasks that do not appear in initial budgets.

Buy: annual cost for equivalent capability

SaaS platforms that deliver marketing-specific ML — propensity scoring, recommendations, LTV prediction, send-time optimization — typically fall in the $1,200–$15,000 per year range depending on event volume and model count. Almeta ML starts at $99/month. That range covers hosting, model updates, monitoring, and integrations with ad platforms and email tools.

Side-by-side summary

Cost category             | Build (yr 1) | Buy (yr 1)
Infrastructure            | $62K–$136K   | $0 (included)
Engineering staff         | $500K–$800K  | $0
Platform / SaaS fee       | $0           | $1.2K–$15K
Integration effort (est.) | 3–6 months   | Days to weeks
Total                     | $562K–$936K  | $1.2K–$15K

The cost gap narrows when a company already employs a dedicated ML team and needs capabilities beyond what any SaaS platform offers — custom architectures, proprietary training data pipelines, or regulatory requirements that prohibit third-party data processing.

How to audit an in-house ML project's true run-rate cost

Most teams stop tracking costs after launch. That makes it difficult to evaluate whether the project still justifies its resource allocation.

Start with compute. Pull your cloud invoices for the past six months and isolate ML-specific workloads: training runs, inference serving, data storage, and pipeline orchestration. These costs grow as models retrain on larger datasets and serve more requests.

Next, quantify staffing. Industry reports put maintenance staffing at 3 dedicated engineers per 50 developers on staff 2. Calculate fully loaded costs for those roles, including benefits, tooling licenses, and management overhead. Factor in recruiting costs from turnover — backfilling a senior ML role takes an estimated three to six months.

Then estimate opportunity cost. Every engineer maintaining an existing model is an engineer not building new products. List the projects your team deferred or abandoned because of maintenance demands.

Finally, project these numbers forward. Using the Upsilon IT range of 25–75% maintenance allocation, a $600K initial build translates to $150K–$450K annually in sustaining work — before feature improvements 3.

Sum these four categories quarterly.
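The quarterly audit above can be kept as a short script. Every figure below is a placeholder to replace with your own invoices, payroll, and project backlog; the $600K build and the category amounts are illustrative assumptions.

```python
# A minimal quarterly run-rate audit over the four cost categories above.
quarter = {
    "compute": 28_000,       # ML-tagged cloud spend this quarter (placeholder)
    "staffing": 120_000,     # fully loaded ML/MLOps payroll share (placeholder)
    "opportunity": 40_000,   # estimated value of deferred projects (placeholder)
    "maintenance": 37_500,   # sustaining-work share of the build (placeholder)
}

run_rate = sum(quarter.values()) * 4  # annualized

# Forward projection: the 25-75% maintenance range cited above,
# applied to a hypothetical $600K initial build.
initial_build = 600_000
maint_low, maint_high = 0.25 * initial_build, 0.75 * initial_build

print(f"annualized run rate: ${run_rate:,}")
print(f"projected sustaining work: ${maint_low:,.0f} to ${maint_high:,.0f}")
```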

When custom infrastructure is worth it: proprietary data assets and unsolved problems

Most companies should default to buying ML infrastructure. But a small number of situations genuinely require custom builds.

Proprietary data assets that vendors cannot replicate. If your competitive advantage depends on data signals unique to your business, off-the-shelf platforms may not support them. A logistics company predicting delivery failures from proprietary sensor data across its fleet has inputs no vendor has modeled for. The data structure, update frequency, and feature engineering required may fall outside what any platform offers. In these cases, custom infrastructure exists to protect a data advantage, not to replicate what platforms already provide.

Problems no vendor has solved. Some prediction tasks sit outside the standard catalog of churn, conversion, and recommendation models. If you are scoring risk on a novel financial instrument or predicting equipment failure in a manufacturing process with no industry benchmark, you will not find a pre-built solution. Custom builds are justified when the problem itself is unsolved, not just when existing solutions lack specific features.

Regulatory or compliance constraints that prohibit third-party processing. Certain industries face data residency, auditability, or model explainability requirements that rule out external platforms entirely. If regulators require full control over model internals and data pipelines, building in-house may be the only viable path.

Scale that breaks vendor economics. At extremely high inference volumes, per-prediction pricing from vendors can exceed the cost of running dedicated infrastructure. Companies processing hundreds of millions of predictions daily sometimes reach a crossover point where owned infrastructure costs less.

Propensity scores, recommendations, and LTV prediction are solved problems — don't rebuild them

Propensity scoring, product recommendations, and lifetime value prediction share a common trait: the underlying algorithms are well-documented, widely implemented, and available as platform-level capabilities from major enterprise vendors. Adobe Experience Platform includes built-in ML-powered propensity scoring 7. Databricks offers propensity scoring as a turnkey solution accelerator 8. These are mature, production-grade implementations.

Building these models from scratch means re-deriving solutions that already exist in reliable, tested form. The core math — logistic regression for propensity, collaborative filtering for recommendations, probabilistic models for LTV — is standard. Most engineering effort goes into data pipelines, feature stores, model monitoring, and retraining infrastructure. That work is substantial, and it delivers no competitive advantage when the output is functionally identical to what a platform provides.
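To show how small the "core math" really is, here is a from-scratch logistic-regression propensity sketch on toy data, standard library only. The feature names and training set are invented for illustration; in practice you would reach for a library implementation rather than hand-rolled gradient descent.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Plain stochastic gradient descent on logistic loss: the standard
    core math behind propensity scoring."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def propensity(x, w, b):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

# Toy features: [sessions_last_30d, viewed_pricing] (illustrative).
X = [[1, 0], [2, 0], [5, 1], [7, 1], [3, 0], [6, 1]]
y = [0, 0, 1, 1, 0, 1]  # 1 = purchased

w, b = train_logistic(X, y)
print(round(propensity([6, 1], w, b), 2))  # high score for an engaged visitor
```

The model itself is twenty lines; everything the rest of this article describes, pipelines, serving, monitoring, retraining, is what surrounds it.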

Vodafone Ukraine achieved a 30% churn reduction and 2% incremental revenue using ML-based propensity modeling 9. These results required clean data and correct implementation of standard techniques.

Vendor lock-in is a legitimate concern. If your scoring models, feature definitions, and prediction pipelines live entirely inside one vendor's ecosystem, switching costs rise. Look for platforms that export model artifacts in open formats like ONNX, which enables portability between frameworks 10. Evaluate whether the platform exposes your underlying data or traps it. Check whether you can extract trained models and run them independently.

The practical test for customization limits: identify the three most unusual modeling requirements your business has. If a platform handles all three, custom engineering adds cost without adding capability. If it handles none, building makes sense.

How to adopt a platform without losing data portability

Start with platforms that export raw data in open formats — CSV, Parquet, or direct database access. This is the single most important criterion. If you cannot extract your data, every other feature is irrelevant.

Before signing any contract, verify three things:

  • Data export: Can you pull all training data, feature sets, and model outputs on demand, without filing a support ticket?
  • Model export: Does the platform support standard formats like ONNX or serialized model files you can deploy independently?
  • API independence: Can your production systems call a fallback endpoint if the vendor goes down?

Run a portability test during evaluation. Export a sample dataset and a trained model. Deploy that model outside the platform. If this takes more than a day, the vendor's portability claims are marketing.
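The portability test can be scripted. The sketch below assumes a hypothetical vendor export of logistic-regression coefficients as JSON (the schema and numbers are invented, not any real vendor's format): reload the export, rescore locally, and compare against the prediction the platform reported for the same input.

```python
import json
import math

# Pretend this payload came from the vendor's export endpoint
# (illustrative schema and values, not a real vendor format).
exported = json.dumps({"weights": [0.8, 1.9], "bias": -2.1})

def score(x, model):
    """Re-implement the scoring function outside the platform."""
    z = sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]
    return 1 / (1 + math.exp(-z))

model = json.loads(exported)
local = score([3.0, 1.0], model)

# Prediction the platform returned for the same input (illustrative).
platform_prediction = 0.901
assert abs(local - platform_prediction) < 1e-2, "exported model disagrees"
print("portability check passed")
```

The same idea applies to ONNX exports: load the artifact with an independent runtime, score a held-out sample, and confirm the numbers match before you sign anything.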

Layer custom work gradually. Use the platform for baseline predictions — churn, conversion likelihood, customer lifetime value — and build custom models only where the platform falls short. This avoids rebuilding what already works while preserving the option to internalize later.

Negotiate data export clauses in your contract. Specify formats, frequency limits, and deletion timelines. Treat data portability as a procurement requirement, not an afterthought.

The goal is optionality: use third-party infrastructure for speed now without surrendering control permanently.

Three questions that decide build vs. buy for your team

Before committing engineering resources in either direction, run your decision through three filters.

Do you have dedicated ML engineers who will own this long-term? Building infrastructure requires ongoing maintenance: retraining pipelines, monitoring for drift, updating feature stores, patching dependencies. If your ML team is two people who also handle analytics, they will spend more time on infrastructure maintenance than on models. Buying that infrastructure lets them spend time on model work specific to your business.

Is your prediction problem unique to your business? Some companies operate in domains where off-the-shelf models genuinely cannot capture the signal. If your data structure, latency requirements, or compliance constraints fall outside what existing platforms support, building may be the only viable path. For most customer behavior prediction use cases, the underlying patterns are well-understood and already implemented in production-grade systems.

What is your acceptable time to first production model? Building from scratch means six to twelve months before a model serves live traffic. Buying compresses that to weeks. If your business needs predictions now to validate a strategy or capture a market window, the speed difference outweighs any long-term flexibility you might gain from a custom stack.

If two or three answers favor buying, the build path is unlikely to justify the engineering investment.

FAQ

Q: How long does it take to get a custom ML model into production?

Most teams need 6 to 12 months to move from prototype to a production model that handles real traffic reliably. The model itself takes weeks. The remaining months go to building data pipelines, monitoring, retraining workflows, and failover systems. Off-the-shelf platforms that already have this infrastructure can deliver working predictions within weeks of integration.

Q: What hidden costs do teams miss when budgeting for in-house ML?

Infrastructure maintenance is the biggest cost gap. After the initial build, teams routinely spend the majority of engineering time on pipeline repairs, data quality fixes, and dependency updates rather than improving models 1. Add cloud compute costs that scale with data volume, on-call rotations, and the opportunity cost of engineers not working on your core product.

Q: Do pre-built ML platforms sacrifice accuracy compared to custom models?

For well-understood marketing problems like propensity scoring or product recommendations, platform accuracy matches or exceeds most custom implementations. The reason: platforms encode years of feature engineering and are optimized across many deployments. Custom models only outperform when the problem is genuinely novel or requires proprietary data representations that no vendor supports.

Q: Can we start with a platform and migrate to in-house later if needed?

Yes, if you verify data portability before committing. Confirm you retain full ownership of raw data and can export model outputs in open formats. Platforms that lock data into proprietary formats create switching costs that grow monthly. The data portability checklist in the section above covers the specific criteria to evaluate.

Q: What is the minimum team size needed to maintain production ML infrastructure?

A functional ML engineering team needs, at minimum, three to four dedicated people regardless of overall company size: one ML engineer for model development, one data engineer for pipelines, one infrastructure or MLOps engineer for deployment and monitoring, and ideally a data analyst for validation. Below that threshold, individuals get stretched across too many responsibilities, and reliability suffers. This does not include the data scientists who define the business problems.

Q: When should a marketing team absolutely not build in-house?

When the ML application is a solved problem with commoditized solutions. Building custom propensity models, recommendation engines, or LTV predictors from scratch means replicating work that vendors have already optimized across thousands of deployments. Reserve engineering resources for problems where your specific business context creates a genuine competitive advantage that no existing tool addresses.

Q: How does Almeta ML handle the infrastructure side of marketing ML?

Almeta ML runs the full prediction infrastructure — data ingestion, feature computation, model training, and serving — behind a web tag or built-in integrations. It delivers propensity scores, product recommendations, LTV estimates, and send-time optimization without requiring your team to manage pipelines or retrain models. Predictions feed directly into Google Ads, Meta, TikTok, and email platforms through native integrations. You retain full ownership of your data and can export it at any time.


References

Footnotes

  1. Hidden Technical Debt in Machine Learning Systems (papers.neurips.cc)
  2. The Real Cost of AI: Calculating TCO for AI/ML Systems (mondaysys.com)
  3. AI Development Cost: A Comprehensive Overview (upsilonit.com)
  4. Machine Learning Engineer Salary Benchmarks: US Market (signifytechnology.com)
  5. Models Are Rarely Deployed: An Industry-wide Failure in Machine Learning Leadership (kdnuggets.com)
  6. Why 90% of Machine Learning Models Never Make It to Production (redapt.com)
  7. Propensity Score Using ML Predictive Model (experienceleague.adobe.com)
  8. Propensity Scoring Solution Accelerator (databricks.com)
  9. Propensity Model: How to Predict Customer Behavior (altexsoft.com)
  10. Open Neural Network Exchange (onnx.ai)
