Experience
spark · scala · hadoop · machine-learning · bayesian-inference · causal-impact · adtech · bidding

Petabyte-Scale Ad Bidding Infrastructure

Marin’s bidding platform computed bids for over 6 billion objects — mostly keywords, plus creatives and placements — every four hours, pushing updates to publishers on the same cycle. The production Hadoop cluster had roughly 400TB of memory. It was a serious beast, and we had to optimise carefully around other consumers of the stack.

I inherited a Spark/Scala codebase that had recently been migrated from a legacy Spring/Java application, but adoption was minimal. There was no documentation and there were no tests, and the logic was highly nuanced and esoteric. Understanding what it actually did was the single biggest technical challenge. The team was sixteen people: eight engineers, three QEs, two product managers, a data scientist, and two designers. I was responsible for maturing the platform, completing the rollout, and sunsetting the legacy systems.

The work I’m most proud of was overhauling the clustering algorithm. The previous implementation had low adoption and no demonstrated lift. We studied the problem thoroughly and built a novel hybrid approach. First, a regression tree used categorical keyword features and prior performance data to identify similar objects. Then its leaves were pruned back one level to define group membership. Those groups served as Bayesian priors for objects without enough data to bid on individually (the cold-start problem, again).
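The sketch below illustrates the two steps. It rests on several assumptions of its own that do not come from the Marin codebase:

- The tree is already fitted.
- "Pruning back a level" means assigning each object to the parent of the leaf it lands in.
- The prior is a Beta prior on conversion rate, centred on the group's pooled rate.

The post does not say which likelihood or prior family production used, so treat the Beta-binomial choice as illustrative. `Node`, `Stats`, `groups_pruned_one_level`, `shrunk_rates`, `prior_weight` and `min_clicks` are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A node of an already-fitted regression tree (hypothetical structure)."""
    node_id: str
    children: List["Node"] = field(default_factory=list)
    object_ids: List[str] = field(default_factory=list)  # set on leaves only


@dataclass
class Stats:
    clicks: int
    conversions: int


def groups_pruned_one_level(root: Node) -> Dict[str, str]:
    """Assign each object to the parent of its leaf, i.e. prune leaves back one level."""
    membership: Dict[str, str] = {}

    def walk(node: Node, parent: Optional[Node]) -> None:
        if not node.children:
            # A lone root leaf has no parent to fall back to, so it is its own group.
            group = parent.node_id if parent is not None else node.node_id
            for oid in node.object_ids:
                membership[oid] = group
            return
        for child in node.children:
            walk(child, node)

    walk(root, None)
    return membership


def shrunk_rates(stats: Dict[str, Stats], membership: Dict[str, str],
                 prior_weight: float = 100.0, min_clicks: int = 50) -> Dict[str, float]:
    """Conversion rates with a Beta prior centred on each object's group rate.

    prior_weight is the prior's strength in pseudo-clicks (alpha + beta);
    both it and min_clicks are illustrative assumptions, not production values.
    """
    # Pool clicks and conversions per group.
    pooled: Dict[str, List[int]] = {}
    for oid, s in stats.items():
        totals = pooled.setdefault(membership[oid], [0, 0])
        totals[0] += s.clicks
        totals[1] += s.conversions

    rates: Dict[str, float] = {}
    for oid, s in stats.items():
        if s.clicks >= min_clicks:
            # Enough data: use the object's own observed rate.
            rates[oid] = s.conversions / s.clicks
            continue
        clicks, conversions = pooled[membership[oid]]
        group_rate = conversions / clicks if clicks else 0.0
        alpha = prior_weight * group_rate
        beta = prior_weight * (1.0 - group_rate)
        # Posterior mean of Beta(alpha + conversions, beta + non-converting clicks).
        rates[oid] = (s.conversions + alpha) / (s.clicks + alpha + beta)
    return rates


# Toy tree: leaves "a" and "b" merge into group "g1"; "c" and "d" into "g2".
tree = Node("root", children=[
    Node("g1", children=[Node("a", object_ids=["kw1", "kw2"]),
                         Node("b", object_ids=["kw3"])]),
    Node("g2", children=[Node("c", object_ids=["kw4"]),
                         Node("d", object_ids=["kw5"])]),
])
membership = groups_pruned_one_level(tree)
stats = {
    "kw1": Stats(1000, 50), "kw2": Stats(900, 45), "kw3": Stats(10, 0),
    "kw4": Stats(500, 5), "kw5": Stats(0, 0),
}
rates = shrunk_rates(stats, membership)
```

Consider `kw3`, which has zero conversions in 10 clicks. Its raw rate would be 0, but with `prior_weight=100` its estimate is pulled to roughly 0.045, close to the `g1` group rate of about 0.050. `kw5` has no data at all, so it simply inherits the `g2` pooled rate of 0.01. As an object accumulates clicks, its own data outweighs the prior.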

Proving it worked was its own challenge. In digital marketing there’s constant change, so running a clean control/exposed experiment over the 1–2 months required is nearly impossible. We automated measurement using Causal Impact and accepted some noise. The median uplift was 15%, though individual results ranged from -100% to +300%. That range tells you everything about how volatile the domain is.
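CausalImpact fits a Bayesian structural time-series model to predict what a metric would have done without the change. The sketch below swaps in a much cruder stand-in to show the shape of the calculation. It runs ordinary least squares on a single control series over the pre-period, predicts the post-period counterfactual, and reports actual over predicted minus one. The function names, series and per-account values are all synthetic or hypothetical.

```python
import statistics
from typing import Sequence, Tuple

# Simplified stand-in for CausalImpact: OLS on one control series,
# NOT the Bayesian structural time-series model the package actually fits.


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y ~ x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def relative_uplift(control: Sequence[float], treated: Sequence[float], start: int) -> float:
    """Relative effect over the post-period: actual / counterfactual - 1.

    `start` is the index of the first post-intervention observation.
    """
    intercept, slope = ols_fit(control[:start], treated[:start])
    counterfactual = sum(intercept + slope * c for c in control[start:])
    actual = sum(treated[start:])
    return actual / counterfactual - 1.0


# Synthetic daily conversions: treated tracks 2x control before the change,
# then 2.3x control afterwards, i.e. a true uplift of 15%.
control = [100, 110, 105, 120, 115, 118, 122, 119]
treated = [2 * c for c in control[:5]] + [2.3 * c for c in control[5:]]
uplift = relative_uplift(control, treated, start=5)  # approximately 0.15

# Made-up per-account effects with wide tails; the median ignores the extremes.
account_uplifts = [-1.0, 0.05, 0.15, 0.4, 3.0]
headline = statistics.median(account_uplifts)  # 0.15
```

Reporting the median rather than the mean matters when the tails are this wide. For the made-up list above, the single +300% account drags the mean up to 0.52, while the median stays at 0.15.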

Separately, I designed and launched a cross-publisher budget optimisation system — a different product, integrated with the bidding platform — now managing $78M+ in annual ad spend across 32 customers.