MMM · 2026.03.01

What Data Does MMM Need?

The three data layers MMM requires (4P, media, external variables), collection methods, and data security principles. Most companies already have sufficient data.


You understand that Marketing Mix Modeling (MMM) is effective for analyzing channel-level ROI and optimizing budgets. But one practical question remains.

"Does our company have the data to actually do this analysis?"

This is the first question that comes up when companies consider adopting MMM, and in most cases the answer is "you already have enough." The data MMM requires isn't exotic: sales data accumulated in your ERP, advertising spend reports from media agencies, and publicly available external indicators. Most companies already possess this data or can obtain it easily.

This article covers:

  1. The three data layers MMM uses and how to collect each
  2. How to integrate cross-platform data into a unified analysis framework
  3. Data security and sensitive information handling principles

MMM Data Three Layers: 4P, Media, External Variables

The 3 Data Layers MMM Uses

MMM uses three layers of data to explain sales variation. Each layer captures different factors that influence revenue.

Layer 1: 4P Data — The Business Foundation

This layer holds data corresponding to the marketing 4Ps (Product, Price, Place, Promotion), reflecting internal business activities.

| Category | Data Items | Primary Source |
|---|---|---|
| **Product** | SKU-level sales, volume, new product launch dates | ERP, POS |
| **Price** | List price, actual selling price, discount rate, promotional pricing | ERP, commerce platforms |
| **Place** | Store-level sales, distribution channel mix, new store openings | ERP, distributor reports |
| **Promotion** | Promotion period, discount depth, scope, bundle composition | ERP, marketing team records |

The core source for this data is the ERP. Most companies already have years of transaction data accumulated there, which can be extracted and transformed into derived variables in MMM input format.

For example, analyzing "promotion effect" requires not just a simple discount rate but a combined metric of discount depth × duration × applicable SKU scope. Creating these combinations from raw ERP data is the essence of data preprocessing.
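The combined promotion metric described above can be sketched as follows. This is a minimal illustration, not a prescribed formula: the `Promotion` record, its field names, and the choice to express duration and SKU scope as shares (so each week's value stays between 0 and the discount rate) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Promotion:
    """Hypothetical promotion record assembled from ERP exports."""
    start: date
    end: date              # inclusive
    discount_rate: float   # 0.20 means 20% off list price
    skus_on_promo: int
    skus_total: int


def promo_intensity(p: Promotion, week_start: date) -> float:
    """Discount depth x share of the week active x share of SKUs covered."""
    week_days = [week_start + timedelta(days=i) for i in range(7)]
    active_days = sum(p.start <= d <= p.end for d in week_days)
    duration_share = active_days / 7
    sku_share = p.skus_on_promo / p.skus_total
    return p.discount_rate * duration_share * sku_share


# A 20% discount on 30 of 120 SKUs, running Wed 2025-03-05 to Tue 2025-03-11.
promo = Promotion(date(2025, 3, 5), date(2025, 3, 11), 0.20, 30, 120)
print(round(promo_intensity(promo, date(2025, 3, 3)), 4))   # 0.0357 (5 active days)
print(round(promo_intensity(promo, date(2025, 3, 10)), 4))  # 0.0143 (2 active days)
```

A single weekly number like this lets the model distinguish a deep, narrow promotion from a shallow, broad one, which a plain discount rate cannot do.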

Layer 2: Media Data — The Complete Picture of Marketing Investment

This layer records cost and exposure by channel for marketing investments. It is the core input for estimating channel-level ROI in MMM.

| Channel Type | Data Items | Collection Method |
|---|---|---|
| **Offline** | TV GRP, radio spots, print placements, OOH exposure | Media rep reports, CSV upload |
| **Online** | Impressions, clicks, cost (CPM/CPC), conversions | API auto-collection (Meta, Google, Naver, etc.) |
| **New Media** | Influencer campaigns, content exposure, sponsorship costs | Agency reports, manual entry |
| **BTL** | Sampling volume, event participants, experience group size | Marketing team records, CSV |

Digital channels support automated collection via API. Major platforms like Meta Ads, Google Ads, and Naver Search Ads provide standardized APIs, enabling automatic retrieval of impression, click, and cost data on a daily or weekly basis.

Offline channels are relatively manual. For TV, GRP reports from media representatives are used; for OOH, exposure estimates from media companies are uploaded as CSV files.

What matters here is cross-platform data integration. Each platform uses different metric names and units: TV reports GRP, digital platforms report impressions, and influencer campaigns report view counts. Connecting these differently unitized data points into a single analysis framework requires a marketing taxonomy.

A taxonomy is a system that classifies channels, campaigns, and creatives into a consistent hierarchy. It creates linkages like "this TV ad and this digital campaign are part of the same brand campaign," transforming fragmented data into a format ready for integrated analysis.
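As a rough sketch of how such a taxonomy might be applied, the snippet below maps platform-specific rows onto a shared schema. Everything in it is hypothetical: the campaign names, the `TAXONOMY` and `EXPOSURE_FIELD` lookups, and the row layout are invented for illustration and do not reflect any platform's actual export format.

```python
from collections import defaultdict

# Hypothetical taxonomy: (platform, platform campaign name) -> (brand campaign, channel group)
TAXONOMY = {
    ("tv", "SpringLaunch_30s"): ("Spring Launch", "Offline"),
    ("meta", "spring_launch_video_kr"): ("Spring Launch", "Online"),
    ("influencer", "SL-creator-wave1"): ("Spring Launch", "New Media"),
}

# Each platform reports exposure under a different name and unit.
EXPOSURE_FIELD = {"tv": "grp", "meta": "impressions", "influencer": "view_count"}


def normalize(row: dict) -> dict:
    """Rewrite one platform row into the shared schema, keeping its native unit."""
    platform = row["platform"]
    campaign, channel_group = TAXONOMY[(platform, row["campaign_name"])]
    field = EXPOSURE_FIELD[platform]
    return {
        "week": row["week"],
        "brand_campaign": campaign,
        "channel_group": channel_group,
        "platform": platform,
        "exposure": float(row[field]),
        "exposure_unit": field,
        "spend": float(row["spend"]),
    }


def spend_by_campaign(rows) -> dict:
    """Total spend per (week, brand campaign) across all platforms."""
    totals = defaultdict(float)
    for r in map(normalize, rows):
        totals[(r["week"], r["brand_campaign"])] += r["spend"]
    return dict(totals)


rows = [
    {"platform": "tv", "campaign_name": "SpringLaunch_30s", "week": "2025-W10", "grp": 85, "spend": 50000},
    {"platform": "meta", "campaign_name": "spring_launch_video_kr", "week": "2025-W10", "impressions": 1200000, "spend": 12000},
    {"platform": "influencer", "campaign_name": "SL-creator-wave1", "week": "2025-W10", "view_count": 340000, "spend": 3000},
]
print(spend_by_campaign(rows))  # {('2025-W10', 'Spring Launch'): 65000.0}
```

Note that the sketch sums spend across platforms but leaves exposure in its native unit alongside an `exposure_unit` label, since GRP, impressions, and view counts are not directly additive.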

Layer 3: External Variables — Influences Beyond Marketing

This layer covers external factors that affect sales but are unrelated to marketing activities. Controlling for these variables in MMM is essential to accurately estimate marketing's pure effect.

| Category | Data Items | Collection Method |
|---|---|---|
| **Seasonality** | Holidays, seasonal events, day-of-week effects | Calendar-based auto-generation |
| **Economy** | Consumer price index, exchange rates, interest rates | Public APIs (Bank of Korea, Statistics Korea) |
| **Events** | Weather, sports events, social issues | Weather API, news crawling |
| **Competition** | Competitor promotions, new launches, price changes | Commerce crawling, search trends, buzz analysis |

Most external variables can be automatically collected via public APIs. Seasonality is auto-generated from calendars, economic indicators come from central bank and statistics bureau APIs, and weather data comes from meteorological service APIs.
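Calendar-based generation of seasonality variables could look roughly like the sketch below, which uses only the Python standard library. The `HOLIDAYS` set is a small illustrative sample, and the feature names are assumptions; a real project would load the official holiday calendar for its market.

```python
from datetime import date, timedelta

# Illustrative sample only; load the official calendar in practice.
HOLIDAYS = {date(2025, 1, 1), date(2025, 3, 1), date(2025, 5, 5)}


def weekly_calendar_features(first_monday: date, n_weeks: int) -> list:
    """Build one row of calendar features per week, starting on a Monday."""
    rows = []
    for i in range(n_weeks):
        start = first_monday + timedelta(weeks=i)
        days = [start + timedelta(days=d) for d in range(7)]
        rows.append({
            "week_start": start.isoformat(),
            "iso_week": start.isocalendar()[1],
            "month": start.month,
            "holiday_days": sum(d in HOLIDAYS for d in days),
        })
    return rows


for row in weekly_calendar_features(date(2024, 12, 30), 2):
    print(row)
```

Because these features depend only on dates, they can be regenerated for any analysis window without requesting anything from the client.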

Competitor data is harder to obtain directly, so proxy indicators are used instead. Competitor pricing and promotion history can be crawled from commerce sites, and search trend data or social buzz volume can serve as proxy variables for competitor activity. Even without direct revenue data, these proxies are generally sufficient to reflect competitive dynamics in the model.


Data Security: Safe Collection, Management, and Disposal

Providing corporate data externally for MMM analysis naturally raises security concerns. MadMatics strictly adheres to the following principles.

No PII (Personally Identifiable Information)

MMM uses aggregated data: weekly sales totals, channel-level ad spend aggregates, and monthly exposure volumes. PII such as individual customer names, contact details, or purchase histories is neither collected nor needed.
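To make the aggregation point concrete, here is a minimal sketch of rolling order-level rows up to weekly totals on the data owner's side before anything is shared. The order layout and field names are hypothetical; the point is simply that customer-level fields never appear in the output.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def weekly_sales(orders) -> dict:
    """Sum order amounts per week; customer fields are read by nothing here."""
    totals = defaultdict(float)
    for o in orders:
        totals[week_start(o["order_date"])] += o["amount"]
    return dict(sorted(totals.items()))


# Hypothetical order rows as they might sit in an internal system.
orders = [
    {"order_date": date(2025, 3, 3), "amount": 120.0, "customer_name": "A"},
    {"order_date": date(2025, 3, 5), "amount": 80.0, "customer_name": "B"},
    {"order_date": date(2025, 3, 10), "amount": 50.0, "customer_name": "A"},
]
print(weekly_sales(orders))
# {datetime.date(2025, 3, 3): 200.0, datetime.date(2025, 3, 10): 50.0}
```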

Scale Transformation for Sensitive Business Data

Sensitive business figures such as absolute revenue and advertising budgets can be rescaled before they are shared. For example, converting actual sales to an index (each value divided by a base-period value) hides the original scale while preserving the relative variation pattern, so results such as relative channel contributions are unaffected.
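A simple way to see why indexing preserves the variation pattern is sketched below. The function names and the choice of the first observation as the base period are assumptions for the example, and the sales figures are made up.

```python
def to_index(values, base=100.0):
    """Rescale a series so that its first observation equals `base`."""
    first = values[0]
    if first == 0:
        raise ValueError("first observation must be non-zero")
    return [base * v / first for v in values]


def pct_changes(values):
    """Period-over-period relative change."""
    return [(b - a) / a for a, b in zip(values, values[1:])]


weekly_sales_raw = [1_250_000.0, 1_310_000.0, 1_180_000.0, 1_420_000.0]  # made-up figures
indexed = to_index(weekly_sales_raw)

# The week-to-week movement is the same in both series (up to float rounding).
same_pattern = all(
    abs(x - y) < 1e-9
    for x, y in zip(pct_changes(weekly_sales_raw), pct_changes(indexed))
)
print(same_pattern)  # True
```

Because indexing is a multiplication by a constant, any result expressed in original currency units can only be recovered with the scaling factor, which stays with the data owner.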

| Security Measure | Description |
|---|---|
| **No PII collection** | Personally identifiable information is never collected, stored, or processed |
| **Scale transformation** | Sensitive figures indexed to de-identify original scale |
| **Purpose limitation** | Collected data cannot be used beyond MMM analysis |
| **Secure disposal** | Complete deletion following agreed procedures after project completion |

Data security is as important as analysis quality. The concern that "handing over data feels risky" is entirely valid, and addressing that concern is an analytics partner's first responsibility.


Data Scarcity Is a Misconception

In summary, MMM requires data across three layers:

  • 4P Data: Business fundamentals extractable and reprocessable from ERP
  • Media Data: Digital via API auto-collection, offline via CSV upload, cross-platform integration through taxonomy
  • External Variables: Auto-collected via public APIs, competitor data leveraged through proxy indicators

Most companies already possess 70–80% of this data. What's lacking isn't data itself, but the collection, cleansing, and integration system that connects fragmented data into a unified analysis framework.

MadMatics has built end-to-end data infrastructure, including data collection templates, API integration pipelines, and taxonomy design. Rather than worrying that "we can't do MMM because we lack data," the first step is to discuss together which data to organize first.

If you need the full process from data collection to analysis to budget optimization, MadMatics Action MMM is ready to help you get started.