Demand Planning and Forecasting

Demand planning translates market signals into an operational volume forecast used to drive supply, inventory, labor, and capacity decisions. A good demand plan is not the most accurate forecast — it’s the forecast that minimizes total system cost (stockouts + excess inventory + expediting + overtime) across all planning horizons.

| Method | Best For | Limitation |
| --- | --- | --- |
| Simple moving average | Stable, no trend/seasonality | Lags trends; treats all periods equally |
| Weighted moving average | Stable with slight trend | Manual weight-setting |
| Exponential smoothing (SES) | Stable demand, self-correcting | No trend or seasonality |
| Holt’s method | Demand with trend | No seasonality |
| Holt-Winters (triple ES) | Trend + seasonality | Requires 2+ years of history |
| Regression (causal) | Demand driven by external variables (price, weather, GDP) | Requires leading-indicator data |
| ML / AI (LSTM, XGBoost) | Complex patterns, large SKU counts | Black box; requires data maturity |

Most planning systems use Holt-Winters or a variant as the statistical baseline for replenishment SKUs, then layer in judgment adjustments.
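As an illustration, simple exponential smoothing — the base recursion that Holt and Holt-Winters extend — fits in a few lines. This is a hand-rolled sketch, not any planning system’s implementation; the function name, α value, and demand figures are illustrative:

```python
def ses(history, alpha=0.3):
    """One-step-ahead simple exponential smoothing forecast."""
    level = history[0]                      # initialize level at the first observation
    for actual in history[1:]:
        # new level = alpha * latest actual + (1 - alpha) * previous level
        level = alpha * actual + (1 - alpha) * level
    return level                            # forecast for the next period

weekly_demand = [100, 102, 98, 105, 101, 99]
next_week = ses(weekly_demand)              # ≈ 100.6 units
```

Holt adds a smoothed trend term to this recursion, and Holt-Winters adds a seasonal index on top of both.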

| Metric | Formula | Notes |
| --- | --- | --- |
| MAPE | Mean \|Actual − Forecast\| / Actual × 100 | Intuitive but distorted by low-volume SKUs |
| WMAPE | Σ\|Actual − Forecast\| / Σ Actual × 100 | Preferred — volume-weighted, not distorted by small SKUs |
| MAD | Mean \|Actual − Forecast\| | Same units as demand; useful for operations |
| Bias | Mean (Forecast − Actual) | Positive = over-forecast; negative = under-forecast |
| Tracking signal | Cumulative forecast error / MAD | \|TS\| > 4 signals systematic bias; trigger model review |

WERC best-in-class benchmark: WMAPE < 15% at the SKU/week level. Most organizations operate at 20–35% WMAPE; top performers in stable categories reach 8–12%.
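A sketch of these metrics on paired actual/forecast series, following the sign convention in the table (positive bias = over-forecast); function names and data are illustrative:

```python
def wmape(actuals, forecasts):
    # volume-weighted: one large-SKU miss outweighs many tiny-SKU misses
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / sum(actuals) * 100

def bias(actuals, forecasts):
    # positive = over-forecast, negative = under-forecast
    return sum(f - a for a, f in zip(actuals, forecasts)) / len(actuals)

def tracking_signal(actuals, forecasts):
    errors = [f - a for a, f in zip(actuals, forecasts)]
    mad = sum(abs(e) for e in errors) / len(errors)
    return sum(errors) / mad                # |TS| > 4 flags systematic bias
```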

ABC/XYZ segmentation divides the SKU portfolio so that appropriate forecasting methods and safety stock policies can be assigned to each segment:

|  | A (high volume) | B (medium) | C (low volume) |
| --- | --- | --- | --- |
| X (stable) | Statistical model; tight safety stock | Statistical model | Statistical model; moderate buffer |
| Y (variable) | Statistical + market input | Statistical + judgment | Statistical; wide buffer |
| Z (erratic/intermittent) | Consensus-heavy; min/max | Judgment-based | Make-to-order or zero stock |

C/Z SKUs (low volume, erratic) are the biggest source of forecast error and excess inventory. Policies matter more than forecasting algorithms for this segment.

The statistical baseline is an input, not the output. The consensus process layers in:

  1. Statistical baseline: System-generated, SKU/week level
  2. Sales input: Promotional events, key account changes, new business pipeline
  3. Marketing input: Trade spend, pricing changes, advertising calendars
  4. Product management: New launches, end-of-life, portfolio changes
  5. Consensus adjustment: Final one-number plan, with assumption log

Assumption logging is critical — if the forecast misses, you need to know which assumption failed.

No demand history exists for new products, so statistical extrapolation cannot be used. Common methods:

  • Analogue: Map to the lifecycle curve of a similar past product; scale by relative pricing, distribution, and marketing support
  • Market research: Consumer/customer surveys; pilot/test market data
  • Management judgment: Expert opinion with structured uncertainty ranges (P10/P50/P90)

New product forecast error rates of 40–70% in the first 6 months are common. Plan for this with flexible supply capacity and cautious initial inventory commitments.

Short-horizon demand sensing (1–14 days) supplements the statistical forecast with high-frequency signals:

  • POS data from retailers (daily sell-through at store level)
  • Customer order patterns (order frequency, order size trends)
  • Warehouse shipment actuals vs. plan

Demand sensing tools (Blue Yonder, o9, ToolsGroup) improve short-horizon MAPE by 30–50% vs. unconstrained statistical models, directly reducing safety stock requirements at the DC level.

CPFR (Collaborative Planning, Forecasting and Replenishment) is a retailer-supplier process in which:

  • Retailer shares POS data and promotional calendars
  • Supplier shares production constraints and lead times
  • Joint forecast is agreed and exception-managed

CPFR is most effective with top-5 retail accounts. Standard protocol defined by GS1/VICS. Requires EDI or portal connectivity and sustained account management attention.

| Bias | Manifestation | Mitigation |
| --- | --- | --- |
| Optimism bias | Sales always over-forecasts new products | Separate demand forecast from sales quota |
| Sandbagging | Sales under-forecasts to manage expectations | Accountability to forecast accuracy metric |
| Anchoring | Forecast snaps to last year’s number | Statistical override with documented exceptions |
| Hockey stick | Back-loads volume to end of period | Period-level shape analysis |

Intermittent Demand Methods (Course 4.5 Depth)

Standard exponential smoothing and Holt-Winters fail for intermittent demand — items that have frequent zero-demand periods interspersed with irregular positive demands (spare parts, industrial MRO, slow-moving B2B SKUs). Three dedicated methods:

Syntetos-Boylan Demand Classification Matrix

Before choosing a forecasting method, classify the demand pattern using two dimensions:

  • ADI (Average Demand Interval): average number of periods between non-zero demand occurrences. ADI > 1.32 = intermittent.
  • CV² (squared coefficient of variation of non-zero demand sizes): measures how variable the demand quantity is when it does occur. CV² > 0.49 = erratic.
|  | Low CV² (≤ 0.49) | High CV² (> 0.49) |
| --- | --- | --- |
| Low ADI (≤ 1.32) | Smooth — use SES/Holt-Winters | Erratic — use Holt with wide buffer |
| High ADI (> 1.32) | Intermittent — use Croston | Lumpy — use TSB |
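The classification can be computed directly from a per-period demand history (zeros included). This sketch uses the 1.32 / 0.49 cut-offs above; the simple period-count estimator of ADI and the population variance are assumptions of the sketch:

```python
def classify(demand):
    nonzero = [d for d in demand if d > 0]
    adi = len(demand) / len(nonzero)                  # avg periods per non-zero demand
    mean = sum(nonzero) / len(nonzero)
    var = sum((d - mean) ** 2 for d in nonzero) / len(nonzero)
    cv2 = var / mean ** 2                             # squared coefficient of variation
    if adi <= 1.32:
        return "smooth" if cv2 <= 0.49 else "erratic"
    return "intermittent" if cv2 <= 0.49 else "lumpy"
```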

Croston’s method, developed by J.D. Croston (1972) for spare parts forecasting, decomposes intermittent demand into two separate exponential smoothing processes:

  • D̄ (average demand size when demand occurs): updated only in periods with non-zero demand
  • T̄ (average inter-demand interval): likewise updated only in periods with non-zero demand

Forecast = D̄ / T̄ (average demand per period)

Croston is 15–30% more accurate than exponential smoothing for intermittent demand items. The weakness: Croston’s estimator has a positive bias — it systematically overestimates demand.
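A minimal sketch of the Croston recursion, assuming a single smoothing parameter α shared by the size and interval estimates (implementations sometimes use separate parameters):

```python
def croston(demand, alpha=0.1):
    size = None        # D-bar: smoothed non-zero demand size
    interval = None    # T-bar: smoothed inter-demand interval
    periods_since = 1
    for d in demand:
        if d > 0:
            if size is None:                     # first non-zero demand initializes both
                size, interval = d, periods_since
            else:
                size = alpha * d + (1 - alpha) * size
                interval = alpha * periods_since + (1 - alpha) * interval
            periods_since = 1
        else:
            periods_since += 1                   # nothing is updated in zero periods
    return size / interval                       # average demand per period
```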

The Syntetos-Boylan Approximation (SBA) corrects Croston’s positive bias with a simple adjustment:

F_SBA = (1 - α/2) × D̄/T̄

Where α is the smoothing parameter. The (1 - α/2) term deflates the Croston estimate, removing the systematic upward bias. SBA is the preferred method for intermittent demand in most planning software implementations because it is unbiased and computationally trivial.
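Applied to Croston-style outputs, the deflation is one line (numbers illustrative):

```python
def sba_forecast(size, interval, alpha=0.1):
    # deflate the Croston ratio by (1 - alpha/2) to remove its upward bias
    return (1 - alpha / 2) * size / interval

# e.g. smoothed size 4 units, smoothed interval 2 periods, alpha = 0.1:
# (1 - 0.05) * 4 / 2 = 1.9 units per period (vs. Croston's 2.0)
```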

TSB (Teunter-Syntetos-Babai) takes a different approach: instead of smoothing demand size and demand interval separately, it smooths the probability of demand occurring each period together with the expected demand size. This makes TSB adaptive to demand-pattern changes — particularly useful for spare parts late in a product’s lifecycle, when demand is declining toward obsolescence. When the smoothed demand probability falls below a threshold, the TSB forecast naturally decays to zero, enabling proactive write-off decisions rather than letting obsolete stock linger through successive inventory cycles.

Best use case: slow-moving spare parts with lifecycle dynamics; any item where you expect demand to eventually reach zero and need the forecast to adapt proactively.
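A sketch of the TSB recursion as described above, with separate smoothing parameters for occurrence probability and demand size; the initialization choices are assumptions of this sketch:

```python
def tsb(demand, alpha=0.1, beta=0.1):
    """alpha smooths the demand size; beta smooths the occurrence probability."""
    prob = 1.0 if demand[0] > 0 else 0.0    # smoothed P(demand occurs this period)
    size = demand[0] if demand[0] > 0 else 1.0
    for d in demand[1:]:
        if d > 0:
            prob += beta * (1 - prob)
            size = alpha * d + (1 - alpha) * size
        else:
            prob += beta * (0 - prob)        # decays toward zero in dead periods
    return prob * size                       # expected demand per period
```

Unlike Croston, the probability estimate updates every period, so a long run of zeros steadily drives the forecast toward zero.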

Hierarchical Forecasting and MinT Reconciliation

Most planning systems forecast at multiple levels simultaneously: total company → business unit → product family → SKU → SKU/location. The forecasts generated at different levels are often inconsistent — SKU-level forecasts don’t sum to the family-level forecast, which doesn’t sum to the total.

Reconciliation methods:

  • Top-down: Distribute the aggregate forecast proportionally to lower levels using historical shares. Simple but propagates aggregate errors.
  • Bottom-up: Sum SKU-level forecasts upward. Captures SKU-level variation but can miss macro patterns.
  • Middle-out: Forecast at product-family level; distribute down and aggregate up. Balances accuracy levels.
  • MinT (Minimum Trace reconciliation): Simultaneously reconciles all levels using optimal linear combinations that minimize the trace of the forecast error covariance matrix. Mathematically derives the best combination of forecasts across all levels.

MinT results from academic literature: MAPE reduction of 5.5–30.2% across test cases, averaging 14.9% improvement over non-reconciled forecasts. The improvement is largest when lower-level forecasts are noisy and aggregate-level forecasts are more stable. Commercial implementations: o9, SAP IBP (hierarchical reconciliation module).
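For intuition, the two simplest reconciliation schemes on a one-level hierarchy (total → SKUs) look like this. Full MinT additionally requires an estimated forecast-error covariance matrix and is normally left to a library or planning tool, so it appears only in the comment; all data here is illustrative:

```python
def bottom_up(sku_forecasts):
    # the total is simply the sum of the SKU-level forecasts
    return sum(sku_forecasts.values())

def top_down(total_forecast, historical_shares):
    # distribute the aggregate forecast using historical volume shares
    return {sku: total_forecast * share for sku, share in historical_shares.items()}

# MinT instead solves for the linear combination of ALL levels' forecasts that
# minimizes the trace of the reconciled error covariance matrix.
skus = {"A": 120.0, "B": 60.0, "C": 20.0}
total = bottom_up(skus)                                     # 200.0
split = top_down(200.0, {"A": 0.5, "B": 0.25, "C": 0.25})   # sums back to 200.0
```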

The M5 Competition (2020, Makridakis et al.) remains the most rigorous public benchmark of forecasting methods, using 42,840 time series from Walmart’s retail data.

Key findings:

  • LightGBM and gradient boosting ensemble methods dominated — winning solutions used gradient boosted trees with rich feature engineering over statistical methods
  • ML beats statistical when: high-dimensional feature space (price elasticity, promotions, calendar events), large portfolio (thousands of SKUs where cross-SKU patterns exist), and sufficient data volume (2+ years of weekly history)
  • Statistical methods remain competitive for: single-item forecasting with limited history, interpretability requirements, environments without data science capability

Practical implication: for a company with 10,000+ SKUs, retail POS data, and a planning team that includes data scientists, ML forecasting is worth the investment. For a 500-SKU B2B distributor without data science capability, Holt-Winters with Croston for slow movers is the pragmatic choice.

FVA (Forecast Value Added, developed by Michael Gilliland at SAS) measures whether each step in the consensus forecasting process improves or degrades accuracy relative to the previous step.

FVA calculation: Compare MAPE of the adjusted forecast vs. MAPE of the unadjusted baseline.

  • Positive FVA: the adjustment improved accuracy — it added value
  • Negative FVA: the adjustment made the forecast worse — it destroyed value
  • Zero FVA: the adjustment made no difference

The benchmark for any manual step: does it beat a naïve forecast (last period’s actual)? If your consensus process can’t beat naïve forecasting, you have a serious process problem.
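FVA is simple arithmetic on accuracy metrics; a sketch using MAPE, with illustrative data:

```python
def mape(actuals, forecasts):
    return sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals) * 100

def fva(actuals, baseline, adjusted):
    # positive = the adjustment lowered MAPE (added value); negative = destroyed value
    return mape(actuals, baseline) - mape(actuals, adjusted)

actuals = [100, 100]
step_value = fva(actuals, baseline=[90, 110], adjusted=[95, 105])  # positive here
```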

What FVA typically reveals: analyst overrides frequently produce negative FVA — the human judgment made the forecast less accurate than the statistical baseline. This is not a criticism of analysts; it reflects anchoring bias, optimism bias, and the hockey-stick phenomenon. FVA analysis makes this visible for the first time in most organizations that run it.

FVA should be measured at every step: statistical baseline → sales input → marketing input → management override → final consensus. Steps with consistently negative FVA should be eliminated or restructured.

Demand sensing supplements the statistical forecast in the 0–8 week window using high-frequency signals unavailable to monthly forecasting processes:

  • POS (Point of Sale) data: daily sell-through by store and SKU — actual consumer demand, not retailer orders
  • EDI order patterns: customer order frequency and size trends in the current week
  • Social signals: search trend data, social media volume as leading indicators for specific categories
  • Weather data: for weather-sensitive categories (beverages, ice cream, seasonal apparel)

Demand sensing improvement: 5–20% accuracy improvement over conventional methods in the 0–8 week window. The improvement is larger when conventional forecasting relies on orders (which lag POS by 1–3 weeks) rather than POS data directly.

Commercial implementations: Blue Yonder Luminate Demand Sensing, o9 Demand Sensing, ToolsGroup SO99+.

New Product Forecasting: The Analogue Method in Practice

The analogue method maps a new product to the lifecycle curve of a comparable past product. The key is defining “comparable” rigorously:

  1. Select analogues based on: category similarity, price tier, channel overlap, marketing support level
  2. Scale the analogue curve by: relative distribution breadth, relative advertising spend, relative price premium/discount
  3. Define scenario range: P10 (pessimistic), P50 (base), P90 (optimistic) from the spread of comparable analogues
  4. Plan supply flexibility against P50 but communicate the P10/P90 range to procurement so capacity decisions account for the full range

New product forecast error of 40–70% in the first 6 months is normal. The goal is not accuracy — it’s setting supply flexibility correctly and not over-committing inventory to a point forecast that has a ±50% uncertainty band.
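The scaling in steps 1–2 can be sketched as below. All numbers are illustrative, and for simplicity the P10/P90 band here comes from scenario factors rather than the spread of several analogues as step 3 prescribes:

```python
# Illustrative analogue lifecycle curve — not from any real product
analogue_curve = [1000, 1800, 2400, 2600, 2500, 2300]     # analogue's units/month

def scale_curve(curve, distribution_ratio, price_factor):
    # scale each month by relative distribution breadth and a price-driven factor
    return [round(v * distribution_ratio * price_factor) for v in curve]

p50 = scale_curve(analogue_curve, distribution_ratio=0.8, price_factor=1.0)
p10 = scale_curve(analogue_curve, 0.8, 0.6)               # pessimistic scenario
p90 = scale_curve(analogue_curve, 0.8, 1.4)               # optimistic scenario
```

Supply is planned against P50, while procurement sees the full P10–P90 spread for capacity decisions.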
