Advanced Usage: Grid Search Optimization
=========================================

This guide walks through a realistic end-to-end workflow: optimizing an SMA
crossover strategy by grid-searching over indicator periods and exit
parameters, using ``mktlib.backtest``, ``mktlib.metrics``, and
`polars-talib `_.

Prerequisites:

.. code-block:: bash

   pip install "mktlib[data]" polars-talib

The full runnable script is at ``scripts/grid_search_sma.py`` in the
repository.

Generate Synthetic Data
-----------------------

We generate 5 years of 1-minute OHLCV bars (~491k rows) by running
``mktlib.data.geometric_brownian_motion`` at second-level resolution, then
aggregating to 1-minute bars with ``ticks_to_ohlcv``.

The ``dt`` parameter controls the time step in annualized units. The default
``1/252`` gives one trading day per step. For 1-minute bars, set ``dt`` so
that each tick represents one second of trading time:

.. code-block:: python

   from mktlib.data import geometric_brownian_motion, ticks_to_ohlcv

   # 1 tick = 1 second of US equity trading time
   # 252 trading days × 6.5 hours × 3600 seconds
   dt_1s = 1 / (252 * 6.5 * 3600)

   n_days = 5 * 252           # 5 years of trading days
   ticks_per_day = 390 * 60   # 390 minutes × 60 seconds

   gbm = geometric_brownian_motion(
       n=n_days * ticks_per_day,
       base_price=100.0,
       drift=0.08,
       volatility=1.0,
       dt=dt_1s,
       seed=42,
   )
   ohlcv = ticks_to_ohlcv(gbm, bar_size=60, seed=43)  # 60 ticks → 1-minute bar

Pass ``drift`` and ``volatility`` as annualized values; the function scales
them internally via ``dt``.

.. note::

   **Understanding GBM drift and volatility**

   The GBM log-return per step is ``(μ − ½σ²)·dt + σ·√dt·Z``. Two quantities
   matter for intuition:

   - **Expected price** after 1 year: ``S₀ · exp(μ)``. Drift alone
     determines the *mean* across many simulated paths.
   - **Median price** after 1 year: ``S₀ · exp(μ − ½σ²)``. The Itô
     correction ``−½σ²`` penalizes high volatility.

   Because ``exp()`` is convex, a few explosive upward paths pull the mean
   up, so the median sits below the mean. When ``μ < ½σ²`` the *majority*
   of paths drift downward: with ``drift=0.08, volatility=1.0`` the median
   path declines ~34%/yr even though the mean across paths grows ~8%/yr.

   Quick reference for common parameter regimes (``drift=0.08``):

   .. list-table::
      :header-rows: 1

      * - Regime
        - volatility
        - Median annual return
        - Per-bar σ (1 min)
      * - Calm equity (SPY-like)
        - 0.20
        - ~+6%
        - ~0.06%
      * - Volatile equity
        - 0.40
        - ~0%
        - ~0.13%
      * - Stress-test / noisy
        - 1.00
        - ~−34%
        - ~0.32%

   **Why σ = 1.0 here:** the grid search intentionally uses high volatility
   to create noisy, challenging price action that separates good indicator
   parameters from bad ones. For realistic equity simulation, use 0.15–0.30.
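
The median-return and per-bar σ columns can be recomputed from the formulas
above. The plain-Python sketch below makes no mktlib calls. It assumes the
per-bar σ column means the standard deviation of one 1-minute log return,
``σ·√dt`` with ``dt = 1/(252·390)`` years:

.. code-block:: python

   import math

   MU = 0.08                   # annualized drift used throughout this guide
   BARS_PER_YEAR = 252 * 390   # 1-minute bars in a 252-day, 6.5-hour trading year
   DT_BAR = 1 / BARS_PER_YEAR  # one 1-minute bar, in years


   def mean_annual_return(mu: float) -> float:
       """Expected 1-year simple return of GBM: exp(mu) - 1."""
       return math.exp(mu) - 1


   def median_annual_return(mu: float, sigma: float) -> float:
       """Median 1-year simple return of GBM: exp(mu - sigma**2 / 2) - 1."""
       return math.exp(mu - 0.5 * sigma**2) - 1


   def per_bar_sigma(sigma: float, dt: float = DT_BAR) -> float:
       """Standard deviation of a single bar's log return: sigma * sqrt(dt)."""
       return sigma * math.sqrt(dt)


   for sigma in (0.20, 0.40, 1.00):
       print(
           f"sigma={sigma:.2f}  "
           f"mean={mean_annual_return(MU):+.1%}  "
           f"median={median_annual_return(MU, sigma):+.1%}  "
           f"per-bar={per_bar_sigma(sigma):.3%}"
       )

The same constants also reproduce the row count quoted earlier. Five years
of 1-minute bars is ``5 * BARS_PER_YEAR = 491,400`` rows.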

``ticks_to_ohlcv`` takes any DataFrame with a numeric column (the output of
any generator) and aggregates every ``bar_size`` steps into one OHLCV bar.
Pass ``column="value"`` for Ornstein-Uhlenbeck output; the default is
``"price"`` (GBM / fractional random walk). Open and close are the first and
last price in each bar, high and low span all intermediate prices, and volume
is synthetic lognormal (disable with ``volume=False``). An incomplete bar at
the tail is dropped. The result has columns ``bar, open, high, low, close,
volume``.

Timestamp assignment is left to the caller: ``scripts/grid_search_sma.py``
shows how to add business-day minute timestamps on top. The examples below
refer to this timestamped 1-minute frame as ``df``.

See :doc:`api/data` for the full data generation API.
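
The bar-building rules described above can be sketched in plain Polars. This
is a hypothetical illustration of the documented semantics (first and last
price for open and close, max and min for high and low, incomplete tail bar
dropped). It is not ``ticks_to_ohlcv``'s actual implementation, and it omits
the synthetic volume column. It assumes a recent Polars release that
provides ``with_row_index`` and ``pl.len``:

.. code-block:: python

   import polars as pl


   def ticks_to_ohlc_sketch(
       ticks: pl.DataFrame, bar_size: int, column: str = "price"
   ) -> pl.DataFrame:
       """Hypothetical helper: aggregate every `bar_size` rows into one OHLC bar."""
       return (
           ticks.with_row_index("tick")
           .with_columns((pl.col("tick") // bar_size).alias("bar"))
           .group_by("bar", maintain_order=True)
           .agg(
               pl.col(column).first().alias("open"),
               pl.col(column).max().alias("high"),
               pl.col(column).min().alias("low"),
               pl.col(column).last().alias("close"),
               pl.len().alias("n_ticks"),
           )
           .filter(pl.col("n_ticks") == bar_size)  # drop an incomplete tail bar
           .drop("n_ticks")
       )


   ticks = pl.DataFrame({"price": [100.0, 101.0, 99.0, 100.5, 102.0, 103.0, 101.5]})
   bars = ticks_to_ohlc_sketch(ticks, bar_size=3)
   print(bars)  # two complete bars; the lone 7th tick is dropped

With ``bar_size=60`` and second-level ticks, each output row corresponds to
one 1-minute bar, as in the ``ticks_to_ohlcv(gbm, bar_size=60, ...)`` call
above.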

Define a Parameterized Strategy
-------------------------------

The strategy is a frozen dataclass with ``fast_period`` and ``slow_period``
parameters. The optional ``init`` hook lets the strategy add its own
indicator columns to the DataFrame before signals are evaluated, which keeps
the strategy self-contained:

.. code-block:: python

   import polars as pl
   import polars_talib as plta
   from dataclasses import dataclass

   from mktlib.backtest import Crossover, Crossunder


   @dataclass(frozen=True, slots=True)
   class SmaCross:
       fast_period: int = 20
       slow_period: int = 50

       def init(self, df: pl.DataFrame) -> pl.DataFrame:
           # Add the two SMA columns that entry() and exit() refer to by name
           return df.with_columns(
               plta.sma(pl.col("close"), timeperiod=self.fast_period).alias("fast_sma"),
               plta.sma(pl.col("close"), timeperiod=self.slow_period).alias("slow_sma"),
           )

       def entry(self) -> Crossover:
           # Enter when fast_sma crosses above slow_sma
           return Crossover("fast_sma", "slow_sma")

       def exit(self) -> Crossunder:
           # Exit when fast_sma crosses back below slow_sma
           return Crossunder("fast_sma", "slow_sma")

With ``init``, the caller passes raw OHLCV data directly and needs no
separate indicator step:

.. code-block:: python

   from mktlib.backtest import run

   result = run(df, SmaCross(fast_period=10, slow_period=50))

Fetch the Risk-Free Rate
~~~~~~~~~~~~~~~~~~~~~~~~

To compute a meaningful Sharpe ratio we need the risk-free rate for the
period. ``get_risk_free_rate`` returns the average 3-month T-bill yield (as an
annualized decimal) over the given date range:

.. code-block:: python

   from mktlib.rates import get_risk_free_rate

   rf = get_risk_free_rate(df["date"].min(), df["date"].max())
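
The risk-free call above assumes ``df`` has a ``date`` column, which
``ticks_to_ohlcv`` does not add. ``scripts/grid_search_sma.py`` handles this.
As a rough stand-in, the hypothetical helper below stamps bars with weekday
09:30–16:00 minutes. It is a sketch under simplifying assumptions: no
exchange holiday calendar, no time zone, and exactly 390 bars per day.

.. code-block:: python

   from datetime import date, datetime, timedelta

   import polars as pl


   def business_minute_timestamps(
       start: date, n_days: int, minutes_per_day: int = 390
   ) -> pl.Series:
       """Hypothetical helper: minute stamps from 09:30 on the first n_days weekdays."""
       days: list[date] = []
       d = start
       while len(days) < n_days:
           if d.weekday() < 5:  # Monday=0 ... Friday=4; skips weekends only
               days.append(d)
           d += timedelta(days=1)
       stamps = [
           datetime(day.year, day.month, day.day, 9, 30) + timedelta(minutes=m)
           for day in days
           for m in range(minutes_per_day)
       ]
       return pl.Series("date", stamps)


   # Small demo: two trading days of placeholder 1-minute bars
   bars = pl.DataFrame({"close": [100.0] * (2 * 390)})
   df = bars.with_columns(business_minute_timestamps(date(2024, 1, 5), n_days=2))
   print(df["date"].min(), df["date"].max())

Applied to the full synthetic frame, this would be something like
``ohlcv.with_columns(business_minute_timestamps(date(2020, 1, 1), n_days))``,
since ``ohlcv`` should contain ``n_days * 390`` complete bars.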

Grid Search over SMA Periods
-----------------------------

Search over fast periods (5–50, step 5) and slow periods (20–200, step 10),
skipping invalid combinations where fast >= slow. For each combination, run
the backtest and score it by Sharpe ratio:

.. code-block:: python

   import itertools

   from mktlib.backtest import run
   from mktlib.metrics import sharpe, cumulative_return

   MINUTES_PER_YEAR = 252 * 390  # ppy for minute-bar returns

   fast_range = range(5, 55, 5)     # 5, 10, ..., 50
   slow_range = range(20, 210, 10)  # 20, 30, ..., 200

   results = []
   for fast, slow in itertools.product(fast_range, slow_range):
       if fast >= slow:
           continue  # the fast SMA must be shorter than the slow SMA

       strategy = SmaCross(fast_period=fast, slow_period=slow)
       result = run(df, strategy)
       ret = result.returns["return"]

       results.append({
           "fast_period": fast,
           "slow_period": slow,
           "sharpe": round(sharpe(ret, ppy=MINUTES_PER_YEAR, rf=rf), 4),
           "cumulative_return": round(cumulative_return(ret), 4),
           "n_trades": len(result.trades),
       })

   sma_results = pl.DataFrame(results).sort("sharpe", descending=True)
   print(sma_results.head(5))

Extract the best parameters for the next stage:

.. code-block:: python

   best = sma_results.row(0, named=True)
   best_fast = int(best["fast_period"])
   best_slow = int(best["slow_period"])
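
The ``ppy`` argument matters here because the returns are per minute, not
per day. For reference, a common definition of the annualized Sharpe ratio is
sketched below. ``sharpe_sketch`` is hypothetical, and
``mktlib.metrics.sharpe`` may use different conventions, such as geometric
de-annualization of ``rf`` or a different degrees-of-freedom correction. See
:doc:`api/metrics` for the exact definition.

.. code-block:: python

   import math
   import statistics


   def sharpe_sketch(returns: list[float], ppy: int, rf: float = 0.0) -> float:
       """Hypothetical annualized Sharpe: mean excess return / its std * sqrt(ppy)."""
       rf_per_period = rf / ppy  # simple de-annualization of the annual risk-free rate
       excess = [r - rf_per_period for r in returns]
       # statistics.stdev is the sample standard deviation (ddof=1)
       return statistics.fmean(excess) / statistics.stdev(excess) * math.sqrt(ppy)


   quarterly = [0.01, -0.01, 0.02, 0.0]
   print(round(sharpe_sketch(quarterly, ppy=4), 4))

Under this definition, passing the daily value ``ppy=252`` for minute bars
would understate the annualized Sharpe by a factor of roughly
``sqrt(390) ≈ 19.7``.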

Add Take-Profit / Stop-Loss Optimization
-----------------------------------------

Extend the strategy with percentage-based exits using ``Pct``, ``ValueGT``,
and ``ValueLT``. The take-profit (TP) triggers when the close rises a given
percentage above the slow SMA. The stop-loss (SL) triggers when it falls a
given percentage below it:

.. code-block:: python

   from mktlib.backtest import Condition, Pct, ValueGT, ValueLT


   @dataclass(frozen=True, slots=True)
   class SmaCrossWithExits:
       fast_period: int = 20
       slow_period: int = 50
       tp_pct: float = 5.0
       sl_pct: float = 3.0

       def init(self, df: pl.DataFrame) -> pl.DataFrame:
           return df.with_columns(
               plta.sma(pl.col("close"), timeperiod=self.fast_period).alias("fast_sma"),
               plta.sma(pl.col("close"), timeperiod=self.slow_period).alias("slow_sma"),
           )

       def entry(self) -> Crossover:
           return Crossover("fast_sma", "slow_sma")

       def exit(self) -> Condition:
           tp = ValueGT("close", Pct("slow_sma", self.tp_pct))   # TP above slow_sma
           sl = ValueLT("close", Pct("slow_sma", -self.sl_pct))  # SL below slow_sma
           # Exit as soon as any of the three conditions holds
           return Crossunder("fast_sma", "slow_sma") | tp | sl

``Pct("slow_sma", 5)`` resolves to ``slow_sma * 1.05`` (5% above), and
``Pct("slow_sma", -3)`` resolves to ``slow_sma * 0.97`` (3% below).
Conditions compose with ``|`` (any) and ``&`` (all).

.. note::

   **TP/SL relative to entry price vs. a moving indicator**

   The example above uses ``Pct("slow_sma", 5)``, so the threshold moves with
   the SMA on every bar. If you want TP/SL anchored to the **entry bar's
   price** (e.g., "take profit at 5% above the close when I entered"), use
   ``EntryRef``:

   .. code-block:: python

      from mktlib.backtest import EntryRef

      # Drop-in replacement for SmaCrossWithExits.exit
      def exit(self) -> Condition:
          tp = ValueGT("close", Pct(EntryRef("close"), self.tp_pct))
          sl = ValueLT("close", Pct(EntryRef("close"), -self.sl_pct))
          return Crossunder("fast_sma", "slow_sma") | tp | sl

   ``EntryRef("close")`` captures the close at the entry signal bar and
   forward-fills it. The engine creates the snapshot column automatically,
   so no manual ``init()`` work is needed. See :doc:`api/backtest` for
   details.
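
To make the two anchoring styles concrete, here is a small Polars sketch of
the same arithmetic on toy data. ``pct_of`` is a hypothetical stand-in for
``Pct`` that computes ``expr * (1 + pct / 100)``, consistent with the
resolutions above. The ``entry_close`` column imitates what
``EntryRef("close")`` is described as doing: snapshot the close on the entry
bar and forward-fill it. The engine's real snapshot logic, such as how it
resets after an exit, is not modeled.

.. code-block:: python

   import polars as pl


   def pct_of(expr: pl.Expr, pct: float) -> pl.Expr:
       """Hypothetical stand-in for Pct: expr scaled by (1 + pct / 100)."""
       return expr * (1 + pct / 100)


   bars = pl.DataFrame({
       "close": [100.0, 102.0, 104.0, 106.0, 96.0],
       "entry": [True, False, False, False, False],  # entry signal on the first bar
   })

   checked = bars.with_columns(
       # EntryRef-style snapshot: close on the entry bar, forward-filled afterwards
       entry_close=pl.when(pl.col("entry"))
       .then(pl.col("close"))
       .otherwise(None)
       .forward_fill(),
   ).with_columns(
       tp=pl.col("close") > pct_of(pl.col("entry_close"), 5.0),   # threshold 105.0
       sl=pl.col("close") < pct_of(pl.col("entry_close"), -3.0),  # threshold 97.0
   )
   print(checked)

Here TP fires on the 106.0 bar and SL on the 96.0 bar, because both
thresholds stay pinned to the entry close of 100.0 instead of following a
moving SMA.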

Now grid-search the TP/SL percentages with the best SMA periods fixed:

.. code-block:: python

   tp_range = [i / 10 for i in range(1, 11)]  # 0.1% to 1.0%, step 0.1%
   sl_range = [i / 10 for i in range(1, 11)]  # 0.1% to 1.0%, step 0.1%

   results = []
   for tp_pct, sl_pct in itertools.product(tp_range, sl_range):
       strategy = SmaCrossWithExits(
           fast_period=best_fast,
           slow_period=best_slow,
           tp_pct=tp_pct,
           sl_pct=sl_pct,
       )
       result = run(df, strategy)
       ret = result.returns["return"]

       results.append({
           "tp_pct": tp_pct,
           "sl_pct": sl_pct,
           "sharpe": round(sharpe(ret, ppy=MINUTES_PER_YEAR, rf=rf), 4),
           "cumulative_return": round(cumulative_return(ret), 4),
           "n_trades": len(result.trades),
       })

   tp_sl_results = pl.DataFrame(results).sort("sharpe", descending=True)
   print(tp_sl_results.head(5))

Analyze Results
---------------

The two-stage approach keeps the search space manageable: 174 SMA-period
combinations, then 100 TP/SL combinations (274 backtests in total), instead
of 17,400 for a single combined grid. The trade-off is that the TP/SL stage
only explores exits for the SMA pair that scored best *without* exits.

A few things to keep in mind:

- **Overfitting risk**: optimizing on the same data you evaluate on
  overestimates real performance. Split your data into an in-sample period
  (for optimization) and an out-of-sample period (for validation).
- **Transaction costs**: the backtest engine uses fill-at-next-open
  semantics but does not model commissions or slippage, so strategies with
  many trades may look better than they are.
- **Metric choice**: Sharpe rewards consistency. Consider ``sortino``
  (downside risk only) or ``omega`` as alternatives. See :doc:`api/metrics`
  for the full list.

For a complete tearsheet of the winning strategy, see
``mktlib.reports.html()`` in :doc:`api/reports`.
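
The grid sizes for both stages can be computed directly from the ranges used
in the code. The in-sample / out-of-sample advice also needs a split that
respects time order, with no shuffling. Both are sketched below.
``chronological_split`` and its 70/30 default are illustrative assumptions,
not part of mktlib.

.. code-block:: python

   import itertools
   from datetime import date

   import polars as pl

   fast_range = range(5, 55, 5)               # 5, 10, ..., 50
   slow_range = range(20, 210, 10)            # 20, 30, ..., 200
   tp_range = [i / 10 for i in range(1, 11)]  # 0.1 .. 1.0
   sl_range = [i / 10 for i in range(1, 11)]

   # Stage 1 skips pairs with fast >= slow; stage 2 is a full 10 x 10 grid
   sma_combos = sum(1 for f, s in itertools.product(fast_range, slow_range) if f < s)
   tp_sl_combos = len(tp_range) * len(sl_range)
   print(sma_combos, tp_sl_combos, sma_combos + tp_sl_combos, sma_combos * tp_sl_combos)


   def chronological_split(
       df: pl.DataFrame, in_sample_fraction: float = 0.7
   ) -> tuple[pl.DataFrame, pl.DataFrame]:
       """Hypothetical helper: earliest rows for optimization, the rest for validation."""
       ordered = df.sort("date")
       n_in = int(ordered.height * in_sample_fraction)
       return ordered.head(n_in), ordered.slice(n_in)


   demo = pl.DataFrame({
       "date": [date(2024, 1, d) for d in range(1, 11)],
       "close": [100.0 + d for d in range(10)],
   })
   in_sample, out_of_sample = chronological_split(demo)

Under this scheme, both grid-search stages would run on ``in_sample`` only,
and the chosen parameters would be scored once on ``out_of_sample``.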

Multi-Symbol Grid Search
------------------------

When optimizing a strategy across multiple symbols, use ``instrument_col`` to
run all symbols in a single backtest call. Per-symbol returns let you
evaluate each ticker independently or aggregate them into a portfolio:

.. code-block:: python

   import itertools

   import polars as pl

   from mktlib.backtest import run
   from mktlib.metrics import sharpe

   symbols_df = ...  # DataFrame with columns: symbol, date, open, close

   fast_range = range(5, 55, 5)
   slow_range = range(20, 210, 10)

   results = []
   for fast, slow in itertools.product(fast_range, slow_range):
       if fast >= slow:
           continue

       strategy = SmaCross(fast_period=fast, slow_period=slow)
       result = run(symbols_df, strategy, instrument_col="symbol")

       # Per-symbol Sharpe: O(1) access via result[symbol]
       # (pass ppy= and rf= to match your bar frequency, as in the single-symbol search)
       for sym in result.symbols:
           sym_ret = result[sym].returns["return"]
           results.append({
               "symbol": sym,
               "fast_period": fast,
               "slow_period": slow,
               "sharpe": round(sharpe(sym_ret), 4),
           })

       # Also score an equal-weight portfolio, once per parameter combination
       portfolio = (
           result.returns
           .group_by("date")
           .agg(pl.col("return").mean())
           .sort("date")  # group_by does not preserve date order
       )["return"]
       results.append({
           "symbol": "PORTFOLIO",
           "fast_period": fast,
           "slow_period": slow,
           "sharpe": round(sharpe(portfolio), 4),
       })

   grid = pl.DataFrame(results).sort("sharpe", descending=True)

This avoids the outer loop over symbols that single-symbol backtesting would
require, while keeping each symbol's indicator computation isolated.

See Also
--------

- :doc:`quickstart`: basic API usage
- :doc:`api/backtest`: full backtest API reference
- :doc:`api/metrics`: all available financial metrics
- :doc:`api/data`: synthetic data generators