LSTM Neural Networks

Long Short-Term Memory recurrent neural networks that learn temporal dependencies in price sequences for multi-step return forecasting.

Overview

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) designed to learn long-range dependencies in sequential data. Developed by Hochreiter & Schmidhuber in 1997, LSTMs use gating mechanisms — input, forget, and output gates — to selectively remember and forget information across many time steps. In trading, LSTMs can learn complex temporal patterns across price, volume, and indicator sequences that static ML models cannot capture.

How it looks on a chart

Illustration only — synthetic data generated for visual reference.

Beginner

While most indicators look at a fixed window of the past, LSTM neural networks can learn from much longer sequences and remember which parts of history are relevant. They are inspired by how human memory works — we remember some things for a long time and quickly forget others. An LSTM trained on price sequences learns relationships like "whenever price has had three down weeks followed by a volume spike and then a small recovery week, it has often started a new uptrend." These complex, conditional, multi-step patterns are exactly what traditional indicators cannot capture. However, LSTMs are far more complex to train and require much more data than simpler models. They also need careful validation to ensure they are genuinely learning market patterns and not just memorizing historical noise.

Intermediate

An LSTM cell maintains two state vectors: the cell state (long-term memory) and the hidden state (short-term memory). Three gates control information flow: the forget gate (σ(Wf·[hₜ₋₁, xₜ] + bf)) determines what to discard from the cell state; the input gate controls what new information to add; the output gate determines the next hidden state. A typical trading architecture uses 2–3 LSTM layers with 50–128 units each, followed by dense layers for the output head. Input features are sequences of 20–60 bars of [returns, RSI, MACD, volume ratio, ATR]. The output is a return forecast or the probability of a directional move over the next 1–5 bars. Critical implementation details: (1) normalize inputs to [0, 1] or z-score them using only data available at each training point; (2) apply dropout (0.2–0.4) between layers to regularize; (3) train with early stopping on a time-split validation set; (4) retrain monthly to adapt to regime changes. Walk-forward validation across 5+ years of data is essential.
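Point (1) above, normalizing with only past data, is the easiest place to leak future information. A minimal sketch of a leakage-free expanding-window z-score (illustrative numpy code; the function name is ours, not from any specific library):

```python
import numpy as np

def expanding_zscore(x: np.ndarray, min_periods: int = 20) -> np.ndarray:
    """Z-score each value using only history up to and including that bar,
    so no future statistics leak into the training inputs."""
    out = np.full(len(x), np.nan)
    for t in range(min_periods - 1, len(x)):
        hist = x[: t + 1]                 # data available at bar t
        mu, sd = hist.mean(), hist.std()
        out[t] = (x[t] - mu) / sd if sd > 0 else 0.0
    return out
```

The first `min_periods - 1` values stay NaN because there is not yet enough history to estimate a stable mean and standard deviation.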

Advanced

LSTMs face specific challenges in financial time series that differ from NLP (their primary domain): financial data is far shorter (years, not billions of tokens), more non-stationary, and has much lower signal-to-noise ratio. The network capacity needed for language modeling greatly exceeds what is justified by financial data, leading to overfitting despite regularization. Transformers (the architecture behind GPT) have begun replacing LSTMs even in time series forecasting. Temporal Fusion Transformer (Lim et al. 2021) achieves state-of-the-art on multi-variate time series benchmarks. However, for typical trading backtesting workflows, the added complexity of transformers rarely justifies the marginal improvement over well-regularized LSTMs. In academic literature, the honest assessment is that deep learning models for return prediction achieve modestly better out-of-sample R² than linear models — on the order of 0.1–0.5% higher for daily returns (Gu, Kelly, Xiu 2020). This is statistically significant given large samples but modest in absolute terms. The real value of LSTM in systematic trading is often in volatility and regime prediction rather than return point forecasting.

Formula

Forget gate: fₜ = σ(Wf·[hₜ₋₁, xₜ] + bf)
Input gate: iₜ = σ(Wi·[hₜ₋₁, xₜ] + bi)
Output gate: oₜ = σ(Wo·[hₜ₋₁, xₜ] + bo)
Cell state: Cₜ = fₜ⊙Cₜ₋₁ + iₜ⊙tanh(Wc·[hₜ₋₁, xₜ] + bc)
Hidden state: hₜ = oₜ⊙tanh(Cₜ)
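The gate equations can be implemented directly as a single forward step. A minimal numpy sketch (the weight matrix here stacks all four gates' parameters and is a randomly initialized placeholder, not a trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step following the gate equations.
    W maps the concatenated [h_prev, x_t] to stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])             # forget gate: what to discard from C
    i = sigmoid(z[H:2*H])           # input gate: what new info to add
    o = sigmoid(z[2*H:3*H])         # output gate: what to expose as h
    C_tilde = np.tanh(z[3*H:4*H])   # candidate cell state
    C = f * C_prev + i * C_tilde    # new cell state (long-term memory)
    h = o * np.tanh(C)              # new hidden state (short-term memory)
    return h, C

# Example dimensions: 5 input features, 8 hidden units
rng = np.random.default_rng(0)
n_in, H = 5, 8
W = rng.normal(scale=0.1, size=(4 * H, H + n_in))
b = np.zeros(4 * H)
h, C = lstm_step(rng.normal(size=n_in), np.zeros(H), np.zeros(H), W, b)
```

Because the output gate is bounded in (0, 1) and tanh in (−1, 1), every component of the hidden state stays strictly inside (−1, 1).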
  1. Prepare input sequences of length T (e.g., 30 bars) with normalized feature vectors per bar.
  2. Design LSTM architecture: layers, units, dropout rate, and output head (regression or classification).
  3. Train using Adam optimizer with early stopping on a held-out temporal validation set.
  4. Generate rolling predictions by feeding the last T bars through the trained network at each new bar.
  5. Retrain the model monthly using an expanding or sliding window of training data to adapt to new regimes.
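Steps 1 and 4 above can be sketched as building overlapping input windows of length T from a feature matrix; at each new bar, the latest window would then be fed through the trained network (illustrative numpy code; the window layout matches what Keras-style LSTM layers expect, but the helper name is ours):

```python
import numpy as np

def make_sequences(features: np.ndarray, T: int = 30) -> np.ndarray:
    """Stack overlapping windows of length T into (n_samples, T, n_features),
    so sample k covers bars k..k+T-1 and its target aligns to bar k+T-1."""
    n_bars, _ = features.shape
    return np.stack([features[k:k + T] for k in range(n_bars - T + 1)])

# Example: 100 bars of 5 normalized features -> 71 sequences of shape (30, 5)
feats = np.random.default_rng(1).normal(size=(100, 5))
X = make_sequences(feats, T=30)
```

For live prediction, the rolling input at the current bar is simply `feats[-T:]`, reshaped to a batch of one.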

Parameters

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| Sequence Length | 30 | 10–100 | Number of bars in each input sequence to the LSTM. |
| LSTM Units | 64 | 16–256 | Number of LSTM units per layer. |
| Forecast Horizon | 5 | 1–20 | Number of bars ahead to forecast. |

Trading signals

bullish: LSTM predicted return > 1 standard deviation above average

Network predicts significantly positive return — high-confidence long signal.

bearish: LSTM predicted return < 1 standard deviation below average

Network predicts significantly negative return — high-confidence short signal.

bullish: LSTM and Random Forest signals agree

Multi-model consensus — highest-confidence ensemble signal, increase position size.
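The one-standard-deviation threshold rules above can be sketched as a simple mapping from the model's predicted return to a signal, using the mean and standard deviation of recent predictions (illustrative code; the lookback window and function name are assumptions, not part of any specific implementation):

```python
import numpy as np

def lstm_signal(pred: float, recent_preds: np.ndarray) -> str:
    """Map a predicted return to a signal: bullish if more than one standard
    deviation above the recent average, bearish if more than one below."""
    mu, sd = recent_preds.mean(), recent_preds.std()
    if pred > mu + sd:
        return "bullish"
    if pred < mu - sd:
        return "bearish"
    return "neutral"
```

An ensemble rule like the Random Forest consensus signal would then be a conjunction of this output with the other model's signal.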

Limitations

  • Requires large datasets and significant computation — impractical for assets with limited history.
  • Prone to overfitting on financial data's low signal-to-noise ratio; validation is critical but insufficient alone.
  • Model behavior is opaque — difficult to explain why a specific prediction was made.
  • Performance degrades rapidly during market regime changes not present in training data.

How Gilito AI uses LSTM

Gilito trains LSTM models on each asset class using rolling 2-year windows, with the network architecture automatically tuned via Bayesian hyperparameter optimization. LSTM outputs are one of three model inputs (alongside Random Forest and logistic regression) in Gilito's meta-ensemble layer that produces the final strategy selection signal.

Related indicators