Summary of Deep Hedging—A landmark paper on the application of reinforcement learning in options hedging

Taichi Kato


In Deep Hedging, H. Bühler et al. explore a novel method for hedging large portfolios of derivatives in the presence of market frictions using reinforcement learning techniques. By using reinforcement learning as the optimization method for finding hedging policies, the authors succeed in outperforming pre-existing models that rest on the assumption of complete markets.

Investment banks and their derivatives desks create financial derivatives that are hedged appropriately, priced with a premium, and sold over the counter (that is, not on a public market). Since these derivatives do not have observable prices, their pricing has to take into account the cost of hedging them. By under-hedging, the seller takes on greater risk than it is being paid for. By over-hedging, the bank pays more for hedging than necessary, has to quote a higher price, and is vulnerable to losing its customers to competitors. Hence, an appropriate hedging strategy should minimize hedging cost (which includes transaction costs and losses from trades) while ensuring that the risks are sufficiently hedged.
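To make the trade-off between hedging cost and residual risk concrete, here is a minimal sketch of the seller's terminal profit and loss for a single hedging instrument. The function name `hedged_pnl`, the proportional form of the transaction cost, and the unwinding of the hedge at maturity are illustrative assumptions, not the paper's exact formulation.

```python
def hedged_pnl(prices, positions, payoff, premium, cost_rate):
    """Terminal P&L of a seller who hedges a sold derivative (illustrative).

    prices:    hedging-asset prices S_0..S_T (length T + 1)
    positions: units of the asset held over each period (length T)
    payoff:    amount owed to the buyer at maturity
    premium:   amount received from the buyer at inception
    cost_rate: proportional transaction cost per unit of traded notional
               (an assumed cost model)
    """
    if len(positions) != len(prices) - 1:
        raise ValueError("need exactly one position per price interval")

    trading_gain = 0.0
    costs = 0.0
    previous = 0.0  # the hedge starts from a flat position
    for t, held in enumerate(positions):
        # Cost of moving from the previous holding to the new one at price S_t.
        costs += cost_rate * abs(held - previous) * prices[t]
        # Gain or loss from holding the position over [t, t + 1].
        trading_gain += held * (prices[t + 1] - prices[t])
        previous = held
    # Assumption: the remaining hedge is unwound at maturity, which also costs.
    costs += cost_rate * abs(previous) * prices[-1]

    return premium - payoff + trading_gain - costs


# Example: a sold call struck at 100, hedged over two periods.
prices = [100.0, 110.0, 105.0]
pnl = hedged_pnl(prices, positions=[0.5, 0.6],
                 payoff=max(prices[-1] - 100.0, 0.0),
                 premium=6.0, cost_rate=0.01)
```

A hedging policy is then judged by the distribution of this quantity across many price paths: frequent rebalancing reduces its spread but raises the cost term.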

There has been significant research on hedging in complete markets, exemplified by the Black-Scholes and Heston models and their applications. In practice, however, trading desks often adjust the resulting hedging decisions by intuition, because the complete-market assumptions do not hold. A more data-driven method is desirable, going beyond the traditional statistical analysis approach to quantitative trading. However, purely data-driven models often do not incorporate enough financial theory, resulting in overfitting and poor risk management.

One big advantage of the reinforcement learning approach is its speed and scalability to large portfolios. Computing Greeks with classical models does not scale well and is limited by the amount of compute available. Using reinforcement learning models for numerical optimization allows for more immediate rebalancing of the hedging instruments and greater accuracy in quoting the derivative's price.
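For reference, the classical approach computes a Greek per instrument from a model. A minimal sketch for the simplest case, the Black-Scholes delta of a European call, is shown here; the function names are illustrative, and a real desk would repeat such calculations (often numerically rather than in closed form) for every position and risk factor in the book.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call_delta(spot, strike, maturity, rate, vol):
    """Delta of a European call under Black-Scholes (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (
        vol * math.sqrt(maturity)
    )
    return normal_cdf(d1)


# An at-the-money one-year call with zero rates and 20% volatility.
delta = black_scholes_call_delta(spot=100.0, strike=100.0,
                                 maturity=1.0, rate=0.0, vol=0.2)
```

This closed form assumes frictionless trading in a complete market, which is the assumption the deep hedging approach is designed to relax.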

Another understated advantage of this approach is its generalizability and model-independent nature. The method is data driven and does not depend on a human-specified model. It is likely, though, that the Greeks would still be used in the industry for calculating market risk limits.

The paper does not highlight the sparsity of financial datasets. Because financial data tends to be sparse, the authors use market simulation instead of real data, generating it with a traditional model (namely the Heston model, with transaction costs). This is not a problem given that the approach performs better than traditional option pricing methods. It is also foreseeable that the use of market data may become more feasible as we improve the data efficiency of FCNs through data augmentation, pre-trained models, and other techniques. However, we should still be aware of this limitation, as future improvements in these approaches will come primarily from improving the quality of input data rather than from improvements in the neural network architectures. The authors discuss that reinforcement learning techniques are sensitive to the quality of data, and in practical applications of the model, data becomes more important.
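As an illustration of what simulating a market with the Heston model involves, the following is a minimal sketch of a single simulated path using an Euler scheme with full truncation of the variance. The discretization, the function name `simulate_heston_path`, and all parameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
import math
import random


def simulate_heston_path(s0, v0, kappa, theta, xi, rho, rate,
                         maturity, n_steps, seed=0):
    """Simulate one Heston price/variance path (Euler, full truncation).

    kappa: mean-reversion speed of the variance
    theta: long-run variance
    xi:    volatility of variance
    rho:   correlation between price and variance shocks
    """
    rng = random.Random(seed)
    dt = maturity / n_steps
    s, v = s0, v0
    prices, variances = [s], [v]
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second shock with correlation rho to the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)  # full truncation: use only the positive part
        # Log-Euler step keeps the price strictly positive.
        s *= math.exp((rate - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1)
        v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2
        prices.append(s)
        variances.append(v)
    return prices, variances


# Illustrative parameters only: 30 daily steps over roughly one month.
prices, variances = simulate_heston_path(
    s0=100.0, v0=0.04, kappa=1.0, theta=0.04, xi=2.0, rho=-0.7,
    rate=0.0, maturity=30 / 365, n_steps=30, seed=42,
)
```

Generating many such paths with different seeds yields as much training data as needed, which is how simulation sidesteps the sparsity of real market data, at the price of inheriting the simulator's modeling assumptions.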

Finally, another challenge that would be of interest to address is the interpretability of deep hedging. Although the features used in these models are selected manually, it can be difficult to work out what drives a given hedging strategy without simulating multiple scenarios. The model will also take a long time to train, given the wide range of hedging asset classes.

In this landmark paper, the authors propose a novel method for pricing options and optimizing their hedging strategies within an incomplete market. This is a large paradigm shift in option pricing and hedging. Practical application of these models will move from vanilla toward more exotic options as we improve the data and engineer the input features. The trend in finance is perhaps to move beyond Greeks and other mathematical models toward directly using empirical data and treating option pricing and hedging as an optimization problem. Reframing finance problems as convex optimization problems that can be solved through reinforcement learning can be beneficial in a number of fields. Some interesting directions for future research include optimizing bid-ask pricing in market making, constructing investment portfolios under risk constraints, and arbitrage in cryptocurrency markets.