Backward Propagation for Finance People

In finance, the process of retrospectively evaluating the efficacy of investment decisions is referred to as attribution.  It's used to ask, "How'd we do?" and offer a granular, objective answer.  Working within the risk/reward framework, attribution uses widely accepted risk factors to explain a position's performance.


As a cartoon example, let's say you own a share of IBM.  For equities, we'll perform attribution using Beta, roughly speaking a measure of a stock's sensitivity to the underlying market.  If IBM has a Beta of 0.75 and the S&P 500 moved by 2%, then Beta predicts that IBM would move by 0.02 * 0.75 = 1.5%.  If instead we observed a move of 1.7%, then we may say that 1.7% - 1.5% = 0.2%, or 20bps (basis point = 1/100 of 1%), is due to expert skill, aka "alpha."  Alpha is often used as a remuneration metric for professional portfolio managers.
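The arithmetic above can be sketched in a few lines of Python.  This is only an illustration of the cartoon example, not a real attribution model; the function name `attribute_return` and its parameters are made up for this post.

```python
def attribute_return(beta, market_return, observed_return):
    """Split an observed return into a Beta-explained part and a residual.

    Hypothetical helper for the cartoon example: the residual is what
    we loosely call "alpha" here.
    """
    expected_return = beta * market_return
    alpha = observed_return - expected_return
    return expected_return, alpha


# Numbers from the IBM example: Beta 0.75, market up 2%, IBM up 1.7%.
expected, alpha = attribute_return(beta=0.75, market_return=0.02, observed_return=0.017)

# One basis point is 1/100 of 1%, so multiply a decimal return by 10,000.
alpha_bps = alpha * 10_000
print(f"expected move: {expected:.2%}, alpha: {alpha_bps:.0f} bps")
```

Real attribution typically uses several risk factors rather than Beta alone, but the shape is the same: predicted return from the factors, and a residual left over.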

There's actually a whole lot more to attribution but hopefully this example communicates the general intuition.  Interestingly, this process is highly similar to the process used to train neural networks, Backward Propagation.

A neural network consists of multiple layers of multiple nodes, or neurons.  Collectively, the nodes capture some complexity useful for modeling or prediction.  The process of supervised learning starts with some "guess" and a known set of inputs and outputs.  It uses the power of the computer to iteratively tweak the neural network until the loss, or error, falls within a target range.

Suppose we have two independent variables and one dependent, x, y, and z respectively.  We're given that x = 2, y = 3, and z = 11.  We might guess that the model is 4x + y = z -> 4(2) + 3 = 11.  But how can we get a computer to reach this same conclusion on its own?

At risk of oversimplifying the process, we can view each node in a neural network as encoding the independent variables' coefficients.  Suppose we start the computer with a random guess, 2x + y = z.  Our first iteration produces 2(2) + 1(3) = 7.  Our error is 7 - 11 = -4.  We'll need to iteratively tweak the coefficients of our independent variables until we approach our expected value of 11: 2x + 2y = z, 4x = z, 4x + 2y = z, etc.  In mathematics this is known as a numerical method, which estimates an answer, in contrast to an analytical method, which calculates an answer directly.
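As a rough sketch, here is that "guess and check" step in Python.  The names `predict` and `candidates` are just for illustration; the data point (x = 2, y = 3, z = 11) and the candidate coefficients come from the example above.

```python
def predict(a, b, x, y):
    """Evaluate the candidate model a*x + b*y."""
    return a * x + b * y


x, y, z = 2, 3, 11  # the known inputs and output

# Candidate (a, b) coefficient pairs, starting with the initial guess 2x + y.
candidates = [(2, 1), (2, 2), (4, 0), (4, 2), (4, 1)]

errors = {}
for a, b in candidates:
    guess = predict(a, b, x, y)
    errors[(a, b)] = guess - z  # signed error: negative means we undershot
    print(f"{a}x + {b}y -> guess {guess}, error {errors[(a, b)]}")
```

Checking candidates by hand like this works for a toy problem, but it gives no hint about which way to move the coefficients next.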

We could do this by randomly perturbing our coefficients until we hit the mark, but this could run indefinitely and relies on chance.  A more efficient way to tweak is to use the derivative.  If you recall from Calculus, we can use the derivative to find the slope of a function.  Said another way, the derivative describes the independent variable's contribution to the change in the dependent variable.  In multivariable calculus, the collection of these derivatives, one per variable, is called the gradient.  We hold all other independent variables constant to measure the contribution of any one variable to a change.  The properties of the neural network make this process not only tractable but efficient.
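To make "hold everything else constant" concrete, here is a minimal sketch that estimates each slope numerically.  It assumes we measure error as the squared difference between the guess and the target (a common choice, but an assumption here), and the helper names `squared_error` and `partial` are made up for illustration.

```python
def squared_error(a, b, x=2, y=3, z=11):
    """Squared difference between the model's guess a*x + b*y and the target z."""
    return (a * x + b * y - z) ** 2


def partial(f, a, b, wrt, h=1e-6):
    """Estimate the slope of f with respect to one coefficient.

    Only the chosen coefficient is nudged by h; the other is held constant.
    """
    if wrt == "a":
        return (f(a + h, b) - f(a - h, b)) / (2 * h)
    return (f(a, b + h) - f(a, b - h)) / (2 * h)


# Slopes at the starting guess 2x + 1y.
slope_a = partial(squared_error, 2.0, 1.0, "a")
slope_b = partial(squared_error, 2.0, 1.0, "b")
print(f"slope for a: {slope_a:.2f}, slope for b: {slope_b:.2f}")
```

Both slopes come out negative (about -16 and -24), which says that nudging either coefficient upward shrinks the error.  A real neural network gets these slopes with the chain rule rather than by nudging, which is where the efficiency comes from.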

By moving the coefficients closer to their actual value one derivative at a time, we approach the expected value using a gradient descent algorithm.  "Descent" comes from the idea of comparing our process to a funnel.  We allow gravity to pull our estimate in the direction of the steepest slope to its targeted resting point at the bottom of the funnel.
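A bare-bones version of that descent might look like the following.  It is a sketch under a few assumptions of my own: error is measured as the squared difference between guess and target, the step size (`learning_rate`) is 0.01, and we take 100 steps.  None of those numbers are special.

```python
def gradient_descent(x, y, z, a, b, learning_rate=0.01, steps=100):
    """Repeatedly nudge coefficients a and b downhill on the squared error."""
    for _ in range(steps):
        error = a * x + b * y - z
        # Slopes of error**2 with respect to a and b (chain rule).
        grad_a = 2 * error * x
        grad_b = 2 * error * y
        # Step against the slope: downhill, toward the bottom of the funnel.
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Start from the guess 2x + 1y with the known data point x=2, y=3, z=11.
a, b = gradient_descent(x=2, y=3, z=11, a=2.0, b=1.0)
prediction = a * 2 + b * 3
print(f"a = {a:.3f}, b = {b:.3f}, prediction = {prediction:.3f}")
```

One caveat worth knowing: with a single data point, many coefficient pairs produce 11, so the loop settles on a pair that fits without necessarily landing on 4 and 1.  More data points are what pin the coefficients down.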

Of course, a neural network and backward propagation are far more complex.  However, the intent here is to convey the intuition by comparison to a familiar process.

Now with both covered, we can see how they're similar.  Both attribution and backward propagation are used to ask, "How'd we do?" They do so by using a metric that estimates the impact of an independent variable on a dependent variable.  Attribution ideally allows us to learn from our successes or mistakes to constantly deliver better investment results.  Backward propagation has the same goal, to evaluate a given iteration of an experiment and move it closer to our ideal result.
