Photo Credit: Michael Rose

heading_1 Deriva - runtime generation of algorythmic differentiation byte-code

My colleague Marc Henrard has been writing about Algorithmic Differentiation (AD) for some time now. Any formula, no matter how complex, can be split into a list of simple mathematical operations that a computer understands, and (importantly) have known, analytic derivatives. AD is nothing more than a system for bookkeeping while applying the chain rule. We’ve implemented AD in the OpenGamma Analytics Library manually and seen considerable performance gains as a result.

Soon after Marc introduced me to AD, I started thinking about the possibility of doing AAD automatically. The result of this thought process is Deriva, a Clojure implementation of AD (with a DSL for Java), designed to automate the tedious process of coding AD manually, which gets hard and time consuming with higher-order derivatives. I haven’t been able to find another, satisfactory Java-based solution for multi-dimensional AD.

heading_3 Quick background on AD

According to the Wikipedia definition, Algorithmic Differentiation is

… a set of techniques to numerically evaluate the derivative of a function specified by a computer program. Automatic differentiation is not:

• Symbolic differentiation, or
• Numerical differentiation (the method of finite differences).

These classical methods run into problems: symbolic differentiation leads to inefficient code (unless carefully done)1 and faces the difficulty of converting a computer program into a single expression, while numerical differentiation can introduce round-off errors in the discretization process and cancellation. Both classical methods have problems with calculating higher derivatives, where the complexity and errors increase. Finally, both classical methods are slow at computing the partial derivatives of a function with respect to many inputs, as is needed for gradient-based optimization algorithms. Automatic differentiation solves all of these problems.

heading_3 Meet Deriva

Deriva is a Clojure implementation of Automated Algorithmic Differentiation with a DSL for Java. The client of the library defines symbolic expressions using either regular Lisp s-forms or the Java builder pattern. Beside closed-form expressions, Deriva provides support for non-analytic functions (case functions) and for incorporating regular Java code into its symbolic expression trees.

heading_3 Installation

heading_4 Leiningen

Add the following to your `:dependencies`:

``````deriva "0.1.0-SNAPSHOT"
``````

heading_4 Maven

``````<dependency>
<groupId>deriva</groupId>
<artifactId>deriva</artifactId>
<version>0.1.0-SNAPSHOT</version>
</dependency>
``````

The jar is hosted in a Clojars.org repository, so if you haven’t already added it to your Maven repositories, you can do it by adding the following section to your pom.xml inside the tag:

``````<repository>
<id>clojars</id>
<url>https://clojars.org/repo/</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
<releases>
<enabled>true</enabled>
</releases>
</repository>
``````

Fork Deriva!

heading_3 Some examples

heading_4 Simple example - sine function

To use Deriva java DSL simply add the following lines to your code:

``````import com.lambder.deriva.*;
import static com.lambder.deriva.Deriva.*;
``````

now we can define sine expression, simply as:

``````Expression expr = sin('x');
``````

Such expressions can be used as sub-expressions to build more complex math formulas. We’ll look at more complex example later on. Now let’s see how we can use an Expression. To execute the expression we need to create a function from it:

``````Function fun = expr.function('x');
``````

Here we see that we use the `'x'` symbol (represented here by a single character, but regular Strings will do as well) to define mapping of symbols to the arguments of a function. Placing a symbol in a given place indicates which argument it will map to. It will become more clear, when we see function invocation:

``````double result = fun.execute(Math.PI / 6);
``````

here `execute` takes `double` parameters (in our case one such parameter) and substitutes them into the underlying expression in accordance with the mapping defined when we called `function`. So in this case it replaces all occurrences of  x  with  pi/6 , making our original expression render sin(pi/6).

Now to derivatives. In order to calculate a derivative of a given expression in respect to a given symbol, in Java we do:

``````Expression expr_d_1 = d(expr, 'x');
``````

We can then use the expression, representing first order derivative on sine function  ∂/(∂x) sin(x)  , to obtain its value at point t by:

``````double slope = expr_d_1.function('x').execute(t);
``````

heading_4 More fun - gradients of multivariate functions

The real benefits of algorithmic differentiation come when we work with multivariate functions. The code that calculates a gradient - that is, a vector of partial derivatives in respect to all variables - requires less operations than calculating these partial derivatives separately.

As an example lets take  sin(x^2 y^2) RR^2 => RR  function.

To get its gradient we do:

``````Expression expr = sin(mul(sq('x'), sq('y')));
Function1 fun = d(expr, 'x', 'y').function('x', 'y');  // (1)
double result = fun.execute(1.0, 2.0);
System.out.println(Arrays.toString(result));
``````

which prints:

``````-5.2291489669088955, -2.6145744834544478
``````

The result is n-long array of doubles, which elements are values of corresponding partial derivatives in order defined in `function` (1) call.

heading_4 How does it work?

Deriva doesn’t follow directly any of the classic methods described in the Wikipedia article2, but rather it is a combination of symbolic transformation, operations generation and eventually bytecode generation. The symbolic transformation simply uses differentiation rules to transform one expression into another - for example to transform  ∂/(∂x) sin(x)  into  cos(x) . One can even add such custom rules on an application level at runtime. The second phase, generation of elementary operations, flattens the expression tree (or trees in case of gradient) into one list of variable substitutions. During that operation the index of all sub-expressions is used in order to eliminate using same expressions twice. The final phase uses another set of rules, which define a given mathematical operation will be reflected as a working bytecode. E.g. operation `sin('x')` is defined as `Math.sin(x)`.

The workings of Deriva can be observed with a help of `describe` method:

``````Expression expr = sin(mul(sq('x'), sq('y')));
System.out.println(expr.describe());
``````

which displays:

``````Expression:
(sin (* (sq x) (sq y)))

gets turned into:
(sin (* (sq x) (sq y)))

and into:
final double G__1316 = y;  //  y
final double G__1315 = sq( G__1316 );  //  (sq y)
final double G__1314 = x;  //  x
final double G__1313 = sq( G__1314 );  //  (sq x)
final double G__1312 = G__1313 * G__1315;  //  (mul (sq x) (sq y))
final double G__1311 = sin( G__1312 );  //  (sin (mul (sq x) (sq y)))
``````

Here we see that the code which calculates  sin(x^2 y^2)  requires, as one would expect, three multiplications and one sin function call.

The workings of the gradient generation can be shown as:

``````Expression expr = sin(mul(sq('x'), sq('y')));
VectorD g_expr = vector(expr, d(expr, 'x'), d(expr, 'x'));
System.out.println(g_expr.describe());
``````
``````Expression:
(sin (* (sq x) (sq y)))
(d (sin (* (sq x) (sq y))) x)
(d (sin (* (sq x) (sq y))) x)

gets turned into:
(vector
(sin (* (sq x) (sq y)))
(* (cos (* (sq x) (sq y))) (* (* 2 x) (sq y)))
(* (cos (* (sq x) (sq y))) (* (* 2 x) (sq y))))

and into:
final double G__1343 = y;  //  y
final double G__1342 = sq( G__1343 );  //  (sq y)
final double G__1341 = x;  //  x
final double G__1340 = 2;  //  2
final double G__1339 = G__1340 * G__1341;  //  (* 2 x)
final double G__1338 = G__1339 * G__1342;  //  (* (* 2 x) (sq y))
final double G__1334 = sq( G__1341 );  //  (sq x)
final double G__1333 = G__1334 * G__1342;  //  (mul (sq x) (sq y))
final double G__1332 = cos( G__1333 );  //  (cos (mul (sq x) (sq y)))
final double G__1331 = G__1332 * G__1338;  //  (* (cos (mul (sq x) (sq y))) (* (* 2 x) (sq y)))
final double G__1312 = sin( G__1333 );  //  (sin (mul (sq x) (sq y)))
final double G__1311 =  G__1312, G__1331, G__1331 ;  //  (vector (sin (mul (sq x) (sq y))) (* (cos (mul (sq x) (sq y))) (* (* 2 x) (sq y))) (* (cos (mul (sq x) (sq y))) (* (* 2 x) (sq y))))

``````

Here we see 2 trigonometric function calls and 6 multiplications, as opposed to 3 trigonometric function calls and 17 multiplications which one could expect from calculating:  { sin(x^2 y^2) , 2 x y^2 cos(x^2 y^2), 2 x^2 y cos(x^2 y^2) } .

This effect is more profound as the expressions get more complicated or the number of domain’s dimension grows.

heading_4 How about non-analytic functions?

Some functions although have no analytical definition are still differentiable at whole domain. As an example consider the  f(x)={(e^(-1/x),if x>0),(text{0},if x<=0):}  which looks like:

For that purpose Deriva offers special `when` expression, accompanied by logic operators `and`, `or`, `not`, `gt`, `lt` and `eq`. Let’s see it in action:

``````Function fun = when(
gt('x', 0),   // when x is greather than 0
exp(          // e^(-1/x)
neg(
div(1, 'x'))),
0)            // otherwise 0
.function('x');
``````

heading_4 How this stuff looks like in clojure

The namespace we are using is `com.lambder.deriva.core`

``````(use 'com.lambder.deriva.core)
``````
heading_5 Simple expression:
``````(def f (function (sin x)))
(f 1) ;=> 0.8414709848078965
(def g (function (∂ (sin x) x))
(g 1) ;=> 0.5403023058681398
``````

heading_4 More involved example - Black model3 with sensitivities

Black model as defined on wikipedia

call price :  c = e^(-rT)(FN(d_1)-KN(d_2))

put price:  p = e^(-rT)(FN(-d_2)-KN(-d_1))

where

 d_1 = (ln(F//K)+(sigma^2//2)T)/(sigma*sqrt(T))

 d_2 = (ln(F//K)-(sigma^2//2)T)/(sigma*sqrt(T))

 N(x) = int_-oo^x (1/(2sqrt(pi))e^-(t^2/2) ) dt ~~ 1/(e^(-0.07056 * x^3 -1.5976*x) + 1)

``````import static com.lambder.deriva.Deriva.*;

public class Formulas {

public static Expression black(final boolean isCall) {

// Logistic aproximation of Cumulated Standard Normal Distribution
// 1/( e^(-0.07056 * x^3 - 1.5976*x) + 1)
Expression N =  div(1.0,
exp(
sub(
mul(
-0.07056,
pow('x', 3)),
mul(-1.5976, 'x'))),
1.0));

// ( F/K+T*σ^2/2 ) / σ*sqrt(T)
Expression d1 = div(
div('F', 'K'),
mul(div(sq("sigma"), 2.0), 'T')),
mul("sigma", sqrt('T')));

// ( F/K-T*σ^2/2 ) / σ*sqrt(T)
Expression d2 = div(
sub(
div('F', 'K'),
mul(div(sq("sigma"), 2.0), 'T')),
mul("sigma", sqrt('T')));

// e^(-r*T) * ( F*N(d1)-K*N(d2) )
Expression call = mul(
exp(neg(mul('r', 'T'))),
sub(
mul('F', N.bind('x', d1)),
mul('K', N.bind('x', d2))));

// e^(-r*T) * ( F*N(-d2)-K*N(-d1) )
Expression put = mul(
exp(neg(mul('r', 'T'))),
sub(
mul('F', N.bind('x', neg(d2))),
mul('K', N.bind('x', neg(d1)))));

return isCall ? call : put;
}

// usage
public static void main(String args) {
// lets fix timeToExpiry to 0.523 and get only strike, forward and lognormalVol sensitivities
Expression blackModel = black(true).bind('T', 0.523);
Function1 fun = d(blackModel, 'F', 'K', 'r').function('F', 'K', 'r');
fun.execute(12.3, 14.3, 0.03);
fun.execute(12.3, 11.0, 0.03);
fun.execute(12.3, 11.0, 0.02);
}
}
``````

and the Clojure equivalent:

``````(use 'com.lambder.deriva.core)

(def N
'(/ 1
(+ 1
(exp (-
(* -0.07056 (pow x 3))
(* -1.5976 x)))))

(def d1 '(/ (+ (/ F K) (* T (/ (sq sigma) 2)))))

(def d2 '(/ (- (/ F K) (* T (/ (sq sigma) 2)))))

(def call
`(*
(exp (- (* r T))
(-
(* F ~(bind N x d1))
(* K ~(bind N x d2)))))

(def put
`(*
(exp (- (* r T))
(-
(* F ~(bind N x (- d1)))
(* K ~(bind N x (- d2))))))

(defn black-expression call? call put)

;; usage

(def black-model-with-sensitivities (∂ (bind (black-expression true) T 0.523) F K r))

(black-model-with-sensitivities 12.3 14.3 0.03)
(black-model-with-sensitivities 12.3 11.0 0.03)
(black-model-with-sensitivities 12.3 11.0 0.02)
``````

heading_3 Q&A

1. Isn’t it slow? No. The generated code is actually very fast. As fast as Java originated byte-code can be. There is no effort put in the performance of the code defining and transforming actual expressions, but the resulting code is highly optimised. The intention is to have the definition in kind of initialisation part of the application and rund the generated code many times.

2. What are the applications? Computation of sensitives of various financial models. Backpropagation in machine learning and I believe many more.

heading_3 The roadmap.

Derive is by no means finished product. It is under active development and I already have gathered requests for new functionalities:

1. Support for complex numbers
2. Support for Jacobians
3. Using arbitrary java code in Deriva expressions
1. Deriva indeed does it carefully ;)

2. http://en.wikipedia.org/wiki/Algorithmic_differentiation