Applications of Malliavin calculus to the pricing and hedging of Bermudan options

James Newbury
Mathematical Institute, University of Oxford

A dissertation submitted for the degree of Master of Science in Mathematical and Computational Finance, Trinity 2011

Abstract

The pricing of Bermudan options, which give the holder the right to buy or sell an underlying asset at a predetermined price at any of a discretely spaced set of dates prior to maturity, can be based on a deterministic method or on a probabilistic one. Deterministic methods such as finite differences lose their efficiency as the dimension of the problem increases, and are therefore said to suffer from the "curse of dimensionality". Probabilistic methods enable us to overcome this problem by using Monte Carlo simulation. One such method is the Malliavin pricing and hedging algorithm, which uses representation formulas for conditional expectation and its derivative to approximate the price and delta of a Bermudan option. This paper deals specifically with how the powerful tools of Malliavin calculus are applied in the derivation of such representation formulas, and looks at how the latter are subsequently used in the pricing and hedging algorithm.

Key words: Bermudan option, dynamic programming principle, Malliavin derivative operator, Skorohod integral, first and second variational processes, representation formula, localizing function.


Acknowledgements

I would like to thank my supervisor, Dr Lajos Gergely Gyurkó, for his guidance as well as for all the helpful remarks he has given me throughout the duration of the dissertation.


Contents

Introduction

1 Malliavin Calculus
  1.1 Context and general framework
  1.2 The Malliavin derivative operator
  1.3 The Skorohod integral and other important results

2 Representation formulas for conditional expectation and its derivative
  2.1 General framework
  2.2 Application to SDEs
  2.3 The notion of localization

3 Applications to the pricing and hedging of Bermudan options
  3.1 Overview of the pricing problem and alternative probabilistic methods
  3.2 The Malliavin pricing and hedging algorithm
      3.2.1 Applications of the representation formulas
      3.2.2 Sketch of the algorithm
  3.3 Numerical applications

Concluding considerations

Bibliography

Introduction

Bermudan options, which can be considered as intermediate contracts between American and European options, give the holder the right to buy or sell an underlying asset at a predetermined price at a discretely spaced set of dates prior to a given maturity. Intuitively speaking, the price of an American option is always greater than or equal to the price of its Bermudan counterpart, since the American option gives its holder the right to exercise at any time prior to maturity. In the same way, the price of a Bermudan option is always greater than or equal to the price of a European option with the same characteristics, given that the latter can only be exercised at maturity. A direct consequence of this last feature is that the price of a European option can be determined explicitly by computing the discounted expectation of the terminal payoff. This procedure cannot, however, be extended to the case of Bermudan options, since the idea there is to determine the optimal exercise date. In the case of a non-dividend-paying Bermudan call, it can be shown that the optimal exercise date is in effect at maturity. Apart from this very isolated case, the determination of the optimal exercise date is far from being a trivial affair.

We now look at how the problem can be formalised. Let T be a positive constant representing the maturity of the option and let W = (W_t)_{0≤t≤T} be a one-dimensional Wiener process on a filtered probability space (Ω, F, (F_t)_{0≤t≤T}, Q), where (F_t)_{0≤t≤T} represents the augmented filtration generated by W, and where Q is the risk-neutral probability. We make the assumption of a complete market, i.e. all contingent claims are Q-attainable, which guarantees the uniqueness of Q and enables us to determine the unique price of European contingent claims as the discounted expectation of their payoff at maturity. We denote by E(·) the risk-neutral expectation.
We then proceed to the discretization of the interval [0, T] by introducing the time step δt = T/M and by taking into consideration the discrete times t_k = kδt for all k ∈ {0, ..., M}. A Bermudan option with maturity T on an underlying asset with price X is defined as a contract which gives the holder the right to buy or sell the underlying at a strike price K at any exercise date within the set {t_1, ..., t_M}. We denote by Φ : R⁺ → R the payoff of the option, i.e. Φ(X_{kδt}) represents the amount received by the holder if he decides to exercise his right at time t_k, and we assume that the underlying asset's price process X = (X_t)_{0≤t≤T} satisfies the following SDE on [0, T]:

    dX_t = b(X_t)dt + σ(X_t)dW_t,   X_0 = x                                  (1)

where b and σ are two sufficiently smooth functions with bounded derivatives. We also assume that the following integrability condition is satisfied by the discounted payoff:

    E( sup_{t∈[0,T]} |e^{−rt} Φ(X_t)| ) < +∞

It can be shown that the price at time t ∈ [0, T] of such an option is given by:

    V(t, x) = sup_{τ∈T_{t,T}} E_{t,x}( e^{−r(τ−t)} Φ(X_τ) )

where E_{t,x} represents the risk-neutral expectation given that the price process starts at x at time t, where T_{t,T} denotes the set of all admissible stopping times with values in [t, T], and where r denotes the constant risk-free interest rate.

We are now interested in determining two fundamental quantities which characterize the option. On the one hand, the premium of the option, defined as its price V(0, x) at time 0, is the quantity paid upfront by the holder of the option to its seller. On the other hand, the delta of the option at time t, denoted by Δ(t, x), corresponds to the derivative of the price of the option at time t with respect to the initial value of the underlying asset:

    Δ(t, x) = ∂_x V(t, x) = ∂_x sup_{τ∈T_{t,T}} E_{t,x}( e^{−r(τ−t)} Φ(X_τ) )

The above expressions for the price and delta of a Bermudan option show that we are in fact confronted with an optimal stopping problem. We therefore have to consider the Bellman dynamic programming principle, starting with the following backward recursive system, valid for all k ∈ {M − 1, ..., 0}:

    V̂(t_M, X_{Mδt}) = Φ(X_{Mδt})
    V̂(t_k, X_{kδt}) = max( Φ(X_{kδt}), e^{−rδt} E( V̂(t_{k+1}, X_{(k+1)δt}) | X_{kδt} ) )        (2)

where V̂(t_k, X_{kδt}) corresponds to the approximation of the option's price at time t_k. This system shows that the value of the option at maturity is simply equal to the payoff at maturity, given that the holder cannot exercise the option at a later date. It also states that the value of the option at time t_k is equal to the maximum of the intrinsic value at time t_k and the continuation value, i.e. the discounted conditional expectation of the value of the option at time t_{k+1}. Intuitively speaking, the continuation value can be considered as the value of keeping the option alive rather than exercising it at a given discrete time. The price of the option V(0, x) is then approximated by V̂(0, x).
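The recursion (2) does not depend on how the conditional expectation is computed; the Malliavin representation formulas of chapter 3 are one way of estimating it by simulation. As a purely illustrative sketch of (2) itself (and not the algorithm of this paper), the following prices a Bermudan put by backward induction on a binomial lattice, where the continuation value is an exact two-point lattice expectation; the lattice construction, parameter choices and function names are ours, not taken from the text:

```python
import math

def bermudan_put_binomial(x0, K, r, sigma, T, M, n=500):
    """Backward recursion (2) on a Cox-Ross-Rubinstein lattice.

    Exercise is allowed only at the M evenly spaced Bermudan dates;
    the continuation value is the exact discounted lattice expectation.
    """
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)      # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # lattice steps closest to the exercise dates t_k = k*T/M, k = 1..M
    exercise_steps = {round(k * n / M) for k in range(1, M + 1)}
    # value at maturity equals the payoff, as in the first line of (2)
    values = [max(K - x0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]
    for step in range(n - 1, -1, -1):
        # continuation value: discounted conditional expectation
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(step + 1)]
        if step in exercise_steps:
            # max of intrinsic and continuation value, second line of (2)
            values = [max(max(K - x0 * u**j * d**(step - j), 0.0), v)
                      for j, v in enumerate(values)]
    return values[0]
```

With M = 1 only the exercise date at maturity survives, so the routine returns the European price; increasing M can only increase the value, consistent with the ordering of European, Bermudan and American prices discussed above.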


As for the delta of the option, we heuristically take into account the derivative of the price at time t_1 with respect to the price of the underlying. More precisely, the approximation Δ̂(0, x) of the delta at time 0 is given by:

    Δ̂(t_1, X_δt) = ∂_α Φ(α) 1_{C^c} + e^{−rδt} ∂_α E( V̂(t_2, X_{2δt}) | X_δt = α ) 1_C        (3)
    Δ̂(0, x) = E( Δ̂(t_1, X_δt) )

where the continuation region C is defined by C = {α ∈ R⁺ : V̂(t_1, α) > Φ(α)}.

Generally speaking, the existing methods used for pricing Bermudan options can either be deterministic or probabilistic. As far as the deterministic methods are concerned, a simple application of Itô's lemma to the price V^A(t, x) of the associated American option leads to the following PDE:

    ( ∂V^A/∂t + LV^A − rV^A ) 1_{V^A(t,x)>Φ(x)} = 0

with the terminal condition V^A(T, x) = Φ(x), and where L denotes the infinitesimal generator of the process X, defined in the following way:

    Lg(x) = b(x) ∂g/∂x (x) + (1/2) σ²(x) ∂²g/∂x² (x)

A weak formulation of the previous problem with variational inequalities (in Bensoussan and Lions [2] for example) leads to a linear complementarity problem which can be solved by standard finite difference methods (for example, see Duffy [5]). This method subsequently gives us the price of the associated Bermudan option, and an approximation of its American counterpart. Despite its relatively high efficiency for low-dimensional problems, the finite difference approach suffers from the so-called "curse of dimensionality": there seems to be a consensus that in three or more dimensions, the computing speed of the finite difference approach is uncompetitive in comparison with probabilistic techniques.

The pricing of Bermudan options by probabilistic methods is based on the use of Monte Carlo simulations. Such probabilistic approaches can roughly be divided into three distinct categories. The first one is based on the approximation of the conditional expectations which appear in the dynamic programming formulation (2) using a regression on a truncated basis of L². This specific method is known as the Longstaff-Schwartz algorithm, and is introduced in [11]. The second category involves the presence of a tree which enables us to discretize the underlying price process on a given grid (for example, see Broadie and Glasserman [4]). Finally, the last category exploits the existence of representation formulas for conditional expectation and its derivative (which appear in systems (2) and (3)) via a Malliavin calculus approach, which is exactly what is dealt with throughout this paper. This method was first presented by Lions and Régnier in [10], and then by Bally et al. in [1], where the algorithm is presented for a geometric Brownian motion underlying process. One of the aims of this paper is to provide a slightly more generalised version of the approach presented in [1], most notably by presenting the problem and the algorithm with an underlying price process following the SDE which appears in (1).

Malliavin calculus, also known as the stochastic calculus of variations, provides us with the machinery to compute derivatives of random variables. Powerful tools such as the Malliavin derivative operator and its adjoint, the Skorohod integral, enable us to express conditional expectations of the form E(f(X_t)|X_s = α), together with their derivative ∂_α E(f(X_t)|X_s = α), where f is a function satisfying a certain number of regularity conditions, as a ratio of two unconditioned expectations involving quantities such as the Skorohod integral. In other words, the information "contained" in the conditioning is transmitted to the Skorohod integral. These formulas are obtained by Fournié et al. in [7].

This paper is organised as follows. In chapter 1, we introduce the necessary tools of Malliavin calculus which are used later on in the derivation of the representation formulas. Given that the formalism behind Malliavin calculus can rapidly become very complex, we try to follow a simple and logical approach, and we therefore do not provide any proofs in this introductory chapter; many of them can be found in Nualart [12]. Once these tools have been introduced, we proceed to the derivation of representation formulas for conditional expectation and its derivative in chapter 2. Here, we give details of all the proofs, as this section undeniably represents the core of the problem we are dealing with. We finally apply these representation formulas within the context of the pricing and hedging of Bermudan options in chapter 3, most prominently by introducing the resulting algorithm and by extending our discussion to the analysis of various numerical results.

Let us finally mention that in this paper, the representation formulas as well as the Malliavin pricing and hedging algorithm are presented in a one-dimensional setting. We have deliberately chosen not to introduce these ideas in the multi-dimensional case for the sake of clarity. However, all the results presented here are easily extendable to higher dimensions (this is done in the geometric Brownian motion case by Bally et al. in [1]), and one must not forget that the main benefits of Monte Carlo based approaches over finite difference methods arise in a multi-dimensional setting.
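To make the regression-based first category concrete before moving on, here is a minimal sketch of a Longstaff-Schwartz style algorithm for a Bermudan put under geometric Brownian motion. The polynomial basis {1, x, x²}, the sample sizes and all helper names are our own illustrative choices, not taken from [11]:

```python
import math
import random

def lsm_bermudan_put(x0, K, r, sigma, T, M, n_paths=20000, seed=0):
    """Longstaff-Schwartz sketch: regress continuation values on {1, x, x^2}."""
    rng = random.Random(seed)
    dt = T / M
    drift, vol = (r - 0.5 * sigma**2) * dt, sigma * math.sqrt(dt)
    # simulate GBM paths at the exercise dates t_1, ..., t_M
    paths = []
    for _ in range(n_paths):
        x, path = x0, []
        for _ in range(M):
            x *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(x)
        paths.append(path)
    disc = math.exp(-r * dt)
    cash = [max(K - p[-1], 0.0) for p in paths]     # value if held to t_M
    for k in range(M - 2, -1, -1):                  # date t_{k+1} = paths[i][k]
        cash = [disc * c for c in cash]
        itm = [i for i, p in enumerate(paths) if K - p[k] > 0.0]
        if not itm:
            continue
        beta = _regress([paths[i][k] for i in itm], [cash[i] for i in itm])
        for i in itm:
            x = paths[i][k]
            cont = beta[0] + beta[1] * x + beta[2] * x * x
            if K - x > cont:                        # exercise beats continuation
                cash[i] = K - x
    return disc * sum(cash) / n_paths

def _regress(xs, ys):
    """Solve the 3x3 normal equations for least squares on {1, x, x^2}."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, y in zip(xs, ys):
        phi = (1.0, x, x * x)
        for i in range(3):
            b[i] += phi[i] * y
            for j in range(3):
                a[i][j] += phi[i] * phi[j]
    for col in range(3):                            # elimination with pivoting
        piv = max(range(col, 3), key=lambda rr: abs(a[rr][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for rr in range(col + 1, 3):
            f = a[rr][col] / a[col][col]
            b[rr] -= f * b[col]
            for cc in range(col, 3):
                a[rr][cc] -= f * a[col][cc]
    out = [0.0] * 3
    for i in range(2, -1, -1):
        out[i] = (b[i] - sum(a[i][j] * out[j] for j in range(i + 1, 3))) / a[i][i]
    return out
```

The regression is only run on in-the-money paths, a standard choice; in higher dimensions only the basis changes, which is precisely where such simulation methods outperform finite differences.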


1 Malliavin Calculus

This chapter aims to give an introduction to Malliavin calculus and presents the main theoretical results which will be needed later on in the derivation of the representation formulas for conditional expectation and its derivative. We follow the approach presented in [6], and a more complete description of the following results, along with their proofs, can be found in [8] and [12].

1.1 Context and general framework

We start by fixing a positive constant T ∈ ]0, +∞]. We then consider an n-dimensional Wiener process W = (W_t)_{0≤t≤T} on a filtered probability space (Ω, F, (F_t)_{0≤t≤T}, P), where (F_t)_{0≤t≤T} represents the augmented filtration generated by W. Let us mention that this multi-dimensional setting is only used in this chapter; we later move on to a one-dimensional framework.

Let us now recall the basic spaces and norms which are commonly used in stochastic analysis.

Definition 1.1.1 The Hilbert space L²(Ω) is the set of real-valued square integrable random variables. It is equipped with the following inner product:

    ∀X, Y ∈ L²(Ω),   <X, Y>_{L²(Ω)} := E(XY)

The norm induced by this inner product is:

    ∀X ∈ L²(Ω),   ||X||_{L²(Ω)} = <X, X>_{L²(Ω)}^{1/2} = E(X²)^{1/2}

Definition 1.1.2 The Hilbert space L²(Ω × [0, T]) is the set of random processes φ such that:

    E( ∫_0^T φ²(ω, t) dt ) < ∞

L²(Ω × [0, T]) is equipped with the inner product:

    ∀φ, ψ ∈ L²(Ω × [0, T]),   <φ, ψ>_{L²(Ω×[0,T])} := E( ∫_0^T φ(ω, t)ψ(ω, t) dt )

Here, the corresponding norm is naturally:

    ∀φ ∈ L²(Ω × [0, T]),   ||φ||_{L²(Ω×[0,T])} = <φ, φ>_{L²(Ω×[0,T])}^{1/2} = E( ∫_0^T φ²(ω, t) dt )^{1/2}

Definition 1.1.3 We denote by M²(Ω × [0, T]) ⊂ L²(Ω × [0, T]) the Hilbert space of progressively measurable random processes φ such that:

    E( ∫_0^T φ²(ω, t) dt ) < ∞

The corresponding inner product and norm are defined in a similar way as above.

We also need to specify the general form of the random variables we shall be working with. More precisely:

Definition 1.1.4 We define C ⊂ L²(Ω) as the set of random variables F of the form:

    F = f( ∫_0^T g_1(t) dW(t), ..., ∫_0^T g_n(t) dW(t) )

where f belongs to C_p^∞(R^n), the set of infinitely continuously differentiable functions on R^n such that all partial derivatives have polynomial growth, and where the functions g_1, ..., g_n are in L²(Ω × [0, T]).

1.2 The Malliavin derivative operator

We start this section by giving a precise mathematical meaning to the derivative of an element F ∈ C:

Definition 1.2.1 If F ∈ C, the Malliavin derivative DF of F is defined as the stochastic process DF = (D_t F)_{0≤t≤T} belonging to L²(Ω × [0, T]) such that:

    D_t F = Σ_{i=1}^n ∂f/∂x_i ( ∫_0^T g_1(t) dW(t), ..., ∫_0^T g_n(t) dW(t) ) g_i(t)

One interesting feature of C is that its completion with respect to a certain norm is a Banach space:

Definition 1.2.2 If F ∈ C, we define the norm ||·||_{1,2} by:

    ||F||_{1,2} := E(F²)^{1/2} + E( ∫_0^T (D_t F)² dt )^{1/2}

Furthermore, we define the Banach space D^{1,2} as the completion of C with respect to ||·||_{1,2}.

We are now able to give a first important result related to the Malliavin derivative operator D:

Proposition 1.2.1 The Malliavin derivative operator D, defined on D^{1,2} and with values in L²(Ω × [0, T]), is a closed unbounded operator.

As a matter of fact, the Malliavin derivative operator shares a certain number of properties with the derivative operator of ordinary differential calculus:

Proposition 1.2.2 The Malliavin derivative operator D satisfies the following properties:

1. Linearity: ∀(α, β) ∈ R², ∀F, G ∈ D^{1,2}, D_t(αF + βG) = α D_t(F) + β D_t(G)

2. Product rule: ∀F, G ∈ D^{1,2}, D_t(FG) = F D_t(G) + G D_t(F)

3. Chain rule: if u : R^n → R is a continuously differentiable function with bounded partial derivatives, and if F = (F_1, ..., F_n) is a random vector with components in D^{1,2}, then u(F) ∈ D^{1,2} and:

    D_t u(F) = Σ_{i=1}^n ∂u/∂x_i (F) D_t F_i

1.3 The Skorohod integral and other important results

Since the Malliavin derivative operator D is a densely defined closed operator, the Riesz representation theorem enables us to define its adjoint, namely the Skorohod integral, denoted by δ:

Definition 1.3.1 Let u be a random process in L²(Ω × [0, T]). Then u ∈ Dom(δ) if for any F ∈ D^{1,2} we have:

    | <DF, u>_{L²(Ω×[0,T])} | = | E( ∫_0^T (D_t F) u_t dt ) | ≤ C(u) ||F||_{L²(Ω)}

where C(u) is a constant which only depends on u. Subsequently, if u ∈ Dom(δ), the Skorohod integral δ(u) is defined for any F ∈ D^{1,2} by:

    E(F δ(u)) = <DF, u>_{L²(Ω×[0,T])} = E( ∫_0^T (D_t F) u_t dt )

When u ∈ Dom(δ), we say that u is Skorohod-integrable, and δ(u) defined above is an element of L²(Ω). Furthermore, the following property enables us to establish a connection between the Skorohod integral and the Itô integral:

Proposition 1.3.1 Dom(δ) contains all the progressively measurable random processes in L²(Ω × [0, T]), i.e. Dom(δ) ⊃ M²(Ω × [0, T]), and the Skorohod integral restricted to M²(Ω × [0, T]) coincides with the Itô integral.

We can now state a fundamental result related to the Skorohod integral, known as the integration by parts formula:

Theorem 1.3.1 Let F be an F_T-measurable random variable in D^{1,2}. Then for any u ∈ Dom(δ):

    δ(F u) = F δ(u) − ∫_0^T (D_t F) u_t dt
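The duality in definition 1.3.1 can be illustrated with the simplest possible pair: F = W_T and u ≡ 1. Then δ(u) = W_T (by proposition 1.3.1, since u is adapted) and D_t W_T = 1, so both sides of the duality relation equal T. A sketch of the corresponding Monte Carlo check (sample size and names are our own choices):

```python
import math
import random

def duality_lhs(T=1.0, n_paths=200000, seed=3):
    """Estimate E(F * delta(u)) for F = W_T and u = 1, i.e. E(W_T^2).

    By the duality E(F delta(u)) = E(int_0^T (D_t F) u_t dt), and since
    D_t W_T = 1, this should equal int_0^T 1 dt = T.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, math.sqrt(T))   # W_T ~ N(0, T)
        acc += w_t * w_t
    return acc / n_paths
```

With T = 1 the estimate should be close to 1, matching the right-hand side of the duality relation.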

We now look at how Malliavin calculus can be applied within the context of stochastic differential equations. This fundamental application will prove to be vital throughout the main steps of the derivation of the representation formulas for conditional expectation and its derivative.

Proposition 1.3.2 Let X = (X_t)_{t∈[0,T]} be an n-dimensional Itô process characterized by the following dynamics:

    dX_t = b(X_t)dt + σ(X_t)dW_t

where b and σ are two continuously differentiable functions with bounded derivatives. Define Y = (Y_t)_{t∈[0,T]} as the first variational process of X:

    dY_t = b′(X_t)Y_t dt + Σ_{i=1}^n σ_i′(X_t)Y_t dW_t^i,   Y_0 = I_n

where I_n denotes the identity matrix of R^n. Then X ∈ D^{1,2} and satisfies:

    D_s X_t = Y_t Y_s^{−1} σ(X_s) 1_{s≤t},   s ≥ 0 a.s.

Moreover, if ψ is a continuously differentiable function with bounded partial derivatives, we have:

    D_s ψ(X_T) = ∇ψ(X_T) Y_T Y_s^{−1} σ(X_s) 1_{s≤T},   s ≥ 0 a.s.

We now have a look at some examples of applications of the previous proposition:

Example 1.3.1 We suppose that the process X = (X_t)_{t∈[0,T]} satisfies the following SDE:

    dX_t = rX_t dt + σX_t dW_t,   X_0 = x

In this case, the first variational process Y = (Y_t)_{t∈[0,T]} satisfies:

    dY_t = rY_t dt + σY_t dW_t,   Y_0 = 1

and therefore Y_t = X_t/x for all t ≥ 0. Applying proposition 1.3.2, we obtain:

    D_s X_t = σ X_t 1_{s≤t}

Example 1.3.2 We suppose that the process X = (X_t)_{t∈[0,T]} satisfies the following SDE:

    dX_t = a(θ − X_t)dt + σ dW_t,   X_0 = x

Here, the dynamics of the first variational process Y = (Y_t)_{t∈[0,T]} are given by:

    dY_t = −aY_t dt,   Y_0 = 1

We therefore have Y_t = e^{−at} for all t ≥ 0, and the Malliavin derivative of X_t is now given by:

    D_s X_t = σ e^{−a(t−s)} 1_{s≤t}

We end this chapter by introducing commutation formulas between the Malliavin derivative operator D and the Riemann-Stieltjes and Itô operators of integration. These results will prove to be very helpful within the context of the derivation of representation formulas for the derivative of conditional expectation.

Proposition 1.3.3 We consider a process u = (u_t)_{t∈[0,T]} in M²(Ω × [0, T]) satisfying u_t ∈ D^{1,2} for all t ∈ [0, T] and (D_s u_t)_{t≥s} ∈ M²(Ω × [s, T]) for all s ∈ [0, T]. Then ∫_0^T u_t dW_t and ∫_0^T u_t dt belong to D^{1,2}, and for all s ∈ [0, T] we have the two following equalities:

    D_s ( ∫_0^T u_t dW_t ) = u_s + ∫_s^T (D_s u_t) dW_t
    D_s ( ∫_0^T u_t dt ) = ∫_s^T (D_s u_t) dt
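Example 1.3.1 can be checked pathwise: solving the SDEs for X and its first variational process Y with the same Euler scheme and the same Brownian increments, Y should track X_t/x at every step, since both satisfy the same linear recursion up to the constant factor x. A sketch (step count and parameter values are our own choices):

```python
import math
import random

def euler_variational_pair(x0=2.0, r=0.05, sigma=0.2, T=1.0, n=2000, seed=1):
    """Euler scheme for dX = rX dt + sigma X dW (X_0 = x0) and for its
    first variational process dY = rY dt + sigma Y dW (Y_0 = 1), driven
    by the same Brownian increments. Returns the terminal pair (X_T, Y_T)."""
    rng = random.Random(seed)
    dt = T / n
    x, y = x0, 1.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x, y = (x + r * x * dt + sigma * x * dw,
                y + r * y * dt + sigma * y * dw)
    return x, y
```

The returned pair should satisfy Y_T ≈ X_T / x0 to floating-point accuracy, in line with the relation Y_t = X_t/x of example 1.3.1.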


2 Representation formulas for conditional expectation and its derivative

In this chapter, we apply the previous results in order to derive representation formulas for conditional expectation and its derivative. These formulas will enable us to approximate the price and delta of a Bermudan option using Monte Carlo simulations later on. In order to derive such formulas, we follow the approach developed in [7] and [1].

2.1 General framework

We place ourselves in the probabilistic framework set in chapter 1, taking n = 1. Let us recall that our aim is to take into consideration conditional expectations of the form:

    E(f(F) | G = 0)

where F and G are two F_T-measurable random variables in D^{1,2}, and where f is a Borel-measurable function with polynomial growth. We also assume that F and G are smooth enough in a Malliavin sense, meaning that D_t F and D_t G belong to L²(Ω × [0, T]). We finally assume that D_t G is non-degenerate, i.e. there exists a sufficiently smooth process u = (u_t)_{0≤t≤T} satisfying:

    E( ∫_0^T (D_t G) u_t dt | σ(F, G) ) = 1                                  (2.1)

where σ(F, G) denotes the σ-algebra generated by F and G. As a result, we will choose u_t = 1/(T D_t G) whenever it is possible. We can now state the first representation formula:

Theorem 2.1.1 Assuming F, G, u and f satisfy the previous assumptions, and if f is continuously differentiable, we have:

    E(f(F) | G = 0) = E( f(F)H(G)δ(u) − f′(F)H(G) ∫_0^T (D_t F) u_t dt ) / E( H(G)δ(u) )

where H denotes the standard Heaviside function.

Proof: Using the definition of conditional expectation, we can formally write:

    E(f(F) | G = 0) = E( f(F) δ_0(G) ) / E( δ_0(G) )

where δ_0 denotes the Dirac delta function at 0. On the one hand, taking derivatives in a distributional sense, we have:

    E(δ_0(G)) = E( H′(G) )
              = E( H′(G) E( ∫_0^T (D_t G) u_t dt | σ(F, G) ) )
              = E( E( H′(G) ∫_0^T (D_t G) u_t dt | σ(F, G) ) )
              = E( H′(G) ∫_0^T (D_t G) u_t dt )
              = E( ∫_0^T (D_t H(G)) u_t dt )
              = E( H(G) δ(u) )

where the chain rule gives D_t H(G) = H′(G) D_t G, and the last step uses the definition of the Skorohod integral. On the other hand, using the integration by parts formula, we have:

    E( H(G) δ(f(F)u) ) = E( f(F)H(G)δ(u) − H(G) ∫_0^T (D_t f(F)) u_t dt )
                       = E( f(F)H(G)δ(u) − f′(F)H(G) ∫_0^T (D_t F) u_t dt )

But by definition of the Skorohod integral, we see that:

    E( H(G) δ(f(F)u) ) = E( ∫_0^T (D_t H(G)) f(F) u_t dt )
                       = E( f(F) δ_0(G) ∫_0^T (D_t G) u_t dt )
                       = E( f(F) δ_0(G) )

Combining the three identities yields the representation formula. □

Remark 2.1.1 Rigorously speaking, the previous computation can be justified by the fact that:

    E(f(F) | G = 0) = lim_{ε→0⁺} E( f(F) 1_{[−ε,ε]}(G) ) / E( 1_{[−ε,ε]}(G) )

We then use the integration by parts formula to write:

    E( f(F) 1_{[−ε,ε]}(G) ) = E( f(F)H_ε(G)δ(u) − f′(F)H_ε(G) ∫_0^T (D_t F) u_t dt )

where H_ε denotes the Heaviside function at ε, i.e. defined by H_ε(x) = 1_{x>ε} + C, C being an arbitrary constant. The conclusion then comes by letting ε go to 0⁺.

This theorem provides us with a representation formula of conditional expectation in the case of a continuously differentiable function f. However, in many financial applications, we often have to consider cases of non-smooth functions. Therefore, the previous result happens to be quite restrictive as far as assumptions are concerned. In order to deal with the potential non-smoothness of f, we have to introduce an additional assumption on the random process u:

Corollary 2.1.1 We assume that F, G, u and f satisfy the previous assumptions, and we also assume that u satisfies:

    E( ∫_0^T (D_t F) u_t dt | σ(F, G) ) = 0                                  (2.2)

Then we have the following representation:

    E(f(F) | G = 0) = E( f(F)H(G)δ(u) ) / E( H(G)δ(u) )

Proof: Note that using the previous theorem, it suffices to prove that:

    E( f′(F)H(G) ∫_0^T (D_t F) u_t dt ) = 0

But this is not hard to see, as we can write:

    E( f′(F)H(G) ∫_0^T (D_t F) u_t dt ) = E( E( f′(F)H(G) ∫_0^T (D_t F) u_t dt | σ(F, G) ) )
                                        = E( f′(F)H(G) E( ∫_0^T (D_t F) u_t dt | σ(F, G) ) )
                                        = 0. □
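The mechanics of theorem 2.1.1 can be illustrated in the simplest possible case F = G = W_T with u_t = 1/(T D_t W_T) = 1/T: then δ(u) = W_T/T, ∫_0^T (D_t F)u_t dt = 1, and the formula reduces to E(f(W_T) | W_T = 0) = E( f(W_T)H(W_T)W_T/T − f′(W_T)H(W_T) ) / E( H(W_T)W_T/T ), whose value should be f(0). A sketch of a Monte Carlo check (the choice of f, sample size and names are ours):

```python
import math
import random

def malliavin_cond_exp_at_zero(f, f_prime, T=1.0, n_paths=200000, seed=4):
    """Theorem 2.1.1 with F = G = W_T and u_t = 1/T, so that
    delta(u) = W_T / T and int_0^T (D_t F) u_t dt = 1:

        E(f(W_T) | W_T = 0)
            = E(f(W_T) H(W_T) W_T / T - f'(W_T) H(W_T)) / E(H(W_T) W_T / T).
    """
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_paths):
        w = rng.gauss(0.0, math.sqrt(T))
        if w > 0.0:                        # Heaviside H(W_T)
            num += f(w) * w / T - f_prime(w)
            den += w / T
    return num / den
```

With f = cos the exact conditional expectation is cos(0) = 1, which the ratio should approach as the number of paths grows.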

2.2 Application to SDEs

Having introduced the representation formulas of conditional expectations of the form E(f(F) | G = 0), we now wish to specialize these results to the case of stochastic differential equations. More precisely, we consider a process X = (X_t)_{0≤t≤T} and we suppose that it satisfies the following SDE on [0, T]:

    dX_t = b(X_t)dt + σ(X_t)dW_t,   X_0 = x

where b and σ are two sufficiently smooth functions with bounded derivatives. We now choose F = X_t and G = X_s − α, where α ∈ R and where s and t are such that 0 ≤ s ≤ t ≤ T. Our aim is therefore to provide representation formulas for the conditional expectation E(f(X_t) | X_s = α).

The first step is to introduce the first variational process Y = (Y_t)_{0≤t≤T} associated with X. Recall that Y follows the following SDE on [0, T]:

    dY_t = b′(X_t)Y_t dt + σ′(X_t)Y_t dW_t,   Y_0 = 1

The next step is to ensure the non-degeneracy of the Malliavin derivative of X_s − α. In other words, we need to determine a process u = (u_t)_{0≤t≤T} satisfying:

    E( ∫_0^T (D_r(X_s − α)) u_r dr | σ(X_t, X_s) ) = 1

Given that D_r(X_s − α) = D_r X_s = (Y_s/Y_r) σ(X_r) 1_{r≤s}, we naturally choose:

    u_r = Y_r / (s Y_s σ(X_r)) 1_{r≤s}

As a result, the representation formula becomes:

    E(f(X_t) | X_s = α)
        = E( f(X_t)H(X_s − α)δ(u) − f′(X_t)H(X_s − α) ∫_0^T (D_r X_t) u_r dr ) / E( H(X_s − α)δ(u) )
        = E( f(X_t)H(X_s − α)δ(u) − f′(X_t)H(X_s − α) Y_t/Y_s ) / E( H(X_s − α)δ(u) )

Before any Monte Carlo simulation can be performed with this formula, we see that one difficulty arises, namely the presence of the Skorohod integral δ(u). Indeed, the random variable u_r = Y_r/(s Y_s σ(X_r)) 1_{r≤s} is not F_r-measurable, and therefore the process u is not F-adapted. As a result, there is no direct expression for δ(u) as an Itô integral. However, if we notice that the random variable Y_s u_r = Y_r/(s σ(X_r)) 1_{r≤s} is F_r-measurable, we can consider the F-adapted process v = (v_r)_{0≤r≤T} defined by v_r = Y_r/(s σ(X_r)) 1_{r≤s} and use the integration by parts formula to determine δ(u). Assuming Y_s takes positive values a.s., we can write:

    δ(u) = δ(v)/Y_s + (1/Y_s) ∫_0^T (D_r Y_s) u_r dr
         = (1/(s Y_s)) ∫_0^s (Y_r/σ(X_r)) dW_r + (1/(s Y_s²)) ∫_0^s (D_r Y_s)(Y_r/σ(X_r)) dr

We finally need to determine an explicit expression for D_r Y_s. This can be done using the following lemma:

Lemma 2.2.1 Let Z = (Z_t)_{0≤t≤T} be the second variational process associated with X, i.e. the process which satisfies the following SDE on [0, T]:

    dZ_t = ( b′(X_t)Z_t + b″(X_t)Y_t² )dt + ( σ′(X_t)Z_t + σ″(X_t)Y_t² )dW_t,   Z_0 = 0

Then D_r Y_s can be expressed as:

    D_r Y_s = ( σ(X_r)Z_s/Y_r + σ′(X_r)Y_s − σ(X_r)Z_r Y_s/Y_r² ) 1_{r≤s}

Proof: Recall that we have:

    Y_s = 1 + ∫_0^s b′(X_r)Y_r dr + ∫_0^s σ′(X_r)Y_r dW_r

Applying the two commutation formulas introduced in proposition 1.3.3 enables us to write, for s ≥ r:

    D_r Y_s = σ′(X_r)Y_r + ∫_r^s D_r( b′(X_v)Y_v ) dv + ∫_r^s D_r( σ′(X_v)Y_v ) dW_v

Therefore, for s ≥ r, we can write:

    d(D_r Y_s) = ( Y_s b″(X_s)(D_r X_s) + b′(X_s)(D_r Y_s) )ds + ( Y_s σ″(X_s)(D_r X_s) + σ′(X_s)(D_r Y_s) )dW_s

Now using the fact that D_r X_s = (Y_s/Y_r) σ(X_r) 1_{r≤s}, we obtain, for s ≥ r:

    d(D_r Y_s) = ( Y_s² b″(X_s) σ(X_r)/Y_r + b′(X_s)(D_r Y_s) )ds + ( Y_s² σ″(X_s) σ(X_r)/Y_r + σ′(X_s)(D_r Y_s) )dW_s

Consequently, given the dynamics of the process Z, the process R defined by R_s = D_r Y_s − σ(X_r)Z_s/Y_r satisfies the following SDE, for all s ∈ [r, t]:

    dR_s = b′(X_s)R_s ds + σ′(X_s)R_s dW_s

The solution of this linear SDE can be readily expressed as, for s ≥ r:

    R_s = R_r Y_s/Y_r = ( σ′(X_r)Y_r − σ(X_r)Z_r/Y_r ) Y_s/Y_r = σ′(X_r)Y_s − σ(X_r)Z_r Y_s/Y_r²

We then deduce the required expression for D_r Y_s. □

The representation formula is finally obtained in the following theorem:

Theorem 2.2.1 Assuming f is a continuously differentiable function, we have for all α ∈ R:

    E(f(X_t) | X_s = α) = E( f(X_t)H(X_s − α)δ(u) − f′(X_t)H(X_s − α) Y_t/Y_s ) / E( H(X_s − α)δ(u) )

where the Skorohod integral δ(u) is given by:

    δ(u) = (1/(s Y_s)) [ ∫_0^s (Y_r/σ(X_r)) dW_r + ∫_0^s ( Z_s/Y_s + σ′(X_r)Y_r/σ(X_r) − Z_r/Y_r ) dr ]

Once again, we notice that the previous theorem requires the smoothness of f. In order to take into account the case of non-smooth functions, recall that the process u must now also satisfy the following condition:

    E( ∫_0^T (D_r X_t) u_r dr | σ(X_t, X_s) ) = 0

Choosing:

    u_r = ( Y_r/(σ(X_r)Y_s) ) ( (1/s) 1_{r≤s} − (1/(t−s)) 1_{s<r≤t} )

u satisfies the two required conditions (2.1) and (2.2), and the representation formula becomes:

Theorem 2.2.2 For all α ∈ R, we have:

    E(f(X_t) | X_s = α) = E( f(X_t)H(X_s − α)δ(u) ) / E( H(X_s − α)δ(u) )

where the Skorohod integral δ(u) is now given by:

    δ(u) = (1/(s Y_s)) [ ∫_0^s (Y_r/σ(X_r)) dW_r − (s/(t−s)) ∫_s^t (Y_r/σ(X_r)) dW_r + ∫_0^s ( Z_s/Y_s + σ′(X_r)Y_r/σ(X_r) − Z_r/Y_r ) dr ]

We now apply the previous representation formulas to specific stochastic differential equations. We start with the case of geometric Brownian motion:

Example 2.2.1 Here, we suppose that the process X = (X_t)_{0≤t≤T} satisfies the following SDE on [0, T]:

    dX_t = rX_t dt + σX_t dW_t,   X_0 = x

In this particular case, the first and second variational processes are respectively given by Y_t = X_t/x and Z_t = 0 for all t ∈ [0, T]. We can therefore explicitly determine the Skorohod integral involved in the representation formula of E(f(X_t) | X_s = α). Using the expression of δ(u) given in theorem 2.2.2, we have:

    δ(u) = (x/X_s) ( (tW_s − sW_t)/(σs(t−s)) + 1 )

Consequently, for all α ∈ R, we have:

    E(f(X_t) | X_s = α) = E( f(X_t)H(X_s − α) (1/X_s)( (tW_s − sW_t)/(σs(t−s)) + 1 ) ) / E( H(X_s − α) (1/X_s)( (tW_s − sW_t)/(σs(t−s)) + 1 ) )

We then apply the results to the case of an Ornstein-Uhlenbeck process:
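Before turning to that example, the geometric Brownian motion formula just derived can be sanity-checked by simulation: for f(x) = x the Markov property gives E(X_t | X_s = α) = α e^{r(t−s)}, so the ratio of the two unconditioned expectations should approach this value. A sketch of such a check (all parameter values and names are our own choices):

```python
import math
import random

def cond_exp_gbm(f, alpha, x0=1.0, r=0.05, sigma=0.2, s=0.5, t=1.0,
                 n_paths=200000, seed=2):
    """Monte Carlo evaluation of E(f(X_t) | X_s = alpha) for geometric
    Brownian motion via the representation of example 2.2.1, with
    delta(u) = (x0 / X_s) * ((t W_s - s W_t) / (sigma s (t - s)) + 1);
    the common factor x0 cancels in the ratio."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_paths):
        w_s = rng.gauss(0.0, math.sqrt(s))
        w_t = w_s + rng.gauss(0.0, math.sqrt(t - s))
        x_s = x0 * math.exp((r - 0.5 * sigma ** 2) * s + sigma * w_s)
        if x_s > alpha:                      # Heaviside H(X_s - alpha)
            x_t = x0 * math.exp((r - 0.5 * sigma ** 2) * t + sigma * w_t)
            du = (x0 / x_s) * ((t * w_s - s * w_t) / (sigma * s * (t - s)) + 1.0)
            num += f(x_t) * du
            den += du
    return num / den
```

The Skorohod weight has a heavy variance, which is precisely what motivates the localization techniques of section 2.3.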

Example 2.2.2 We suppose that the process X = (X_t)_{0≤t≤T} satisfies the following SDE on [0, T]:

    dX_t = a(θ − X_t)dt + σ dW_t,   X_0 = x

Here, the first and second variational processes are respectively given by Y_t = e^{−at} and Z_t = 0 for all t ∈ [0, T]. The expression of δ(u) in theorem 2.2.2 gives us:

    δ(u) = (1/(σs e^{−as})) ( ∫_0^s e^{−ar} dW_r − (s/(t−s)) ∫_s^t e^{−ar} dW_r )

We therefore have, for all α ∈ R:

    E(f(X_t) | X_s = α) = E( f(X_t)H(X_s − α) ( ∫_0^s e^{−ar} dW_r − (s/(t−s)) ∫_s^t e^{−ar} dW_r ) ) / E( H(X_s − α) ( ∫_0^s e^{−ar} dW_r − (s/(t−s)) ∫_s^t e^{−ar} dW_r ) )

In order to be able to perform a Monte Carlo simulation using the above expression, we see that we need to determine the distribution of δ(u). More precisely, for all s ∈ [0, T], we have to determine the distribution of the random variable A_s defined by A_s = σs e^{−as} δ(u). We have:

    A_s = ∫_0^s e^{−ar} dW_r − (s/(t−s)) ∫_s^t e^{−ar} dW_r
        = (1/(t−s)) ∫_0^t e^{−ar} ( (t−s) 1_{r≤s} − s 1_{r>s} ) dW_r
        = (1/(t−s)) ∫_0^t e^{−ar} ( t 1_{r≤s} − s ) dW_r

A straightforward computation using the Itô isometry gives us A_s ∼ N(0, a_s), where a_s is given by, for all s ∈ [0, T]:

    a_s = (1/(2a(t−s)²)) ( (1 − e^{−2as}) t(t−2s) + s²(1 − e^{−2at}) )

Finally this gives us, for all α ∈ R:

    E(f(X_t) | X_s = α) = E( f(X_t)H(X_s − α) N√a_s ) / E( H(X_s − α) N√a_s )

where N is a standard Gaussian random variable.

We now wish to derive representation formulas for the derivative of conditional expectation, i.e. for expressions of the form ∂_α E(f(X_t) | X_s = α). These formulas will subsequently enable us to compute the delta of Bermudan options. We only consider the more general case where f is not necessarily smooth, and therefore our starting point is the second representation formula, given in theorem 2.2.2.
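Before doing so, note that the variance a_s in example 2.2.2 follows from the Itô isometry, Var(A_s) = (t−s)⁻² ∫_0^t e^{−2ar} (t 1_{r≤s} − s)² dr. As a sketch, the closed form can be checked against a midpoint-rule evaluation of this integral (function names and parameter values are our own choices):

```python
import math

def a_s_closed_form(a, s, t):
    """Closed-form variance a_s of A_s from example 2.2.2."""
    return ((1.0 - math.exp(-2.0 * a * s)) * t * (t - 2.0 * s)
            + s * s * (1.0 - math.exp(-2.0 * a * t))) / (2.0 * a * (t - s) ** 2)

def a_s_quadrature(a, s, t, n=100000):
    """Ito isometry: Var(A_s) = (t-s)^-2 int_0^t e^{-2ar} (t*1{r<=s} - s)^2 dr,
    evaluated with the midpoint rule."""
    h = t / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        w = (t - s) if r <= s else -s        # t*1{r<=s} - s
        total += math.exp(-2.0 * a * r) * w * w
    return total * h / (t - s) ** 2
```

The two evaluations should agree to quadrature accuracy, confirming the algebraic reduction of the piecewise integral to the stated closed form.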

Theorem 2.2.3 For all α ∈ R, we have:

    ∂_α E(f(X_t) | X_s = α) = ( E(H(X_s − α)δ(u)) L_{s,t}[f](α) − E(f(X_t)H(X_s − α)δ(u)) L_{s,t}[1](α) ) / E(H(X_s − α)δ(u))²

where the operator L_{s,t}[·](α) is given by, for all α ∈ R:

    L_{s,t}[·](α) = E( ·(X_t) H(X_s − α) ( δ²(u) − ∫_0^t (D_r δ(u)) u_r dr ) )

and where the Skorohod integral δ(u) is given by:

    δ(u) = (1/(s Y_s)) [ ∫_0^s (Y_r/σ(X_r)) dW_r − (s/(t−s)) ∫_s^t (Y_r/σ(X_r)) dW_r + ∫_0^s ( Z_s/Y_s + σ′(X_r)Y_r/σ(X_r) − Z_r/Y_r ) dr ]

Proof: We need to show that ∂_α E(f(X_t)H(X_s − α)δ(u)) = L_{s,t}[f](α) and that ∂_α E(H(X_s − α)δ(u)) = L_{s,t}[1](α). Assuming we can interchange the derivation and expectation operators, we have:

    ∂_α E(f(X_t)H(X_s − α)δ(u)) = E( f(X_t) ∂_α(H(X_s − α)) δ(u) ) = E( f(X_t) δ_α(X_s) δ(u) )

where δ_α denotes the Dirac delta function at α. Furthermore, using the specific expression of u and the fact that E( ∫_0^t (D_r f(X_t)) u_r dr | σ(f(X_t), X_s) ) = 0, we can write:

    ∂_α E(f(X_t)H(X_s − α)δ(u)) = E( f(X_t) δ_α(X_s) δ(u) )
        = E( f(X_t) δ_α(X_s) δ(u) ∫_0^t (D_r X_s) u_r dr )
        = E( ∫_0^t ( D_r( H(X_s − α)f(X_t) ) ) δ(u) u_r dr )
        = E( H(X_s − α) f(X_t) δ( δ(u)u ) )
        = E( f(X_t)H(X_s − α) ( δ²(u) − ∫_0^t (D_r δ(u)) u_r dr ) )
        = L_{s,t}[f](α)

where we have used the integration by parts formula between the last two lines. In order to show that ∂_α E(H(X_s − α)δ(u)) = L_{s,t}[1](α), we repeat the same steps taking f ≡ 1. □

We now return to our two examples in order to derive expressions for the derivative of conditional expectations. Using theorem 2.2.3, we see that the only difficulty is to determine an expression for D_r δ(u).

Example 2.2.3 In the case of geometric Brownian motion, recall that the Skorohod integral δ(u) is given by:

    δ(u) = (x/X_s) ( (tW_s − sW_t)/(σs(t−s)) + 1 )

We then have:

D_rδ(u) = −\frac{x}{X_s^2}\Big(\frac{tW_s − sW_t}{σs(t−s)} + 1\Big)(D_rX_s) + \frac{x}{σs(t−s)X_s}\,D_r(tW_s − sW_t)
        = −\frac{x}{X_s^2}\Big(\frac{tW_s − sW_t}{σs(t−s)} + 1\Big)σX_s1_{\{r≤s\}} + \frac{x\big(t1_{\{r≤s\}} − s1_{\{r≤t\}}\big)}{σs(t−s)X_s}

A straightforward computation then gives us, after rearranging all the terms:

\int_0^t (D_rδ(u))u_r\,dr = \frac{x^2}{X_s^2}\Big(\frac{t}{σs(t−s)} − \frac{ΔW_{s,t}}{σ}\Big)

with ΔW_{s,t} = tW_s − sW_t + σs(t−s). We then have:

δ^2(u) − \int_0^t (D_rδ(u))u_r\,dr = \frac{x^2}{X_s^2}\bigg(\Big(\frac{ΔW_{s,t}}{σs(t−s)}\Big)^2 − \frac{t}{σs(t−s)} + \frac{ΔW_{s,t}}{σ}\bigg)

This finally yields, for all α ∈ R:

L_{s,t}[·](α) = E\bigg(·(X_t)H(X_s − α)\frac{x^2}{X_s^2}\bigg(\Big(\frac{ΔW_{s,t}}{σs(t−s)}\Big)^2 − \frac{t}{σs(t−s)} + \frac{ΔW_{s,t}}{σ}\bigg)\bigg)

The quantity ∂_α E(f(X_t)|X_s = α) can then be readily computed by plugging the above expression of L_{s,t}[·](α) into the formula given by theorem 2.2.3.

Example 2.2.4 In the case of the Ornstein-Uhlenbeck process, the Skorohod integral δ(u) is given by:

δ(u) = \frac{1}{σs(t−s)e^{−as}}\int_0^t e^{−ar}\big(t1_{\{r≤s\}} − s\big)\,dW_r

The first step is to compute D_rδ(u):

D_rδ(u) = \frac{1}{σs(t−s)e^{−as}}\,D_r\int_0^t e^{−av}\big(t1_{\{v≤s\}} − s\big)\,dW_v
        = \frac{t}{σs(t−s)e^{−as}}\,D_r\int_0^s e^{−av}\,dW_v − \frac{s}{σs(t−s)e^{−as}}\,D_r\int_0^t e^{−av}\,dW_v
        = \frac{te^{−ar}}{σs(t−s)e^{−as}} − \frac{se^{−ar}}{σs(t−s)e^{−as}}
        = \frac{e^{−a(r−s)}}{σs}

Given that u_r = \frac{e^{−a(r−s)}}{σ}\Big(\frac{1}{s}1_{\{r≤s\}} − \frac{1}{t−s}1_{\{s≤r≤t\}}\Big), we now have:

\int_0^t (D_rδ(u))u_r\,dr = \int_0^t \frac{e^{−2a(r−s)}}{sσ^2}\Big(\frac{1}{s}1_{\{r≤s\}} − \frac{1}{t−s}1_{\{s≤r≤t\}}\Big)dr
= \frac{e^{2as}}{s^2σ^2}\int_0^s e^{−2ar}\,dr − \frac{e^{2as}}{s(t−s)σ^2}\int_s^t e^{−2ar}\,dr
= \frac{1}{2as^2σ^2}\big(e^{2as} − 1\big) + \frac{1}{2as(t−s)σ^2}\big(e^{−2a(t−s)} − 1\big)

This finally gives us, for all α ∈ R:

L_{s,t}[·](α) = E\bigg(·(X_t)H(X_s − α)\frac{1}{σ^2s^2(t−s)^2e^{−2as}}\Big(\int_0^t e^{−ar}\big(t1_{\{r≤s\}} − s\big)\,dW_r\Big)^2\bigg)
− E\bigg(·(X_t)H(X_s − α)\Big(\frac{1}{2as^2σ^2}\big(e^{2as} − 1\big) + \frac{1}{2as(t−s)σ^2}\big(e^{−2a(t−s)} − 1\big)\Big)\bigg)

2.3 The notion of localization

In the proof of theorem 2.1.1, the use of Bayes' formula within the context of conditional expectation introduces a Dirac delta function, which is very irregular. The technique involving the integration by parts formula enables us to "eliminate" the Dirac delta function, and we subsequently obtain an expression with a Heaviside function H. However, it has to be said that this procedure yields an increase in the variance of the resulting estimator, something which is specifically related to the presence of H. In order to overcome this potential drawback, the notion of localization has been developed in recent years (for example in [7] and [1]). Such a procedure involves the use of a certain localizing function ψ : R → [0, ∞) satisfying \int_R ψ(t)dt = 1. Let us mention that the use of the localized version of the representation formulas is absolutely necessary if one wishes to obtain satisfactory convergence results in the context of Monte Carlo simulation. We now return to the case where the function f is not necessarily smooth, and therefore the random process u satisfies the two required conditions (2.1) and (2.2). The localized version of the representation theorem becomes, both for conditional expectation and its derivative:

Theorem 2.3.1 For any function ψ : R → [0, ∞) satisfying \int_R ψ(t)dt = 1, and for all α ∈ R, we have:

E(f(X_t)|X_s = α) = \frac{J^ψ_{s,t}[f](α)}{J^ψ_{s,t}[1](α)}

where the operator J^ψ_{s,t}[·](α) is given by, for all α ∈ R:

J^ψ_{s,t}[·](α) = E\Big(·(X_t)\big(ψ(X_s − α) + δ(u)(H(X_s − α) − Ψ(X_s − α))\big)\Big)

and where Ψ is the cumulative distribution function associated with ψ, i.e. such that Ψ(x) = \int_{−∞}^x ψ(t)dt. Furthermore, for all α ∈ R, we have:

∂_α E(f(X_t)|X_s = α) = \frac{J^ψ_{s,t}[1](α)L^ψ_{s,t}[f](α) − J^ψ_{s,t}[f](α)L^ψ_{s,t}[1](α)}{J^ψ_{s,t}[1](α)^2}

where the operator L^ψ_{s,t}[·](α) is given by, for all α ∈ R:

L^ψ_{s,t}[·](α) = E\big(·(X_t)ψ(X_s − α)δ(u)\big) + E\bigg(·(X_t)(H(X_s − α) − Ψ(X_s − α))\Big(δ^2(u) − \int_0^t (D_rδ(u))u_r\,dr\Big)\bigg)

Note that the Skorohod integral δ(u) is still given by:

δ(u) = \frac{1}{sY_s}\Big(\int_0^s \frac{Y_r}{σ(X_r)}\,dW_r − \frac{s}{t−s}\int_s^t \frac{Y_r}{σ(X_r)}\,dW_r + \int_0^s \Big(\frac{sZ_s}{Y_sσ(X_r)} + \frac{σ'(X_r)Y_r}{σ(X_r)} − \frac{Z_r}{Y_r}\Big)dr\Big)

Proof: According to theorem 2.2.2, we have for all α ∈ R:

E(f(X_t)|X_s = α) = \frac{E(f(X_t)H(X_s − α)δ(u))}{E(H(X_s − α)δ(u))}

We then write:

E(f(X_t)H(X_s − α)δ(u)) = E(f(X_t)ψ(X_s − α)) + E(f(X_t)H(X_s − α)δ(u)) − E(f(X_t)ψ(X_s − α))
                        = E(f(X_t)ψ(X_s − α)) + E(f(X_t)H(X_s − α)δ(u)) − E(f(X_t)Ψ'(X_s − α))

But then we can apply exactly the same argument as in the proof of theorem 2.1.1 (i.e. using the definition of the Skorohod integral and the integration by parts formula) to notice that E(f(X_t)Ψ'(X_s − α)) = E(f(X_t)Ψ(X_s − α)δ(u)). This gives us the required result for the numerator. As far as the denominator is concerned, we apply the previous argument taking f ≡ 1. Moreover, let us notice that in order to obtain the representation formula of the derivative of conditional expectation, it now suffices to prove that L^ψ_{s,t}[·](α) ≡ L_{s,t}[·](α) for all α ∈ R. We apply the same reasoning as above, where we introduce the quantity π = δ^2(u) − \int_0^t (D_rδ(u))u_r\,dr:

L_{s,t}[f](α) = E(f(X_t)H(X_s − α)π)
= E(f(X_t)ψ(X_s − α)δ(u)) + E(f(X_t)H(X_s − α)π) − E(f(X_t)ψ(X_s − α)δ(u))
= E(f(X_t)ψ(X_s − α)δ(u)) + E(f(X_t)H(X_s − α)π) − E(f(X_t)Ψ'(X_s − α)δ(u))
= E(f(X_t)ψ(X_s − α)δ(u)) + E(f(X_t)H(X_s − α)π) − E(f(X_t)Ψ(X_s − α)π)
= L^ψ_{s,t}[f](α)

where we have used the definition of the Skorohod integral and the integration by parts formula between lines 3 and 4. The same reasoning can be applied to the specific case f ≡ 1 and we get the required result.

Now that we have introduced the localized versions of the representation formulas for conditional expectation and its derivative, one natural question arises, namely the optimal choice of the localizing function ψ. Recall that in order to evaluate the quantity E(f(X_t)|X_s = α), we need to compute J^ψ_{s,t}[·](α), which is given by, for all α ∈ R:

J^ψ_{s,t}[·](α) = E\Big(·(X_t)\big(ψ(X_s − α) + δ(u)(H(X_s − α) − Ψ(X_s − α))\big)\Big)

Practically speaking, this expectation is computed via Monte Carlo simulation, and the estimator which is used is simply the associated empirical mean. More precisely, if we denote by N the number of simulated paths, we have the following approximation, valid for all α ∈ R:

J^ψ_{s,t}[·](α) ≈ \frac{1}{N}\sum_{q=1}^N ·(X_t^{(q)})\Big(ψ(X_s^{(q)} − α) + δ(u)^{(q)}\big(H(X_s^{(q)} − α) − Ψ(X_s^{(q)} − α)\big)\Big)
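As a concrete sketch of this empirical mean (our own illustration: the simulated values f(X_t^{(q)}), X_s^{(q)} and the pathwise Skorohod integrals δ(u)^{(q)} are assumed to be given as inputs, and ψ is taken to be a Laplace density with an arbitrary fixed parameter λ, anticipating proposition 2.3.1):

```cpp
#include <cmath>
#include <vector>

// Laplace localizing function psi(t) = (lambda/2) * exp(-lambda*|t|)
// and its cumulative distribution function Psi.
double psi(double t, double lambda) {
    return 0.5 * lambda * std::exp(-lambda * std::fabs(t));
}
double Psi(double t, double lambda) {
    return (t >= 0.0) ? 1.0 - 0.5 * std::exp(-lambda * t)
                      : 0.5 * std::exp(lambda * t);
}

// Localized empirical mean approximating J^psi_{s,t}[f](alpha):
// (1/N) * sum_q f(Xt_q) * ( psi(Xs_q - alpha)
//                           + delta_q * (H(Xs_q - alpha) - Psi(Xs_q - alpha)) )
// fXt[q], Xs[q], delta_u[q] hold f(X_t), X_s and delta(u) on path q.
double J_psi(const std::vector<double>& fXt,
             const std::vector<double>& Xs,
             const std::vector<double>& delta_u,
             double alpha, double lambda) {
    double acc = 0.0;
    for (std::size_t q = 0; q < fXt.size(); ++q) {
        double d = Xs[q] - alpha;
        double H = (d > 0.0) ? 1.0 : 0.0;  // Heaviside function
        acc += fXt[q] * (psi(d, lambda) + delta_u[q] * (H - Psi(d, lambda)));
    }
    return acc / static_cast<double>(fXt.size());
}
```

The conditional expectation estimate is then the ratio of two such means, J_psi with f(X_t) and with 1. Note that H − Ψ vanishes exponentially fast away from X_s = α, which is exactly how localization tames the variance introduced by the Heaviside function.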

In order to reduce the variance as much as possible, the idea is to minimize the mean integrated variance with respect to the localizing function ψ. In other words, we have to solve the following program:

\inf_{ψ∈L^1} I(ψ)

where L^1 = \{ψ : R → [0, ∞) : ψ ∈ C^1(R), ψ(+∞) = 0, \int_R ψ(t)dt = 1\} and where the mean integrated variance I(ψ) is given by:

I(ψ) = \int_R E\Big(·^2(X_t)\big(ψ(X_s − α) + δ(u)(H(X_s − α) − Ψ(X_s − α))\big)^2\Big)dα

The choice of the optimal localizing function ψ is given in the following proposition:

Proposition 2.3.1 We have \inf_{ψ∈L^1} I(ψ) = I(ψ^*), where ψ^* is the probability density function of the Laplace distribution with parameter λ^*_·, i.e. for all t ∈ R, ψ^*(t) = \frac{λ^*_·}{2}e^{−λ^*_·|t|}, where λ^*_· is given by:

λ^*_· = \bigg(\frac{E\big(·^2(X_t)δ^2(u)\big)}{E\big(·^2(X_t)\big)}\bigg)^{1/2}

Remark 2.3.1 The cumulative distribution function Ψ associated with the Laplace probability density function with parameter λ^*_· is given by, for all t ∈ R:

Ψ(t) = \Big(1 − \frac{1}{2}e^{−λ^*_·t}\Big)1_{\{t≥0\}} + \frac{1}{2}e^{λ^*_·t}1_{\{t<0\}}

For all t > s, we have the following distribution equality:

X_t = X_se^{−a(t−s)} + θ\big(1 − e^{−a(t−s)}\big) + σ\sqrt{\frac{1 − e^{−2a(t−s)}}{2a}}\,Z

where Z is a standard Gaussian random variable. In order to obtain the Ornstein-Uhlenbeck bridge, for a given sample q ∈ {1, ..., N}, we start by generating \hat{X}^{(q)}_{δt}, ..., \hat{X}^{(q)}_{Mδt} defined by:

\hat{X}^{(q)}_{kδt} = \hat{X}^{(q)}_{(k−1)δt}e^{−aδt} + θ\big(1 − e^{−aδt}\big) + σ\sqrt{\frac{1 − e^{−2aδt}}{2a}}\,Z^{(q)}_k

where the random variables Z^{(q)}_k are independent. We then set X^{(q)}_{Mδt} in the following way:

X^{(q)}_{Mδt} = xe^{−aMδt} + θ\big(1 − e^{−aMδt}\big) + σ\sqrt{\frac{1 − e^{−2aMδt}}{2a}}\,Z^{(q)}_M

We finally go back in time using the following distribution equality, valid for all k ∈ {M−1, ..., 1}:

X^{(q)}_{kδt} = \hat{X}^{(q)}_{kδt} + \big(X^{(q)}_{Mδt} − \hat{X}^{(q)}_{Mδt}\big)\frac{e^{akδt} − e^{−akδt}}{e^{aMδt} − e^{−aMδt}}
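The three steps above translate directly into code. The following C++ sketch is our own illustration of the bridge construction (parameter values and variable names are ours):

```cpp
#include <cmath>
#include <random>
#include <vector>

// One Ornstein-Uhlenbeck bridge path on the grid {0, dt, ..., M*dt}, following
// the three-step construction above:
// 1) generate a free OU path Xhat via the exact one-step transition,
// 2) draw the terminal value X_{M*dt} directly from its exact law,
// 3) pull the free path towards the terminal value going back in time.
std::vector<double> ou_bridge(double x, double a, double theta, double sigma,
                              double dt, int M, std::mt19937& gen) {
    std::normal_distribution<double> N01(0.0, 1.0);
    std::vector<double> Xhat(M + 1), X(M + 1);
    Xhat[0] = x;
    double e  = std::exp(-a * dt);
    double sd = sigma * std::sqrt((1.0 - std::exp(-2.0 * a * dt)) / (2.0 * a));
    for (int k = 1; k <= M; ++k)                       // step 1: exact transitions
        Xhat[k] = Xhat[k - 1] * e + theta * (1.0 - e) + sd * N01(gen);
    double eM  = std::exp(-a * M * dt);                // step 2: terminal value
    double sdM = sigma * std::sqrt((1.0 - std::exp(-2.0 * a * M * dt)) / (2.0 * a));
    X[0] = x;
    X[M] = x * eM + theta * (1.0 - eM) + sdM * N01(gen);
    double denom = std::exp(a * M * dt) - std::exp(-a * M * dt);
    for (int k = M - 1; k >= 1; --k)                   // step 3: backward correction
        X[k] = Xhat[k] + (X[M] - Xhat[M]) *
               (std::exp(a * k * dt) - std::exp(-a * k * dt)) / denom;
    return X;
}
```

By construction the correction factor equals 0 at k = 0 and 1 at k = M, so the bridge starts at x and ends at the independently drawn terminal value, as the algorithm requires.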

Remark 3.2.1 Apart from the use of localizing functions to reduce the variance of the two previous estimators, Bally et al. [1] explain how the introduction of a control variate can be used in order to gain more precision. Let us first recall the main idea behind this specific variance reduction technique. Heuristically speaking, if we want to approximate the expectation E(f) of a given payoff f, and if there exists another payoff g whose expectation is explicitly known, it is possible to reduce the variance of the estimator \frac{1}{N}\sum_{q=1}^N f^{(q)} by replacing it by \frac{1}{N}\sum_{q=1}^N f^{(q)} − λ\big(\frac{1}{N}\sum_{q=1}^N g^{(q)} − E(g)\big), where λ is a real constant to be determined. It can be shown that the variance of the new estimator is equal to \frac{1}{N}\big(Var(f) − 2λ\,Cov(f,g) + λ^2Var(g)\big), and this quantity is minimal for λ^* = \frac{Cov(f,g)}{Var(g)}. The resulting variance, equal to \frac{1}{N}Var(f)\big(1 − Corr(f,g)^2\big), precisely shows that the idea is to choose a payoff g which is well correlated with f in absolute value. In the particular case of Bermudan options, Bally et al. [1] suggest the use of the associated European option as a control variate. If we denote by V^B(t,x) and V^E(t,x) the respective prices of a Bermudan and European call with the same maturity and strike, we define the quantity V(t,x) = V^B(t,x) − V^E(t,x). We then introduce the new time-dependent obstacle \hat{Φ}(t,x) = Φ(x) − V^E(t,x), which naturally satisfies \hat{Φ}(T,x) = 0, and by construction we have:

V(t,x) = \sup_{τ∈T_{t,T}} E_{t,x}\big(e^{−r(τ−t)}\hat{Φ}(τ, X_τ)\big)

In order to obtain the approximations of the price and delta of the Bermudan option, we first carry out all the previous steps with the new obstacle \hat{Φ}(t,x), and then we simply add the associated price and delta of the European call at time 0 to the estimators v_0(x) and w_0(x).
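The variance identity in this remark is easy to check numerically. The sketch below is our own illustration with a toy payoff, not the dissertation's Bermudan/European pair: it estimates E((X_T − K)^+) for lognormal X_T using g = X_T as control variate, whose mean E(g) = xe^{rT} is known in closed form:

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <random>
#include <vector>

// Control-variate demo: payoff f = (X_T - K)^+, control g = X_T with known
// mean E(g) = x*exp(r*T). Uses the empirical optimal lambda* = Cov(f,g)/Var(g)
// and returns Var(f - lambda*(g - E(g))) / Var(f), which should be close to
// 1 - Corr(f,g)^2 up to sampling noise.
double variance_ratio(int N, unsigned seed) {
    const double x = 100.0, r = 0.05, sigma = 0.2, T = 1.0, K = 100.0;
    std::mt19937 gen(seed);
    std::normal_distribution<double> N01(0.0, 1.0);
    std::vector<double> f(N), g(N);
    for (int q = 0; q < N; ++q) {
        double XT = x * std::exp((r - 0.5 * sigma * sigma) * T
                                 + sigma * std::sqrt(T) * N01(gen));
        f[q] = std::max(XT - K, 0.0);
        g[q] = XT;
    }
    double mf = std::accumulate(f.begin(), f.end(), 0.0) / N;
    double mg = std::accumulate(g.begin(), g.end(), 0.0) / N;
    double covfg = 0.0, varg = 0.0, varf = 0.0;
    for (int q = 0; q < N; ++q) {
        covfg += (f[q] - mf) * (g[q] - mg);
        varg  += (g[q] - mg) * (g[q] - mg);
        varf  += (f[q] - mf) * (f[q] - mf);
    }
    double lambda = covfg / varg;  // empirical lambda* = Cov(f,g)/Var(g)
    // Var(f - lambda*(g - E(g))) = Var(f) - 2*lambda*Cov(f,g) + lambda^2*Var(g)
    return (varf - 2.0 * lambda * covfg + lambda * lambda * varg) / varf;
}
```

Because f and g are strongly correlated here, the variance ratio 1 − Corr(f,g)^2 is well below one; for the Bermudan case the same mechanism applies, with the European payoff playing the role of g.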


3.3 Numerical applications

In this final section, we discuss numerical results obtained after having implemented the Malliavin pricing and hedging algorithm in C++. As we have previously mentioned, the algorithm does not numerically work if the localization method is not taken into account. The following results are therefore obtained with the use of localizing functions. We shall base ourselves on the two main examples we have used throughout the dissertation, namely the case of geometric Brownian motion and the Ornstein-Uhlenbeck process.

We start with the case of geometric Brownian motion. To this purpose, we consider a non-dividend paying Bermudan put option with strike K = 100 and maturity T = 1 year on an underlying stock with the following characteristics:

• Annual risk-free interest rate r = 9.53%.
• Annual volatility of returns σ = 20%.
• Initial stock value x = 100.

According to online pricers (for example, the one which can be found at http://www.intrepid.com/robertl/option-pricer1.html), the "reference" price and delta of an American option with the same characteristics are 4.891 and -0.403. The following results are given with the number of paths N ranging from 100 to 20000, and the number of time steps M ranging from 10 to 30. Given that an American option can be exercised at any time before maturity, its price should in principle be greater than the price of the associated Bermudan option, which can only be exercised at a finite number of dates. We should therefore expect a convergence of the approximated prices towards the "upper bound" of 4.891:

    N      M = 10   M = 20   M = 30
  100       7.058   15.289   10.808
  500       5.436    5.367    5.665
  1000      5.318    5.298    5.154
  2000      5.255    5.187    4.739
  3000      5.109    4.769    4.803
  10000     4.765    4.792    4.821
  20000     4.777    4.814    4.829

Table 3.1: Approximation of the option's price with localization

The results given in table 3.1 illustrate several interesting points. To start with, the algorithm is very unstable for a very low number of paths, and only starts to provide reasonable prices from N = 1000 onwards. Furthermore, the condition that the Bermudan option's price should always be less than the "reference" value of the associated American option is not always satisfied. It only happens to be verified when the number of paths is sufficiently large. Finally,

taking a minimal number of paths equal to 10000, we do observe the reasonable increasing order of prices as the number of time steps (i.e. the exercise dates) increases. We now have a look at the standard deviation of the Malliavin price estimator as a function of the number of paths, with different numbers of time steps taken into account:

Figure 3.1: Standard Deviation of the Malliavin price estimator as a function of the number of paths

Figure 3.1 above underscores two fundamental aspects. On the one hand, regardless of the number of chosen time steps, we observe a decrease in the estimator's standard deviation as the number of paths increases. This is a standard property of Monte Carlo estimators. On the other hand, the more interesting result is that the three curves plot above one another in a very particular order: for a given sample size, as the number of exercise dates increases, the standard deviation of the estimator also increases. This precisely shows us that if we want to increase the number of exercise dates, we also need to increase the number of paths in order to obtain the same precision. This particular trade-off between the number of exercise dates and standard deviation is underlined within the context of Bermudan option pricing by Hepperger [9]. Intuitively speaking, one possible explanation is that a higher number of exercise dates enables the estimator to capture the effects of more extreme points on the numerical grid, thus entailing an increase in its variance and standard deviation. We now analyse the results of the approximation of the option's delta. Table 3.2 below shows that the approximations of the delta happen to be quite unstable for a number of paths less than 2000. The convergence towards the "reference" value of -0.403 is observable for higher values of N and M, although we do sense that additional variance reduction techniques such as the introduction of a control variate (remark 3.2.1) would enable us to achieve a greater level of convergence. Unfortunately, our own implementation involving

the control variate method did not yield any satisfactory results, and they have therefore not been included here.

    N      M = 10   M = 20   M = 30
  100      -0.901   -0.256   -0.144
  500      -0.159   -0.133   -0.278
  1000     -0.303   -0.187   -0.532
  2000     -0.567   -0.544   -0.512
  3000     -0.511   -0.515   -0.485
  10000    -0.502   -0.497   -0.469
  20000    -0.491   -0.475   -0.442

Table 3.2: Approximation of the option's delta with localization

For illustrative purposes only, we also show the approximations of the price of the same Bermudan option when the localization technique is not taken into consideration:

    N      Price      Standard Deviation
  100        7011       1081
  500       23701       3375
  1000      19970       2748
  2000     343588      28426
  3000    2424299     202969

Table 3.3: Approximation of the option's price without localization, taking M = 10 time steps

Table 3.3 precisely shows to what extent the non-localized version of the algorithm does not numerically work. We observe a "blow-up" of the prices and standard deviations as the number of paths increases.

We now consider a non-dividend paying Bermudan put option with strike K = 3% and maturity T = 1 year on an arbitrary underlying asset (which could be an interest rate in this case) represented by an Ornstein-Uhlenbeck price process. Let us mention that as "true" prices of an option on an underlying with such dynamics cannot be found online or in the existing literature, the following analysis is purely indicative. We are therefore more interested in the behaviour of the estimators as N and M vary than in the actual value of the option. We choose an annual risk-free interest rate of r = 9.53% and the underlying asset has the following characteristics:

• Rate of mean-reversion a = 10%.
• Long-term average θ = 4%.

• Annual volatility of returns σ = 20%.
• Initial asset value x = 3%.

    N      M = 10   M = 20   M = 30
  100       0.164    0.291    0.679
  500       1.256    1.187    0.821
  1000      1.071    1.501    2.218
  2000      0.618    0.656    0.663
  3000      0.740    0.783    0.794
  10000     0.789    0.813    0.819
  20000     0.799    0.826    0.831

Table 3.4: Approximation of the option's price (in %) with localization

The results are given in table 3.4 above. Once again, the approximation procedure seems to be rather unstable for a number of paths less than 2000. Even though we do not have any "reference" price in this particular case, we do observe a convergence towards a certain value greater than 0.831%. We also notice the increasing sequence of prices as the number of exercise dates increases for a given number of paths, which is consistent with our previous comments regarding this matter. We end this numerical section by analysing the required CPU time of the Malliavin pricing algorithm, which is represented in figure 3.2 below:

Figure 3.2: CPU time as a function of the number of paths N for the computation of the option’s price

Having observed the required CPU time in figure 3.2, three essential remarks have to be made. To start with, there seems to be a quadratic relationship 40

between the CPU time and the number of simulated paths. As a matter of fact, this particular feature is simply explained by what actually has to be computed in step 3.(c) of the algorithm. More precisely, for all q ∈ {1, ..., N } the following quantity is determined: (q)

(q)

Ek [vk+1 ](Xkδt ) ≈

ψ Jk,k+1 [vk+1 ](Xkδt ) (q)

ψ Jk,k+1 [1](Xkδt )

where the numerator and the denominator involve the computation of empirical means with another N sample paths. This naturally results in the CPU time being ”squared” with respect to the number of paths. The second remark which can be made is related to the effect of the number of time steps. The required CPU time seems to be directly proportional to the number of time steps being used, which is something to be expected given the general body of the algorithm. Indeed, as the algorithm is of backward type, if we place ourselves at a given position in time, the different steps are carried out for each path. As a result, taking 20 time steps roughly doubles the CPU time in comparison with 10 time steps. Finally, the most important remark from a practical viewpoint is undeniably related to the overall computational cost in absolute terms. As we observed earlier on in the numerical examples, a very high number of paths is usually necessary to obtain satisfactory results. However, this is done at a considerably high computational cost. For example, choosing 10000 paths along with only 10 timsesteps yields a required CPU time of just over one hour. As a result, it has to be said that the this algorithm is rather uncompetitive with other algorithms such as Longstaff-Schwartz in terms of CPU time.


Concluding considerations

Let us first summarize the approach which has been followed here in order to present the Malliavin pricing and hedging algorithm. In chapter 1, we started by introducing the main Malliavin calculus tools needed in the derivation of the representation formulas for conditional expectation and its derivative. We then introduced these formulas in chapter 2, starting with non-localized versions, before moving on to their localized counterparts. In chapter 3, starting from the dynamic programming principle formulation of the problem, we specifically showed how these formulas had to be applied within the context of the pricing and hedging of Bermudan options. We then provided a sketch of the algorithm itself, before finally discussing several numerical results. As far as numerical aspects are concerned, our own implementation underlined the fact that the algorithm cannot numerically work without the use of localizing functions, as prices and standard deviations of estimators tend to "blow up". Even though our implementation of the control variate method presented in remark 3.2.1 did not yield any satisfactory results, we believe that it most certainly would have enabled us to obtain more stable and more accurate prices and deltas with fewer sample paths. Indeed, the results presented here seem to be reasonably in line with "reference" values when the number of paths is relatively high, but this mechanically increases (and "squares") the required CPU time for the computations. This naturally leads us to the question of the efficiency of the algorithm in terms of computational cost, especially in comparison with other methods such as the Longstaff-Schwartz algorithm. Bally et al. [1] state that the latter is undeniably more competitive with regard to computational speed, although it does not necessarily yield more accurate results.
Furthermore, one fundamental advantage of the Malliavin approach is that it remains after all a pricing and hedging algorithm, unlike other probabilistic methods, which do not provide their own specific procedure to compute the delta of the option. Taking into consideration the sequence followed in this paper, the first natural extension which comes to mind is based on the introduction of the representation formulas for conditional expectation and its derivative in a multidimensional setting, as the main benefits of probabilistic methods over deterministic ones arise in higher dimensions. This can be done in a fairly straightforward

way when the underlying price process is characterized by geometric Brownian motion (see Bally et al. [1]), but whether or not such an extension can readily be developed for other processes remains to be seen. Finally, given the high CPU time required by the Malliavin pricing and hedging algorithm, another possible extension would be to look at how this particular method could be combined with other probabilistic approaches in order to achieve more competitive time costs. The core problem we have to deal with here is related to step 3.(c) of the algorithm, which involves a very costly computation through equation (3.1), and subsequently leads to the required CPU time being "squared". Other ways of evaluating (3.1) could potentially yield a more efficient computational cost, although it currently remains unclear how one could go about determining such alternative techniques.


Bibliography

[1] V. Bally, L. Caramellino, and A. Zanette, Pricing and hedging American options by Monte Carlo methods using a Malliavin calculus approach, Institut National de Recherche en Informatique et en Automatique, Research paper no. 4804, 2003.
[2] A. Bensoussan and J-L. Lions, Impulse control and quasivariational inequalities, Gauthier-Villars, 1984.
[3] M. Bladt and M. Sorensen, Simple simulation of diffusion bridges with application to likelihood inference for diffusions, Centre for Analytical Finance, University of Aarhus, Working paper no. 225, 2007.
[4] M. Broadie and P. Glasserman, Pricing American-style securities using simulation, Journal of Economic Dynamics and Control, pp. 1323-1352, 1997.
[5] D. Duffy, Finite difference methods in financial engineering: A partial differential equation approach, Wiley Series, 2006.
[6] E. Fournié, J-M. Lasry, J. Lebuchoux, P-L. Lions, N. Touzi, Applications of Malliavin calculus to Monte Carlo methods in finance, Finance and Stochastics, Springer-Verlag, pp. 391-412, 1999.
[7] E. Fournié, J-M. Lasry, J. Lebuchoux, P-L. Lions, Applications of Malliavin calculus to Monte Carlo methods in finance II, Finance and Stochastics, Springer-Verlag, pp. 201-236, 2001.
[8] P. Friz, An introduction to Malliavin calculus, Courant Institute of Mathematical Sciences, New York University, 2002.
[9] P. Hepperger, Pricing high-dimensional Bermudan options using variance-reduced Monte Carlo methods, Zentrum Mathematik, Technische Universität München.
[10] P-L. Lions and H. Regnier, Calcul du prix et des sensibilités d'une option américaine par une méthode de Monte Carlo, Preprint, 2001.
[11] F. Longstaff and E. Schwartz, Valuing American options by simulations: a simple least squares approach, The Review of Financial Studies, no. 14, pp. 113-148, 2001.
[12] D. Nualart, The Malliavin calculus and related topics, Springer-Verlag, 1995.
Contents

Introduction 3

1 Malliavin Calculus 8
  1.1 Context and general framework 8
  1.2 The Malliavin derivative operator 9
  1.3 The Skorohod integral and other important results 10

2 Representation formulas for conditional expectation and its derivative 13
  2.1 General framework 13
  2.2 Application to SDEs 15
  2.3 The notion of localization 22

3 Applications to the pricing and hedging of Bermudan options 28
  3.1 Overview of the problem and alternative probabilistic pricing methods 28
  3.2 The Malliavin pricing and hedging algorithm 31
    3.2.1 Applications of the representation formulas 31
    3.2.2 Sketch of the algorithm 34
  3.3 Numerical applications 37

Concluding considerations 41

Bibliography 44
Introduction

Bermudan options, which can be considered as intermediate contracts between American and European options, give the holder the right to buy or sell an underlying asset at a predetermined price and at a discretely spaced number of times prior to a given maturity. Intuitively speaking, the price of an American option is always greater than or equal to the price of its Bermudan counterpart, since it gives its holder the right to exercise at any given time prior to maturity. In the same way, the price of a Bermudan option is always greater than or equal to the price of a European option with the same characteristics, given that the latter can only be exercised at maturity. A direct consequence of this feature is that the price of a European option can explicitly be determined by computing the discounted expectation of the terminal payoff. However, this specific procedure cannot be extended to the case of Bermudan options, since the idea here is to determine the optimal exercise date. In the case of a non-dividend paying Bermudan call, it can be shown that the optimal exercise date is in effect at maturity. Apart from this very isolated case, the determination of the optimal exercise date is far from being a trivial affair. We now look at how the problem can be formalised.

Let T be a positive constant representing the maturity of the option and let W = (W_t)_{0≤t≤T} be a 1-dimensional Wiener process on a filtered probability space (Ω, F, (F_t)_{0≤t≤T}, Q), where (F_t)_{0≤t≤T} represents the augmented filtration generated by W, and where Q is the risk-neutral probability. We make the assumption of a complete market, i.e. all contingent claims are Q-attainable, which guarantees the uniqueness of Q and enables us to determine the unique price of European contingent claims as the discounted expectation of their payoff at maturity. We denote by E(·) the risk-neutral expectation.
We then proceed to the discretization of the interval [0, T] by introducing the time step δt = T/M and by taking into consideration the discrete times t_k = kδt for all k ∈ {0, ..., M}. A Bermudan option with maturity T on an underlying asset with price X is defined as a contract which gives the holder the right to buy or sell the underlying at a strike price K and at any exercise date within the set {t_1, ..., t_M}. We denote by Φ : R^+ → R the payoff of the option, i.e. Φ(X_{kδt}) represents the amount received by the holder if he decides to exercise his right at time t_k, and we assume that the underlying asset's price process

X = (X_t)_{0≤t≤T} satisfies the following SDE on [0, T]:

dX_t = b(X_t)dt + σ(X_t)dW_t,  X_0 = x    (1)

where b and σ are two sufficiently smooth functions with bounded derivatives. We also assume that the following integrability condition is satisfied by the discounted payoff at time 0:

E\Big(\sup_{t∈[0,T]} |e^{−rt}Φ(X_t)|\Big) < +∞

It can be shown that the price at time t ∈ [0, T] of such an option is given by:

V(t, x) = \sup_{τ∈T_{t,T}} E_{t,x}\big(e^{−r(τ−t)}Φ(X_τ)\big)

where E_{t,x} represents the risk-neutral expectation given that the price process starts at x at time t, where T_{t,T} denotes the set of all admissible stopping times in [t, T], and where r denotes the constant risk-free interest rate. We are now interested in determining two fundamental quantities which characterize the option. On the one hand, the premium of the option, defined as its price V(0, x) at time 0, is the quantity paid upfront by the holder of the option to its seller. On the other hand, the delta of the option at time t, denoted by Δ(t, x), corresponds to the derivative of the price of the option at time t with respect to the initial value of the underlying asset:

Δ(t, x) = ∂_x V(t, x) = ∂_x \sup_{τ∈T_{t,T}} E_{t,x}\big(e^{−r(τ−t)}Φ(X_τ)\big)

The above expressions for the price and delta of a Bermudan option precisely show that we are in fact confronted with an optimal stopping problem. We therefore have to consider the Bellman dynamic programming principle, starting with the following backward recursive system, valid for all k ∈ {M−1, ..., 0}:

\hat{V}(t_M, X_{Mδt}) = Φ(X_{Mδt})
\hat{V}(t_k, X_{kδt}) = \max\Big(Φ(X_{kδt}),\, e^{−rδt}E\big(\hat{V}(t_{k+1}, X_{(k+1)δt})\,|\,X_{kδt}\big)\Big)    (2)

where \hat{V}(t_k, X_{kδt}) corresponds to the approximation of the option's price at time t_k. This system shows that the value of the option at maturity is simply equal to the payoff at maturity, given the fact that the holder cannot exercise the option at a later date. It also states that the value of the option at time t_k is equal to the maximum of the intrinsic value at time t_k and the continuation value, i.e. the discounted conditional expectation of the value of the option at time t_{k+1}. Intuitively speaking, the continuation value can be considered as the value of keeping the option alive rather than exercising it at a given discrete time. The price of the option V(0, x) is then approximated by \hat{V}(0, x).


As for the delta of the option, we heuristically take into account the derivative of the price at time $t_1$ with respect to the price of the underlying. More precisely, the approximation $\hat{\Delta}(0,x)$ of the delta at time 0 is given by:

$$ \begin{cases} \hat{\Delta}(t_1, X_{\delta t}) = \partial_\alpha \Phi(\alpha)\, \mathbf{1}_{C^c} + e^{-r\delta t}\, \partial_\alpha \mathbb{E}\left[ \hat{V}(t_2, X_{2\delta t}) \mid X_{\delta t} = \alpha \right] \mathbf{1}_{C} \\ \hat{\Delta}(0,x) = \mathbb{E}\left[ \hat{\Delta}(t_1, X_{\delta t}) \right] \end{cases} \quad (3) $$

where the continuation region $C$ is defined by $C = \{\alpha \in \mathbb{R}_+ : \hat{V}(t_1, \alpha) > \Phi(\alpha)\}$.

Generally speaking, the existing methods for pricing Bermudan options are either deterministic or probabilistic. As far as deterministic methods are concerned, an application of Itô's lemma to the price $V^A(t,x)$ of the associated American option leads to the following PDE:

$$ \left( \frac{\partial V^A}{\partial t} + \mathcal{L}V^A - rV^A \right) \mathbf{1}_{\{V^A(t,x) > \Phi(x)\}} = 0 $$

with the terminal condition $V^A(T,x) = \Phi(x)$, and where $\mathcal{L}$ denotes the infinitesimal generator of the process $X$, defined in the following way:

$$ \mathcal{L}\cdot(x) = b(x)\, \frac{\partial \cdot}{\partial x}(x) + \frac{1}{2}\sigma^2(x)\, \frac{\partial^2 \cdot}{\partial x^2}(x) $$

A weak formulation of this problem with variational inequalities (see Bensoussan and Lions [2] for example) leads to a linear complementarity problem which can be solved by standard finite difference methods (see Duffy [5] for example). This approach gives us the price of the associated Bermudan option, and an approximation of its American counterpart. Despite its relatively high efficiency for low-dimensional problems, the finite difference approach suffers from the so-called "curse of dimensionality": there is a broad consensus that in three or more dimensions, the computing speed of the finite difference approach is uncompetitive in comparison with probabilistic techniques.

The pricing of Bermudan options by probabilistic methods is based on the use of Monte Carlo simulations. Such probabilistic approaches can roughly be divided into three distinct categories. The first is based on the approximation of the conditional expectations which appear in the dynamic programming formulation (2) using a regression on a truncated basis of $L^2$; this is the Longstaff-Schwartz algorithm, introduced in [11]. The second category involves a tree which enables us to discretize the underlying price process on a given grid (see Broadie and Glasserman [4] for example). Finally, the last category exploits the existence of representation formulas for conditional expectation and its derivative (which appear in systems (2) and (3)) via a Malliavin calculus approach, which is exactly what this paper deals with. This method was first

presented by Lions and Reigner in [10], and then by Bally et al. in [1], where the algorithm is presented for a geometric Brownian motion underlying process. One of the aims of this paper is to provide a slightly more general version of the approach presented in [1], most notably by presenting the problem and the algorithm for an underlying price process following the SDE which appears in (1).

Malliavin calculus, also known as the stochastic calculus of variations, provides us with the machinery to compute derivatives of random variables. Powerful tools such as the Malliavin derivative operator and its adjoint, the Skorohod integral, enable us to express conditional expectations of the form $\mathbb{E}(f(X_t) \mid X_s = \alpha)$, and their derivatives $\partial_\alpha \mathbb{E}(f(X_t) \mid X_s = \alpha)$, where $f$ is a function satisfying a certain number of regularity conditions, as a ratio of two unconditioned expectations involving quantities such as the Skorohod integral. In other words, the information "contained" in the conditioning is transmitted to the Skorohod integral. These formulas are obtained by Fournié et al. in [7].

This paper is organised as follows. In chapter 1, we introduce the necessary tools of Malliavin calculus which are used later in the derivation of the representation formulas. Given that the formalism behind Malliavin calculus can rapidly become very involved, we follow a simple and logical approach and therefore provide no proofs in this introductory chapter; many of them can be found in Nualart [12]. Once these tools have been introduced, we proceed to the derivation of representation formulas for conditional expectation and its derivative in chapter 2. Here, we give all the proofs in detail, as this section undeniably represents the core of the problem we are dealing with. We finally apply these representation formulas to the pricing and hedging of Bermudan options in chapter 3, most prominently by introducing the resulting algorithm, and by extending our discussion to the analysis of various numerical results.

Let us finally mention that in this paper, the representation formulas as well as the Malliavin pricing and hedging algorithm are presented in a one-dimensional setting. We have deliberately chosen not to introduce these ideas in the multi-dimensional case for the sake of clarity. However, all the results presented here extend easily to higher dimensions (this is done in the geometric Brownian motion case by Bally et al. in [1]), and one must not forget that the main benefits of Monte Carlo based approaches over finite difference methods arise precisely in a multi-dimensional setting.


1 Malliavin Calculus

This chapter aims to give an introduction to Malliavin calculus and presents the main theoretical results which will be needed later in the derivation of the representation formulas for conditional expectation and its derivative. We follow the approach presented in [6]; a more complete description of the following results, along with their proofs, can be found in [8] and [12].

1.1 Context and general framework

We start by introducing the general framework used in this introductory chapter. We fix a positive constant $T \in\ ]0, +\infty]$. We then consider an $n$-dimensional Wiener process $W = (W_t)_{0 \le t \le T}$ on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0 \le t \le T}, \mathbb{P})$, where $(\mathcal{F}_t)_{0 \le t \le T}$ represents the augmented filtration generated by $W$. Let us mention that this multi-dimensional setting is only used in this chapter; we later move on to a one-dimensional framework. Let us now recall the basic spaces and norms commonly used in stochastic analysis.

Definition 1.1.1 The Hilbert space $L^2(\Omega)$ is the set of real-valued square integrable random variables. It is equipped with the following inner product:

$$ \forall X, Y \in L^2(\Omega), \quad \langle X, Y \rangle_{L^2(\Omega)} := \mathbb{E}(XY) $$

The norm induced by this inner product is:

$$ \forall X \in L^2(\Omega), \quad \|X\|_{L^2(\Omega)} = \sqrt{\langle X, X \rangle_{L^2(\Omega)}} = \mathbb{E}(X^2)^{1/2} $$

Definition 1.1.2 The Hilbert space $L^2(\Omega \times [0,T])$ is the set of random processes $\phi$ such that:

$$ \mathbb{E}\left( \int_0^T \phi^2(\omega, t)\, dt \right) < \infty $$

$L^2(\Omega \times [0,T])$ is equipped with the inner product:

$$ \forall \phi, \psi \in L^2(\Omega \times [0,T]), \quad \langle \phi, \psi \rangle_{L^2(\Omega \times [0,T])} := \mathbb{E}\left( \int_0^T \phi(\omega,t)\, \psi(\omega,t)\, dt \right) $$

Here, the corresponding norm is naturally:

$$ \forall \phi \in L^2(\Omega \times [0,T]), \quad \|\phi\|_{L^2(\Omega \times [0,T])} = \sqrt{\langle \phi, \phi \rangle_{L^2(\Omega \times [0,T])}} = \mathbb{E}\left( \int_0^T \phi^2(\omega,t)\, dt \right)^{1/2} $$

Definition 1.1.3 We denote by $M^2(\Omega \times [0,T]) \subset L^2(\Omega \times [0,T])$ the Hilbert space of progressively measurable random processes $\phi$ such that:

$$ \mathbb{E}\left( \int_0^T \phi^2(\omega,t)\, dt \right) < \infty $$

The corresponding inner product and norm are defined in a similar way as above. We also need to specify the general form of the random variables we shall be working with. More precisely:

Definition 1.1.4 We define $C \subset L^2(\Omega)$ as the set of random variables $F$ of the form:

$$ F = f\left( \int_0^T g_1(t)\, dW(t), \ldots, \int_0^T g_n(t)\, dW(t) \right) $$

where $f$ belongs to $C_p^\infty(\mathbb{R}^n)$, the set of infinitely continuously differentiable functions on $\mathbb{R}^n$ all of whose partial derivatives have polynomial growth, and where the functions $g_1, \ldots, g_n$ are in $L^2(\Omega \times [0,T])$.

1.2 The Malliavin derivative operator

We start this section by giving a precise mathematical meaning to the derivative of an element $F \in C$:

Definition 1.2.1 If $F \in C$, the Malliavin derivative $DF$ of $F$ is defined as the stochastic process $DF = (D_t F)_{0 \le t \le T}$ belonging to $L^2(\Omega \times [0,T])$ such that:

$$ D_t F = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\left( \int_0^T g_1(t)\, dW(t), \ldots, \int_0^T g_n(t)\, dW(t) \right) g_i(t) $$

One interesting feature of $C$ is that its completion with respect to a certain norm is a Banach space:

Definition 1.2.2 If $F \in C$, we define the norm $\|\cdot\|_{1,2}$ by:

$$ \|F\|_{1,2} := \mathbb{E}(F^2)^{1/2} + \mathbb{E}\left( \int_0^T (D_t F)^2\, dt \right)^{1/2} $$

Furthermore, we define the Banach space $\mathbb{D}^{1,2}$ as the completion of $C$ with respect to $\|\cdot\|_{1,2}$.

We are now able to give a first important result related to the Malliavin derivative operator $D$:

Proposition 1.2.1 The Malliavin derivative operator $D$, defined on $\mathbb{D}^{1,2}$ and with values in $L^2(\Omega \times [0,T])$, is a closed unbounded operator.

As a matter of fact, the Malliavin derivative operator shares a certain number of properties with the derivative operator of ordinary differential calculus:

Proposition 1.2.2 The Malliavin derivative operator $D$ satisfies the following properties:

1. Linearity: $\forall (\alpha, \beta) \in \mathbb{R}^2$, $\forall F, G \in \mathbb{D}^{1,2}$, $D_t(\alpha F + \beta G) = \alpha D_t(F) + \beta D_t(G)$

2. Product rule: $\forall F, G \in \mathbb{D}^{1,2}$, $D_t(FG) = F D_t(G) + G D_t(F)$

3. Chain rule: if $u : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function with bounded partial derivatives, and if $F = (F_1, \ldots, F_n)$ is a random vector with components in $\mathbb{D}^{1,2}$, then $u(F) \in \mathbb{D}^{1,2}$ and:

$$ D_t u(F) = \sum_{i=1}^n \frac{\partial u}{\partial x_i}(F)\, D_t F_i $$

1.3 The Skorohod integral and other important results

According to the Riesz representation theorem, the unboundedness of the Malliavin derivative operator $D$ enables us to define its adjoint, namely the Skorohod integral, denoted by $\delta$:

Definition 1.3.1 Let $u$ be a random process in $L^2(\Omega \times [0,T])$. Then $u \in \mathrm{Dom}(\delta)$ if for any $F \in \mathbb{D}^{1,2}$, we have:

$$ \langle DF, u \rangle_{L^2(\Omega \times [0,T])} = \mathbb{E}\left( \int_0^T (D_t F)\, u_t\, dt \right) \le C(u)\, \|F\|_{1,2} $$

where $C(u)$ is a constant which depends only on $u$. Subsequently, if $u \in \mathrm{Dom}(\delta)$, the Skorohod integral $\delta(u)$ is defined for any $F \in \mathbb{D}^{1,2}$ by:

$$ \mathbb{E}(F\, \delta(u)) = \langle DF, u \rangle_{L^2(\Omega \times [0,T])} = \mathbb{E}\left( \int_0^T (D_t F)\, u_t\, dt \right) $$

When $u \in \mathrm{Dom}(\delta)$, we say that $u$ is Skorohod-integrable, and $\delta(u)$ defined above is an element of $L^2(\Omega)$. Furthermore, the following property enables us to establish a connection between the Skorohod integral and the Itô integral:

Proposition 1.3.1 $\mathrm{Dom}(\delta)$ contains all the progressively measurable random processes in $L^2(\Omega \times [0,T])$, i.e. $\mathrm{Dom}(\delta) \supset M^2(\Omega \times [0,T])$, and the Skorohod integral restricted to $M^2(\Omega \times [0,T])$ coincides with the Itô integral.

We can now state a fundamental result related to the Skorohod integral, known as the integration by parts formula:

Theorem 1.3.1 Let $F$ be an $\mathcal{F}_T$-measurable random variable in $\mathbb{D}^{1,2}$. Then for any $u \in \mathrm{Dom}(\delta)$:

$$ \delta(F u) = F\, \delta(u) - \int_0^T (D_t F)\, u_t\, dt $$
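The duality defining $\delta$ can be illustrated numerically in a simple case (a sketch; the choices of $u$ and $F$ are illustrative assumptions). Take $u \equiv 1$ on $[0,T]$, so that $\delta(u) = W_T$ by proposition 1.3.1, and $F = W_T^3$, for which $D_t F = 3W_T^2 \mathbf{1}_{\{t \le T\}}$; then $\mathbb{E}(F\delta(u))$ and $\mathbb{E}(\int_0^T (D_tF)u_t\,dt) = \mathbb{E}(3TW_T^2)$ should agree (both equal $3T^2$):

```python
import math
import random

def duality_check(n=200_000, T=1.0, seed=11):
    """Monte Carlo check of E(F * delta(u)) = E(int_0^T (D_t F) u_t dt)
    for u == 1 (so delta(u) = W_T) and F = W_T**3 (so D_t F = 3 W_T**2)."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(n):
        wT = rng.gauss(0.0, math.sqrt(T))
        lhs += wT**3 * wT          # F * delta(u)
        rhs += 3.0 * T * wT**2     # int_0^T 3 W_T^2 dt = 3 T W_T^2
    return lhs / n, rhs / n

lhs, rhs = duality_check()
assert abs(lhs - rhs) < 0.1        # both estimate 3 T^2 = 3
```

Using the same samples on both sides keeps the difference well within Monte Carlo noise.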

We now look at how Malliavin calculus can be applied within the context of stochastic differential equations. This fundamental application will prove vital throughout the main steps of the derivation of the representation formulas for conditional expectation and its derivative.

Proposition 1.3.2 Let $X = (X_t)_{t \in [0,T]}$ be an $n$-dimensional Itô process characterized by the following dynamics:

$$ dX_t = b(X_t)\, dt + \sigma(X_t)\, dW_t $$

where $b$ and $\sigma$ are two continuously differentiable functions with bounded derivatives. Define $Y = (Y_t)_{t \in [0,T]}$ as the first variational process of $X$:

$$ dY_t = b'(X_t)\, Y_t\, dt + \sum_{i=1}^n \sigma_i'(X_t)\, Y_t\, dW_t^i, \quad Y_0 = I_n $$

where $I_n$ denotes the identity matrix of $\mathbb{R}^n$. Then $X \in \mathbb{D}^{1,2}$ and satisfies:

$$ D_s X_t = Y_t Y_s^{-1} \sigma(X_s)\, \mathbf{1}_{\{s \le t\}}, \quad s \ge 0 \text{ a.s.} $$

Moreover, if $\psi$ is a continuously differentiable function with bounded partial derivatives, we have:

$$ D_s \psi(X_T) = \nabla \psi(X_T)\, Y_T Y_s^{-1} \sigma(X_s)\, \mathbf{1}_{\{s \le T\}}, \quad s \ge 0 \text{ a.s.} $$

We now have a look at some examples of applications of the previous proposition:

Example 1.3.1 Suppose that the process $X = (X_t)_{t \in [0,T]}$ satisfies the following SDE:

$$ dX_t = r X_t\, dt + \sigma X_t\, dW_t, \quad X_0 = x $$

In this case, the first variational process $Y = (Y_t)_{t \in [0,T]}$ satisfies:

$$ dY_t = r Y_t\, dt + \sigma Y_t\, dW_t, \quad Y_0 = 1 $$

and therefore $Y_t = X_t / x$ for all $t \ge 0$. Applying proposition 1.3.2, we obtain:

$$ D_s X_t = \sigma X_t\, \mathbf{1}_{\{s \le t\}} $$

Example 1.3.2 Suppose that the process $X = (X_t)_{t \in [0,T]}$ satisfies the following SDE:

$$ dX_t = a(\theta - X_t)\, dt + \sigma\, dW_t, \quad X_0 = x $$

Here, the dynamics of the first variational process $Y = (Y_t)_{t \in [0,T]}$ are given by:

$$ dY_t = -a Y_t\, dt, \quad Y_0 = 1 $$

We therefore have $Y_t = e^{-at}$ for all $t \ge 0$, and the Malliavin derivative of $X_t$ is now given by:

$$ D_s X_t = \sigma e^{-a(t-s)}\, \mathbf{1}_{\{s \le t\}} $$

We end this chapter by introducing commutation formulas between the Malliavin derivative operator $D$ and the Riemann-Stieltjes and Itô operators of integration. These results will prove very helpful in the derivation of representation formulas for the derivative of conditional expectation.

Proposition 1.3.3 Consider a process $u = (u_t)_{t \in [0,T]}$ in $M^2(\Omega \times [0,T])$ satisfying $u_t \in \mathbb{D}^{1,2}$ for all $t \in [0,T]$ and $(D_s u_t)_{t \ge s} \in M^2(\Omega \times [s,T])$ for all $s \in [0,T]$. Then $\int_0^T u_t\, dW_t$ and $\int_0^T u_t\, dt$ belong to $\mathbb{D}^{1,2}$, and for all $s \in [0,T]$, we have the two following equalities:

$$ D_s\left( \int_0^T u_t\, dW_t \right) = u_s + \int_s^T (D_s u_t)\, dW_t $$

$$ D_s\left( \int_0^T u_t\, dt \right) = \int_s^T (D_s u_t)\, dt $$


2 Representation formulas for conditional expectation and its derivative

In this chapter, we apply the previous results in order to derive representation formulas for conditional expectation and its derivative. These formulas will enable us to approximate the price and delta of a Bermudan option using Monte Carlo simulations later on. In order to derive such formulas, we follow the approach developed in [7] and [1].

2.1 General framework

We place ourselves in the probabilistic framework set in chapter 1, taking $n = 1$. Let us recall that our aim is to take into consideration conditional expectations of the form:

$$ \mathbb{E}(f(F) \mid G = 0) $$

where $F$ and $G$ are two $\mathcal{F}_T$-measurable random variables in $\mathbb{D}^{1,2}$, and where $f$ is a Borel-measurable function with polynomial growth. We also assume that $F$ and $G$ are smooth enough in a Malliavin sense, meaning that $D_t F$ and $D_t G$ belong to $L^2(\Omega \times [0,T])$. We finally assume that $D_t G$ is non-degenerate, i.e. there exists a sufficiently smooth process $u = (u_t)_{0 \le t \le T}$ satisfying:

$$ \mathbb{E}\left( \int_0^T (D_t G)\, u_t\, dt \,\Big|\, \sigma(F, G) \right) = 1 \quad (2.1) $$

where $\sigma(F,G)$ denotes the $\sigma$-algebra generated by $F$ and $G$. As a result, we will choose $u_t = \frac{1}{T\, D_t G}$ whenever possible. We can now state the first representation formula:

Theorem 2.1.1 Assuming $F, G, u$ and $f$ satisfy the previous assumptions, and if $f$ is continuously differentiable, we have:

$$ \mathbb{E}(f(F) \mid G = 0) = \frac{\mathbb{E}\left( f(F) H(G)\, \delta(u) - f'(F) H(G) \int_0^T (D_t F)\, u_t\, dt \right)}{\mathbb{E}(H(G)\, \delta(u))} $$

where $H$ denotes the standard Heaviside function.

Proof: Using the definition of conditional expectation, we can write:

$$ \mathbb{E}(f(F) \mid G = 0) = \frac{\mathbb{E}(f(F)\, \delta_0(G))}{\mathbb{E}(\delta_0(G))} $$

where $\delta_0$ denotes the Dirac delta function at 0. On the one hand, taking derivatives in a distributional sense, we have:

$$ \begin{aligned} \mathbb{E}(\delta_0(G)) &= \mathbb{E}(H'(G)) \\ &= \mathbb{E}\left( H'(G)\, \mathbb{E}\left( \int_0^T (D_t G)\, u_t\, dt \,\Big|\, \sigma(F,G) \right) \right) \\ &= \mathbb{E}\left( \mathbb{E}\left( H'(G) \int_0^T (D_t G)\, u_t\, dt \,\Big|\, \sigma(F,G) \right) \right) \\ &= \mathbb{E}\left( H'(G) \int_0^T (D_t G)\, u_t\, dt \right) \\ &= \mathbb{E}\left( \int_0^T (D_t H(G))\, u_t\, dt \right) \\ &= \mathbb{E}(H(G)\, \delta(u)) \end{aligned} $$

On the other hand, using the integration by parts formula, we have:

$$ \begin{aligned} \mathbb{E}(H(G)\, \delta(f(F) u)) &= \mathbb{E}\left( f(F) H(G)\, \delta(u) - H(G) \int_0^T (D_t f(F))\, u_t\, dt \right) \\ &= \mathbb{E}\left( f(F) H(G)\, \delta(u) - f'(F) H(G) \int_0^T (D_t F)\, u_t\, dt \right) \end{aligned} $$

But by definition of the Skorohod integral, we see that:

$$ \begin{aligned} \mathbb{E}(H(G)\, \delta(f(F) u)) &= \mathbb{E}\left( \int_0^T (D_t H(G))\, f(F)\, u_t\, dt \right) \\ &= \mathbb{E}\left( f(F)\, \delta_0(G) \int_0^T (D_t G)\, u_t\, dt \right) \\ &= \mathbb{E}(f(F)\, \delta_0(G)) \end{aligned} $$

Remark 2.1.1 Rigorously speaking, the previous computation can be justified by the fact that:

$$ \mathbb{E}(f(F) \mid G = 0) = \lim_{\epsilon \to 0^+} \frac{\mathbb{E}(f(F)\, \mathbf{1}_{[-\epsilon, \epsilon]}(G))}{\mathbb{E}(\mathbf{1}_{[-\epsilon, \epsilon]}(G))} $$

We then use the integration by parts formula to write:

$$ \mathbb{E}(f(F)\, \mathbf{1}_{[-\epsilon,\epsilon]}(G)) = \mathbb{E}\left( f(F) H_\epsilon(G)\, \delta(u) - f'(F) H_\epsilon(G) \int_0^T (D_t F)\, u_t\, dt \right) $$

where $H_\epsilon$ denotes the Heaviside function at $\epsilon$, i.e. defined by $H_\epsilon(x) = \mathbf{1}_{\{x > \epsilon\}} + C$, $C$ being an arbitrary constant. The conclusion then comes by letting $\epsilon$ go to $0^+$.

This theorem provides us with a representation formula for conditional expectation in the case of a continuously differentiable function $f$. However, in many financial applications we often have to consider non-smooth functions, so the previous result happens to be quite restrictive as far as assumptions are concerned. In order to deal with the potential non-smoothness of $f$, we have to introduce an additional assumption on the random process $u$:

Corollary 2.1.1 We assume that $F, G, u$ and $f$ satisfy the previous assumptions, and we also assume that $u$ satisfies:

$$ \mathbb{E}\left( \int_0^T (D_t F)\, u_t\, dt \,\Big|\, \sigma(F,G) \right) = 0 \quad (2.2) $$

Then we have the following representation:

$$ \mathbb{E}(f(F) \mid G = 0) = \frac{\mathbb{E}(f(F) H(G)\, \delta(u))}{\mathbb{E}(H(G)\, \delta(u))} $$

Proof: Using the previous theorem, it suffices to prove that:

$$ \mathbb{E}\left( f'(F) H(G) \int_0^T (D_t F)\, u_t\, dt \right) = 0 $$

But this is not hard to see, as we can write:

$$ \begin{aligned} \mathbb{E}\left( f'(F) H(G) \int_0^T (D_t F)\, u_t\, dt \right) &= \mathbb{E}\left( \mathbb{E}\left( f'(F) H(G) \int_0^T (D_t F)\, u_t\, dt \,\Big|\, \sigma(F,G) \right) \right) \\ &= \mathbb{E}\left( f'(F) H(G)\, \mathbb{E}\left( \int_0^T (D_t F)\, u_t\, dt \,\Big|\, \sigma(F,G) \right) \right) \\ &= 0. \end{aligned} $$

2.2 Application to SDEs

Having introduced the representation formulas for conditional expectations of the form $\mathbb{E}(f(F) \mid G = 0)$, we now wish to specialize these results to the case of stochastic differential equations. More precisely, we consider a process $X = (X_t)_{0 \le t \le T}$ and we suppose that it satisfies the following SDE on $[0,T]$:

$$ dX_t = b(X_t)\, dt + \sigma(X_t)\, dW_t, \quad X_0 = x $$

where $b$ and $\sigma$ are two sufficiently smooth functions with bounded derivatives. We now choose $F = X_t$ and $G = X_s - \alpha$, where $\alpha \in \mathbb{R}$ and where $s$ and $t$ are such that $0 \le s \le t \le T$. Our aim is therefore to provide representation formulas for the conditional expectation $\mathbb{E}(f(X_t) \mid X_s = \alpha)$. The first step is to introduce the first variational process $Y = (Y_t)_{0 \le t \le T}$ associated with $X$. Recall that $Y$ follows the following SDE on $[0,T]$:

$$ dY_t = b'(X_t)\, Y_t\, dt + \sigma'(X_t)\, Y_t\, dW_t, \quad Y_0 = 1 $$

The next step is to ensure the non-degeneracy of the Malliavin derivative of $X_s - \alpha$. In other words, we need to determine a process $u = (u_t)_{0 \le t \le T}$ satisfying:

$$ \mathbb{E}\left( \int_0^T (D_r(X_s - \alpha))\, u_r\, dr \,\Big|\, \sigma(X_t, X_s) \right) = 1 $$

Given that $D_r(X_s - \alpha) = D_r X_s = \frac{Y_s}{Y_r}\sigma(X_r)\, \mathbf{1}_{\{r \le s\}}$, we naturally choose $u_r = \frac{Y_r}{s Y_s \sigma(X_r)}\, \mathbf{1}_{\{r \le s\}}$. As a result, the representation formula becomes:

$$ \begin{aligned} \mathbb{E}(f(X_t) \mid X_s = \alpha) &= \frac{\mathbb{E}\left( f(X_t) H(X_s - \alpha)\, \delta(u) - f'(X_t) H(X_s - \alpha) \int_0^T (D_r X_t)\, u_r\, dr \right)}{\mathbb{E}(H(X_s - \alpha)\, \delta(u))} \\ &= \frac{\mathbb{E}\left( f(X_t) H(X_s - \alpha)\, \delta(u) - f'(X_t) H(X_s - \alpha)\, \frac{Y_t}{Y_s} \right)}{\mathbb{E}(H(X_s - \alpha)\, \delta(u))} \end{aligned} $$

Before any Monte Carlo simulation can be performed with this formula, we see that one difficulty arises, namely the presence of the Skorohod integral $\delta(u)$. Indeed, the random variable $u_r = \frac{Y_r}{s Y_s \sigma(X_r)}\, \mathbf{1}_{\{r \le s\}}$ is not $\mathcal{F}_r$-measurable, and therefore the process $u$ is not $\mathcal{F}$-adapted. As a result, there is no explicit expression for $\delta(u)$ involving an Itô integral. However, if we notice that the random variable $Y_s u_r = \frac{Y_r}{s \sigma(X_r)}\, \mathbf{1}_{\{r \le s\}}$ is $\mathcal{F}_r$-measurable, we can consider the $\mathcal{F}$-adapted process $v = (v_r)_{0 \le r \le T}$ defined by $v_r = \frac{Y_r}{s \sigma(X_r)}\, \mathbf{1}_{\{r \le s\}}$ and use the integration by parts formula to determine $\delta(u)$. Assuming $Y_s$ takes positive values a.s., we can write:

$$ \begin{aligned} \delta(u) &= \frac{\delta(v)}{Y_s} + \frac{1}{Y_s} \int_0^T (D_r Y_s)\, u_r\, dr \\ &= \frac{1}{s Y_s} \int_0^s \frac{Y_r}{\sigma(X_r)}\, dW_r + \frac{1}{s Y_s^2} \int_0^s \frac{(D_r Y_s)\, Y_r}{\sigma(X_r)}\, dr \end{aligned} $$

We finally need to determine an explicit expression for $D_r Y_s$. This can be done using the following lemma:

Lemma 2.2.1 Let $Z = (Z_t)_{0 \le t \le T}$ be the second variational process associated with $X$, i.e. the process which satisfies the following SDE on $[0,T]$:

$$ dZ_t = \left( b'(X_t) Z_t + b''(X_t) Y_t^2 \right) dt + \left( \sigma'(X_t) Z_t + \sigma''(X_t) Y_t^2 \right) dW_t, \quad Z_0 = 0 $$

Then $D_r Y_s$ can be expressed as:

$$ D_r Y_s = \left( \frac{\sigma(X_r) Z_s}{Y_r} + \left( \sigma'(X_r) - \frac{\sigma(X_r) Z_r}{Y_r^2} \right) Y_s \right) \mathbf{1}_{\{r \le s\}} $$

Proof: Recall that we have:

$$ Y_s = 1 + \int_0^s b'(X_r)\, Y_r\, dr + \int_0^s \sigma'(X_r)\, Y_r\, dW_r $$

Applying the two commutation formulas introduced in proposition 1.3.3 enables us to write, for $s \ge r$:

$$ D_r Y_s = \int_r^s D_r\left( b'(X_v) Y_v \right) dv + \sigma'(X_r)\, Y_r + \int_r^s D_r\left( \sigma'(X_v) Y_v \right) dW_v $$

Therefore, for $s \ge r$, we can write:

$$ d(D_r Y_s) = \left( Y_s\, b''(X_s)(D_r X_s) + b'(X_s)(D_r Y_s) \right) ds + \left( Y_s\, \sigma''(X_s)(D_r X_s) + \sigma'(X_s)(D_r Y_s) \right) dW_s $$

Now using the fact that $D_r X_s = \frac{Y_s}{Y_r}\sigma(X_r)\, \mathbf{1}_{\{r \le s\}}$, we obtain, for $s \ge r$:

$$ d(D_r Y_s) = \left( Y_s^2\, b''(X_s)\, \frac{\sigma(X_r)}{Y_r} + b'(X_s)(D_r Y_s) \right) ds + \left( Y_s^2\, \sigma''(X_s)\, \frac{\sigma(X_r)}{Y_r} + \sigma'(X_s)(D_r Y_s) \right) dW_s $$

Consequently, given the dynamics of the process $Z$, the process $R$ defined by $R_s = D_r Y_s - \frac{\sigma(X_r) Z_s}{Y_r}$ satisfies the following SDE, for all $s \in [r, t]$:

$$ dR_s = b'(X_s)\, R_s\, ds + \sigma'(X_s)\, R_s\, dW_s $$

The solution of this SDE can be readily expressed as, for $s \ge r$:

$$ R_s = R_r\, \frac{Y_s}{Y_r} = \left( \sigma'(X_r) - \frac{\sigma(X_r) Z_r}{Y_r^2} \right) Y_s $$

We then deduce the required expression for $D_r Y_s$.

The representation formula is finally obtained in the following theorem:

Theorem 2.2.1 Assuming $f$ is a continuously differentiable function, we have for all $\alpha \in \mathbb{R}$:

$$ \mathbb{E}(f(X_t) \mid X_s = \alpha) = \frac{\mathbb{E}\left( f(X_t) H(X_s - \alpha)\, \delta(u) - f'(X_t) H(X_s - \alpha)\, \frac{Y_t}{Y_s} \right)}{\mathbb{E}(H(X_s - \alpha)\, \delta(u))} $$

where the Skorohod integral $\delta(u)$ is given by:

$$ \delta(u) = \frac{1}{s Y_s}\left( \int_0^s \frac{Y_r}{\sigma(X_r)}\, dW_r + \frac{s Z_s}{Y_s} + \int_0^s \left( \frac{\sigma'(X_r) Y_r}{\sigma(X_r)} - \frac{Z_r}{Y_r} \right) dr \right) $$

Once again, we notice that the previous theorem requires the smoothness of $f$. In order to take into account the case of non-smooth functions, recall that the process $u$ must now also satisfy the following condition:

$$ \mathbb{E}\left( \int_0^T (D_r X_t)\, u_r\, dr \,\Big|\, \sigma(X_t, X_s) \right) = 0 $$

Choosing

$$ u_r = \frac{Y_r}{\sigma(X_r) Y_s}\left( \frac{1}{s}\, \mathbf{1}_{\{r \le s\}} - \frac{1}{t-s}\, \mathbf{1}_{\{s \le r \le t\}} \right) $$

$u$ satisfies the two required conditions (2.1) and (2.2), and the representation formula becomes:

Theorem 2.2.2 For all $\alpha \in \mathbb{R}$, we have:

$$ \mathbb{E}(f(X_t) \mid X_s = \alpha) = \frac{\mathbb{E}(f(X_t) H(X_s - \alpha)\, \delta(u))}{\mathbb{E}(H(X_s - \alpha)\, \delta(u))} $$

where the Skorohod integral $\delta(u)$ is now given by:

$$ \delta(u) = \frac{1}{s Y_s}\left( \int_0^s \frac{Y_r}{\sigma(X_r)}\, dW_r - \frac{s}{t-s} \int_s^t \frac{Y_r}{\sigma(X_r)}\, dW_r + \frac{s Z_s}{Y_s} + \int_0^s \left( \frac{\sigma'(X_r) Y_r}{\sigma(X_r)} - \frac{Z_r}{Y_r} \right) dr \right) $$

We now apply the previous representation formulas to specific stochastic differential equations. We start with the case of geometric Brownian motion:

Example 2.2.1 Suppose that the process $X = (X_t)_{0 \le t \le T}$ satisfies the following SDE on $[0,T]$:

$$ dX_t = r X_t\, dt + \sigma X_t\, dW_t, \quad X_0 = x $$

In this particular case, the first and second variational processes are respectively given by $Y_t = X_t/x$ and $Z_t = 0$ for all $t \in [0,T]$. We can therefore explicitly determine the Skorohod integral involved in the representation formula of $\mathbb{E}(f(X_t) \mid X_s = \alpha)$. Using the expression of $\delta(u)$ given in theorem 2.2.2, we have:

$$ \delta(u) = \frac{1}{X_s}\left( \frac{t W_s - s W_t}{\sigma s (t-s)} + 1 \right) $$

Consequently, for all $\alpha \in \mathbb{R}$, we have:

$$ \mathbb{E}(f(X_t) \mid X_s = \alpha) = \frac{\mathbb{E}\left( f(X_t) H(X_s - \alpha)\, \frac{1}{X_s}\left( \frac{t W_s - s W_t}{\sigma s(t-s)} + 1 \right) \right)}{\mathbb{E}\left( H(X_s - \alpha)\, \frac{1}{X_s}\left( \frac{t W_s - s W_t}{\sigma s(t-s)} + 1 \right) \right)} $$

We then apply the results to the case of an Ornstein-Uhlenbeck process:

Example 2.2.2 Suppose that the process $X = (X_t)_{0 \le t \le T}$ satisfies the following SDE on $[0,T]$:

$$ dX_t = a(\theta - X_t)\, dt + \sigma\, dW_t, \quad X_0 = x $$

Here, the first and second variational processes are respectively given by $Y_t = e^{-at}$ and $Z_t = 0$ for all $t \in [0,T]$. The expression of $\delta(u)$ in theorem 2.2.2 gives us:

$$ \delta(u) = \frac{1}{\sigma s e^{-as}}\left( \int_0^s e^{-ar}\, dW_r - \frac{s}{t-s} \int_s^t e^{-ar}\, dW_r \right) $$

We therefore have, for all $\alpha \in \mathbb{R}$:

$$ \mathbb{E}(f(X_t) \mid X_s = \alpha) = \frac{\mathbb{E}\left( f(X_t) H(X_s - \alpha)\left( \int_0^s e^{-ar}\, dW_r - \frac{s}{t-s}\int_s^t e^{-ar}\, dW_r \right) \right)}{\mathbb{E}\left( H(X_s - \alpha)\left( \int_0^s e^{-ar}\, dW_r - \frac{s}{t-s}\int_s^t e^{-ar}\, dW_r \right) \right)} $$

In order to be able to perform a Monte Carlo simulation using the above expression, we need to determine the distribution of $\delta(u)$. More precisely, for all $s \in [0,T]$, we have to determine the distribution of the random variable $A_s$ defined by $A_s = \sigma s e^{-as}\, \delta(u)$. We have:

$$ \begin{aligned} A_s &= \int_0^s e^{-ar}\, dW_r - \frac{s}{t-s} \int_s^t e^{-ar}\, dW_r \\ &= \frac{1}{t-s} \int_0^t e^{-ar}\left( (t-s)\, \mathbf{1}_{\{r \le s\}} - s\, \mathbf{1}_{\{r \ge s\}} \right) dW_r \\ &= \frac{1}{t-s} \int_0^t e^{-ar}\left( t\, \mathbf{1}_{\{r \le s\}} - s \right) dW_r \end{aligned} $$

A straightforward computation gives us $A_s \sim \mathcal{N}(0, a_s)$, where $a_s$ is given by, for all $s \in [0,T]$:

$$ a_s = \frac{1}{2a(t-s)^2}\left( \left( 1 - e^{-2as} \right) t(t - 2s) + s^2\left( 1 - e^{-2at} \right) \right) $$

Finally this gives us, for all $\alpha \in \mathbb{R}$:

$$ \mathbb{E}(f(X_t) \mid X_s = \alpha) = \frac{\mathbb{E}\left( f(X_t) H(X_s - \alpha)\, N\sqrt{a_s} \right)}{\mathbb{E}\left( H(X_s - \alpha)\, N\sqrt{a_s} \right)} $$

where $N$ is a standard Gaussian random variable.

We now wish to derive representation formulas for the derivative of conditional expectation, i.e. for expressions of the form $\partial_\alpha \mathbb{E}(f(X_t) \mid X_s = \alpha)$. These formulas will subsequently enable us to compute the delta of Bermudan options. We only consider the more general case where $f$ is not necessarily smooth, and therefore our starting point is the second representation formula, given in theorem 2.2.2:

Theorem 2.2.3 For all $\alpha \in \mathbb{R}$, we have:

$$ \partial_\alpha \mathbb{E}(f(X_t) \mid X_s = \alpha) = \frac{\mathbb{E}(H(X_s - \alpha)\, \delta(u))\, L_{s,t}[f](\alpha) - \mathbb{E}(f(X_t) H(X_s - \alpha)\, \delta(u))\, L_{s,t}[1](\alpha)}{\mathbb{E}(H(X_s - \alpha)\, \delta(u))^2} $$

where the operator $L_{s,t}[\cdot](\alpha)$ is given by, for all $\alpha \in \mathbb{R}$:

$$ L_{s,t}[\cdot](\alpha) = \mathbb{E}\left( \cdot(X_t)\, H(X_s - \alpha)\left( \delta^2(u) - \int_0^t (D_r \delta(u))\, u_r\, dr \right) \right) $$

and where the Skorohod integral $\delta(u)$ is given by:

$$ \delta(u) = \frac{1}{s Y_s}\left( \int_0^s \frac{Y_r}{\sigma(X_r)}\, dW_r - \frac{s}{t-s} \int_s^t \frac{Y_r}{\sigma(X_r)}\, dW_r + \frac{s Z_s}{Y_s} + \int_0^s \left( \frac{\sigma'(X_r) Y_r}{\sigma(X_r)} - \frac{Z_r}{Y_r} \right) dr \right) $$

Proof: We need to show that $\partial_\alpha \mathbb{E}(f(X_t) H(X_s - \alpha)\, \delta(u)) = L_{s,t}[f](\alpha)$ and that $\partial_\alpha \mathbb{E}(H(X_s - \alpha)\, \delta(u)) = L_{s,t}[1](\alpha)$. Assuming we can interchange the differentiation and expectation operators, we have:

$$ \partial_\alpha \mathbb{E}(f(X_t) H(X_s - \alpha)\, \delta(u)) = \mathbb{E}\left( f(X_t)\, \partial_\alpha(H(X_s - \alpha))\, \delta(u) \right) = \mathbb{E}(f(X_t)\, \delta_\alpha(X_s)\, \delta(u)) $$

where $\delta_\alpha$ denotes the Dirac delta function at $\alpha$. Furthermore, using the specific expression of $u$ and the fact that $\mathbb{E}\left( \int_0^t (D_r f(X_t))\, u_r\, dr \mid \sigma(f(X_t), X_s) \right) = 0$, we can write:

$$ \begin{aligned} \partial_\alpha \mathbb{E}(f(X_t) H(X_s - \alpha)\, \delta(u)) &= \mathbb{E}(f(X_t)\, \delta_\alpha(X_s)\, \delta(u)) \\ &= \mathbb{E}\left( f(X_t)\, \delta_\alpha(X_s)\, \delta(u) \int_0^t (D_r X_s)\, u_r\, dr \right) \\ &= \mathbb{E}\left( \int_0^t \left( D_r\left( H(X_s - \alpha) f(X_t) \right) \right) \delta(u)\, u_r\, dr \right) \\ &= \mathbb{E}\left( H(X_s - \alpha)\, f(X_t)\, \delta(\delta(u) u) \right) \\ &= \mathbb{E}\left( f(X_t) H(X_s - \alpha)\left( \delta^2(u) - \int_0^t (D_r \delta(u))\, u_r\, dr \right) \right) \\ &= L_{s,t}[f](\alpha) \end{aligned} $$

where we have used the integration by parts formula between the last two lines. In order to show that $\partial_\alpha \mathbb{E}(H(X_s - \alpha)\, \delta(u)) = L_{s,t}[1](\alpha)$, we repeat the same steps taking $f \equiv 1$.

We now return to our two examples in order to derive expressions for the derivative of conditional expectation. Using theorem 2.2.3, we see that the only difficulty is to determine an expression for $D_r \delta(u)$.

Example 2.2.3 In the case of geometric Brownian motion, recall that the Skorohod integral $\delta(u)$ is given by:

$$ \delta(u) = \frac{1}{X_s}\left( \frac{t W_s - s W_t}{\sigma s (t-s)} + 1 \right) $$

We then have:

$$ \begin{aligned} D_r \delta(u) &= -\frac{1}{X_s^2}\left( \frac{t W_s - s W_t}{\sigma s(t-s)} + 1 \right)(D_r X_s) + \frac{1}{\sigma s(t-s) X_s}\, D_r\left( t W_s - s W_t \right) \\ &= -\frac{\sigma}{X_s}\left( \frac{t W_s - s W_t}{\sigma s(t-s)} + 1 \right)\mathbf{1}_{\{r \le s\}} + \frac{t\, \mathbf{1}_{\{r \le s\}} - s\, \mathbf{1}_{\{r \le t\}}}{\sigma s(t-s) X_s} \end{aligned} $$

A straightforward computation then gives us, after rearranging all the terms:

$$ \int_0^t (D_r \delta(u))\, u_r\, dr = \frac{1}{X_s^2}\left( \frac{t}{\sigma^2 s(t-s)} - \frac{\Delta W_{s,t}}{\sigma s(t-s)} \right) $$

with $\Delta W_{s,t} = t W_s - s W_t + \sigma s(t-s)$. We then have:

$$ \delta^2(u) - \int_0^t (D_r \delta(u))\, u_r\, dr = \frac{1}{X_s^2}\left( \frac{\Delta W_{s,t}^2}{(\sigma s(t-s))^2} - \frac{t}{\sigma^2 s(t-s)} + \frac{\Delta W_{s,t}}{\sigma s(t-s)} \right) $$

This finally yields, for all $\alpha \in \mathbb{R}$:

$$ L_{s,t}[\cdot](\alpha) = \mathbb{E}\left( \cdot(X_t)\, H(X_s - \alpha)\, \frac{1}{X_s^2}\left( \frac{\Delta W_{s,t}^2}{(\sigma s(t-s))^2} - \frac{t}{\sigma^2 s(t-s)} + \frac{\Delta W_{s,t}}{\sigma s(t-s)} \right) \right) $$

The quantity $\partial_\alpha \mathbb{E}(f(X_t) \mid X_s = \alpha)$ can then be readily computed by plugging the above expression of $L_{s,t}[\cdot](\alpha)$ into the formula given by theorem 2.2.3.

Example 2.2.4 In the case of the Ornstein-Uhlenbeck process, the Skorohod integral $\delta(u)$ is given by:

$$ \delta(u) = \frac{1}{\sigma s(t-s) e^{-as}} \int_0^t e^{-ar}\left( t\, \mathbf{1}_{\{r \le s\}} - s \right) dW_r $$

The first step is to compute $D_r \delta(u)$. Since the integrand is deterministic, proposition 1.3.3 gives, for $r \le t$:

$$ D_r \delta(u) = \frac{e^{-ar}\left( t\, \mathbf{1}_{\{r \le s\}} - s \right)}{\sigma s(t-s) e^{-as}} = \frac{e^{-a(r-s)}}{\sigma s}\, \mathbf{1}_{\{r \le s\}} - \frac{e^{-a(r-s)}}{\sigma(t-s)}\, \mathbf{1}_{\{s < r \le t\}} $$

Given that $u_r = \frac{e^{-a(r-s)}}{\sigma}\left( \frac{1}{s}\, \mathbf{1}_{\{r \le s\}} - \frac{1}{t-s}\, \mathbf{1}_{\{s \le r \le t\}} \right)$, we now have:

$$ \begin{aligned} \int_0^t (D_r \delta(u))\, u_r\, dr &= \frac{1}{\sigma^2 s^2} \int_0^s e^{-2a(r-s)}\, dr + \frac{1}{\sigma^2 (t-s)^2} \int_s^t e^{-2a(r-s)}\, dr \\ &= \frac{1}{2a s^2 \sigma^2}\left( e^{2as} - 1 \right) + \frac{1}{2a(t-s)^2 \sigma^2}\left( 1 - e^{-2a(t-s)} \right) \end{aligned} $$

This finally gives us, for all $\alpha \in \mathbb{R}$:

$$ \begin{aligned} L_{s,t}[\cdot](\alpha) &= \mathbb{E}\left( \cdot(X_t)\, H(X_s - \alpha)\, \frac{1}{\sigma^2 s^2 (t-s)^2 e^{-2as}}\left( \int_0^t e^{-ar}\left( t\, \mathbf{1}_{\{r \le s\}} - s \right) dW_r \right)^2 \right) \\ &\quad - \left( \frac{1}{2a s^2 \sigma^2}\left( e^{2as} - 1 \right) + \frac{1}{2a(t-s)^2 \sigma^2}\left( 1 - e^{-2a(t-s)} \right) \right) \mathbb{E}\left( \cdot(X_t)\, H(X_s - \alpha) \right) \end{aligned} $$

2.3 The notion of localization

In the proof of theorem 2.1.1, the use of Bayes' formula within the context of conditional expectation introduces a Dirac delta function, which is very irregular. The technique involving the integration by parts formula enables us to "eliminate" the Dirac delta function, and we subsequently obtain an expression with a Heaviside function $H$. However, it has to be said that this procedure yields an increase in the variance of the resulting estimator, an increase specifically related to the presence of $H$. In order to overcome this potential drawback, the notion of localization has been developed in recent years (for example in [7] and [1]). Such a procedure involves the use of a certain localizing function $\psi : \mathbb{R} \to [0, \infty)$ satisfying $\int_{\mathbb{R}} \psi(t)\, dt = 1$. Let us mention that the use of the localized version of the representation formulas is absolutely necessary if one wishes to obtain satisfactory convergence results in the context of Monte Carlo simulation.

We now return to the case where the function $f$ is not necessarily smooth, and therefore the random process $u$ satisfies the two required conditions (2.1) and (2.2). The localized version of the representation theorem, both for conditional expectation and its derivative, is the following:

Theorem 2.3.1 For any function $\psi : \mathbb{R} \to [0, \infty)$ satisfying $\int_{\mathbb{R}} \psi(t)\, dt = 1$, and for all $\alpha \in \mathbb{R}$, we have:

$$ \mathbb{E}(f(X_t) \mid X_s = \alpha) = \frac{J^{\psi}_{s,t}[f](\alpha)}{J^{\psi}_{s,t}[1](\alpha)} $$

where the operator $J^{\psi}_{s,t}[\cdot](\alpha)$ is given by, for all $\alpha \in \mathbb{R}$:

$$ J^{\psi}_{s,t}[\cdot](\alpha) = \mathbb{E}\left( \cdot(X_t)\left( \psi(X_s - \alpha) + \delta(u)\left( H(X_s - \alpha) - \Psi(X_s - \alpha) \right) \right) \right) $$

and where $\Psi$ is the cumulative distribution function associated with $\psi$, i.e. such that $\Psi(x) = \int_{-\infty}^x \psi(t)\, dt$. Furthermore, for all $\alpha \in \mathbb{R}$, we have:

$$ \partial_\alpha \mathbb{E}(f(X_t) \mid X_s = \alpha) = \frac{J^{\psi}_{s,t}[1](\alpha)\, L^{\psi}_{s,t}[f](\alpha) - J^{\psi}_{s,t}[f](\alpha)\, L^{\psi}_{s,t}[1](\alpha)}{J^{\psi}_{s,t}[1](\alpha)^2} $$


where the operator $L^{\psi}_{s,t}[\cdot](\alpha)$ is given by, for all $\alpha \in \mathbb{R}$:

$$ L^{\psi}_{s,t}[\cdot](\alpha) = \mathbb{E}\left( \cdot(X_t)\, \psi(X_s - \alpha)\, \delta(u) \right) + \mathbb{E}\left( \cdot(X_t)\left( H(X_s - \alpha) - \Psi(X_s - \alpha) \right)\left( \delta^2(u) - \int_0^t (D_r \delta(u))\, u_r\, dr \right) \right) $$

Note that the Skorohod integral $\delta(u)$ is still given by:

$$ \delta(u) = \frac{1}{s Y_s}\left( \int_0^s \frac{Y_r}{\sigma(X_r)}\, dW_r - \frac{s}{t-s} \int_s^t \frac{Y_r}{\sigma(X_r)}\, dW_r + \frac{s Z_s}{Y_s} + \int_0^s \left( \frac{\sigma'(X_r) Y_r}{\sigma(X_r)} - \frac{Z_r}{Y_r} \right) dr \right) $$

Proof: According to theorem 2.2.2, we have for all $\alpha \in \mathbb{R}$:

$$ \mathbb{E}(f(X_t) \mid X_s = \alpha) = \frac{\mathbb{E}(f(X_t) H(X_s - \alpha)\, \delta(u))}{\mathbb{E}(H(X_s - \alpha)\, \delta(u))} $$

We then write:

$$ \begin{aligned} \mathbb{E}(f(X_t) H(X_s - \alpha)\, \delta(u)) &= \mathbb{E}(f(X_t)\, \psi(X_s - \alpha)) + \mathbb{E}(f(X_t) H(X_s - \alpha)\, \delta(u)) - \mathbb{E}(f(X_t)\, \psi(X_s - \alpha)) \\ &= \mathbb{E}(f(X_t)\, \psi(X_s - \alpha)) + \mathbb{E}(f(X_t) H(X_s - \alpha)\, \delta(u)) - \mathbb{E}(f(X_t)\, \Psi'(X_s - \alpha)) \end{aligned} $$

But then we can apply exactly the same argument as in the proof of theorem 2.1.1 (i.e. using the definition of the Skorohod integral and the integration by parts formula) to notice that $\mathbb{E}(f(X_t)\, \Psi'(X_s - \alpha)) = \mathbb{E}(f(X_t)\, \Psi(X_s - \alpha)\, \delta(u))$. This gives us the required result for the numerator. As far as the denominator is concerned, we apply the previous argument taking $f \equiv 1$.

Moreover, let us notice that in order to obtain the representation formula for the derivative of conditional expectation, it now suffices to prove that $L^{\psi}_{s,t}[\cdot](\alpha) \equiv L_{s,t}[\cdot](\alpha)$ for all $\alpha \in \mathbb{R}$. We apply the same reasoning as above, introducing the quantity $\pi = \delta^2(u) - \int_0^t (D_r \delta(u))\, u_r\, dr$:

$$ \begin{aligned} L_{s,t}[f](\alpha) &= \mathbb{E}(f(X_t) H(X_s - \alpha)\, \pi) \\ &= \mathbb{E}(f(X_t)\, \psi(X_s - \alpha)\, \delta(u)) + \mathbb{E}(f(X_t) H(X_s - \alpha)\, \pi) - \mathbb{E}(f(X_t)\, \psi(X_s - \alpha)\, \delta(u)) \\ &= \mathbb{E}(f(X_t)\, \psi(X_s - \alpha)\, \delta(u)) + \mathbb{E}(f(X_t) H(X_s - \alpha)\, \pi) - \mathbb{E}(f(X_t)\, \Psi'(X_s - \alpha)\, \delta(u)) \\ &= \mathbb{E}(f(X_t)\, \psi(X_s - \alpha)\, \delta(u)) + \mathbb{E}(f(X_t) H(X_s - \alpha)\, \pi) - \mathbb{E}(f(X_t)\, \Psi(X_s - \alpha)\, \pi) \\ &= L^{\psi}_{s,t}[f](\alpha) \end{aligned} $$

where we have used the definition of the Skorohod integral and the integration by parts formula between lines 3 and 4. The same reasoning can be applied to the specific case $f \equiv 1$, and we get the required result.

Now that we have introduced the localized versions of the representation formulas for conditional expectation and its derivative, one natural question arises, namely the optimal choice of the localizing function ψ. Recall that in order to evaluate the quantity E(f (Xt )|Xs = α), we need to compute Jψ s,t [.](α), which is given by, for all α ∈ R: Jψ s,t [.](α) = E (.(Xt ) (ψ(Xs − α) + δ(u)(H(Xs − α) − Ψ(Xs − α)))) Practically speaking, this expectation is computed via Monte Carlo simulation, and the estimator which is used is simply the associated empirical mean. More precisely, if we denote by N the number of simulated paths, we have the following approximation, valid for all α ∈ R: Jψ s,t [.](α) ≈

N 1 X (q) .(Xt ) ψ(Xs(q) − α) + δ(u)(H(Xs(q) − α) − Ψ(Xs(q) − α)) N q=1

In order to reduce the variance as much as possible, the idea is to minimize the mean integrated variance with respect to the localizing function ψ. In other words, we have to solve the following program: inf I(ψ)

ψ∈L1

R where L1 = {ψ : R → [0, ∞) : ψ ∈ C 1 (R), ψ(+∞) = 0, R ψ(t)dt = 1} and where the mean integrated variance I(ψ) is given by: Z 2 I(ψ) = E .2 (Xt ) (ψ(Xs − α) + δ(u)(H(Xs − α) − Ψ(Xs − α))) dα R

The choice of the optimal localizing function \(\psi\) is given in the following proposition.

Proposition 2.3.1 We have \(\inf_{\psi \in \mathcal{L}^1} I(\psi) = I(\psi^*)\), where \(\psi^*\) is the probability density function of the Laplace distribution with parameter \(\lambda^*_\cdot\), i.e. for all \(t \in \mathbb{R}\),
\[
\psi^*(t) = \frac{\lambda^*_\cdot}{2}\, e^{-\lambda^*_\cdot |t|},
\]
where \(\lambda^*_\cdot\) is given by:
\[
\lambda^*_\cdot = \left( \frac{E\big(\cdot^2(X_t)\,\delta^2(u)\big)}{E\big(\cdot^2(X_t)\big)} \right)^{1/2}
\]

Remark 2.3.1 The cumulative distribution function \(\Psi\) associated with the Laplace probability density function with parameter \(\lambda^*_\cdot\) is given, for all \(t \in \mathbb{R}\), by:
\[
\Psi(t) = \Big(1 - \tfrac{1}{2}e^{-\lambda^*_\cdot t}\Big)\mathbf{1}_{\{t \geq 0\}} + \tfrac{1}{2}e^{\lambda^*_\cdot t}\,\mathbf{1}_{\{t < 0\}}
\]

For the Ornstein-Uhlenbeck process, for all \(t > s\), we have the following distribution equality:
\[
X_t = X_s e^{-a(t-s)} + \theta\big(1 - e^{-a(t-s)}\big) + \sigma\sqrt{\frac{1 - e^{-2a(t-s)}}{2a}}\, Z
\]
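In a simulation, \(\lambda^*_\cdot\) can itself be estimated from the same sample, after which \(\psi^*\) and \(\Psi\) follow in closed form. A minimal sketch (helper names are ours):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample estimate of lambda* = (E(f^2(X_t) delta^2(u)) / E(f^2(X_t)))^{1/2}.
double lambda_star(const std::vector<double>& f_Xt,
                   const std::vector<double>& delta_u) {
    double num = 0.0, den = 0.0;
    for (std::size_t q = 0; q < f_Xt.size(); ++q) {
        num += f_Xt[q] * f_Xt[q] * delta_u[q] * delta_u[q];
        den += f_Xt[q] * f_Xt[q];
    }
    return std::sqrt(num / den);
}

// Optimal Laplace localizing density psi*(t) = (lambda/2) e^{-lambda |t|} ...
double psi_star(double t, double lambda) {
    return 0.5 * lambda * std::exp(-lambda * std::fabs(t));
}

// ... and its cumulative distribution function Psi.
double Psi_star(double t, double lambda) {
    return (t >= 0.0) ? 1.0 - 0.5 * std::exp(-lambda * t)
                      : 0.5 * std::exp(lambda * t);
}
```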

where \(Z\) is a standard Gaussian random variable. In order to obtain the Ornstein-Uhlenbeck bridge, for a given sample \(q \in \{1,\dots,N\}\), we start by generating \(\hat{X}^{(q)}_{\delta t}, \dots, \hat{X}^{(q)}_{M\delta t}\) defined by
\[
\hat{X}^{(q)}_{k\delta t} = \hat{X}^{(q)}_{(k-1)\delta t}\, e^{-a\delta t} + \theta\big(1 - e^{-a\delta t}\big) + \sigma\sqrt{\frac{1 - e^{-2a\delta t}}{2a}}\, Z^{(q)}_k
\]
where the random variables \(Z^{(q)}_k\) are independent. We then set \(X^{(q)}_{M\delta t}\) in the following way:
\[
X^{(q)}_{M\delta t} = x\, e^{-aM\delta t} + \theta\big(1 - e^{-aM\delta t}\big) + \sigma\sqrt{\frac{1 - e^{-2aM\delta t}}{2a}}\, Z^{(q)}_M
\]
We finally go back in time using the following distribution equality, valid for all \(k \in \{M-1,\dots,1\}\):
\[
X^{(q)}_{k\delta t} = \hat{X}^{(q)}_{k\delta t} + \big(X^{(q)}_{M\delta t} - \hat{X}^{(q)}_{M\delta t}\big)\, \frac{e^{ak\delta t} - e^{-ak\delta t}}{e^{aM\delta t} - e^{-aM\delta t}}
\]
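This three-step construction (forward path, terminal draw, backward bridging) can be sketched in C++; this is an illustrative version only, with a function name of our own choosing.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Ornstein-Uhlenbeck bridge: simulate a forward path X_hat with the exact
// transition, draw the terminal value X_{M dt} directly from the initial
// point x, then pull the path onto this terminal value with sinh-type weights.
std::vector<double> ou_bridge(double x, double a, double theta, double sigma,
                              double dt, int M, std::mt19937& gen) {
    std::normal_distribution<double> Z(0.0, 1.0);
    const double e = std::exp(-a * dt);
    const double s = sigma * std::sqrt((1.0 - std::exp(-2.0 * a * dt)) / (2.0 * a));
    std::vector<double> Xhat(M + 1);
    Xhat[0] = x;
    for (int k = 1; k <= M; ++k)              // forward pass, exact OU transition
        Xhat[k] = Xhat[k - 1] * e + theta * (1.0 - e) + s * Z(gen);
    const double T = M * dt;
    const double XM = x * std::exp(-a * T) + theta * (1.0 - std::exp(-a * T))
        + sigma * std::sqrt((1.0 - std::exp(-2.0 * a * T)) / (2.0 * a)) * Z(gen);
    std::vector<double> X(M + 1);
    X[0] = x;
    X[M] = XM;
    for (int k = 1; k < M; ++k)               // backward bridging, k = 1..M-1
        X[k] = Xhat[k] + (XM - Xhat[M])
             * (std::exp(a * k * dt) - std::exp(-a * k * dt))
             / (std::exp(a * T) - std::exp(-a * T));
    return X;
}
```

Note that the bridging weight equals one at \(k = M\), so the corrected path agrees with the drawn terminal value by construction.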

Remark 3.2.1 Apart from the use of localizing functions to reduce the variance of the two previous estimators, Bally et al. [1] explain how the introduction of a control variate can be used to gain more precision. Let us first recall the main idea behind this variance reduction technique. Heuristically speaking, if we want to approximate the expectation \(E(f)\) of a given payoff \(f\), and if there exists another payoff \(g\) whose expectation is explicitly known, it is possible to reduce the variance of the estimator \(\frac{1}{N}\sum_{q=1}^{N} f^{(q)}\) by replacing it by
\[
\frac{1}{N}\sum_{q=1}^{N} f^{(q)} - \lambda\Big(\frac{1}{N}\sum_{q=1}^{N} g^{(q)} - E(g)\Big)
\]
where \(\lambda\) is a real constant to be determined. It can be shown that the variance of the new estimator is equal to \(\frac{1}{N}\big(Var(f) - 2\lambda\,Cov(f,g) + \lambda^2\,Var(g)\big)\), and this quantity is minimal for \(\lambda^* = \frac{Cov(f,g)}{Var(g)}\). The resulting variance, equal to \(\frac{1}{N}Var(f)\big(1 - Corr(f,g)^2\big)\), precisely shows that the idea is to choose a payoff \(g\) which is well correlated with \(f\) in absolute value. In the particular case of Bermudan options, Bally et al. [1] suggest the use of the associated European option as a control variate. If we denote by \(V^B(t,x)\) and \(V^E(t,x)\) the respective prices of a Bermudan and a European call with the same maturity and strike, we define the quantity \(V(t,x) = V^B(t,x) - V^E(t,x)\). We then introduce the new time-dependent obstacle \(\hat{\Phi}(t,x) = \Phi(x) - V^E(t,x)\), which naturally satisfies \(\hat{\Phi}(T,x) = 0\), and by construction we have:
\[
V(t,x) = \sup_{\tau \in \mathcal{T}_{t,T}} E_{t,x}\Big(e^{-r(\tau - t)}\,\hat{\Phi}(\tau, X_\tau)\Big)
\]
In order to obtain the approximations of the price and delta of the Bermudan option, we first carry out all the previous steps with the new obstacle \(\hat{\Phi}(t,x)\), and then we simply add the associated price and delta of the European call at time 0 to the estimators \(v_0(x)\) and \(w_0(x)\).
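The control-variate correction itself is a few lines of code. The following is a minimal sketch with a helper name of our own; in the Bermudan setting, \(f^{(q)}\) would be the simulated Bermudan payoff along path \(q\) and \(g^{(q)}\) the European one.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Control-variate estimator: mean(f) - lambda*(mean(g) - E(g)), with the
// variance-minimizing coefficient lambda* = Cov(f,g)/Var(g) estimated
// from the same sample of simulated payoffs.
double cv_estimate(const std::vector<double>& f,
                   const std::vector<double>& g,
                   double Eg) {                  // Eg: known expectation of g
    const std::size_t N = f.size();
    double mf = 0.0, mg = 0.0;
    for (std::size_t q = 0; q < N; ++q) { mf += f[q]; mg += g[q]; }
    mf /= static_cast<double>(N);
    mg /= static_cast<double>(N);
    double cov = 0.0, varg = 0.0;
    for (std::size_t q = 0; q < N; ++q) {
        cov  += (f[q] - mf) * (g[q] - mg);
        varg += (g[q] - mg) * (g[q] - mg);
    }
    const double lambda = (varg > 0.0) ? cov / varg : 0.0;  // lambda*
    return mf - lambda * (mg - Eg);
}
```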


3.3 Numerical applications

In this final section, we discuss numerical results obtained after implementing the Malliavin pricing and hedging algorithm in C++. As previously mentioned, the algorithm does not work numerically unless the localization method is taken into account; the following results are therefore obtained with the use of localizing functions. We base ourselves on the two main examples used throughout the dissertation, namely geometric Brownian motion and the Ornstein-Uhlenbeck process.

We start with the case of geometric Brownian motion. To this purpose, we consider a non-dividend-paying Bermudan put option with strike K = 100 and maturity T = 1 year on an underlying stock with the following characteristics:

• Annual risk-free interest rate r = 9.53%.
• Annual volatility of returns σ = 20%.
• Initial stock value x = 100.

According to online pricers (for example, the one found at http://www.intrepid.com/robertl/option-pricer1.html), the "reference" price and delta of an American option with the same characteristics are 4.891 and -0.403. The following results are given with the number of paths N ranging from 100 to 20000, and the number of time steps M ranging from 10 to 30. Given that an American option can be exercised at any time before maturity, its price should in principle be greater than the price of the associated Bermudan option, which can only be exercised at a finite number of dates. We should therefore expect a convergence of the approximated prices towards the "upper bound" of 4.891:

    N        M = 10    M = 20    M = 30
    100       7.058    15.289    10.808
    500       5.436     5.367     5.665
    1000      5.318     5.298     5.154
    2000      5.255     5.187     4.739
    3000      5.109     4.769     4.803
    10000     4.765     4.792     4.821
    20000     4.777     4.814     4.829

Table 3.1: Approximation of the option's price with localization

The results given in Table 3.1 illustrate several interesting points. To start with, the algorithm is very unstable for a very low number of paths, and only starts to provide reasonable prices from N = 1000 onwards. Furthermore, the condition that the Bermudan option's price should always be less than the "reference" value of the associated American option is not always satisfied; it only happens to be verified when the number of paths is sufficiently large. Finally, taking a minimal number of paths equal to 10000, we do observe the expected increasing order of prices as the number of time steps (i.e. the exercise dates) increases.

We now have a look at the standard deviation of the Malliavin price estimator as a function of the number of paths, with different time steps taken into account:

Figure 3.1: Standard Deviation of the Malliavin price estimator as a function of the number of paths

Figure 3.1 above underscores two fundamental aspects. On the one hand, regardless of the number of chosen time steps, we observe a decrease in the estimator's standard deviation as the number of paths increases; this is a standard property of Monte Carlo estimators. On the other hand, the more interesting result is that the three curves plot above one another in a very particular order: for a given sample size, as the number of exercise dates increases, the standard deviation of the estimator also increases. This shows precisely that if we want to increase the number of exercise dates, we also need to increase the number of paths in order to obtain the same precision. This trade-off between the number of exercise dates and standard deviation is underlined within the context of Bermudan option pricing by Hepperger [9]. Intuitively speaking, one possible explanation is that a higher number of exercise dates enables the estimator to capture the effects of more extreme points on the numerical grid, thus entailing an increase in its variance and standard deviation.

We now analyse the results of the approximation of the option's delta. Table 3.2 below shows that the approximations of the delta happen to be quite unstable for a number of paths less than 2000. The convergence towards the "reference" value of -0.403 is observable for higher values of N and M, although we do sense that additional variance reduction techniques such as the introduction of a control variate (Remark 3.2.1) would enable us to achieve a greater level of convergence. Unfortunately, our own implementation involving the control variate method did not yield any satisfactory results, and they have therefore not been included here.

    N        M = 10    M = 20    M = 30
    100      -0.901    -0.256    -0.144
    500      -0.159    -0.133    -0.278
    1000     -0.303    -0.187    -0.532
    2000     -0.567    -0.544    -0.512
    3000     -0.511    -0.515    -0.485
    10000    -0.502    -0.497    -0.469
    20000    -0.491    -0.475    -0.442

Table 3.2: Approximation of the option's delta with localization

For illustrative purposes only, we also show the approximations of the price of the same Bermudan option when the localization technique is not taken into consideration:

    N        Price      Standard Deviation
    100      7011       1081
    500      23701      3375
    1000     19970      2748
    2000     343588     28426
    3000     2424299    202969

Table 3.3: Approximation of the option's price without localization, taking M = 10 time steps

Table 3.3 shows precisely to what extent the non-localized version of the algorithm does not work numerically: we observe a "blow-up" of the prices and standard deviations as the number of paths increases.

We now consider a non-dividend-paying Bermudan put option with strike K = 3% and maturity T = 1 year on an arbitrary underlying asset (which could be an interest rate in this case) represented by an Ornstein-Uhlenbeck price process. Let us mention that, as "true" prices of an option on an underlying with such dynamics cannot be found online or in the existing literature, the following analysis is purely indicative; we are therefore more interested in the behaviour of the estimators as N and M vary than in the actual value of the option. We choose an annual risk-free interest rate of r = 9.53%, and the underlying asset has the following characteristics:

• Rate of mean-reversion a = 10%.
• Long-term average θ = 4%.
• Annual volatility of returns σ = 20%.
• Initial asset value x = 3%.

The results are given in Table 3.4 below:

    N        M = 10    M = 20    M = 30
    100      0.164     0.291     0.679
    500      1.256     1.187     0.821
    1000     1.071     1.501     2.218
    2000     0.618     0.656     0.663
    3000     0.740     0.783     0.794
    10000    0.789     0.813     0.819
    20000    0.799     0.826     0.831

Table 3.4: Approximation of the option's price (in %) with localization

Once again, the approximation procedure seems to be rather unstable for a number of paths less than 2000. Even though we do not have any "reference" price in this particular case, we do observe a convergence towards a value greater than 0.831%. We also notice the increasing sequence of prices as the number of exercise dates increases for a given number of paths, which is consistent with our previous comments regarding this matter.

We end this numerical section by analysing the required CPU time of the Malliavin pricing algorithm, which is represented in figure 3.2 below:

Figure 3.2: CPU time as a function of the number of paths N for the computation of the option’s price

Having observed the required CPU time in figure 3.2, three essential remarks have to be made. To start with, there seems to be a quadratic relationship between the CPU time and the number of simulated paths. As a matter of fact, this particular feature is simply explained by what actually has to be computed in step 3.(c) of the algorithm. More precisely, for all \(q \in \{1,\dots,N\}\) the following quantity is determined:
\[
E_k[v_{k+1}]\big(X^{(q)}_{k\delta t}\big) \approx \frac{J^{\psi}_{k,k+1}[v_{k+1}]\big(X^{(q)}_{k\delta t}\big)}{J^{\psi}_{k,k+1}[1]\big(X^{(q)}_{k\delta t}\big)}
\]
where the numerator and the denominator involve the computation of empirical means with another N sample paths. This naturally results in the CPU time being "squared" with respect to the number of paths. The second remark is related to the effect of the number of time steps. The required CPU time seems to be directly proportional to the number of time steps being used, which is to be expected given the general body of the algorithm. Indeed, as the algorithm is of backward type, if we place ourselves at a given position in time, the different steps are carried out for each path. As a result, taking 20 time steps roughly doubles the CPU time in comparison with 10 time steps. Finally, the most important remark from a practical viewpoint is undeniably related to the overall computational cost in absolute terms. As we observed earlier in the numerical examples, a very high number of paths is usually necessary to obtain satisfactory results. However, this comes at a considerably high computational cost. For example, choosing 10000 paths along with only 10 time steps yields a required CPU time of just over one hour. As a result, it has to be said that this algorithm is rather uncompetitive with other algorithms such as Longstaff-Schwartz in terms of CPU time.
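The quadratic cost can be made concrete with a toy version of this step. The sketch below is illustrative only: the names are ours, a fixed Laplace(1) localizer stands in for the optimal one, and the Malliavin weights are assumed to be given.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Localized kernel psi + w*(H - Psi), with a Laplace(1) localizer (illustrative).
static double lker(double y, double w) {
    const double psi = 0.5 * std::exp(-std::fabs(y));
    const double Psi = (y >= 0.0) ? 1.0 - 0.5 * std::exp(-y) : 0.5 * std::exp(y);
    const double H   = (y >= 0.0) ? 1.0 : 0.0;
    return psi + w * (H - Psi);
}

// Step 3.(c) in outline: for each of the N paths, the conditional expectation
// at X_k^{(q)} is a ratio of two empirical means taken over all N paths,
// hence O(N^2) work per time step and O(M N^2) overall.
std::vector<double> conditional_expectations(
    const std::vector<double>& v_next,  // v_{k+1}(X_{k+1}^{(p)})
    const std::vector<double>& Xk,      // X_k^{(p)}
    const std::vector<double>& w) {     // Malliavin weights (assumed given)
    const std::size_t N = Xk.size();
    std::vector<double> out(N);
    for (std::size_t q = 0; q < N; ++q) {        // outer loop: evaluation points
        double num = 0.0, den = 0.0;
        for (std::size_t p = 0; p < N; ++p) {    // inner loop: empirical means
            const double h = lker(Xk[p] - Xk[q], w[p]);
            num += v_next[p] * h;
            den += h;
        }
        out[q] = (den != 0.0) ? num / den : 0.0;
    }
    return out;
}
```

The nested loop over paths is exactly where the "squaring" of the CPU time comes from.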


Concluding considerations

Let us first summarize the approach which has been followed here in order to present the Malliavin pricing and hedging algorithm. In chapter 1, we started by introducing the main Malliavin calculus tools needed in the derivation of the representation formulas for conditional expectation and its derivative. We then introduced these formulas in chapter 2, starting with the non-localized versions, before moving on to their localized counterparts. In chapter 3, starting from the dynamic programming principle formulation of the problem, we showed how these formulas have to be applied within the context of the pricing and hedging of Bermudan options. We then provided a sketch of the algorithm itself, before finally discussing several numerical results.

As far as numerical aspects are concerned, our own implementation underlined the fact that the algorithm cannot work numerically without the use of localizing functions, as prices and standard deviations of estimators tend to "blow up". Even though our implementation of the control variate method presented in remark 3.2.1 did not yield any satisfactory results, we believe that it would most certainly have enabled us to obtain more stable and more accurate prices and deltas with fewer sample paths. Indeed, the results presented here seem to be reasonably in line with "reference" values when the number of paths is relatively high, but this mechanically increases (and "squares") the required CPU time for the computations. This naturally leads us to the question of the efficiency of the algorithm in terms of computational cost, especially in comparison with other methods such as the Longstaff-Schwartz algorithm. Bally et al. [1] state that the latter is undeniably more competitive with regard to computational speed, although it does not necessarily yield more accurate results.

Furthermore, one fundamental advantage of the Malliavin approach is that it remains a combined pricing and hedging algorithm, unlike other probabilistic methods, which do not provide their own specific procedure to compute the delta of the option.

Taking into consideration the sequence followed in this paper, the first natural extension which comes to mind is the introduction of the representation formulas for conditional expectation and its derivative in a multidimensional setting, as the main benefits of probabilistic methods over deterministic ones arise in higher dimensions. This can be done in a fairly straightforward way when the underlying price process is geometric Brownian motion (see Bally et al. [1]), but whether such an extension can readily be developed for other processes remains to be seen. Finally, given the high CPU time required by the Malliavin pricing and hedging algorithm, another possible extension would be to look at how this method could be combined with other probabilistic approaches in order to achieve more competitive time costs. The core problem here is the presence of step 3.(c) of the algorithm, which involves a very costly computation with the use of equation (3.1), and subsequently leads to the required CPU time being "squared". Other ways of evaluating (3.1) could potentially yield a more efficient computational cost, although it currently remains unclear how one could go about determining such alternative techniques.


Bibliography

[1] V. Bally, L. Caramellino, and A. Zanette, Pricing and hedging American options by Monte Carlo methods using a Malliavin calculus approach, Institut National de Recherche en Informatique et en Automatique, Research paper no. 4804, 2003.

[2] A. Bensoussan and J-L. Lions, Impulse control and quasivariational inequalities, Gauthier-Villars, 1984.

[3] M. Bladt and M. Sorensen, Simple simulation of diffusion bridges with application to likelihood inference for diffusions, Centre for Analytical Finance, University of Aarhus, Working paper no. 225, 2007.

[4] M. Broadie and P. Glasserman, Pricing American-style securities using simulation, Journal of Economic Dynamics and Control, pp. 1323-1352, 1997.

[5] D. Duffy, Finite difference methods in financial engineering: A partial differential equation approach, Wiley Series, 2006.

[6] E. Fournié, J-M. Lasry, J. Lebuchoux, P-L. Lions, N. Touzi, Applications of Malliavin calculus to Monte Carlo methods in finance, Finance and Stochastics, Springer-Verlag, pp. 391-412, 1999.

[7] E. Fournié, J-M. Lasry, J. Lebuchoux, P-L. Lions, Applications of Malliavin calculus to Monte Carlo methods in finance II, Finance and Stochastics, Springer-Verlag, pp. 201-236, 2001.

[8] P. Friz, An introduction to Malliavin calculus, Courant Institute of Mathematical Sciences, New York University, 2002.

[9] P. Hepperger, Pricing high-dimensional Bermudan options using variance-reduced Monte Carlo methods, Zentrum Mathematik, Technische Universität München.

[10] P-L. Lions and H. Regnier, Calcul du prix et des sensibilités d'une option américaine par une méthode de Monte Carlo, Preprint, 2001.

[11] F. Longstaff and E. Schwartz, Valuing American options by simulation: a simple least-squares approach, The Review of Financial Studies, no. 14, pp. 113-148, 2001.

[12] D. Nualart, The Malliavin calculus and related topics, Springer-Verlag, 1995.
