In online learning we observe a stream of data and make predictions for new incoming data. After each prediction we observe some performance measure and adjust our prediction scheme accordingly. Well-known applications are spam classification and investment.
Exercise 1 serves as an introduction to the online learning with expert advice setting (mix loss). Exercise 1d may take some time. Exercise 2 again considers the expert setting, but with the dot loss. Exercise 3 deals with online convex optimization (OCO). Exercise 3b is tricky: here we ask you to adapt the proof for online gradient descent (OGD). In Exercise 4 you apply the online learning theory to bitcoin / altcoin trading. You will implement the Aggregating Algorithm (AA) and OGD to trade bitcoins / altcoins automatically.
In the exercises we adopt the notation of Bubeck [2011], which we also used in the lecture.
We would prefer you to typeset your answers using LaTeX and to hand in your assignment as a PDF on Brightspace. Please include your (MATLAB) code in the appendix.
Exercise 1 [25pt] Assume we are in the online learning setting with $d = 3$ experts $e_1$, $e_2$ and $e_3$, and we evaluate using the mix loss $\ell_m(p_t, z_t) = -\log\left(\sum_{i=1}^{d} p_t^i e^{-z_t^i}\right)$.
Consider the following adversary moves $z_t$ for $t = 1, \dots, 4$:

    t      z_1    z_2    z_3    z_4
    e_1    0      0      1      0
    e_2    0.1    0      0      0.9
    e_3    0.2    0.1    0      0
Consider the following strategies:
(A) In round $t$ choose $p_t = e_{b(t)}$, where $b(t)$ is the expert with the smallest cumulative loss $L_{t-1}^i = \sum_{s=1}^{t-1} z_s^i$ up to time $t$. In other words, $b(t) = \arg\min_i L_{t-1}^i$. For $t = 1$ choose $p_1 = (1/3, 1/3, 1/3)$.

(B) The strategy of the Aggregating Algorithm (AA).
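As a rough illustration, here is a minimal MATLAB sketch of how both strategies could be computed for the adversary table above. The variable names are our own, and the update used for B assumes that, for the mix loss, AA with a uniform prior amounts to exponential weights with learning rate 1:

    % Sketch: strategies A (follow the leader) and B (AA) on the table above.
    Z = [0 0 1 0; 0.1 0 0 0.9; 0.2 0.1 0 0];    % Z(i,t) = z_t^i
    [d, n] = size(Z);
    P_A = zeros(d, n); P_B = zeros(d, n);
    L = zeros(d, 1);                            % cumulative losses L_{t-1}^i
    for t = 1:n
        if t == 1
            P_A(:, t) = ones(d, 1) / d;         % p_1 = (1/3, 1/3, 1/3)
        else
            [~, b] = min(L);                    % expert with the smallest loss so far
            P_A(:, t) = (1:d)' == b;            % p_t = e_{b(t)}
        end
        P_B(:, t) = exp(-L) / sum(exp(-L));     % p_t^i proportional to exp(-L_{t-1}^i)
        L = L + Z(:, t);
    end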
(a) [5pt] Compute the selected actions $p_t$ for $t = 1, \dots, 4$ for strategies A and B.
(b) [5pt] Compute the total mix loss obtained by both strategies and compute the corresponding expert regrets of both strategies after $n = 4$ rounds.
(c) [5pt] For this example, we have shown in the lecture that for AA we have the theoretical guarantee¹ $R_n^E \le \log(d)$. Use this guarantee to derive an upper bound on the cumulative mix loss of AA for $n = 4$: $\sum_{t=1}^{4} \ell_m(p_t, z_t) \le C$. Compute the numerical value of $C$ for this example. How does $C$ compare with the total mix loss obtained by both strategies at $n = 4$?
(d) [10pt] In this exercise we will show that this theoretical guarantee $R_n^E \le \log(d)$ cannot be improved (in other words, this bound is 'tight'). To show this, show that for any strategy $p_t$ and for any $d$ we can find an adversary $z_t \in (-\infty, \infty]^d$ such that $R_n^E \ge \log(d)$. Show your steps. Hint: first consider $n = 1$, $d = 2$. To get an understanding of the problem you may try to plug in some numbers for $p_t^i$, and find adversary moves $z_t$ that make the regret larger than or equal to $\log(d)$.

¹ See the slides and lecture for a proof. Cesa-Bianchi and Lugosi [2006, Chapter 3.5] also give a proof.
Exercise 2 [20pt] Now we consider the online learning with expert advice setting with $d$ experts and the dot loss $\ell_d(p_t, z_t) = p_t^\top z_t$. Consider the strategies:
(A) the Exp strategy with learning rate $\eta_A = \sqrt{4 \log(d)}$;

(B) the Exp strategy with learning rate $\eta_B = \sqrt{2 \log(d)}$.
Define $R_n^A$ as the expert regret of strategy A and $R_n^B$ as the expert regret of strategy B. We do not consider any specific adversary this time.
(a) [5pt] For both strategies, compute the theoretical guarantee² on the expert regret for $n = 2$. In other words, compute $C_n^A$ for which $R_n^A \le C_n^A$ for A and compute the value $C_n^B$ for which $R_n^B \le C_n^B$ for B. Derive which bound is tighter (that is, show which $C$ is smaller) for $n = 2$. Show your steps.
(b) [5pt] Now derive which bound is tighter for $n = 4$. Show your steps.
(c) [5pt] Let's assume that strategy B has a tighter bound for some $n$, meaning $C_n^B < C_n^A$. Does that necessarily mean that, for any adversary moves $z_t$ for $t = 1, \dots, n$, we have $R_n^B \le R_n^A$? You may give an informal argument to explain your answer (an example or proof is not required).
(d) [5pt] Generally, how does the strategy $p_t$ change when we increase the learning rate $\eta$? What do you expect to happen if we set the learning rate extremely high ($\eta = \infty$)?
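For intuition, the Exp strategy's action in round $t$ can be written in one line of MATLAB. This is a sketch with hypothetical names (L is the $d \times 1$ vector of cumulative dot losses $L_{t-1}^i$); trying a few values of eta here gives a feel for question (d):

    % Sketch: Exp strategy weights for learning rate eta.
    p_t = exp(-eta * L) / sum(exp(-eta * L));   % p_t^i proportional to exp(-eta * L_{t-1}^i)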
Exercise 3 [15pt] Assume we are in the online convex optimization (OCO) setting. That means we are given a closed convex set $\mathcal{A} \subseteq \mathbb{R}^d$ and a convex loss function $\ell : \mathcal{A} \times \mathcal{Z} \to \mathbb{R}$, and in each round $t$ we take an action $a_t \in \mathcal{A}$. Our goal is to minimize the OCO regret $R_n = \sum_{t=1}^{n} \ell(a_t, z_t) - \min_{a \in \mathcal{A}} \sum_{t=1}^{n} \ell(a, z_t)$, where $z_t$ is a parameter chosen by an adversary in round $t$. Assume that for all $(a, z) \in \mathcal{A} \times \mathcal{Z}$ the loss $\ell(a, z)$ is differentiable with respect to $a$ and that the gradient with respect to $a$ is bounded: $\|\nabla_a \ell(a, z)\| \le G$. Assume further that $\max_{a \in \mathcal{A}} \|a\| \le R$. In this setting we can use the online gradient descent method with a fixed learning rate $\eta \in (0, \infty)$. In the lecture we showed that this procedure achieves an OCO regret of $R_n \le \frac{R^2}{2\eta} + \frac{\eta G^2 n}{2}$ for any adversary.
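For concreteness, one round of the online gradient descent procedure might look as follows in MATLAB. The names a, z_t, eta and the function handles grad_l (the gradient of the loss) and proj_A (the Euclidean projection onto $\mathcal{A}$) are all hypothetical, since both functions depend on the specific problem:

    % Sketch: one round of online gradient descent with fixed learning rate eta.
    g = grad_l(a, z_t);          % gradient of the loss at the current action
    a = proj_A(a - eta * g);     % gradient step followed by projection onto A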
(a) [5pt] Derive that the optimal learning rate is given by $\eta_n = \frac{R}{G\sqrt{n}}$. This means that $\eta_n$ minimizes the right-hand side of the OCO regret bound for $R_n$ from above. Show for this optimal choice that $R_n \le RG\sqrt{n}$.
To set the learning rate according to 3(a) we need to know the number of rounds $n$ in advance. Sometimes we do not know $n$, and we want to perform well for any $n$. To get a guarantee on the OCO regret for all $n$, we can make the learning rate dependent on the time $t$.
(b) [10pt] Show that if we set $\eta_t = \frac{R}{G\sqrt{t}}$ and $\frac{1}{\eta_0} = 0$, we can bound the OCO regret for all $n \ge 1$ by $R_n \le \frac{3}{2} GR\sqrt{n}$. Explain every step you do in as much detail as possible. Hints: the exercise might be a challenge, but you will manage. You can follow the online gradient descent proof³ for most parts. The step of the telescoping sum, however, will be more difficult. Write out the sum for the first few terms of $t$ and try to re-order it in a way that allows you to use the bound $\|a\| \le R$ for all $a \in \mathcal{A}$, then telescope. At some point in the proof you might want to use the inequality $\sum_{t=1}^{n} \frac{1}{\sqrt{t}} \le 2\sqrt{n}$.

² See the lecture (slides), or see Bubeck [2011, Chapter 2.2].
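For completeness, the inequality in the last hint follows from a standard integral comparison: for $t \ge 2$ we have $\frac{1}{\sqrt{t}} \le \int_{t-1}^{t} \frac{ds}{\sqrt{s}}$, so

$$\sum_{t=1}^{n} \frac{1}{\sqrt{t}} \;\le\; 1 + \int_{1}^{n} \frac{ds}{\sqrt{s}} \;=\; 2\sqrt{n} - 1 \;\le\; 2\sqrt{n}.$$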
Exercise 4 [40pt] We now turn our focus to an application in investment called universal portfolio theory, where we model investment as a repeated decision-making scenario. You will first show that in this setting we should use the mix loss. Then you will download bitcoin data and implement two algorithms that automatically trade bitcoins / altcoins.
The setting of portfolio theory is the following. Assume that on a given day $t$ we can distribute our wealth over $d$ different assets (for example stocks or several types of bitcoin). Each asset $i$ has a given price $x_t^i \in (0, \infty)$ on day $t$ (for simplicity we assume that the price of an asset cannot drop to 0).
(a) [3pt] Assume that at time $t$ we have a wealth of $W_t \in \mathbb{R}$. Assume we distribute our wealth $W_t$ at time $t$ according to $p_t = (p_t^1, \dots, p_t^d) \in \Delta_d$ over the $d$ different assets⁴. Given the asset prices $x_t^i$ and $x_{t+1}^i$, derive the value of our wealth $W_{t+1}$ on the next day. Hint: introduce $r_t^i = x_{t+1}^i / x_t^i$.
Next you will show why we should optimize the mix loss in investment.
(b) [5pt] Using 4a, show that with the adversary moves $z_t^i = -\log(r_t^i)$ we have

$$\sum_{t=1}^{n} -\log\left(\sum_{i=1}^{d} p_t^i e^{-z_t^i}\right) = -\log(W_{n+1}/W_1).$$
Explain why the mix loss is thus appropriate for this setting and explain why we want the total mix loss to be as small as possible.
In this exercise we use some bitcoin data collected by Jesse Vent⁵. We have prepared the data and provide you with some example code. Download and inspect the code. We consider the 5 most popular bitcoins / altcoins: BCH (Bitcoin Cash), BTC (Bitcoin), LTC (Litecoin), XRP (Ripple), ETH (Ethereum). Each coin is worth a certain amount of US dollars each day. Our algorithm will, in the first round $t = 1$, buy an equal amount of each coin (so our wealth will be distributed as 20% BCH, 20% BTC, 20% ETH, etc.). Afterward, on each day $t = 2, \dots, 213$ our algorithm will sell some coins and buy new ones (redistribute our wealth), depending on the past observed behavior of the coins. We will use the AA algorithm in 4c and the OGD algorithm in 4d. In both cases we will use the mix loss. Please note that this homework in no way constitutes investment advice, and that investing in bitcoin and altcoin carries a high risk.
³ See the lecture, or see Bubeck [2011, Chapter 4.1].
⁴ Example: if we have $W_t = 100$ euro and $p_t^1 = 0.2$, this means that on timestep $t$ we buy / sell an amount of asset 1 such that afterward the amount we own of asset 1 is worth 20 euro.
⁵ https://www.kaggle.com/jessevent/all-crypto-currencies
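Before turning to 4c and 4d: given the raw prices, the ratios and adversary moves take only a couple of lines of MATLAB. The sketch below assumes a hypothetical matrix X whose entry X(i,t) is the price $x_t^i$; the provided files may store the data differently.

    % Sketch: price ratios and adversary moves from a price matrix X.
    R = X(:, 2:end) ./ X(:, 1:end-1);   % R(i,t) = r_t^i = x_{t+1}^i / x_t^i
    Z = -log(R);                        % Z(i,t) = z_t^i = -log(r_t^i)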
(c) [15pt] We will now consider coin trading in the expert setting as follows. We have five experts, $e_1, \dots, e_5$. Expert $e_1$ in each round will choose move $e_1 = (1, 0, 0, 0, 0)$; in other words, expert $i$ will always invest all its money in asset $i$. We will re-invest our wealth according to $p_t \in \Delta_5$ on each day $t = 1, \dots, 213$, where $p_t$ is computed by AA.
1. [0pt] For this exercise we have provided some code. Inspect the given files.
2. [1pt] Compute the adversary moves $z_t^i = -\log(r_t^i)$ and the total loss of each expert.
3. [8pt] Implement AA (AA.m) and compute the expert regret of AA (use $n = 213$).
4. [3pt] How does the total loss of AA compare with the total losses of the experts? How does the expert regret of AA compare with the guaranteed expert regret? Is the adversary generating 'difficult' data?
5. [2pt] Visualize the strategy of AA using a plot of $p_t$ vs. $t$. Also visualize the values of the coins, $x_t$ vs. $t$. What do you observe? Can you explain the strategy?
6. [1pt] If we had invested according to the AA strategy, how much would our wealth have increased?
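As one way to track the wealth over time (relevant for item 6), here is a minimal sketch. It assumes P is a matrix whose column $t$ is the strategy $p_t$ and R is the ratio matrix from the sketch above, with matching numbers of columns:

    % Sketch: wealth after following strategy P, starting from wealth W1.
    W1 = 1;
    growth = sum(P .* R, 1);     % daily growth factors p_t' * r_t
    W_end = W1 * prod(growth);   % final wealth; equals W1 * exp(-(total mix loss))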
(d) [17pt] We will now consider coin trading in the online convex optimization setting. Now the action space is $\mathcal{A} = \Delta_5$ (recall this means that for all $a_t \in \mathcal{A}$ we have $\sum_{i=1}^{d} a_t^i = 1$ and $a_t^i \in [0, 1]$). The action determines how we invest our money in bitcoin on each day $t = 1, \dots, 213$. The loss function is again the mix loss $\ell(a_t, z_t) = \ell_m(a_t, z_t)$. Recall that $\ell_m(a_t, z_t) = -\log(a_t^\top r_t)$, with $z_t^i = -\log(r_t^i)$.
1. [2pt] For this example, how is the OCO regret different from the expert regret considered above?
2. [1pt] Show that the gradient of the mix loss is equal to $\nabla_a \ell_m(a_t, z_t) = -\frac{r_t}{a_t^\top r_t}$.
3. [1pt] Implement mix_loss.m (a sketch appears after this list).
4. [3pt] Implement online gradient descent (OGD) in OGD.m (see the sketch after this list). Use the learning rate $\eta = \frac{R}{G\sqrt{n}}$. For the OCO regret guarantee of this value of $\eta$ see also exercise 3a.
5. [2pt] We have provided values $R$ and $G$ for you to use for this data (in OGD.m). Explain why these values are OK for these particular adversary moves $z_t$.
6. [1pt] Run OGD for $n = 213$. Visualize the strategy of OGD as in 4c. How much would our wealth have increased if we had used OGD to invest?
7. [2pt] Compute the optimal 'fixed' action $a^* \in \mathcal{A}$ (you may use an optimization toolbox, and you may use loss_fixed_action.m; a sketch using fmincon appears after this list). Compute the OCO regret of OGD.
8. [2pt] How does the loss of OGD compare with the loss of the best fixed action, and how does the OCO regret of OGD compare with the guaranteed OCO regret (see 3a)? Is the adversary generating 'difficult' data?
9. [3pt] In the provided data none of the coins crashed; that is, $r_t^i > 0$ for all $i$ and all $t$. If there is a coin $i$ and a time $t$ such that $r_t^i = 0$, we run into trouble. Which assumption of Theorem 2 from the lecture notes is violated when this happens? What happens to the bound on the OCO regret from Theorem 2?
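The following sketch relates to items 3, 4 and 7 above. It is one possible shape, not the required interface: the provided mix_loss.m, OGD.m and loss_fixed_action.m may expect different inputs. Z and R are the hypothetical matrices computed earlier, Rbound and Gbound stand for the provided values $R$ and $G$ from OGD.m, and mix_loss_sketch / proj_simplex are our own helper names:

    % Sketch for item 4: OGD on the simplex with eta = R / (G * sqrt(n)).
    [d, n] = size(Z);
    eta = Rbound / (Gbound * sqrt(n));
    a = ones(d, 1) / d;                  % start from the uniform portfolio
    for t = 1:n
        g = -R(:, t) / (a' * R(:, t));   % gradient of the mix loss (item 2)
        a = proj_simplex(a - eta * g);   % gradient step, then project onto simplex
    end

    % Sketch for item 7: best fixed action via fmincon (Optimization Toolbox).
    total_loss = @(a) -sum(log(a' * R));            % total mix loss of a fixed a
    a_star = fmincon(total_loss, ones(d, 1) / d, [], [], ...
                     ones(1, d), 1, zeros(d, 1), ones(d, 1));

    % Sketch for item 3: the mix loss itself.
    function l = mix_loss_sketch(a, z)
    l = -log(sum(a .* exp(-z)));         % l_m(a, z) = -log(sum_i a_i e^{-z_i})
    end

    % Euclidean projection onto the probability simplex (sort-based method).
    function w = proj_simplex(v)
    u = sort(v, 'descend');
    css = cumsum(u);
    rho = find(u + (1 - css) ./ (1:numel(u))' > 0, 1, 'last');
    w = max(v + (1 - css(rho)) / rho, 0);
    end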
Bibliography
Sébastien Bubeck. Introduction to online optimization. Lecture notes, pages 1–86, 2011.
Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.