Sitemap
Expedia Group Technology

Stories from the Expedia Group Technology teams

EXPEDIA GROUP TECHNOLOGY — DATA

Recursive Least Squares for Linear Contextual Bandits

8 min readDec 15, 2020

--

Abstract image showing purple and blue three dimensional waveforms on a yellow gradient background
decorative separator

Contextual bandit algorithms

\[A = \{a_1,…,a_K\}\] \[K \geq 2\]
\[ C = \mathds{R}^{J} \]
\[a^{*}_c = \argmax\limits_a E(R|A=a,C=c)\] \[R \in \mathds{R}\]
\[\Delta = \sum_t E(R|a^{*}_{c_t}, c_t) — E(R|\tilde{a_t}, c_t)\] \[\tilde{a_t} \text{: Selected Arm at time t}\]

ConBandits as a linear model

\[ m: A, C \rightarrow \mathfrak{X} \] \[ \mathfrak{X} = \mathds{R}^{L} \]
\[r_t = w^{\intercal}x_t + \epsilon_t\] \[\epsilon_t \sim N(0, \sigma²)\]
\[\hat{w}_{ols} = \Gamma X^{\intercal}r\] \[\Gamma \equiv (X^{\intercal}X)^{-1}\]
\[ \hat{w}_{ols} \sim N(w, \Sigma)\] \[\Sigma \equiv \sigma² \Gamma\]

Contextual bandit step-by-step

A bandit has 4 steps. 1.Getting a context. 2 Choosing an action. 3. Observing a reward. 4 updating the algorithm.

Selecting an arm with Thompson sampling

Description of the Thompson Sampling. Sample random parameters and find the arm that maximises the score given the context.

Updating the parameters with recurrent least squares

\[\Gamma_{t} = \Gamma_{t-1} — \frac{\Gamma_{t-1} x_t x_t^\intercal \Gamma_{t-1}}{1 + x_t^\intercal \Gamma_{t-1} x_t}\] \[ w_
\[ \hat{\sigma}_{t}² = \alpha\hat{\sigma}_{t-1}² + (1-\alpha) \hat{\epsilon}_t² \] \[\hat{\epsilon}_t \equiv (x^\interca
Description of the Recursive least square algorithm. Updating the Sum of Square matrix, and means.

Conclusion

decorative separator

References

Image source

Expedia Group Technology
Expedia Group Technology

Responses (1)