https://wiljam144.github.io/axolum/blogs/scientific/thompson%20sampling.html

<!DOCTYPE html><html lang="en"><head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>_Axolum_</title>
        <link rel="stylesheet" href="../../index.css">

        <link rel="icon" type="image/x-icon" href="../../assets/favicon.ico">

        <link rel="stylesheet" href="../../lib/hljs/black-metal-dark-funeral.min.css">
        

        <link rel="stylesheet" href="../../lib/katex/katex.min.css">
        <script defer="" src="../../lib/katex/katex.min.js"></script>
    </head>
    <body>
        <div id="content">
            <div id="logo">
                <div id="title">
                    <span class="purple">_</span>
                    <a href="../../index.html">
                        Axolum
                    </a>
                    <span class="purple">_</span>
                </div>
                <div id="links">
                    <a href="../../personal.html" style="margin-right: 10px">Personal</a>
                    <a href="../../digital-garden.html" style="margin-left: 10px; margin-right: 10px;">Digital Garden</a>
                    <a href="../../scientific.html" style="margin-left: 10px">Scientific</a>
                </div>
            </div>
            <hr>

            <div id="text">
                <h1><span class="purple">#</span> Intro to Thompson Sampling</h1>
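<p><em>Thompson Sampling</em> is a reinforcement learning method for choosing between several options whose payoffs are unknown. It balances trying out options we are still unsure about (exploration) against playing the option that currently looks best (exploitation).</p>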
<h2><span class="purple">#</span> Multi-Armed Bandit</h2>
<p>This method can be used to solve the following problem, known as the
multi-armed bandit problem (a slot machine is sometimes called a "one-armed
bandit"). Imagine you are at a casino with a row of slot machines, and you know
that the machines have different chances of paying out. You want to find the one
with the highest win probability in as few rounds as possible.</p>
<p>Let's assume that one round costs €1 and you have €10,000 to spend; in other words,
you will play 10,000 rounds.</p>
<h2><span class="purple">#</span> Beta Distribution</h2>
<p><em>Thompson Sampling</em> uses the beta distribution. Its probability density
function on the interval from 0 to 1 is defined as follows:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.25em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mi>F</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mfrac><mrow><mi>f</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><mi mathvariant="normal">B</mi></mfrac></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mtext>where</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mi>f</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><msup><mi>x</mi><mrow><mi>α</mi><mo>−</mo><mn>1</mn></mrow></msup><mo stretchy="false">(</mo><mn>1</mn><mo>−</mo><mi>x</mi><msup><mo stretchy="false">)</mo><mrow><mi>β</mi><mo>−</mo><mn>1</mn></mrow></msup></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mi mathvariant="normal">B</mi><mo>=</mo><msubsup><mo>∫</mo><mn>0</mn><mn>1</mn></msubsup><mi>f</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mi>d</mi><mi>x</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mi>α</mi><mo separator="true">,</mo><mi>β</mi><mo>−</mo><mtext>some&nbsp;constants</mtext></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{align*}
&amp;F(x) = \frac{f(x)}{\Beta} \\
&amp;\text{where} \\
&amp;f(x) = x^{\alpha - 1}(1-x)^{\beta - 1} \\
&amp;\Beta = \int_0^1f(x)dx \\
&amp;\alpha, \beta - \text{some constants}
\end{align*}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:9.7811em;vertical-align:-4.6405em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:5.1405em;"><span style="top:-7.2445em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"></span></span><span style="top:-5.4185em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"></span></span><span style="top:-3.8594em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"></span></span><span style="top:-1.6354em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"></span></span><span style="top:0.4165em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:4.6405em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:5.1405em;"><span style="top:-7.2445em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"><span class="mord"></span><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.46em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathrm">B</span></span></span><span style="top:-3.22em;"><span 
class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.06em;"></span></span><span style="top:-3.71em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-5.4185em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"><span class="mord"></span><span class="mord text"><span class="mord">where</span></span></span></span><span style="top:-3.8594em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"><span class="mord"></span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0037em;">α</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord 
mathnormal">x</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05278em;">β</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span></span></span><span style="top:-1.6354em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"><span class="mord"></span><span class="mord mathrm">B</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mop"><span class="mop op-symbol large-op" style="margin-right:0.44445em;position:relative;top:-0.0011em;">∫</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.564em;"><span style="top:-1.7881em;margin-left:-0.4445em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span><span style="top:-3.8129em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.9119em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mord mathnormal">d</span><span class="mord 
mathnormal">x</span></span></span><span style="top:0.4165em;"><span class="pstrut" style="height:3.564em;"></span><span class="mord"><span class="mord"></span><span class="mord mathnormal" style="margin-right:0.0037em;">α</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em;"></span><span class="mord mathnormal" style="margin-right:0.05278em;">β</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em;"></span><span class="mord text"><span class="mord">some&nbsp;constants</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:4.6405em;"><span></span></span></span></span></span></span></span></span></span></span></span>
<p>Notice that as <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>α</mi></mrow><annotation encoding="application/x-tex">\alpha</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal" style="margin-right:0.0037em;">α</span></span></span></span> increases, the graph of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">F(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.13889em;">F</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span> shifts to the
right (towards 1), and as <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>β</mi></mrow><annotation encoding="application/x-tex">\beta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.05278em;">β</span></span></span></span> increases, it shifts to the left (towards 0).</p>
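<p>To see this shift numerically, here is a minimal sketch that compares the mean of the beta distribution, which is α / (α + β), with the average of many random samples. The parameter pairs, the seed, and the helper name <code>beta_mean</code> are assumptions chosen for illustration; they are not part of the original code.</p>

```python
import numpy as np

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable

# (alpha, beta) pairs picked purely for illustration
for alpha, beta in [(2, 2), (8, 2), (2, 8)]:
    samples = rng.beta(alpha, beta, size=100_000)
    print(f"alpha={alpha}, beta={beta}: "
          f"theoretical mean {beta_mean(alpha, beta):.2f}, "
          f"sample mean {samples.mean():.2f}")
```

<p>A larger α pulls the mean towards 1 and a larger β pulls it towards 0, which matches the left and right movement of the graph described above.</p>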
<h2><span class="purple">#</span> Sampling</h2>
<p>We can solve the multi-armed bandit problem by tracking, for each slot machine,
the number of times we won and the number of times we lost, and using those counts (plus one each, so that both start at 1) as our <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>α</mi></mrow><annotation encoding="application/x-tex">\alpha</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal" style="margin-right:0.0037em;">α</span></span></span></span>
and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>β</mi></mrow><annotation encoding="application/x-tex">\beta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.05278em;">β</span></span></span></span>. In each round we draw one random sample from every machine's beta distribution
and play the machine whose sample is the highest. Machines with a good record tend to
produce high samples and get played more often, while rarely played machines still have
wide distributions and occasionally get another chance, so the choice usually settles on
the best machine quickly.</p>
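<p>A single round of this selection step can be sketched as follows. The function name <code>choose_machine</code> and the win and loss counts are made up for illustration; the sketch assumes the counts are stored in NumPy arrays and that one is added to each count, as described above.</p>

```python
import numpy as np

def choose_machine(wins, losses, rng):
    """Pick a machine for one round of Thompson Sampling.

    wins[j] and losses[j] are the win/loss counts observed so far on machine j.
    """
    # One draw per machine from Beta(wins + 1, losses + 1)
    samples = rng.beta(wins + 1, losses + 1)
    # Play the machine whose draw is the largest
    return int(np.argmax(samples))

rng = np.random.default_rng(0)
# Made-up counts: machine 0 has mostly won, machine 1 has mostly lost
wins = np.array([40, 5])
losses = np.array([10, 45])
picks = [choose_machine(wins, losses, rng) for _ in range(1000)]
print("share of rounds choosing machine 0:", picks.count(0) / len(picks))
```

<p>With counts this lopsided the first machine should win almost every draw; with no data at all, every machine gets Beta(1, 1), which is uniform, so each one is equally likely to be picked.</p>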
<h2><span class="purple">#</span> Code</h2>
<p>First we create the environment: an array <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi></mrow><annotation encoding="application/x-tex">X</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em;"></span><span class="mord mathnormal" style="margin-right:0.07847em;">X</span></span></span></span> that stores, for every round, whether each
machine would pay out (1) or not (0).</p>
<pre><code class="language-python hljs" data-highlighted="yes"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-comment"># True win probability of each machine (hidden from the player)</span>
conversionRates = [<span class="hljs-number">0.15</span>, <span class="hljs-number">0.04</span>, <span class="hljs-number">0.13</span>, <span class="hljs-number">0.11</span>, <span class="hljs-number">0.05</span>]
N = <span class="hljs-number">10000</span>  <span class="hljs-comment"># number of rounds</span>
d = <span class="hljs-built_in">len</span>(conversionRates)  <span class="hljs-comment"># number of machines</span>

<span class="hljs-comment"># X[i][j] is 1 if machine j pays out in round i, otherwise 0</span>
X = np.zeros((N, d))
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(N):
    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(d):
        <span class="hljs-keyword">if</span> np.random.rand() &lt; conversionRates[j]:
            X[i][j] = <span class="hljs-number">1</span>
</code></pre>
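<p>The same kind of table can also be generated without explicit loops. The following is only a sketch of one possible alternative: it assumes NumPy's <code>Generator.binomial</code> with a single trial per entry, and the function name <code>simulate_rounds</code> is hypothetical.</p>

```python
import numpy as np

def simulate_rounds(rates, n_rounds, rng):
    """Return an (n_rounds, len(rates)) array of 0/1 outcomes.

    Entry [i, j] is 1 with probability rates[j], independently of all others.
    """
    # A binomial draw with one trial is a single win/lose outcome
    return rng.binomial(1, rates, size=(n_rounds, len(rates)))

rng = np.random.default_rng(0)
rates = [0.15, 0.04, 0.13, 0.11, 0.05]  # same values as conversionRates
outcomes = simulate_rounds(rates, 10000, rng)
print(outcomes.shape)         # (10000, 5)
print(outcomes.mean(axis=0))  # empirical win rate of each machine
```

<p>The column averages should land close to the configured rates, which is a quick sanity check that the environment behaves as intended.</p>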
<p>Then we can use <em>Thompson Sampling</em> and the beta distribution to train our
model. After every round the distribution of the machine we played shifts to the right
if we won and to the left if we lost. Over time the distribution of the machine with the
best conversion rate ends up furthest to the right, so its samples tend to be the highest
and our model chooses it more and more often.</p>
<pre><code class="language-python hljs" data-highlighted="yes"><span class="hljs-comment"># Wins and losses observed so far on each machine</span>
nPosReward = np.zeros(d)
nNegReward = np.zeros(d)

<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(N):
    selected = <span class="hljs-number">0</span>
    maxRandom = <span class="hljs-number">0</span>
    <span class="hljs-comment"># Draw one sample per machine and keep the machine with the highest one</span>
    <span class="hljs-keyword">for</span> j <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(d):
        randomBeta = np.random.beta(nPosReward[j] + <span class="hljs-number">1</span>, nNegReward[j] + <span class="hljs-number">1</span>)
        <span class="hljs-keyword">if</span> randomBeta &gt; maxRandom:
            maxRandom = randomBeta
            selected = j

    <span class="hljs-comment"># Play the selected machine and record the result</span>
    <span class="hljs-keyword">if</span> X[i][selected] == <span class="hljs-number">1</span>:
        nPosReward[selected] += <span class="hljs-number">1</span>
    <span class="hljs-keyword">else</span>:
        nNegReward[selected] += <span class="hljs-number">1</span>
</code></pre>
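<p>To check where the model ended up, one can count how often each machine was played. The sketch below is a self-contained variant of the loop above, not the original code: it replaces the inner loop with <code>np.argmax</code> over one vectorised beta draw, and the names <code>run_thompson</code>, <code>outcomes</code>, and <code>plays</code> as well as the fixed seed are assumptions made for illustration.</p>

```python
import numpy as np

def run_thompson(outcomes, rng):
    """Run Thompson Sampling over a precomputed 0/1 outcome table.

    Returns per-machine win and loss counts for the machines actually played.
    """
    n_rounds, n_machines = outcomes.shape
    wins = np.zeros(n_machines)
    losses = np.zeros(n_machines)
    for i in range(n_rounds):
        # Sample every machine's beta distribution and play the best draw
        selected = int(np.argmax(rng.beta(wins + 1, losses + 1)))
        if outcomes[i, selected] == 1:
            wins[selected] += 1
        else:
            losses[selected] += 1
    return wins, losses

rng = np.random.default_rng(0)
rates = np.array([0.15, 0.04, 0.13, 0.11, 0.05])  # same as conversionRates
outcomes = rng.binomial(1, rates, size=(10000, len(rates)))

wins, losses = run_thompson(outcomes, rng)
plays = wins + losses
print("times each machine was played:", plays.astype(int))
print("most played machine:", int(np.argmax(plays)))
print("total reward:", int(wins.sum()))
```

<p>Because the two best rates here (0.15 and 0.13) are close, the most played machine may differ between runs; the clearer signal is that the weakest machines are played only rarely.</p>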

            </div>
        </div>
        <script src="../../lib/hljs/highlight.min.js"></script>
        <script>hljs.highlightAll();</script>
    

</body></html>