Beta Priors for Self-Reinforcing Binary Decisions

Why the Beta prior naturally matches sequential binary feedback.

Beta priors are natural for sequential binary decisions because they match the structure of the feedback: each observation is either a success or a failure, and the posterior update only needs to add one count to one side.

That is why Bayesian multi-armed bandit examples with binary feedback often use one Beta prior for each action's unknown success probability. The same count-based update also appears in A/B testing, online experiments, diagnostics, quality control, and other repeated yes/no feedback loops.

The previous post introduced the Beta distribution as a distribution over probabilities and showed how binary data updates its two parameters. This post uses that result online: one observation arrives, one of the two counts is incremented, the predictive probability changes, and the same posterior predictive loop can be represented by a Pólya urn.

In [1]:

# Imports
from __future__ import annotations

import bokeh.events
import bokeh.io
import bokeh.layouts
import bokeh.models
import bokeh.palettes
import bokeh.plotting
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
import seaborn as sns
from bokeh.models import CustomJS
from bokeh.resources import Resources

bokeh.io.output_notebook(
    resources=Resources(mode="cdn", components=["bokeh", "bokeh-widgets"]),
    hide_banner=True,
)

sns.set_theme(style="whitegrid", context="notebook")
rng = np.random.default_rng(17)

The Beta-Bernoulli Update Loop

Start with one action and one unknown success probability $\theta$. Each time we choose the action, we observe one binary outcome:

$$ Y_t \in \{0, 1\}. $$

The outcome $Y_t$ is the feedback we observe. The parameter $\theta$ is not an outcome; it is the unknown probability that the outcome is $1$ when this action is chosen.

If the current belief is

$$ \theta \sim \mathrm{Beta}(\alpha, \beta), $$

then one new observation updates the parameters by counting:

$$ \alpha' = \alpha + y_t, \qquad \beta' = \beta + (1-y_t). $$

A success increments $\alpha$; a failure increments $\beta$. This is the same conjugacy property from the previous post, now used one observation at a time.

For a bandit with multiple actions, this loop runs separately for each action. Action $a$ has its own unknown success probability $\theta_a$ and its own counts $\alpha_a$ and $\beta_a$. Only the chosen action gets updated after each observed success or failure.
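
As a minimal sketch of this per-action bookkeeping, the snippet below keeps one pair of counts per action and updates only the chosen one. The `BetaArm` class, the action names, the feedback log, and the uniform $\mathrm{Beta}(1, 1)$ starting counts are illustrative assumptions, not part of the notebook code.

```python
from dataclasses import dataclass


@dataclass
class BetaArm:
    """Beta counts for one action's unknown success probability."""

    alpha: float = 1.0  # assumed uniform Beta(1, 1) prior, chosen for illustration
    beta: float = 1.0

    def update(self, outcome: int) -> None:
        """Add one binary observation to the matching count."""
        if outcome not in (0, 1):
            raise ValueError("outcome must be 0 or 1")
        self.alpha += outcome
        self.beta += 1 - outcome


# Hypothetical log of (chosen action, observed outcome) pairs.
feedback_log = [("a", 1), ("b", 0), ("a", 1), ("a", 0), ("b", 1)]

arms = {"a": BetaArm(), "b": BetaArm()}
for action, outcome in feedback_log:
    # Only the chosen action's counts change; the other arm is left untouched.
    arms[action].update(outcome)

print(arms)
```

Action `a` was chosen three times and action `b` twice, so their counts move by three and two in total, respectively.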

The success and failure counts are sufficient statistics for this model: once we know how many successes and failures we have seen, the order of those observations no longer matters for the posterior. That is useful online: each new observation changes one count and leaves the rest of the summary untouched.
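
A quick way to check the order claim, assuming a $\mathrm{Beta}(2, 2)$ prior and a made-up outcome sequence, is to apply the count update to every ordering of the same outcomes and collect the resulting posteriors. The helper name `beta_counts_after` is hypothetical.

```python
import itertools


def beta_counts_after(outcomes, alpha=2.0, beta=2.0):
    """Apply the one-observation count update to each outcome in turn."""
    for y in outcomes:
        alpha += y
        beta += 1 - y
    return alpha, beta


# Illustrative sequence with three successes and two failures.
outcomes = (1, 0, 1, 1, 0)

# Every ordering of the same outcomes should land on the same (alpha, beta).
posteriors = {beta_counts_after(order) for order in itertools.permutations(outcomes)}
print(posteriors)  # {(5.0, 4.0)}
```

The set contains a single pair because only the number of successes and failures enters the update, not their order.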

The plot below shows the same update after a short sequence of binary observations. The posterior stays in the Beta family; only the two counts move.

In [2]:

# Illustrate the Beta density evolution during sequential learning.

def update_beta_posterior(*, alpha: float, beta: float, outcome: int) -> tuple[float, float]:
    """Update a Beta posterior after one binary observation."""
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return alpha + outcome, beta + (1 - outcome)


observed_outcomes = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1], dtype=int)
alpha = 2.0
beta = 2.0
posterior_states: list[tuple[int, float, float]] = [(0, alpha, beta)]

for step, outcome in enumerate(observed_outcomes, start=1):
    alpha, beta = update_beta_posterior(alpha=alpha, beta=beta, outcome=int(outcome))
    posterior_states.append((step, alpha, beta))

def beta_density_grid(*, alpha: float, beta: float, nb_points: int = 500) -> tuple[np.ndarray, np.ndarray]:
    """Return a grid and Beta density values for plotting."""
    theta = np.linspace(1e-4, 1.0 - 1e-4, nb_points)
    density = scipy.stats.beta.pdf(theta, a=alpha, b=beta)
    return theta, density


selected_steps = {0, 1, 3, 6, len(observed_outcomes)}

fig, ax = plt.subplots(figsize=(8, 4.5))
for step, state_alpha, state_beta in posterior_states:
    if step not in selected_steps:
        continue
    theta, density = beta_density_grid(alpha=state_alpha, beta=state_beta)
    ax.plot(theta, density, linewidth=2, label=f"after {step} observations")

ax.set_title("Sequential binary feedback keeps the posterior in the Beta family")
ax.set_xlabel(r"success probability $\theta$")
ax.set_ylabel("density")
ax.legend()
sns.despine()
plt.show()


The Predictive View

The current posterior also gives the predictive probability for the next observation. We do not know the true success probability $\theta$, so we average over the current posterior belief about it.

If the current belief is

$$ \theta \mid y_{1:t} \sim \mathrm{Beta}(\alpha_t, \beta_t), $$

then the posterior predictive probability of success is the posterior mean of $\theta$:

$$ P(Y_{t+1}=1 \mid y_{1:t}) = \mathbb{E}[\theta \mid y_{1:t}] = \frac{\alpha_t}{\alpha_t + \beta_t}. $$
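
As a short sketch of why this holds, condition on $\theta$ and average over the posterior. Here $p(\theta \mid y_{1:t})$ denotes the posterior density, a symbol introduced only for this step, and the next outcome is assumed to depend on the past only through $\theta$:

$$ P(Y_{t+1}=1 \mid y_{1:t}) = \int_0^1 P(Y_{t+1}=1 \mid \theta)\, p(\theta \mid y_{1:t})\, d\theta = \int_0^1 \theta\, p(\theta \mid y_{1:t})\, d\theta = \mathbb{E}[\theta \mid y_{1:t}]. $$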

In other words, the next feedback is predicted as a Bernoulli outcome whose success probability is the posterior mean:

$$ Y_{t+1} \mid y_{1:t} \sim \mathrm{Bernoulli}\!\left(\frac{\alpha_t}{\alpha_t + \beta_t}\right). $$

If we started from $\mathrm{Beta}(\alpha_0,\beta_0)$ and have seen $s_t$ successes and $f_t$ failures, then $\alpha_t = \alpha_0 + s_t$ and $\beta_t = \beta_0 + f_t$. The predictive probability can therefore also be written as

$$ P(Y_{t+1}=1 \mid y_{1:t}) = \frac{\alpha_0 + s_t}{\alpha_0 + \beta_0 + s_t + f_t}. $$

After the next observation arrives, the same loop continues:

$$ \alpha_{t+1} = \alpha_t + y_{t+1}, \qquad \beta_{t+1} = \beta_t + (1-y_{t+1}). $$

So the posterior does two jobs at once: it summarizes what we have learned so far, and through its posterior predictive distribution it gives the probability for the next outcome. When we sample an outcome from that predictive distribution and feed it back as evidence, we get exactly the Pólya urn update.

This is the predictive reason the parameters are often described as pseudo-counts: in the fraction above, $\alpha_0$ and $\beta_0$ sit in the same denominator as the observed counts. They are not fake data in a literal sense, but they behave like prior evidence in both the update and the next-outcome probability.
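
The pseudo-count reading can be made concrete with a small worked example. The sketch below assumes integer prior counts so that exact fractions can be used, and takes 7 successes and 3 failures as illustrative data; the function name `predictive_success_probability` is hypothetical.

```python
from fractions import Fraction


def predictive_success_probability(
    alpha_0: int, beta_0: int, successes: int, failures: int
) -> Fraction:
    """Posterior predictive P(Y = 1): prior counts and observed counts share one fraction."""
    return Fraction(alpha_0 + successes, alpha_0 + beta_0 + successes + failures)


successes, failures = 7, 3
print("observed frequency:", Fraction(successes, successes + failures))  # 7/10

# The same data combined with priors of increasing weight (illustrative choices).
for alpha_0, beta_0 in [(1, 1), (2, 2), (20, 20)]:
    probability = predictive_success_probability(alpha_0, beta_0, successes, failures)
    print(f"Beta({alpha_0}, {beta_0}) prior -> {probability}")
```

With these numbers the predictive probabilities are 2/3, 9/14, and 27/50: the heavier the prior counts, the closer the prediction stays to the prior mean of 1/2.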

The Pólya Urn Connection

A Pólya urn gives a concrete version of the same posterior predictive loop.

This is a generative picture, not a literal claim that the world samples from our belief. In Beta-Bernoulli inference, a fixed but unknown $\theta$ generates the data. In the urn, the current composition generates the next draw. Averaged over the Beta prior, the two views produce the same sequence distribution, which is why the urn is a faithful picture of the posterior predictive loop.

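Imagine an urn with red and blue balls. In the urn itself, these are just colors. To keep the physical picture literal, take $\textcolor{#c44e52}{\alpha}$ and $\textcolor{#4c72b0}{\beta}$ to be whole-number initial counts here:

it starts with $\textcolor{#c44e52}{\alpha}$ red balls,
and $\textcolor{#4c72b0}{\beta}$ blue balls.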

Then the Pólya urn process follows this four-step loop:

draw one ball,
observe its color,
put it back,
add one extra ball of the same color.

To connect this notation with the Beta-Bernoulli model, let red represent Bernoulli outcome $\textcolor{#c44e52}{y=1}$ and blue represent $\textcolor{#4c72b0}{y=0}$. After $t$ urn draws, suppose we have observed $\textcolor{#c44e52}{r_t}$ red draws and $\textcolor{#4c72b0}{b_t}$ blue draws. The urn then contains $\textcolor{#c44e52}{\alpha} + \textcolor{#c44e52}{r_t}$ red balls and $\textcolor{#4c72b0}{\beta} + \textcolor{#4c72b0}{b_t}$ blue balls, so the probability of drawing red next is

$$ P(\text{red next}) = \frac{\textcolor{#c44e52}{\alpha} + \textcolor{#c44e52}{r_t}}{(\textcolor{#c44e52}{\alpha} + \textcolor{#c44e52}{r_t}) + (\textcolor{#4c72b0}{\beta} + \textcolor{#4c72b0}{b_t})} = \frac{\textcolor{#c44e52}{\alpha} + \textcolor{#c44e52}{r_t}}{\textcolor{#c44e52}{\alpha} + \textcolor{#4c72b0}{\beta} + t}. $$

Here $t = \textcolor{#c44e52}{r_t} + \textcolor{#4c72b0}{b_t}$.

This is reinforcement in the predictive process. A red draw is recorded and reinforced by adding another red ball to the urn. That changes the urn composition, so red is slightly more likely on the next draw. A blue draw does the same in the other direction. The process has memory, but the memory is compressed into two counts.

With this mapping, $\textcolor{#c44e52}{r_t}$ and $\textcolor{#4c72b0}{b_t}$ are the two counts that update a $\mathrm{Beta}(\textcolor{#c44e52}{\alpha}, \textcolor{#4c72b0}{\beta})$ prior. The urn's next-red probability then matches the Beta-Bernoulli posterior predictive probability for $\textcolor{#c44e52}{y=1}$: current evidence changes the next predictive probability, and the next sampled outcome becomes evidence for the following prediction.
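
The match can be checked with a literal urn. The sketch below is a standard-library illustration, separate from the interactive app: it keeps the balls in a Python list, runs the draw-replace-add loop, and compares the urn's red fraction with the count formula before every draw. The starting counts, seed, and number of draws are arbitrary choices.

```python
import random

ALPHA, BETA = 2, 5  # initial red and blue balls (illustrative values)
NB_DRAWS = 20
rng = random.Random(0)  # fixed seed so the run is reproducible

urn = ["red"] * ALPHA + ["blue"] * BETA
red_draws = 0
blue_draws = 0

for t in range(NB_DRAWS):
    # Probability of red read directly off the urn contents.
    urn_prob_red = urn.count("red") / len(urn)
    # Beta-Bernoulli posterior predictive probability from the counts so far.
    predictive_prob = (ALPHA + red_draws) / (ALPHA + BETA + t)
    assert urn_prob_red == predictive_prob

    ball = rng.choice(urn)  # draw and observe; choice() leaves the ball in the urn
    urn.append(ball)        # add one extra ball of the same color
    if ball == "red":
        red_draws += 1
    else:
        blue_draws += 1

print(f"red draws: {red_draws}, blue draws: {blue_draws}, balls in urn: {len(urn)}")
```

The assertion inside the loop never fires, because the urn contents are exactly the prior counts plus the observed draws.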

The interactive plot below runs this draw-replace-add loop many times and aggregates the path segments. Within each draw count $t$, brighter segments have been walked by more simulated trajectories than the other segments at that same step. The color scale is log-compressed so rare branches stay visible. The bar chart underneath counts where completed trajectories end.

In [3]:

# Interactive Pólya urn app

# Keep the path-segment mass, final-ratio counts, and setup parameters separate.
datasource_polya_urn_segment = bokeh.models.ColumnDataSource(
    dict(
        ratio_start=[],
        step_start=[],
        ratio_end=[],
        step_end=[],
        segment_count=[],
        segment_color=[],
        segment_width=[],
    )
)
datasource_polya_urn_sample_ratio = bokeh.models.ColumnDataSource(
    dict(ratio_center=[], bar_width=[], path_count=[], bar_color=[])
)
DEFAULT_ALPHA = 2
DEFAULT_BETA = 5
datasource_polya_urn_params = bokeh.models.ColumnDataSource(dict(alpha=[DEFAULT_ALPHA], beta=[DEFAULT_BETA]))
datasource_anim_state = bokeh.models.ColumnDataSource(dict(_placeholder=[]))

APP_WIDTH = 760
SLIDER_WIDTH = 340
DEFAULT_PATH_LENGTH = 40
POLYA_TRAJECTORY_PALETTE = list(bokeh.palettes.Viridis256[20:245])

fig_polya_urn_anim_paths = bokeh.plotting.figure(
    x_range=bokeh.models.Range1d(start=0.0, end=1.0),
    y_range=bokeh.models.Range1d(start=DEFAULT_PATH_LENGTH, end=0),
    width=APP_WIDTH,
    height=390,
    title="Pólya urn path mass: red ratio over draws",
    x_axis_label="red ratio (red balls / total balls)",
    y_axis_label="draw count t (top to bottom)",
    tools="",
)
fig_polya_urn_anim_paths.segment(
    x0="ratio_start",
    y0="step_start",
    x1="ratio_end",
    y1="step_end",
    source=datasource_polya_urn_segment,
    line_color="segment_color",
    line_width="segment_width",
    line_alpha=0.85,
)

fig_polya_urn_anim_final_counts = bokeh.plotting.figure(
    x_range=fig_polya_urn_anim_paths.x_range,
    y_range=bokeh.models.DataRange1d(start=0),
    width=APP_WIDTH,
    height=210,
    title="Where the trajectories end",
    x_axis_label="final red ratio (red balls / total balls)",
    y_axis_label="number of trajectories",
    tools="",
)
fig_polya_urn_anim_final_counts.vbar(
    x="ratio_center",
    width="bar_width",
    bottom=0,
    top="path_count",
    source=datasource_polya_urn_sample_ratio,
    fill_color="bar_color",
    line_color="black",
    fill_alpha=0.7,
)

button_polya_urn_start = bokeh.models.Button(
    label="Run",
    button_type="primary",
    width=80,
    html_attributes={"title": "Run the Pólya urn simulation with the current alpha and beta values."},
)
button_polya_urn_reset = bokeh.models.Button(
    label="Reset",
    button_type="default",
    width=80,
    html_attributes={"title": "Stop the simulation and clear all data. New alpha/beta values apply on next Run."},
)

slider_polya_urn_alpha = bokeh.models.Slider(
    start=1,
    end=25,
    value=DEFAULT_ALPHA,
    step=1,
    title="alpha: initial red balls (before Run)",
    width=SLIDER_WIDTH,
    html_attributes={"title": "Initial number of red balls in the urn."},
)
slider_polya_urn_beta = bokeh.models.Slider(
    start=1,
    end=25,
    value=DEFAULT_BETA,
    step=1,
    title="beta: initial blue balls (before Run)",
    width=SLIDER_WIDTH,
    html_attributes={"title": "Initial number of blue balls in the urn."},
)
slider_polya_urn_path_length = bokeh.models.Slider(
    start=5,
    end=120,
    value=DEFAULT_PATH_LENGTH,
    step=5,
    title="draws per trajectory (before Run)",
    width=SLIDER_WIDTH,
    html_attributes={"title": "Number of urn draws in one trajectory."},
)
slider_polya_urn_speed = bokeh.models.Slider(
    start=1,
    end=100,
    value=10,
    step=1,
    title="speed: urn draws per frame",
    width=SLIDER_WIDTH,
    html_attributes={"title": "How many urn updates to process on each animation frame."},
)
div_polya_urn_lock_status = bokeh.models.Div(
    text=(
        "<div style='padding: 8px 10px; border-left: 4px solid #2f6f4e; "
        "background: #eef7f1; color: #244434;'>"
        "Choose the initial balls and draws per trajectory, then press <b>Run</b>. "
        "These setup settings lock once the simulation starts."
        "</div>"
    ),
    width=APP_WIDTH,
)

# CustomJS keeps the simulation fully embedded in the static blog post.
callback_start_code = (
    f"const TRAJECTORY_PALETTE = {POLYA_TRAJECTORY_PALETTE!r};\n"
    """
const TICK_INTERVAL_MS = 20;
const MIN_SEGMENT_WIDTH = 1.0;
const MAX_EXTRA_SEGMENT_WIDTH = 3.0;

function getRatio(urn) {
    return urn[0] / (urn[0] + urn[1]);
}

function paletteColorFromMass(value, maxValue) {
    if (maxValue <= 0) {
        return TRAJECTORY_PALETTE[0];
    }
    const normalizedMass = Math.log1p(value) / Math.log1p(maxValue);
    const colorIndex = Math.round(normalizedMass * (TRAJECTORY_PALETTE.length - 1));
    return TRAJECTORY_PALETTE[colorIndex];
}

function refreshSegmentStyles(segmentSource) {
    // Normalize path mass within each draw count so later, lower-count rows stay readable.
    const counts = segmentSource.data.segment_count;
    const steps = segmentSource.data.step_start;
    const maxCountByStep = new Map();

    for (let i = 0; i < counts.length; i++) {
        const step = steps[i];
        const currentMax = maxCountByStep.get(step) || 0;
        maxCountByStep.set(step, Math.max(currentMax, counts[i]));
    }

    segmentSource.data.segment_color = counts.map((count, i) => (
        paletteColorFromMass(count, maxCountByStep.get(steps[i]) || 0)
    ));
    segmentSource.data.segment_width = counts.map((count, i) => {
        const stepMax = maxCountByStep.get(steps[i]) || 0;
        const widthDenom = Math.max(1.0, Math.log1p(stepMax));
        return MIN_SEGMENT_WIDTH + MAX_EXTRA_SEGMENT_WIDTH * Math.log1p(count) / widthDenom;
    });
}

function refreshFinalRatioStyles(histSource) {
    const counts = histSource.data.path_count;
    const maxCount = counts.reduce((maxValue, count) => Math.max(maxValue, count), 0);
    histSource.data.bar_color = counts.map((count) => paletteColorFromMass(count, maxCount));
}

function initFinalRatioCounts(alpha, beta, pathLength, histSource) {
    // A completed path can only have k = 0, ..., pathLength red draws.
    // Therefore each possible final ratio gets its own exact bar.
    const numPossibleRatios = pathLength + 1;
    const denom = alpha + beta + pathLength;
    const ratioBarWidth = 0.9 / denom;
    const ratioCenters = Array.from({length: numPossibleRatios}, (_, k) => (alpha + k) / denom);

    histSource.data.ratio_center = ratioCenters;
    histSource.data.bar_width = Array(numPossibleRatios).fill(ratioBarWidth);
    histSource.data.path_count = Array(numPossibleRatios).fill(0);
    histSource.data.bar_color = Array(numPossibleRatios).fill(paletteColorFromMass(0, 0));
    histSource.change.emit();
}

function updateFinalRatioCount(k, histSource) {
    histSource.data.path_count[k]++;
}

function setTitle(model, title) {
    const attrs = {...model.html_attributes};
    attrs.title = title;
    model.html_attributes = attrs;
}

function setStatusLocked(statusDiv) {
    statusDiv.text = "<div style='padding: 8px 10px; border-left: 4px solid #9a6518; "
        + "background: #fff4df; color: #5a3b0c;'>"
        + "Initial balls and draws per trajectory are <b>locked</b> for this run. "
        + "Press <b>Reset</b> to change them."
        + "</div>";
}

function recordSegment(anim, segmentSource) {
    const redBefore = anim.currentUrn[0];
    const blueBefore = anim.currentUrn[1];
    const ratioBefore = redBefore / (redBefore + blueBefore);
    const drawRed = Math.random() < ratioBefore;
    const redAfter = redBefore + (drawRed ? 1 : 0);
    const blueAfter = blueBefore + (drawRed ? 0 : 1);
    const ratioAfter = redAfter / (redAfter + blueAfter);
    const stepAfter = anim.stepIdx + 1;
    const segmentKey = `${anim.stepIdx}:${redBefore}:${redAfter}`;
    let segmentIndex = anim.segmentIndexByKey.get(segmentKey);

    if (segmentIndex === undefined) {
        segmentIndex = segmentSource.data.segment_count.length;
        anim.segmentIndexByKey.set(segmentKey, segmentIndex);
        segmentSource.data.ratio_start.push(ratioBefore);
        segmentSource.data.step_start.push(anim.stepIdx);
        segmentSource.data.ratio_end.push(ratioAfter);
        segmentSource.data.step_end.push(stepAfter);
        segmentSource.data.segment_count.push(0);
        segmentSource.data.segment_color.push(paletteColorFromMass(0, 0));
        segmentSource.data.segment_width.push(MIN_SEGMENT_WIDTH);
    }

    segmentSource.data.segment_count[segmentIndex]++;
    anim.currentUrn = [redAfter, blueAfter];
}

function updatePathMassPlot(anim, segmentSource, histSource) {
    if (anim.stepIdx === 0) {
        // Start one new trajectory, but aggregate it into shared path segments.
        anim.currentUrn = [anim.alpha, anim.beta];
    }

    recordSegment(anim, segmentSource);
    anim.stepIdx++;

    if (anim.stepIdx === anim.pathLength) {
        // Once a trajectory is complete, record where it ended and start the next one.
        const k = anim.currentUrn[0] - anim.alpha;
        updateFinalRatioCount(k, histSource);
        anim.stepIdx = 0;
    }
}

function updateManyPathSteps(anim, segmentSource, histSource, speedSlider) {
    const stepsPerFrame = Math.max(1, Math.floor(speedSlider.value));
    for (let i = 0; i < stepsPerFrame; i++) {
        updatePathMassPlot(anim, segmentSource, histSource);
    }
    refreshSegmentStyles(segmentSource);
    segmentSource.change.emit();
    refreshFinalRatioStyles(histSource);
    histSource.change.emit();
}

    let anim = stateSource._anim;

    if (anim && anim.isRunning) {
        // Run acts as a pause/resume toggle after the simulation has started.
        clearInterval(anim.updateTimer);
        anim.isRunning = false;
        startButton.label = "Run";
        setTitle(startButton, "Resume the simulation. Reset to change alpha/beta.");
        sliderAlpha.disabled = true;
        sliderBeta.disabled = true;
        sliderPathLength.disabled = true;
        setTitle(sliderAlpha, "Reset to change alpha.");
        setTitle(sliderBeta, "Reset to change beta.");
        setTitle(sliderPathLength, "Reset to change draws per trajectory.");
        setStatusLocked(statusDiv);
    } else {
        if (!anim) {
            // The first Run freezes setup choices so all displayed paths share one experiment.
            const alpha = paramsSource.data["alpha"][0];
            const beta = paramsSource.data["beta"][0];
            const pathLength = Math.round(sliderPathLength.value);
            anim = {
                pathLength: pathLength,
                stepIdx: 0,
                alpha: alpha,
                beta: beta,
                currentUrn: null,
                segmentIndexByKey: new Map(),
                updateTimer: null,
                isRunning: false,
            };
            stateSource._anim = anim;
            figure.y_range.start = pathLength;
            figure.y_range.end = 0;
            initFinalRatioCounts(alpha, beta, pathLength, histSource);
        }

        sliderAlpha.disabled = true;
        sliderBeta.disabled = true;
        sliderPathLength.disabled = true;
        setTitle(sliderAlpha, "Reset to change alpha.");
        setTitle(sliderBeta, "Reset to change beta.");
        setTitle(sliderPathLength, "Reset to change draws per trajectory.");
        setStatusLocked(statusDiv);

        anim.updateTimer = setInterval(
            () => updateManyPathSteps(anim, segmentSource, histSource, speedSlider),
            TICK_INTERVAL_MS,
        );
        anim.isRunning = true;
        figure.title.text = "Pólya urn path mass: red ratio over draws "
            + "(alpha=" + anim.alpha + ", beta=" + anim.beta + ")";
        startButton.label = "Pause";
        setTitle(startButton, "Pause the simulation.");
    }
"""
)

button_polya_urn_start.js_on_event(
    bokeh.events.ButtonClick,
    CustomJS(
        args=dict(
            segmentSource=datasource_polya_urn_segment,
            histSource=datasource_polya_urn_sample_ratio,
            paramsSource=datasource_polya_urn_params,
            stateSource=datasource_anim_state,
            figure=fig_polya_urn_anim_paths,
            startButton=button_polya_urn_start,
            sliderAlpha=slider_polya_urn_alpha,
            sliderBeta=slider_polya_urn_beta,
            sliderPathLength=slider_polya_urn_path_length,
            speedSlider=slider_polya_urn_speed,
            statusDiv=div_polya_urn_lock_status,
        ),
        code=callback_start_code,
        module=False,
    ),
)

callback_reset_code = """
function setTitle(model, title) {
    const attrs = {...model.html_attributes};
    attrs.title = title;
    model.html_attributes = attrs;
}

function setStatusReady(statusDiv) {
    statusDiv.text = "<div style='padding: 8px 10px; border-left: 4px solid #2f6f4e; "
        + "background: #eef7f1; color: #244434;'>"
        + "Choose the initial balls and draws per trajectory, then press <b>Run</b>. "
        + "These setup settings lock once the simulation starts."
        + "</div>";
}

    const anim = stateSource._anim;

    if (anim) {
        // Reset stops browser-side animation state before clearing the visible data.
        if (anim.updateTimer) {
            clearInterval(anim.updateTimer);
        }
        delete stateSource._anim;
    }

    segmentSource.data.ratio_start = [];
    segmentSource.data.step_start = [];
    segmentSource.data.ratio_end = [];
    segmentSource.data.step_end = [];
    segmentSource.data.segment_count = [];
    segmentSource.data.segment_color = [];
    segmentSource.data.segment_width = [];
    segmentSource.change.emit();

    histSource.data.ratio_center = [];
    histSource.data.bar_width = [];
    histSource.data.path_count = [];
    histSource.data.bar_color = [];
    histSource.change.emit();

    figure.title.text = "Pólya urn path mass: red ratio over draws";
    startButton.label = "Run";
    setTitle(startButton, "Run the Pólya urn simulation with the current alpha and beta values.");

    sliderAlpha.disabled = false;
    sliderBeta.disabled = false;
    sliderPathLength.disabled = false;
    setTitle(sliderAlpha, "Initial number of red balls in the urn.");
    setTitle(sliderBeta, "Initial number of blue balls in the urn.");
    setTitle(sliderPathLength, "Number of urn draws in one trajectory.");
    setStatusReady(statusDiv);
"""

button_polya_urn_reset.js_on_event(
    bokeh.events.ButtonClick,
    CustomJS(
        args=dict(
            segmentSource=datasource_polya_urn_segment,
            histSource=datasource_polya_urn_sample_ratio,
            stateSource=datasource_anim_state,
            figure=fig_polya_urn_anim_paths,
            startButton=button_polya_urn_start,
            sliderAlpha=slider_polya_urn_alpha,
            sliderBeta=slider_polya_urn_beta,
            sliderPathLength=slider_polya_urn_path_length,
            statusDiv=div_polya_urn_lock_status,
        ),
        code=callback_reset_code,
        module=False,
    ),
)

callback_alpha_code = """
    // Store slider choices in a shared source until Run freezes them for the simulation.
    paramsSource.data["alpha"][0] = cb_obj.value;
    paramsSource.change.emit();
"""

callback_beta_code = """
    // Store slider choices in a shared source until Run freezes them for the simulation.
    paramsSource.data["beta"][0] = cb_obj.value;
    paramsSource.change.emit();
"""

slider_polya_urn_alpha.js_on_change(
    "value",
    CustomJS(
        args=dict(paramsSource=datasource_polya_urn_params),
        code=callback_alpha_code,
        module=False,
    ),
)
slider_polya_urn_beta.js_on_change(
    "value",
    CustomJS(
        args=dict(paramsSource=datasource_polya_urn_params),
        code=callback_beta_code,
        module=False,
    ),
)

# Compose the controls below the plots so readers can experiment without a Python kernel.
layout_polya_urn_buttons = bokeh.layouts.row(
    bokeh.models.Spacer(width=285),
    button_polya_urn_start,
    button_polya_urn_reset,
    bokeh.models.Spacer(width=285),
    width=APP_WIDTH,
)
layout_polya_urn_slider_row_1 = bokeh.layouts.row(
    slider_polya_urn_alpha,
    bokeh.models.Spacer(width=40),
    slider_polya_urn_beta,
    width=APP_WIDTH,
)
layout_polya_urn_slider_row_2 = bokeh.layouts.row(
    slider_polya_urn_path_length,
    bokeh.models.Spacer(width=40),
    slider_polya_urn_speed,
    width=APP_WIDTH,
)
layout_polya_urn_sliders = bokeh.layouts.column(
    layout_polya_urn_slider_row_1,
    layout_polya_urn_slider_row_2,
    width=APP_WIDTH,
)

layout_polya_urn_anim = bokeh.layouts.column(
    fig_polya_urn_anim_paths,
    fig_polya_urn_anim_final_counts,
    div_polya_urn_lock_status,
    layout_polya_urn_buttons,
    layout_polya_urn_sliders,
)

bokeh.plotting.show(layout_polya_urn_anim)

Before the Limit: The Beta-Binomial Distribution

The urn gives a one-step predictive rule, but the endpoint plots ask a longer-run question: after $T$ reinforced draws, where can an urn trajectory end?

For a fixed number of urn draws $T$, the endpoint distribution is still discrete. There are only $T+1$ possible numbers of red draws:

$$ R_T \in \{0, 1, \ldots, T\}. $$

Those finite endpoint counts follow a Beta-binomial distribution:

$$ R_T \sim \mathrm{BetaBinomial}(T, \alpha, \beta). $$

This is related to the Beta-Bernoulli update, but it is not the same object. Beta-Bernoulli describes one binary observation at a time and how the Beta belief updates after observing $y \in \{0,1\}$. Beta-binomial describes a finite-count prediction: how many red, or $y=1$, outcomes we might see after $T$ reinforced draws.
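
One way to see the link between the one-step urn rule and the Beta-binomial distribution is to push probability mass through the urn draws one step at a time and compare the result with the closed-form Beta-binomial probabilities. The sketch below uses only the standard library; the function names are hypothetical, and the closed form used is $\binom{T}{k} B(\alpha + k, \beta + T - k) / B(\alpha, \beta)$ with $B$ the Beta function.

```python
import math


def urn_endpoint_pmf(alpha: int, beta: int, nb_draws: int) -> list[float]:
    """P(R_T = k) for k = 0..T, built by applying the one-step urn rule T times."""
    pmf = [1.0]  # before any draw: zero red draws with probability 1
    for t in range(nb_draws):
        next_pmf = [0.0] * (t + 2)
        for k, mass in enumerate(pmf):
            prob_red = (alpha + k) / (alpha + beta + t)
            next_pmf[k + 1] += mass * prob_red       # one more red draw
            next_pmf[k] += mass * (1.0 - prob_red)   # one more blue draw
        pmf = next_pmf
    return pmf


def log_beta_function(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k: int, nb_draws: int, alpha: float, beta: float) -> float:
    """Closed-form Beta-binomial probability of k successes in nb_draws trials."""
    log_pmf = (
        math.log(math.comb(nb_draws, k))
        + log_beta_function(alpha + k, beta + nb_draws - k)
        - log_beta_function(alpha, beta)
    )
    return math.exp(log_pmf)


recursive_pmf = urn_endpoint_pmf(alpha=2, beta=5, nb_draws=10)
for k, probability in enumerate(recursive_pmf):
    closed_form = beta_binomial_pmf(k, nb_draws=10, alpha=2, beta=5)
    assert math.isclose(probability, closed_form, rel_tol=1e-9)
    print(f"k={k:2d}  urn recursion={probability:.6f}  closed form={closed_form:.6f}")
```

The two columns agree up to floating-point error, which is the sense in which the reinforced draws produce Beta-binomial endpoint counts.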

The final urn ratio in the plots is

$$ Z_T = \frac{\alpha + R_T}{\alpha + \beta + T}. $$

For small $T$, this ratio lives on a coarse grid of possible values. As $T$ grows, the grid becomes finer. In the limit, the distribution of $Z_T$ approaches the $\mathrm{Beta}(\alpha,\beta)$ distribution, which is continuous over the interval $[0,1]$.

The finite-time Beta-binomial view explains what the plots below are showing. In each panel, we compute the exact distribution of final red ratios after a fixed number of draws and compare that finite endpoint distribution with the limiting Beta density.

The blue step function shows the exact finite distribution on its natural grid, scaled as a density so each step's area equals its probability mass. The black curve is the $\mathrm{Beta}(\alpha,\beta)$ density. As the trajectories get longer, the possible final ratios become more finely spaced, and the step function lines up with the Beta density. The match is exact only in the limit: for any finite number of draws, the endpoint distribution stays discrete.

A single urn trajectory ends at one final ratio. The Beta distribution describes the distribution of those endpoints across possible trajectories, not a deterministic destination for one trajectory.
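
A rough Monte Carlo sketch of that distinction, using only the standard library: simulate many independent trajectories, keep each final red ratio, and compare the spread of endpoints with the mean $\alpha/(\alpha+\beta)$ and variance $\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$ of the limiting Beta distribution. The trajectory length, number of trajectories, and seed are arbitrary choices, and the agreement is only approximate because both are finite.

```python
import random
import statistics


def simulate_final_red_ratio(alpha: int, beta: int, nb_draws: int, rng: random.Random) -> float:
    """Run one urn trajectory and return its final red ratio."""
    red, blue = alpha, beta
    for _ in range(nb_draws):
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red / (red + blue)


ALPHA, BETA = 2, 5
NB_DRAWS = 200
NB_TRAJECTORIES = 2000
rng = random.Random(17)

endpoints = [
    simulate_final_red_ratio(ALPHA, BETA, NB_DRAWS, rng) for _ in range(NB_TRAJECTORIES)
]

# Moments of the limiting Beta(ALPHA, BETA) distribution.
beta_mean = ALPHA / (ALPHA + BETA)
beta_variance = ALPHA * BETA / ((ALPHA + BETA) ** 2 * (ALPHA + BETA + 1))

sample_mean = statistics.fmean(endpoints)
sample_variance = statistics.variance(endpoints)

print(f"endpoints range from {min(endpoints):.3f} to {max(endpoints):.3f}")
print(f"sample mean     {sample_mean:.4f}  vs Beta mean     {beta_mean:.4f}")
print(f"sample variance {sample_variance:.4f}  vs Beta variance {beta_variance:.4f}")
```

Individual trajectories end at very different ratios, while the collection of endpoints has roughly the mean and variance of $\mathrm{Beta}(2, 5)$.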

In [4]:

# Illustrate how the Beta distribution emerges as the limit of Pólya urn endpoints.

def polya_final_red_ratio_distribution(
    *,
    alpha: int,
    beta: int,
    path_length: int,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return the exact finite endpoint distribution on its natural grid."""
    denominator = alpha + beta + path_length
    grid_width = 1.0 / denominator
    red_draws = np.arange(path_length + 1)
    final_ratios = (alpha + red_draws) / denominator
    probabilities = scipy.stats.betabinom.pmf(red_draws, n=path_length, a=alpha, b=beta)

    # Scale probability masses to density units so step area equals probability.
    density_values = probabilities / grid_width
    step_edges = np.concatenate(
        ([final_ratios[0] - 0.5 * grid_width], final_ratios + 0.5 * grid_width)
    )
    return final_ratios, step_edges, density_values


alpha_for_urn = 2
beta_for_urn = 5
path_lengths = [5, 10, 20, 100]

theta, beta_density = beta_density_grid(alpha=alpha_for_urn, beta=beta_for_urn)

fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)
for ax, path_length in zip(axes.flat, path_lengths, strict=True):
    final_ratios, step_edges, density_values = polya_final_red_ratio_distribution(
        alpha=alpha_for_urn,
        beta=beta_for_urn,
        path_length=path_length,
    )
    ax.stairs(
        density_values,
        step_edges,
        fill=True,
        color="tab:blue",
        alpha=0.28,
        label="exact finite endpoint distribution",
    )
    ax.stairs(density_values, step_edges, color="tab:blue", linewidth=1.3)

    # Mark the exact support points so the finite distribution reads as discrete.
    marker_stride = max(1, len(final_ratios) // 80)
    marker_indices = np.arange(0, len(final_ratios), marker_stride)
    ax.vlines(
        final_ratios[marker_indices],
        ymin=0,
        ymax=density_values[marker_indices],
        color="tab:blue",
        alpha=0.22,
        linewidth=0.8,
    )
    ax.scatter(
        final_ratios[marker_indices],
        density_values[marker_indices],
        color="tab:blue",
        s=12,
        zorder=3,
        label="exact support points" if path_length == path_lengths[0] else None,
    )
    ax.plot(
        theta,
        beta_density,
        color="black",
        linewidth=2,
        label=rf"limiting $\mathrm{{Beta}}({alpha_for_urn}, {beta_for_urn})$ density",
    )
    ax.set_title(f"{path_length} draws: {path_length + 1} possible endpoints")
    ax.set_xlabel("final red ratio (red balls / total balls)")
    ax.set_ylabel("density")

handles, labels = axes[0, 0].get_legend_handles_labels()
fig.legend(handles, labels, loc="lower center", ncol=3, frameon=False, bbox_to_anchor=(0.5, -0.02))
fig.suptitle(
    "From finite urn endpoints to a Beta density\n"
    rf"Increasing draws per trajectory, starting from $\alpha={alpha_for_urn}$ red and $\beta={beta_for_urn}$ blue balls",
    y=1.04,
)
fig.tight_layout(rect=(0, 0.06, 1, 1))
sns.despine()
plt.show()

What the Beta-Bernoulli Loop Assumes

The Beta-Bernoulli update is online and interpretable, but it is still a model. Binary feedback is the setup; the stronger modeling assumptions are about stability, feedback, and what the counts summarize.

It assumes:

each action has a stable probability of outcome $y=1$,
feedback is observed for the action we actually took,
counts of $y=1$ and $y=0$ outcomes are enough to summarize what we have learned.

These assumptions fit simple A/B tests, binary bandits, and many repeated yes/no decision problems. They are too narrow when outcomes have different values, rewards are continuous, actions affect future states, or the environment changes over time.

The Pólya urn is the posterior predictive loop made physical: each draw is sampled from the current predictive probability, then added back as evidence that reinforces the next prediction. It shows what happens inside the Beta-Bernoulli model when we repeatedly sample from the posterior predictive distribution and feed each sampled outcome back in as evidence.

Summary

The Beta prior is natural for sequential binary decisions because it matches the structure of the feedback.

Each action has an unknown probability of outcome $y=1$. Each time we choose that action, we observe one binary outcome. If $y_t=1$, we increment $\alpha_a$; if $y_t=0$, we increment $\beta_a$. The posterior remains Beta, so the learner only needs to carry two counts per action.

Those same counts also define the next prediction. The posterior mean gives the posterior predictive probability for the next $y=1$ outcome:

$$ P(Y_{t+1}=1 \mid y_{1:t}) = \frac{\alpha_t}{\alpha_t + \beta_t}. $$

The Pólya urn gives a concrete picture of the same posterior predictive update. A red draw is sampled from the current urn composition, then added back as evidence for the next draw; a blue draw does the same in the other direction. That reinforcement makes future draws depend on previous draws, while the final counts still carry the information used by the Beta-Bernoulli posterior.

The urn also gives the finite-to-limit bridge. After $T$ draws, the number of red draws follows a Beta-binomial distribution. After converting that count into a final red ratio and letting $T$ grow, the distribution approaches $\mathrm{Beta}(\alpha,\beta)$.

The update itself stays small:

$$ \alpha_a \leftarrow \alpha_a + y_t, \qquad \beta_a \leftarrow \beta_a + (1-y_t) $$

for the action $a$ we actually chose. That is the compact Beta-Bernoulli update loop behind Bayesian sequential binary decisions.

References

Pólya's Urn Process - RandomServices: detailed technical reference for the urn process and its connection to Beta-Bernoulli and Beta-binomial behavior.
Back to basics - Pólya urns - Djalil Chafaï: mathematical walkthrough of the urn ratio as a martingale and the limiting Beta distribution.
Pólya urn model - Wikipedia: compact reference for the draw-replace-add urn process, self-reinforcement, exchangeability, the martingale property, and convergence to a Beta distribution.
Beta-binomial distribution - Wikipedia: compact reference for the finite-time bridge in this post. Before urn ratios approach a continuous Beta distribution, the number of red draws after $T$ draws follows a discrete Beta-binomial distribution.