Thursday, December 24, 2015

A Probability Problem

Rephrased from this post by reddit user /u/phenomist, using the setup from the original 538 puzzler.

Let $X_1, \ldots, X_n$ be distributions over the set of positive integers such that $\mathbb{E}[X_i]$ is finite for each $i$.

Initially, we set $n$ variables, $t_1, \ldots, t_n$, where each $t_i$ is sampled independently from $X_i$. Then we repeat the following algorithm: if all of the $t_i$ are equal to a common value $t$, then we return the value of $t$. Otherwise, we take the index $k$ such that $t_k$ is minimal and replace $t_k$ with $t_k + x_k$, where $x_k$ is independently sampled from $X_k$. (Note that if there is a tie, it does not matter what order we resolve the tie in.)

For example, if $n = 2$ and $X_1, X_2 \sim \operatorname{Unif}\{1, 2, 3, 4, 5\}$, then this process might go $(2, 4) \rightarrow (3, 4) \rightarrow (7, 4) \rightarrow (7, 8) \rightarrow (8, 8) \implies 8$.

Show that the expected value of this process is \[ \operatorname{lcm}(d_1,\ldots,d_n)\cdot \prod_{i = 1}^n \frac{E[X_i]}{d_i} \] where $d_i = \gcd\{n\in \mathbb{Z}^+\mid \mathbb{P}[X_i = n] > 0\}$


Tuesday, December 22, 2015

An abstract ranking system

Ranking systems seem to be all the rage these days, especially in sports. Here are just some jumbled thoughts on what we might want from a ranking system:

Abstractly, we want a function RANK(G), where

Input: G is a (possibly multi-)directed graph on N vertices
Output: A "ranking" of the N vertices. That is, a total ordering of the N vertices with ties.

We probably want the following properties:

(ISO) If $G_1$ and $G_2$ are isomorphic graphs, then this isomorphism should also take RANK($G_1$) to RANK($G_2$)
(ADDEDGE) Let $H = G\cup\{u\rightarrow v\}$ for some $u,v\in G$. Then the ranking of $u$ in RANK($H$) is no worse than the ranking of $u$ in RANK($G$) (that is, no teams that are ranked behind $u$ get moved in front of $u$ as a result of $u$ beating $v$), and the ranking of $v$ in RANK($H$) is no better than the ranking of $v$ in RANK($G$). In addition, if $u$ and $v$ were ranked the same in RANK($G$), then $u$ should be ranked strictly higher than $v$ in RANK($H$). Conversely, if $u$ and $v$ are equal in RANK($H$), then $v$ must have been ranked strictly higher in RANK($G$).

Perhaps the following would also be nice?

(MIRROR) Suppose $G'$ is the result of mirroring all of the edges of $G$. Then RANK($G'$) is just RANK($G$) reversed.

One ranking system that satisfies (ISO), (ADDEDGE), and (MIRROR) is the ranking system where we just rank teams based on winning percentage, with 0-0 teams coming in at 50%. And when $n = 3$, this is the only ranking system satisfying (ISO), (ADDEDGE), and (MIRROR). However, this is definitely not a satisfactory ranking systems in many situations, such as the FBS.

One thing that college football analysts like to harp over is "Strength of Schedule". There is some merit in this: suppose we had the following situation: There are two groups (call them SEC and ACC for no reason in particular) such that every team plays each other team in the same group, and every team in the ACC beats every team in the SEC that they play (and suppose the density of games between the two groups is reasonably high.) Then it makes sense that an ACC team that doesn't beat any other ACC teams should still be ranked higher than an SEC team that beats every other SEC team, which wouldn't necessarily be reflected if we just ranked by winning percentage.

So maybe these are two other properties that we care about:

(WEAKDOM) Suppose $S_1$ and $S_2$ are two different strongly connected components in $G$ such that 1) no team in $S_2$ has beaten $S_1$ and 2) At least 1 team in $S_1$ has beaten a team in $S_2$. Then every team in $S_1$ should be ranked higher than every team in $S_2$ in RANK($G$).

(STRONGDOM) Suppose $S_1$ and $S_2$ are two different strongly connected components in $G$ such that every $S_1$ has beaten every team in $S_2$. Then every team in $S_1$ should be ranked higher than every team in $S_2$ in RANK($G$).

(STRONGDOM) is probably too strong to be practical. (WEAKDOM) can be scary but it's probably reasonable to have in all practical situations.