Tuesday, April 11, 2017

An Inequality between Conditional Entropy and Error Probability

Suppose that $X$ is a random variable that can assume $N$ different values, and suppose that $Y$ is another random variable that depends on $X$ in an arbitrary way. We want to estimate $X$ from observing $Y$, i.e., we form a (possibly randomized) estimate $\hat{X}(Y)$ as a function of $Y$. (For instance, $X$ could be a signal we want to recover and $Y$ a noisy or distorted observation of it.) We define the detection probability as $P_d:=\mathbb{P}(X=\hat{X}(Y))$, where the probability is taken w.r.t. the joint distribution $p_{X,Y}$ of $X$ and $Y$. A famous inequality relating $P_d$ to the conditional entropy of $X$ given $Y$ is Fano's inequality:
$$
 H(X|Y) \le h_2(P_d) + (1-P_d)\log (N-1)
$$
where $h_2(p):=-p\log p-(1-p)\log(1-p)$ is the binary entropy function and all logarithms are taken to base 2. Fano's inequality depends on the alphabet size $N$ of $X$. In what follows, we derive a different inequality that is independent of $N$.
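
As a quick sanity check, here is a minimal numerical sketch of Fano's inequality in Python/NumPy. The joint pmf `p_xy` (rows index $x$, columns index $y$) is an assumed toy example chosen purely for illustration; it is not part of the derivation.

```python
# Numerically check Fano's inequality on an assumed toy joint pmf.
import numpy as np

p_xy = np.array([[0.30, 0.05, 0.05],    # rows: x, columns: y
                 [0.04, 0.25, 0.03],
                 [0.06, 0.02, 0.20]])
p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y                              # p(x|y)

H_X_given_Y = -np.sum(p_xy * np.log2(p_x_given_y))    # H(X|Y) in bits
P_d = np.sum(p_xy.max(axis=0))                        # MAP detection probability

N = p_xy.shape[0]
h2 = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy, 0 < p < 1
fano_rhs = h2(P_d) + (1 - P_d) * np.log2(N - 1)
print(H_X_given_Y, fano_rhs, H_X_given_Y <= fano_rhs)       # approx. 1.04 <= 1.06, True
```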

To this end, suppose that we use the maximum a posteriori (MAP) estimate of $X$ given $Y$, i.e.,
$$
  \hat{x}(y) = \arg\max_x p_{X,Y}(x,y) = \arg\max_x p_{X|Y}(x|y).
$$
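
The two maximizations are equivalent because, for fixed $y$, $p_{X,Y}(x,y)$ and $p_{X|Y}(x|y)$ differ only by the positive factor $p_Y(y)$. A minimal sketch of this rule, reusing the assumed toy pmf `p_xy` from above:

```python
# MAP decision rule for the assumed toy joint pmf from the sketch above.
import numpy as np

p_xy = np.array([[0.30, 0.05, 0.05],    # rows: x, columns: y
                 [0.04, 0.25, 0.03],
                 [0.06, 0.02, 0.20]])
p_x_given_y = p_xy / p_xy.sum(axis=0)   # p(x|y)

# Maximizing p(x,y) or p(x|y) over x gives the same estimate for each y.
x_hat_joint = p_xy.argmax(axis=0)
x_hat_post = p_x_given_y.argmax(axis=0)
print(x_hat_joint, x_hat_post)          # both [0 1 2] here
```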
Given that $Y=y$, we can thus define
$$
P_d(y) = \sum_{x} p_{X|Y}(x|y) \mathbb{I}(x=\hat{x}(y)) = \max_x p_{X|Y}(x|y)
$$
from which $P_d=\sum_y p_Y(y) P_d(y)$ follows. Comparing the right-hand side with the Rényi entropy of order infinity (the min-entropy) of $X$ given $Y=y$, we observe that
$$
 P_d(y) = 2^{-H_\infty(X|Y=y)}.
$$
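
For concreteness, $P_d(y)$, $H_\infty(X|Y=y)$, and $P_d$ can be evaluated on the same assumed toy pmf as above; this only illustrates the identity and is not part of the derivation.

```python
# Evaluate P_d(y) = max_x p(x|y) = 2^{-H_inf(X|Y=y)} on the assumed toy pmf.
import numpy as np

p_xy = np.array([[0.30, 0.05, 0.05],    # rows: x, columns: y
                 [0.04, 0.25, 0.03],
                 [0.06, 0.02, 0.20]])
p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y

P_d_y = p_x_given_y.max(axis=0)              # P_d(y) = max_x p(x|y)
H_inf_y = -np.log2(P_d_y)                    # min-entropy H_inf(X|Y=y)
print(np.allclose(P_d_y, 2.0 ** -H_inf_y))   # identity P_d(y) = 2^{-H_inf}, True

P_d = np.sum(p_y * P_d_y)                    # averaging over y gives P_d
print(P_d)                                   # 0.75 for this example
```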
The Rényi entropy is non-increasing in its order; since the Shannon entropy is the Rényi entropy of order one, we have $H_\infty(X|Y=y)\le H(X|Y=y)$. Because $2^{-x}$ is decreasing in $x$, this gives $P_d(y)\ge 2^{-H(X|Y=y)}$. Moreover, $2^{-x}$ is convex in $x$, so Jensen's inequality yields
$$
 P_d = \sum_y p_Y(y) P_d(y) \ge \sum_y p_Y(y) 2^{-H(X|Y=y)} \ge 2^{- \sum_y p_Y(y) H(X|Y=y)} = 2^{-H(X|Y)}.
$$
Thus, the MAP detection probability is bounded from below by a decreasing function of the conditional entropy. The bound does not depend (explicitly) on the alphabet size of $X$.
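
As a final sanity check, here is the bound $P_d \ge 2^{-H(X|Y)}$ evaluated on the same assumed toy pmf used in the sketches above.

```python
# Check the alphabet-size-free bound P_d >= 2^{-H(X|Y)} on the assumed toy pmf.
import numpy as np

p_xy = np.array([[0.30, 0.05, 0.05],    # rows: x, columns: y
                 [0.04, 0.25, 0.03],
                 [0.06, 0.02, 0.20]])
p_x_given_y = p_xy / p_xy.sum(axis=0)

H_X_given_Y = -np.sum(p_xy * np.log2(p_x_given_y))   # H(X|Y) in bits
P_d = np.sum(p_xy.max(axis=0))                       # MAP detection probability

lower_bound = 2.0 ** -H_X_given_Y
print(P_d, lower_bound, P_d >= lower_bound)          # approx. 0.75 >= 0.49, True
```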