<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.sarg.dev/index.php?action=history&amp;feed=atom&amp;title=Conditional_entropy</id>
	<title>Conditional entropy - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.sarg.dev/index.php?action=history&amp;feed=atom&amp;title=Conditional_entropy"/>
	<link rel="alternate" type="text/html" href="https://wiki.sarg.dev/index.php?title=Conditional_entropy&amp;action=history"/>
	<updated>2026-04-24T20:42:07Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.44.2</generator>
	<entry>
		<id>https://wiki.sarg.dev/index.php?title=Conditional_entropy&amp;diff=541959&amp;oldid=prev</id>
		<title>imported&gt;I love yourwiki: /* growthexperiments-addlink-summary-summary:3|0|0 */</title>
		<link rel="alternate" type="text/html" href="https://wiki.sarg.dev/index.php?title=Conditional_entropy&amp;diff=541959&amp;oldid=prev"/>
		<updated>2025-07-06T05:02:57Z</updated>

		<summary type="html">&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;growthexperiments-addlink-summary-summary:3|0|0&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{Short description|Measure of relative information in probability theory}}&lt;br /&gt;
{{Information theory}}&lt;br /&gt;
&lt;br /&gt;
[[Image:Entropy-mutual-information-relative-entropy-relation-diagram.svg|thumb|256px|right|[[Venn diagram]] showing additive and subtractive relationships among various [[Quantities of information|information measures]] associated with correlated variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;. The area contained by both circles is the [[joint entropy]] &amp;lt;math&amp;gt;\Eta(X,Y)&amp;lt;/math&amp;gt;. The circle on the left (red and violet) is the [[Entropy (information theory)|individual entropy]] &amp;lt;math&amp;gt;\Eta(X)&amp;lt;/math&amp;gt;, with the red being the conditional entropy &amp;lt;math&amp;gt;\Eta(X|Y)&amp;lt;/math&amp;gt;. The circle on the right (blue and violet) is &amp;lt;math&amp;gt;\Eta(Y)&amp;lt;/math&amp;gt;, with the blue being &amp;lt;math&amp;gt;\Eta(Y|X)&amp;lt;/math&amp;gt;. The violet is the [[mutual information]] &amp;lt;math&amp;gt;\operatorname{I}(X;Y)&amp;lt;/math&amp;gt;.]]&lt;br /&gt;
&lt;br /&gt;
In [[information theory]], the &amp;#039;&amp;#039;&amp;#039;conditional entropy&amp;#039;&amp;#039;&amp;#039; quantifies the amount of information needed to describe the outcome of a [[random variable]] &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; given that the value of another random variable &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is known. Here, information is measured in [[Shannon (unit)|shannon]]s, [[Nat (unit)|nat]]s, or [[Hartley (unit)|hartley]]s. The &amp;#039;&amp;#039;entropy of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; conditioned on &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;&amp;#039;&amp;#039; is written as &amp;lt;math&amp;gt;\Eta(Y|X)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Definition ==&lt;br /&gt;
The conditional entropy of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is defined as&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(Y|X)\ = -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x,y)} {p(x)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\mathcal X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathcal Y&amp;lt;/math&amp;gt; denote the [[Support (mathematics)|support sets]] of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;Note:&amp;#039;&amp;#039; Here, the convention is that the expression &amp;lt;math&amp;gt;0 \log 0&amp;lt;/math&amp;gt; should be treated as being equal to zero. This is because &amp;lt;math&amp;gt;\lim_{\theta\to0^+} \theta\, \log \theta = 0&amp;lt;/math&amp;gt;.&amp;lt;ref&amp;gt;{{Cite web|url=http://www.inference.org.uk/mackay/itprnn/book.html|title=David MacKay: Information Theory, Pattern Recognition and Neural Networks: The Book|website=www.inference.org.uk|access-date=2019-10-25}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Intuitively, notice that by definition of [[expected value]] and of [[Conditional Probability|conditional probability]], &amp;lt;math&amp;gt;\displaystyle H(Y|X) &amp;lt;/math&amp;gt; can be written as &amp;lt;math&amp;gt; H(Y|X) = \mathbb{E}[f(X,Y)]&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt; f &amp;lt;/math&amp;gt; is defined as &amp;lt;math&amp;gt;\displaystyle f(x,y) := -\log\left(\frac{p(x, y)}{p(x)}\right) = -\log(p(y|x))&amp;lt;/math&amp;gt;. One can think of &amp;lt;math&amp;gt;\displaystyle f&amp;lt;/math&amp;gt; as associating each pair &amp;lt;math&amp;gt;\displaystyle (x, y)&amp;lt;/math&amp;gt; with a quantity measuring the information content of &amp;lt;math&amp;gt;\displaystyle (Y=y)&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;\displaystyle (X=x)&amp;lt;/math&amp;gt;. This quantity is directly related to the amount of information needed to describe the event &amp;lt;math&amp;gt;\displaystyle (Y=y)&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;(X=x)&amp;lt;/math&amp;gt;. Hence, by computing the expected value of &amp;lt;math&amp;gt;\displaystyle f &amp;lt;/math&amp;gt; over all pairs of values &amp;lt;math&amp;gt;(x, y) \in \mathcal{X} \times \mathcal{Y}&amp;lt;/math&amp;gt;, the conditional entropy &amp;lt;math&amp;gt;\displaystyle H(Y|X)&amp;lt;/math&amp;gt; measures how much information, on average, is still needed to describe &amp;lt;math&amp;gt; Y &amp;lt;/math&amp;gt; once the value of &amp;lt;math&amp;gt; X &amp;lt;/math&amp;gt; is known.&lt;br /&gt;
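&lt;br /&gt;
For example, suppose the pair &amp;lt;math&amp;gt;(X,Y)&amp;lt;/math&amp;gt; is uniformly distributed over the three outcomes &amp;lt;math&amp;gt;(0,0)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(0,1)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;(1,1)&amp;lt;/math&amp;gt;, each with probability &amp;lt;math&amp;gt;1/3&amp;lt;/math&amp;gt;. Then &amp;lt;math&amp;gt;p_X(0)=2/3&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;p_X(1)=1/3&amp;lt;/math&amp;gt;, so the defining sum gives&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(Y|X) = -\tfrac{1}{3}\log_2\tfrac{1/3}{2/3} - \tfrac{1}{3}\log_2\tfrac{1/3}{2/3} - \tfrac{1}{3}\log_2\tfrac{1/3}{1/3} = \tfrac{2}{3}\text{ bit},&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which is smaller than the unconditional entropy &amp;lt;math&amp;gt;\Eta(Y) = -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.918&amp;lt;/math&amp;gt; bits: knowing &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; reduces, on average, the uncertainty about &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;.&lt;br /&gt;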
&lt;br /&gt;
== Motivation ==&lt;br /&gt;
Let &amp;lt;math&amp;gt;\Eta(Y|X=x)&amp;lt;/math&amp;gt; be the [[Shannon Entropy|entropy]] of the discrete random variable &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; conditioned on the discrete random variable &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; taking a certain value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;. Denote the support sets of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;\mathcal X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathcal Y&amp;lt;/math&amp;gt;. Let &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; have [[probability mass function]] &amp;lt;math&amp;gt;p_Y{(y)}&amp;lt;/math&amp;gt;. The unconditional entropy of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is calculated as &amp;lt;math&amp;gt;\Eta(Y) := \mathbb{E}[\operatorname{I}(Y)]&amp;lt;/math&amp;gt;, i.e.&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(Y) = \sum_{y\in\mathcal Y} {\mathrm{Pr}(Y=y)\,\mathrm{I}(y)} &lt;br /&gt;
= -\sum_{y\in\mathcal Y} {p_Y(y) \log_2{p_Y(y)}},&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\operatorname{I}(y)&amp;lt;/math&amp;gt; is the [[information content]] of the [[Outcome (probability)|outcome]] of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; taking the value &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;. The entropy of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; conditioned on &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; taking the value &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; is defined by:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(Y|X=x)&lt;br /&gt;
= -\sum_{y\in\mathcal Y} {\Pr(Y = y|X=x) \log_2{\Pr(Y = y|X=x)}}.&amp;lt;/math&amp;gt;&lt;br /&gt;
Note that &amp;lt;math&amp;gt;\Eta(Y|X)&amp;lt;/math&amp;gt; is the result of averaging &amp;lt;math&amp;gt;\Eta(Y|X=x)&amp;lt;/math&amp;gt; over all possible values &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; that &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; may take. Also, if the above sum is taken over a sample &amp;lt;math&amp;gt;y_1, \dots, y_n&amp;lt;/math&amp;gt;, the expected value &amp;lt;math&amp;gt;E_X[ \Eta(y_1, \dots, y_n \mid X = x)]&amp;lt;/math&amp;gt; is known in some domains as &amp;#039;&amp;#039;&amp;#039;{{Visible anchor|equivocation}}&amp;#039;&amp;#039;&amp;#039;.&amp;lt;ref&amp;gt;{{cite journal|author1=Hellman, M.|author2=Raviv, J.|year=1970|title=Probability of error, equivocation, and the Chernoff bound|journal=IEEE Transactions on Information Theory|volume=16|issue=4|pages=368–372|doi=10.1109/TIT.1970.1054466|citeseerx=10.1.1.131.2865}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Given [[Discrete random variable|discrete random variables]] &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; with image &amp;lt;math&amp;gt;\mathcal X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; with image &amp;lt;math&amp;gt;\mathcal Y&amp;lt;/math&amp;gt;, the conditional entropy of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is defined as the [[Weight function|weighted sum]] of &amp;lt;math&amp;gt;\Eta(Y|X=x)&amp;lt;/math&amp;gt; for each possible value of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, using  &amp;lt;math&amp;gt;p(x)&amp;lt;/math&amp;gt; as the weights:&amp;lt;ref name=cover1991&amp;gt;{{cite book|isbn=0-471-06259-6|year=1991|authorlink1=Thomas M. Cover|author1=T. Cover|author2=J. Thomas|title=Elements of Information Theory|publisher=Wiley |url=https://archive.org/details/elementsofinform0000cove|url-access=registration}}&amp;lt;/ref&amp;gt;{{rp|15}}&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
\Eta(Y|X)\ &amp;amp;\equiv \sum_{x\in\mathcal X}\,p(x)\,\Eta(Y|X=x)\\&lt;br /&gt;
&amp;amp; =-\sum_{x\in\mathcal X} p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log_2\, p(y|x)\\&lt;br /&gt;
&amp;amp; =-\sum_{x\in\mathcal X, y\in\mathcal Y}\,p(x)p(y|x)\,\log_2\,p(y|x)\\&lt;br /&gt;
&amp;amp; =-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log_2 \frac {p(x,y)} {p(x)}. &lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
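&lt;br /&gt;
The defining sums are straightforward to evaluate numerically. The following minimal Python sketch computes &amp;lt;math&amp;gt;\Eta(Y|X)&amp;lt;/math&amp;gt; in bits from a joint probability mass function, here the illustrative three-outcome distribution used in the example above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Minimal sketch: conditional entropy H(Y|X) in bits from a joint pmf p(x, y).&lt;br /&gt;
from math import log2&lt;br /&gt;
&lt;br /&gt;
# Illustrative joint distribution, uniform over three outcomes.&lt;br /&gt;
joint = {(0, 0): 1 / 3, (0, 1): 1 / 3, (1, 1): 1 / 3}&lt;br /&gt;
&lt;br /&gt;
# Marginal p(x) is the sum over y of p(x, y).&lt;br /&gt;
marginal_x = {}&lt;br /&gt;
for (x, y), p in joint.items():&lt;br /&gt;
    marginal_x[x] = marginal_x.get(x, 0.0) + p&lt;br /&gt;
&lt;br /&gt;
# H(Y|X) = -sum over (x, y) of p(x, y) * log2(p(x, y) / p(x)),&lt;br /&gt;
# with terms where p(x, y) = 0 contributing nothing (0 log 0 = 0 convention).&lt;br /&gt;
h_y_given_x = 0.0&lt;br /&gt;
for (x, y), p in joint.items():&lt;br /&gt;
    if p &amp;gt; 0:&lt;br /&gt;
        h_y_given_x -= p * log2(p / marginal_x[x])&lt;br /&gt;
&lt;br /&gt;
print(h_y_given_x)  # about 0.667, i.e. 2/3 bit&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;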
&lt;br /&gt;
==Properties==&lt;br /&gt;
===Conditional entropy equals zero===&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(Y|X)=0&amp;lt;/math&amp;gt; [[if and only if]] the value of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is completely determined by the value of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;.&lt;br /&gt;
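&lt;br /&gt;
For example, if &amp;lt;math&amp;gt;Y = f(X)&amp;lt;/math&amp;gt; for some deterministic function &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;\Eta(Y|X)=0&amp;lt;/math&amp;gt;, whereas &amp;lt;math&amp;gt;\Eta(X|Y)&amp;lt;/math&amp;gt; may still be positive if &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; is not injective.&lt;br /&gt;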
&lt;br /&gt;
===Conditional entropy of independent random variables===&lt;br /&gt;
Conversely, &amp;lt;math&amp;gt;\Eta(Y|X) = \Eta(Y)&amp;lt;/math&amp;gt; if and only if &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; are [[independent random variables]].&lt;br /&gt;
&lt;br /&gt;
===Chain rule===&lt;br /&gt;
Assume that the combined system determined by two random variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; has [[joint entropy]] &amp;lt;math&amp;gt;\Eta(X,Y)&amp;lt;/math&amp;gt;, that is, we need &amp;lt;math&amp;gt;\Eta(X,Y)&amp;lt;/math&amp;gt; bits of information on average to describe its exact state. Now if we first learn the value of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, we have gained &amp;lt;math&amp;gt;\Eta(X)&amp;lt;/math&amp;gt; bits of information. Once &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is known, we only need &amp;lt;math&amp;gt;\Eta(X,Y)-\Eta(X)&amp;lt;/math&amp;gt; bits to describe the state of the whole system. This quantity is exactly &amp;lt;math&amp;gt;\Eta(Y|X)&amp;lt;/math&amp;gt;, which gives the &amp;#039;&amp;#039;chain rule&amp;#039;&amp;#039; of conditional entropy:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(Y|X)\, = \, \Eta(X,Y)- \Eta(X).&amp;lt;/math&amp;gt;&amp;lt;ref name=cover1991 /&amp;gt;{{rp|17}}&lt;br /&gt;
&lt;br /&gt;
The chain rule follows from the above definition of conditional entropy:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align} &lt;br /&gt;
\Eta(Y|X) &amp;amp;= \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \left(\frac{p(x)}{p(x,y)} \right) \\[4pt]&lt;br /&gt;
 &amp;amp;= \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)(\log (p(x)) - \log (p(x,y))) \\[4pt]&lt;br /&gt;
 &amp;amp;= -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log (p(x,y)) + \sum_{x\in\mathcal X, y\in\mathcal Y}{p(x,y)\log(p(x))} \\[4pt]&lt;br /&gt;
 &amp;amp; = \Eta(X,Y) + \sum_{x \in \mathcal X} p(x)\log (p(x) ) \\[4pt]&lt;br /&gt;
 &amp;amp; = \Eta(X,Y) - \Eta(X). &lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In general, a chain rule for multiple random variables holds:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; \Eta(X_1,X_2,\ldots,X_n) =&lt;br /&gt;
 \sum_{i=1}^n \Eta(X_i | X_1, \ldots, X_{i-1}) &amp;lt;/math&amp;gt;&amp;lt;ref name=cover1991 /&amp;gt;{{rp|22}}&lt;br /&gt;
&lt;br /&gt;
It has a similar form to the [[Chain rule (probability)|chain rule]] in [[probability theory]], except that addition is used instead of multiplication.&lt;br /&gt;
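&lt;br /&gt;
For three variables, for instance, the rule expands to&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(X_1,X_2,X_3) = \Eta(X_1) + \Eta(X_2|X_1) + \Eta(X_3|X_1,X_2),&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mirroring the factorization &amp;lt;math&amp;gt;p(x_1,x_2,x_3) = p(x_1)\,p(x_2|x_1)\,p(x_3|x_1,x_2)&amp;lt;/math&amp;gt;.&lt;br /&gt;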
&lt;br /&gt;
===Bayes&amp;#039; rule===&lt;br /&gt;
[[Bayes&amp;#039; rule]] for conditional entropy states&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(Y|X) \,=\, \Eta(X|Y) - \Eta(X) + \Eta(Y).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;Proof.&amp;#039;&amp;#039; &amp;lt;math&amp;gt;\Eta(Y|X) = \Eta(X,Y) - \Eta(X)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Eta(X|Y) = \Eta(Y,X) - \Eta(Y)&amp;lt;/math&amp;gt;. Symmetry entails &amp;lt;math&amp;gt;\Eta(X,Y) = \Eta(Y,X)&amp;lt;/math&amp;gt;. Subtracting the two equations implies Bayes&amp;#039; rule.&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is [[Conditional independence|conditionally independent]] of &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, so that &amp;lt;math&amp;gt;p(y|x,z) = p(y|x)&amp;lt;/math&amp;gt; wherever these conditional probabilities are defined, we have:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(Y|X,Z) \,=\, \Eta(Y|X).&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Other properties===&lt;br /&gt;
For any &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
  \Eta(Y|X) &amp;amp;\le \Eta(Y) \, \\&lt;br /&gt;
  \Eta(X,Y) &amp;amp;= \Eta(X|Y) + \Eta(Y|X) + \operatorname{I}(X;Y),\qquad \\&lt;br /&gt;
  \Eta(X,Y) &amp;amp;= \Eta(X) + \Eta(Y) - \operatorname{I}(X;Y),\, \\&lt;br /&gt;
  \operatorname{I}(X;Y) &amp;amp;\le \Eta(X),\,&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\operatorname{I}(X;Y)&amp;lt;/math&amp;gt; is the [[mutual information]] between &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
For independent &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\Eta(Y|X) = \Eta(Y) &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Eta(X|Y) = \Eta(X) \, &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Although the specific conditional entropy &amp;lt;math&amp;gt;\Eta(X|Y=y)&amp;lt;/math&amp;gt; can be either less than or greater than &amp;lt;math&amp;gt;\Eta(X)&amp;lt;/math&amp;gt; for a given [[random variate]] &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\Eta(X|Y)&amp;lt;/math&amp;gt; can never exceed &amp;lt;math&amp;gt;\Eta(X)&amp;lt;/math&amp;gt;.&lt;br /&gt;
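&lt;br /&gt;
For instance, suppose &amp;lt;math&amp;gt;\Pr(Y=0)=3/4&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\Pr(Y=1)=1/4&amp;lt;/math&amp;gt;, with &amp;lt;math&amp;gt;X=0&amp;lt;/math&amp;gt; whenever &amp;lt;math&amp;gt;Y=0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; uniform on &amp;lt;math&amp;gt;\{0,1\}&amp;lt;/math&amp;gt; whenever &amp;lt;math&amp;gt;Y=1&amp;lt;/math&amp;gt;. Then &amp;lt;math&amp;gt;\Eta(X) \approx 0.54&amp;lt;/math&amp;gt; bits, and the specific conditional entropy &amp;lt;math&amp;gt;\Eta(X|Y=1)=1&amp;lt;/math&amp;gt; bit exceeds it, yet the average &amp;lt;math&amp;gt;\Eta(X|Y) = \tfrac{3}{4}\cdot 0 + \tfrac{1}{4}\cdot 1 = 0.25&amp;lt;/math&amp;gt; bits is still smaller than &amp;lt;math&amp;gt;\Eta(X)&amp;lt;/math&amp;gt;.&lt;br /&gt;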
&lt;br /&gt;
== Conditional differential entropy ==&lt;br /&gt;
=== Definition ===&lt;br /&gt;
The above definition is for discrete random variables. The continuous version of discrete conditional entropy is called &amp;#039;&amp;#039;conditional differential (or continuous) entropy&amp;#039;&amp;#039;. Let &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; be continuous random variables with a [[joint probability density function]] &amp;lt;math&amp;gt;f(x,y)&amp;lt;/math&amp;gt;. The conditional differential entropy &amp;lt;math&amp;gt;h(X|Y)&amp;lt;/math&amp;gt; is defined as&amp;lt;ref name=cover1991 /&amp;gt;{{rp|249}}&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;h(X|Y) = -\int_{\mathcal X, \mathcal Y} f(x,y)\log f(x|y)\,dx dy&amp;lt;/math&amp;gt;.&lt;br /&gt;
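&lt;br /&gt;
For example, if &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; are jointly Gaussian with correlation coefficient &amp;lt;math&amp;gt;\rho&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\operatorname{Var}(X)=\sigma^2&amp;lt;/math&amp;gt;, then the conditional distribution of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;Y=y&amp;lt;/math&amp;gt; is Gaussian with variance &amp;lt;math&amp;gt;\sigma^2(1-\rho^2)&amp;lt;/math&amp;gt; for every &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;, so&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;h(X|Y) = \frac{1}{2}\log\left(2\pi e\,\sigma^2(1-\rho^2)\right),&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which is negative whenever &amp;lt;math&amp;gt;\sigma^2(1-\rho^2)&amp;lt;/math&amp;gt; is smaller than &amp;lt;math&amp;gt;1/(2\pi e)&amp;lt;/math&amp;gt;.&lt;br /&gt;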
&lt;br /&gt;
=== Properties ===&lt;br /&gt;
In contrast to the conditional entropy for discrete random variables, the conditional differential entropy may be negative.&lt;br /&gt;
&lt;br /&gt;
As in the discrete case, there is a chain rule for differential entropy:&lt;br /&gt;
:&amp;lt;math&amp;gt;h(Y|X)\,=\,h(X,Y)-h(X)&amp;lt;/math&amp;gt;&amp;lt;ref name=cover1991 /&amp;gt;{{rp|253}}&lt;br /&gt;
Note, however, that this rule may not hold if the differential entropies involved do not exist or are infinite.&lt;br /&gt;
&lt;br /&gt;
Conditional differential entropy is also used in the definition of the [[mutual information]] between continuous random variables:&lt;br /&gt;
:&amp;lt;math&amp;gt;\operatorname{I}(X;Y)=h(X)-h(X|Y)=h(Y)-h(Y|X)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;h(X|Y) \le h(X)&amp;lt;/math&amp;gt; with equality if and only if &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; are independent.&amp;lt;ref name=cover1991 /&amp;gt;{{rp|253}}&lt;br /&gt;
&lt;br /&gt;
===Relation to estimator error===&lt;br /&gt;
The conditional differential entropy yields a lower bound on the expected squared error of an [[estimator]]. For any continuous random variable &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, observation &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; and estimator &amp;lt;math&amp;gt;\widehat{X}&amp;lt;/math&amp;gt; the following holds:&amp;lt;ref name=cover1991 /&amp;gt;{{rp|255}}&lt;br /&gt;
:&amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;\mathbb{E}\left[\bigl(X - \widehat{X}{(Y)}\bigr)^2\right] &lt;br /&gt;
 \ge \frac{1}{2\pi e}e^{2h(X|Y)}&amp;lt;/math&amp;gt;&lt;br /&gt;
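&lt;br /&gt;
The bound holds with equality when, for example, &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; are jointly Gaussian and &amp;lt;math&amp;gt;\widehat{X}(Y)=\mathbb{E}[X|Y]&amp;lt;/math&amp;gt; is the conditional-mean estimator: in that case &amp;lt;math&amp;gt;\mathbb{E}\left[\bigl(X - \widehat{X}(Y)\bigr)^2\right] = \operatorname{Var}(X|Y)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;h(X|Y) = \frac{1}{2}\log\left(2\pi e \operatorname{Var}(X|Y)\right)&amp;lt;/math&amp;gt;, so both sides equal &amp;lt;math&amp;gt;\operatorname{Var}(X|Y)&amp;lt;/math&amp;gt;.&lt;br /&gt;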
&lt;br /&gt;
This is related to the [[uncertainty principle]] from [[quantum mechanics]].&lt;br /&gt;
&lt;br /&gt;
==Generalization to quantum theory==&lt;br /&gt;
In [[quantum information theory]], the conditional entropy is generalized to the [[conditional quantum entropy]]. The latter can take negative values, unlike its classical counterpart.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
* [[Entropy (information theory)]]&lt;br /&gt;
* [[Mutual information]]&lt;br /&gt;
* [[Conditional quantum entropy]]&lt;br /&gt;
* [[Variation of information]]&lt;br /&gt;
* [[Entropy power inequality]]&lt;br /&gt;
* [[Likelihood function]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
{{Reflist}}&lt;br /&gt;
&lt;br /&gt;
[[Category:Entropy and information]]&lt;br /&gt;
[[Category:Information theory]]&lt;/div&gt;</summary>
		<author><name>imported&gt;I love yourwiki</name></author>
	</entry>
</feed>