Probability axioms
The standard probability axioms are the foundations of probability theory introduced by Russian mathematician Andrey Kolmogorov in 1933.<ref name=":0">Template:Cite book</ref> Like all axiomatic systems, they outline the basic assumptions underlying the application of probability to fields such as pure mathematics and the physical sciences, while avoiding logical paradoxes.<ref>Template:Cite web</ref>
The probability axioms do not specify or assume any particular interpretation of probability, but may be motivated by starting from a philosophical definition of probability and arguing that the axioms are satisfied by this definition. For example,
- Cox's theorem derives the laws of probability based on a "logical" definition of probability as the likelihood or credibility of arbitrary logical propositions.<ref>Template:Cite journal</ref><ref>Template:Cite book</ref>
- The Dutch book arguments show that the betting odds of a rational agent must conform to the probability axioms; otherwise the agent would accept a collection of bets that guarantees a net loss (a Dutch book).
The third axiom, σ-additivity, is relatively modern, and originates with Lebesgue's measure theory. Some authors replace this with the strictly weaker axiom of finite additivity, which is sufficient to deal with some applications.<ref>Template:Cite journal</ref>
Kolmogorov axioms
In order to state the Kolmogorov axioms, the following pieces of data must be specified:
- The sample space, <math display="inline">\Omega</math>, which is the set of all possible outcomes or elementary events.
- The space of all events, which are each taken to be sets of outcomes (i.e. subsets of <math display="inline">\Omega</math>). The event space, <math display="inline">F</math>, must be a [[Σ-algebra|σ-algebra]] on <math display="inline">\Omega</math>.
- The probability measure <math display="inline">P</math> which assigns to each event <math>E \in F</math> its probability, <math>P(E)</math>.
Taken together, these assumptions mean that <math>(\Omega, F, P)</math> is a measure space. It is additionally assumed that <math>P(\Omega)=1</math>, making this triple a probability space.<ref name=":0" />
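As a concrete sketch, a finite probability space can be written in code. The fair six-sided die below is an illustrative assumption, not part of the article; the point is that a sample space, an event space (here implicitly the power set), and a measure summing to 1 together form a probability space.

```python
from fractions import Fraction

# Illustrative finite probability space: a fair six-sided die.
# The uniform weights 1/6 are an assumption made for this example.
omega = frozenset(range(1, 7))

def P(event):
    """Probability measure: each outcome carries weight 1/6."""
    return sum(Fraction(1, 6) for _ in event)

# Unit measure: the whole sample space has probability 1,
# making (omega, power set of omega, P) a probability space.
assert P(omega) == 1
```

Using exact `Fraction` arithmetic avoids floating-point rounding, so the axioms can be checked with exact equality.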
First axiom
The probability of an event is a non-negative real number. This assumption is implied by the fact that <math>P</math> is a measure on <math>F</math>.
- <math>P(E)\geq 0 \qquad \forall E \in F</math>
Theories which assign negative probability relax the first axiom.
Second axiom
This is the assumption of unit measure: the probability that at least one of the elementary events in the entire sample space will occur is 1.
<math display="block">P(\Omega) = 1</math>
From this axiom it follows that <math>P(E)</math> is always finite, in contrast with more general measure theory.
Third axiom
This is the assumption of σ-additivity: Any countable sequence of disjoint sets (synonymous with mutually exclusive events) <math>E_1, E_2, \ldots</math> satisfies
- <math>P\left(\bigcup_{i = 1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i).</math>
This property again is implied by the fact that <math>P</math> is a measure. Note that, by taking <math>E_1 = \Omega</math> and <math>E_i = \emptyset</math> for all <math>i>1</math>, one deduces that <math>P(\emptyset) = 0</math>: the series <math>1 + P(\emptyset) + P(\emptyset) + \cdots</math> can only converge to <math>P(\Omega) = 1</math> if <math>P(\emptyset) = 0</math>. Padding any finite collection of disjoint events with copies of the empty set then shows that σ-additivity implies finite additivity.
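These consequences can be checked numerically on a finite example (the fair-die weights are an assumption made for illustration): the measure of a disjoint union equals the sum of the measures, the empty set has measure zero, and adding copies of the empty set to a union changes nothing.

```python
from fractions import Fraction

# Fair-die measure, assumed purely for illustration.
weights = {k: Fraction(1, 6) for k in range(1, 7)}

def P(event):
    return sum(weights[outcome] for outcome in event)

evens, odds = {2, 4, 6}, {1, 3, 5}              # disjoint events
assert P(evens | odds) == P(evens) + P(odds)    # finite additivity
assert P(set()) == 0                            # P(empty set) = 0
# Padding the union with the empty set leaves both sides unchanged:
assert P(evens | odds | set()) == P(evens) + P(odds) + P(set())
```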
Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra.<ref>Template:Cite web</ref> Quasiprobability distributions in general relax the third axiom.
Elementary consequences
In order to demonstrate that the theory generated by the Kolmogorov axioms corresponds with classical probability, some elementary consequences are typically derived.<ref>Template:Cite web</ref>
- Since <math>P</math> is finitely additive, we have <math>P(A) + P(A^c) = P(A\cup A^c)= P(\Omega) = 1</math>, so <math>P(A^c) = 1-P(A)</math>.
- In particular, it follows that <math>P(\emptyset) = 0</math>. The empty set is interpreted as the event that "no outcome occurs", which is impossible.
- Similarly, if <math>A \subseteq B</math>, then <math>P(B) = P(A \cup (B\setminus A)) = P(A) + P(B\setminus A) \ge P(A)</math>. In other words, <math>P</math> is monotone.<ref name=":1">Template:Cite book</ref>
- Since <math>\emptyset \subseteq E \subseteq \Omega</math> for any event <math>E</math>, it follows that <math>0 \le P(E) \le 1</math>.
By dividing <math>A \cup B </math> into the disjoint sets <math>A \setminus (A \cap B) </math>, <math>B \setminus (A \cap B)</math> and <math>A \cap B</math>, one arrives at a probabilistic version of the inclusion–exclusion principle<ref>Template:Cite web</ref><math display="block">P(A \cup B) = P(A) + P(B) - P(A \cap B).</math>When <math>\Omega</math> is finite, only finitely many events in any disjoint sequence can be nonempty, so finite additivity and σ-additivity are equivalent.
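The decomposition into three disjoint pieces, and the resulting inclusion–exclusion identity, can be verified on the same illustrative fair-die measure:

```python
from fractions import Fraction

weights = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die, illustrative
P = lambda event: sum(weights[o] for o in event)

A, B = {1, 2, 3}, {3, 4}
# The three disjoint pieces of A ∪ B:
pieces = [A - B, B - A, A & B]
assert sum(P(piece) for piece in pieces) == P(A | B)
# Probabilistic inclusion-exclusion:
assert P(A | B) == P(A) + P(B) - P(A & B)
```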
In order to actually do calculations when <math>\Omega</math> is an infinite set, it is sometimes useful to generalize from a finite sample space. For example, if <math>\Omega</math> consists of all infinite sequences of tosses of a fair coin, it is not obvious how to compute the probability of any particular set of sequences (i.e. an event). If the event is "every flip is heads", then it is intuitive that the probability can be computed as:<math display="block">P(\text{infinite sequence of heads}) = \lim_{n \to \infty} P(\text{sequence of n heads}) = \lim_{n \to \infty} 2^{-n} = 0.</math>In order to make this rigorous, one has to prove that <math>P</math> is continuous, in the following sense. If <math>A_n,\,\, n = 1, 2, \ldots</math> is a sequence of events increasing (or decreasing) to another event <math>A</math>, then<ref>Template:Cite book</ref><math display="block">\lim_{n \to \infty} P(A_n) = P(A).</math>
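The limit in the coin-toss example can be sketched numerically: the events "the first <math>n</math> tosses are heads" are decreasing, and their probabilities <math>2^{-n}</math> shrink toward 0.

```python
# P(first n tosses all heads) = 2**-n for a fair coin. The events
# "first n tosses are heads" decrease to "every toss is heads", and
# their probabilities form a decreasing sequence tending to 0.
probs = [2.0 ** -n for n in range(1, 60)]
assert all(later < earlier for earlier, later in zip(probs, probs[1:]))
assert probs[-1] < 1e-15  # the tail of the sequence is vanishingly small
```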
Simple example: coin toss
Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair.<ref>Template:Cite journal</ref>
We may define:
- <math>\Omega = \{H,T\}</math>
- <math>F = \{\varnothing, \{H\}, \{T\}, \{H,T\}\}</math>
Kolmogorov's axioms imply that:
- <math>P(\varnothing) = 0</math>
The probability of neither heads nor tails is 0.
- <math>P(\{H,T\}) = 1</math>
The probability of either heads or tails is 1.
- <math>P(\{H\}) + P(\{T\}) = 1</math>
The sum of the probability of heads and the probability of tails is 1.
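The example above can be sketched directly in code. The bias <code>p</code> below is an arbitrary assumption, since, as noted, the axioms do not require a fair coin.

```python
from fractions import Fraction

# The coin-toss space from the example. The bias p = 3/10 is an
# arbitrary assumption; the axioms hold for any p between 0 and 1.
omega = frozenset({"H", "T"})
F = [frozenset(), frozenset({"H"}), frozenset({"T"}), omega]  # power set
p = Fraction(3, 10)
weights = {"H": p, "T": 1 - p}

def P(event):
    return sum(weights[outcome] for outcome in event)

assert P(frozenset()) == 0                             # neither heads nor tails
assert P(omega) == 1                                   # either heads or tails
assert P(frozenset({"H"})) + P(frozenset({"T"})) == 1  # the two sum to 1
```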
References
Further reading
- Formal definition of probability in the Mizar system, and the list of theorems formally proved about it.