imported>Ktkvtsh: /* References */ add notelist

2025-10-26T23:45:20Z

{{Short description|Data whose unit can take on only two possible states}}
{{more citations needed|date=April 2019}}

'''Binary data''' is [[data]] whose unit can take on only two possible states. These are often labelled as 0 and 1 in accordance with the [[binary numeral system]] and [[Boolean algebra]].

Binary data occurs in many different technical and scientific fields, where it can be called by different names, including ''[[bit]]'' (binary digit) in [[computer science]], ''[[truth value]]'' in [[mathematical logic]] and related domains, and ''[[#In statistics|binary variable]]'' in statistics.

==Mathematical and combinatoric foundations==
A [[finite set|discrete]] variable that can take only [[one]] state contains zero [[informational entropy|information]], and {{num|2}} is the next [[natural number]] after 1. That is why the [[bit]], a variable with only two possible values, is a standard primary [[units of information|unit of information]].

A collection of {{mvar|n}} bits may have {{math|[[power of two|2<sup>''n''</sup>]]}} states: see [[binary number]] for details. The number of states of a collection of discrete variables depends [[exponential function|exponentially]] on the number of variables, and only as a [[power law]] on the number of states of each variable. Ten bits have more ({{num|1024}}) states than three [[decimal digit]]s ({{num|1000}}). {{math|10''k''}} bits are more than sufficient to represent information (a [[number]] or anything else) that requires {{math|3''k''}} decimal digits, so information contained in discrete variables with [[ternary numeral system|3]], 4, 5, 6, 7, 8, 9, [[Neper|10]]... states can always be represented instead by allocating two, three, or four times as many bits. Thus, using a small number of states other than 2 provides no advantage.
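The counting argument above can be checked numerically. The following is a minimal sketch; the function name <code>bits_needed</code> is an illustrative assumption and does not come from any cited source.

```python
def bits_needed(states: int) -> int:
    """Smallest n such that 2**n >= states (n bits cover `states` values)."""
    if states < 1:
        raise ValueError("states must be a positive integer")
    # (states - 1).bit_length() is the exact integer ceiling of log2(states).
    return (states - 1).bit_length()


# Ten bits provide more states than three decimal digits.
ten_bit_states = 2 ** 10        # 1024
three_digit_states = 10 ** 3    # 1000

# Bits required to cover k decimal digits, for a few values of k.
bits_for_digits = {k: bits_needed(10 ** k) for k in (3, 6, 9)}
print(bits_for_digits)
```

A variable with a single state needs zero bits under this definition, which matches the statement that such a variable carries no information.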

[[Image:Hypercubeorder binary.svg|thumb|right|A [[Hasse diagram]]: representation of a Boolean algebra as a [[directed graph]]]]
Moreover, Boolean algebra provides a convenient mathematical structure for collections of bits, with the semantics of a collection of [[propositional variable]]s. Boolean algebra operations are known as "[[bitwise operation]]s" in computer science. [[Boolean function]]s are also well-studied theoretically and easily implementable, either with [[computer program]]s or by so-called [[logic gate]]s in [[digital electronics]]. This contributes to the use of bits to represent different kinds of data, even data that is not originally binary.
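As an illustration of Boolean operations applied bit by bit, the sketch below treats two four-bit collections as Python integers. The values and the four-bit width are arbitrary choices made for the example.

```python
# Two collections of four bits, written as binary literals.
a = 0b1100
b = 0b1010
MASK = 0b1111  # restricts results to four bits

conjunction = a & b      # AND: 1 where both bits are 1
disjunction = a | b      # OR: 1 where at least one bit is 1
exclusive_or = a ^ b     # XOR: 1 where the bits differ
negation = ~a & MASK     # NOT, masked because Python integers are unbounded

for name, value in [("a AND b", conjunction), ("a OR b", disjunction),
                    ("a XOR b", exclusive_or), ("NOT a", negation)]:
    print(name, format(value, "04b"))
```

Each position of the result is computed independently of the others, which is what makes a single integer behave like a collection of propositional variables.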

=={{anchor|Statistics}}In statistics==
In [[statistics]], '''binary data''' is a [[statistical data type]] consisting of [[categorical data]] that can take exactly two possible values, such as "A" and "B", or "heads" and "tails". It is also called '''dichotomous data''', and an older term is '''quantal data'''.{{sfn|Collett|2002|p=1}} The two values are often referred to generically as "success" and "failure".{{sfn|Collett|2002|p=1}} As a form of categorical data, binary data is [[nominal data]], meaning the values are [[qualitative property|qualitatively different]] and cannot be compared numerically. However, the values are frequently represented as 1 or 0, which corresponds to counting the number of successes in a single trial: 1 (success…) or 0 (failure); see {{slink||Counting}}. More intuitively, binary data can be represented as [[count data]].

Often, binary data is used to represent one of two conceptually opposed values, e.g.:
*the outcome of an experiment ("success" or "failure")
*the response to a yes–no question ("yes" or "no")
*presence or absence of some feature ("is present" or "is not present")
*the truth or falsehood of a proposition ("true" or "false", "correct" or "incorrect")

However, it can also be used for data that is assumed to have only two possible values, even if they are not conceptually opposed or conceptually represent all possible values in the space. For example, binary data is often used to represent the party choices of voters in elections in the United States, i.e. [[Republican Party (United States)|Republican]] or [[Democratic Party (United States)|Democratic]]. In this case, there is no inherent reason why only two [[political party|political parties]] should exist, and indeed, other parties do exist in the U.S., but they are so minor that they are generally simply ignored. Modeling continuous data (or categorical data of more than 2 categories) as a binary variable for analysis purposes is called [[discretization|dichotomization]] (creating a [[dichotomy]]). Like all discretization, it involves [[discretization error]], but the goal is to learn something valuable despite the error: treating it as [[wikt:negligible|negligible]] for the purpose at hand, but remembering that it cannot be assumed to be negligible in general.
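Dichotomization of continuous data can be sketched as a simple threshold rule. The measurements and the threshold below are made-up illustrative values, and the function name <code>dichotomize</code> is hypothetical.

```python
def dichotomize(values, threshold):
    """Code each value as 1 if it reaches the threshold, otherwise 0."""
    return [1 if v >= threshold else 0 for v in values]


measurements = [0.2, 1.7, 0.9, 2.4, 1.0]   # hypothetical continuous data
binary = dichotomize(measurements, threshold=1.0)
print(binary)
```

The values 0.9 and 1.0 fall on opposite sides of the cut even though they are close, which is one way the discretization error shows up.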

==={{anchor|Binary variable}}Binary variables===
A '''binary variable''' is a [[random variable]] of binary type, meaning with two possible values. [[Independent and identically distributed random variables|Independent and identically distributed]] (i.i.d.) binary variables follow a [[Bernoulli distribution]], but in general binary data need not come from i.i.d. variables. Total counts of i.i.d. binary variables (equivalently, sums of i.i.d. binary variables coded as 1 or 0) follow a [[binomial distribution]], but when binary variables are not i.i.d., the distribution need not be binomial.

===Counting===
Like categorical data, binary data can be converted to a [[Array data structure|vector]] of [[count data]] by writing one coordinate for each possible value, and counting 1 for the value that occurs, and 0 for the value that does not occur.<ref>{{cite book |last=Agresti |first=Alan |url=https://books.google.com/books?id=UOrr47-2oisC&pg=PA6 |title=Categorical Data Analysis |publisher=Wiley |year=2012 |isbn=978-0470463635 |edition=3rd |page=6 |section=1.2.2 Multinomial Distribution}}</ref> For example, if the values are A and B, then the data set A, A, B can be represented in counts as (1, 0), (1, 0), (0, 1). Once converted to counts, binary data can be [[grouped data|grouped]] and the counts added. For instance, if the set A, A, B is grouped, the total counts are (2, 1): 2 A's and 1 B (out of 3 trials).

Since there are only two possible values, this can be simplified to a single count (a scalar value) by considering one value as "success" and the other as "failure", coding a value of the success as 1 and of the failure as 0 (using only the coordinate for the "success" value, not the coordinate for the "failure" value). For example, if the value A is considered "success" (and thus B is considered "failure"), the data set A, A, B would be represented as 1, 1, 0. When this is grouped, the values are added, while the number of trials is generally tracked implicitly. For example, A, A, B would be grouped as 1 + 1 + 0 = 2 successes (out of <math>n = 3</math> trials). Going the other way, count data with <math>n = 1</math> is binary data, with the two classes being 0 (failure) or 1 (success).
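The A, A, B example can be reproduced in a few lines. This is only a sketch of the bookkeeping described above; the variable names are illustrative.

```python
data = ["A", "A", "B"]
categories = ("A", "B")

# Vector form: one coordinate per possible value.
one_hot = [tuple(1 if x == c else 0 for c in categories) for x in data]

# Grouping adds the vectors coordinate by coordinate.
grouped = tuple(sum(column) for column in zip(*one_hot))

# Scalar form: treat "A" as success (1) and "B" as failure (0).
coded = [1 if x == "A" else 0 for x in data]
successes, n = sum(coded), len(coded)

print(one_hot)        # vectors for A, A, B
print(grouped)        # total counts per value
print(successes, n)   # successes out of n trials
```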

Counts of i.i.d. binary variables follow a binomial distribution, with {{tmath|n}} the total number of trials (points in the grouped data).
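For i.i.d. binary variables with success probability {{mvar|p}}, the probability of {{mvar|k}} successes in {{mvar|n}} trials is given by the binomial probability mass function. The sketch below evaluates it for {{math|1=''n'' = 3}}; the value {{math|1=''p'' = 0.5}} is an assumption chosen for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n i.i.d. trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 3, 0.5   # p is an assumed value, not estimated from data
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(pmf)
```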

===Regression===
{{main|Binary regression}}
[[Regression analysis]] on predicted outcomes that are binary variables is known as [[binary regression]]; when binary data is converted to count data and modeled as i.i.d. variables (so they have a binomial distribution), [[binomial regression]] can be used. The most common regression methods for binary data are [[logistic regression]], [[probit regression]], or related types of [[binary choice]] models.
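Logistic and probit regression differ in the function that maps a linear predictor to a probability between 0 and 1: the logistic function in the first case and the standard normal cumulative distribution function in the second. The sketch below only evaluates these two functions; it does not fit a model, and the function names are illustrative.

```python
from math import erf, exp, sqrt


def logistic(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)
    return z / (1.0 + z)


def standard_normal_cdf(x: float) -> float:
    """Standard normal CDF, the inverse link used in probit models."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# Compare the two mappings on a few linear-predictor values.
for eta in (-2.0, 0.0, 2.0):
    print(eta, round(logistic(eta), 4), round(standard_normal_cdf(eta), 4))
```

Both functions are increasing and map a linear predictor of 0 to a probability of 0.5; they differ mainly in how quickly they approach 0 and 1.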

Similarly, counts of i.i.d. categorical variables with more than two categories can be modeled with a [[multinomial regression]]. Counts of non-i.i.d. binary data can be modeled by more complicated distributions, such as the [[beta-binomial distribution]] (a [[compound distribution]]). Alternatively, the ''relationship'' can be modeled without needing to explicitly model the distribution of the output variable using techniques from [[generalized linear model]]s, such as [[quasi-likelihood]] and a [[quasibinomial]] model; see {{slink|Overdispersion|Binomial}}.

==In computing==
[[File:Commons QR code.png|thumb|right|A [[binary image]] of a [[QR code]], representing 1 bit per pixel, as opposed to a typical 24-bit [[Color depth#True color (24-bit)|true color]] image.]]
{{See also |Binary file}}
As modern [[computer]]s are designed for binary operations and storage, computer data is binary data. Each bit is stored in [[computer hardware |hardware]] that stores one of two states.{{efn|In a [[bistability |bistable]] device such as a [[flip-flop (electronics)|flip-flop]]}}<ref>{{Cite web |last=Gul |first=Najam |date=2022-08-18 |title=How do different types of Data get stored in form of 0 and 1? |url=https://www.deepcurious.com/how-do-different-types-of-data-get-stored-in-form-of-0-and-1 |access-date=2023-01-05 |website=Curiosity Tea |language=en}}</ref>

A computer generally accesses memory as a sequence of memory locations, each consisting of a fixed number of bits; this is often an 8-bit [[byte]], but the size varies by memory hardware. Higher-level groupings are often defined as well. For example, a [[word (computer architecture)|word]] typically refers to a group of bytes, and a group of words might be called a ''long word'' or ''quadword''.
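The grouping of bits into bytes can be illustrated by splitting a 32-bit value into four 8-bit bytes. The value and the big-endian byte order are arbitrary choices for this sketch; actual word sizes and byte orders depend on the hardware.

```python
value = 0x12345678

# Split the value into four 8-bit bytes, most significant byte first.
as_bytes = value.to_bytes(4, byteorder="big")

# Show each byte as its eight individual bits.
bit_string = " ".join(format(b, "08b") for b in as_bytes)

# Reassemble the original value from the bytes.
restored = int.from_bytes(as_bytes, byteorder="big")

print(as_bytes.hex())
print(bit_string)
```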

Although binary data can be interpreted as purely numeric, some data is more abstract, representing other concepts based on a mapping scheme. For example, memory can contain [[Instruction (computer science)|computer instruction]]s that control the computer (i.e. via a [[computer program]]).

Memory can also contain data that represents text per a [[character encoding]] that encodes [[human-readable]] information. Although technically all computer data, including text, is binary data, in practice a distinction is made between data that is encoded as text ([[plain text]]) and data that is not, and the term ''binary data'' generally excludes text data. Content that represents text can still be binary, such as an [[digital image |image]] of text; only data stored as encoded characters is considered text data. All other data is classified as (non-text) binary.
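The distinction can be illustrated with a character encoding such as UTF-8. In the sketch below, a string is encoded to bytes, and a crude check tests whether a byte sequence decodes as UTF-8. The function name <code>decodes_as_utf8</code> is hypothetical, and the check is only a heuristic: some non-text data also happens to decode without error.

```python
text = "Hi"
encoded = text.encode("utf-8")                  # text stored as bytes
bits = [format(b, "08b") for b in encoded]      # the same bytes as bits


def decodes_as_utf8(data: bytes) -> bool:
    """Heuristic: report whether the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


non_text = b"\xff\xfe\x00\x01"   # 0xff never occurs in valid UTF-8

print(bits)
print(decodes_as_utf8(encoded), decodes_as_utf8(non_text))
```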

==See also==
* [[Bit array]]
* [[Binary protocol]]
* [[Bernoulli distribution]]
* [[Boolean data type]]
* [[Computer memory]]
* [[Categorical data]]
* [[Qualitative data]]

==Notes==
{{notelist}}

==References==
{{reflist}}
{{refbegin}}
{{cite book |title=Modelling Binary Data |first=David |last=Collett |year=2002 |publisher=CRC Press |edition=Second |isbn=9781420057386}}
{{refend}}

[[Category:Statistical data types]]
