Augmented Backus–Naur form

Revision as of 22:24, 12 November 2025 by imported>EncyclopedianWP (Overview)

In computer science, augmented Backus–Naur form (ABNF) is a metalanguage based on Backus–Naur form (BNF) but with its own syntax and derivation rules. The motivating principle for ABNF is to describe a formal system of a language to be used as a bidirectional communications protocol. It is defined by Internet Standard 68 ("STD 68", type case sic), which corresponds to RFC 5234, and it often serves as the definition language for IETF communication protocols.<ref name="Internet Standards">Template:Cite web </ref><ref name="STD 68"> Template:Cite web </ref>

RFC 5234 supersedes RFC 4234.<ref name="RFC Index"> Template:Cite web </ref> RFC 7405 updates it, adding a syntax for specifying case-sensitive string literals.

Overview

An ABNF specification is a set of derivation rules, written as

rule = definition ; comment CR LF

where rule is a case-insensitive nonterminal, definition consists of sequences of symbols that define the rule, comment is a note for documentation, and each rule ends with a carriage return and line feed.

Rule names are case-insensitive: <rulename>, <Rulename>, <RULENAME>, and <rUlENamE> all refer to the same rule. Rule names consist of a letter followed by letters, numbers, and hyphens.
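As a hedged illustration of the naming rule above, the following Python sketch checks a candidate name against the pattern "a letter followed by letters, digits, and hyphens" and folds case so that lookups ignore it. The helper names `is_rule_name` and `canonical` are hypothetical and do not come from any ABNF library.

```python
import re

# A letter followed by any number of letters, digits, or hyphens.
RULE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_rule_name(text: str) -> bool:
    """Return True if text is a syntactically valid ABNF rule name."""
    return RULE_NAME.fullmatch(text) is not None


def canonical(name: str) -> str:
    """Strip optional angle brackets and fold case so lookups ignore case."""
    if name.startswith("<") and name.endswith(">"):
        name = name[1:-1]
    return name.lower()
```

With these helpers, `canonical("<rUlENamE>")` and `canonical("RULENAME")` yield the same dictionary key, which is one simple way to implement case-insensitive rule lookup.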

Angle brackets (<, >) are not required around rule names (as they are in BNF). However, they may be used to delimit a rule name in prose, where it might otherwise be hard to tell the rule name from the surrounding text.

Example

The (U.S.) postal address example given in the Backus–Naur form (BNF) page may be specified as follows:
<syntaxhighlight lang=abnf>
postal-address   = name-part street zip-part

name-part        = *(personal-part SP) last-name [SP suffix] CRLF
name-part        =/ personal-part CRLF

personal-part    = first-name / (initial ".")
first-name       = *ALPHA
initial          = ALPHA
last-name        = *ALPHA
suffix           = ("Jr." / "Sr." / 1*("I" / "V" / "X"))

street           = [apt SP] house-num SP street-name CRLF
apt              = 1*4DIGIT
house-num        = 1*8(DIGIT / ALPHA)
street-name      = 1*VCHAR

zip-part         = town-name "," SP state 1*2SP zip-code CRLF
town-name        = 1*(ALPHA / SP)
state            = 2ALPHA
zip-code         = 5DIGIT ["-" 4DIGIT]
</syntaxhighlight>
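To make the zip-part rule more concrete, here is a minimal Python sketch that hand-translates it into a regular expression. The translation is an illustration only: the constant names are hypothetical, and the regular expression is written by hand rather than generated from the grammar.

```python
import re

# town-name = 1*(ALPHA / SP)
TOWN_NAME = r"[A-Za-z ]+"
# state     = 2ALPHA
STATE = r"[A-Za-z]{2}"
# zip-code  = 5DIGIT ["-" 4DIGIT]
ZIP_CODE = r"[0-9]{5}(?:-[0-9]{4})?"
# zip-part  = town-name "," SP state 1*2SP zip-code CRLF
ZIP_PART = re.compile(TOWN_NAME + ", " + STATE + " {1,2}" + ZIP_CODE + "\r\n")
```

`ZIP_PART.fullmatch(line)` then accepts a line only if it ends with a carriage return and line feed, since CRLF is part of the rule itself.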

Terminal values

Terminals are specified by one or more numeric characters.

Numeric characters may be specified as the percent sign %, followed by the base (b = binary, d = decimal, and x = hexadecimal), followed by the value, or concatenation of values (indicated by .). For example, a carriage return is specified by %d13 in decimal or %x0D in hexadecimal. A carriage return followed by a line feed may be specified with concatenation as %d13.10.
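The notation can be decoded mechanically. The following Python sketch, with a hypothetical helper name `decode_num_val`, turns a single value or a dot-concatenated series into the text it matches. It assumes well-formed input and does not handle value ranges.

```python
BASES = {"b": 2, "d": 10, "x": 16}


def decode_num_val(token: str) -> str:
    """Decode a numeric terminal such as %d13.10 into the text it matches."""
    if len(token) < 3 or token[0] != "%" or token[1].lower() not in BASES:
        raise ValueError(f"not a numeric terminal: {token!r}")
    base = BASES[token[1].lower()]
    # Each dot-separated field is one character value written in that base.
    return "".join(chr(int(field, base)) for field in token[2:].split("."))
```

Under these assumptions, `decode_num_val("%d13")` and `decode_num_val("%x0D")` both produce a carriage return, and `decode_num_val("%d13.10")` produces the two-character CRLF sequence.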

Literal text is specified through the use of a string enclosed in quotation marks ("). These strings are case-insensitive, and the character set used is (US-)ASCII. Therefore, the string "abc" will match “abc”, “Abc”, “aBc”, “abC”, “ABc”, “AbC”, “aBC”, and “ABC”. RFC 7405 added a syntax for case-sensitive strings: %s"aBc" will only match "aBc". Prior to that, a case-sensitive string could only be specified by listing the individual characters: to match “aBc”, the definition would be %d97.66.99. A string can also be explicitly specified as case-insensitive with a %i prefix.
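The difference between the three string forms can be sketched in Python as follows. The function name `matches_literal` is hypothetical, and the sketch assumes that both the literal and the candidate contain only ASCII characters.

```python
def matches_literal(literal: str, candidate: str) -> bool:
    """Compare candidate against an ABNF quoted string, honouring %s and %i."""
    case_sensitive = False
    if literal[:2].lower() in ("%s", "%i"):
        case_sensitive = literal[:2].lower() == "%s"
        literal = literal[2:]
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError(f"not a quoted string: {literal!r}")
    text = literal[1:-1]
    if case_sensitive:
        return candidate == text
    # ASCII-only content is assumed, so lower() is a safe case fold here.
    return candidate.lower() == text.lower()
```

A plain `"abc"` and `%i"abc"` behave identically in this sketch, while `%s"aBc"` accepts only the exact spelling.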

Operators

White space

White space is used to separate elements of a definition; for space to be recognized as a delimiter, it must be explicitly included. The explicit reference for a single whitespace character is WSP (white space), and LWSP (linear white space) is for zero or more whitespace characters with newlines permitted. The LWSP definition in RFC 5234 is controversial<ref name="RFC5234 Errata">RFC Errata 3096.</ref> because at least one whitespace character is needed to form a delimiter between two fields.

Definitions are left-aligned. When multiple lines are required (for readability), continuation lines are indented by whitespace.

Comment

; comment

A semicolon (;) starts a comment that continues to the end of the line.

Concatenation

Rule1 Rule2

A rule may be defined by listing a sequence of rule names.

To match the string “aba”, the following rules could be used:

fu = %x61 ; a
bar = %x62 ; b
mumble = fu bar fu

Alternative

Rule1 / Rule2

A rule may be defined by a list of alternative rules separated by a solidus (/).

To accept the rule fu or the rule bar, the following rule could be constructed:

fubar = fu / bar

Incremental alternatives

Rule1 =/ Rule2

Additional alternatives may be added to a rule through the use of =/ between the rule name and the definition.

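The rules

ruleset = alt1 / alt2
ruleset =/ alt3
ruleset =/ alt4 / alt5

are therefore equivalent to

ruleset = alt1 / alt2 / alt3 / alt4 / alt5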
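A processor can fold incremental alternatives into a single definition as it reads the rule list. The Python sketch below shows one way to do so; the function name `merge_rules` and the rule names used with it are hypothetical, and the sketch assumes one rule per line with no comments or continuation lines.

```python
def merge_rules(lines):
    """Fold '=/' lines into the rule they extend, keyed by lower-cased name."""
    rules = {}
    for line in lines:
        # Rule names cannot contain "=", so the first "=" starts the operator.
        name, _, rest = line.partition("=")
        key = name.strip().lower()
        if rest.startswith("/"):
            if key not in rules:
                raise ValueError(f"{name.strip()} is extended before it is defined")
            rules[key] += " / " + rest[1:].strip()
        else:
            rules[key] = rest.strip()
    return rules
```

Joining with " / " is safe because alternation has the loosest binding of all operators, so appending an alternative never regroups the existing definition.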

Value range

%c##-##

A range of numeric values may be specified through the use of a hyphen (-).

The rule

OCTAL = %x30-37

is equivalent to

OCTAL = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"
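The expansion of a range into its individual alternatives can be sketched in Python as follows. The helper name `expand_range` is hypothetical, and the sketch assumes a well-formed range with exactly one hyphen.

```python
BASES = {"b": 2, "d": 10, "x": 16}


def expand_range(token: str) -> list:
    """List the single characters accepted by a range such as %x30-37."""
    if len(token) < 3 or token[0] != "%" or token[1].lower() not in BASES:
        raise ValueError(f"not a numeric terminal: {token!r}")
    base = BASES[token[1].lower()]
    low, high = (int(part, base) for part in token[2:].split("-"))
    # Both ends of an ABNF range are inclusive.
    return [chr(code) for code in range(low, high + 1)]
```

For instance, `expand_range("%x30-37")` lists the eight octal digits, one per alternative.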

Sequence group

(Rule1 Rule2)

Elements may be placed in parentheses to group rules in a definition.

To match "a b d" or "a c d", the following rule could be constructed:

group = a (b / c) d

To match “a b” or “c d”, either of the following equivalent rules could be constructed:

group = a b / c d
group = (a b) / (c d)

Variable repetition

n*nRule

To indicate repetition of an element, the form <a>*<b>element is used. The optional <a> gives the minimal number of elements to be included (with the default of 0). The optional <b> gives the maximal number of elements to be included (with the default of infinity).

Use *element for zero or more elements, *1element for zero or one element, 1*element for one or more elements, and 2*3element for two or three elements, cf. regular expressions e*, e?, e+ and e{2,3}.
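The correspondence with regular expressions can be made explicit with a short Python sketch. The function name `repeat_to_quantifier` is hypothetical; it translates only the repeat prefix and leaves the element itself to the caller.

```python
import re


def repeat_to_quantifier(repeat: str) -> str:
    """Translate an ABNF repeat prefix such as '2*3' into a regex quantifier."""
    if not re.fullmatch(r"[0-9]+|[0-9]*\*[0-9]*", repeat):
        raise ValueError(f"not a repeat prefix: {repeat!r}")
    if "*" not in repeat:
        return "{%s}" % repeat  # an exact count, written without "*"
    low, high = repeat.split("*")
    # A missing minimum defaults to 0; a missing maximum means unbounded.
    return "{%s,%s}" % (low or "0", high)
```

The sketch always emits the brace form, so `*` becomes `{0,}` rather than the shorthand `*`; both spellings mean the same thing to Python's `re` module.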

Specific repetition

nRule

To indicate an explicit number of elements, the form <a>element is used and is equivalent to <a>*<a>element.

Use 2DIGIT to get two numeric digits, and 3DIGIT to get three numeric digits. (DIGIT is defined below under "Core rules". Also see zip-code in the example above.)

Optional sequence

[Rule]

To indicate an optional element, the following constructions are equivalent:

[Rule]
*1Rule
0*1Rule

Operator precedence

The following operators have the given precedence from tightest binding to loosest binding:

  1. Strings, names formation
  2. Comment
  3. Value range
  4. Repetition
  5. Grouping, optional
  6. Concatenation
  7. Alternative

Use of the alternative operator with concatenation may be confusing, and it is recommended that grouping be used to make explicit concatenation groups.
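Regular expressions show the same interaction between concatenation and alternation, which makes the effect of grouping easy to check. The Python sketch below assumes hypothetical rules a, b, c, and d that each match the corresponding single letter, and writes the regular expressions by hand.

```python
import re

# rule = a (b / c) d   ; grouping forces the choice between b and c only
GROUPED = re.compile(r"a(?:b|c)d")

# rule = a b / c d     ; concatenation binds tighter: (a b) / (c d)
UNGROUPED = re.compile(r"ab|cd")
```

`GROUPED` accepts "abd" and "acd", while `UNGROUPED` accepts "ab" and "cd" but not "abd", which is the reading that explicit parentheses are recommended to make visible.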

Core rules


The core rules are defined in the ABNF standard.

{| class="wikitable"
! Rule !! Formal definition !! Meaning
|-
| ALPHA || %x41-5A / %x61-7A || Upper- and lower-case ASCII letters (A–Z, a–z)
|-
| DIGIT || %x30-39 || Decimal digits (0–9)
|-
| HEXDIG || DIGIT / "A" / "B" / "C" / "D" / "E" / "F" || Hexadecimal digits (0–9, A–F, a–f)
|-
| DQUOTE || %x22 || Double quote
|-
| SP || %x20 || Space
|-
| HTAB || %x09 || Horizontal tab
|-
| WSP || SP / HTAB || Space and horizontal tab
|-
| LWSP || *(WSP / CRLF WSP) || Linear white space (past newline)
|-
| VCHAR || %x21-7E || Visible (printing) characters
|-
| CHAR || %x01-7F || Any ASCII character, excluding NUL
|-
| OCTET || %x00-FF || 8 bits of data
|-
| CTL || %x00-1F / %x7F || Controls
|-
| CR || %x0D || Carriage return
|-
| LF || %x0A || Linefeed
|-
| CRLF || CR LF || Internet-standard newline
|-
| BIT || "0" / "1" || Binary digit
|}

Note that in the core rules diagram the CHAR2 charset is inlined in char-val and CHAR3 is inlined in prose-val in the RFC spec. They are named here for clarity in the main syntax diagram.

ABNF representation of itself

ABNF's syntax itself may be represented with an ABNF grammar like the following:
<syntaxhighlight lang="abnf">
rulelist       =  1*( rule / (*WSP c-nl) )

rule           =  rulename defined-as elements c-nl
                       ; continues if next line starts
                       ;  with white space

rulename       =  ALPHA *(ALPHA / DIGIT / "-")

defined-as     =  *c-wsp ("=" / "=/") *c-wsp
                       ; basic rules definition and
                       ;  incremental alternatives

elements       =  alternation *WSP

c-wsp          =  WSP / (c-nl WSP)

c-nl           =  comment / CRLF
                       ; comment or newline

comment        =  ";" *(WSP / VCHAR) CRLF

alternation    =  concatenation
                  *(*c-wsp "/" *c-wsp concatenation)

concatenation  =  repetition *(1*c-wsp repetition)

repetition     =  [repeat] element

repeat         =  1*DIGIT / (*DIGIT "*" *DIGIT)

element        =  rulename / group / option /
                  char-val / num-val / prose-val

group          =  "(" *c-wsp alternation *c-wsp ")"

option         =  "[" *c-wsp alternation *c-wsp "]"

char-val       =  DQUOTE *(%x20-21 / %x23-7E) DQUOTE
                       ; quoted string of SP and VCHAR
                       ;  without DQUOTE

num-val        =  "%" (bin-val / dec-val / hex-val)

bin-val        =  "b" 1*BIT
                  [ 1*("." 1*BIT) / ("-" 1*BIT) ]
                       ; series of concatenated bit values
                       ;  or single ONEOF range

dec-val        =  "d" 1*DIGIT
                  [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]

hex-val        =  "x" 1*HEXDIG
                  [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]

prose-val      =  "<" *(%x20-3D / %x3F-7E) ">"
                       ; bracketed string of SP and VCHAR
                       ;  without angles
                       ; prose description, to be used as
                       ;  last resort
</syntaxhighlight>

The core rules have to be adapted to their environment's encoding. Here are the core rules of ABNF in 7-bit ASCII encoding:
<syntaxhighlight lang="abnf">
ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z

BIT            =  "0" / "1"

CHAR           =  %x01-7F
                       ; any 7-bit US-ASCII character,
                       ;  excluding NUL

CR             =  %x0D
                       ; carriage return

CRLF           =  CR LF
                       ; Internet standard newline

CTL            =  %x00-1F / %x7F
                       ; controls

DIGIT          =  %x30-39
                       ; 0-9

DQUOTE         =  %x22
                       ; " (Double Quote)

HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

HTAB           =  %x09
                       ; horizontal tab

LF             =  %x0A
                       ; linefeed

LWSP           =  *(WSP / CRLF WSP)
                       ; Use of this linear-white-space rule
                       ;  permits lines containing only white
                       ;  space that are no longer legal in
                       ;  mail headers and have caused
                       ;  interoperability problems in other
                       ;  contexts.
                       ; Do not use when defining mail
                       ;  headers and use with caution in
                       ;  other contexts.

OCTET          =  %x00-FF
                       ; 8 bits of data

SP             =  %x20

VCHAR          =  %x21-7E
                       ; visible (printing) characters

WSP            =  SP / HTAB
                       ; white space
</syntaxhighlight>

Pitfalls

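RFC 5234 adds a warning in conjunction with the definition of LWSP as follows:

<blockquote>Use of this linear-white-space rule permits lines containing only white space that are no longer legal in mail headers and have caused interoperability problems in other contexts. Do not use when defining mail headers and use with caution in other contexts.</blockquote>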
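The behaviour that motivates the warning can be reproduced with a hand-written regular expression. The Python sketch below assumes the core-rule definition LWSP = *(WSP / CRLF WSP); the name `LWSP_REQUIRED` is a hypothetical stricter variant introduced only for comparison and is not defined by the RFC.

```python
import re

# WSP  = SP / HTAB
# LWSP = *(WSP / CRLF WSP)
LWSP = re.compile(r"(?:[ \t]|\r\n[ \t])*")

# Hypothetical variant that demands at least one element, so that it can
# act as a delimiter between two fields.
LWSP_REQUIRED = re.compile(r"(?:[ \t]|\r\n[ \t])+")
```

`LWSP` matches the empty string as well as input such as a space, a line break, and another space, that is, a continuation line holding nothing but white space. `LWSP_REQUIRED` rejects the empty string.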

References

Template:Reflist
