OpenBSD as a domain name server - The Domain Name System

2. The Domain Name System

DNS configuration is much easier if you have a good understanding of its fundamentals. Hence, before hurrying to edit our zone data files, let's take a brief look at the overall architecture of the Domain Name System and its inner mechanisms.

2.1 A few definitions

The Domain Name System is a distributed database of resource records (see [RFC1034]), associating many types of information (e.g. IP address, mail exchanger, etc.) with domain names. Similarly to the Unix file system, the structure of this database is a hierarchical inverted tree, with the root at the top. The whole tree is called the Domain Name Space.

Each node in the Domain Name Space has a text label (the root node has a special zero-length label, "") and is uniquely identified by its domain name, i.e. the list of the labels on the path from the node to the root, separated by dots (Unix paths, on the contrary, start from the root and are separated by slashes).

For instance, the domain name highlighted in the following picture is made up of the sequence "www", "kernel-panic", "it" and the root's null label, and is therefore written as "www.kernel-panic.it.".

Since the root node is usually written as a single dot, domain names ending with a trailing dot are considered absolute (similarly to Unix absolute pathnames, starting with a leading slash). An absolute domain name is also referred to as a fully qualified domain name (FQDN). Domain names with no trailing dot are considered relative to another domain, usually to the root itself. A relative domain name is also referred to as a partially qualified domain name (PQDN).
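As a rough illustration of absolute and relative names, the following Python sketch splits a name into its labels and qualifies a relative name against an origin domain. The function names (split_labels, is_absolute, to_absolute) are made up for this example and do not belong to any DNS library.

```python
def split_labels(name: str) -> list[str]:
    """Return the labels of a domain name, from the node up towards the root."""
    if name == ".":
        return [""]  # the root alone: a single zero-length label
    # For an absolute name the last element is "", the root's null label.
    return name.split(".")


def is_absolute(name: str) -> bool:
    """A name is absolute (an FQDN) when it ends with the root's dot."""
    return name.endswith(".")


def to_absolute(name: str, origin: str = ".") -> str:
    """Qualify a relative name against an origin domain (the root by default)."""
    if is_absolute(name):
        return name
    if origin == ".":
        return name + "."
    return name + "." + origin


print(split_labels("www.kernel-panic.it."))    # ['www', 'kernel-panic', 'it', '']
print(is_absolute("www.kernel-panic.it"))      # False
print(to_absolute("www", "kernel-panic.it."))  # www.kernel-panic.it.
```

Note how the trailing empty string in the label list plays the role of the root's zero-length label.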

A domain is a subtree of the domain name space and takes the domain name of its top node. Each domain may have its own subtrees, called subdomains. Domains may also be referred to by level: a top-level (or first-level) domain is a child of the root; a second-level domain is a child of a first-level domain; and so on.
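The notions of level and subdomain can be sketched in a few lines of Python. The helpers below (labels, level, is_subdomain) are hypothetical names chosen for this example; the comparison works label by label and ignores case, since domain names are case-insensitive.

```python
def labels(name: str) -> list[str]:
    """Labels of a domain name, without the root's empty label."""
    return [label for label in name.rstrip(".").split(".") if label]


def level(name: str) -> int:
    """1 for a top-level domain, 2 for a second-level domain, and so on."""
    return len(labels(name))


def is_subdomain(name: str, domain: str) -> bool:
    """True if `name` is `domain` itself or lies in the subtree below it."""
    child = [label.lower() for label in labels(name)]
    parent = [label.lower() for label in labels(domain)]
    if not parent:
        return True  # every name lies below the root
    if len(parent) > len(child):
        return False
    # Compare whole labels, starting from the root end of the name.
    return child[-len(parent):] == parent


print(level("it."))                                   # 1
print(level("kernel-panic.it."))                      # 2
print(is_subdomain("www.kernel-panic.it.", "it."))    # True
print(is_subdomain("kernel-panic.it.", "panic.it."))  # False
```

Comparing labels rather than raw string suffixes is what keeps "kernel-panic.it." from being mistaken for a subdomain of "panic.it.".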

The hierarchical structure of the domain name system allows for the decentralization of its administration; in fact, an organization in charge of a domain can delegate, i.e. assign responsibility for, a subdomain to a different organization and only maintain information about the non-delegated part of the domain (called a zone).

Programs that store information about a zone are called domain name servers and are said to have authority for that zone. There are two types of name servers:

primary master name servers, which read the data for the zone from a local file (called zone data file);
secondary master name servers (or slaves), which get data from another name server that is authoritative for the zone (called master server) through a zone transfer; usually, but not necessarily, the master server is the zone's primary master.

Having two types of name servers makes administration easier by providing a single point of configuration, while multiple authoritative name servers for a zone provide redundancy, load sharing and responsiveness.

2.2 The name resolution process

Clients that access name servers are called resolvers. In BIND, the resolver is just a library that must be linked by applications requiring name service. When an application needs information from the domain name space, it uses the resolver to perform a query against a DNS server (usually the corporate or the ISP's server). If authoritative for the queried zone, the DNS server will reply immediately; otherwise, it will search through the domain name space to find the requested data. This process is called name resolution.

There are two types of DNS queries:

iterative (or non-recursive), which simply ask a DNS server for the best answer it already knows;
recursive, which ask the DNS server to fully answer the query, or give an error.

Usually resolvers perform recursive queries, placing the burden of resolution on the queried name server; DNS servers, instead, perform a series of iterative queries, following any referrals, until they receive the answer they are looking for.
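On the wire, the difference between the two kinds of query comes down to a single bit in the message header, the "recursion desired" (RD) flag defined in [RFC1035]. The following Python sketch builds a minimal query for an A record with or without that bit; the function names and the fixed query ID 0x1234 are arbitrary choices for this example.

```python
import struct

QTYPE_A = 1       # type code for A records (RFC 1035)
QCLASS_IN = 1     # class code for the Internet class
RD_FLAG = 0x0100  # "recursion desired" bit in the header flags


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with the root."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"  # the root's zero-length label


def build_query(name: str, recursive: bool, query_id: int = 0x1234) -> bytes:
    """Build a one-question DNS query; query_id is an arbitrary example value."""
    flags = RD_FLAG if recursive else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPE_A, QCLASS_IN)
    return header + question


recursive_query = build_query("www.kernel-panic.it", recursive=True)
iterative_query = build_query("www.kernel-panic.it", recursive=False)
print(recursive_query[2:4].hex())  # 0100
print(iterative_query[2:4].hex())  # 0000
```

The two messages are identical except for the flags field: a resolver would typically send the first form, while a name server walking the tree would send the second.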

Let's see how it all works by going through an example. Suppose you want to visit the "www.kernel-panic.it" web site; you type the URL in your browser, press "Enter" and this is what happens next:

the resolver performs a recursive query against your corporate DNS server, expecting the IP address of the "www.kernel-panic.it" web server (or an error) in return;
since the corporate DNS server isn't authoritative for the queried zone, it will send an iterative query for the address of the "www.kernel-panic.it" domain name to a root name server, i.e. one of the 13 root servers (named a.root-servers.net through m.root-servers.net), which know the name servers authoritative for each of the top-level zones;
the queried root name server probably won't know the full answer, but it will certainly know which name servers are authoritative for the "it" zone. Therefore, it will refer your corporate DNS server to those name servers;
your DNS server will choose one of the referred name servers and send it the same iterative query for the "www.kernel-panic.it" domain name;
the queried "it" name server probably won't know the full answer and therefore will refer your corporate DNS server to the list of name servers authoritative for the "kernel-panic.it" zone;
your DNS server has finally discovered the authoritative name servers for the "kernel-panic.it" zone and will send the same query to one of them;
the queried name server will return the address of the "www.kernel-panic.it" domain name;
your corporate name server is finally able to return the information to the resolver.
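The steps above can be mimicked with a toy simulation in Python. Everything in it is invented for illustration: the server names ("root", "it-ns", "kp-ns"), the delegation table, and the address 192.0.2.10, which is a documentation placeholder and not the real address of the site. No network traffic is involved.

```python
# Hypothetical delegation data: each "server" either answers a name itself
# or refers the caller to the server responsible for a zone.
SERVERS = {
    "root": {"referrals": {"it.": "it-ns"}, "answers": {}},
    "it-ns": {"referrals": {"kernel-panic.it.": "kp-ns"}, "answers": {}},
    "kp-ns": {"referrals": {}, "answers": {"www.kernel-panic.it.": "192.0.2.10"}},
}


def iterative_query(server: str, name: str):
    """Return ("answer", address), ("referral", next_server) or ("error", None)."""
    data = SERVERS[server]
    if name in data["answers"]:
        return "answer", data["answers"][name]
    for zone, next_server in data["referrals"].items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    return "error", None


def resolve(name: str, max_steps: int = 10):
    """Play the corporate server: start at the root and follow referrals."""
    server, path = "root", []
    for _ in range(max_steps):
        path.append(server)
        kind, value = iterative_query(server, name)
        if kind == "answer":
            return value, path
        if kind == "error":
            return None, path
        server = value  # follow the referral to the next name server
    return None, path


address, path = resolve("www.kernel-panic.it.")
print(address)  # 192.0.2.10
print(path)     # ['root', 'it-ns', 'kp-ns']
```

The resolve function corresponds to the recursive service offered to the resolver, while each call to iterative_query corresponds to one of the iterative queries sent along the way.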

As you can see, the resolution process may involve quite a few steps; but after each step, the name server learns a new piece of information about the domain name space. For instance, in the previous example, the corporate DNS server has learned which servers are authoritative for the "it" and "kernel-panic.it" zones. So what happens now if you want to connect to the "ftp.kernel-panic.it" machine? Your corporate name server already knows the authoritative servers for the "kernel-panic.it" zone; so it will send the query directly to one of them and get the answer in a single step, thus speeding up the resolution process. Storing learned data for future reference is called caching. Since version 4.9, BIND also keeps track of non-existing domains (negative caching), which avoids repeating queries that are known to fail.
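A minimal sketch of such a cache, including negative entries, might look like the following Python class. It is not how BIND implements caching; the class name DnsCache, its methods, and the server name ns1.example. used below are assumptions made for this example, and entries simply expire after a number of seconds.

```python
import time


class DnsCache:
    """Toy cache keyed by (name, record type), with expiry times."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (expires_at, value or None)

    def put(self, name, rtype, value, ttl):
        """Remember a positive answer for `ttl` seconds."""
        self._entries[(name.lower(), rtype)] = (self._clock() + ttl, value)

    def put_negative(self, name, rtype, ttl):
        """Remember that the name or record does not exist (negative caching)."""
        self.put(name, rtype, None, ttl)

    def get(self, name, rtype):
        """Return (hit, value); value is None for a cached negative answer."""
        key = (name.lower(), rtype)
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # outdated: forget it and report a miss
            return False, None
        return True, value


cache = DnsCache()
cache.put("kernel-panic.it.", "NS", "ns1.example.", ttl=3600)
cache.put_negative("nosuchhost.kernel-panic.it.", "A", ttl=300)
print(cache.get("kernel-panic.it.", "NS"))            # (True, 'ns1.example.')
print(cache.get("nosuchhost.kernel-panic.it.", "A"))  # (True, None)
print(cache.get("ftp.kernel-panic.it.", "A"))         # (False, None)
```

A hit with a value of None tells the server that the failure itself is cached, so the query does not need to be repeated until the entry expires.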

2.3 Reverse name resolution

Reverse name resolution is the process of mapping an IP address back to an FQDN. Though this may seem to require an exhaustive search of the whole domain name space, it is, as a matter of fact, as simple as name resolution because the developers of DNS have created a special "in-addr.arpa" domain that uses the dotted-octet representation of IP addresses as labels.

In other words, the in-addr.arpa domain has (or could have, to be more precise) up to 256 third-level subdomains (numbered from 0 to 255), corresponding to the possible values of the first octet of an IP address; each of those 256 subdomains could have, in turn, up to 256 fourth-level subdomains, also numbered from 0 to 255, corresponding to the values of the second octet; and so on.

Therefore, to look up the FQDN associated with an IP address, the resolver simply has to query the name server for the PTR record (see below) of the corresponding node in the in-addr.arpa domain. For example, to get the domain name for the 62.149.140.23 IP address, the resolver will query the DNS server for the PTR record of the "23.140.149.62.in-addr.arpa" domain name.

As you can see, IP addresses appear reversed in the in-addr.arpa domain name. This is due to a basic difference between IP addresses and domain names: IP addresses get more specific from left to right, while domain names get more specific from right to left. Hence, naming nodes in the in-addr.arpa domain in this (seemingly odd) way actually allows IP addresses to correctly reflect the hierarchical structure of the domain name system.
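Building the name to query is a matter of reversing the octets and appending the in-addr.arpa suffix. The short Python sketch below does this by hand (reverse_name is a name chosen for this example) and compares the result with the reverse_pointer attribute offered by the standard ipaddress module.

```python
import ipaddress


def reverse_name(ip: str) -> str:
    """Build the in-addr.arpa domain name for a dotted-quad IPv4 address."""
    ipaddress.IPv4Address(ip)  # raises ValueError if `ip` is not valid IPv4
    octets = ip.split(".")
    # Most specific octet first, as with any other domain name.
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_name("62.149.140.23"))
# 23.140.149.62.in-addr.arpa
print(ipaddress.ip_address("62.149.140.23").reverse_pointer)
# 23.140.149.62.in-addr.arpa
```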

2.4 Resource records

Each node in the domain name space has a set of resource information (which may be empty) associated with it, composed of separate resource records (RRs). This information is contained in text form within the zone data files, while queries and zone transfers represent it in binary form. A resource record is made up of five fields:

Name: The domain name the resource record refers to
Type: The type of the resource record (see below)
TTL: The time to live of the RR, i.e. how long resolvers should keep it in cache before considering it outdated
Class: The type of network or software the record applies to; currently valid classes are Internet (IN), CHAOSnet (CH) and Hesiod (HS). We will discuss only the Internet class, which applies to all TCP/IP-based internets and is by far the most widespread
RDATA: The actual resource data associated with the domain name
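To make the five fields concrete, here is a small Python sketch that parses the textual form of a record into a named tuple. It assumes a fully specified line in the order name, TTL, class, type, RDATA; real zone data files allow some fields to be omitted, which this sketch does not handle. The host name mail.kernel-panic.it. and the address 192.0.2.10 are placeholders.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    name: str      # domain name the record refers to
    ttl: int       # time to live, in seconds
    rr_class: str  # e.g. "IN" for the Internet class
    rr_type: str   # e.g. "A", "MX", "NS"
    rdata: str     # the resource data, kept as text here


def parse_record(line: str) -> ResourceRecord:
    """Parse one line of the form: name TTL class type RDATA."""
    # Split on whitespace at most four times, so RDATA may contain spaces.
    name, ttl, rr_class, rr_type, rdata = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rr_class.upper(), rr_type.upper(), rdata)


print(parse_record("www.kernel-panic.it. 3600 IN A 192.0.2.10"))
print(parse_record("kernel-panic.it. 3600 IN MX 10 mail.kernel-panic.it."))
```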

The main DNS record types are the following (see [RFC1035]):

A (Address): A 32-bit IPv4 host address
AAAA (IPv6 Address): A 128-bit IPv6 host address
CNAME (Canonical Name): Specifies an alias for a domain name, i.e. a different FQDN that can be used to refer to the same host
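KEY: A public key associated with a domain name; it was introduced for DNSSEC (which now uses the separate DNSKEY type) and is still used for SIG(0). TSIG relies on shared secrets instead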
MX (Mail eXchanger): Specifies a mail server, along with a preference value, to which mail for that domain name should be sent; a domain name may have several MX records
NS (Name Server): An authoritative name server for the domain
PTR (Pointer): A pointer to another location in the domain name space; it is mostly used to associate a domain name with an IP address in the "in-addr.arpa" domain for reverse name resolution
SOA (Start Of Authority): Identifies the start of a zone of authority
TXT (Text): One or more text strings (each up to 255 bytes long) containing arbitrary data associated with a name
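In the binary form used by queries and zone transfers, each record type is identified by a numeric code and its RDATA has a type-specific layout. The Python sketch below lists the codes of the types above and encodes the RDATA of two simple cases, addresses and a single text string. The helper names are made up for this example, and 2001:db8::1 is a documentation address.

```python
import ipaddress

# Numeric type codes as assigned for the record types listed above.
TYPE_CODES = {
    "A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12,
    "MX": 15, "TXT": 16, "KEY": 25, "AAAA": 28,
}


def encode_address(rr_type: str, address: str) -> bytes:
    """RDATA of an A (4 bytes) or AAAA (16 bytes) record."""
    if rr_type == "A":
        return ipaddress.IPv4Address(address).packed
    if rr_type == "AAAA":
        return ipaddress.IPv6Address(address).packed
    raise ValueError("not an address record type: " + rr_type)


def encode_txt(text: str) -> bytes:
    """RDATA of a TXT record holding one string: a length byte, then the data."""
    raw = text.encode("ascii")
    if len(raw) > 255:
        raise ValueError("a single TXT string is limited to 255 bytes")
    return bytes([len(raw)]) + raw


print(TYPE_CODES["MX"])                                  # 15
print(encode_address("A", "62.149.140.23").hex())        # 3e958c17
print(len(encode_address("AAAA", "2001:db8::1")))        # 16
print(encode_txt("hello"))                               # b'\x05hello'
```

The single length byte in front of the text is where the 255-byte limit on a TXT string comes from.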