in-tree-hints/draft-kolkman-dns-in-tree-hints.md
2026-03-17 02:17:46 +00:00

---
title: In-Tree Hints for DNS Resiliency
abbrev: in-tree-hints
docname: draft-kolkman-in-tree-hints
category: info
ipr: trust200902
area: ops
workgroup: dnsop
keyword: Internet-Draft
stand_alone: yes
pi:
  RFCedstyle: yes
  toc: yes
  tocindent: yes
  sortrefs: yes
  symrefs: yes
  strict: yes
  comments: yes
  inline: yes
  text-list-symbols: -o*+
author:
  ins: O. Kolkman
  name: Olaf Kolkman
  email: kolkman@isoc.org
normative:
  RFC2119:
  RFC3339:
  RFC5011:
  RFC7344:
informative:
  RFC9499:
  RFC8767:
  E-Gov-Resilience:
    title: "Assessing e-Government DNS Resilience"
    date: 2022
    author:
      - ins: Sommese et al.
    seriesinfo:
      "IEEE": Proceedings of the 2022 International Conference on Network and Service Management (CNSM 2022)
--- abstract
We present a methodology by which networks that rely heavily on
specific domain names can become more resilient to failures in the parent domain.
The approach presented uses a hints-file-like mechanism in recursive
nameservers in addition to having the authoritative servers follow a
few operational practices.
The suggested method can be seen as a means for increasing digital
sovereignty. We describe the approach, the necessary operational
practices, and the dilemmas this approach introduces.
--- middle
Introduction
============
The Domain Name System (DNS) is a remarkably stable and resilient
system. However, in many environments operators are looking for ways
to remain in control of the continuity of digital services in their
own environments and to reduce external dependencies. One of those
dependencies is the DNS, on which we focus in this document.
Consider the following failure case:
* A community of interest is highly dependent on services that are
discoverable with names within the example.net domain;
* A failure in DNS resolution occurs in the delegation between .net
and example.net;
* IP connectivity remains intact: The DNS servers that serve
example.net authoritatively are still reachable by the community of
interest. So are the recursive nameservers and the service of
interest.
This failure case may sound relatively limited, but here are a few
less abstract examples of such a failure.
Consider an enterprise campus operating under the domain example.net
that provides essential services, such as logistics, to users on its
campus. If the transit connection to the broader Internet were to
fail, the consequences could be significant. Even when all
infrastructure (recursive and authoritative DNS servers, the servers
for the services themselves, etc.) is on premises, a failure to resolve
the delegation between the top-level domain .net and example.net would
eventually lead to an inability to contact services.
Another example is a small island nation state that has a number of
its government services running on the island under its own TLD. Now
consider a cable cut scenario where all upstream connectivity is
lost. After a while, when authority information starts to time out
from caches (for some implementations after 24 hours), connections to
services on the island will start to fail.
A less benign example is an intervention in the DNS root in which
delegation data for a country code top-level domain (ccTLD) gets
altered or removed. Such an intervention would eventually debilitate
users who rely on services within that ccTLD, usually
government services and local media outlets within that country.
While unthinkable even a few years ago, this sort of scenario is now
being considered in the context of international stability in
cyberspace.
In this document we describe an operational approach that, with minor
support from recursive nameservers, can offer one of the elements towards
greater autonomy and resilience of infrastructure dependent on a
specific domain. While certainly not the only approach to increase
resiliency (e.g. the small island nation state example would be
solved by having a local anycast instance of the root), we introduce
this to offer a confidence-building mechanism that does not
fundamentally change the DNS design. This approach is consistent with
the architecture, design, and operation of the DNS. By following
the practices herein we avoid namespace fragmentation. We also avoid
fundamental protocol changes; in particular, we avoid alternative
roots.
The approach, called 'in-tree hints', offers protection against various
attack vectors that could compromise the delegation process. For
instance, on-path attackers may attempt to alter delegation records,
which could lead to denial of service, particularly in systems
utilizing Domain Name System Security Extensions
(DNSSEC). Additionally, threats such as DNS supply chain attacks or
inadvertent errors can result in unauthorized changes to the
delegation, including DS (Delegation Signer) records. More generally, we
solve for the case in which a DNS resolver receives parental data that is
inconsistent with the intent of the domain owner, i.e. data that is
inconsistent with what is published on the authoritative
servers. That includes not receiving data at all.
In-tree hints can be seen as a building block for resiliency of
critical infrastructure or digital autonomy. The approach is
complementary to serving stale data from the cache {{RFC8767}}, more
on this in section {{stale}}.
In this memo we describe what the parties that are critically
dependent on a specific domain and those that serve zones within that
domain will need to do in order to guarantee continuous operation.
In section {{concept}} we describe the idea, the requirements for a
recursive DNS server, and the requirements for the zone associated with it.
In section {{resilience}} we briefly point to other measures that must
be taken in combination with this mechanism. In section {{policy}} we
discuss some policy considerations and the dilemmas that exist with
respect to the intentions of the DNS parent and child.
This document uses uppercase SHOULD, RECOMMENDED and MUST in the
meaning defined by {{RFC2119}}. Their lowercase equivalents do not
have normative meaning.
The in-tree hints concept {#concept}
==========================
{{RFC9499}} describes the root hints file: "Operators who manage a DNS
recursive resolver typically need to configure a 'root hints
file'. This file contains the names and IP addresses of the
authoritative name servers for the root zone, so the software can
bootstrap the DNS resolution process. For many pieces of software,
this list comes built into the software."
The in-tree hints mechanism borrows from this idea: by configuring a 'hints
file' for a specific domain one allows oneself to bootstrap from that
domain down, even if its parents are not available. Implementing it
requires a modification in recursive nameservers and adherence to some
operational practices.
Recursive nameserver {#rec}
----------------------------
Recursive nameserver software will need to be modified to work
with in-tree hints.
An in-tree hint is a configuration for a recursive resolver that
provides the names and IP addresses of authoritative name servers for
a specific domain. A recursive name server may be configured with
in-tree hints for multiple domains.
When there are no in-domain (in-bailiwick) nameservers ({{RFC9499}})
in the NS set for the domain, this mechanism MUST NOT [OMK: SHOULD NOT?] be
used. Without this requirement the resiliency properties may
not be achieved, as there are dependencies outside the
control of the domain. This requirement can be enforced by the
recursive nameserver software at the moment of configuration
parsing. In addition, the in-bailiwick server should share IP
connectivity fate with its dependents. For instance, in our island
example one in-domain name server should be on the island. In our
enterprise example one in-domain server should be on campus.
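
The configuration-time check mentioned above could look roughly like the following. This is a minimal sketch, not a description of any existing resolver: the function names `is_in_domain` and `validate_hint` are hypothetical, and the comparison is a plain suffix match on normalized names.

```python
def normalize(name: str) -> str:
    """Lowercase a DNS name and strip any trailing dot."""
    return name.rstrip(".").lower()


def is_in_domain(ns_name: str, domain: str) -> bool:
    """Return True if ns_name equals domain or lies below it."""
    ns, dom = normalize(ns_name), normalize(domain)
    return ns == dom or ns.endswith("." + dom)


def validate_hint(domain: str, ns_names: list[str]) -> None:
    """Reject an in-tree hint whose NS set has no in-domain nameserver.

    Hypothetical helper: a resolver could call this while parsing its
    configuration and refuse to load the hint on failure.
    """
    if not any(is_in_domain(ns, domain) for ns in ns_names):
        raise ValueError(f"no in-domain nameserver configured for {domain}")
```

Note that the check only covers naming. Whether the in-domain server actually shares connectivity fate with its clients cannot be derived from the configuration and remains an operational judgment.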
In-tree hints are only useful if the domain owner follows certain
practices. A recursive nameserver MAY implement the in-tree hints
mechanism for a specific domain only if the domain owner indicates that
it follows these practices. Section {{signal}} describes the RECOMMENDED
way for domain name owners to signal their intent. [OMK: REVIEW 2119 Keywords]
In-tree hints MUST only be used in combination with a DNSSEC
trust anchor, i.e., a trusted public DNSSEC key that is associated with
the name. The trust anchor MUST be maintained. It SHOULD be maintained
by the mechanism described in {{RFC5011}}. Alternatively, an
appropriate and trustworthy out-of-band mechanism MAY be used. The
operator of a recursive nameserver must validate that the domain
associated with the in-tree hints follows the operational practices
described in this memo. This can be achieved by out-of-band
mechanisms, or by querying the TXT record as described in section {{signal}}.
When a recursive nameserver is configured with an in-tree hint, then
the NS Resource Record set contained in the in-tree hint MUST be used
during the resolution process. This means that the hint always overrides
the NS and DS resource records received from the parent.
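
The override rule can be illustrated with a small sketch. It assumes a hypothetical in-memory table `hints` that maps a normalized domain name to its configured NS names; `effective_ns` is an illustrative name, not part of any resolver API. DS handling is left out: in this mechanism the configured trust anchor takes the place of the parental DS.

```python
from typing import Optional


def effective_ns(
    zone: str,
    parent_ns: Optional[list[str]],
    hints: dict[str, list[str]],
) -> Optional[list[str]]:
    """Pick the NS set to use for a zone during resolution.

    A configured in-tree hint always wins over the referral received
    from the parent, including the case where the parent returned no
    delegation at all (parent_ns is None).
    """
    key = zone.rstrip(".").lower()
    if key in hints:
        return hints[key]
    return parent_ns
```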
When the NS RRset on the domain's authoritative server changes and has
been validated using DNSSEC against the configured key, the in-tree
hints configuration SHOULD be updated with the changed authoritative NS
set. This requirement guarantees that the intent of the domain holder
will be followed.
The recursive nameserver should honor the TTLs in order to regularly
check for changes of the authoritative NS RRset. Operators that implement
in-tree hints SHOULD use tooling, possibly implemented in the recursive
nameserver, to log inconsistencies between information in
the parents and the in-tree configuration and to signal them to the
operators of the recursive nameserver. These inconsistencies need to be
well understood. They could be the result of a bona fide re-delegation (in
which case the parental records are likely a subset of the
authoritative NS RR set), the withdrawal of the delegation by the
parent, or an error or attack.
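
Tooling of the kind described above might start from a coarse classification such as the one sketched here. The category labels and the function name `classify` are assumptions made for illustration; real tooling would also need DNSSEC validation results and history before drawing conclusions.

```python
from typing import Optional


def _norm(names: set[str]) -> set[str]:
    """Normalize a set of nameserver names for comparison."""
    return {n.rstrip(".").lower() for n in names}


def classify(parent_ns: Optional[set[str]], hint_ns: set[str]) -> str:
    """Coarsely label the relation between parental and in-tree NS data."""
    if not parent_ns:
        # No referral at all: delegation withdrawn or parent unreachable.
        return "withdrawn"
    parent, hint = _norm(parent_ns), _norm(hint_ns)
    if parent == hint:
        return "consistent"
    if parent < hint:
        # Strict subset: possibly a bona fide re-delegation in progress.
        return "parent-subset"
    # Anything else needs operator attention: error or attack.
    return "mismatch"
```

A result other than "consistent" is a signal for the operator to investigate, not a verdict.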
The trust anchor MUST be used for the validation of records within the
in-tree hint's domain even when a parental DS record exists. Nota bene,
section 5 of {{RFC5011}} allows for deletion if a superior trust point
exists. When a trust anchor is part of an in-tree hint, deletion
with the motivation that a superior trust point exists MUST NOT
happen. When an in-tree hint exists for a subordinate domain, that trust
anchor MUST take precedence.
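
The precedence rule for subordinate domains amounts to a closest-enclosing match. The sketch below assumes the resolver keeps a simple list of hinted domain names; `closest_hint` is a hypothetical helper name.

```python
from typing import Iterable, Optional


def closest_hint(qname: str, hinted_domains: Iterable[str]) -> Optional[str]:
    """Return the most specific hinted domain that encloses qname.

    All matches are suffixes of the same name, so the longest string is
    also the one with the most labels.
    """
    q = qname.rstrip(".").lower()
    matches = []
    for domain in hinted_domains:
        d = domain.rstrip(".").lower()
        if q == d or q.endswith("." + d):
            matches.append(d)
    return max(matches, key=len) if matches else None
```

The trust anchor configured for the returned domain would then be the one used for validation, regardless of any DS record published above it.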
Recursive nameservers that implement this mechanism SHOULD have a
fallback mechanism implemented that will eventually allow them to
reach the in-domain nameserver when other servers in the NS resource
record set fail. [OMK: I think this is an existing requirement
somewhere else in the mountain of RFCs]
Domain Owner {#auth}
--------------------
This section describes the operational practices that the domain owner
has to follow in order to achieve the resiliency within the domain.
The domain owner MUST maintain its DNSSEC configuration using the
mechanism described in {{RFC5011}}.
The domain owner MUST have at least one in-domain authoritative
nameserver in its NS set. If that nameserver's name is within a
delegated child domain, then the nameservers for that delegated domain
MUST also have at least one in-domain authoritative nameserver. This
requirement is recursive for further delegation.
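
One way to read the recursive requirement is sketched below. It rests on assumptions that go beyond the text: zone data is given as a hypothetical mapping `zones` from zone name to NS names (all lowercase, without trailing dot), and "in-domain" is taken to mean at or below the zone being checked.

```python
def at_or_below(name: str, zone: str) -> bool:
    """True if name equals zone or is a descendant of it."""
    return name == zone or name.endswith("." + zone)


def closest_zone(name: str, zones: dict[str, list[str]]) -> str:
    """Return the deepest known zone that contains name."""
    candidates = [z for z in zones if at_or_below(name, z)]
    return max(candidates, key=lambda z: z.count("."))


def has_in_domain_chain(zone: str, zones: dict[str, list[str]]) -> bool:
    """Check the in-domain nameserver requirement, following delegations.

    A nameserver name that lives in a delegated child zone only counts
    if that child zone satisfies the same requirement. Each recursive
    step moves to a strictly deeper zone, so the recursion terminates.
    """
    for ns in zones[zone]:
        if not at_or_below(ns, zone):
            continue  # out-of-domain nameserver, does not count
        home = closest_zone(ns, zones)
        if home == zone or has_in_domain_chain(home, zones):
            return True
    return False
```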
In order to benefit from the resiliency properties provided by this
mechanism, the domain owner should require that delegated domains
(zones) within the domain all have at least one nameserver that is
in-domain. Note that delegated domains do not have to maintain a trust
anchor and can rely on there being a chain of trust established using
DS records from the trust anchor down. [OMK: is this actually clear?
Domain, sub-domain, in-domain, may become confusing]
Furthermore, the in-domain nameserver SHOULD be positioned in a
network that shares connectivity fate with the clients. For instance,
in our enterprise example it should be in the enterprise campus
network. More generally, the location is subject to a risk-based
assessment of the likelihood of not being able to obtain an IP
connection to the in-domain nameserver.
[OMK: should there be language here about out-of-domain nameservers?]
The domain owner should communicate to its community that it is
deploying practices that support in-tree hints. That communication MAY
be out of band. A RECOMMENDED in-band signaling mechanism is
described in section {{signal}}.
Operational Considerations {#operational}
======================
[TODO]
Signaling {#signal}
--------------------
It is RECOMMENDED that a domain owner (the owner of `<domain>`)
signals to its user community that they are using the mechanism
described in this section. Signaling is done by putting a TXT
resource record with owner name `_in-tree.<domain>` containing an
expiry timestamp in {{RFC3339}} format. The expiry timestamp indicates
the date to which the owner is committed to follow the instructions in
section {{auth}}.
The recursive nameserver operator should at the first opportunity, but no
later than 30 days after the expiration, check whether a new expiry
record has been published by the domain owner. If not, they SHOULD
disable the in-tree hints configuration for the domain.
```
_in-tree.<domain> TXT <expiry timestamp>
```
[OMK: Alternatively we create a trivial RR type for this. EXP RR
containing a timestamp as defined in RFC4034 section-3.1.5 ]
Out of band signaling is not in scope for this memo.
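
The expiry handling described in this section could be sketched as follows. The three state labels and the function names are hypothetical, and the parsing is deliberately simple: it relies on `datetime.fromisoformat`, which accepts common {{RFC3339}} timestamps but, on Python versions before 3.11, not every permitted form (for example unusual fractional-second lengths).

```python
from datetime import datetime, timedelta, timezone

# Grace period after expiry within which the operator should re-check.
GRACE = timedelta(days=30)


def parse_expiry(txt: str) -> datetime:
    """Parse the timestamp carried in the _in-tree TXT record."""
    value = txt.strip().upper().replace("Z", "+00:00")
    expiry = datetime.fromisoformat(value)
    if expiry.tzinfo is None:
        raise ValueError("timestamp lacks a UTC offset")
    return expiry


def hint_state(txt: str, now: datetime) -> str:
    """Map the published expiry to an operator action.

    'active'  : the owner's commitment is still running
    'recheck' : expired, still within the 30-day window to look for a
                renewed record
    'disable' : expired for more than 30 days without renewal
    """
    expiry = parse_expiry(txt)
    if now <= expiry:
        return "active"
    if now <= expiry + GRACE:
        return "recheck"
    return "disable"
```

For instance, an operator script might call `hint_state(record, datetime.now(timezone.utc))` after each successful, DNSSEC-validated lookup of the TXT record.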
Achieving true resiliency of services within the domain. {#resilience}
--------------
This memo describes a method to achieve resiliency of name resolution
for a community of interest of a particular domain. This is, by far,
not sufficient to achieve actual resiliency for services that are
provided within the domain. While a detailed discussion is out of
scope for this memo, we would like to remind the reader of the following:
* The in-domain nameservers should run on IP addresses that can
reasonably be expected to be reachable by the community of use. For
example, if a service is critical for on-campus enterprise use then
the in-domain nameserver should run on the campus network.
* Any service provider that offers a service under a certain name
within the domain should make sure that those services themselves can be
reasonably expected to be reachable by the community of use. Any
service dependencies should also be local.
* In an effort to create local resiliency one should not forget that
resiliency is also achieved by having no single point of
failure. Having in-domain nameservers, and having services in reach
of the community of interest, does not mean that one should not also
deploy infrastructure elsewhere.
Serving stale data {#stale}
----------------
In-tree hints are complementary to serving stale data
{{RFC8767}}. Serving stale data will allow continuity for all zones
when their authoritative servers are not reachable and the data
happens to be in the resolver's cache. In-tree hints work for specific
domains when data does not happen to be available in recursive
nameserver caches or when the parent's server(s) deliver faulty
delegation data.
In-tree hints are not scalable in the sense that there is significant
operational overhead for both the domain owner, who has to run
in-domain nameservers and follow {{RFC5011}}, and the recursive
nameserver operator, who will have to troubleshoot
inconsistencies. Serving stale data is highly scalable as it only
needs one configuration setting within the recursive nameserver, which
then applies to all domains.
Conclusions
=============
[TODO]
Security Considerations
=======================
In-tree hints can be used in recursive nameservers in combination with
protective block-lists and therefore do not debilitate the available
mechanisms to protect the community of users of a recursive nameserver.
Malware may use its own recursive nameservers, configured with
in-tree hints for its command and control domains, to circumvent
de-delegation by the parents. However, those recursive nameservers are
likely under the control of the malware administrators. Blocking these
recursive nameservers after it has been established that they are used
in command and control therefore seems proportionate and carries little
risk of disproportionate damage.
This mechanism intends to provide resilience against network
failures. However, it adds complexity in software and operational
procedures, thereby increasing fragility.
When DNSSEC validation takes place in clients that are 'behind' a
recursive nameserver that is configured with in-tree hints for a
particular domain, inconsistencies between the
domain and its parent will lead to undefined behavior. These
validating clients SHOULD also implement in-tree hints.
Policy Considerations {#policy}
=====================
Inherently, the approach described in this memo provides a mechanism
for a community of users of a domain to override the policies of
the parent domain. For instance, it allows the community of users to
continue to use the domain even when e.g. the delegation for that
domain expires. As such, this mechanism allows a community to
continue to use a domain when the parent has de-delegated the domain,
for instance in the context of a court order. At the same time this
in-tree approach can be a building block to create resilience for a
critical infrastructure. It can potentially be applied to a country
code top-level domain (ccTLD) and its user community. While the
likelihood of failure at the ccTLD level is extremely low, this approach
may add to confidence in the domain name system as a whole in times of
international tensions.
When an inconsistency exists between what is published in the parent
and what is used as in-tree hints, there is a fragmentation of the DNS
namespace. The operators of the recursive nameservers should
proactively restore the situation to consistency. Note that there is
no technical enforcement mechanism to aid that restoration, but it is
expected that a recursive nameserver operator who configures an in-tree
domain is part of the community of interest and therefore has out-of-band
means to contact the domain administrator. Also note that the
operators of the domain (e.g. example.net) do not have a communication
mechanism that can enforce the use or non-use of in-tree hints by
recursive nameserver operators.
The authority for using or not using in-tree hints is with the
operator of the recursive nameserver, acting as a user agent for its
community. Users have in general been able to override their DNS
configuration since the first deployment of the DNS. Users can
use a recursive nameserver that does not use in-tree hints for a
particular domain and can therefore opt out of the mechanism.
IANA Considerations
===================
No IANA considerations herein.
Acknowledgments
=================
This document is inspired by various hallway conversations about digital autonomy.
The author is an employee of the Internet Society; this document does
not necessarily reflect the position of the Internet Society.
{olaf: source="olaf"}