---
title: In-Tree Hints for DNS Resiliency
abbrev: in-tree-hints
docname: draft-kolkman-in-tree-hints
category: info

ipr: trust200902
area: ops
workgroup: dnsop
keyword: Internet-Draft
stand_alone: yes
pi:
  RFCedstyle: yes
  toc: yes
  tocindent: yes
  sortrefs: yes
  symrefs: yes
  strict: yes
  comments: yes
  inline: yes
  text-list-symbols: -o*+

author:
  ins: O. Kolkman
  name: Olaf Kolkman
  email: kolkman@isoc.org

normative:
  RFC2119:
  RFC3339:
  RFC5011:
  RFC7344:

informative:
  RFC9499:
  RFC8767:
  E-Gov-Resilience:
    title: "Assessing e-Government DNS Resilience"
    date: 2022
    author:
      - ins: Sommese et al.
    seriesinfo:
      "IEEE": Proceedings of the 2022 International Conference on Network and Service Management (CNSM 2022)

--- abstract

We present a methodology by which networks that rely heavily on
specific domain names can become more resilient to failures in the
parent domain.

The approach uses a hints-file-like mechanism in recursive
nameservers, combined with a few operational practices followed by
the authoritative servers.

The suggested method can be seen as a means of increasing digital
sovereignty. We describe the approach, the necessary operational
practices, and the dilemmas this approach introduces.

--- middle

Introduction
============

The Domain Name System (DNS) is a remarkably stable and resilient
system. However, in many environments people are looking for ways to
remain in control of the continuity of digital services in their
own environments and to reduce external dependencies. One of those
dependencies is the DNS, on which we focus in this document.

Consider the following failure case:

* A community of interest is highly dependent on services that are
  discoverable with names within the example.net domain;

* A failure in DNS resolution occurs in the delegation between .net
  and example.net;

* IP connectivity remains intact: the DNS servers that serve
  example.net authoritatively are still reachable by the community of
  interest. So are the recursive nameservers and the service of
  interest.

This failure case may sound relatively limited, but here are a few
less abstract examples of such a failure.

Consider an enterprise campus operating under the domain example.net
that provides essential services, such as logistics, to users on its
campus. If the transit connection to the broader Internet were to
fail, the consequences could be significant. Even when all
infrastructure (recursive and authoritative DNS servers, the servers
for the services themselves, etc.) is on premises, a failure to resolve
the delegation between the top-level domain .net and example.net would
eventually lead to an inability to contact services.

Another example is a small island nation state that has a number of
its government services running on the island under its own TLD. Now
consider a cable cut scenario where all upstream connectivity is
lost. After a while, when authority information starts to time out
from caches (for some implementations after 24 hours), connections to
services on the island will start to fail.

A less benign example is an intervention in the DNS root, where
delegation data for a country code top-level domain (ccTLD) gets
altered or removed. Such an intervention would eventually debilitate
users who rely on services within that ccTLD's domain, usually
government services and local media outlets within that country.

While unthinkable even a few years ago, these sorts of scenarios are
now being considered in the context of international stability in
cyberspace.

In this document we describe an operational approach that, with minor
support from recursive nameservers, can offer one of the elements
towards greater autonomy and resilience of infrastructure dependent on
a specific domain. While certainly not the only approach to increasing
resiliency (e.g. the small island nation state example would be
solved by having a local anycast instance of the root), we introduce
it to offer a confidence-building mechanism that does not
fundamentally change the DNS design. This approach is consistent with
the architecture, design, and operation of the DNS. By following the
practices herein we avoid namespace fragmentation. We also avoid
fundamental protocol changes; in particular, we avoid alternative
roots.

The approach, called 'in-tree hints', offers protection against various
attack vectors that could compromise the delegation process. For
instance, on-path attackers may attempt to alter delegation records,
which could lead to denial of service, particularly in systems
utilizing Domain Name System Security Extensions
(DNSSEC). Additionally, threats such as DNS supply chain attacks or
inadvertent errors can result in unauthorized changes to the
delegation, including DS (Delegation Signer) records. More generally,
we solve for the case in which a DNS resolver receives parental data
that is inconsistent with the intent of the domain owner, i.e. data
that is inconsistent with what is published on the authoritative
servers. That includes not receiving data at all.

In-tree hints can be seen as a building block for resiliency of
critical infrastructure or digital autonomy. The approach is
complementary to serving stale data from the cache {{RFC8767}}; more
on this in section {{stale}}.

In this memo we describe what the parties that are critically
dependent on a specific domain, and those that serve zones within that
domain, will need to do in order to ensure continuous operation.

In section {{concept}} we describe the idea, the requirements for a
recursive DNS server, and the requirements for the zone associated
with the hint. In section {{resilience}} we briefly point to other
measures that must be taken in combination with this mechanism. In
section {{policy}} we discuss some policy considerations and the
dilemmas that exist with respect to the intentions of the DNS parent
and child.

This document uses uppercase MUST, SHOULD, RECOMMENDED and MAY in the
meaning defined by {{RFC2119}}. Their lowercase equivalents do not
have normative meaning.

The in-tree hints concept {#concept}
==========================

{{RFC9499}} describes the root hints file as follows: "Operators who
manage a DNS recursive resolver typically need to configure a 'root
hints file'. This file contains the names and IP addresses of the
authoritative name servers for the root zone, so the software can
bootstrap the DNS resolution process. For many pieces of software,
this list comes built into the software."

In-tree hints borrow from this idea: by configuring a 'hints
file' for a specific domain, one can bootstrap from that
domain down, even if its parents are not available. Implementing it
requires a modification in recursive nameservers and adherence to some
operational practices.

Recursive nameserver {#rec}
----------------------------

Recursive nameserver software will need to be modified to work
with in-tree hints.

An in-tree hint is a configuration for a recursive resolver that
provides the names and IP addresses of the authoritative name servers
for a specific domain. A recursive nameserver may be configured with
in-tree hints for multiple domains.

When there are no in-domain (in-bailiwick) nameservers ({{RFC9499}})
in the NS set for the domain, this mechanism MUST NOT [OMK: SHOULD NOT?]
be used. Without this requirement the resiliency properties may
not be achieved, as there are dependencies outside the
control of the domain. This requirement can be enforced by the
recursive nameserver software when it parses its
configuration. In addition, the in-bailiwick server should share fate
in IP connectivity with its dependents. For instance, in our island
example one in-domain nameserver should be on the island. In our
enterprise example one in-domain server should be on campus.

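As an illustration of enforcing the in-domain requirement at configuration parsing time, the following Python sketch rejects a hint whose NS set contains no in-domain nameserver. The `InTreeHint` structure and the function names are hypothetical and not part of any existing resolver; the comparison is label-wise so that `ns.badexample.net` is not mistaken for a name under `example.net`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InTreeHint:
    """Hypothetical in-memory form of one in-tree hint entry."""
    domain: str  # e.g. "example.net"
    # nameserver name -> configured IP addresses
    nameservers: Dict[str, List[str]] = field(default_factory=dict)


def _normalize(name: str) -> str:
    """Lowercase a domain name and drop the trailing root dot."""
    return name.rstrip(".").lower()


def is_in_domain(ns_name: str, domain: str) -> bool:
    """True if ns_name is at or below domain, compared on label boundaries."""
    ns, dom = _normalize(ns_name), _normalize(domain)
    return ns == dom or ns.endswith("." + dom)


def validate_hint(hint: InTreeHint) -> None:
    """Raise ValueError if the hint's NS set has no in-domain nameserver."""
    if not any(is_in_domain(ns, hint.domain) for ns in hint.nameservers):
        raise ValueError(
            f"in-tree hint for {hint.domain!r} has no in-domain nameserver"
        )
```

A resolver could call `validate_hint` for every configured hint at startup and refuse to load the configuration on failure. Whether the in-domain server also shares connectivity fate with its users cannot be checked this way and remains an operational judgment.
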
In-tree hints are only useful if the domain owner follows certain
practices. A recursive nameserver MAY implement the in-tree hints
mechanism for a specific domain only if the domain owner indicates
that it follows these practices. Section {{signal}} describes the
RECOMMENDED way for domain owners to signal their intent.
[OMK: REVIEW 2119 Keywords]

In-tree hints MUST only be used in combination with a DNSSEC
trust anchor, i.e. a trusted public DNSSEC key that is associated with
the name. The trust anchor MUST be maintained. It SHOULD be maintained
by the mechanism described in {{RFC5011}}. Alternatively, an
appropriate and trustworthy out-of-band mechanism MAY be used. The
operator of a recursive nameserver must validate that the domain
associated with the in-tree hints follows the operational practices
described in this memo. This can be achieved by out-of-band
mechanisms, or by querying the TXT record as described in section
{{signal}}.

When a recursive nameserver is configured with an in-tree hint,
the NS resource record set contained in the in-tree hint MUST be used
during the resolution process. This means that the hint data always
overrides the NS and DS resource records received from the parent.

When the NS RRset on the domain's authoritative server changes and has
been validated using DNSSEC against the configured key, the in-tree
hints configuration SHOULD be updated with the changed authoritative NS
set. This requirement ensures that the intent of the domain holder
is followed.

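The override rule can be pictured with a small Python sketch. It is only an illustration under stated assumptions: `select_nameservers` and `find_hint_domain` are hypothetical names, hints are modelled as a plain mapping from domain to NS names, and the choice of the longest matching hinted domain is an assumption made here so that a hint for a subordinate domain wins over one for its ancestor.

```python
from typing import Dict, List, Optional


def find_hint_domain(qname: str, hinted_domains: List[str]) -> Optional[str]:
    """Return the longest hinted domain that qname falls under, if any."""
    name = qname.rstrip(".").lower()
    best = None
    for domain in hinted_domains:
        dom = domain.rstrip(".").lower()
        # Match on label boundaries, never on a bare string suffix.
        if name == dom or name.endswith("." + dom):
            if best is None or len(dom) > len(best):
                best = dom
    return best


def select_nameservers(qname: str,
                       hints: Dict[str, List[str]],
                       parent_ns: List[str]) -> List[str]:
    """Pick the NS set to use for qname.

    hints maps a hinted domain to its configured NS names; parent_ns is
    whatever the parent returned (empty if the parent gave no answer).
    """
    normalized = {d.rstrip(".").lower(): ns for d, ns in hints.items()}
    domain = find_hint_domain(qname, list(normalized))
    if domain is not None:
        # Hint data overrides the parent's delegation, even if it differs.
        return list(normalized[domain])
    return list(parent_ns)
```

For names outside every hinted domain the function simply returns the parent's delegation, so ordinary resolution is unaffected.
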
The recursive nameserver should honor the TTLs and regularly check for
changes to the authoritative NS RRset. Operators that implement in-tree
hints SHOULD use tooling, possibly implemented in the recursive
nameserver, to log and signal inconsistencies between the information in
the parent and the in-tree configuration to the operators of the
recursive nameserver. These inconsistencies need to be well
understood. They could be the result of a bona fide re-delegation (in
which case the parental records are likely a subset of the
authoritative NS RRset), the withdrawal of the delegation by the
parent, or an error or attack.

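The kind of tooling meant here could start from a comparison like the Python sketch below. The function name and the returned labels are hypothetical, and the labels are triage hints for a log entry rather than a verdict: an empty parental answer cannot by itself distinguish a withdrawn delegation from an unreachable parent, and the subset test only reflects the "likely" heuristic stated above.

```python
from typing import Iterable, Set


def _names(ns_set: Iterable[str]) -> Set[str]:
    """Normalize NS names for comparison (case and trailing dot)."""
    return {n.rstrip(".").lower() for n in ns_set}


def classify_inconsistency(parent_ns: Iterable[str],
                           hint_ns: Iterable[str]) -> str:
    """Compare the parent's NS set with the in-tree hint's NS set."""
    parent, hint = _names(parent_ns), _names(hint_ns)
    if parent == hint:
        return "consistent"
    if not parent:
        # No delegation seen at the parent: withdrawn or unavailable.
        return "delegation-withdrawn-or-unavailable"
    if parent < hint:
        # Parent lists a proper subset: plausibly a re-delegation in progress.
        return "possible-redelegation"
    return "possible-error-or-attack"
```

An operator would still need to investigate every result other than `consistent`.
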
The trust anchor MUST be used for the validation of records within the
in-tree hint's domain, even when a parental DS record exists. Nota bene:
section 5 of {{RFC5011}} allows a trust anchor to be deleted if a
superior trust point exists. When a trust anchor is part of an in-tree
hint, deletion motivated by the existence of a superior trust point
MUST NOT happen. When an in-tree hint exists for a subordinate domain,
that trust anchor MUST take precedence.

Recursive nameservers that implement this mechanism SHOULD have a
fallback mechanism that will eventually allow them to
reach the in-domain nameserver when other servers in the NS resource
record set fail. [OMK: I think this is an existing requirement
somewhere else in the mountain of RFCs]

Domain Owner {#auth}
--------------------

This section describes the operational practices that the domain owner
has to follow in order to achieve resiliency within the domain.

The domain owner MUST maintain its DNSSEC configuration using the
mechanism described in {{RFC5011}}.

The domain owner MUST have at least one in-domain authoritative
nameserver in its NS set. If that nameserver's name is within a
delegated child domain, then the NS set of that delegated domain
MUST also contain at least one in-domain authoritative nameserver. This
requirement applies recursively to further delegations.

In order to benefit from the resiliency properties provided by this
mechanism, the domain owner should require that each delegated domain
(zone) within the domain has at least one nameserver that is
in-domain. Note that delegated domains do not have to maintain a trust
anchor and can rely on the chain of trust established using
DS records from the trust anchor down. [OMK: is this actually clear?
Domain, sub-domain, in-domain, may become confusing]

Furthermore, the in-domain nameserver SHOULD be positioned in a
network that shares connectivity fate with the clients. For instance,
in our enterprise example it should be in the enterprise campus
network. More generally, the location is subject to a risk-based
assessment of the likelihood of not being able to obtain an IP
connection to the in-domain nameserver.

[OMK: should there be language here about out-of-domain nameservers?]

The domain owner should communicate to its community that it is
deploying practices that support in-tree hints. That communication MAY
be out of band. A RECOMMENDED in-band signaling mechanism is
described in section {{signal}}.

Operational Considerations {#operational}
======================

[TODO]

Signaling {#signal}
--------------------

It is RECOMMENDED that a domain owner (the owner of `<domain>`)
signals to its user community that it supports in-tree hints, using
the mechanism described in this section. Signaling is done by
publishing a TXT resource record with owner name `_in-tree.<domain>`
containing an expiry timestamp in {{RFC3339}} format. The expiry
timestamp indicates the date until which the owner commits to
following the practices in section {{auth}}.

The recursive nameserver operator should, at the first opportunity and
no later than 30 days after the expiration, check whether a new expiry
record has been published by the domain owner. If not, they SHOULD
disable the in-tree hints configuration for the domain.

```
_in-tree.<domain> TXT <expiry timestamp>
```

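The expiry check an operator performs on the signaling record might look like the following Python sketch. It assumes the TXT value has already been retrieved and DNSSEC-validated; the function names and the three status strings are hypothetical. It is not a complete {{RFC3339}} parser: it relies on `datetime.fromisoformat`, which handles the common `YYYY-MM-DDThh:mm:ssZ` and numeric-offset forms.

```python
from datetime import datetime, timedelta, timezone

# The memo allows up to 30 days after expiration to find a renewed record.
GRACE = timedelta(days=30)


def parse_expiry(txt_value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as 2026-01-31T00:00:00Z."""
    value = txt_value.strip().strip('"')
    if value[-1:] in ("Z", "z"):
        # Older versions of fromisoformat() reject the "Z" suffix.
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp lacks the UTC offset RFC 3339 requires")
    return parsed


def hint_status(txt_value: str, now: datetime) -> str:
    """Return 'valid', 'expired-recheck' or 'disable' for a signaling record."""
    expiry = parse_expiry(txt_value)
    if now <= expiry:
        return "valid"
    if now <= expiry + GRACE:
        return "expired-recheck"  # look for a renewed record
    return "disable"
```

For example, `hint_status("2026-03-01T00:00:00Z", datetime.now(timezone.utc))` tells the operator whether the commitment is still current. Because disabling the hint is a SHOULD, the `disable` result is a recommendation to the operator, not an automatic action.
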
[OMK: Alternatively we create a trivial RR type for this. EXP RR
containing a timestamp as defined in RFC 4034, section 3.1.5]

Out-of-band signaling is not in scope for this memo.

Achieving true resiliency of services within the domain {#resilience}
--------------

This memo describes a method to achieve resiliency of name resolution
for the community of interest of a particular domain. This is, by far,
not sufficient to achieve actual resiliency for services that are
provided within the domain. While a detailed discussion is out of
scope for this memo, we would like to remind the reader of the
following:

* The in-domain nameservers should run on IP addresses that can
  reasonably be expected to be reachable by the community of use. For
  example, if a service is critical for on-campus enterprise use, then
  the in-domain nameserver should run on the campus network.

* Any service provider that offers a service under a certain name
  within the domain should make sure that the service itself can
  reasonably be expected to be reachable by the community of use. Any
  service dependencies should also be local.

* In an effort to create local resiliency one should not forget that
  resiliency is also achieved by having no single point of
  failure. Having in-domain nameservers, and having services within
  reach of the community of interest, does not mean that one should
  not also deploy infrastructure elsewhere.

Serving stale data {#stale}
----------------

In-tree hints are complementary to serving stale data
{{RFC8767}}. Serving stale data allows continuity for all zones
when their authoritative servers are not reachable and the data
happens to be in the resolver's cache. In-tree hints work for specific
domains when the data does not happen to be available in the recursive
nameserver's cache or when the parent's servers deliver faulty
delegation data.

In-tree hints do not scale well, in the sense that there is significant
operational overhead both for the domain owner, who has to run
in-domain nameservers and follow {{RFC5011}}, and for the recursive
nameserver operator, who will have to troubleshoot
inconsistencies. Serving stale data is highly scalable, as it only
needs one configuration setting within the recursive nameserver, which
then applies to all domains.

Conclusions
=============

[TODO]

Security Considerations
=======================

In-tree hints can be used in recursive nameservers in combination with
protective block-lists and therefore do not weaken the available
mechanisms to protect the community of users of a recursive nameserver.

Malware may use its own recursive nameservers, configured with
in-tree hints for its command and control domains, to circumvent
de-delegation by the parents. However, those recursive nameservers are
likely under the control of the malware administrators, and the risk
of disproportionate damage from blocking these recursive nameservers,
once it has been established that they are used for command and
control, seems low.

This mechanism intends to provide resilience against network
failures. However, it adds complexity to software and operational
procedures, thereby increasing fragility.

When DNSSEC validation takes place in clients that are 'behind' a
recursive nameserver that is configured with in-tree hints for a
particular domain, inconsistencies between the
domain and its parent will lead to undefined behavior. These
validating clients SHOULD also implement in-tree hints.

Policy Considerations {#policy}
=====================

Inherently, the approach described in this memo provides a mechanism
for a community of users of a domain to override the policies of
the parent domain. For instance, it allows the community of users to
continue to use the domain even when the delegation for that
domain expires. Likewise, this mechanism allows a community to
continue to use a domain when the parent has de-delegated the domain,
for instance in the context of a court order. At the same time, this
in-tree approach can be a building block to create resilience for
critical infrastructure. It can potentially be applied to a country
code top-level domain (ccTLD) and its user community. While the
likelihood of a failure at the ccTLD level is extremely low, this
approach may add to confidence in the domain name system as a whole in
times of international tensions.

When an inconsistency exists between what is published in the parent
and what is used as in-tree hints, the DNS namespace is
fragmented. The operators of the recursive nameservers should
proactively restore consistency. Note that there is
no technical enforcement mechanism to aid that restoration, but it is
expected that a recursive nameserver operator who configures an in-tree
hint for a domain is part of the community of interest and therefore
has out-of-band means to contact the domain administrator. Also note
that the operators of the domain (e.g. example.net) do not have a
communication mechanism that can enforce the use or non-use of in-tree
hints by recursive nameserver operators.

The authority for using or not using in-tree hints lies with the
operator of the recursive nameserver, acting as a user agent for its
community. Users have in general been able to override their DNS
configuration since the first deployment of the DNS. Users can
use a recursive nameserver that does not use in-tree hints for a
particular domain and can thereby opt out of the mechanism.

IANA Considerations
===================

No IANA considerations herein.

Acknowledgments
=================

This document is inspired by various hallway conversations about digital autonomy.

The author is an employee of the Internet Society; this document does
not necessarily reflect the position of the Internet Society.

{olaf: source="olaf"}