00 version 20260317
457
draft-kolkman-dns-in-tree-hints.md
Normal file
@@ -0,0 +1,457 @@
---
title: In-Tree Hints for DNS Resiliency
abbrev: in-tree-hints
docname: draft-kolkman-in-tree-hints
category: info

ipr: trust200902
area: ops
workgroup: dnsop
keyword: Internet-Draft
stand_alone: yes
pi:
  RFCedstyle: yes
  toc: yes
  tocindent: yes
  sortrefs: yes
  symrefs: yes
  strict: yes
  comments: yes
  inline: yes
  text-list-symbols: -o*+

author:
  ins: O. Kolkman
  name: Olaf Kolkman
  email: kolkman@isoc.org

normative:
  RFC2119:
  RFC3339:
  RFC5011:
  RFC7344:


informative:
  RFC9499:
  RFC8767:
  E-Gov-Resilience:
    title: "Assessing e-Government DNS Resilience"
    date: 2022
    author:
      - ins: Sommese et al.
    seriesinfo:
      "IEEE": Proceedings of the 2022 International Conference on Network and Service Management (CNSM 2022)

--- abstract

We present a methodology by which networks that rely heavily on
specific domain names can become more resilient to failures in the
parent domain.

The approach presented uses a hints-file-like mechanism in recursive
nameservers, in addition to having the authoritative servers follow a
few operational practices.

The suggested method can be seen as a means of increasing digital
sovereignty. We describe the approach, the necessary operational
practices, and the dilemmas this approach introduces.

--- middle

Introduction
============
-------

The Domain Name System (DNS) is a remarkably stable and resilient
system. However, in many environments people are looking at how they
can remain in control of the continuity of digital services in their
own environments and reduce external dependencies. One of those
dependencies is the DNS, on which we focus in this document.

Consider the following failure case:

* A community of interest is highly dependent on services that are
  discoverable with names within the example.net domain;

* A failure in DNS resolution occurs in the delegation between .net
  and example.net;

* IP connectivity remains intact: the DNS servers that serve
  example.net authoritatively are still reachable by the community of
  interest. So are the recursive nameservers and the service of
  interest.

This failure case may sound relatively limited, but here are a few
less abstract examples of such a failure.

Consider an enterprise campus operating under the domain example.net
that provides essential services, such as logistics, to users on its
campus. If the transit connection to the broader Internet were to
fail, the consequences could be significant. Even when all
infrastructure (recursive and authoritative DNS servers, the servers
for the services themselves, etc.) is on premises, a failure to resolve
the delegation between the top-level domain .net and example.net would
eventually lead to an inability to contact services.


Another example is a small island nation state that has a number of
its government services running on the island under its own TLD. Now
consider a cable cut scenario where all upstream connectivity is
lost. After a while, when authority information starts to time out
from caches (for some implementations after 24 hours), connections to
services on the island will start to fail.

A less benign example is an intervention in the DNS root, where
delegation data for a country code top-level domain (ccTLD) gets
altered or removed. Such an intervention would eventually debilitate
users who rely on services within that ccTLD's domain, usually
government services and local media outlets within that country.

While unthinkable even a few years ago, these sorts of scenarios are
now being considered in the context of international stability in
cyberspace.
|
||||
|
||||
|
||||
In this document we describe an operational approach that, with minor
support from recursive nameservers, can offer one of the elements
towards greater autonomy and resilience of infrastructure dependent on
a specific domain. While it is certainly not the only approach to
increasing resiliency (e.g. the small island nation state example
would be solved by having a local anycast instance of the root), we
introduce it to offer a confidence-building mechanism that does not
fundamentally change the DNS design. This approach is consistent with
the architecture, design, and operation of the DNS. By following the
practices herein we avoid namespace fragmentation. We also avoid
fundamental protocol changes; in particular, we avoid alternative
roots.


The approach, called 'in-tree hints', offers protection against various
attack vectors that could compromise the delegation process. For
instance, on-path attackers may attempt to alter delegation records,
which could lead to denial of service, particularly in systems
utilizing Domain Name System Security Extensions
(DNSSEC). Additionally, threats such as DNS supply chain attacks or
inadvertent errors can result in unauthorized changes to the
delegation, including DS (Delegation Signer) records. More generally,
we solve for the case in which a DNS resolver receives parental data
that is inconsistent with the intent of the domain owner, i.e. data
that is inconsistent with what is published on the authoritative
servers. That includes not receiving data at all.


In-tree hints can be seen as a building block for resiliency of
critical infrastructure or digital autonomy. The approach is
complementary to serving stale data from the cache {{RFC8767}}; more
on this in section {{stale}}.

In this memo we describe what the parties that are critically
dependent on a specific domain, and those that serve zones within that
domain, will need to do in order to guarantee continuous operation.

In section {{concept}} we describe the idea, the requirements for a
recursive DNS server, and the requirements for the associated zone.
In section {{resilience}} we briefly point to other measures that must
be taken in combination with this mechanism. In section {{policy}} we
discuss some policy considerations and the dilemmas that exist with
respect to the intentions of the DNS parent and child.

This document uses uppercase MUST, SHOULD, RECOMMENDED and MAY in the
meaning defined by {{RFC2119}}. Their lowercase equivalents do not
have normative meaning.
|
||||
|
||||
The in-tree hints concept {#concept}
==========================

{{RFC9499}} describes the root hints file as follows: "Operators who
manage a DNS recursive resolver typically need to configure a 'root
hints file'. This file contains the names and IP addresses of the
authoritative name servers for the root zone, so the software can
bootstrap the DNS resolution process. For many pieces of software,
this list comes built into the software."

The in-tree hints mechanism borrows from this idea: by configuring a
'hints file' for a specific domain, one can bootstrap from that domain
down, even if its parents are not available. Implementing it requires
a modification to recursive nameservers and adherence to some
operational practices.


Recursive nameserver {#rec}
----------------------------

Recursive nameserver software will need to be modified to work with
in-tree hints.

An in-tree hint is a configuration for a recursive resolver that
provides the names and IP addresses of the authoritative name servers
for a specific domain. A recursive name server may be configured with
in-tree hints for multiple domains.

When there are no in-domain (in-bailiwick) nameservers ({{RFC9499}})
in the NS set for the domain, this mechanism MUST NOT [OMK: SHOULD
NOT?] be used. Without this requirement the resiliency properties can
potentially not be achieved, as there are dependencies outside the
control of the domain. This requirement can be enforced by the
recursive nameserver software at the moment of configuration
parsing. In addition, the in-bailiwick server should share IP
connectivity fate with its dependents. For instance, in our island
example one in-domain name server should be on the island. In our
enterprise example one in-domain server should be on campus.
|
||||
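The configuration-time check mentioned above could look roughly like the following Python sketch. It is illustrative only: the function names and the representation of a hint as a domain name plus a list of NS names are assumptions made for this example, not part of any existing resolver.

```python
def normalize(name: str) -> str:
    """Lower-case a domain name and strip any trailing dot."""
    return name.rstrip(".").lower()


def is_in_domain(ns_name: str, domain: str) -> bool:
    """Return True if ns_name equals domain or is a subdomain of it."""
    ns, dom = normalize(ns_name), normalize(domain)
    return ns == dom or ns.endswith("." + dom)


def validate_hint(domain: str, ns_names: list[str]) -> list[str]:
    """Return the in-domain nameservers of a hint (hypothetical helper).

    Raises ValueError when the NS set contains no in-domain nameserver,
    in which case the hint would be rejected at configuration parsing.
    """
    in_domain = [ns for ns in ns_names if is_in_domain(ns, domain)]
    if not in_domain:
        raise ValueError(
            f"in-tree hint for {domain!r} has no in-domain nameserver"
        )
    return in_domain
```

Note that the comparison is done on whole labels: `ns.badexample.net` is not treated as in-domain for `example.net`.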
|
||||
In-tree hints are only useful if the domain owner follows certain
practices. A recursive nameserver MAY apply the in-tree hints
mechanism to a specific domain only if the domain owner indicates that
it follows these practices. Section {{signal}} describes the
RECOMMENDED way for domain name owners to signal their intent.
[OMK: REVIEW 2119 Keywords]

In-tree hints MUST only be used in combination with a DNSSEC trust
anchor, i.e. a trusted public DNSSEC key that is associated with the
name. The trust anchor MUST be maintained. It SHOULD be maintained by
the mechanism described in {{RFC5011}}. Alternatively, an appropriate
and trustworthy out-of-band mechanism MAY be used. The operator of a
recursive nameserver must validate that the domain associated with the
in-tree hints follows the operational practices described in this
memo. This can be achieved by out-of-band mechanisms, or by querying
the TXT record as described in section {{signal}}.

When a recursive nameserver is configured with an in-tree hint, the
NS resource record set contained in the in-tree hint MUST be used
during the resolution process. This means that the hint data always
overrides the NS and DS resource records received from the parent.
|
||||
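As a rough illustration of this override rule, the following Python sketch picks the delegation data a resolver would work with. The `Delegation` class, the `effective_delegation` function and the string labels for the trust source are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """NS names plus the source of DNSSEC trust for a zone cut."""
    ns_names: frozenset
    trust_source: str  # "hint-trust-anchor" or "parent-ds"


def effective_delegation(domain, hints, parent_ns):
    """Pick the delegation data to use for `domain`.

    hints:     dict mapping a domain to the NS names of its in-tree hint.
    parent_ns: NS names received from the parent, or None when the
               parent returned nothing (for example, it is unreachable).
    """
    if domain in hints:
        # A configured hint always wins, whatever the parent said.
        return Delegation(frozenset(hints[domain]), "hint-trust-anchor")
    if parent_ns is None:
        raise LookupError(f"no delegation data available for {domain}")
    return Delegation(frozenset(parent_ns), "parent-ds")
```

With a hint configured for `example.net`, the function returns the hint's NS set even when `parent_ns` is `None` or points elsewhere; domains without a hint are resolved from the parent's data as usual.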
|
||||
|
||||
When the NS RRset on the domain's authoritative server changes and has
been validated using DNSSEC against the configured key, the in-tree
hints configuration SHOULD be updated with the changed authoritative
NS set. This requirement guarantees that the intent of the domain
holder will be followed.

The recursive nameserver should honor the TTLs to regularly check for
a change of the authoritative NS RRset. Operators that implement
in-tree hints SHOULD use tooling, possibly implemented in the
recursive nameserver, to log and signal inconsistencies between the
information in the parent and the in-tree configuration to the
operators of the recursive nameserver. These inconsistencies need to
be well understood. They could be the result of a bona fide
re-delegation (in which case the parental records are likely a subset
of the authoritative NS RRset), the withdrawal of the delegation by
the parent, or an error or attack.
|
||||
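The kind of tooling meant here could start from a comparison like the one sketched below in Python. The function name and the four result labels are assumptions made for illustration; real tooling would also have to fetch and validate the data, which is left out.

```python
def _norm(names):
    """Normalize a collection of NS names for comparison."""
    return {n.rstrip(".").lower() for n in names}


def classify_parent_vs_hint(parent_ns, hint_ns):
    """Compare the parent's NS set with the in-tree hint's NS set.

    parent_ns may be None or empty when the parent offers no delegation.
    Returns one of four illustrative labels for logging or alerting.
    """
    hint = _norm(hint_ns)
    parent = _norm(parent_ns or [])
    if not parent:
        # Delegation withdrawn, or the parent could not be reached.
        return "no-parent-delegation"
    if parent == hint:
        return "consistent"
    if parent < hint:
        # Proper subset: possibly a bona fide re-delegation in progress.
        return "parent-is-subset"
    # Anything else needs operator attention: error or attack.
    return "mismatch"
```

Only the `consistent` outcome requires no follow-up; the other three correspond to the situations listed above and would be logged and signaled to the operator.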
|
||||
The trust anchor MUST be used for the validation of records within
the in-tree hint's domain, even when a parental DS record exists. Nota
bene: section 5 of {{RFC5011}} allows for deletion of a trust anchor
if a superior trust point exists. When a trust anchor is part of an
in-tree hint, deletion with the motivation that a superior trust point
exists MUST NOT happen. When an in-tree hint exists for a subordinate
domain, that trust anchor MUST take precedence.
|
||||
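One way to read the precedence rule is as a longest-match lookup over the configured hints: the most specific hinted domain that encloses a query name supplies the trust anchor. A minimal Python sketch, assuming hints are identified by their domain names (the function name is hypothetical):

```python
def closest_hint(qname, hinted_domains):
    """Return the most specific hinted domain enclosing qname, or None."""
    labels = qname.rstrip(".").lower().split(".")
    hinted = {d.rstrip(".").lower() for d in hinted_domains}
    # Walk from the full name towards the root; the first hit is the
    # deepest (most specific) hinted domain.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in hinted:
            return candidate
    return None
```

For a resolver with hints for both `example.net` and `sub.example.net`, a query for `www.sub.example.net` would be validated against the trust anchor of `sub.example.net`.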
|
||||
Recursive nameservers that implement this mechanism SHOULD have a
fallback mechanism implemented that will eventually allow them to
reach the in-domain nameserver when other servers in the NS resource
record set fail. [OMK: I think this is an existing requirement
somewhere else in the mountain of RFCs]

Domain Owner {#auth}
--------------------

This section describes the operational practices that the domain owner
has to follow in order to achieve resiliency within the domain.

The domain owner MUST maintain its DNSSEC configuration using the
mechanism described in {{RFC5011}}.

The domain owner MUST have at least one in-domain authoritative
nameserver in its NS set. If that nameserver's name is within a
delegated child domain, then the NS set for that delegated domain
MUST also contain at least one in-domain authoritative nameserver.
This requirement applies recursively to further delegations.
|
||||
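The recursive nature of this requirement can be sketched in Python as follows. The sketch assumes a simplified model in which `zones` maps each zone name (lower case, no trailing dot) to the list of its NS names, and it reads "in-domain" strictly as "at or below the zone in question". Both function names are hypothetical.

```python
def enclosing_zone(name, zones):
    """Return the deepest zone in `zones` that contains `name`, or None."""
    labels = name.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None


def satisfies_in_domain_rule(zone, zones):
    """Check the in-domain nameserver requirement for `zone`.

    At least one NS name must lie at or below `zone`. If that name sits
    in a delegated child zone, the child must satisfy the rule as well.
    """
    for ns in zones[zone]:
        if ns != zone and not ns.endswith("." + zone):
            continue  # out-of-domain nameserver, does not count
        home = enclosing_zone(ns, zones)
        # `home` is `zone` itself or a strictly deeper zone, so the
        # recursion below always terminates.
        if home == zone or satisfies_in_domain_rule(home, zones):
            return True
    return False
```

In this model a domain whose only in-domain nameserver lives in a child zone served exclusively by out-of-domain nameservers fails the check, since resolving the nameserver's address would again depend on infrastructure outside the domain.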
|
||||
In order to benefit from the resiliency properties provided by this
mechanism, the domain owner should require that all delegated domains
(zones) within the domain have at least one nameserver that is
in-domain. Note that delegated domains do not have to maintain a trust
anchor and can rely on there being a chain of trust established using
DS records from the trust anchor down. [OMK: is this actually clear?
Domain, sub-domain, in-domain, may become confusing]

Furthermore, the in-domain nameserver SHOULD be positioned in a
network that shares connectivity fate with the clients. For instance,
in our enterprise example it should be in the enterprise campus
network. More generally, the location is subject to a risk-based
assessment of the likelihood of not being able to obtain an IP
connection to the in-domain nameserver.

[OMK: should there be language here about out-of-domain nameservers?]

The domain owner should communicate to its community that it is
deploying practices that support in-tree hints. That communication MAY
be out of band. A RECOMMENDED in-band signaling mechanism is
described in section {{signal}}.


Operational Considerations {#operational}
======================

[TODO]

Signaling {#signal}
--------------------

It is RECOMMENDED that a domain owner (the owner of `<domain>`)
signals to its user community that it supports in-tree hints, using
the mechanism described in this section. Signaling is done by
publishing a TXT resource record with owner name `_in-tree.<domain>`
containing an expiry timestamp in {{RFC3339}} format. The expiry
timestamp indicates the date until which the owner is committed to
following the instructions in section {{auth}}.

The recursive nameserver operator should, at the first opportunity but
no later than 30 days after the expiration, check whether a new expiry
record has been published by the domain owner. If not, they SHOULD
disable the in-tree hints configuration for the domain.


```
_in-tree.<domain> TXT <expiry timestamp>
```
|
||||
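How an operator's tooling might evaluate the expiry timestamp is sketched below in Python. This is one possible reading of the text above, not a prescribed algorithm: the three status labels and the function names are assumptions, and retrieving and DNSSEC-validating the TXT record is left out.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(days=30)  # the 30-day window mentioned in the text


def parse_expiry(txt_rdata: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2026-12-31T00:00:00Z'."""
    value = txt_rdata.strip().strip('"')
    if value.endswith(("Z", "z")):
        # Older Python versions reject a trailing 'Z' in fromisoformat().
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("an RFC 3339 timestamp carries a UTC offset")
    return parsed


def hint_status(txt_rdata: str, now: datetime) -> str:
    """Classify a hint as 'valid', 'expired-recheck' or 'disable'."""
    expiry = parse_expiry(txt_rdata)
    if now <= expiry:
        return "valid"
    if now <= expiry + GRACE:
        # Expired: look for a renewed record before the window closes.
        return "expired-recheck"
    return "disable"
```

The operator would call `hint_status` with the TXT record data and the current time, for example `datetime.now(timezone.utc)`, and disable the in-tree hints configuration once the result is `disable`.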
|
||||
[OMK: Alternatively we create a trivial RR type for this. EXP RR
containing a timestamp as defined in RFC4034 section-3.1.5 ]

Out-of-band signaling is not in scope for this memo.


Achieving true resiliency of services within the domain {#resilience}
--------------

This memo describes a method to achieve resiliency of name resolution
for a community of interest of a particular domain. This is by far
not sufficient to achieve actual resiliency for the services that are
provided within the domain. While a detailed discussion is out of
scope for this memo, we would like to remind the reader of the
following:

* The in-domain nameservers should run on IP addresses that can
  reasonably be expected to be reachable by the community of use. For
  example, if a service is critical for on-campus enterprise use, then
  the in-domain nameserver should run on the campus network.

* Any service provider that offers a service under a certain name
  within the domain should make sure that the service itself can
  reasonably be expected to be reachable by the community of use. Any
  service dependencies should also be local.

* In an effort to create local resiliency, one should not forget that
  resiliency is also achieved by having no single point of
  failure. Having in-domain nameservers, and having services within
  reach of the community of interest, does not mean that one should
  not also deploy infrastructure elsewhere.

Serving stale data {#stale}
----------------

In-tree hints are complementary to serving stale data
{{RFC8767}}. Serving stale data allows continuity for all zones
when their authoritative servers are not reachable and the data
happens to be in the resolver's cache. In-tree hints work for specific
domains when data does not happen to be available in recursive
nameserver caches or when the parent's server(s) deliver faulty
delegation data.

In-tree hints are not scalable, in the sense that there is significant
operational overhead for both the domain owner, who has to run
in-domain nameservers and follow {{RFC5011}}, and the recursive
nameserver operator, who will have to troubleshoot
inconsistencies. Serving stale data is highly scalable, as it only
needs one configuration setting within the recursive nameserver, which
then applies to all domains.
|
||||
|
||||
Conclusions
=============

[TODO]


Security Considerations
=======================

In-tree hints can be used in recursive nameservers in combination with
protective block-lists and therefore do not debilitate the available
mechanisms to protect the community of users of a recursive nameserver.

Malware could use its own recursive nameservers, configured with
in-tree hints for its command and control domains, to circumvent
de-delegation by the parents. However, those recursive nameservers are
likely under the control of the malware administrators. Blocking these
recursive nameservers after it has been established that they are
used for command and control seems proportionate and carries little
risk of disproportionate damage.

This mechanism intends to provide resilience against network
failures. However, it adds complexity to software and operational
procedures, thereby increasing fragility.

When DNSSEC validation takes place in clients that are 'behind' a
recursive nameserver that is configured with in-tree hints for a
particular domain, inconsistencies between the domain and its parent
will lead to undefined behavior. These validating clients SHOULD also
implement in-tree hints.


Policy Considerations {#policy}
=====================

Inherently, the approach described in this memo provides a mechanism
for a community of users of a domain to override the policies of the
parent domain. For instance, it allows the community of users to
continue to use the domain even when, e.g., the delegation for that
domain expires. As such, this mechanism allows a community to
continue to use a domain when the parent has de-delegated the domain,
for instance in the context of a court order. At the same time, this
in-tree approach can be a building block to create resilience for
critical infrastructure. It can potentially be applied to a country
code top-level domain (ccTLD) and its user community. While the
likelihood of failure at the ccTLD level is extremely low, this
approach may add to confidence in the Domain Name System as a whole in
times of international tensions.

When an inconsistency exists between what is published in the parent
and what is used as in-tree hints, there is a fragmentation of the DNS
namespace. The operators of the recursive nameservers should
proactively restore the situation to consistency. Note that there is
no technical enforcement mechanism to aid that restoration, but it is
expected that a recursive nameserver operator who configures an
in-tree hint is part of the community of interest and therefore has
out-of-band means to contact the domain administrator. Also note that
the operators of the domain (e.g. example.net) do not have a
communication mechanism that can enforce the use or non-use of in-tree
hints by recursive nameserver operators.

The authority for using or not using in-tree hints lies with the
operator of the recursive nameserver, as a user agent for its
community. Users have in general been able to override their DNS
configuration since the first deployment of the DNS. Users can
use a recursive nameserver that does not use in-tree hints for a
particular domain and can therefore opt out of the mechanism.
|
||||
|
||||
|
||||
|
||||
IANA Considerations
===================

No IANA considerations herein.

Acknowledgments
=================

This document is inspired by various hallway conversations about
digital autonomy.

The author is an employee of the Internet Society; this document does
not necessarily reflect the position of the Internet Society.
|
||||
|
||||
|
||||
{olaf: source="olaf"}
|
||||
|
||||
|
||||
|
||||