17 March 2023
Docker likes to make things simple. If you publish a container port on a host, then by default it is open to anything that can connect to the host, even if the host firewall drops all incoming requests by default. Many people have been surprised and burned by this over the years. Docker's effect on iptables is documented, but the documentation doesn't make it clear that if your firewall is set to drop by default, services exposed by Docker are still publicly accessible.
To understand how Docker bypasses the firewall, we need to look into how iptables works.
Iptables has a concept of tables and filter chains. A table can have a series of chains within it, and the chains can have filter rules, which can accept, drop or reject packets.
A packet first enters the RAW table, then the MANGLE table. On my fairly default system, there are no rules in either of these tables. Next it hits the NAT table. Docker is running on this host, and here we can see where Docker inserts its first rule, in the PREROUTING chain, directing all traffic addressed to the host into the DOCKER chain. Within the DOCKER chain we can see rules which correspond to the ports published by running containers. Each one is a DNAT rule that rewrites the destination to the container's address and port:
$ sudo iptables --line-numbers -n -L -t nat
Chain PREROUTING (policy ACCEPT)
num target prot opt source destination
1 DOCKER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT)
num target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
num target prot opt source destination
1 MASQUERADE all -- 172.25.0.0/16 0.0.0.0/0
2 MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0
3 MASQUERADE tcp -- 172.25.0.3 172.25.0.3 tcp dpt:443
4 MASQUERADE tcp -- 172.25.0.3 172.25.0.3 tcp dpt:80
5 MASQUERADE tcp -- 172.25.0.8 172.25.0.8 tcp dpt:8080
Chain OUTPUT (policy ACCEPT)
num target prot opt source destination
1 DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain DOCKER (2 references)
num target prot opt source destination
1 RETURN all -- 0.0.0.0/0 0.0.0.0/0
2 RETURN all -- 0.0.0.0/0 0.0.0.0/0
3 DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:443 to:172.25.0.3:443
4 DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 to:172.25.0.3:80
5 DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.25.0.8:8080
After traversing the NAT table, the packet reaches the routing decision and then the FILTER table. Because the DNAT rule has already rewritten the destination to a container address, the packet is no longer addressed to the host itself, so it skips the INPUT chain, where incoming packets normally land, and goes to the FORWARD chain instead. This explains why the usual firewall rules applied to the INPUT chain in the FILTER table are bypassed by Docker. Looking at the filter table, we can see Docker has inserted chains and rules in the FORWARD chain:
$ sudo iptables --line-numbers -L -t filter
Chain INPUT (policy ACCEPT)
num target prot opt source destination
Chain FORWARD (policy DROP)
num target prot opt source destination
1 DOCKER-USER all -- anywhere anywhere
2 DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere
3 ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
4 DOCKER all -- anywhere anywhere
5 ACCEPT all -- anywhere anywhere
6 ACCEPT all -- anywhere anywhere
7 ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
8 DOCKER all -- anywhere anywhere
9 ACCEPT all -- anywhere anywhere
10 ACCEPT all -- anywhere anywhere
Chain OUTPUT (policy ACCEPT)
num target prot opt source destination
Chain DOCKER (2 references)
num target prot opt source destination
1 ACCEPT tcp -- anywhere 172.25.0.3 tcp dpt:https
2 ACCEPT tcp -- anywhere 172.25.0.3 tcp dpt:http
3 ACCEPT tcp -- anywhere 172.25.0.8 tcp dpt:webcache
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
num target prot opt source destination
1 DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere
2 DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere
3 RETURN all -- anywhere anywhere
Chain DOCKER-ISOLATION-STAGE-2 (2 references)
num target prot opt source destination
1 DROP all -- anywhere anywhere
2 DROP all -- anywhere anywhere
3 RETURN all -- anywhere anywhere
Chain DOCKER-USER (1 references)
num target prot opt source destination
1 RETURN all -- anywhere anywhere
As documented, the traffic is first sent to the DOCKER-USER chain, where we have a chance to add custom rules, then into DOCKER-ISOLATION-STAGE-1 and then later into the DOCKER chain, where we see the traffic gets accepted for our exposed containers and ports. Note that some rules in the above output look like duplicates, but changing the command to iptables --line-numbers -vL -t filter shows there are extra conditions attached to them, such as the interface each rule applies to, so they are not really duplicates.
Now that we know how Docker interacts with iptables, we can devise a way to lock down the firewall using the DOCKER-USER chain.
Ideally, we would like one set of rules which can be applied both to Docker containers and to non-Docker services running on the host. To do that we can create a new FILTERS chain. From the DOCKER-USER chain, we can jump into the FILTERS chain and apply our rules there. If no rule matches, the traffic is rejected by default.
First, jump to the FILTERS chain from DOCKER-USER for all traffic arriving on the external interface (ens3 here):
-A DOCKER-USER -i ens3 -j FILTERS
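The interface name varies between systems, so it is worth checking before copying the rule. One way to find it, as a rough sketch, is to take the device that carries the default route. The default_iface function below is a hypothetical helper (not part of iproute2 or Docker) that parses the output of ip route. It assumes the external interface is the one holding the default route, which may not be true on multi-homed hosts.

```shell
#!/bin/sh
# Hypothetical helper: read "ip route" output on stdin and print the
# device named on the first "default" line.
default_iface() {
    awk '$1 == "default" {
        for (i = 1; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Only query the system when the ip command is available.
if command -v ip >/dev/null 2>&1; then
    ip -4 route show default | default_iface
fi
```

Whatever name this prints is the one to use in place of ens3 in the rules that follow.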
Inside FILTERS, allow the ports we want to open, and then drop everything else. We no longer need to worry about the interface, as we only jump to FILTERS for traffic arriving at ens3.
Note that we use the connection tracking module to match on the original destination port. It is possible for Docker to publish port 80 on the host and forward it to port 8080 in the container. If we matched on the plain destination port instead, the rule would fail to match, because DNAT has already rewritten the destination port to 8080 by the time the packet reaches this chain:
-A FILTERS -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 22 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 80 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 443 -j ACCEPT
-A FILTERS -j REJECT --reject-with icmp-host-prohibited
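Since the per-port rules differ only in the port number, they can be generated rather than typed by hand. The sketch below assumes a hypothetical helper named filters_rules. It only prints rules in the same format as above and does not load anything into the firewall.

```shell
#!/bin/sh
# Hypothetical helper: print FILTERS rules for the given TCP ports.
# It only writes text to stdout; nothing is loaded into the kernel.
filters_rules() {
    echo "-A FILTERS -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT"
    for port in "$@"; do
        echo "-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport $port -j ACCEPT"
    done
    # Anything that did not match an allowed port is rejected.
    echo "-A FILTERS -j REJECT --reject-with icmp-host-prohibited"
}

filters_rules 22 80 443
```

The ports are matched against the original destination port, so they should be the ports published on the host, not the ports the containers listen on internally.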
If you wish, you can also add a rule to the INPUT chain to jump to FILTERS, reusing the same rules.
Individual rules are great, but how can we put this into a full firewall script? Iptables allows its rules to be saved in a text file, and then restored. We can use that feature to create a firewall script which we can reload as required.
# ens3 is the external interface. Adjust accordingly if the external
# interface has a different name.
*filter
# Lines beginning with : are chain creation
:FILTERS - [0:0]
:WHITELIST-IP - [0:0]
:DOCKER-USER - [0:0]
# -F (flush) deletes all rules in the chain.
-F DOCKER-USER
-F WHITELIST-IP
-F FILTERS
# External interface is ens3, so send all traffic to filters.
-A DOCKER-USER -i ens3 -j FILTERS
-A FILTERS -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Will be updated separately with a whitelist IP
-A FILTERS -j WHITELIST-IP
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 22 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 80 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 443 -j ACCEPT
-A FILTERS -j REJECT --reject-with icmp-host-prohibited
COMMIT
To load these firewall rules, run iptables-restore -n /etc/iptables.conf. The -n is important, as otherwise the restore command will flush all existing rules in the tables it restores. With -n, it will not flush anything unless it is specified in the script. That means this script will not affect rules in other tables and chains, e.g. those added by Docker.
The rules above only impact Docker containers, and it should be possible to load and reload them without impacting Docker itself, or any other firewall rules on the system.
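To catch syntax errors before loading, iptables-restore has a --test option that parses and builds the ruleset without committing it. A minimal sketch of using it as a check before the real load, assuming a hypothetical wrapper called safe_reload and the /etc/iptables.conf path used above:

```shell
#!/bin/sh
# Command used to load rules. It is a variable only so that the wrapper
# can be tried out with a stub command instead of the real one.
RESTORE="${RESTORE:-iptables-restore}"

# Hypothetical wrapper: parse the file with --test first, and only load
# it for real if that succeeds. -n keeps existing rules in place.
safe_reload() {
    rules_file="$1"
    $RESTORE -n --test "$rules_file" && $RESTORE -n "$rules_file"
}

# Usage (as root): safe_reload /etc/iptables.conf
```

If the test pass fails, the second command never runs and the rules already loaded are left alone.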
Alternatively, a complete firewall script can be created that controls access both to Docker containers and to other services running on the host. This allows the INPUT chain and DOCKER-USER chain to share the same FILTERS chain, so that the open ports are the same for Docker containers and for services running outside of Docker. It also ensures that external traffic is dropped by default:
# ens3 is the external interface. Adjust accordingly if the external
# interface has a different name.
*filter
# Lines beginning with : are chain creation
:INPUT ACCEPT [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:WHITELIST-IP - [0:0]
:FILTERS - [0:0]
:DOCKER-USER - [0:0]
# -F (flush) deletes all rules in the chain.
-F INPUT
-F DOCKER-USER
-F WHITELIST-IP
-F FILTERS
-F OUTPUT
# Accept all traffic from localhost
-A INPUT -i lo -j ACCEPT
# Note this will filter both internal and external interfaces.
# To filter only the external interface, add "-i ens3" (where ens3 is
# the external interface) to the rule below.
-A INPUT -j FILTERS
# Filter only docker traffic arriving on the external interface ens3
-A DOCKER-USER -i ens3 -j FILTERS
# Open ports on the host
-A FILTERS -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Will be updated separately with a whitelist IP
-A FILTERS -j WHITELIST-IP
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 22 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 80 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 81 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 443 -j ACCEPT
-A FILTERS -j REJECT --reject-with icmp-host-prohibited
COMMIT
We can make sure these rules are added at boot time by creating a simple systemd unit file to run the restore when the system starts up. Create a file /lib/systemd/system/firewall-rules.service, making sure the path in ExecStart matches where the rules file was saved:
[Unit]
Description=Restore custom firewall rules
Before=network-pre.target
Wants=network-pre.target
After=local-fs.target
[Service]
Type=oneshot
ExecStart=/sbin/iptables-restore -n /etc/firewall-rules.conf
[Install]
WantedBy=multi-user.target
Then enable and start it. If your version of systemd supports --now, this can be done in one step; otherwise enable and start the unit separately:
$ sudo systemctl enable --now firewall-rules
OR
$ sudo systemctl enable firewall-rules
$ sudo systemctl start firewall-rules
If you need to change the firewall, simply edit the script, and run:
$ sudo systemctl restart firewall-rules
Note that my system originally had firewalld running on the host, and it clobbered these rules even if I had set it to start before these rules were applied. As I did not need firewalld, I simply disabled it and went with the setup here instead.
This post explains how I am using the WHITELIST-IP chain.
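One plausible way to keep that chain up to date, sketched here as an assumption rather than taken from that post, is to generate a tiny ruleset that flushes WHITELIST-IP and accepts traffic from a list of trusted source addresses, then feed it to iptables-restore -n so nothing else is touched. The whitelist_ruleset function is a hypothetical helper, and 203.0.113.7 is a documentation address standing in for a real one.

```shell
#!/bin/sh
# Hypothetical helper: print a ruleset that replaces the contents of
# WHITELIST-IP with one ACCEPT rule per trusted source address.
whitelist_ruleset() {
    echo "*filter"
    echo ":WHITELIST-IP - [0:0]"
    echo "-F WHITELIST-IP"
    for ip in "$@"; do
        echo "-A WHITELIST-IP -s $ip -j ACCEPT"
    done
    echo "COMMIT"
}

# Print the ruleset for inspection. To apply it (as root), pipe it in:
#   whitelist_ruleset 203.0.113.7 | iptables-restore -n
whitelist_ruleset 203.0.113.7
```

Because FILTERS jumps to WHITELIST-IP before the per-port rules, an address accepted here can reach every port, including ones that are otherwise closed.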
https://unrouted.io/2017/08/15/docker-firewall/
https://github.com/docker/docs/issues/8087
https://www.booleanworld.com/depth-guide-iptables-linux-firewall/