17 March 2023

Docker and the iptables firewall

Docker likes to make things simple. If you publish a container port on a host (for example with -p), then by default it is open to anything that can connect to the host, even if the host firewall drops all incoming requests by default. Many people have been surprised and burned by this over the years. Docker's effect on iptables is documented, but the documentation doesn't make it clear that if your firewall is set to drop by default, services exposed by Docker are still publicly accessible.

To understand how Docker bypasses the firewall, we need to look into how iptables works.

Tables and Filter Chains

Iptables has a concept of tables and filter chains. A table can have a series of chains within it, and the chains can have filter rules, which can accept, drop or reject packets.

A packet first enters the RAW table, then the MANGLE table. On my fairly default system, there are no rules in either of these tables. Next it hits the NAT table. Docker is running on this host, and here we can see where Docker inserts its first rule, in the PREROUTING chain, directing all traffic addressed to the host into the DOCKER chain. Within the DOCKER chain we can see DNAT rules which correspond to ports exposed on running containers, each rewriting the destination to the container's address and port:

$ sudo iptables --line-numbers -n -L -t nat
Chain PREROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    MASQUERADE  all  --  172.25.0.0/16        0.0.0.0/0           
2    MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0           
3    MASQUERADE  tcp  --  172.25.0.3           172.25.0.3           tcp dpt:443
4    MASQUERADE  tcp  --  172.25.0.3           172.25.0.3           tcp dpt:80
5    MASQUERADE  tcp  --  172.25.0.8           172.25.0.8           tcp dpt:8080

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain DOCKER (2 references)
num  target     prot opt source               destination         
1    RETURN     all  --  0.0.0.0/0            0.0.0.0/0           
2    RETURN     all  --  0.0.0.0/0            0.0.0.0/0           
3    DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:443 to:172.25.0.3:443
4    DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:80 to:172.25.0.3:80
5    DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 to:172.25.0.8:8080
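
The DNAT rules in the listing above can also be summarised mechanically. The following is a minimal sketch, assuming the rules are printed in `iptables -S` format; the function name list_published_ports is hypothetical and not part of Docker or iptables. It reads rules on standard input and prints one line per published port:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise DNAT rules given in `iptables -S` format.
# Reads rules on stdin and prints "proto host_port -> container_ip:port".
list_published_ports() {
    awk '
        / -j DNAT / {
            proto = ""; dport = ""; dest = ""
            # Walk the fields and pick up the value following each flag.
            for (i = 1; i < NF; i++) {
                if ($i == "-p")               proto = $(i + 1)
                if ($i == "--dport")          dport = $(i + 1)
                if ($i == "--to-destination") dest  = $(i + 1)
            }
            if (dport != "" && dest != "") print proto, dport, "->", dest
        }
    '
}
```

On a host running Docker, it could be fed with: sudo iptables -t nat -S DOCKER | list_published_ports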

After traversing the NAT table, packets enter the FILTER table. Because the DNAT rule has rewritten the destination to a container address, the kernel's routing decision treats the packet as one to be forwarded rather than delivered locally. It therefore skips the INPUT chain, which is normally where incoming packets land, and goes to the FORWARD chain instead. This explains why the usual firewall rules applied to the INPUT chain in the FILTER table get bypassed by Docker. Looking at the filter table, we can see Docker has inserted chains and rules in the FORWARD chain:

$ sudo iptables --line-numbers  -L -t filter 
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain FORWARD (policy DROP)
num  target     prot opt source               destination         
1    DOCKER-USER  all  --  anywhere             anywhere            
2    DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
3    ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
4    DOCKER     all  --  anywhere             anywhere            
5    ACCEPT     all  --  anywhere             anywhere            
6    ACCEPT     all  --  anywhere             anywhere            
7    ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
8    DOCKER     all  --  anywhere             anywhere            
9    ACCEPT     all  --  anywhere             anywhere            
10   ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain DOCKER (2 references)
num  target     prot opt source               destination         
1    ACCEPT     tcp  --  anywhere             172.25.0.3           tcp dpt:https
2    ACCEPT     tcp  --  anywhere             172.25.0.3           tcp dpt:http
3    ACCEPT     tcp  --  anywhere             172.25.0.8           tcp dpt:webcache

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
num  target     prot opt source               destination         
1    DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
2    DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
3    RETURN     all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
num  target     prot opt source               destination         
1    DROP       all  --  anywhere             anywhere            
2    DROP       all  --  anywhere             anywhere            
3    RETURN     all  --  anywhere             anywhere            

Chain DOCKER-USER (1 references)
num  target     prot opt source               destination         
1    RETURN     all  --  anywhere             anywhere

As documented, the traffic is first sent to the DOCKER-USER chain, where we have a chance to add custom rules, then into DOCKER-ISOLATION-STAGE-1, and later into the DOCKER chain, where we see the traffic gets accepted for our exposed containers and ports. Note that the above output appears to contain duplicate rules, but changing the command to iptables --line-numbers -vL -t filter shows the extra conditions attached to these rules (such as the input and output interfaces), so they are not really duplicates.

Now that we know how Docker works, we can devise a way to lock down the firewall using the DOCKER-USER chain.

Custom Firewall Rules

Ideally, we would like one set of rules which can be applied to Docker containers and other non-Docker services running on the host. To do that we can create a new FILTERS chain. From the DOCKER-USER chain, we can jump into the FILTERS chain, which applies our rules. If no rule matches, the traffic is denied by default.

First, jump to the FILTERS chain from DOCKER-USER for all traffic arriving on the external interface (ens3 here):

-A DOCKER-USER -i ens3 -j FILTERS

Inside FILTERS, allow the ports we want to open, and then reject everything else. We no longer need to worry about the interface, as we only jump to FILTERS for traffic arriving on ens3.

Note that we use the connection tracking module to match on the original destination port. It is possible for Docker to publish port 80 on the host and forward it to port 8080 in the container. By the time the packet reaches DOCKER-USER, DNAT has already rewritten the destination port to 8080, so a plain --dport 80 match would fail. Matching on --ctorigdstport 80 avoids that:

-A FILTERS -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 22 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 80 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 443 -j ACCEPT
-A FILTERS -j REJECT --reject-with icmp-host-prohibited
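
If the list of open ports changes often, the rules above could be generated rather than typed by hand. This is only a sketch: the function name filters_rules is hypothetical, and it assumes every port is TCP and should be accepted from any source. It prints lines in the same format as the rules above, with one ACCEPT per port given as an argument:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FILTERS chain rules in iptables-restore format.
# Each argument is a TCP port to accept, matched on the original
# (pre-DNAT) destination port via conntrack.
filters_rules() {
    local port
    printf '%s\n' "-A FILTERS -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT"
    for port in "$@"; do
        printf '%s\n' "-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport ${port} -j ACCEPT"
    done
    # Anything that did not match an ACCEPT rule is rejected.
    printf '%s\n' "-A FILTERS -j REJECT --reject-with icmp-host-prohibited"
}
```

Calling filters_rules 22 80 443 prints the same five rules shown above.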

If you wish, you can also add a rule to the INPUT chain to jump to FILTERS, reusing the same rules.

Complete Firewall Script

Individual rules are great, but how can we put this into a full firewall script? Iptables allows its rules to be saved in a text file, and then restored. We can use that feature to create a firewall script which we can reload as required.

# ens3 is the external interface. Adjust accordingly if the external 
# interface has a different name.

*filter

# Lines beginning with : are chain creation
:FILTERS - [0:0]
:WHITELIST-IP - [0:0]
:DOCKER-USER - [0:0]

# -F (flush) deletes all rules in the chain.
-F DOCKER-USER
-F WHITELIST-IP
-F FILTERS

# External interface is ens3, so send all traffic to filters.
-A DOCKER-USER -i ens3 -j FILTERS

-A FILTERS -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Will be updated separately with a whitelist IP
-A FILTERS -j WHITELIST-IP
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 22 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 80 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 443 -j ACCEPT
-A FILTERS -j REJECT --reject-with icmp-host-prohibited

COMMIT

To load these firewall rules, run iptables-restore -n /etc/firewall-rules.conf. The -n is important, as otherwise the restore command will flush all firewall rules. With -n, it will not flush anything unless it is specified in the script. That means this script will not affect rules in other tables and chains, e.g. those added by Docker.

The rules above only impact Docker containers, and it should be possible to load and reload them without impacting Docker itself, or any other firewall rules on the system.
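
Since a typo in the rules file could leave the firewall half-loaded, it may be worth validating the file before applying it. The sketch below assumes an iptables-restore that supports --test (which parses and builds the ruleset without committing it). The function name reload_firewall and the IPTABLES_RESTORE override variable are hypothetical conveniences, not part of iptables:

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a rules file, then load it without flushing.
# IPTABLES_RESTORE may be set to override the command (e.g. a full path).
reload_firewall() {
    local rules_file="$1"
    local restore="${IPTABLES_RESTORE:-iptables-restore}"

    # --test parses and constructs the ruleset but does not commit it.
    if ! "$restore" --test -n "$rules_file"; then
        echo "refusing to load ${rules_file}: validation failed" >&2
        return 1
    fi

    # -n (--noflush) leaves existing rules, including Docker's, in place.
    "$restore" -n "$rules_file"
}
```

It would need to run as root, for example from a root shell: reload_firewall /etc/firewall-rules.conf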

Alternatively, a complete firewall script can be created that affects both Docker and access for other services running on the host. This allows the INPUT chain and DOCKER-USER chain to share the same FILTERS chain, so that any exposed ports are the same for both Docker containers and services running outside of Docker. It also ensures that external traffic is dropped by default.

# ens3 is the external interface. Adjust accordingly if the external 
# interface has a different name.

*filter
# Lines beginning with : are chain creation
:INPUT ACCEPT [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:WHITELIST-IP - [0:0]
:FILTERS - [0:0]
:DOCKER-USER - [0:0]

# -F (flush) deletes all rules in the chain.
-F INPUT
-F DOCKER-USER
-F WHITELIST-IP
-F FILTERS
-F OUTPUT

# Accept all traffic from localhost
-A INPUT -i lo -j ACCEPT
# Note this will filter both internal and external interfaces.
# To filter only the external interface, add "-i ens3" (where ens3 is
# the external interface) to the rule below.
-A INPUT -j FILTERS

# Filter only docker traffic arriving on the external interface ens3
-A DOCKER-USER -i ens3 -j FILTERS

# Open ports on the host
-A FILTERS -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Will be updated separately with a whitelist IP
-A FILTERS -j WHITELIST-IP
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 22 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 80 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 81 -j ACCEPT
-A FILTERS -m tcp -p tcp -m conntrack --ctorigdstport 443 -j ACCEPT
-A FILTERS -j REJECT --reject-with icmp-host-prohibited

COMMIT

What about Boot Time

We can make sure these rules are added at boot time by creating a simple systemd unit file to run the restore when the system starts up. Create a file /lib/systemd/system/firewall-rules.service:

[Unit]
Description=Restore custom firewall rules
Before=network-pre.target
Wants=network-pre.target
After=local-fs.target

[Service]
Type=oneshot
ExecStart=/sbin/iptables-restore -n /etc/firewall-rules.conf

[Install]
WantedBy=multi-user.target

Then enable and start it. If your version of systemd supports it, both can be done in one step:

$ sudo systemctl enable --now firewall-rules

Otherwise, enable it and then start it separately:

$ sudo systemctl enable firewall-rules
$ sudo systemctl start firewall-rules

If you need to change the firewall, simply edit the script, and run:

$ sudo systemctl restart firewall-rules

Note that my system originally had firewalld running on the host, and it clobbered these rules even if I had set it to start before these rules were applied. As I did not need firewalld, I simply disabled it and went with the setup here instead.
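
Because another service can silently clobber the rules, a quick check after boot may help confirm the jump from DOCKER-USER to FILTERS is still in place. This is a sketch under the assumption that the chain is dumped in `iptables -S` format; the function name docker_user_jumps_to_filters is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the DOCKER-USER rules on stdin
# (in `iptables -S` format) contain a jump to the FILTERS chain.
docker_user_jumps_to_filters() {
    # Matches "-A DOCKER-USER -j FILTERS" with or without extra
    # match conditions (such as "-i ens3") in between.
    grep -Eq -- '^-A DOCKER-USER( .*)? -j FILTERS$'
}
```

For example: sudo iptables -S DOCKER-USER | docker_user_jumps_to_filters || echo "FILTERS jump is missing"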

This post explains how I am using the WHITELIST-IP chain.

References

https://unrouted.io/2017/08/15/docker-firewall/

https://github.com/docker/docs/issues/8087

https://www.booleanworld.com/depth-guide-iptables-linux-firewall/