ACM.66 Why you should use Subnet NACLs even if you already use security groups and how to create them
This is a continuation of my series of posts on Automating Cybersecurity Metrics.
In the past few posts we automated the creation of public and private VPCs, route tables and subnets. In this post, we’ll apply network rules using a Network Access Control List (NACL) applied to a subnet. Here’s what we are going to create:
Private VPC NACL
We can assign one NACL to both subnets since they both share the same ruleset.
Private HTTP NACL
Outbound: 80, 443 TCP, Any IP
Inbound: Ephemeral Ports, Any IP
Remote Access NACL
Inbound: 22, 3389, Any IP
Outbound: Ephemeral ports, Any IP
Note that I’m allowing any IP in the NACL for now because I will be more specific in the security group. I could change that for my personal use and restrict remote access to my own IP address, since I am the only one accessing resources in this account, but I presume that in a larger organization you may have many people who need to log into resources remotely in a development environment. In that case it’s easier to leave the NACL broad and make the security groups more specific.
There’s no one right answer to how you should deploy your networking, but the goal should be to reduce available paths for an attacker and the blast radius associated with any single resource. I cover those topics in more detail in my book at the end of this post.
I’ve seen a lot of people make mistakes with network rules thinking if they need to open port 80 inbound they also need to open port 80 outbound. That is usually incorrect, but it depends on which network protocol you are using.
For HTTP and HTTPS, you would open TCP 80 and 443 inbound and ephemeral ports outbound.
For access to a DNS server you would typically open UDP and maybe TCP 53 outbound and ephemeral ports inbound.
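To make the direction of the rules concrete, here is a minimal Python sketch of a stateless check for a subnet that hosts a web server. It is a toy model, not how AWS implements NACLs. The `Packet` type, the function name, and the 1024-65535 ephemeral range are assumptions chosen for illustration; the ephemeral range your clients actually use depends on their operating system.

```python
from dataclasses import dataclass

# Assumption: a deliberately wide ephemeral port range for illustration.
EPHEMERAL = range(1024, 65536)


@dataclass(frozen=True)
class Packet:
    direction: str  # "inbound" or "outbound", relative to the subnet
    src_port: int
    dst_port: int


def web_server_nacl_allows(packet: Packet) -> bool:
    """Stateless check: each packet is judged on its own, with no session memory."""
    if packet.direction == "inbound":
        # Clients connect TO the server on 80 or 443.
        return packet.dst_port in (80, 443)
    # Responses go back to the client's ephemeral port, not to port 80 or 443.
    return packet.dst_port in EPHEMERAL


request = Packet("inbound", src_port=50123, dst_port=443)
response = Packet("outbound", src_port=443, dst_port=50123)
outbound_web = Packet("outbound", src_port=50555, dst_port=443)

print(web_server_nacl_allows(request))       # client request reaches the server
print(web_server_nacl_allows(response))      # reply leaves on an ephemeral port
print(web_server_nacl_allows(outbound_web))  # server-initiated HTTPS is not allowed
```

The server can answer web requests without port 443 being open outbound, which is the point made above.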
Rule Processing in NACLs
The rules in a NACL are processed in order, starting with the lowest rule number, until a rule matches. You can create both ALLOW and DENY rules, which you can’t do in a security group, and that gives you some additional flexibility. I’m not going to get into that right now, but this flexibility would enable you to create, for example, a three-tier network where only the web server network can access the application tier network and only the application tier network can access the database tier network.
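The first-match behavior can be sketched in a few lines of Python. This is an illustrative model only: the `Rule` type, the `evaluate` function, and the documentation-range IP addresses are all assumptions made up for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    number: int
    action: str  # "allow" or "deny"
    cidr: str
    port_from: int
    port_to: int


def evaluate(rules, source_ip: str, port: int) -> str:
    """Return the action of the lowest-numbered matching rule."""
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = ip_address(source_ip) in ip_network(rule.cidr)
        if in_cidr and rule.port_from <= port <= rule.port_to:
            return rule.action  # first match wins; later rules are ignored
    return "deny"  # nothing matched: fall through to the implicit deny


rules = [
    Rule(100, "allow", "0.0.0.0/0", 443, 443),
    Rule(50, "deny", "203.0.113.7/32", 0, 65535),  # hypothetical offender
]

print(evaluate(rules, "203.0.113.7", 443))    # deny: rule 50 matches first
print(evaluate(rules, "198.51.100.20", 443))  # allow: rule 100
print(evaluate(rules, "198.51.100.20", 25))   # deny: no rule matches
```

Because the deny rule has the lower number, it takes effect even though a broader allow rule also matches the same traffic.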
Why use NACLs when you have Security Groups?
I get this question all the time.
- A NACL blocks access between networks. A Security Group applies rules to a specific host. Generally you want to block what you know will never be allowed as far as ports and protocols in a NACL. The security groups tend to be more application specific and zero trust.
- If you think about where a NACL is enforced versus where security group rules are enforced you can visualize how close you are letting a known bad IP address get to your resources before you stop them. I personally want to stop them as soon as I can within my security architecture. The web-facing NACL is the first entry point.
- You can create broad rules that protect your entire network with a NACL. This can save you if someone makes a mistake with a security group rule.
- You can segregate duties between who can create security group rules and NACL rules. I prefer a network team creates everything but some organizations let developers create security groups and the network team creates the subnets and NACLs.
- NACLs have DENY capabilities that Security Groups don’t which can help with certain network designs and attack scenarios.
- You may get better performance if you block known bad right out the gate with a NACL rather than letting it get to the point of your Security Group. This logically makes sense to me but I haven’t tested it out. NACLs are stateless meaning they don’t have to spend time figuring out if the packet is associated with an existing session to allow or deny the traffic. They will drop packets quickly. A stateful firewall needs to determine if packets are associated with an existing session, because if some inbound traffic is associated with an established outbound connection or vice versa, then the traffic is allowed. This requires more processing on the part of network devices. By the way, Azure and GCP do not have stateless firewall options the last time I checked.
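The difference between stateful and stateless filtering described in the last bullet can be sketched as a toy Python model. It says nothing about real-world performance, which I have not measured, and the class and function names are hypothetical. It only shows where the extra per-packet work comes from: the stateful filter has to maintain and consult a session table.

```python
class StatefulFilter:
    """Toy security-group-like filter: replies to allowed inbound flows pass automatically."""

    def __init__(self, inbound_ports):
        self.inbound_ports = set(inbound_ports)
        self.sessions = set()  # tracked (client_ip, client_port, server_port) flows

    def inbound(self, client_ip, client_port, server_port):
        if server_port not in self.inbound_ports:
            return False
        # Remember the flow so the reply can be recognized later.
        self.sessions.add((client_ip, client_port, server_port))
        return True

    def outbound_reply(self, client_ip, client_port, server_port):
        # Extra work per packet: look the flow up in the session table.
        return (client_ip, client_port, server_port) in self.sessions


def stateless_outbound(client_port, low=1024, high=65535):
    """Toy NACL-like check: no session table, only a port-range comparison."""
    return low <= client_port <= high


sg = StatefulFilter(inbound_ports=[443])
sg.inbound("198.51.100.20", 50123, 443)

print(sg.outbound_reply("198.51.100.20", 50123, 443))  # reply to a tracked session
print(sg.outbound_reply("198.51.100.99", 50123, 443))  # no matching session
print(stateless_outbound(50123))                       # allowed purely by port range
```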
You don’t get charged for NACLs or Security Groups on AWS but you do pay for data transferred out of an EC2 instance.
So what does “out” mean exactly? When it leaves the instance network interface? Or when it hits the gateway? It would be interesting to measure the impact of network rules blocking unnecessary outbound traffic to see if it makes a difference in a large organization. For example, AWS used to use public NTP servers by default. How much traffic would be generated to NTP servers in a very large organization, and what if you blocked that and switched to an internal NTP server? I’m not sure how much traffic NTP would generate but it would be interesting to find out.
Note that AWS doesn’t do that anymore. I have an upcoming post on DNS and NTP that covers that in more detail.
DDOS prevention and Performance
What if you don’t use NACLs and let that traffic get all the way to your Security Group before you block it? How does that impact your performance? Do the hits on your Security Group impact the resource it protects in any way? I never sat down and tested this as I’m always too busy but I’ve been curious about this for a long time. If you test it let me know. In my case, I block as much known-bad traffic at the NACL as I can reasonably do without overcomplicating my rules. I don’t have a huge amount of traffic in any case so the difference is likely negligible to me.
However, I did once write a post about auto-blocking bad traffic using a firewall that had that capability. I proposed that you could reduce the impact of attacks and automated scanners and improve network performance by immediately dropping packets from any source that has sent invalid traffic in the past. Of course you would whitelist known good IP addresses and you probably wouldn’t apply this to an internal network, but rather to your Internet-facing network access.
Someone who worked at the same security vendor as me attacked my post publicly in the comments (which seems a bit obnoxious versus coming to my desk to discuss the matter; don’t be that person), but customers of the vendor came to my defense in the comments and said they did, in fact, ward off DDoS attacks using those methods. The sooner you can weed out known-bad traffic, and without evaluating the state of a session, the better. NACLs are your first line of defense and, as already mentioned, stateless.
Attackers using your endpoints to relay traffic
By the way, have you checked your load balancer logs lately for any endpoints directly connected to the Internet to see who’s relaying traffic off of them? Are you paying for that noisy traffic? Is it impacting your performance? Add network rules to block the offenders and report them to your cloud provider.
Limitations and Network Design
NACLs are limited by default to 20 ingress and 20 egress rules. Although you can increase those numbers to 40 each way, AWS warns that could cause a performance hit. Generally I use NACLs for broad-based rules rather than specific hosts unless there are some really bad repeat offenders that I want to block out of the gate.
For example, I might create a single web tier, app tier, and data tier in a VPC for microservices that share the same rules (abstraction as I wrote about earlier) and then service-specific security groups that create zero-trust rules for each service.
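If you generate rulesets programmatically, a small check can catch a NACL that has outgrown the default limit before you try to deploy it. The following Python sketch assumes each rule is a dictionary with a boolean `Egress` key, mirroring the CloudFormation property of the same name. The function names and the sample data are hypothetical.

```python
from collections import Counter

DEFAULT_RULES_PER_DIRECTION = 20  # default limit per direction discussed above


def rules_per_direction(entries):
    """Count ingress and egress rules separately, since the limit applies to each."""
    counts = Counter("egress" if entry["Egress"] else "ingress" for entry in entries)
    return {"ingress": counts["ingress"], "egress": counts["egress"]}


def directions_over_limit(entries, limit=DEFAULT_RULES_PER_DIRECTION):
    """Return the directions whose rule count exceeds the limit."""
    return [d for d, n in rules_per_direction(entries).items() if n > limit]


# Hypothetical ruleset: 21 inbound rules and 2 outbound rules.
sample = [{"Egress": False, "RuleNumber": 100 + i} for i in range(21)]
sample += [{"Egress": True, "RuleNumber": 100 + i} for i in range(2)]

print(rules_per_direction(sample))
print(directions_over_limit(sample))
```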
For our public VPC I’m going to allow 3389 and 22 in. Then I’m going to create a security group that allows a specific IP address to connect to a specific instance on port 22 for Linux servers and another security group for port 3389 that will get applied to Windows servers. If your developers must use a VPN to log in, you will be able to create very specific rules and lock things down further, a concept people don’t usually think of when they use VPNs. I might write about this more in a later post.
Creating NACLs with CloudFormation
Because I have some experience trying to automate network rules on AWS I can tell you that it is tricky. For now, I am going to create sets of NACL rules in separate templates that can be associated with a NACL. I’ll add a NACLRules folder in my CFN directory and create a separate yaml file for each ruleset.
To start I will create the following files and rulesets:
We can set the following properties when creating a NACL rule (an AWS::EC2::NetworkAclEntry resource) on AWS with CloudFormation.
CidrBlock: We’re going to allow all for now 0.0.0.0/0
Egress: Depending whether we are creating an inbound or outbound rule this will be set to true or false. (Ingress means inbound. Egress means outbound)
NetworkAclId: The NACL to which we are adding the rule. This will be passed in as a parameter so we have a reusable ruleset we can apply to other NACLs.
PortRange: A rule for each port above. Unlike Azure you can’t specify a list of ports with a comma.
Protocol: We will only be using TCP in our rules (6).
RuleAction: Allow or Deny
RuleNumber: Since the rules are processed in order we want to number them accordingly. If, for example, we were being attacked by a particular IP address, we could create a deny all rule for that IP address and list that rule first with the lowest number. Then we could create our allow rules after that with higher numbers.
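As a rough illustration of how these properties fit together, the Python sketch below builds one AWS::EC2::NetworkAclEntry resource per port and prints the result as a JSON CloudFormation template. This is not the template used in this series, which is written in YAML. The parameter name `NetworkAclIdParam`, the logical IDs, and the rule numbering scheme are assumptions made up for the example.

```python
import json


def nacl_entry(rule_number, port, egress, action="allow", cidr="0.0.0.0/0"):
    """Build one NetworkAclEntry resource for a single TCP port."""
    return {
        "Type": "AWS::EC2::NetworkAclEntry",
        "Properties": {
            # Hypothetical parameter name: the NACL ID is passed in for reuse.
            "NetworkAclId": {"Ref": "NetworkAclIdParam"},
            "RuleNumber": rule_number,
            "Protocol": 6,  # TCP
            "RuleAction": action,
            "Egress": egress,
            "CidrBlock": cidr,
            # One rule per port: a rule takes a range, not a list of ports.
            "PortRange": {"From": port, "To": port},
        },
    }


def http_outbound_template(ports=(80, 443), first_rule_number=100, step=10):
    """Assemble an outbound HTTP/HTTPS ruleset as a template dictionary."""
    resources = {}
    for i, port in enumerate(ports):
        resources[f"OutboundTcp{port}"] = nacl_entry(
            first_rule_number + i * step, port, egress=True
        )
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {"NetworkAclIdParam": {"Type": "String"}},
        "Resources": resources,
    }


template = http_outbound_template()
print(json.dumps(template, indent=2))
```

Leaving gaps between rule numbers (100, 110, and so on) keeps room to insert a lower-numbered deny rule later without renumbering the allow rules.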
HTTP Outbound Template:
Remote Access Inbound Template:
Here’s the updated deploy_subnets function, which unfortunately gets a bit more complicated when adding NACL deployment:
Next up. Security Groups. Follow for updates.
If you liked this story please clap and follow:
Medium: Teri Radichel or Email List: Teri Radichel
© 2nd Sight Lab 2022