ACM.108 How to inspect traffic flows and troubleshoot VPC Endpoint connections
This is a continuation of my series on Automating Cybersecurity Metrics.
This post is long and detailed. You are really going to get your money's worth! If you want a summary of what we discover in this post, jump to the end. If you want the justification for that summary, read the whole post to understand how we came to those conclusions by inspecting and validating network traffic for different configurations.
We deployed a VPC Endpoint with CloudFormation for CloudFormation in this blog post.
Then we added a role to our EC2 instance in the last post to enable AWS CLI commands on that instance without using long-term developer credentials.
We tested out executing a CloudFormation CLI command in the last post and it didn’t work. Why not? We’ve got our VPC Endpoint and the security groups set up to access it. We do not allow access to the entire Internet, but shouldn’t the AWS CLI be sending its CloudFormation requests to the VPC endpoint?
How can we troubleshoot why we cannot connect to the VPC Endpoint?
Initially I thought it was not possible to view the traffic to and from the VPC Endpoint interface. When I looked at network interfaces in the EC2 dashboard, the only network interfaces that existed there were for the two EC2 instances in my test account.
Although an interface is created in our subnet per the documentation, the interface did not seem to be visible in the AWS console either on the VPC Endpoint page or in the list of interfaces on the EC2 dashboard.
You can choose one subnet per Availability Zone for your interface endpoint. If you add a subnet, we create an endpoint network interface in the subnet and assign it a private IP address from the IP address range of the subnet. If you remove a subnet, we delete its endpoint network interface.
Because I could not get the ENI from the network interfaces dashboard, I presumed the related traffic was not in VPC Flow Logs. Even if it was, I wouldn’t know which interface to look at. So I started with the EC2 instance. By the end of this post, the VPC Endpoint network interfaces magically showed up in the network interfaces list. I don’t know if it simply takes them a long time to show up or what was going on there, but you can find the network interfaces on the EC2 dashboard.
Even though I thought we couldn’t see the network interfaces for the VPC Endpoints, I reasoned that we should be able to see traffic going to it from our EC2 network interface which is in VPC Flow Logs.
Look at the networking tab for your EC2 instance.
Copy the network interface ID (eni-xxxxxxxxx)
Navigate to VPC Flow Logs for the Remote Access VPC where we deployed our EC2 instance. Search for the ENI for your EC2 instance.
Click on it.
You’ll see all kinds of network traffic, most of it rejected noise from the Internet due to scammers seeking vulnerabilities on your system.
Search for [space]443[space]. (You need the spaces to weed out other traffic with 443 somewhere in the data but not specifically the network port.) You can also put quotes around the port number like this: “ 443 “.
Here you can see a couple of things. The logs show the PRIVATE IP address of your EC2 instance, not the public IP address.
There are a bunch of 52.x.x.x IPs in my logs. Look up one of those IPs at arin.net:
Yes, it is an Amazon IP address:
That is your EC2 instance trying to reach the CloudFormation service but it is blocked. Why is it a public IP address? Shouldn’t it be sending traffic to a private IP address now that we set up our VPC Endpoint?
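If you export those flow log records, a few lines of Python can do the same filtering more precisely than a text search for “ 443 ”. This is only a sketch: it assumes the default (version 2) flow log record format, and the account ID, ENI ID, and IP addresses in the sample are made up for illustration.

```python
from typing import Dict, List, Optional

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line: str) -> Optional[Dict[str, str]]:
    """Split one flow log line into named fields, or None if it doesn't fit."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def rejected_https(lines: List[str], eni: str) -> List[Dict[str, str]]:
    """Records for `eni` whose destination port is 443 and action is REJECT."""
    hits = []
    for line in lines:
        rec = parse_record(line)
        if (
            rec
            and rec["interface_id"] == eni
            and rec["dstport"] == "443"
            and rec["action"] == "REJECT"
        ):
            hits.append(rec)
    return hits


# Hypothetical sample records (all IDs and addresses are invented).
SAMPLE = [
    # Outbound HTTPS from the instance to a public IP, rejected.
    "2 123456789012 eni-0aaaaaaaaaaaaaaaa 10.10.0.25 52.94.0.10 49152 443 6 3 180 1670000000 1670000060 REJECT OK",
    # 443 is the SOURCE port here, so this is not an outbound HTTPS request.
    "2 123456789012 eni-0aaaaaaaaaaaaaaaa 203.0.113.7 10.10.0.25 443 51000 6 1 40 1670000000 1670000060 REJECT OK",
    # 443 is the PACKET COUNT here; a plain text search would still match it.
    "2 123456789012 eni-0aaaaaaaaaaaaaaaa 10.10.0.25 198.51.100.9 40000 22 6 443 5000 1670000000 1670000060 ACCEPT OK",
]

for rec in rejected_https(SAMPLE, "eni-0aaaaaaaaaaaaaaaa"):
    print(rec["srcaddr"], "->", rec["dstaddr"], rec["action"])
```

Parsing by field avoids the false matches you get when 443 shows up as a source port or a packet count rather than the destination port.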
Navigate to your VPC Endpoints.
Click on the CloudFormation endpoint.
Here we can see there are some DNS names for our endpoint. It also shows that private DNS is not enabled.
Use dig to see what IP address is returned for one of those DNS names:
There you can see the DNS name resolves to a private IP address.
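You can script the same check. This sketch resolves a name with the host’s resolver and reports whether every returned address is private. The hostname shown in the usage note is an assumption (the regional CloudFormation endpoint for us-east-1); substitute your own region or one of the endpoint-specific DNS names from the console.

```python
import ipaddress
import socket
from typing import Iterable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses using the host's resolver."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: Iterable[str]) -> bool:
    """True only if at least one address was returned and none are public."""
    addresses = list(addresses)
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )
```

On the EC2 instance you could call `all_private(resolve_ipv4("cloudformation.us-east-1.amazonaws.com"))`. A result of `False` means the name still resolves to at least one public address.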
Unfortunately you can’t see any DNS queries in VPC Flow Logs.
The reason you can’t see DNS traffic when using AWS DNS servers in VPC Flow Logs is because that traffic does not traverse the network interface or appliance that captures VPC Flow Logs in the AWS Network.
If you are concerned about capturing threats in this traffic you have a couple of options. One is to set up your own DNS servers on AWS. That is very complicated. Trust me. I helped Capital One do it and you will have a myriad of complications, but DNS is one of the best places to spot attacks so some companies will opt for that option.
You can also use AWS GuardDuty which monitors the DNS traffic, even though you can't inspect it yourself, and alerts you to suspicious activity it uncovers.
DNS is a huge topic so I'll leave it at that for the moment as we really just want to get our VPC Endpoint working in this post.
VPC Flow Logs also do not capture application-layer (OSI layer 7) data, so we wouldn’t be able to see domain names even if that traffic were captured.
Viewing DNS and other network traffic on a host
However, we can use tcpdump on the local host to look at DNS queries. tcpdump is a command-line network packet sniffer, similar to Wireshark, that runs on Linux. I wrote about packet sniffing here:
Run this command to capture only traffic to port 53 for DNS and not all your SSH traffic (which would be too noisy):
sudo tcpdump -vvvv port 53
Looks like when we run our CloudFormation command the host is making queries to EC2, not CloudFormation.
Now we can’t really figure out exactly what IP range belongs to CloudFormation because unfortunately it is not in this list:
We also can’t use a prefix list because currently none exists for CloudFormation so we cannot lock down our rules to this list.
Although not ideal, to complete our test, I’m going to open my IP ranges to 52.0.0.0/8. That’s any IPv4 address that starts with 52. Unfortunately, this range does not include only AWS IPs. Looking at the last IP address in the range we can see it belongs to Microsoft:
I really hope AWS expands its use of prefix lists to all services but for this short test I’ll use that 52.x range so we can see if we can get this working.
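To get a feel for how broad a /8 rule is, here is a quick sketch using Python’s standard ipaddress module. It only does the arithmetic for the 52.x range; it says nothing about who owns the addresses.

```python
import ipaddress


def summarize(cidr: str):
    """Return (first address, last address, address count) for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.broadcast_address), net.num_addresses


first, last, count = summarize("52.0.0.0/8")
print(f"{first} - {last}: {count:,} addresses")
# prints: 52.0.0.0 - 52.255.255.255: 16,777,216 addresses
```

That is over 16 million addresses opened for a single test, which is why this rule should come back out as soon as the test is done.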
Temporary changes can cause security problems! Be very careful when making a temporary change to test something - this is often the point when people forget to restore configurations and leave their environments in a vulnerable state. I remember when one company's SQL database was compromised because it was exposed to the Internet. The statement from the developers was "We were only trying to run a temporary test."
This will be a manual change so it will be overwritten if I simply delete and re-run my CloudFormation stacks. Note that it will NOT be overwritten if I make a manual change and my CloudFormation template hasn’t changed, unless I force a change with a timestamp as I showed you how to do in an earlier post. CloudFormation only makes an update if the template used to deploy the resource changes. And as we saw with trust policies that doesn’t even always work.
Open up outbound access to that range (or the applicable range if you are in a different region) in any security group and run tcpdump and your CloudFormation command again.
Now the traffic is not blocked so we can see that the DNS query is used to get the CloudFormation IP address and the HTTPS traffic follows.
In the other window where we ran our CloudFormation command we can see that it is successful.
AWS Public IP addresses for AWS services
Why is the DNS resolving to a public IP address? Here’s what AWS has to say about accessing services through AWS PrivateLink:
When an Internet Gateway is present, the service is available via the public endpoint. The traffic will traverse the Internet Gateway and use a public IP address but it will remain within the AWS network.
When using a VPC endpoint, here’s what the documentation has to say:
Traffic destined for the AWS service is resolved to the private IP addresses of the endpoint network interfaces using DNS, and then sent to the AWS service using the connection between the VPC endpoint and the AWS service.
But our DNS queries are resolving to a public IP address? Why?
Here’s the key point:
If you enable both DNS hostnames and DNS resolution for your VPC, we create a hidden private hosted zone. The hosted zone contains a record set for the default DNS name for the service that resolves it to the private IP addresses of the endpoint network interfaces in your VPC.
So have we enabled DNS hostnames and DNS resolution for our VPC? Let’s check. Look at the details of our remote access VPC.
Looks like we have DNS resolution enabled but not DNS hostnames.
What do these two settings do?
It looks like enable DNS host names gives EC2 publicly resolvable DNS names. I didn’t need or want that when I configured this VPC so I left it disabled. We did want our EC2 instances to be able to resolve DNS names so I left that setting on.
The documentation here that is referenced by the VPC Endpoint documentation doesn’t say anything about resolving VPC Endpoints to private IP addresses. I find it odd that we would need to enable public DNS host names for EC2 instances to get private IP addresses for our VPC Endpoint. Those public DNS names basically tell anyone querying the DNS which IP addresses have EC2 instances associated with them. I’d rather leave it off.
Before enabling that let’s take a look at one other thing. When we created our VPC endpoint we did not set the PrivateDnsEnabled property to true.
It makes much more sense to me that we would need to set this to true to resolve our problem than to enable DNS host names in our VPC for EC2 instances. Let’s try setting this to true first and see how that affects our DNS queries.
No luck. Apparently, in order to enable private DNS on your VPC endpoint, you will need to enable DNS host names for your EC2 instances as well.
Right now the EC2 instance we deployed using a CloudFormation template in a prior post does not have a public DNS name:
Let’s change the endpoint template back to not enabling private DNS and enable the VPC DNS host names and see what happens.
Refer to the CloudFormation VPC documentation to enable DNS hostnames:
We can simply turn that on for all our VPCs because I expect to use it for our private VPC as well. We’ll be using VPC endpoints in an upcoming post with our Lambda function. If we thought we might optionally want to enable it we could make it a parameter but for now I just updated VPC.yaml to this:
Now I have VPC host names enabled:
My EC2 instance has a DNS name:
Private DNS is disabled on my VPC Endpoint:
If I check to see if CloudFormation resolves to a private IP address, no:
Now let’s enable private DNS on the VPC Endpoint.
Finally…we get a DNS query answer that resolves to private IP addresses:
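The combinations tried so far boil down to a simple rule: the default service name only resolves to the endpoint’s private IPs when DNS resolution, DNS hostnames, and private DNS on the endpoint are all enabled. Here is a minimal sketch of that rule. The helper that reads the live settings assumes boto3 is installed and credentials are configured, and the IDs you pass to it are your own. The rule covers DNS only and makes no claim about whether the traffic is actually allowed through.

```python
def resolves_privately(
    enable_dns_support: bool,
    enable_dns_hostnames: bool,
    private_dns_enabled: bool,
) -> bool:
    """True when the default service DNS name should resolve to private IPs."""
    return bool(enable_dns_support and enable_dns_hostnames and private_dns_enabled)


def fetch_settings(vpc_id: str, endpoint_id: str):
    """Read the three settings from AWS (requires boto3 and credentials)."""
    import boto3  # third-party; imported here so the rule above has no dependencies

    ec2 = boto3.client("ec2")
    support = ec2.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsSupport"
    )["EnableDnsSupport"]["Value"]
    hostnames = ec2.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsHostnames"
    )["EnableDnsHostnames"]["Value"]
    endpoint = ec2.describe_vpc_endpoints(VpcEndpointIds=[endpoint_id])[
        "VpcEndpoints"
    ][0]
    return support, hostnames, endpoint["PrivateDnsEnabled"]


# The configurations walked through in this post:
print(resolves_privately(True, False, False))  # starting point: public IPs
print(resolves_privately(True, True, False))   # hostnames on, private DNS off
print(resolves_privately(True, True, True))    # everything enabled
```

With real IDs you could run `resolves_privately(*fetch_settings(vpc_id, endpoint_id))` as a quick pre-flight check before testing the CLI.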
Now test our AWS CLI command:
aws cloudformation describe-stacks
What happened? It’s not working. Let’s go back to VPC flow logs to see if we can find the related REJECT logs. Search for outbound requests to port 443 that are rejected.
We can see here that yes, our instance is making outbound requests to our VPC endpoint IP addresses. But why is it blocked now?
Let’s double check our networking. First of all, what is the IP range of the subnet containing our EC2 instance?
What are the IP addresses in that range? Check the ARIN CIDR calculator if you don’t know it off the top of your head:
Hosts in that subnet range can route traffic to each other because they are in the same subnet.
Check the route table for the subnet:
What’s the IP range for 10.10.0.0/24 (the local route)? Anything in the 10.10.0.x range can route traffic to anything else in that same range.
The IPs that came back for our VPC Endpoint fall within that range, so the route table allows traffic to reach them.
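If you’d rather not do CIDR math in your head, the same check takes a few lines of Python. The 10.10.0.0/24 local route comes from the route table above; the individual IP addresses below are hypothetical stand-ins for whatever your DNS query returned.

```python
import ipaddress

# The local route from the route table discussed above.
LOCAL_ROUTE = ipaddress.ip_network("10.10.0.0/24")


def covered_by_local_route(ip: str, route=LOCAL_ROUTE) -> bool:
    """True if `ip` falls inside the CIDR block of the local route."""
    return ipaddress.ip_address(ip) in route


# Hypothetical addresses: an endpoint IP, a neighboring /24, a public IP.
for candidate in ("10.10.0.57", "10.10.1.57", "52.94.0.10"):
    print(candidate, covered_by_local_route(candidate))
```

Only the first address is covered by the local route. The other two would need some other route, such as one to an Internet Gateway.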
We created two security groups that should allow traffic between them based on security group IDs in the post where we created the VPC endpoint:
Double check that the correct security group is applied to the VPC Endpoint and the EC2 instance. Aha. We never added the VPCEndpoint security group to our EC2 instance:
Notice also that even though outbound access to the Internet is present, our instance still cannot reach CloudFormation because the DNS name resolves to a local IP address that is not accessible. And that is why people say when it comes to cloud problems: It’s always DNS.
In this case the problem is actually our security group rules.
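Here is a deliberately simplified model of why the missing security group blocks the request. It only considers port 443 rules that reference other security groups by ID, and it ignores CIDR-based rules and NACLs. The group IDs are hypothetical placeholders, not the ones from my account.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SecurityGroup:
    group_id: str
    # IDs of groups this group allows outbound 443 traffic TO.
    egress_443_to: Set[str] = field(default_factory=set)
    # IDs of groups this group allows inbound 443 traffic FROM.
    ingress_443_from: Set[str] = field(default_factory=set)


def can_reach_endpoint(
    instance_groups: List[SecurityGroup], endpoint_groups: List[SecurityGroup]
) -> bool:
    """Both directions must be allowed by some group attached to each side."""
    instance_ids = {g.group_id for g in instance_groups}
    endpoint_ids = {g.group_id for g in endpoint_groups}
    egress_ok = any(g.egress_443_to & endpoint_ids for g in instance_groups)
    ingress_ok = any(g.ingress_443_from & instance_ids for g in endpoint_groups)
    return egress_ok and ingress_ok


# Hypothetical groups mirroring the setup described in this post.
ssh_group = SecurityGroup("sg-remote-access")
access_group = SecurityGroup("sg-vpce-access", egress_443_to={"sg-vpce"})
endpoint_group = SecurityGroup("sg-vpce", ingress_443_from={"sg-vpce-access"})

print(can_reach_endpoint([ssh_group], [endpoint_group]))                # False
print(can_reach_endpoint([ssh_group, access_group], [endpoint_group]))  # True
```

Until the access group is attached to the instance, neither direction is allowed, no matter what DNS returns.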
Network Rules Don’t Always Update Correctly — TEST THEM
What I also notice here is that the outbound rule I manually added to the security group for testing is still present even though I thought I deleted and redeployed the associated CloudFormation stack. I deleted the stack for the security group rules and my GitHub CloudFormation rules were deleted, but NOT the rule I manually added.
I attempted to delete the security group again but the stack hangs in the “Delete in progress” state.
I am not sure if this is due to the fact that I manually added a network rule or because I have associated this group with an EC2 instance. In any case I went back and manually deleted the rule I manually created. If you’re worried about this sort of thing in your own account check out CloudFormation drift detection — a topic for another day.
Now at this point the security group was stuck in the “Delete in Progress State.” When I tried to delete the security group I got the following warning which tells me what the problem is:
After disassociating the security group from the EC2 instance I could delete the security group.
Returning to CloudFormation, we can see that the stack is still stuck. What’s the problem? Eventually it timed out and the stack got removed.
However, check out the AWS documentation on AWS CloudFormation and VPC endpoints:
We might need to make some additional changes to address those issues if we see this problem again. I’ll leave that for another post.
Add the VPC Endpoint Security Group to the EC2 Instance
For now we can re-deploy and add the VPC Endpoint Access security group to the EC2 instance. Update the function in vm_functions.sh that deploys the developer VM:
Once again, I’m hitting this issue with the EIP association. I might need to revisit that at some point. I can think of a lot of ways to handle that problem but I’ll save it for another post. Delete the EIP association stack (not the EIP stack so we don’t have to revise our local firewall again.)
Of course that breaks our SSH connection to the developer VM.
Redeploy the VM with the new security group.
Redeploy the EIP association.
Verify the VPCE security group is added to the VM.
Delete your known hosts file since the EC2 instance has changed. Explained here:
You’ll need to run aws configure again to set the region.
Then you can run your AWS CloudFormation describe-stacks command again.
And now it works.
Check that the CloudFormation domain resolves to a private IP address:
Get the new network interface ID for our EC2 instance since it was redeployed and check the VPC Flow Logs.
Now here’s the odd thing. When I initially checked the logs I only saw public IP addresses with ACCEPT traffic logs for port 443.
In order to really see what was going on, I took the following steps:
1. I deleted all the traffic in my VPC Flow Logs.
2. I once again created two terminal windows. I ran tcpdump in one window, this time capturing port 443:
sudo tcpdump -vvvv port 443 -n
3. Then I ran my CloudFormation command in the other:
aws cloudformation describe-stacks
I can see on the local host with tcpdump that the connections are, in fact, made via private IP addresses.
Then, returning to the logs, I can see that the private IP addresses exist here too:
The other EC2 traffic may be related to some kind of initialization of the EC2 instance since the traffic was going to an EC2 domain, not CloudFormation. In addition, it takes a few minutes for traffic to show up in VPC Flow Logs so you might have to wait a bit to see it.
Network Interfaces for VPC Endpoints
I mentioned above that initially I couldn’t see any network interfaces in the EC2 dashboard for the VPC endpoints I created. I don’t know if it was some of the configuration I added during this post or whether it took a long time for them to show up, but eventually they appeared in the list.
I sort of reverse-engineered that we should be able to determine the ENI of the VPC Endpoint because I only have one EC2 instance and the VPC Endpoint associated with the VPC I’m testing in this post.
When I looked at VPC Flow Logs, after repeatedly deleting all the logs and re-running my commands, I could see there were always three ENIs. I reasoned that one was for the EC2 instance and the other two were for the VPC Endpoint — one for each subnet.
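Counting the distinct ENIs in a batch of flow log records is easy to script. This sketch assumes the default (version 2) record format, where the interface ID is the third field; the sample records and their IDs are invented.

```python
from collections import Counter
from typing import Iterable


def records_per_interface(lines: Iterable[str]) -> Counter:
    """Count flow log records per ENI (third field of the default format)."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        # The default format has 14 space-separated fields.
        if len(parts) == 14 and parts[2].startswith("eni-"):
            counts[parts[2]] += 1
    return counts


# Hypothetical records: one instance ENI and two endpoint ENIs.
SAMPLE = [
    "2 123456789012 eni-0aaaaaaaaaaaaaaaa 10.10.0.25 10.10.0.57 49152 443 6 8 900 1670000000 1670000060 ACCEPT OK",
    "2 123456789012 eni-0bbbbbbbbbbbbbbbb 10.10.0.25 10.10.0.57 49152 443 6 8 900 1670000000 1670000060 ACCEPT OK",
    "2 123456789012 eni-0cccccccccccccccc 10.10.0.25 10.10.0.140 49160 443 6 6 700 1670000000 1670000060 ACCEPT OK",
    "2 123456789012 eni-0aaaaaaaaaaaaaaaa 10.10.0.25 10.10.0.140 49160 443 6 6 700 1670000000 1670000060 ACCEPT OK",
]

for eni, count in sorted(records_per_interface(SAMPLE).items()):
    print(eni, count)
```

An ENI that only ever appears as the destination side of port 443 traffic from the instance is a good candidate for an endpoint interface.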
Now what AWS should do is put those ENIs in the dashboard where the VPC Endpoint details exist (#awswishlist).
But you can see them by navigating to the EC2 dashboard and clicking network interfaces on the left menu.
When you look at the logs for one of the VPC Endpoints you can see the requests made to it:
What exists in CloudTrail logs?
Someone was arguing with me that they could look in CloudTrail logs to find all the actions taken in their cloud account. I was curious to see what actions exist related to a VPC endpoint.
We can see the deployment and modification of the VPC endpoint:
We can see the DescribeStacks action:
But what about the DNS requests and dig commands? Those are not going to show up in your CloudTrail logs.
What if I ran a curl command to download some rogue software from a known bad IP address? In our case, the IP rules would block a lot of the Internet but we currently have all of GitHub open.
What if the attacker runs the clone command to download something malicious instead of the repo I’m writing for this post? I find a lot of penetration testing tools and ideas on GitHub (and do NOT download them unless you really know what they are doing; some of them are meant to infect the host to which they are downloaded, or send vulnerabilities they find to someone other than you!)
What’s to stop an attacker that gets onto the host via a software vulnerability from downloading code from GitHub?
Nothing at this point.
Would you see that in your CloudTrail logs? No.
Would you see that in your host logs? Only if the attacker already on the host has not disabled or altered those logs.
Would you see that in your network logs? Yes.
As you can see, security requires layers of protection and generally, only encryption or only IAM or only networking is not enough. The different controls work together to protect your assets and you need to architect them carefully.
What have we learned via this test?
We can glean a lot of information from this blog post. To summarize:
- Don’t assume your traffic is traversing private networks and IP addresses when you configure your network. Test it.
- Test your network logs to make sure you can see the traffic you need to see.
- Traffic to AWS services *should* stay on the AWS backbone when connecting to a public IP address. But we can’t really prove it based on the logs accessible to us. It will traverse an Internet Gateway, so if you are trying to avoid adding one of those, use a VPC Endpoint.
- VPC Endpoints require extra configuration to get the traffic to flow on a private network. Otherwise it will take the public route, if one exists. You need to enable DNS settings on your VPC and the endpoint itself.
- Deploy security groups to the VPC endpoint and the EC2 instance that needs to reach it. I explained how to deploy those security groups correctly with security group IDs, not CIDRs or IP addresses, in the last post.
- When you manually create Security Group rules, you have to understand that they may not be deleted via CloudFormation. Avoid manual changes and use CloudFormation drift detection to identify when someone has done that.
- You can’t see AWS DNS requests in VPC Flow Logs. You can see them with a host-based traffic sniffer or by using your own DNS servers (which is complicated — anything we’ve done in this blog series x 100). You can use AWS GuardDuty to detect threats you cannot capture yourself.
- We can see the network interfaces associated with VPC Endpoints in the EC2 dashboard. We can use those ENIs to view traffic to and from the VPC Endpoints in VPC Flow Logs.
- CloudTrail and host-based logs are not enough to detect all activity on your compute resources if your system becomes infected with malware or infiltrated by an attacker.
In the next post we will create a policy for our VPC Endpoint.
If you liked this story please clap and follow:
Medium: Teri Radichel or Email List: Teri Radichel
Twitter: @teriradichel or @2ndSightLab
Requests services via LinkedIn: Teri Radichel or IANS Research
© 2nd Sight Lab 2022