ACM.63 Enforce the existence of VPC Flow Logs on All VPCs
This is a continuation of my series of posts on Automating Cybersecurity Metrics.
Governance through Automation
We’ve already started looking at how we can enforce best practices through the use of automation in this series. However, there may be ways around the governance as it could be too easy for someone to change the code. I’ll address that later.
Here’s another thing we can automate. An AWS security best practice is to turn on VPC Flow Logs for every VPC you create. We can ensure that happens by incorporating that into our VPC deployment template.
As I just mentioned, the only caveat is that you have to ensure all VPCs are created with your authorized template. For now assume your employees are all on good behavior and they only use the templates and code you have defined for deployments.
There are other governance tools, which I may address in the future, but the starting point is to get the code right from the start. Finding a problem after deployment is far more time-consuming and expensive to fix because so many things have been built on top of it. So I’m starting where you should be starting: creating resources through code that adheres to your security policies.
Why you need network logs
I can give you a scenario as to why this matters. One time someone asked me to look at his AWS account because one of his hosts got ransomware on it. When I logged in, I could see that he set up his networking rules incorrectly. Although he had opened a certain port, he also left a default rule in the network rules that allowed all traffic on any port. Not only that, logging was not enabled on any of this networking.
The implications are, first of all, that there are no logs showing which connections were made, and from where, to perform this attack from a network perspective. In addition, because the allow-all rule meant zero traffic was rejected, the logs would have contained a ton of noise even if they had been enabled. Additionally, it was a flat network to a domain controller type server with no bastion host, VPN, etc. Luckily it was only used to evaluate a demo product in a stand-alone account, so no real harm was done.
By the way, when I got on the host, it was super easy to bypass the ransomware and figure out they were using XMRig. I later saw the same concepts I had used taught in an advanced penetration testing class, and although some of the concepts in that class were very advanced, that particular topic was not. I figured out that the attacker had turned off the host-based firewall. Attackers can turn off host-based controls when they get onto a host, but they can’t turn off your network logs with host-only access (unless they get onto a host with access to change your network rules).
I was able to glean some information from the host but lack of network logs made it difficult to determine the source of the attack or which ports and protocols the attacker used, not to mention better rules would have prevented the attack altogether.
If you monitor your network logs for anomalous activity, you may be able to spot an intrusion attempt before the attacker succeeds. You may need additional details beyond what exists in VPC Flow Logs for low-level network attacks, but they can help in most cases. You can even auto-block nefarious IP addresses on a permanent basis when you see a malformed request that is clearly searching for a hole in your defenses.
Not only that, VPC flow logs are invaluable for troubleshooting network errors. When you can’t connect to something, you’ll be able to look for rejects in VPC Flow Logs to pinpoint the problem. (Most of the time…refer to my post on Lambda networking.)
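To make "look for rejects" concrete, here is a minimal Python sketch that splits flow log lines into named fields and keeps only the rejected traffic. It assumes the default (version 2) record format; a custom log format would need a different field list. The sample lines are made up for illustration, and the function names are my own, not part of any AWS tool.

```python
from typing import Dict, List

# Field order of the default (version 2) VPC Flow Logs record format.
DEFAULT_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_record(line: str) -> Dict[str, str]:
    """Split one space-delimited flow log line into named fields."""
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}"
        )
    return dict(zip(DEFAULT_FIELDS, values))


def rejected(lines: List[str]) -> List[Dict[str, str]]:
    """Return only the records whose action is REJECT."""
    return [r for r in map(parse_record, lines) if r["action"] == "REJECT"]


# Made-up sample records (placeholder account ID, ENI, and addresses).
SAMPLE_LINES = [
    "2 123456789012 eni-0abc1234 203.0.113.7 10.0.1.5 44321 3389 6 3 180 "
    "1670000000 1670000060 REJECT OK",
    "2 123456789012 eni-0abc1234 10.0.1.9 10.0.1.5 51515 443 6 12 4200 "
    "1670000000 1670000060 ACCEPT OK",
]

if __name__ == "__main__":
    for record in rejected(SAMPLE_LINES):
        print(record["srcaddr"], "->", record["dstaddr"],
              "port", record["dstport"])
```

In practice you would feed this lines exported from your log group rather than a hard-coded list.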
Flow Logs
Network admins will be familiar with the concept of netflow logs. VPC Flow Logs are similar.
In my classes, I show people how to use VPC Flow Logs and why they are important. For now, we just want to make sure they are created for every VPC.
Flow Log Prerequisites
There are a few things we’ll need to create before we can deploy Flow Logs which we can see in the documentation:

DeliverLogsPermissionArn: The name of this parameter should really be more consistent like FlowLogsRole.
LogDestination: We want to send our Flow Logs to CloudWatch so we’ll need to create a CloudWatch Log Group.
LogDestinationType: We’re using the default so we don’t have to set this. Some people find S3 buckets cheaper for storing logs but then you need to be able to parse out and search through the data quickly in the event of an incident or for troubleshooting. Make sure you can do that. You’ll probably want to encrypt your logs and make sure to configure the S3 bucket correctly — something we haven’t covered yet.
LogGroupName: Neither Log Destination nor Log Group Name is required and it doesn’t say whether you have to set one or the other or both. But, when you try to set both you get this error so we only need one or the other. The documentation could be clearer.
Resource handler returned message: "Please only provide LogGroupName or only provide LogDestination.
ResourceId: We can reference our VPC in the same template.
ResourceType: VPC
TrafficType: Valid values are ACCEPT | REJECT | ALL. We want ALL. You can tell if someone is trying to break in by looking at rejects, and you can see who made a successful connection, and whether anything looks anomalous, by looking at what was accepted.
We can skip the others for now.
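The properties above can be sketched as a CloudFormation resource. This series uses CloudFormation templates; the Python below just builds the equivalent JSON structure so the rules are easy to check. It is an illustration, not the template from this post. The logical ID `VPC` and the placeholder account ID are assumptions. The sketch also insists on exactly one of LogGroupName or LogDestination, which is stricter than the schema but matches the error quoted above.

```python
import json
from typing import Any, Dict, Optional

VALID_TRAFFIC_TYPES = ("ACCEPT", "REJECT", "ALL")


def flow_log_resource(
    vpc_logical_id: str,
    role_arn: Any,
    log_group_name: Optional[Any] = None,
    log_destination: Optional[Any] = None,
    traffic_type: str = "ALL",
) -> Dict[str, Any]:
    """Build an AWS::EC2::FlowLog resource for a VPC in the same template."""
    if traffic_type not in VALID_TRAFFIC_TYPES:
        raise ValueError(f"traffic_type must be one of {VALID_TRAFFIC_TYPES}")
    # CloudFormation rejects the resource when both are set; this sketch
    # also rejects the case where neither is set.
    if (log_group_name is None) == (log_destination is None):
        raise ValueError(
            "provide exactly one of log_group_name or log_destination"
        )
    properties: Dict[str, Any] = {
        "DeliverLogsPermissionArn": role_arn,
        "ResourceId": {"Ref": vpc_logical_id},
        "ResourceType": "VPC",
        "TrafficType": traffic_type,
        # LogDestinationType is omitted, so the default (CloudWatch Logs)
        # applies.
    }
    if log_group_name is not None:
        properties["LogGroupName"] = log_group_name
    else:
        properties["LogDestination"] = log_destination
    return {"Type": "AWS::EC2::FlowLog", "Properties": properties}


if __name__ == "__main__":
    resource = flow_log_resource(
        vpc_logical_id="VPC",  # hypothetical logical ID of the VPC
        role_arn="arn:aws:iam::111122223333:role/VPCFlowLogsRole",  # placeholder account
        log_group_name="RemoteAccessPublicVPCLogGroup",
    )
    print(json.dumps({"Resources": {"VPCFlowLog": resource}}, indent=2))
```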
VPC Flow Logs Role
Sometimes to enable services in AWS we need to create a role and give the service permission to take actions in our account. Before we add FlowLogs to the VPC template we need to create a role.

Can we use one of our existing role templates? Not really because the trust policy is different. But it does look pretty similar to our Lambda role other than the service name.
Instead of rewriting a new role for every service let’s alter our Lambda role to work for all AWS services, starting with the two we are currently using: Lambda and VPC Flow Logs.

We’re using a map in the above template the way I described in this post to set the service in the trust policy based on the service name passed into the template.
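As a rough illustration of the map idea, the Python sketch below builds a role template whose trust policy looks up the service principal from a CloudFormation Mappings section with Fn::FindInMap. The parameter names, mapping keys, and logical IDs are hypothetical and may not match the template used in this series. The two service principals are the standard ones for Lambda and VPC Flow Logs.

```python
import json
from typing import Any, Dict

# Short service names (hypothetical keys) mapped to AWS service principals.
SERVICE_PRINCIPALS = {
    "lambda": "lambda.amazonaws.com",
    "vpcflowlogs": "vpc-flow-logs.amazonaws.com",
}


def principal_for(service_name: str) -> str:
    """Local equivalent of the Fn::FindInMap lookup in the template."""
    return SERVICE_PRINCIPALS[service_name]


def service_role_template() -> Dict[str, Any]:
    """Build a role template whose trust policy depends on a parameter."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "RoleNameParam": {"Type": "String"},
            "ServiceNameParam": {
                "Type": "String",
                "AllowedValues": sorted(SERVICE_PRINCIPALS),
            },
        },
        "Mappings": {
            "ServiceMap": {
                name: {"Principal": principal}
                for name, principal in SERVICE_PRINCIPALS.items()
            }
        },
        "Resources": {
            "ServiceRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "RoleName": {"Ref": "RoleNameParam"},
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": {"Fn::FindInMap": [
                                "ServiceMap",
                                {"Ref": "ServiceNameParam"},
                                "Principal",
                            ]}},
                            "Action": "sts:AssumeRole",
                        }],
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(service_role_template(), indent=2))
```

Supporting another service then means adding one entry to the map rather than writing a new role template.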
Add a call to the new function to the deployment script for the VPC flow logs role. The calls to deploy the lambda function roles should still work.

Run the deployment script:
./deploy.sh
In order to re-deploy the Lambda function roles we’ll have to delete the Lambda functions and policies and then re-deploy them. While we’re at it we will completely delete the Lambda roles and start over. This is one of the caveats of changing roles when you are far along in development, and it’s why it’s a good idea to think through your organizational deployment structure in advance and test it!
After running the deployment script:
- Check the CloudFormation stacks for errors
- Check that your roles exist in IAM with the proper names
Deploy the Flow Logs Policy
Now we need to create and deploy the Flow Logs policy shown above. Note that we are specifying the role we just created for the Roles property.
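For reference, here is a hedged Python sketch of what such a policy resource might look like. The actions are the ones AWS documents for a role that publishes flow logs to CloudWatch Logs; the policy in this series may differ. The policy name and the `"*"` resource are assumptions, and you may want to narrow the resource to your log group.

```python
import json
from typing import Any, Dict

# Actions AWS documents for publishing flow logs to CloudWatch Logs.
FLOW_LOG_ACTIONS = [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
    "logs:DescribeLogGroups",
    "logs:DescribeLogStreams",
]


def flow_logs_policy_resource(
    role_name: str,
    policy_name: str = "VPCFlowLogsPolicy",  # hypothetical policy name
) -> Dict[str, Any]:
    """Build an AWS::IAM::Policy attached to the given role via Roles."""
    return {
        "Type": "AWS::IAM::Policy",
        "Properties": {
            "PolicyName": policy_name,
            "Roles": [role_name],
            "PolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [{
                    "Effect": "Allow",
                    "Action": FLOW_LOG_ACTIONS,
                    # Assumption: broad resource for the sketch; narrow it
                    # to specific log groups where you can.
                    "Resource": "*",
                }],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(flow_logs_policy_resource("VPCFlowLogsRole"), indent=2))
```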

We can use our existing function to deploy a policy for a role. Add the following lines to the deploy script:

Deploy the policy and verify it exists on the role we just created.
Create a CloudWatch Log Group
Next we need to create our CloudWatch log group. CloudWatch is like a log aggregation source in AWS where all logs can be sent. That includes application logs and just about any kind of logs that you can think of in AWS. You create a log group and then you can send your logs to it.
We’re going to use CloudFormation again to create our log group:

Add the LogGroup resource to the VPC template we’ve been working on.
I’m not going to add a KMS key just yet. We’ll need a name and we’ll set the retention to 30 days. Most organizations would want to store logs longer — perhaps 90 days or ideally a year.
Sometimes attackers exist in environments long before they are identified so more logs are helpful. We’re just creating a POC here so don’t want too much expense. One of the issues I’m having with AWS ControlTower right now as a small business is the cost of all the logs. They can add up. You can also archive your logs to save money but I’m not sure I’ll get to that in this series. I need to revisit that myself.
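A minimal sketch of the log group resource, again built as a Python structure equivalent to the CloudFormation JSON. The log group name is taken from the error message later in this post. The function name is hypothetical, and no KMS key is set, in line with the choice above.

```python
import json
from typing import Any, Dict


def log_group_resource(name: str, retention_days: int = 30) -> Dict[str, Any]:
    """Build an AWS::Logs::LogGroup resource with a retention period."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "Type": "AWS::Logs::LogGroup",
        "Properties": {
            "LogGroupName": name,
            # CloudWatch Logs only accepts specific retention values;
            # 30, 90, and 365 are among them.
            "RetentionInDays": retention_days,
            # KmsKeyId is intentionally omitted for this proof of concept.
        },
    }


if __name__ == "__main__":
    resource = log_group_resource("RemoteAccessPublicVPCLogGroup")
    print(json.dumps({"Resources": {"VPCLogGroup": resource}}, indent=2))
```

Changing the retention to 90 or 365 days later is a one-line change in the template.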
Deploy the VPC again to make sure the log group creation code is OK.

At this point we get an error saying our NetworkAdmin role doesn’t have permission to create a log group, so we need to fix this:
Resource handler returned message: "User: arn:aws:sts::xxxxx:assumed-role/NetworkAdminsGroup/botocore-session-xxxx is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:xxxxx:xxxxx:log-group:RemoteAccessPublicVPCLogGroup:log-stream: because no identity-based policy allows the logs:CreateLogGroup action (Service: CloudWatchLogs, Status Code: 400, Request ID: xxxxx)" (RequestToken: xxxxx, HandlerErrorCode: GeneralServiceException)
We also need: logs:PutRetentionPolicy and logs:DescribeLogGroups
Head over to the NetworkAdmin role policy and add those permissions like we’ve done in prior posts. Deploy the role policy again and then try to deploy the VPC again.
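The change amounts to appending one statement to the existing policy document. The Python sketch below shows the shape of that change. The existing policy here is a stand-in, not the real NetworkAdmin policy from this series, and the Sid and the `"*"` resource are assumptions you would replace with your own values.

```python
import copy
import json
from typing import Any, Dict, List

# The three CloudWatch Logs actions the VPC deployment needed.
LOG_GROUP_ACTIONS = [
    "logs:CreateLogGroup",
    "logs:PutRetentionPolicy",
    "logs:DescribeLogGroups",
]

# Stand-in for the existing policy document (hypothetical content).
EXISTING_POLICY: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ExistingNetworkPermissions",
        "Effect": "Allow",
        "Action": ["ec2:CreateVpc"],
        "Resource": "*",
    }],
}


def with_statement(
    policy: Dict[str, Any], sid: str, actions: List[str], resource: str
) -> Dict[str, Any]:
    """Return a copy of the policy with one extra Allow statement."""
    updated = copy.deepcopy(policy)
    if any(s.get("Sid") == sid for s in updated["Statement"]):
        raise ValueError(f"statement {sid} already exists")
    updated["Statement"].append({
        "Sid": sid,
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": resource,
    })
    return updated


if __name__ == "__main__":
    # Assumption: "*" keeps the sketch short; scope it down if you can.
    new_policy = with_statement(
        EXISTING_POLICY, "FlowLogGroupSetup", LOG_GROUP_ACTIONS, "*"
    )
    print(json.dumps(new_policy, indent=2))
```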
Add VPC Flow Log Resource to The VPC Template
Now that we have a role and a log group we can add VPC Flow logs to the VPC template.

While deploying Flow logs I got this lovely error message:
If you get an encoded error message on AWS decode it like this:
aws sts decode-authorization-message --encoded-message <encoded-message>
The message I got back didn’t make a whole lot of sense but I can glean from it that I probably need to add the iam:PassRole permission for the particular role below to my NetworkAdmin permissions. I really hope AWS fixes this error message…it’s just time consuming to have to deal with this.
"DecodedMessage": "{\"allowed\":false,\"explicitDeny\":false,\"matchedStatements\":{\"items\":[]},\"failures\":{\"items\":[]},\"context\":{\"principal\":{\"id\":\"AROAZ7U3253AOWN23LBU6:botocore-session-xxx\",\"arn\":\"arn:aws:sts::xxxx:assumed-role/NetworkAdminsGroup/botocore-session-xxx\"},\"action\":\"iam:PassRole\",\"resource\":\"arn:aws:iam::xxx:role/VPCFlowLogsRole\",\"conditions\":{\"items\":[{\"key\":\"aws:Region\",\"values\":{\"items\":[{\"value\":\"global\"}]}},{\"key\":\"aws:Service\",\"values\":{\"items\":[{\"value\":\"iam\"}]}},{\"key\":\"aws:Resource\",\"values\":{\"items\":[{\"value\":\"role/VPCFlowLogsRole\"}]}},{\"key\":\"iam:RoleName\",\"values\":{\"items\":[{\"value\":\"VPCFlowLogsRole\"}]}},{\"key\":\"aws:Account\",\"values\":{\"items\":[{\"value\":\"xxx\"}]}},{\"key\":\"aws:Type\",\"values\":{\"items\":[{\"value\":\"role\"}]}},{\"key\":\"aws:ARN\",\"values\":{\"items\":[{\"value\":\"arn:aws:iam::xxx:role/VPCFlowLogsRole\"}]}}]}}}"}
After adding that last permission and only for that specific role (as mentioned before the iam:PassRole permission can be problematic if not specific) Flow Logs deployed successfully.
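Since the decoded message is JSON wrapped inside a JSON string, a few lines of Python can pull out the parts that matter and turn them into a narrowly scoped statement. This is only a sketch: the sample output below is a trimmed, made-up version of the message above with a placeholder account ID, and the function names and Sid are hypothetical.

```python
import json
from typing import Any, Dict


def summarize_denial(cli_output: str) -> Dict[str, Any]:
    """Extract who was denied which action on which resource."""
    outer = json.loads(cli_output)
    # DecodedMessage is itself a JSON document stored as a string.
    decoded = json.loads(outer["DecodedMessage"])
    context = decoded["context"]
    return {
        "allowed": decoded["allowed"],
        "explicit_deny": decoded["explicitDeny"],
        "principal": context["principal"]["arn"],
        "action": context["action"],
        "resource": context["resource"],
    }


def statement_for(denial: Dict[str, Any]) -> Dict[str, Any]:
    """Build an Allow statement for exactly the denied action/resource."""
    if denial["allowed"]:
        raise ValueError("request was allowed; nothing to add")
    if denial["explicit_deny"]:
        raise ValueError("an explicit deny cannot be fixed by adding an allow")
    if "*" in denial["resource"]:
        raise ValueError("refusing to grant access to a wildcard resource")
    return {
        "Sid": "AllowDeniedActionOnOneResource",  # hypothetical Sid
        "Effect": "Allow",
        "Action": denial["action"],
        "Resource": denial["resource"],
    }


# Made-up, trimmed sample of the CLI output (placeholder account ID).
SAMPLE_OUTPUT = json.dumps({"DecodedMessage": json.dumps({
    "allowed": False,
    "explicitDeny": False,
    "context": {
        "principal": {"arn": "arn:aws:sts::111122223333:assumed-role/"
                             "NetworkAdminsGroup/botocore-session-example"},
        "action": "iam:PassRole",
        "resource": "arn:aws:iam::111122223333:role/VPCFlowLogsRole",
    },
})})

if __name__ == "__main__":
    denial = summarize_denial(SAMPLE_OUTPUT)
    print(json.dumps(statement_for(denial), indent=2))
```

The resulting statement grants iam:PassRole on the one role ARN named in the message, which matches the advice above to keep that permission specific.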
Now we have successfully installed Flow Logs on our VPCs and they will get created for any new VPCs we create using this template.
Teri Radichel
© 2nd Sight Lab 2022