ACM.37: Using CloudTrail Lake to query the actions needed to create zero-trust policies (Zero Trust Policies ~ Part 2)
This is a continuation of my series on Automating Cybersecurity Metrics.
I wrote in my last post about an unsuccessful attempt to use AWS Athena to query CloudTrail due to my Control Tower setup.
In this post I’ll try out a newer option called CloudTrail Lake. I’m going to walk through this demo and see if it works with my Control Tower setup.
Let’s start with my test account. I’m an admin in that account but I honestly haven’t looked at what SCPs Control Tower has applied on my use of CloudTrail. Let’s find out.
Navigate to CloudTrail and click on Lake.

Click Create event data store

On the next screen I have to choose how long to keep the data store. Let’s check out the price of this service:

It looks like if I were to ingest the data over and over I would pay repeatedly for that ingestion, so I probably want to keep the data around as long as I think I will need it. How many times do I need to analyze it, though? Is every query considered an “analysis”? Let’s look at the pricing example:

Although I don’t want to ingest the same data over and over I’ll pay for continued ingestion of new data.
Obviously the price goes up based on the amount of data. 50 TB of data. Almost $47K in this example. Wow.

How can I tell how much data I have in CloudTrail right now? I presume the amount of storage would match the size of the data in the S3 bucket we couldn’t access in the last post, further detailed in this post:
In my case the data is stored in the Control Tower log archive bucket, so I’ll navigate over there and specifically to the bucket containing the CloudTrail logs for the account where I’m operating. Check the box for that account. Choose Actions > Calculate total size.

I created this account structure a while back for testing out Control Tower. Looks like my test account has stored 114.2MB of CloudTrail logs.

If I can limit my queries to only this account then it seems like my cost will be about $2.50 to ingest and $0.0005 to scan. That’s not too bad.
What if I have to run it on all the CloudTrail logs for my organization? That’s harder to calculate because I’d have to go into each account individually, get the size, and exclude the CloudWatch metrics. Someone could write a query for that, but to get a rough idea I’ll just check the size of the entire organizational folder.
Ouch. That folder is quite a bit bigger. I’ve been running this Control Tower configuration for a few months and the costs are definitely adding up compared to my other little test account as well. I’ll be exploring all of that later.

I would love to just spend all my time writing about all these things for you but I don’t make enough money from my blog or book so I’ll probably have to jump over to some other work at some point. But I would love to explore cost-effective alternatives and write about it. We’ll see.
For now, if I can get this working on one account I’ll try it. Let’s continue on with the steps for our single account. I’ll set the storage to 30 days for this test of the CloudTrail Lake service, and I’m also going to limit the logs to the current region.
I lost my place since I had to switch accounts, so back to my test account and repeat the steps above.

On the next screen there are a few more choices to make.

We can include Management Events:
and Data Events:
The documentation doesn’t give a concrete list of what is considered a “data event”:
However there’s a list of types of events you can add on this screen:

And “Additional charges apply”.

That link just takes you to the pricing page above so I guess the prices are related to the additional logs that will need to be scanned.
Should you include data events in your logs? These logs will cost you more; however, without them you cannot tell what an attacker accessed in the event of a data breach.

A particular company I know of that had one of the most notable AWS data breaches to date was about to turn off its data event logging before the breach. Without it, the organization would not have been able to identify the specific data the attacker accessed. When you have a data breach you can be fined for each record exposed. Your logs help you reduce those fines by being very specific about what attackers accessed. Because it had the data event logs, the company was able to show the specific S3 objects the attacker accessed with a get action and avoid fines for every object with sensitive data in its S3 buckets.

So do you need data events? It depends on whether you store sensitive data and whether you want to see exactly what an attacker accessed in the event of a security incident. If you have a breach involving PII, PCI, HIPAA, or other sensitive data, your detailed logs and a good security analyst can reduce the cost of a breach, provided you stopped the breach before the attacker got to all the data.
At this point, I’m not using the above services, but I expect to use S3 and Lambda in the future. I’ll add those for this test but I don’t know if I will need or use them.

Click Next. Review.

Click Create event data store.

Seemed to work. Click on the link to your data store. Note the value of the ARN. Then return to the previous page.

Click on Sample queries to get numerous queries you can try out:

Click on the link for the second one: Investigate user actions. Note that the value I’ve x’d out in the query below is the value at the end of your data store ARN, and that the query filters on userIdentity.username for the user bob. We need to change that to the role we’re trying to create a zero-trust policy for, if we can.
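Since I’ve x’d out part of the query in the screenshot, here’s a rough sketch of the general shape of that sample query. The data store ID after FROM, the column list, and the time window are placeholders and assumptions on my part; use whatever the console sample gives you.

SELECT eventID, eventName, eventSource, eventTime
-- the value after FROM is a placeholder for the ID at the end of your data store ARN
FROM <your-event-data-store-ID>
WHERE userIdentity.username = 'bob'
AND eventTime > '2022-08-01 00:00:00'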

Let’s re-run our test script for the batch job we were working on when we embarked on this mission to create a zero-trust policy. That way we can get some fresh events in our CloudTrail logs.
First I have to delete this CloudFormation stack: BatchJobAdminCredentials
Now I can re-run the test script for our batch job:
jobs/iam/DeployBatchJobCredentials/test.sh
Once successfully created I can review CloudTrail logs for the activity (it might take a few minutes for it to show up). Go to the full event history as we did in a prior post with the link at the bottom of the screen.
From here you can find the user name that took the actions that just occurred. I would personally rather search for all actions taken by a particular role, but we can get at it from the user session, which is actually more helpful if you are running in an account where multiple people use the same role. Copy the value from the username column below. Your username will have some numbers after it to identify the unique session.

Return to CloudTrail Lake and the sample query we looked at above. Replace the username Bob with the name you just copied above. Make sure you leave the single quotes. Click Run.

Scroll down to see that your query was successful:

Click query results. Huh? I got no results.

Let’s go back and alter our query. Remove the time constraint from the query. Still no luck. OK, remove the entire WHERE clause, but we want to add the user identity information.
Add userIdentity to the data our query returns.
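For reference, the stripped-down query at this point looks roughly like this (a sketch; the data store ID is a placeholder for the value from your ARN):

SELECT eventID, eventName, eventSource, eventTime, userIdentity
-- placeholder: use the ID at the end of your event data store ARN
FROM <your-event-data-store-ID>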

Now we get back a list of events with userIdentity in the first column. The value is a JSON blob with multiple key-value pairs in it containing information about the identity that took the action in CloudTrail.

Scroll down to find our CreateSecret action. You can also use Ctrl-F (Cmd-F on a Mac) and search for “CreateSecret”. I’m going to check the rows of three actions I think were taken by the batch job role.

Scroll back up and click Copy.

Now paste that information into a terminal window. There’s too much to redact in that blob of text so I’m not going to paste it here but it shows all the key-value pairs in the output. You can use those key-value pairs and their values in your query. I see that username is null, which is why our query didn’t work.
What was that value in the CloudTrail UI? I see that principalid has that value, but preceded by another value and a colon. That might work.
Scrolling down I also see the session context which shows mfaauthenticated=false as this is a role, not a user.
I also see the following within this blob:
sessionissuer={type=Role, principalid=xxxxxxxxx, arn=arn:aws:iam::xxxxxxx:role/BatchRoleDeployBatchJobCredentials, accountid=xxxxxx, username=BatchRoleDeployBatchJobCredentials},
What if I wanted to get at that username “BatchRoleDeployBatchJobCredentials”? I’d have to add a few things to identify that particular value. It gets a bit tricky to figure out the path to the value we want in that big blob, so let’s take it a step at a time.
The beginning of the blob looks like this:
[{"userIdentity":"{type=AssumedRole, principalid=
We can add a dot (.) instead of the { to get the next value. So in the above we can select userIdentity and then a dot (.) and then the next value we want — type. So userIdentity.type.
I’m also going to remove event ID because that’s not helping me at the moment. Make sure you keep our data source after FROM as shown above.
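In other words, the query now looks something like this (still a sketch; the data store ID is a placeholder):

SELECT userIdentity.type, eventName, eventSource, eventTime
-- placeholder: use the ID at the end of your event data store ARN
FROM <your-event-data-store-ID>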

Run that query and it works. You get the type which is AssumedRole for these events:

The value we want is deeper down the stack. Let’s organize the JSON so it’s easier to understand the structure by putting each curly brace ({ or }) on its own line and adding a line break after each comma. Indent the values after each opening curly brace.
I removed a few values but we end up with the structure below.
[
  {
    "userIdentity":
    "{
      type=AssumedRole,
      principalid=xxxx:botocore-session-xxxx,
      ...
      sessioncontext=
      {
        attributes=
        {
          creationdate=2022-08-14 17:38:28.000,
          mfaauthenticated=false
        },
        sessionissuer=
        {
          type=Role,
Here’s how to construct the path to the value we want to retrieve:
- We want the value of an attribute of userIdentity.
- sessioncontext is an attribute of userIdentity.
- sessionissuer is an attribute of sessioncontext.
- type is in sessionissuer.
Try: userIdentity.sessioncontext.sessionissuer.type
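Plugged into the query, that looks roughly like this (the data store ID is a placeholder):

SELECT userIdentity.sessioncontext.sessionissuer.type, eventName, eventSource, eventTime
-- placeholder: use the ID at the end of your event data store ARN
FROM <your-event-data-store-ID>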

Appears to work:

But we want a different value. Let’s look at sessionissuer in more detail:

There are more fields than that, but the username is the one we’re after. Change type to username.
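So the query becomes something like (the data store ID is still a placeholder):

SELECT userIdentity.sessioncontext.sessionissuer.username, eventName, eventSource, eventTime
FROM <your-event-data-store-ID>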

There we go. Now we can see what roles are performing what actions.

But we want to restrict the query to our batch job role. Copy the username that we can see in the above JSON output:

Add a WHERE clause to the query where the attribute we are querying equals that value:
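A sketch of what that looks like, substituting your own data store ID (the role name below is the one from my JSON output above):

SELECT userIdentity.sessioncontext.sessionissuer.username, eventName, eventSource, eventTime
FROM <your-event-data-store-ID>
WHERE userIdentity.sessioncontext.sessionissuer.username = 'BatchRoleDeployBatchJobCredentials'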

Finally. We can see all the actions taken by our batch job role:

We can now easily add each one of these to our policy document, and presuming our batch job only takes these actions going forward and that CloudTrail is logging every action, this should work.
We can do one more thing to make it a bit easier to update our role. Remove all the extraneous fields, put the service before the action in the list, and add the DISTINCT keyword to only show a list of unique results.
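Roughly like this, again with a placeholder data store ID and my role name:

SELECT DISTINCT eventSource, eventName
FROM <your-event-data-store-ID>
WHERE userIdentity.sessioncontext.sessionissuer.username = 'BatchRoleDeployBatchJobCredentials'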

That query results in the following which are the actions we should need:

Now, before you do anything else, save this query. I didn’t when I was writing this post, then navigated away from the screen and had to re-create it.
Click Save at the bottom of the screen.

Give your query a name and description.

You can essentially copy and paste that list into your policy document and edit it a bit. Remove “.amazonaws.com” and put a colon between the two values. If I ran this on the command line I could easily construct the policy. (So could AWS with their policy generator.)
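As an aside, if CloudTrail Lake’s SQL supports the usual Presto-style string functions (an assumption on my part that I haven’t verified in the console), you could even have the query emit policy-ready action strings directly:

SELECT DISTINCT
-- assumes replace() and || concatenation are available; the data store ID is a placeholder
replace(eventSource, '.amazonaws.com', '') || ':' || eventName AS action
FROM <your-event-data-store-ID>
WHERE userIdentity.sessioncontext.sessionissuer.username = 'BatchRoleDeployBatchJobCredentials'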

Do you notice anything strange about the above list?
Decrypt but not Encrypt?
First of all, we initially wanted to give this policy ENCRYPT permissions only, to create our new credentials and store them in Secrets Manager. We don’t want to allow it to decrypt the values it is creating and storing in Secrets Manager later. Encrypt is never called, and yet Decrypt is. This seems like a bug in my opinion and I wrote about it here:
This implementation prevents companies from creating separation of duties between encrypt and decrypt capabilities.
However, at least this role can only perform the CreateSecret action on Secrets Manager. It cannot retrieve a secret. Even though this role has decrypt permissions, it cannot retrieve the secret, so as long as we don’t give it that permission we’re OK. Still, it would be clearer to segregate encrypt and decrypt permissions.
But what if we need to update the secret? What if the secret needs to be deleted and recreated? For now we will assume that is an override that happens outside the batch job. This role doesn’t get those permissions. Its sole purpose is to create and store the credentials.
IAM actions are missing?
There’s one curious thing about this list. I don’t see the IAM actions to create the Access Key and Secret Key that we already added to the policy earlier in this post:
Why aren’t they showing up here?
Let’s make sure those actions were actually taken by our batch job role.
First, I deleted my CloudFormation stack and verified that the access key does not exist. Then I re-ran the stack to recreate it. Back to CloudTrail Lake to find those entries.
The IAM entries for this username definitely do not exist. In fact, querying on any IAM events shows that the IAM events do not exist at all:
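That check looks roughly like this (a sketch; the data store ID is a placeholder):

SELECT eventName, eventSource, eventTime
FROM <your-event-data-store-ID>
WHERE eventSource = 'iam.amazonaws.com'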


Well, I already showed you how to find those actions in the last post so let’s review the IAM Access Advisor for this role:

Turns out these IAM actions are logged in us-east-1 so we’d have to include that in our CloudTrail Lake query if we want to see those actions as well. We can switch over to that region and repeat the whole process again, or we can add all regions to our data store. I don’t expect I have too much data in the other regions so let’s see if we can add that to the existing store or if we have to create a new one.
If you return to your event data store as we did above and edit it, you can uncheck the box that limits it to a single region:

I’m not sure how long it takes for the data to show up but let’s try to run our last query again.
When I run my query again I do not get any IAM events.
Perhaps I need to run batch job actions again to get them to appear in these logs.
No. I can see the actions in the us-east-1 region but I cannot get them in my CloudTrail Lake query.
To fix the problem, maybe I have to create a new CloudTrail Lake event data store.
I gave my new data store the name Policy 2 and used all the same settings as before except I unchecked the box to limit the logs to the current region.
Then I chose Policy 2 on the left hand side. I made sure my query referenced the new data store after the “FROM” keyword. I ran the query again.

No dice. In fact, I was getting no logs at all from CloudTrail Lake. I fiddled around with this and my old data store for more than a few minutes, and then my SSO session timed out. I don’t know if that was the root cause of the issue, but after starting a new session I could query all the regions and all the data showed up. I also hit a hidden character bug when switching between data stores. I wrote about these issues here if you are interested, but I hope they are fixed by the time you read this.
You can select your saved query, change the data store, and run it again and you’ll see more results this time:

The only other odd thing I notice now is that there is no DeleteAccessKey action, even though that showed up in IAM Access Advisor for this role. I don’t recall ever using this role outside of this batch job script, so I’m not sure why the DeleteAccessKey action is listed in the console, but I’m going to go ahead and leave it out of my role policy and test.
Test our new zero-trust policy
The first thing we need to do is re-deploy the batch job policy. That is not included in the test script in this folder because there is a delay before IAM changes take effect after a policy gets updated.
Run the deploy.sh script in the root folder of the batch job:
./deploy.sh
Of course we want to double-check that our policy got updated, and it did:

Next we need to delete the CloudFormation stack for our job; otherwise there will be no changes and the actions we are trying to test won’t execute. Delete this stack:
BatchJobAdminCredentials
Now you might want to wait a bit or else just run the job and then run it again later to make sure all the IAM permissions have kicked in.
Yes! It worked!

Finally, after all those prior blog posts and trial and error, we have a way to create a zero-trust policy that works. We could take this a step further and automate the process.
But what we do *NOT* want to do is fully automate it to the point where batch jobs are creating their own permissions in a production environment. You could create a tool for developers to generate a zero-trust policy, but then the developer reviews it, the QA team tests it, perhaps it goes through some security checks, and then it goes to production.
For my purposes at this moment it is faster for me to edit the query, and I’m more interested in some other automation I want to get done, so I’m going to move on.
Teri Radichel
If you liked this story please clap and follow:
Medium: Teri Radichel or Email List: Teri Radichel
Twitter: @teriradichel or @2ndSightLab
Request services via LinkedIn: Teri Radichel or IANS Research
© 2nd Sight Lab 2022
All the posts in this series:
____________________________________________
Author:
Cybersecurity for Executives in the Age of Cloud on Amazon

Need Cloud Security Training? 2nd Sight Lab Cloud Security Training
Is your cloud secure? Hire 2nd Sight Lab for a penetration test or security assessment.
Have a Cybersecurity or Cloud Security Question? Ask Teri Radichel by scheduling a call with IANS Research.
Cybersecurity & Cloud Security Resources by Teri Radichel: Cybersecurity and Cloud security classes, articles, white papers, presentations, and podcasts