Intro
Creating least privilege policies is an integral part of building secure AWS workloads. A least privilege policy can:
- reduce blast radius in the event of a breach.
- prevent bad application logic from deleting critical resources.
- prevent "crossing of streams" and support strong logical isolation of workloads.
If you've done some AWS contracting, or if you've joined an organisation as the first "Cloud Person", you have probably encountered some dodgy policies over the years. You've probably encountered something like a webserver role with the following policy attached:
This person probably doesn't know any better. They just wanted to make their code work, and this is at least better than checking the root access keys into the codebase itself. We all start somewhere.
What's much more offensive to me is a role like this:
This person knew that they shouldn't attach the yolo administrator policy, but they didn't bother to invest the time to create a least privilege policy that captured the actual permissions required by the workload. This person definitely doesn't return their shopping cart. They probably told themselves a classic infrastructure engineering lie:
We'll come back and prune the privileges here later
Now this webserver can not only read and write every Secrets Manager secret in the account, it can also read all S3 data, delete all S3 data, create buckets, delete buckets, purge any SQS queue, and so on.
This is a company-ending event waiting to happen.
Let's explore some options below that can help put the IAM genie back in the bottle.
Trial and Error
Not very scientific, but your first port of call can often be to start with little to no actions and brute force your way to the appropriate policy as you wade through access denied errors. This is a tried and true approach. It doesn't have to be a shot in the dark either! The Service Authorization Reference is a life-changing piece of documentation that lists the actions, resources, and condition keys for each AWS service. It answers questions such as: can I specify resource-level permissions for this action, or do I need to specify *?
I often see confusion between S3 actions that apply to the bucket itself and actions that apply to objects in that bucket. The ListBucket action applies to a bucket:
{
  "Sid": "AllowListing",
  "Action": "s3:ListBucket",
  "Effect": "Allow",
  "Resource": "arn:aws:s3:::awsexamplebucket1"
}
The actions in the following statement apply to the objects themselves. Note the /* at the end of the resource ARN:
{
  "Sid": "AllowReadWrite",
  "Effect": "Allow",
  "Action": [
    "s3:PutObject",
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:DeleteObject",
    "s3:DeleteObjectVersion"
  ],
  "Resource": "arn:aws:s3:::awsexamplebucket1/*"
}
I have seen plenty of policies like this over the years:
{
  "Sid": "Combined",
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:ListBucket"
  ],
  "Resource": [
    "arn:aws:s3:::awsexamplebucket1",
    "arn:aws:s3:::awsexamplebucket1/*"
  ]
}
Does this person understand the subtleties of what's going on here and is simply being succinct, or did they brute force their way to a working policy? The Service Authorization Reference is your friend.
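If you want the bucket / object distinction to be explicit, a combined statement can be split mechanically. The following is a minimal Python sketch, not a definitive tool. BUCKET_LEVEL_ACTIONS, as_list and split_s3_statement are hypothetical names of my own, the lookup set is deliberately incomplete, and any action missing from it is assumed to be object-level (which is wrong for something like s3:CreateBucket). It also assumes plain bucket ARNs, so check the Service Authorization Reference before trusting the result.

```python
# Hypothetical, deliberately incomplete set of S3 actions that target the
# bucket ARN rather than object ARNs. Verify each action against the
# Service Authorization Reference before relying on this.
BUCKET_LEVEL_ACTIONS = {
    "s3:ListBucket",
    "s3:ListBucketVersions",
    "s3:GetBucketLocation",
}


def as_list(value):
    """IAM accepts either a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def split_s3_statement(statement):
    """Split a combined S3 statement into bucket-level and object-level parts."""
    actions = as_list(statement["Action"])
    resources = as_list(statement["Resource"])

    # Assumption: plain bucket ARNs have no "/" in them, object ARNs do.
    targets = {
        "Bucket": (
            [a for a in actions if a in BUCKET_LEVEL_ACTIONS],
            [r for r in resources if "/" not in r],
        ),
        "Object": (
            [a for a in actions if a not in BUCKET_LEVEL_ACTIONS],
            [r for r in resources if "/" in r],
        ),
    }

    statements = []
    for level, (level_actions, level_resources) in targets.items():
        # Only emit a statement when both actions and resources exist.
        if level_actions and level_resources:
            statements.append({
                "Sid": f"{statement.get('Sid', 'Stmt')}{level}",
                "Effect": statement["Effect"],
                "Action": level_actions,
                "Resource": level_resources,
            })
    return statements
```

Passing the Combined statement above through split_s3_statement yields one statement that pairs s3:ListBucket with the bucket ARN and another that pairs the two Get actions with the /* ARN.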
Anyway, there are more robust ways to create policies that capture exactly the actions required by a workload. Let's take a look at some options.
CloudTrail
You should use CloudTrail to capture the API activity of IAM principals across your AWS organization. If your organization adheres to the CIS AWS Foundations Benchmark standard, you are already persisting this data in S3 and CloudWatch Logs. asecure.cloud is a great resource if you would like to achieve this standard; they have helpful Terraform / CloudFormation boilerplate.
Once you have this data flowing, you can examine it using Athena or CloudWatch Logs Insights to answer the question:
What does this role actually do?
This can be really helpful when you spot a role that is clearly over-privileged and you want to prune it down to the required permissions only. For example, you can list all the actions made by that role over the last 30 days.
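The same question can be asked of raw CloudTrail records once you have them in hand, for example after downloading a log file from S3. Here is a minimal Python sketch, assuming the records are already parsed into dictionaries. The function name actions_for_role is hypothetical. The service prefix is guessed from eventSource, and that guess does not always match the real IAM prefix or action name, so treat the output as a starting point rather than a finished policy. Note also that some activity, such as S3 object-level data events, is only present if you have enabled it on the trail.

```python
from collections import Counter


def actions_for_role(records, role_arn):
    """Count service:eventName pairs for calls made via the given role."""
    counts = Counter()
    for record in records:
        identity = record.get("userIdentity", {})
        # For assumed-role sessions, the role ARN is under sessionIssuer.
        issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
        if issuer.get("arn") != role_arn:
            continue
        # "s3.amazonaws.com" -> "s3". This is a guess at the IAM prefix.
        service = record.get("eventSource", "").split(".")[0]
        counts[f"{service}:{record.get('eventName', '')}"] += 1
    return counts
```

Sorting the resulting counter with most_common() gives a quick view of what the role actually does and how often.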
Access Analyzer
AWS have begun to automate some of the above with Access Analyzer policy generation. I have had mixed results so far and "only" 50 services are currently supported, but it's a helpful tool for more signal.
This feature has a lot of potential. Ultimately, what I'd love to see is this feature running implicitly per role and presenting some sort of nag screen / red icon: "Accept the new auto-generated least privilege policy" if you want to do ClickOps, or an option to download the JSON / YAML definition for your infra as code tool of choice.
Client Side Monitoring
This is the holy grail for workloads that you can test / develop locally. Most AWS SDKs support CSM (Client Side Monitoring), which reports the underlying API calls being made. Tools such as iamlive can map these API calls to the IAM actions they require.
Using iamlive
The iamlive readme is excellent, but here is a quick example of using it to generate a least privilege policy for Terraform:
# Install
brew install iann0036/iamlive/iamlive
# Run binary
iamlive
You can leave iamlive running in a shell.
To enable CSM for the SDK, export the following environment variables in another shell:
export AWS_CSM_ENABLED=true
export AWS_CSM_PORT=31000
export AWS_CSM_HOST=127.0.0.1
This instructs the Go SDK in Terraform to enable CSM. You can now run Terraform, and you will see output indicating that CSM is enabled:
aws-vault exec my-account -- terraform apply
2021/10/29 12:05:31 Enabling CSM
If you look at the shell running iamlive, you will start to see some really interesting output as it builds up the policy. The following example shows the permissions required to deploy some machine learning workloads that interact with IAM, Lambda, Step Functions, SageMaker and S3:
{
  "Effect": "Allow",
  "Action": [
    "ec2metadata:GetToken",
    "ec2metadata:GetMetadata",
    "sts:AssumeRole",
    "ec2:DescribeAccountAttributes",
    "iam:ListAccountAliases",
    "s3:PutObject",
    "iam:CreateRole",
    "iam:GetRole",
    "iam:PutRolePolicy",
    "iam:GetRolePolicy",
    "sagemaker:CreateModel",
    "iam:PassRole",
    "lambda:CreateFunction",
    "states:CreateStateMachine",
    "sagemaker:DescribeModel",
    "sagemaker:ListTags",
    "states:DescribeStateMachine",
    "states:ListTagsForResource",
    "lambda:GetFunction",
    "lambda:ListVersionsByFunction",
    "states:DeleteStateMachine",
    "iam:DeleteRolePolicy",
    "iam:ListInstanceProfilesForRole",
    "iam:DeleteRole",
    "sagemaker:DeleteModel",
    "lambda:DeleteFunction",
    "sts:GetSessionToken",
    "iam:GetAccountPasswordPolicy",
    "iam:ListAttachedRolePolicies",
    "iam:CreatePolicy",
    "iam:GetPolicy",
    "iam:GetPolicyVersion",
    "iam:AttachRolePolicy"
  ],
  "Resource": "*"
}
The ec2metadata:GetToken and ec2metadata:GetMetadata calls here are interesting. They have nothing to do with the resources deployed by Terraform itself; they are likely the Go SDK stepping through the default credential provider chain, essentially checking "Am I running on an EC2 instance?"
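A flat list like the one above is easier to review, and to scope down from "Resource": "*", once it is grouped by service. Below is a minimal Python sketch that does the grouping and drops the ec2metadata entries. The function name and the ignore_prefixes default are hypothetical choices of mine, and skipping ec2metadata assumes you agree those calls are SDK credential probing rather than permissions the workload needs.

```python
from collections import defaultdict


def group_actions_by_service(actions, ignore_prefixes=("ec2metadata",)):
    """Group "service:Action" strings by service, skipping ignored prefixes."""
    grouped = defaultdict(set)
    for action in actions:
        service, _, name = action.partition(":")
        if service in ignore_prefixes:
            continue
        grouped[service].add(name)
    # Sort services and action names so the output is stable for review.
    return {service: sorted(names) for service, names in sorted(grouped.items())}
```

Each key in the returned dictionary is a natural candidate for its own policy statement with a tighter Resource.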
Wrapping up
I think IAM is a fundamental AWS skill. For years I battled against it and treated it as something that just got in the way! It feels like you've unlocked a superpower once you have a good handle on it, and advanced usage of IAM can simplify your business logic or result in more elegant solutions.
Here is a classic re:Invent session that makes IAM very accessible.
I hope you found this useful. If you did, be sure to check back in future for more AWS adventures.