In traditional applications, security misconfiguration can happen at any level of an application stack, including network services, platform, web server, application server, database, frameworks, custom code, and preinstalled virtual machines, containers, or storage. Luckily, almost none of that has anything to do with serverless.
The network services, platform, database, frameworks, VMs: all of that belongs to the cloud provider. Containers? We’re past that. Servers? What are those?
Okay, okay. We still have some configuration to do in our cloud resources. Cloud storage and cloud databases do encrypt data at rest by default, but we could provide our own encryption keys for stronger security, or for stronger separation between tenants if we’re using them in a multitenant architecture. Cloud storage also has another significant configuration that is our responsibility: access control for the objects stored in it.
As I demonstrated in a previous blog in the series, if we misconfigure our cloud storage, it could end up hurting us. So, where does misconfiguration impact serverless most? There are a couple of things you might not consider in a monolithic environment that shift a little when we move to a serverless architecture. For instance, unused pages are replaced with unlinked triggers, unprotected files and directories are changed to public resources (e.g. public cloud storage), etc.
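To make the "public resources" point a bit more concrete, here is a minimal sketch of a check for bucket policies that grant access to everyone. It only looks at unconditional `Allow` statements with a wildcard principal in an S3-style policy document. The `public_statements` helper and the example policy are hypothetical, and the check ignores ACLs and account-level public access settings, so treat it as a first-pass heuristic rather than a full audit.

```python
import json


def _as_list(value):
    """Normalize a policy field that may be missing, a string, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def public_statements(policy_json: str) -> list:
    """Return unconditional Allow statements whose principal is the wildcard '*'."""
    policy = json.loads(policy_json)
    flagged = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        # Both "*" and {"AWS": "*"} mean "anyone".
        is_wildcard = principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS"))
        )
        # A Condition may narrow access, so only unconditional grants are flagged.
        if is_wildcard and not stmt.get("Condition"):
            flagged.append(stmt)
    return flagged


# Hypothetical policy: one public statement, one scoped to a single role.
EXAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "AppOnly", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:role/app-role"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
})
```

Calling `public_statements(EXAMPLE_POLICY)` returns only the `PublicRead` statement, which is the one worth a second look.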
Attackers can also try to identify misconfigured functions with long timeouts or low concurrency limits in order to cause Denial of Service (DoS). Additionally, functions that contain unprotected secrets, like keys and tokens in the code or environment variables, could eventually result in sensitive information leakage. Functions configured with long timeouts give an attacker the opportunity to make their exploit run longer and do more damage, or just to run up the bill for the function execution.
Functions with a low concurrency limit could easily end up in DoS. All the attacker needs to do is invoke the misconfigured function enough times to make it unavailable, and you pay for it too!
If you’re thinking, “Then I’ll just set the max concurrency limit,” well, then you’ve got yourself a Denial of Wallet (DoW), also referred to as exhaustion of financial resources. So what to do? Configure the right amount, but make sure you’re not exposed elsewhere. If the function is triggered through the API gateway, you can also add validation on incoming requests and configure caching, which will help prevent malicious requests from reaching your function.
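For the unprotected-secrets part, a rough way to spot candidates is to scan a function's environment variable names for anything that looks like a credential. The sketch below assumes you already have the variables as a plain dictionary. The name patterns and the `find_suspect_variables` helper are hypothetical, and matching on names alone will miss secrets hidden behind innocent-looking names.

```python
import re

# Hypothetical name patterns; tune them to your own naming conventions.
SUSPICIOUS_NAME = re.compile(
    r"(secret|token|password|passwd|api[_-]?key|private[_-]?key)",
    re.IGNORECASE,
)


def find_suspect_variables(env: dict) -> list:
    """Return the sorted names of non-empty variables whose name suggests a secret."""
    return sorted(
        name for name, value in env.items()
        if value and SUSPICIOUS_NAME.search(name)
    )


# Hypothetical environment of a function under review.
EXAMPLE_ENV = {
    "STAGE": "prod",
    "TABLE_NAME": "orders",
    "DB_PASSWORD": "hunter2",
    "PAYMENT_API_KEY": "sk_test_123",
}
```

Here `find_suspect_variables(EXAMPLE_ENV)` flags `DB_PASSWORD` and `PAYMENT_API_KEY`. A flagged name is only a prompt to check whether the value really is a plaintext secret.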
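What "the right amount" looks like depends on your traffic, but a common starting point is that steady-state concurrency is roughly requests per second multiplied by average duration in seconds. The sketch below applies that estimate with some headroom. The `required_concurrency` helper, the 20% default headroom, and the traffic numbers are assumptions for illustration, not a recommendation.

```python
import math


def required_concurrency(requests_per_second: float,
                         avg_duration_seconds: float,
                         headroom: float = 0.2) -> int:
    """Estimate a concurrency limit from expected load, padded by `headroom`."""
    steady_state = requests_per_second * avg_duration_seconds
    return math.ceil(steady_state * (1 + headroom))


# Hypothetical load: 3 requests per second against a function that runs for 3 seconds.
baseline = required_concurrency(3, 3, headroom=0.0)  # no padding
padded = required_concurrency(3, 3)                  # default 20% padding
```

With these numbers the baseline is 9 concurrent executions and the padded value is 11. Anything far above that mostly gives an attacker more room to spend your money.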
Let’s explore this and see how it plays out. For this demo, I’ve created a function that is triggered via REST API calls. All the function does is sleep for 3 seconds. The function itself is configured with a 5-second timeout and a reserved concurrency of 10.
Calling the function once will get us:
Now, let’s run the following simple threading script (4 lines of code, which I have spread over 10 lines to make it easier on your eyes) to invoke this function 32 times in parallel:
As you can see, only 10 requests got the 200 OK response, while the rest failed, with their responses coming back faster. Running this again 1,000 times and looking at the CloudWatch metrics shows the same thing: 10 concurrent executions and ~980 throttles.
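The script itself is not reproduced in this text, so what follows is a minimal sketch of what such a script might look like, not the original. It uses only the standard library. `API_URL` is a placeholder for your own API Gateway invoke URL, and `call_endpoint` and `invoke_in_parallel` are hypothetical helper names.

```python
import threading
import urllib.error
import urllib.request

# Placeholder endpoint; replace it with your own API Gateway invoke URL.
API_URL = "https://example.invalid/dev/sleep"


def call_endpoint(url: str = API_URL) -> int:
    """Send one GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx responses are raised as HTTPError


def invoke_in_parallel(call, count: int) -> list:
    """Run `call` in `count` threads at once and collect each result."""
    results = [None] * count

    def worker(index: int) -> None:
        results[index] = call()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Once `API_URL` points at a real endpoint, `print(invoke_in_parallel(call_endpoint, 32))` fires the 32 parallel requests and prints one status code per request.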
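The 10-versus-the-rest split can be reproduced locally with a toy model: a semaphore stands in for the reserved concurrency, and a request that can't get a slot is rejected immediately instead of waiting, which is also why the failures come back faster. This is only an illustration of the behaviour, not how the platform implements it, and `simulate_burst` is a hypothetical name.

```python
import threading


def simulate_burst(requests: int, reserved_concurrency: int) -> dict:
    """Count how many simultaneous requests get a slot and how many are throttled."""
    slots = threading.Semaphore(reserved_concurrency)
    all_arrived = threading.Barrier(requests)  # make every request arrive together
    all_decided = threading.Barrier(requests)  # hold slots until everyone has tried
    outcomes = []                              # list.append is thread-safe in CPython

    def request() -> None:
        all_arrived.wait()
        admitted = slots.acquire(blocking=False)  # no queueing: reject right away
        outcomes.append("ok" if admitted else "throttled")
        all_decided.wait()
        if admitted:
            slots.release()

    threads = [threading.Thread(target=request) for _ in range(requests)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return {"ok": outcomes.count("ok"), "throttled": outcomes.count("throttled")}
```

With the demo's numbers, `simulate_burst(32, 10)` gives 10 admitted requests and 22 throttled ones.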
If we require an API key or a header, we can simply configure that in the API gateway. This will stop any unauthenticated attacker from invoking the function, with a 403 returned on all such requests.
But let’s assume that it’s an open API or that the attacker is authenticated. We can configure the API gateway to handle the throttling for us. This means the excess (“too many requests”) calls still won’t run, but at least we won’t pay for them either.
As you can see, most incoming requests received the 429 response status code from the API gateway and never reached the function itself. But then again… it’s still a DoS; we just didn’t pay for the Lambda invocations. In addition, we could add caching if the response is static enough. This takes a few minutes to take effect.
There’s a price for everything. So, you need to consider what works best for you.
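To get a feel for how gateway-level throttling decides who gets a 429, here is a minimal token bucket sketch, the general model behind rate-and-burst throttling. The `TokenBucket` class and the rate and burst values are assumptions for illustration. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.updated = 0.0          # timestamp of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if one is available."""
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical limits: 5 requests per second with a burst of 10.
bucket = TokenBucket(rate=5, burst=10)
statuses = [200 if bucket.allow(now=0.0) else 429 for _ in range(32)]
```

In this run the first 10 requests pass and the remaining 22 get a 429 without ever touching the function. One second later, the bucket has refilled enough for 5 more.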
This is, of course, only one configuration scenario. But since we ran out of time (well, my time), it will have to do.
So, what else should we do to protect against security misconfigurations? Oh, lots!
- Scan cloud accounts to identify public resources. Use built-in services available from the provider, such as AWS Trusted Advisor, which provides security checks
- Review cloud resources and verify that they enforce access control
- Follow the provider’s security best practices: How to secure AWS S3 Resources, Azure Storage security guide, Best Practices for Google Cloud Storage, and IBM Data Security
- Monitor Concurrent Executions on CloudWatch metrics and investigate spikes in AWS Lambda concurrency. More info can be found here
- Set up alerts on AWS billing. More info can be found here
- A blog on tips and tricks in monitoring AWS Lambda functions can be found here
- Check for functions with unlinked triggers. Look for resources that appear in their policy but are not linked back to the function
- Set each function’s timeout to the minimum it requires, and set its concurrency to what it actually needs
- Follow the provider’s function configuration suggestions: AWS configuring Lambda functions, Azure functions best practices, Google functions Tricks & Tips
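The unlinked-triggers check from the list above can be sketched as a simple set difference: collect the source ARNs that a function's resource policy allows, then subtract the triggers you know are actually wired up. The sketch assumes the policy is already parsed into a dictionary and that you have gathered the linked trigger ARNs yourself. Both helper names are hypothetical, and the comparison is an exact string match, so wildcard ARNs are not expanded.

```python
def policy_source_arns(policy: dict) -> set:
    """Collect the source ARNs that a resource policy allows to invoke the function."""
    arns = set()
    for stmt in policy.get("Statement", []):
        for operator in ("ArnLike", "ArnEquals"):
            conditions = stmt.get("Condition", {}).get(operator, {})
            for key, value in conditions.items():
                if key.lower() == "aws:sourcearn":  # condition keys are case-insensitive
                    arns.add(value)
    return arns


def unlinked_sources(policy: dict, linked_trigger_arns: set) -> set:
    """Return sources named in the policy that are not linked back to the function."""
    return policy_source_arns(policy) - set(linked_trigger_arns)


# Hypothetical policy with two allowed sources, only one of which is still in use.
EXAMPLE_POLICY = {
    "Statement": [
        {"Effect": "Allow", "Action": "lambda:InvokeFunction",
         "Condition": {"ArnLike": {"AWS:SourceArn": "arn:aws:s3:::uploads-bucket"}}},
        {"Effect": "Allow", "Action": "lambda:InvokeFunction",
         "Condition": {"ArnLike": {"AWS:SourceArn": "arn:aws:s3:::old-test-bucket"}}},
    ]
}
LINKED = {"arn:aws:s3:::uploads-bucket"}
```

Here `unlinked_sources(EXAMPLE_POLICY, LINKED)` returns the leftover `old-test-bucket` ARN, which is a candidate for removal from the policy.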
For API Gateway:
For Cloud storage:
Use automatic tools that detect security misconfigurations in serverless applications. Oh, we offer that 🙂