PODCAST: Serverless Observability Breeds Confidence
In the latest episode of The Serverless Show, Hillel was joined by Ran Ribenzaft, Co-Founder & CTO, Epsagon. Ran introduced himself, “I’m a cybersecurity specialist. I did lots of development through low-level stuff and the IDF, which was very fun. But honestly, I can’t tell anyone about it. Currently, I’m the cofounder and CTO at Epsagon for the last two years, and I live in Tel Aviv, which is fine, warm weather, located in Israel. Let me tell you a bit about Epsagon. To sum up, we’re doing distributed tracing, and we’ll talk about it in some of our topics. But when you’re working with microservices and serverless, you can end up with lots of services out there that are talking to each other. Now when you’re a developer and you need to troubleshoot something or understand the performance implications of one service to another, you need the right tool or distributed tracing that will tell you the full story with the context, with the logs rather than have everything individually. That’s, honestly, what we’re trying to achieve for the longer term.”
Hillel replied, “Obviously, not just a very cool product, but really something that, I think, is an enabler for a lot of what people are trying to do with serverless.”
Serverless Predictions – Serverless Observability Breeds Complexity
Hillel continued, “I wanted to touch first on an article that Gadi Naor from Alcide wrote in Forbes that talked about his serverless predictions for 2019, and I thought he did a nice job. One of the areas he mentioned was this notion that observability has to span the entire cloud-data-center mesh. I think he was trying to say that the kinds of things that you guys are trying to solve are really critical for giving people a view of not just what a particular function is doing, but really the entire application landscape.
“Then he made this interesting claim, which I think will resonate with you, Ran, that these types of solutions, for observability and monitoring that you guys are pioneering, are really fundamental to enabling the next wave of applications. I think what he was saying was, if we give people the tools to be able to see what’s going on in these very complex and fast-moving environments, they will then be able to write more and more complex applications. Do you see that already? What are the key points for how you’re solving this and what is this letting people do?”
Ran replied, “I think that the crucial point today that every developer encounters in the microservices environment is that you can monitor or log only a specific or individual resource, but most of the time you’ll find problems that span multiple resources. Sometimes multiple clouds, sometimes multiple managed resources. For example, it can be a problem that you’ve got in Lambda, but something that started with a Kinesis stream or a Kafka stream from another service, from another customer, so it really spans lots of resources. Without context, or without distributed tracing that leads you from the beginning all the way to the end, you’re almost clueless. Otherwise you just have to scroll through thousands of logs to try to understand and find some correlation between what your services are doing.
“What we’re doing at Epsagon is collecting the data from each resource individually, even if it’s not your resource, even if it’s a managed resource. For example, with S3, you can’t install any agents. You can only rely on maybe whatever CloudWatch Logs can tell you, but you need a better understanding of how many operations you do against S3 and what they lead to. If it triggers another Lambda function, you want to be able to match these resources together, or any interaction that there is between these resources. You need to monitor all these resources together, as the application they compose, rather than every resource individually. That’s what, honestly, Epsagon is doing. It’s collecting traces from lots of places and building the bigger picture. Hence, it gives you observability that spans multiple resources, multiple cloud vendors, multiple environments.”
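To make the idea of "building the bigger picture" concrete, here is a minimal sketch of how events collected from separate resources might be stitched into one story once they share a trace identifier. It is not Epsagon's implementation. The `Span` record, its field names, and the assumption that each trace forms a single linear chain of calls are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One recorded operation on one resource (hypothetical schema)."""
    trace_id: str             # shared by every operation in the same request flow
    span_id: str              # unique id of this operation
    parent_id: Optional[str]  # span that triggered this one; None for the entry point
    resource: str             # e.g. "kinesis", "lambda", "s3"
    operation: str


def group_by_trace(spans: List[Span]) -> Dict[str, List[Span]]:
    """Bucket spans from all resources by the trace they belong to."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def call_path(spans: List[Span]) -> List[str]:
    """Follow parent links from the root span to the end of the chain.

    Assumes each span triggers at most one child (a linear chain).
    """
    by_parent = {span.parent_id: span for span in spans}
    path: List[str] = []
    seen = set()
    current = by_parent.get(None)  # the root span has no parent
    while current is not None and current.span_id not in seen:
        seen.add(current.span_id)
        path.append(f"{current.resource}:{current.operation}")
        current = by_parent.get(current.span_id)
    return path
```

With spans like these, a failure seen in a Lambda function can be read in context: `call_path` shows that the flow started at a Kinesis stream and ended at S3, instead of leaving the developer to correlate three separate logs by hand. Real traces usually branch into trees, so a production version would need to handle multiple children per span.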
Hillel asked, “To this point, once I’ve got Epsagon installed in my system and I’m using it as a developer, I’m now able to build more complicated things. Do you see that already?”
Confidence Leads to Speed
Ran said, “I’ll try to rephrase the words. I’ll call it confidence. When a developer develops something in a mesh of a hundred resources and he adds the 101st resource, he needs to be able to understand that he added the resource correctly, with the right permissions, maybe, if we’re talking about security, and that he’s making the right calls to that resource. Without that, he’s got no confidence. And with it, he’s got the confidence to move faster and not spend more time reading logs to see whether he did something correctly or not, so it gives the confidence that allows faster iterations or improvements in development velocity.”
The Need for Serverless Security at Scale
Hillel stated, “Let’s talk for a second about something else that Gadi wrote about, which is closer to my area of activity. He mentioned how security needs to be simplified and needs to work at scale. I know that Protego spent quite a bit of time on it, so it resonated with me. We obviously spend a lot of time trying to understand how security needs to work for serverless applications and how it needs to scale with serverless applications and how it needs to be stateless and ephemeral like serverless applications and workloads. At the same time, though, something that I’ve seen which I think maybe is worth pointing out is some of the things that we see in serverless applications, some of the challenges, at least, are not necessarily new to serverless. Sometimes serverless is just a good way to shine a spotlight on a problem.
“One example I give is I don’t think the perimeter has been a great paradigm for security, for cloud applications, in general, this notion that all the bad guys sit in one place on the outside and come through our front door and go through our WAF. I think cloud applications have already, for a while, been suffering from this notion of, ‘Data can come in through all sorts of third-party APIs.’ It’s really complicated to imagine in a very simple perimeter, but serverless really forces you to reconcile that issue and deal with it whereas, perhaps, in previous applications you haven’t had to.”
Wrong Tool for The Job?
Hillel continued, “One of the things I wanted to ask you about, in terms of Epsagon’s use cases, and maybe something I’ve run up against a little bit, is this: do you guys see people trying to use tools like Epsagon, or other tools, for security purposes as well? Is that something you see people able to use Epsagon for today?”
Ran replied, “Actually, some of our customers approach us because we do show the architecture overview, and they tell us, ‘I want this, but let me know when something new appears, like when I’m calling something unauthorized or whenever I’ve got this access denied error from something that I shouldn’t contact, let me know about this because these are the use cases that I’m looking into.’ Definitely, if you look at any application, regardless of whether it’s a serverless microservice or a monolith, you need to have security in place because you have to. It’s your responsibility to your own customers to keep their data as safe as possible. Yeah, we definitely see that. Unfortunately, we don’t offer Epsagon as a security solution. We’ll let them know that there are lots of other startups out there that are doing dedicated security for serverless.
“I think it’s critical. I think that people look for it. People need it. People need security for serverless, and it can call for a somewhat different approach to how you secure serverless applications. I think it’s obviously the permissions, where you can very easily give an asterisk to do everything, or the variety of triggers. It’s not the same SQL injection that’s coming from a POST method. It’s becoming more and more a variety of things, more and more interconnection between services. The problem is getting amplified in serverless. Everyone needs something dedicated for it.”
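The "asterisk to do everything" risk Ran mentions can be illustrated with a small check over an AWS IAM policy document. The `Statement`, `Effect`, `Action`, `Resource`, and `Sid` keys follow the standard IAM JSON policy layout. The function name and the shape of its findings are hypothetical, and this is only a rough sketch, not how any security product discussed in the episode works.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Allow statements that grant wildcard actions or resources."""
    findings = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        # Flag "*" and service-wide grants such as "s3:*".
        actions = [
            action for action in _as_list(statement.get("Action"))
            if action == "*" or action.endswith(":*")
        ]
        # Flag only a bare "*" resource; partial ARN wildcards are ignored here.
        resources = [
            resource for resource in _as_list(statement.get("Resource"))
            if resource == "*"
        ]
        if actions or resources:
            findings.append({
                "sid": statement.get("Sid"),
                "actions": actions,
                "resources": resources,
            })
    return findings
```

Running a check like this over each function's role before deployment is one way to catch an overly broad grant early. It deliberately ignores `NotAction`, conditions, and partial wildcards, all of which a real tool would have to consider.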
N+1 = Discomfort
Hillel replied, “Great point. Also, I think we have a very similar experience with people where there’s some size and complexity where they’re comfortable, and then there’s some N+1 where they go, ‘Okay, now I don’t know what’s going on.’ I think you probably hit that wall in observability faster. I can’t debug. I can’t monitor. I can’t trace. I kind of felt like I was in control. Then I put 11 functions in place and suddenly I feel like I don’t know where they’re running. I don’t know how they’re configured and all that, so I agree there, also. Once you start to grow to a certain scale and size and complexity, you start having real problems. I think debugging is probably first. Then monitoring and observability is second, and then security. It’s very similar.”
Ran replied, “I really relate to that thing you mentioned, that it comes at a certain scale and a certain complexity, because a single Lambda function, two Lambda functions, or even 10 might be very easy, but when you end up with millions of requests and hundreds of Lambda functions, you’ve got to have something in place. You can’t avoid it. You can’t say, ‘I’ll be okay with security. I’ll be okay with monitoring or troubleshooting.’ You need to have something in place.”