PODCAST: Yan Cui Serverless Lessons
Hillel and Tal from Protego were joined by a guest Hillel referred to as, “The Mick Jagger of serverless,” Yan Cui, principal engineer at DAZN. Watch the video below or listen to the audio on SoundCloud.
As principal engineer, Yan helps teams overcome challenges, in addition to working on recruitment and increasing brand awareness. DAZN is in the sports streaming business, already has millions of subscribers, and will be opening in 15 countries next year. Yan also writes prolifically on serverless and is a frequent speaker at conferences. He aims to share his experiences from the last two years working with serverless, so others don’t have to make the same mistakes.
Yan Cui Serverless Lessons
Hillel observed that Yan has been in this space since before there was a space. So, what drives companies to adopt and continue using serverless technologies?
Yan replied, “I think the reason people are going to serverless is that you just don’t have to worry about a lot of things that we used to. I started doing AWS about 10 years ago, and before that, I was working for investment banks. We had all these crazy meetings where literally a room of 20 people would talk about how we’re going to spend the next three months getting a new server.
“AWS was a big gamechanger back then. With serverless, the draw for me is that all of that non-differentiated heavy lifting I can just give to Amazon, Google, or Microsoft. They can do a much better job at those low levels, and I can focus my efforts on doing the things that my customer actually wants from me. From an organization point of view, especially for startups, it means that you can get a lot more done with fewer people. Benefits that make people stick with serverless include the reactive programming model, making it easy to bring data to your code, and faster feature delivery.
“But serverless also leads to a massive change in company culture. Engineering teams shift focus from the tech to business value. They start asking different questions around what should we build? What are the things that we should be doing in order to maximize the value that we deliver to our customers?”
Hillel replied, “Interesting. You’re saying you come in for the time-to-market, application velocity, and because you don’t want to operate servers. But then this also creates a culture where, because you’re writing these small functions that provide value, you’re very focused on what the next value is, and how to provide it.”
Yan responded, “When focusing on the grunt work, such as provisioning instances, engineers can start to lose sight of why we do all these things. Serverless can bring that focus back in line with what the business is actually trying to achieve in terms of delivering value to your users.”
Are People Worried About Serverless Security?
Hillel asked Yan if people were thinking about serverless security. Yan replied, “I think people should be thinking about security a lot earlier, but it’s often still very much an afterthought, and that hasn’t changed from when everything was running on premises versus when everything was running on EC2 or containers.
“The fact of the matter is that most engineers are just not well prepared enough to really think about security. Also, the tooling is there, but it often comes with a steep learning curve. For things like serverless, you get a lot out of the box, so I think that’s great. All the security around infrastructure, such as the operating system and the virtual machines, is taken care of by AWS. But application-layer security is probably still just as messy as before.”
Tal commented, “It’s funny, with every new technology we forgot everything we learned, and there is a new learning curve. When we got to mobile, it’s not like we were 20 years into web application security. Everyone just forgot about it and we started from nothing.”
Yan agreed and commented that he sees the same thing in terms of production readiness. After learning in microservices that we need structured logging, we forgot as soon as we got to serverless.
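As a rough illustration of the structured logging Yan mentions, here is a minimal Python sketch of a function handler that emits one JSON object per log line. The `log_event` helper, the field names, and the `order_id` event key are assumptions made for this example; they were not discussed in the podcast.

```python
import json
import time


def log_event(level, message, **fields):
    """Write one JSON object per log line so log tooling can filter on fields."""
    record = {"timestamp": time.time(), "level": level, "message": message, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


def handler(event, context):
    """Hypothetical Lambda-style handler that logs with request context attached."""
    # The real Lambda context object exposes aws_request_id; fall back when absent.
    request_id = getattr(context, "aws_request_id", "unknown")
    log_event(
        "INFO",
        "order received",
        request_id=request_id,
        order_id=event.get("order_id"),  # hypothetical event field
    )
    return {"statusCode": 200}


if __name__ == "__main__":
    handler({"order_id": "42"}, None)
```

Because every line is a self-describing JSON record, the same search and alerting queries can work across many small functions without per-function parsing rules.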
Google’s Cloud Next
Hillel attended the recent Google Cloud Next event, where it was announced that Google Cloud Functions was out of beta, which highlights how seriously Google takes this. Hillel added, “But then to confuse me a little bit, they also announced some sort of serverless over Knative, over Kubernetes, over other stuff with Istio and a bunch of other pieces. That seems to be another path to serverless they’re pushing.
“Yan, how do Google, AWS, and Microsoft compare and what gravitates people towards Google as opposed to AWS as a market leader?”
Yan said, “Personally, I’ve always used AWS and have been for about 10 years. I did have a stab at Google with App Engine and Google BigQuery. I think the thing you find with Google is that the services, themselves, are really good, but they’re more designed for the geeky users and the people that really like to tinker with things. But with App Engine, I think they were the first big cloud provider to go in there and try to offer something that is kind of like serverless, even though it was a bit of a leaky abstraction because you’re still having to provision and pay based on machine hours. Both AWS and Azure seem to be pushing serverless a lot harder in the last couple of years than Google. And Google seems to be still very much camped in the Kubernetes game and appealing to a different sector of the industry.
“In terms of what I think about the Kubernetes/Knative thing, you’ve already had the option of running functions as a service on your own Kubernetes cluster with Kubeless, and I guess you can do the same with OpenWhisk. For me, it’s great that it allows an adoption path to serverless or FaaS, at least. But at the same time, you lose a lot of the benefits of serverless, such as not worrying about infrastructure, scaling, and not paying for things you’re not using. With something like Kubeless and Knative, you’re getting the problem from both sides of the coin. You have to inherit the challenges of managing your own Kubernetes cluster, and you have to take on the learning curve and all the tooling around FaaS as well. It’s great that they’re meeting a lot of engineers where they are in their organizations, but at the same time, I wish we were not having this conversation.”
Hillel said, “I saw a tweet from Chris Munns from AWS
“Ok guys, I installed k8s and am ready for #serverless functions!”
“oh ok, i’ll install knative too”
“oh.. i need then a FaaS engine on top of it..”
“oh.. i still need an event source”
“ok, 9 servers, monitoring/metrics/logging of them.. oh OS patches.. ready for FaaS!”
— chrismunns (@chrismunns) July 26, 2018
playing to that, saying: sure, I can do serverless on Kubernetes, but then I have to go build nine things, and orchestrate and run them, as opposed to using a real function as a service. It’s interesting because Google has a FaaS offering, but I guess, as you’re saying, they’re trying to meet developers at their various points of adoption. Maybe there’s something to that strategy.”
Yan responded, “Yeah. But at the same time, I wish we would just go straight to serverless and help get the serverless platforms themselves ready for the developers that are still stuck on Kubernetes and containers.”
Hillel said, “Part of that Knative announcement was not just Google. It was also Pivotal and Red Hat. How much more sense does this sort of strategy make if an organization is trying to shift to serverless, but still has pieces that are on-prem and are going to be on-prem for a while? For example, a financial institution may not want to, or can’t move certain things to the public cloud for regulatory or internal reasons. Does that help some of those organizations?”
Yan responded, “I guess I’m not in the best place to answer that because I haven’t worked in that world for a very long time. Thank God for that. But from what I’ve heard, a lot of companies, even financial institutions in the U.K., have gone from on-premises to the cloud. Many of them have just skipped the whole Docker and containerization step and gone straight to serverless. There is no reason why you shouldn’t just go straight to serverless rather than going to Kubernetes first. I’ve also found that many people who went down the Kubernetes road end up spending 12 months just to get the first application running in production, whereas those doing serverless were delivering value in production pretty much from day one.”
Hillel added, “Then go all-in. Make the switch, go all-in, you’ll see all the benefits rather than doing it halfway and getting the pains of both sides.”
Tal added, “AWS is putting great effort into complying with all those regulations, such as GDPR, so there is no real reason not to.”
Yan said, “When people tell me the cloud is not secure, I just point out that AWS is certified for pretty much everything now. And if the U.S. government can use AWS cloud, I struggle to see why your organization is not able to use AWS for security reasons.”
Hillel commented that people tend to oversell how secure on-prem is as well. Yan agreed and cited a recent example of a large financial company having their on-premises trading system down for two days.
Will Security Kill the Advantages of Serverless?
An article by Hillel was recently published in The New Stack. It addressed how security organizations are at a crossroads and faced with a decision. Security can take control of the pipeline, own the deployment process, and make a lot of inline decisions the way they did in the past. But security reviews can slow things down, negating the serverless benefit of rapid feature deployment. Alternatively, they can let the developers run free and try to chase them with some anomaly detection tools, trying to spot attacks or weaknesses and running after those.
Hillel explained, “My point was that neither of those is a good decision. The first is not aligned with the business, as it’s counter to the efforts to deliver more value to customers more rapidly. The second is the same old paradigm: you’re left playing ‘whack-a-mole,’ chasing after security problems, where you can only catch so many things, and you can only remediate so quickly. The ideal is to find a middle ground, where you’re not shoving all the responsibility onto developers. Because I don’t think they’ll take it, and it’s not realistic to me to say, ‘Hey, developers, own security and if anything happens, we’re going to yell at you,’ because they’re developers.”
Learn from the Pain: Tightly Integrate Ops & Development
Yan agreed that a middle ground is realistic. For many years, we all learned the pain of both of those approaches, and DevOps was born from that frustration. You want to have a tight integration between ops and developers.
Yan continued, “Oftentimes, developers end up taking on many of the ops responsibilities and learning new skillsets. I think the same should be happening to security as well, especially with serverless. More and more things can be done by the developers, and the tools are there so that the application developers are empowered to do a much better job with regards to security. But we do need more education as well as automation and tools that can help the developers do the right thing. I definitely think there should be better integration between security specialists and development teams, maybe with embedded teams.”
Hillel stated, “Right, security people don’t want to get in the way of the developers. They understand that developers, for example, need to craft IAM roles on their own so they can move quickly. But at the same time, security people need the ability to have some policy and visibility.”
Tal stated, “Google Cloud Functions has a major issue now in that it allows only one IAM role for the whole application. That means that it doesn’t matter what function you write, it’s going to have all the permissions that the application needs. Even if a function just needs to write to logs, it’s going to have access to your database. Makes no sense.
“AWS is doing a good job of tuning these IAM roles, but it may not be clear to developers exactly how they can customize them to specific function actions. Some automation here would be very valuable to give developers a way to understand what they need, and not just provide wildcards. This is part of what we are offering at Protego.”
Hillel stated, “I agree. One of the things that we saw early on was that serverless gives you hundreds of functions in an application instead of tens of containers. So, you’ve got the ability to make hundreds of decisions about IAM rules, which is really challenging to do right. Security has the struggle of not knowing what developers are doing and not understanding their code. Developers have the struggle of not having the time or focus to really think about crafting the rules correctly, and so we try to step in and help in a bunch of places.”
Tal replied, “We’ve actually seen that. More than 95% of the functions that we’ve seen out of thousands were overprivileged.”
Wildcards are OK! [No, Not Really]
Yan stated, “One of the really frustrating things I find is that many of the official documentations from cloud providers contain wildcards in their example code. That probably is one of the reasons why a lot of people starting with serverless think wildcards are okay because they’re following those official documentations.”
Hillel added, “With Amazon, even when you don’t want to put a wildcard in place, it can be challenging to figure out what permissions you need for each function. Not every service has clear documentation about what the IAM roles are and how they map to API calls. You can follow best-practice documentation and get some wildcards. You can even try to improve on that, and still not find all the information you need, so it’s definitely a struggle.
“Hopefully, we’ll all get better at it. We’ll get more education. We’ll have more tooling from companies like us, and we’ll also see Amazon continue to close the gap on some of those things.”
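To make the wildcard problem concrete, here is a minimal Python sketch that scans an IAM policy document for `*` in the `Action` or `Resource` of any `Allow` statement. It is only an illustration under stated assumptions: the `find_wildcards` helper and both sample policies (including the account ID and table name) are hypothetical, and this is not how Protego's tooling works. Some wildcards in resource ARNs can be legitimate, so treat a finding as a prompt for review rather than proof of a flaw.

```python
def _as_list(value):
    """IAM accepts either a single string or a list for Action and Resource."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def find_wildcards(policy):
    """Return (statement_index, field, value) for each wildcard in an Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            for value in _as_list(statement.get(field)):
                if "*" in value:
                    findings.append((index, field, value))
    return findings


# Hypothetical over-privileged policy of the kind found in copy-pasted examples.
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "dynamodb:*", "Resource": "*"}],
}

# Hypothetical scoped policy for a function that only reads and writes one table.
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
        }
    ],
}

if __name__ == "__main__":
    print(find_wildcards(broad_policy))
    print(find_wildcards(scoped_policy))
```

A check like this could run in a build pipeline against each function's policy, which gives security teams visibility without blocking developers from writing their own roles.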
Tweets of The Week
I want to tank you all for moving the Serverless vs Devops argument to Serverless vs Kubernetes. Now I can enjoy my popcorn.
— botchagalupe (@botchagalupe) July 25, 2018
Hillel commented, “I think that’s really topical to what we talked about earlier, and maybe there are no new problems or new arguments. It’s just us reliving the same old problems in new environments.”
after the SQS announcement, here’s my wish list for #awslambda including ability to pay for pool of warm containers & ENIs, to stream logs to Kinesis without going through CW Logs, and finalizer handler to clean up resources deterministically https://t.co/RmcrQD3C4A #serverless
— Yan Cui (@theburningmonk) July 19, 2018
Tal said, “I enjoyed Yan’s tweet with a great AWS Lambda wish list. There were some nice ones, like the ability to forward logs to Kinesis streams without the need to go through CloudWatch first. Another good one is the finalizer handler. You know, the code just ends. It would be great to be able to do all the cleanup. Also, predictive scaling for Lambda would be a great solution. At Protego, we provide some predictive timeouts with the minimum function timeout required. I have my own wish list about security, but I’ll save that for the next one.”
This and so much this. I take it for granted I care about writing code to provide some business value and what that code costs me, Serverless is more about what I don’t want to care about, the plumbing, the underlying stack, containers, scaling it etc -> https://t.co/ur2D9EHC5d
— Simon Wardley (@swardley) July 26, 2018
Yan said, “My selected tweet is very much in line with what I was saying earlier about getting that focus back to delivering the right value to your customers as opposed to thinking about how to get data to your code to do your computation.”
Hillel replied, “That’s a great quote. Simon is a rock star. He’s the Yan Cui of DevOps.”