The Creator of Kubernetes On Building Kubernetes

Convincing Google, technical details, scaling for LLM workloads

This is a conversation with Brendan Burns, co-creator of Kubernetes. We discussed what it was like building it at Google, how he got buy-in, and what he learned along the way.

Check out the episode wherever you get your podcasts: YouTube, Apple Podcasts, Spotify (link coming after post goes live).

Timestamps

00:00:37 - How he convinced Google leaders

00:09:26 - Building the MVP

00:11:43 - How he made time for Kubernetes

00:25:28 - Technical details on building Kubernetes

00:38:46 - Rallying the open source community

00:50:01 - Scaling Kubernetes up for AI training workloads

00:55:31 - Reflections on getting a PhD

01:00:22 - The inevitable trajectory of software is death

01:04:16 - Top book recommendations

01:05:22 - Advice for his younger self

Transcript

00:00:37 — How he convinced Google leaders

Ryan:

[00:00:37] Here’s the full episode. I mean, let’s start with Kubernetes because that’s super interesting. I don’t fully understand the business motivation. Let’s say I was your director or something like that, and you came to me with this and said, hey, let’s do this for everyone. I don’t fully understand what would be in that strategy doc, or what you would say in that meeting, that would say here’s the impact for Google if we invest in building this for the industry?

Brendan:

[00:01:09] Yeah, it’s funny because the hardest part actually of the project I would say in those early days was actually articulating that. And I think it was really clear in our heads. But figuring out how to convince people was tricky. And I think there were a variety of different ways that we articulated why it was important. One of them was related to the MapReduce white paper. So MapReduce at the time, especially Hadoop and Big Data were a big deal.

[00:01:45] I think that other things have kind of replaced them at this point. But MapReduce was a big deal, and the big data revolution or whatever they called it. And you know, Google had written the original white paper, but Hadoop was an open source project that Google had nothing to do with and got no credit for. Right. They just read the white paper and they reimplemented it, and it’s not the same.

[00:02:12] It’s similar but it’s not the same.

Ryan:

[00:02:13] Right.

Brendan:

[00:02:14] Because anytime you reimplement something it’s not the same. And so part of the argument was like, look, we have this cloud and we want to be influencing the technological landscape. If all we do is kick out white papers, if it’s not something that people can run, we’re not going to be in the driver’s seat. And so that was one of the arguments. I think the other couple of arguments were like, why containers?

[00:02:43] Why, when people are using VMs, why containers? And a lot of that was talking about, look, the demands of writing software. We know internally, we know from doing this internally that the demands of writing reliable software necessitate having systems that are sort of like autopilots for your application. And we know that this is something that, as software becomes more and more critical to more and more businesses, this is something that they’re just going to have to have.

[00:03:14] And so that’s sort of the why containers part. And then I think that the third part was the, like, why open source, right? And in some sense that’s the most interesting conversation, because people are like, wow, you’ve convinced me, right? Like, you’ve convinced me, we should build it, we should make it available to the world. It’s something that they’re going to find useful.

[00:03:34] But wouldn’t it be so much better if they could only use it on our platform? And you’re like, yeah, that’s absolutely the case, but you can’t win if you make it only on your platform.

Ryan:

[00:03:48] Why is that?

Brendan:

[00:03:49] Well, because there’s other platforms out there, right? And so if you make it an exclusive, then the people who, for other reasons, are on other clouds or on premises, they’re shut out and so they’re going to just go build an alternative, right? It’s sort of like Linux. You know the reason Linux won, right? It’s because Linux could go anywhere, right? The whole reason that open tech and open ecosystems win is because the majority of people are going to be not on your platform.

[00:04:25] Like, if you’re, if you’re not the leader and you know, GCP was not the leader, then the majority of people are not going to be on your platform. And so if you make it such that the majority of people can’t use your thing, they’re just going to ignore you and then they’re going to go build their own, right? Whereas if you go and build it and you build it for everybody, but you make sure that it’s awesome on your platform, then you have a chance of attracting more people to your platform than otherwise.

[00:04:51] And then I mean, also in some ways it’s just like the aesthetic of the time also, right? It’s like if everybody’s using Linux and everybody’s using Docker and everybody’s using these programming languages that are all open. Like, if everything else is open source, you don’t want to be the thing that’s not right. And there’s only a few places where, like, where that hasn’t been true historically in technology where you could be different and still succeed.

[00:05:15] And you have to be so differentiated that. And I don’t think that we were that differentiated. You have to be so differentiated such that people are like, oh, actually I want that thing so bad, I don’t care that it’s closed.

Ryan:

[00:05:27] So at the time, just thinking, what was the competitive landscape? If I remember, AWS was dominant; they were there first and doing very well. GCP at that time was probably an up-and-coming company, or I guess offering. And so my understanding is that the idea was, let’s pull market share away by open sourcing and distributing Kubernetes to more and more developers, and then they’ll be more open to migrating to or using GCP because you’ll make a...

Brendan:

[00:06:00] They’ll pay attention, right? They’ll pay attention. And also like you change the dialogue too, right? Like tail light chasing is hard, right? Like if, if someone else has built VMs and everybody’s using VMs and all you’re doing is saying like, well, we’re building the same thing that they have, only maybe it’s a little bit better because we do something else over here or you know, whatever. Like that’s a hard market strategy to articulate.

[00:06:22] But if you create a brand new playing field where you are the thought leader, suddenly people are listening to you. Even if they’re not using it on your platform, they’re listening to you. And that gives you way more voice. You get a lot more voice in the market. It changes the narrative, it changes who people are listening to. And so that control of the story is an important aspect of I think, how you break through that dynamic or you try to break through that dynamic.

[00:06:51] Obviously they’re still in third at this point. So didn’t work out, but I mean, I wouldn’t say it didn’t work out. I think it worked out in general, but it’s still hard. Overcoming those kinds of market dynamics is hard. And I think the other thing that happened is everybody in the cloud consolidated around it. And so now Kubernetes is just sort of a utility everywhere.

Ryan:

[00:07:12] You mentioned that perception benefit of being the dominant offering, which, I mean, if you look at what happened in hindsight, it makes a lot of sense. I mean, that is what happened and it’s a wonderful benefit. But I guess when you were looking forward and you were talking to leadership, were you cognizant of those benefits and saying we need to do this because we’re going to kind of become the dominant offering and it’s going to have all of these optics benefits?

Brendan:

[00:07:44] I mean, I think we absolutely wanted to make sure that we were front and center in terms of thought leadership. And we definitely articulated that.

Ryan:

[00:07:50] Right?

Brendan:

[00:07:51] Absolutely right. Like being in a thought leadership position is valuable.

Ryan:

[00:07:56] It’s interesting because I feel like it’s hard to quantify. I mean, if I was in that meeting and we’re trying to make a call, how much is that worth?

Brendan:

[00:08:06] Yeah, well, I think you have to also realize though, at the time it was pretty cheap, right? It was like eight or nine engineers and we kind of. And this is in some sense, I mean, it’s both a blessing and a curse. Like, part of the reason why we articulated and argued for having such a distinct brand, where the Kubernetes brand was separated from the Google brand, was that it kind of gave us freedom to fail also.

[00:08:29] It was like, hey, if these eight people go off and do something and it turns out to be stupid, we’ll just kill it off. And it won’t, you know, it won’t damage the broader perception of the cloud. And so I think there’s that benefit of the open source part of it too. It helped with adoption, right? It helped us, especially as we went to the Linux Foundation and things like that and truly established an independent entity. It helped ensure that, you know, people like Red Hat or Azure or AWS could take a bet on Kubernetes and feel confident in that bet.

[00:09:05] But it also was an insurance policy against failure. And also, to be honest, it simplified a lot of things too, because we were competing against startups at the time. Docker is a startup. They can be way more agile than a big company. And so by virtue of being a separate entity, we could be a little bit more agile also.

00:09:26 — Building the MVP

Ryan:

[00:09:26] I mean, the earliest conception of this project was you and two others kind of hacking something together.

Brendan:

[00:09:32] Yeah, it was a demo. I mean, it was sort of a demo almost. It was like, look at what we can do if we just smash a bunch of existing open source tech together.

Ryan:

[00:09:39] What did that demo do?

Brendan:

[00:09:42] I mean, it was basically like a basic kubectl. It was like, hey, here’s a container I built. I mean, at the time you had to explain Docker to people. You were like, hey, here’s Docker. I used it to build this container image, and then you could run it and deploy it and see that it had gotten distributed across a bunch of machines and that you could load balance to it, because you’d hit a single endpoint and it would go, I’m replica one.

[00:10:06] And then you’d hit reload and go, I’m replica three. So it showed that it was replicated and then basic health checking so if you killed it, it would come back. And a V1 to V2 upgrade, that was about it.
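Editor’s sketch: the demo behavior Brendan describes (a fixed number of replicas behind one endpoint, and a killed replica coming back) can be illustrated roughly as below. This is a toy model, not the original demo code; `ToyCluster`, `reconcile`, `kill`, and `serve` are hypothetical names, and the round-robin routing is an assumption about how the endpoint picked a replica.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyCluster:
    """Hypothetical stand-in for machines running replicas of one container."""
    desired: int
    running: dict = field(default_factory=dict)  # replica id -> passed health check?
    next_id: count = field(default_factory=lambda: count(1))

    def reconcile(self):
        # Forget replicas that failed their health check.
        self.running = {rid: ok for rid, ok in self.running.items() if ok}
        # Start new replicas until the observed count matches the desired count.
        while len(self.running) < self.desired:
            self.running[next(self.next_id)] = True

    def kill(self, rid):
        # Simulate a crash: the replica now fails its health check.
        self.running[rid] = False

    def serve(self, request_number):
        # Round-robin across healthy replicas behind a single endpoint.
        healthy = sorted(rid for rid, ok in self.running.items() if ok)
        return f"I'm replica {healthy[request_number % len(healthy)]}"


cluster = ToyCluster(desired=3)
cluster.reconcile()   # starts replicas 1, 2, 3
cluster.kill(2)
cluster.reconcile()   # replica 2 is replaced by a new replica 4
print([cluster.serve(n) for n in range(3)])
```

The point of the sketch is only that "it comes back if you kill it" falls out of repeatedly comparing desired state with observed state, rather than from any special recovery path.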

Ryan:

[00:10:21] How long did that initial MVP take to build?

Brendan:

[00:10:26] I wrote it in, I don’t know, a little under a week maybe, something like that. I mean, I don’t work on the weekends, so maybe five days. Four days, five days.

Ryan:

[00:10:35] And did you drop all of your existing work? Because I’m imagining you had existing project work and this is kind of extra credit stuff that you were working on.

Brendan:

[00:10:45] Yeah, well, I mean, I wouldn’t say I dropped it, but on that kind of timescale you can kind of slack on it a little bit, you know, like you could be sick for a week. I mean, I guess the thing is, you could be sick for a week, you know. And I’m not saying that’s what we did, that’s not what I did, but there was enough flexibility in the system that you could hack it together.

[00:11:10] And I mean, believe me, it was hacked together, right? Like every possible shortcut to take and every. And you know, I think one of the things I’ve been good at historically is integrating other open source projects together, seeing how you can take stuff off the shelf and put it together. And so a lot of the nuts and bolts were pieces that we could take from other open source projects and kind of combine together with glue and glue code to give the feel of it.

[00:11:42] So that helps too.

00:11:43 — How he made time for Kubernetes

Ryan:

[00:11:43] I think a lot of software engineers, when they hear this kind of story, they think, oh, I have my existing responsibilities and I can’t necessarily go off and build this thing, even though I think it’s a great idea. Do you have any advice for someone who has that opinion?

Brendan:

[00:11:59] Yeah, well, I mean, I think there’s two things. I have two answers to that. One is advice that I’ve always given to every single person that’s ever worked for me, which is I believe you can hide on the order of 10% of your effort from your management. Right? So, you know, you have slack, you have the ability to slack no matter what. Right. And you know, as you get a bigger and bigger org, what you can do with that 10% actually increases.

[00:12:36] And a lot of really influential good ideas that I’ve had have come out of that. I mean, it’s another, I mean, it’s kind of a flip way of saying, I want to empower people locally to make local decisions that they think are optimal for the business without having to consult up the chain, without having to ask permission. And you tell people, when you tell them that you’re like, by the way, you’re also going to make a bunch of bad decisions and you’re going to waste a bunch of time.

[00:13:02] And so you have to be comfortable with this idea of like, I’m going to try some ideas. Some of them are going to fail, some of them are going to succeed. When I look back retrospectively, the ones that have failed were effectively a waste of time. You don’t want to drop below meets expectations, but it might be the difference between an exceeds expectations and a meets expectations.

[00:13:25] And you have to be comfortable with the notion that you’re going to bet five times and the payout from one of them hitting is going to be way better than the grinding to get that exceeds every single time. And I think there’s equally valid paths where you don’t do that. And I think you have to be the right kind of person who’s willing to take that kind of chance. And that’s not everybody. And that’s okay.

[00:13:54] And I think the other side of it, which I always remind everybody of also, is that people say things like that to me sometimes and I’m like, so do you play Call of Duty? Do you watch Netflix? Do you watch YouTube? Because I’m pretty sure there’s probably 10, 15, 20 hours in your week at least when you’re doing something that’s not work. And I can tell you in that time period, I wasn’t doing anything except for this and work and a little bit of family and sleeping.

[00:14:25] And so sometimes it’s also about saying, well, what are you willing to give up, you know, to have the space to do that? And, you know, I’m not a big work-all-night kind of person. But it does mean maybe not going to watch YouTube for a while, maybe not going to watch, you know, sporting events for a while.

Ryan:

[00:14:44] That makes sense. And on that second point, in this case, the returns of this project were exponential, obscene. I mean, if you put in two times the time for a year, you get 20 times the impact. So it just kind of makes sense in terms of investment of time.

Brendan:

[00:15:08] Well, and also, I think, I mean, I found personally that it was addictive. Right. I mean, I think I benefited from two things. One is I really like to write code, right. Like, I enjoy it as an activity. And so if I’m choosing between Netflix and coding, I’m actually pretty happy just coding. So that’s a benefit for me. And then for me anyway, once people start using it and they’re excited about the project and they’re putting issues on GitHub and all of this kind of stuff, I’m just addicted to it.

[00:15:44] I just want to close that issue. I want to help that person out. I want to. I’m going to go till I’m falling asleep on my laptop. And that’s just because I enjoy that. That’s why I’m in the industry. So even in the moment, I wasn’t. I mean, I was definitely not thinking about like, oh, here’s this payout that I’m going to get for the rest of my career. I was definitely in the, like, wow, I just want to keep this thing going.

[00:16:10] I want to keep this, you know, I want to keep this rush going.

Ryan:

[00:16:13] Because it sounds like it took a while for you guys to get buy in from leadership.

Brendan:

[00:16:17] I think there was a solid six months of going from a very hacky prototype to something that, like, legitimately we thought somebody could take and use and laying down the right kind of groundwork for that. You know, there’s a lot of little details that you have to get right along the way. And, and it’s always nice also because, you know, a lot of, a lot of the people who we brought in in the early days had built similar systems before.

[00:16:52] And so they were having this opportunity, this kind of clean room. It’s rare in your life as an engineer to get a clean room opportunity to rebuild something that you have ideas about how it could be better. It’s like getting a second chance, you know. And so that was also, I think, really attractive to people because it’s suddenly like, oh, wow, like this is a clean room. We don’t have any users right now, so we don’t have to be fixing bugs because some big company who pays us a lot of money is asking for something or whatever.

[00:17:21] We’ve got this clean room time and we’ve all spent a lot of time thinking about what the system could look like. And so now we get to go and just build the thing that we imagined.

Ryan:

[00:17:33] You mentioned that first point on hiding maybe 10% of your bandwidth from management. And I mean, that’s. That’s super interesting to me. What does that look like in practice?

Brendan:

[00:17:45] Well, I mean, I think that what it means is that you should always have sort of like a side project that you think is relevant. Right? Like, you should always have something that is. That nobody told you to do, but that you think is important that you’re working on.

Ryan:

[00:18:01] I see. So it’s kind of like, I remember Google, I don’t know if they still do this, but at the time there was 20% time.

Brendan:

[00:18:07] Yeah, it’s similar to that kind of idea. Yeah, yeah, exactly.

Ryan:

[00:18:11] Okay, so when you say hide, you don’t mean hide. You say manage expectations so that your manager is also okay with you working on another project.

Brendan:

[00:18:20] Oh, no, I actually do mean hide. Right. Like, don’t ask permission. Right. Like, sooner or later you’ll show it to them. But, like, it’s pretty like you need a solid, I don’t know, couple months or whatever to get to, like, a place where it’s something you could show to somebody. Right. And so, like, it’s all about saying, like, I’m not going to ask permission. I’m going to go build something that I think is important and useful.

[00:18:43] And then obviously, when it comes time to launch it or whatever or put it out there, then you do have to ask permission. And so then, yeah, you say, hey, I built this thing, but you’ve had that time to get it from. Because, I don’t know, I feel like it’s hard to articulate the value of something like in a doc or in a PowerPoint. It’s way more effective if it’s like a running thing that you can, like, somebody can interact with.

[00:19:13] Right? So getting that time to basically build it up into something that’s real and could ship. Because also, like, in some sense, your manager’s always assessing, like, well, should you spend your time on that or should you spend your time on this? Right. And by building it, you kind of like force their hand because it’s no longer like, should you spend time building this or should I spend time building this?

[00:19:36] It’s like, I’ve already built this. Do you want to ship it? And that’s in some ways a much easier decision. I don’t know about easier, but it’s not an either or. It suddenly becomes just sort of about, like, is your idea good? Because the work is done.

Ryan:

[00:19:51] I mean, in the happy path, I feel like it’s a great idea. You launch this thing has impact. It’s great. What about in the case that you work on this thing and you didn’t tell anyone about it and then no one cares when you launch it or it’s not as good as we thought.

Brendan:

[00:20:07] Yeah. And that’s the flip side, right? You have to be comfortable. I mean, as I said, you have to be comfortable with that idea that you’re going to waste some time. And maybe it’ll be a waste of time in the sense of like, wow, I could have been watching Netflix or, you know, whatever, and instead I wrote a bunch of code that nobody liked. It could be wasting time in the sense of like, wow, I could have, you know, could have gotten promoted.

[00:20:33] I could have done enough work to get promoted and I didn’t because I thought this was the great idea that was going to get me over the hump. But it wasn’t. And I think you just have to be comfortable with that. Like, you know, it’s taking. I mean, it is taking a risk. It’s not unlike in some sense like doing a startup or something like that. It’s taking a risk and you can’t assume. I think sometimes people go into any of these sorts of things and they’re like, oh, I will have the idea and it will be amazing and then it will hockey stick.

[00:21:03] And I think if that’s your mindset when you go in, you’re probably setting yourself up for disappointment. You have to go in with that mindset of, I think this is good, I’m going to try, but I’m okay if I fail. I know that I’m making an explicit choice here.

Ryan:

[00:21:20] I also imagine at some point at the highest levels of engineering ladders, you need to take that risk to get promoted to higher levels. For instance, if you’re a staff engineer or senior staff engineer and you ask your manager, how do I get promoted? They’ll often tell you, you need to figure out what that project is. Because I can’t just hand this to you because it’s starting to become more ambiguous.

Brendan:

[00:21:46] Oh, yeah, for sure. Yeah.

Ryan:

[00:21:48] I could totally see that this becomes a necessity at some point if you want to kind of grow. And I mean, this project also got you promoted as well at Google, right? From staff at some point.

Brendan:

[00:21:59] Yeah. No, I mean, certainly my career absolutely benefited from the success there. Absolutely right. And I think you’re right that there’s also this aspect of at a certain point, you’re just expected to be the person who knows enough to come up with the really good ideas. And that’s just the expectation. It’s no longer about can you execute on the ideas that other people give you? And that’s a big part of it as well.

[00:22:22] And I think there’s also the other thing I tell people sometimes when they’re thinking about getting promoted is if you create the idea yourself entirely, it’s blindingly obvious that it was you who did it. If you succeed and have impact in something that is a bigger project or someone else’s idea, you can still have a very successful career, but it’s a little bit harder for it to be directly attributable to you.

[00:22:57] But again, I think that again, it’s a roll of the dice at some level. There probably are people out there who’ve tried over and over and over again and just have never had the right idea or just had the right idea at the wrong time. I mean, I think one of the other things that is interesting about innovation that is disruptive is that it’s a combination of being the person who has the idea and being in a time in which the idea can take off.

[00:23:27] So you could have the idea, but it could just be the wrong time and it won’t do the same and it won’t go in the same direction.

Ryan:

[00:23:33] That point on promotions, I think, I mean, if you create the scope, not only is it obvious that the credit should go to you, but it also feels kind of permissionless, like you don’t need to wait for, I guess, management or someone to give you the opportunity. You can kind of create it. And so you have a lot more control in that process. One thing on the business strategy, I guess before we leave that for Kubernetes, is that at that time, Borg to me felt like a competitive advantage for Google, like some secret infrastructure sauce.

[00:24:08] I would have thought people would be kind of worried about giving away any part of that to the industry. So what was the thinking there? How. How did you convince people that hey, it’s okay?

Brendan:

[00:24:21] Yeah, I mean, I think that there was a little bit of that. And, to make it into a little bit of a joke or whatever, one of the things I said to people was, it’s not like you Men in Black flash people as they leave Google. It’s not like everybody comes to Google and just stays there forever. Right. And in fact, as we talk to people at Facebook, as we talk to people at Twitter, we talk to people at other scale-out tech companies.

[00:24:47] They were all building this stuff. It wasn’t really a secret. And there was also, I mean, Mesos at the time was, you know, not the same, but similar. And you could just see that there was going to be an open solution. And so in some sense part of the argument was like, look, there’s going to be an open source solution. Do you want it to be one that we can influence or not? It’s not like, do you want there to be an open source one or not. There’s going to be an open source one.

[00:25:06] Do you want it to be ours or do you want it to be someone else’s? And it’s just reframing the choice. Right. And making it clear that people understand that that is the choice. Right. That you don’t get to choose the proprietary option because it’s just not viable.

00:25:28 — Technical details on building Kubernetes

Ryan:

[00:25:28] When you were building that MVP for the orchestrator, how’d you decide? Because there were no customers or anything like that. So how’d you decide this is the minimum set of features that we need before we launch it?

Brendan:

[00:25:42] Sure. Well, I mean, I think absolutely, you know, we benefited from the fact that there were three of us working on it. Right. So Craig was a great product manager and Joe was a great engineer and fantastic at API design. And I could write code fast. Basically, I think that’s sort of it, if I had to stereotype all of us. You know, Craig was the product business guy and Joe was the, like, I know how to design, really good at design kind of person.

[00:26:11] And I was basically like, I can hack prototypes like there’s no tomorrow. And I think we reflected a lot about our own experiences. And also we had seen the pain of people deploying into traditional VM infrastructure. And so we had that kind of knowledge of the pain that people were going through. And then at the time there were people like Netflix who were talking about immutable infrastructure and they were kind of advancing some similar concepts.

[00:26:47] And so there was also kind of like a broader movement happening that we were taking part in. And so obviously there were literally no customers, but in some sense there were customers, they just weren’t customers yet. So it’s not like creating something brand new. I guess in some sense. I really don’t feel like Kubernetes was something that was brand new. I feel like it was a coalescing of a lot of ideas that were kind of circulating in the industry at the time.

[00:27:19] And it just became an anchor point and a really good expression of those ideas.

Ryan:

[00:27:23] So when you talk about, I guess you wrote a lot of code quickly, did you write most of the code, I guess for this initial MVP?

Brendan:

[00:27:34] Yeah, I don’t know what the number is, but high 80s percentage, maybe more of the original code. And I think I’m still number. I mean, I haven’t contributed significantly to Kubernetes in a while and I’m still, I think, number five on the overall contributor, you know, overall contributor list on the GitHub commits. And I was number. I was number one for a long time.

Ryan:

[00:27:55] I mean, after writing that much code for Kubernetes, which part of this system was the hardest to build?

Brendan:

[00:28:03] I think I’m going to say, like. Because I don’t think any of this specific code was that hard. I think that the hard part was the decision that we made early on that it was going to be a really loosely coupled system. And so it’s very. Which is great for resiliency. Like we made this decision around very loose coupling, a lot of independent actors taking actions. There’s all these control loops running all over the place, which is really good for resiliency.

[00:28:36] But when things go wrong, it’s really hard to figure out why it went wrong because you’ve got 15 different processes that are all having to work together to achieve an outcome. And so you can see that the outcome wasn’t achieved. But then you’re like, okay, but what happened? And now I have to sift through a bunch of different logs and a bunch of different operations of executables and sort of reconstruct in time what happened.

[00:29:03] And especially early on, we didn’t have very good. We didn’t have very much consistency around logging. We didn’t have very good consistency about events and things like that. And so I think the hardest part, I mean, this is the hardest part of anything, I guess.
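Editor’s sketch: "reconstructing in time what happened" across many processes amounts to merging per-component logs into one timeline. A minimal illustration, assuming each component’s log is already sorted and that timestamps are comparable across machines (an assumption that often fails in practice); the component names and messages are invented for the example.

```python
import heapq


def merge_logs(*streams):
    """Merge per-component logs (each sorted by timestamp) into one timeline."""
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))


# Hypothetical log entries: (timestamp_seconds, component, message).
scheduler_log = [
    (1.0, "scheduler", "assigned replica to machine-1"),
    (4.0, "scheduler", "replica still pending"),
]
agent_log = [
    (2.5, "node-agent", "pulling image"),
    (3.0, "node-agent", "image pull failed"),
]

for timestamp, component, message in merge_logs(scheduler_log, agent_log):
    print(f"{timestamp:>5.1f}  {component:<10}  {message}")
```

With consistent timestamps and event formats this is mechanical; without them, the merge itself becomes guesswork, which is the difficulty being described.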

Ryan:

[00:29:22] ...are all distributed everywhere and...

Brendan:

[00:29:25] Yeah. And everything’s out of time sync and I mean. And hopefully you logged the right things. But a lot of time early on, like you didn’t log it. Like, you know, and because it’s an interaction effect, it’s hard to reproduce. Oftentimes it’s hard to reproduce the problem. Like if the problem reproduces easily, then it’s pretty easy to fix, even if you don’t have the logs, because you just go and add the logs and then you do it again and you see what happens.

[00:29:52] But for problems that are transient, where it’s just a race condition between two or three different things happening, you can go in and add the extra logs, but then you have to figure out how to make it happen. And that can be pretty tricky. And also, I mean, I’ll say the other thing is we were all learning Go. So maybe the other thing was there’s some gotchas in Golang and we were all kind of learning all the gotchas on the fly.

Ryan:

[00:30:19] I would have thought there’s something that’s controlling everything, right? Like there’s some leader or maybe some nodes that are looking at one of them to kind of coordinate everything. Like leader election if the leader goes down, I would have thought would be some really challenging problem.

Brendan:

[00:30:37] Yeah, I mean, well, I think the reason it wasn’t that big a deal was because we relied pretty exclusively on etcd to do that for us.

Ryan:

[00:30:50] etcd is another framework, or open source?

Brendan:

[00:30:54] Yeah, Etcd is an open source. It’s really part of Kubernetes now. But at the time it’s a RAFT based consensus system key value store. And so it was a pre existing piece of software that CoreOS had written that implemented the RAFT protocol because at the time, because Paxos was the original for this and Paxos is really hard to implement because the algorithm is really complicated and nobody understands it.

[00:31:25] Well, probably somebody understands it, but not a lot of people understand it, but it’s provable. And then right around that timeframe people had come up with Raft, which is a provably correct consensus algorithm, but it’s way easier to implement. etcd implemented the Raft protocol and then gave you this consensus-based reliable store, so that you had multiple replicas and it would do the consensus there.

[00:32:00] And it doesn’t do leader election exactly, but it gives you enough primitives that doing leader election is relatively straightforward. And I guess I would also say two things about that. One is we decided to force all of the access through an API server. And this was actually something I pushed really hard initially, but I think there was general agreement, but I definitely pushed for it, was that nobody got to use store, everything had to be remote store.

[00:32:33] Like nobody got to write stuff to disk themselves. Every piece of the system had to use the API server and had to use etcd behind the API server as the way that it did any sort of persistence.
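To make the leader-election point concrete, here is a minimal Python sketch. It is not etcd's API and not Kubernetes code. It assumes a store that offers one atomic operation, "claim this key unless someone else holds an unexpired claim"; `TinyStore`, `Lease`, `acquire_or_renew`, and the key name are hypothetical, and the current time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class TinyStore:
    """Hypothetical in-memory stand-in for a consistent key-value store.

    A real consensus-backed store would make acquire_or_renew atomic across
    replicas; here it is trivially atomic because everything is in one process.
    """

    def __init__(self) -> None:
        self._leases: Dict[str, Lease] = {}

    def acquire_or_renew(self, key: str, candidate: str, ttl: float, now: float) -> bool:
        """Claim `key` if it is free, expired, or already held by `candidate`."""
        lease = self._leases.get(key)
        if lease is None or lease.expires_at <= now or lease.holder == candidate:
            self._leases[key] = Lease(candidate, now + ttl)
            return True
        return False

    def leader(self, key: str, now: float) -> Optional[str]:
        """Return the current holder, or None if the lease is missing or expired."""
        lease = self._leases.get(key)
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None


store = TinyStore()
store.acquire_or_renew("scheduler-leader", "a", ttl=10, now=0)   # "a" becomes leader
store.acquire_or_renew("scheduler-leader", "b", ttl=10, now=5)   # refused, lease still live
store.acquire_or_renew("scheduler-leader", "b", ttl=10, now=11)  # "a" expired, "b" takes over
```

With a primitive like this, each candidate retries `acquire_or_renew` on a timer: the current leader keeps renewing, and if it crashes the lease runs out and another candidate's next attempt succeeds.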

Ryan:

[00:32:48] What’s the main benefit of that?

Brendan:

[00:32:49] The main benefit of that is that everybody gets to restart all the time and they just come up and they work. So you don’t have to worry about corruptions, you don’t have to worry about schema changes, you don’t have to worry about any of the like. Everybody was effectively stateless except for the database. Like there was, there’s the etcd consensus algorithm database and that’s the only place where there was state.

[00:33:11] And so as a result, the whole system was just a lot easier to make stable. The downside of it is it leads to this loose coupling where it’s a bunch of independent loops mediating everything through this storage layer, which made the debugging part harder. Those are the trade offs. If you have a complete log of I’m in. If you think of it as a state machine, it’s much easier to understand where you are and where you got to if you’re in a state machine.

[00:33:49] But state machines are a nightmare to make reliable. They’re easy to debug, but they’re hard to make stable. Whereas the system we built was designed to be stable, but hard to debug. The trouble with the state machine is a state machine says the world looks like this and unless you get it exactly right, sometimes the world looks like something you didn’t imagine. And at that point you’re kind of screwed because your state machine doesn’t know what to do.

[00:34:20] Whereas because we had these control loops that were based on a desired state and a current state and trying to drive the current state to the desired state. Like no matter where you woke up and found yourself, you kind of knew where you were supposed to drive to. And that. And that’s the stable part. That’s the stable part of it is that like it didn’t really matter where the system got itself. It was always trying to drive towards the desired state.

[00:34:48] You know, inspired honestly a lot by like control theory from robotics actually. Like.

Ryan:

[00:34:53] Oh, like PID.

Brendan:

[00:34:54] Yeah, the same idea. Right. Like you could imagine if you tried to write a PID controller to balance a beam with a bunch of if else loops. It just doesn’t work very well.
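The contrast between a state machine and a control loop that drives current state toward desired state can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Kubernetes code: the "world" is reduced to a single replica count, and `World`, `reconcile`, and `converge` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class World:
    running: int  # observed number of running replicas


def reconcile(desired: int, world: World) -> str:
    """One pass of the loop: compare observed to desired and take one step."""
    if world.running < desired:
        world.running += 1
        return "start"
    if world.running > desired:
        world.running -= 1
        return "stop"
    return "noop"


def converge(desired: int, world: World, max_steps: int = 100) -> int:
    """Call reconcile until nothing is left to do; return the number of actions."""
    steps = 0
    while steps < max_steps and reconcile(desired, world) != "noop":
        steps += 1
    return steps


# The loop does not care how the world got into its current state.
for start in (0, 3, 7):
    world = World(running=start)
    converge(3, world)
    print(start, "->", world.running)
```

The loop never records which transition it last made. It only looks at where the world is now and where it should be, so it behaves the same whether it starts from zero replicas, from too many, or from a state nobody anticipated.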

Ryan:

[00:35:04] And this kind of reminds me, because I was reading in some of the design, I guess it’s a feature of Kubernetes, is that it’s declarative rather than imperative. So you kind of just say, I want this to be true, I don’t care how you get there, instead of saying, just do X, Y and Z. I’m curious the pros and cons of that, that design decision.

Brendan:

[00:35:27] Well, yeah, I mean, and that was something that was happening broadly in the industry, like that’s a part of the whole like infrastructure as code movement that was happening at the time. So we’re not the only ones who said that, but we definitely embraced it. You know, I mean, I think the benefit is, you know, you have clarity about the way you want the world to work. Right? It’s not like if you, if you, if you execute a bunch of instructions, start this, run that, do this, you’ve done a bunch of stuff in pursuit of some objective, but you never wrote down what the objective was.

[00:36:04] There’s no record of what you were trying to achieve. I’m trying to create a website. Well, you didn’t write that down. You just took a bunch of steps. With a declarative approach, you actually write it down, you say, I’m trying to create a reliable website. And here’s what a reliable website looks like. Hey system, could you take the steps to get there? And so you have that record and it obviously makes it easier for things to be self-healing because if you’ve written it down, now I know where I’m supposed to go back.

[00:36:34] If I get perturbed from that state, if something fails or something restarts, well, I know where I’m supposed to go back to. And similarly it has side benefits of once I write it down. Well, I can apply code review to it, I can apply unit tests to it. There’s a lot of the mechanics of how we do software development that apply once you write down that declaration. So a lot of those are the benefits.

[00:37:02] I think the downside is probably just complexity in comparison to going click click through a wizard or whatever in a GUI. Learning, you know, everybody complains about the YAML and I have to learn all this stuff and you know, like it does introduce a learning curve. Now I think fortunately at this point there’s enough education material out there, and Gen AI too for that matter, that it’s not that bad a learning curve.

[00:37:33] But certainly in comparison to what people have done before, that’s probably the biggest downside. But I don’t know, I think that the upsides are like up here and the downsides are like way down, like there’s, there’s not a lot of downside.
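The declarative idea in this answer, writing down the desired result and letting the system work out the steps, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how Kubernetes computes changes: state is a plain mapping from a workload name to an image string, and `plan` plus all the names and images below are hypothetical.

```python
from typing import Dict, List, Tuple


def plan(desired: Dict[str, str], observed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compute the actions that move `observed` to `desired` (name -> image)."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(observed):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# The declaration is data, so it can be code reviewed and unit tested.
desired = {"web": "web:v2", "cache": "redis:7"}
observed = {"web": "web:v1", "old-job": "job:v1"}
print(plan(desired, observed))
```

Because the declaration is just data, it can be stored, diffed in code review, and checked in a unit test, and the same `plan` call can be rerun after any failure to find the way back to the written-down state.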

Ryan:

[00:37:45] I see. Yeah, I could also see that being helpful for I guess if you want to optimize anything under the hood because you’re just making a promise to people that this is going to happen, but if you want to do it in a more efficient way or something like that, then I guess it just gives you all the power to do so.

Brendan:

[00:38:02] Yeah, well, I mean, and that does make things like machines failing a lot easier because people don’t say, run this on this machine. They just say, I want three of this to be somewhere. If a machine fails, well, it just moves somewhere else because the application isn’t tied. Because I can’t. In some ways, I don’t know your intent. If you log into a machine and you start a process on that machine, is it because you wanted a process or is it because you wanted a process on that machine?

[00:38:31] I don’t know. You didn’t tell me. And so if that machine fails, what should I do? Well, I don’t know. Right. But if I. But if I know you said, hey, I just want three replicas, well, then I know it doesn’t matter that it’s on that machine. It could be on a different machine. You’d be just as happy.
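The "I just want three replicas" point can be sketched as follows. This is a hedged Python illustration, not the Kubernetes scheduler: a placement is a list with one node name per replica, and `reschedule` and the node names are hypothetical. Replicas on failed nodes are replaced on the least-loaded healthy node.

```python
from collections import Counter
from typing import List


def reschedule(desired: int, placement: List[str], healthy: List[str]) -> List[str]:
    """Return a placement with `desired` replicas, all on healthy nodes."""
    if not healthy:
        raise ValueError("no healthy nodes to schedule onto")
    # Keep replicas whose node is still healthy, up to the desired count.
    kept = [node for node in placement if node in healthy][:desired]
    load = Counter(kept)
    while len(kept) < desired:
        # Pick the least-loaded healthy node; ties go to the earliest in `healthy`.
        target = min(healthy, key=lambda node: load[node])
        kept.append(target)
        load[target] += 1
    return kept


placement = reschedule(3, [], ["n1", "n2", "n3"])
# Node n2 fails and n4 joins; the stated intent is still "three replicas".
placement = reschedule(3, placement, ["n1", "n3", "n4"])
print(placement)
```

The caller never names a machine, so losing a node is not an error from the user's point of view. The replica count is the only stated intent, and any healthy node satisfies it.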

00:38:46 — Rallying the open source community

Ryan:

[00:38:46] I want to shift a little bit to kind of when Kubernetes was scaling, and it sounds like a large part of this was getting buy in from other companies and other people. And so how did you get the buy in from. I know OpenShift was an important part of Red Hat and other companies that join on. How did you sell those companies that Kubernetes is what you want to use?

Brendan:

[00:39:14] Well, I think for a lot of them, especially in the early days, it was kind of that quote around undifferentiated heavy lifting. They had some other objective. OpenShift was trying to build a platform as a service or, you know, they were, you know, a lot of our early users who were also contributors, you know, they were trying to build some sort of reliable web service or something like that. Right.

[00:39:36] And so it was like, well, we’re going to have to build this thing anyway. Why don’t we all build it together? And we don’t really care because we don’t think that’s our value. Like our. We don’t think our value is tied up in that layer. So we’ll go contribute to your thing because we’re going to get more value out of the collective than out of trying to do it ourselves. And so for a lot of the early partners, that was a big part of the argument was like, hey, we’ll let you in.

[00:40:03] And part of that is making sure that they understand that they’re going to like that they’re going to be equal partners where it’s not like, because it’s one thing to take a dependency on something, but then you’re kind of like taking dependency on someone else’s roadmap. And so it was really important also to say, hey, you can come take a dependency on us, but also we’ll give you a seat at the design table.

[00:40:27] So when you need new features, you can contribute those features. And here’s what we’re trying to achieve and it matches up with your roadmap and that kind of stuff. So I think that’s how we approached it. And then over time, people became more and more interested in being part of it because there was a growing ecosystem. So when you look at like networking providers or storage providers, as their users were starting to become Kubernetes users, they were motivated to make sure that their networking system worked well with Kubernetes or their storage system worked well with Kubernetes and things like that.

[00:41:03] So that was sort of a secondary layer of partner discussions that we had.

Ryan:

[00:41:06] Right. And that’s downstream of becoming the dominant player. And I guess that’s validating the open source strategy, which is you become dominant, everyone’s kind of got to integrate with you and all of that. How did you prevent Google from dominating in the roadmap or I guess controlling what Kubernetes would be given that it started at Google, funded largely by Google?

Brendan:

[00:41:30] Yeah, I mean, I think that was really important and I think it was a critical part of gaining adoption. Right. And becoming the industry standard was giving it independence. And I think there’s two pieces to that. The first is getting it to a foundation. So it was only a year in that we created the Cloud Native Computing Foundation and donated all of Kubernetes to the Cloud Native Computing Foundation.

[00:41:58] And so getting the project, the logos, all of the legal stuff, trademarks, all of that stuff into an independent software foundation with the Linux Foundation was critical. Right. Because it’s hard to partner if somebody else has trademarks on the Kubernetes logo or whatever. And then I think the other piece that was important, that came a little bit later, was writing down the governance rules. So for the first few years, Kubernetes didn’t really have any governance rules written down.

[00:42:38] It was a mistake, I would say. Right. Like we didn’t realize how it was really. It was something we should have done earlier, but we didn’t. And so we sat down in 2016 to write the governance rules. And I think all of us were aligned on this idea that we didn’t want any one company to be able to take control of the community. And we really built the community and the rules of governance to be democratic.

[00:43:09] You know, we’d never. I mean, I think that’s an aesthetic from Craig and Joe and I. We never set out to be like a benevolent-dictator-for-life style project. We always set out to be a distributed, you know, distributed ownership, democratic kind of project. And we codified. We codified a lot of that into the governance docs that, you know, continue to this day. So I think both those things together really helped make sure that it was an industry standard and not any one particular company standard.

[00:43:39] But also, I think, were critical to its success. I think they’re duals of each other. You can’t have one without the other.

Ryan:

[00:43:46] People wouldn’t have come on if they didn’t see that it was governed well and open.

Brendan:

[00:43:52] Yeah. Because obviously, if you’re thinking about adopting it or you’re thinking about putting it in your service, the thing you’re worried about is whose roadmap am I betting on?

Ryan:

[00:44:02] When you said governance, is that literally, when I think of government, there’s a constitution somewhere?

Brendan:

[00:44:09] That’s literally what we wrote.

Ryan:

[00:44:11] Did you write that yourself or is that something that lawyers do?

Brendan:

[00:44:14] No, we wrote it ourselves. In the span of about. It was a couple, three fairly intense meetings amongst six or seven of us. We got together and just kind of talked it through and looked at a bunch of other communities and kind of like, what had worked, what had not worked, what were we worried about, what were we trying to achieve? Some of it was codifying stuff that already existed. So we had some loose organization stuff that already existed in sort of a de facto way, but didn’t exist in an explicit way.

[00:44:56] Some of it was. We created this steering committee that had never existed before. And we just basically. And we were lucky, I think, that we were able to gather so the people that came together. We called it the Bootstrap Committee. We were lucky in the sense that we had enough people who kind of were not who the entire community would look at as being leaders. And we weren’t fighting with each other, we weren’t infighting.

[00:45:29] So we were all kind of aligned and we kind of got everybody. So we didn’t have to be like, oh, we grabbed this side and not that side. We kind of. We grabbed. We were able. And it was like seven people, I think seven or eight people. We were able to pull together a group of people that really represented everybody and that everybody kind of all respected each other and respected each other as leaders in the space.

[00:45:52] And a lot of credit there, I think, goes to. I mean, everybody who is involved deserves a lot of credit. But Sarah Novotny, who was our community leader at the time, deserves, I think, a ton of credit for bringing that thing together.

Ryan:

[00:46:07] When you look back on Kubernetes, because with an open source project there’s obviously the read aspect, which is everyone can use and duplicate this code and execute it. But there’s also, I guess the writing part, which is people making contributions. What percent of the contributions actually come from the community and what percent is actually just the main stakeholder companies just putting in their code?

Brendan:

[00:46:33] Yeah, I don’t have the specific numbers for Kubernetes, but my experience in open source says it’s like 80, 90% the core contributors and less than 10% to other people. It’s hard, I think in general it’s really hard to get people to contribute. Part of it is companies, honestly. Right. Like, you know, companies like Microsoft. We make a commitment to contributing to open source. And so, you know, we at a leadership level, we’ve decided that this is something that we want to invest in.

[00:47:06] And so we’re willing to have teams of people who specialize in working in upstream open source projects. But for a lot of users of Kubernetes, you know, they’re a retailer or they’re a banking industry or they’re like it’s tech isn’t their core thing that they’re doing. Tech is a means to an end, to deliver an app for their user. And in that world it’s pretty hard to justify. Well, I’m going to take 10% of my people and I’m just going to do upstream open source contributions.

[00:47:36] And especially if the leadership is like not a technical leadership. And so they didn’t necessarily grow up in those communities. If you grow up in finance, it’s hard to explain what’s the value of contributing to the. The value of taking the open source is very clear. Right. It’s free, but the value of contributing back, it’s harder to explain or legally. Like we ran into people. Even today it’s getting better, I think.

[00:48:00] But even today I’ve run into people who say we would really love to contribute. Our engineering leadership is aligned, that we would want to contribute, but our legal team is worried that if we contribute we’ll be liable if we introduce bugs.

Ryan:

[00:48:15] Right.

Brendan:

[00:48:16] So like if.

Ryan:

[00:48:17] Would someone sue them or.

Brendan:

[00:48:18] Yeah, I think that’s what they’re worried about. I don’t. It doesn’t hold water legally and I think the Linux Foundation can give you plenty of like case law and things like that to show why it doesn’t hold water. But sometimes that’s enough to block someone from contributing.

Ryan:

[00:48:37] I never thought someone would get sued for adding a bug. I mean, everyone adds bugs on accident.

Brendan:

[00:48:44] Well, but, I mean, but on the other hand, like if you write a proprietary piece of software and you sell it to somebody and it has a bug and it causes your house to burn down, like you can imagine you’re going to sue the people, right? So like it is sort of a legit worry at some level of like if I wrote the JPEG open source JPEG library that ended up in the smoke detector that caused the house. Yeah, like you can sort of imagine the chain of logic that gets you there.

[00:49:08] I don’t think it’s true. I don’t think it would hold up. I think a lot of the licenses, you know, a lot of the open source licenses include indemnification language that basically says if you use this, you’re using it at your own risk. And like you can’t sue us if it burns down your house. But I think that that’s a worry for, I mean. Well, I’ve heard, no, I don’t think, I’ve heard from people that their companies do have that worry.

[00:49:34] And again, in some sense it’s because they’re like, well, what’s the value if I see this potential risk and I don’t necessarily see the value. And I mean, and like, also, like again, if it’s not a core thing, if you’re not a tech company, you know, is that developer really capable of like arguing with legal about what you can and can’t do? Probably not. Right. Like they’re probably just going to fade away.

[00:49:56] Right. So, you know, there’s that aspect too.

00:50:01 — Scaling Kubernetes up for AI training workloads

Ryan:

[00:50:01] I remember this is many years ago. I read this blog post that OpenAI put out before. I think OpenAI was kind of huge and it says, here’s how we scaled Kubernetes to 7,500 nodes or something. Crazy.

Brendan:

[00:50:15] I forgot.

Ryan:

[00:50:16] Yeah, yeah. And so I want to know, there’s these new workloads coming in for AI. There’s training, which is this huge, I guess, all-at-once workload. And then there’s inference, which is latency sensitive and you kind of need it to come out instantly. How has Kubernetes adjusted over the years to handle these kinds of workloads?

Brendan:

[00:50:43] Yeah, I mean I remember when we couldn’t really handle more than about 100 nodes. So it’s definitely been a lot of optimization in the core systems. And there’s places where the APIs were pretty noisy and we needed to reduce the noise level or we needed to extract components into another component so that you could scale. etcd in particular actually could be the main bottleneck to that kind of scale.

[00:51:14] And so figuring out how to run etcd really well is a core part of figuring out how to run Kubernetes really well at scale. I mean, I don’t think it’s that different than learning how to run a database or anything else like that at scale. Large scale is just weird and you just have to run it, see where it breaks, figure out how to fix it, rinse and repeat. I do think what’s interesting is that while AI training, as an example, is a really large cluster, large scale kind of thing, I think by virtue of being in the cloud, a lot of our users actually have much, much smaller clusters, but lots of them.

[00:51:57] So hundreds or thousands of clusters where each cluster itself is a little bit smaller. And I think that’s not something we anticipated because we came from a world of physical data centers where you only want one because you don’t want to have to set it up a bunch of times, you just want to set one up for the entire data center, call it a day. But because of the cloud, because with AKS you press a button and it pops up in two minutes, it’s really easy to get yourself a cluster.

[00:52:25] So people create lots of clusters. I think we’ve also invested a lot in the Kubernetes community and in Azure as well on managing lots of clusters. How do I manage clusters at scale? I think one of the jokes we spent a lot of time talking about containers is replacing snowflake servers. Not Snowflake the company, but specially handcrafted servers. And now we just have a bunch of snowflake clusters.

[00:52:56] So the VMs all look the same, but the clusters are all weird. So we have to provide people with tools to make sure that the monitoring software is the same on all of them and that all of the Kubernetes versions are the same and all the admin users are the same and all this kind of stuff. So that’s another aspect of scale out that I think we didn’t anticipate that we had to go and build, which is number of clusters as opposed to size of cluster.

Ryan:

[00:53:21] I always hear in the news that the anticipated scale is even higher than today’s unprecedented scale. And I see people are purchasing GPUs like crazy. I’m curious, is there any upper bound where Kubernetes just cannot handle that load? Let’s say you 10x it from where it is today. Is it going to break down at some point and you need something more custom?

Brendan:

[00:53:51] Well, I mean, I think it all comes down, it comes back to the storage layer because everything again, because there was this design decision that everything routes around the storage layer. Everything else is basically horizontally scalable. So you have more API requests coming in from more nodes. Well, you just need more API servers. You want to do scheduling faster. Well, you need to just have more schedulers.

[00:54:21] Everything else more or less, you can just horizontally scale out. It’s the storage layer that is the bottleneck and that’s where the work comes. And so you want to say go 10x up. Well, you’re going to have to probably figure out if you can make etcd scale that way or if you need to replace etcd with something else that has the same characteristics but can operate at scale. So I don’t think there’s anything inherent in the design that would prevent it.

[00:54:51] But obviously there’s a famous quote that every time you change an order of magnitude, the problem moves. And so I think that’s really true. Every time you increase by an order of magnitude what you thought was the main problem is going to become easy and then the problem moves somewhere else. So you were network constrained, now you’re CPU constrained, you were memory constrained, now you’re network constrained.

Ryan:

[00:55:12] Yeah, that’ll be cool to watch how. Because it seems like everyone wants to.

Brendan:

[00:55:16] Yeah, I think it’s definitely the case that people continue to try and push the limits of scale. But I think like anything else, like when there’s motivation, people go and figure it out. As long as there’s not something inherent in the design.

00:55:31 — Reflections on getting a PhD

Ryan:

[00:55:31] At the last part of this conversation, I just wanted to reflect over your career a little bit, maybe ask you a few questions about things. And you mentioned that you had a PhD in robotics. And I hear a lot of people say they don’t recommend PhDs. Some people do. Curious what your take on getting a PhD is.

Brendan:

[00:55:49] Yeah, that’s probably like if I had to have a top 10 questions or top five questions that people ask me. That’s definitely in the top five top 10 questions. And I guess I’ll answer it with two different stories. One story is that at one point in my career I ran into a guy, same company, this guy, who I’d went to undergrad with and he’d gone off and done startups and done the tech industry thing and ended up in the same company that I was working at and I’d gone off and done my PhD and come back into the industry and we were at the exact same level.

[00:56:25] We graduated the exact same year, same degree and we were at the exact same level in this company. And so I guess that’s one way of me saying like, eh, it probably doesn’t matter. It probably doesn’t matter one way or the other. But I’ll also turn it around and say, hey, I had a lot of fun, right? So I had a lot of fun doing a PhD in robotics, so that’s worth it to me. And then two, I think I learned a lot about how to, I think from the PhD and my PhD advisor, I learned a lot about how to write and put my ideas forth in both written and presentation form that you don’t necessarily learn in the industry.

[00:57:10] And I think that benefited me. I think that benefited my ability to argue. And we talked about that six month period where we were arguing for why we should be allowed to open source this thing. I think the skills I learned in terms of presentation and in terms of writing benefited me during that time and have continued to benefit me. And then I think when I went out as a professor for a couple years teaching CS101 and having to explain stuff to students who didn’t really know anything about computers, I think really helped me organize the initial parts of the Kubernetes project so that somebody could learn about Kubernetes because people were coming in and being like, what is a container?

[00:57:54] What is orchestration? How do I do this? There’s a lot of just teaching that you had to do. And I think that experience as a professor thinking about how do I teach students something really helped me do a good job with teaching Kubernetes to people. And so I think those things were really beneficial. And so I guess there’s the two different arguments, which is one is it doesn’t matter. The other is I learned a ton of stuff that I think was pretty useful to my career and I had some fun earlier.

Ryan:

[00:58:25] You said top questions that people ask you and I’m kind of curious, what’s the top question?

Brendan:

[00:58:31] I think the one, the other one that I get a lot is how do I know what I should learn? Like a lot of, especially when I talk to the interns or the first couple years, a lot of the questions are revolving around like AI seems really hot right now, but I’m really interested in systems, like should I go learn AI because it’s hot or should I learn systems because I think systems are interesting. I actually kind of don’t care what you learn, I care that you’re learning.

[00:59:01] And so the most important thing is to find something that you’re excited about and energized about because you’ll do that instead of watching YouTube. So if you’re not excited about AI, well, you’re probably not going to do a very good job learning AI, which means that you’re kind of going to waste your time. But if you’re really excited about systems, you’ll probably put a lot of passion and energy into it.

[00:59:24] And we still need systems engineers. I think that’s a pretty popular question. I think there’s a lot of, I sense anyway, a lot of fear of making the wrong decision. And I always tell everybody, like, there was no plan. I’ve never had a plan for my career, like never, ever, ever. Like, I’ve always just chased after things that I thought were useful and were fun and interesting and you know, obviously like, that can work out badly for people.

[01:00:00] I’m sure it’s good to have a plan, probably for some people. But like, I also want to make sure people understand that like when you look back, sometimes the things you think were mistakes or dead ends, like actually were critical things that taught you stuff. And so worrying about, did I choose the wrong thing? Am I going to choose the wrong thing? As long as you’re learning, you’re probably doing okay.

01:00:22 — The inevitable trajectory of software is death

Ryan:

[01:00:22] I mean, what you’re describing, it reminds me of, I don’t know if you’ve seen that Steve Jobs commencement speech, but he literally says exactly that. You wrote down somewhere, I don’t fully remember where you wrote this down, but I have this in my notes. It says the inevitable trajectory of software is death. And I just can’t imagine Kubernetes dying. But how do you see that happening and what do you think about that if it did?

Brendan:

[01:00:50] I mean, I definitely stick by that statement. Although I think that the sentence before I said that was, you really should never fall in love with your software because the inevitable trajectory of software is death. Which means don’t stick with it, don’t stick with it past when it’s dying. You should always be willing to throw away stuff. Just don’t stick with it just because you wrote it. You should always be willing to throw it away.

[01:01:19] But obviously I think if you look historically across the industry, it’s true too. And quite frankly, even within Kubernetes, the source code that I wrote has been rewritten a number of times over the 10 plus years history of the project. So what does it look like? I mean, I think it looks like something coming along that achieves similar things, but easier with less complexity and more utility. And I think that I can imagine what that looks like.

[01:01:57] I think some of this natural language stuff, if you could actually really get it to be an interface that worked 100% of the time, like obviously it’s way easier to come in and say I would like a reliable web service than it is to say YAML, YAML, YAML, YAML, YAML. You know, I think there’s sort of two different trajectories. Like sometimes the trajectory is it goes away, sometimes it just becomes so hidden that nobody sees it, right?

[01:02:21] Like underneath Linux there’s, I mean, excuse me, underneath Kubernetes there’s Linux and underneath Linux there’s a processor. But you know, people don’t pay much attention to that and there’s a lot of attention now on AI and underneath a lot of the AI is Linux. But it could be that people focus so much on the AI that they forget about the Kubernetes part. And I think that’s happening already. Honestly, like, I feel like if I look at the volume of changes and things like that, like I think it’s, you know, I think we’ve sort of plateaued in terms of like the amount of change that’s driving through the system.

[01:02:54] Stuff needed to support AI is kind of like the exception to that category. But you know, I’d be shocked. I mean, I guess put it this way, let me take the long view and say in a hundred years, is Kubernetes still going to be running? I’d be pretty surprised, right?

Ryan:

[01:03:13] No way.

Brendan:

[01:03:14] It’s hard to imagine, right, that that would be true. I mean, I don’t know, we haven’t had computing systems for long enough to maybe know for certain. And there are things that we still use. I mean, there is some stuff that we still use from back then. Plugs are still the same shape, ish, stuff like that. So maybe, but even like something like x86, like if you’d asked me six years ago and said, is the x86 processor going away?

[01:03:40] I’d say like, well, maybe on, I mean, obviously on mobile it did, but like in the server maybe not. But now two things have happened. One is all the processing is on GPU now and two, ARM64 is now a pretty important platform on the server for energy usage and other reasons. And so it’s pretty dangerous to predict the future because it has a tendency to show up sooner than you predict or longer than you predict too.

[01:04:10] Right. Self driving cars. I’ve heard self driving cars were five years away for the last 15 years.

01:04:16 — Top book recommendations

Ryan:

[01:04:16] Yeah, me too. I don’t know if you read books for career’s sake, but if you do, is there a book that impacted your career the most?

Brendan:

[01:04:27] Well, I mean, I would say early on the book that impacted my career the most was a book. It was the Gang of Four book. Was it Software Engineering Designs and Patterns or whatever? Like it’s a software engineering book.

Ryan:

[01:04:38] I see it. It’s Design Patterns: Elements of Reusable Object-Oriented Software.

Brendan:

[01:04:43] Yeah, there you go. So that, like early on that was a very influential book. It’s like a late 90s or mid-90s kind of book. There’s a much more recent book called Leadership on the Line that, as I’ve become sort of a large org leader, I really like. And then there’s this. What’s this book? It’s called, I think, Five Dysfunctions of Teams. I think that’s a really good book too, from a, like, how teams operate perspective.

Ryan:

[01:05:07] If I’m understanding, if you’re an engineer, check out that first book. If you’re a manager or a leader, check out the second two books.

Brendan:

[01:05:14] Yeah, that’s probably about right. Yeah, I think that’s right. And it’s an evolution over time. Right. Maybe you’ll do both.

01:05:22 — Advice for his younger self

Ryan:

[01:05:22] Last question for you is if you could go back to yourself when you just graduated college and give yourself some advice, what would you say?

Brendan:

[01:05:31] Keep better notes. I feel like there’s a great MBA thesis or a great book in the whole Kubernetes journey and beyond. And like, I just don’t have enough notes to do that, to write it down. You know, we went through a lot, like a lot of different stuff happened and I remember some of it and I don’t remember a lot of it. And it would have been nicer if I’d kept better notes, I feel like.

Ryan:

[01:05:57] Right, well, you got all the code there. Maybe an LLM can parse it or something.

Brendan:

[01:06:02] Yeah, it’s not so much about the notes part. It’s not so much the code part. It’s like all of the, like the stuff you were talking about, like all the partner discussions, all of the, like, interpersonal stuff and you know, all that kind of stuff. And like, I remember a lot of it, but I don’t remember all of it.

Ryan:

[01:06:18] Cool. Well, thank you so much for your time, Bren. I really appreciate it.

Brendan:

[01:06:21] Yeah, for sure.
