Meta Senior Staff (IC7) Eng's Honest Demotion Story

Learnings on what level made him happiest, Meta vs Google culture, navigating the demotion

In this episode, I talked to Igor, a senior staff engineer who has worked at Meta, Google, and Cruise. We discussed his experience of wanting a demotion at Meta and the challenges he faced in that process.

Check out the episode wherever you get your podcasts: YouTube, Spotify, Apple Podcasts.

Timestamps

00:00:37 - Why he wanted a demotion

00:07:32 - Why Senior Staff at Meta was different

00:16:01 - Meta vs Google culture

00:19:09 - Downleveling at Google

00:23:17 - Why he’s willing to be transparent

00:25:11 - Best quality of life eng level

00:30:42 - Senior Staff promo at Google

00:42:27 - Mentorship stories

00:43:11 - Biggest career regret

00:46:46 - Advice for younger self

Transcript

00:00:37 — Why he wanted a demotion

Ryan:

[00:00:37] What makes you willing to share? Here’s the full episode. You mentioned in a post, a pretty famous post that I’ll link so people can see that Meta didn’t have a process for demotion and you were looking for demotion. Can you talk about what led you to wanting a demotion at Meta?

Igor:

[00:01:02] Yeah. So when you join a big company, and Meta is not the exception here, but when you’re joining a big company at a very high level, like senior staff, there are certain expectations for your performance, right? Come performance review, which at Meta is called PSC, you’re being judged against other senior staff engineers in the organization. Right. And Meta, as of a year ago, they started to lay off people.

[00:01:33] They started doing the Amazon thing, essentially, let’s lay off the 10% lowest performers. So what it means is that you have at most one year to ramp up to be comparable in performance to other old timers in the company. And also, what does it even mean to be an E7 in a company like this? Right. You need to be an expert, a big expert in the field. You need to know a lot of people, you need to understand the infrastructure that you’re working with very deeply.

[00:02:10] You probably need to be easily familiar with any piece of code that your team is working with. And also you need to know what all the surrounding teams are doing. And you need to know all these people, like all the core people around you, and they need to know you and they need to trust you. Right. That’s a very difficult thing to achieve within a relatively short period of time.

[00:02:37] I initially saw that, okay, so again, joining Meta, day one, I know less than an intern sitting next to me who was hired two weeks prior to me. So I start from level zero, and then hopefully you’ll be able to get to something that an L3 would perform. So you can take some very easy task and, under supervision from other folks, you can kind of promote yourself to a junior engineer.

[00:03:12] Then you do some more stuff, you learn additional things. You can get promoted to a less junior engineer like E4. And then slowly you ramp up to higher and higher levels. Essentially you start from zero and you climb the ladder as fast as you can. Yeah. So you ramp yourself up to the level that you’re supposed to achieve and, you know, the more senior you are, the more levels you need to jump through.

[00:03:41] And I believe that within this year and two months that I spent at Meta, maybe if I’m generous with myself, I maybe achieved like E6. So I don’t feel like I reached E7.

Ryan:

[00:03:56] What did you see that made you think you weren’t living up to the expectations?

Igor:

[00:04:00] It’s a self-judgment, mostly. You know, this scale where we call out numbers, it’s not like quantum states. Right. There are many things in the middle. Right. It’s a continuous scale, so it’s very hard to say where you are on the ladder. But again, comparing myself to other senior staff engineers, the work that I knew, I felt like if anybody is going to be on the chopping block, that’s going to be me.

[00:04:40] That was one thing about this ramp-up process. And the second thing, I also noticed that I really enjoyed doing the coding stuff. Just sitting down, debugging things. That’s what I really loved about the job. So during my ramp-up, when I was around like E5, E6 territory, that’s what I really wanted to do. Right. I felt like, you know, as someone who was programming from the age of 12, I really like coding, debugging, designing, mentoring more junior people.

[00:05:18] But like a senior staff engineer, it’s more than that. It’s like a leader who you know, spends most of the time in meetings and design docs and touches much less code. And it’s a slightly different type of work. And just by going through this process I realized that actually I want to be in E5, E6 territory. And then I asked the management, can I actually drop a level? And I understand that for the company it’s a difficult thing to implement because I already have some granted stock, I have a certain compensation package and so how do you like, you know, how do you execute that?

[00:06:06] It’s not a well polished process. I know that they can switch levels when you go from one title to another. So let’s say you were a director and you want to become an IC. So they do have some of that, like a demotion process, and frankly I don’t know how they do it. But yeah, staying in the same job category and dropping a level, they just said it’s not possible.

Ryan:

[00:06:36] I see. So when you asked if it’s possible for demotion, your manager went to HR and kind of the result was just, it’s just an impossible thing to do.

Igor:

[00:06:47] You know, I asked the question and I got the answer no. Maybe if I pushed harder, if I like went to talk to the, you know, VP or something, they would have made it possible. But at the same time I, you know, I also felt like I’m too far outside of my comfort zone. And before coming to Meta, I worked a little bit at Cruise, and before that I worked at Google for 14 and a half years. And that was my comfort zone.

[00:07:18] I just had this option, the easy way out essentially, right. I had the recruiter who was contacting me periodically, like, am I ready to come back to Google. So this time I just said, yeah, let’s talk.

00:07:32 — Why Senior Staff at Meta was different

Ryan:

[00:07:32] When you were at Google and at Cruise, you were at these senior levels as a senior staff engineer. What’s the difference that makes it so that you can perform there, but not at Meta?

Igor:

[00:07:45] So when I came to Google, I started from the lowest level, L3, so junior software engineer. And slowly, over many years, I rose through promotions, many times failing, like they wouldn’t promote me, and then, you know, a year or two later asking again and then climbing the ladder. My last promotion to senior staff happened because I was able to accomplish something big in the project that I worked on.

[00:08:18] I worked in that project for several years, so I was already like an expert. I knew everybody, right? Everybody knew me. And I was able to build something that I can be proud of. But again, then leaving Google and going to Cruise, it was also very difficult for me to ramp up there because you’re starting from zero. And the first year was very difficult for me at Cruise. But over time I found, you know, a relatively safe comfort zone there, until Cruise as a company went bad.

[00:09:01] Like they had the accident and then they had layoffs and then people were just quitting all the time, like losing people. So I decided that I want to leave as well. But before the accident at Cruise, I actually was quite happy. I was able to do some coding. I was able to do mentoring of other folks, and I was also able to act as an uber-TL a little bit, you know, guiding the team on things that they need to do.

Ryan:

[00:09:35] You successfully ramped up to senior staff at Cruise, and then at Meta it was a bit harder. What were the differences between those two ramp-ups, in your opinion?

Igor:

[00:09:45] Cruise was a smaller company, so less infrastructure to learn. The team is a less crowded space, I would say. Like you can easily carve out space for you to grow into. Like, you know, there is a lack of people and you can say, okay, well, this project needs someone to work on it, and you can just go and work on this stuff. At Meta, I felt like it’s quite crowded, especially among the senior folks.

[00:10:20] It felt a little bit like they have too many, frankly. At least in the org where I worked. So I don’t know about all of Meta, but it felt like the space is a little bit crowded. You need to find scope for yourself.

Ryan:

[00:10:38] Is there anything that you would have changed in your onboarding process that could have made things different?

Igor:

[00:10:45] Probably. Again, I didn’t switch companies that many times in my career. I started at Google, spent so many years there, then I only switched to Cruise and then Meta. So I’m not very well experienced in switching companies, and essentially this whole ramp-up process is new to me. And I also felt like the management at Meta also doesn’t know how to onboard senior folks, because most of the senior folks in the company, they grew within the company to those senior levels, even if they came to this specific org from another org within Meta.

[00:11:30] But at least they already have the Meta knowledge. They know how to run the job in the cluster. Right. I think I spent a lot of time just reading docs and trying to build this foundational knowledge. I think I spent too much time doing that. It is much better to just get your hands dirty and just try doing stuff and then doing those things in parallel. You build something, accomplish some tasks and also you’re learning in parallel.

[00:12:08] And I did more of the learning part, less of the building. Yeah.

Ryan:

[00:12:13] Is there something that you wish was done or maybe something that would have helped you from the manager side?

Igor:

[00:12:20] Just guiding me, right? If they said, like, this other person just ramped up, you know, three quarters ago and here’s what they did, I would have followed that recipe. We just didn’t have the recipe. Like I don’t have any complaints against my management. They, you know, they tried to support me the best they could. They just also lacked this expertise of ramping up senior folks, you know. So it was a learning experience for all of us.

Ryan:

[00:12:52] I’ve heard in conversations with some other people as well that, I guess, job mobility as a senior IC actually becomes progressively riskier or maybe scarier, I guess because you get so used to your existing org and all of that. Is that something that you’ve seen in other people or peers as well, that senior IC job mobility is a lot tougher?

Igor:

[00:13:21] I’ve seen folks at Google that left Google and then came back a year or two later saying that it didn’t work out well for them. Yeah, you know, people don’t share things like that openly usually, so it’s much harder to hear those stories. You hear the success stories, you don’t hear the failures. Also, I think that if I kept trying, like if instead of going back to Google I went to, I don’t know, some other company, I would be much, you know, more experienced in ramping up.

[00:13:59] And then maybe, maybe if you do this often enough, it becomes a habit and then, and then you can do it easily.

Ryan:

[00:14:08] You mentioned that you thought you weren’t meeting expectations based on your own judgment. I’m curious, did you ever get any feedback from your manager or anyone saying, hey, you need to do more for your level’s expectations?

Igor:

[00:14:22] Yeah, I did get feedback. I just, you know, don’t necessarily want to openly share all of that. But yeah, in just the few months leading up to my leaving Meta, I was working on a project that initially I thought would take me like two weeks to accomplish. It was like a small thing that I thought would be easy to do, and it turned out to be actually a lot more involved and a lot more complicated.

[00:14:51] And I felt like even if I finish it, which I almost did, I almost brought it to completion, even if I finish it and launch it, everything successful, it is still not an E7 level project. You cannot still justify my level with the project being completed. Probably it’s my fault in which projects I pick, you know, or how I underestimated the complexity of the thing.

Ryan:

[00:15:24] Because you’re going back to Google, are you going back to an org where you have all the existing context and relationships and all of that?

Igor:

[00:15:34] No, no, everything will be new to me. New people, new infrastructure, new everything. But at least I know, you know, I know how Google operates. I know the culture. There are differences in culture also between the companies, and I think that, in terms of internal culture at Meta, I didn’t feel like it fits me the best.

00:16:01 — Meta vs Google culture

Ryan:

[00:16:01] What are the biggest cultural differences between Meta and Google?

Igor:

[00:16:05] Meta tends to set very ambitious goals. Oftentimes they will give you a very arbitrary deadline, saying like, okay, this project, you need to finish it by September 15th, whatever, like one month from now. And then everybody works hard. There is a lot of pressure. You need to constantly send updates to the leadership on how the project progresses. Comes the date, the deadline, the project is still unfinished and it just keeps dragging on.

[00:16:38] And everybody is fine with that. Again, I don’t know. I haven’t seen all of Meta. I’ve seen the specific org where I worked, and it feels like, so then what was the purpose of setting this aggressive deadline? Right. You can do it once. But after five times going through this artificial pressure with unrealistic goals, then people just say, okay, you know what, 5:00pm, I’m going home.

[00:17:10] I’m not going to try to work hard, because I know that this whole pressure is artificial. Right. I think Google is much more reasonable in that regard. Like if there is a deadline, it’s probably for good reasons, and people would work hard, but usually there will not be pressure. Again, talking about Google as of 10 years ago, I’m not sure that it’s still true today.

[00:17:39] But you would be pressured to fix something or to accomplish something when there is a really exceptional case for that. You wouldn’t be pressured to work under pressure for years.

Ryan:

[00:17:57] Yeah, I mean I get the sense that the industry as a whole is kind of becoming a little bit more intense when it comes to execution and deadlines. I’ve heard some people saying that Google as well has felt a little bit of pressure too, but hard to say. It depends on the Org, I’m sure.

Igor:

[00:18:15] Yeah, it could be. And again, I didn’t personally feel much pressure at Meta, but just talking to other folks, seeing how they work and operate, as I said, there is this cultural thing where the leadership wants updates and everything, and then the people on the ground, they actually kind of dismiss it essentially. It almost feels like an elementary school, where the teacher is yelling but the kids are still playing.

[00:18:47] Like if you’re yelling too much, it’s, it’s like it stops working essentially if you, if you constantly put pressure on your people, it just doesn’t work anymore. Like you say. So what do you do next? You start laying off people? Yeah, it might work to some degree, but again people adjust to everything.

00:19:09 — Downleveling at Google

Ryan:

[00:19:09] That’s crazy. You mentioned that when you reached out to Google, you explicitly asked for kind of a demotion, going back as an L6 when before you were an L7. Was that a challenge or was that just a very straightforward process with the recruiter?

Igor:

[00:19:25] It was a challenge for the recruiter to also make this happen, because they have a process for bringing people back in at the same level or even a level up. They don’t have a process for bringing people in at a level down, but they made it possible for me. Again, I can totally see how, when a person comes back and says, I want to be a level down, there might be some red flags. There may be, you know, things that I’m not telling.

[00:19:53] You know, it’s a gamble for a manager to hire a person like that. Also, just by giving me an offer at the level down, they understand that they will not be able to match my compensation at, you know, the previous place. So like, would I even accept it? Right. And so I had to, like, assure my recruiter that yes, I’m happy to accept the offer at the lower compensation. Like, you know, a lot of people will not do something like I did, because it’s a significant drop in compensation.

Ryan:

[00:20:33] You mentioned stuff that people might be hiding from the recruiter. What comes to mind when you mention that?

Igor:

[00:20:41] I don’t know, maybe you did something in the company that, you know, was like a fireable offense or something. Like, I don’t know, if you’re a manager and somebody comes to you saying, yeah, previously I was this level, now I want to be a level below, please hire me, it does sound fishy. Like, what’s wrong with you? Like, you know, there are so many companies out there.

[00:21:12] Like why would you come back to this company? Like why don’t you try something elsewhere? I don’t know what questions popped up in my hiring manager’s head when he saw this.

Ryan:

[00:21:25] Google’s current process, do they do team matching? Like, did you meet up with the hiring manager beforehand or just go through the recruiter?

Igor:

[00:21:34] The recruiter sent me a few openings and asked me which ones sound interesting. I spoke to a few hiring managers. One sounded like the best match for me. And then I spoke to a few engineers on the team. Then I spoke to the manager’s manager and that was it. So another nice thing for me is that they didn’t require me to re-interview.

Ryan:

[00:22:03] Oh, interesting. But it had been like three years.

Igor:

[00:22:07] Maybe over, yeah, three and a half years. So yeah, they didn’t ask me to re-interview. Actually, that was another interesting thing that they said: if you’re coming back as L7, you definitely don’t need an interview because you were already L7. But if you’re going to be like L6, then you might need to do a coding interview, because who knows if you wrote code, you know, while you were out.

Ryan:

[00:22:35] That’s kind of funny.

Igor:

[00:22:36] Yeah, I got an exception from coding.

Ryan:

[00:22:39] I see. That’s funny. You got promoted out of competence in the lower levels.

Igor:

[00:22:46] Yes. But it’s understandable. When people go to higher levels, they write less code and maybe they get rusty. Who knows?

Ryan:

[00:22:55] It’s kind of crazy that you could boomerang after it had been three years or more.

Igor:

[00:23:01] Yeah. To be fair, I don’t know if that’s a general Google policy or if it was just a special case for me. I don’t have any friends that did this recently to understand if they did it for anybody else.

00:23:17 — Why he’s willing to be transparent

Ryan:

[00:23:17] You mentioned earlier that a lot of people don’t share this kind of stuff about demotions and these types of things. And I’m curious, what makes you willing to share?

Igor:

[00:23:29] I’m generally a very open person. Like, I like sharing my life experiences, positive and negative, with friends and, you know. Yeah, I’m generally quite open. And also I’m quite confident. That’s another thing, like, you wouldn’t share if you’re worried about your career. There are many things that can go wrong when you share stuff like this. I’m very privileged, essentially.

[00:24:03] I don’t need a work visa, so I can live in this country without being attached to a certain employer. My spouse has health insurance, my kids are already grown up and off to college, and I don’t have a mortgage on my house. So, like, I can afford not to work for a few years. You know, I’m fine with, like, let’s say I said something in this chat right now and Google says, we don’t want you for some reason.

[00:24:40] Like, I’m totally fine. I will just. So that’s a position of privilege, right?

Ryan:

[00:24:48] Definitely. I mean, you’re free. Yeah. You’re entirely free. You’re your own person. Which is awesome.

Igor:

[00:24:53] Yeah. So that’s a position of privilege that many other people wouldn’t have. So if another person asked me, like, should I post about my challenges at work? I would probably say no, unless you feel so safe and secure.

00:25:11 — Best quality of life eng level

Ryan:

[00:25:11] And you mentioned that at Google, you were asking for promos, you went from their lowest level up and you were pushing and pushing for promos, and that’s a very different mindset from now, where you are kind of pushing for demotion. I’m curious, did something major change that made you not motivated to go for promos anymore in your career?

Igor:

[00:25:36] Essentially, if you are a senior engineer, right, at E5 or E4, you don’t even know what the life of someone two levels above you looks like, right? You can probably imagine what happens at one level above you. But it’s very hard to see what a principal engineer does. Most people don’t know. And, I don’t know, by going through this I realized that my happiest time was when I was working in a relatively small team, doing a lot of coding, debugging, designing, mentoring more junior folks.

[00:26:20] That’s, that’s where I felt the happiest.

Ryan:

[00:26:22] Let’s say money is not a constraint at all and it was just which engineering level has the best quality of life in your opinion, which one would you say?

Igor:

[00:26:36] Senior engineer, like E5, L5. Yeah, that’s probably the least pressure. You’re still shielded. If you’re working on a team, you probably have a TL who is an L6 on the team, who sends all the updates to the upper levels and shields the team from all this stuff. You have the management, also the lower level managers, who shield you from all this stuff. So you can just do the stuff, and hopefully you enjoy what you’re doing.

[00:27:16] Like, not everybody enjoys doing the work. Like a lot of people do it just for money. Personally, I was programming since I was 12 years old, so I really like this stuff. I got into this job because I love it and it happens to pay well, but I would do it even if it wasn’t paying well. Like a lot of people come to software engineering just because that’s the money thing.

Ryan:

[00:27:41] So then, if it was as easy to do, would your ideal situation be two levels of demotion?

Igor:

[00:27:50] That’s too extreme. I think I can still enjoy being like E6, I’m quite comfortable with that, and maybe I haven’t been E7 long enough to get comfortable in the role. Like maybe if I just kept doing it for some time longer, I would have started enjoying it. Yeah, you know, imposter syndrome is something that everybody has regardless of what your level is.

[00:28:26] I’m sure Elon Musk has imposter syndrome. So, you know, I’m questioning myself, like, was I really, you know, qualified to be E7? And my answer is like partially yes, but not fully. It’s like a multidimensional thing. And in some dimensions I probably was good, and in some dimensions I was not as good.

Ryan:

[00:28:49] You mentioned working with other peer ICs that were also really high level. Are there skills that you saw that they had which would have closed the gap for you personally, or something that you thought made them so strong?

Igor:

[00:29:04] It’s often a mistake to compare yourself against a group of people. Like that’s what gives you a lot of imposter syndrome, where you’re saying, oh, those people around me, they’re so smart, they are so good at talking, they’re so good at doing presentations, they’re so good at communicating, they’re so good at leading. But then it turns out that one person is good at this, another person is good at that, and you’re comparing yourself against a team of people.

[00:29:32] And so you always need to be careful not to make that mistake. So yeah, I think given enough ramp-up time I would have been able to reach full productivity and be useful to the company at E7. I just felt like I was not enjoying the ride, essentially, and I didn’t have to. And I debated a lot before making the decision to leave, right. I was talking to friends, I was talking to family, and pretty much everybody was telling me, like, are you crazy?

[00:30:07] Like you’re not laid off yet, nobody showed you the door, why would you do something crazy like that? And even if you’re leaving, like you don’t enjoy it, why won’t you try E7 elsewhere? Like why do you need to go level down? There is a lot of peer pressure and my answer to that was just like, yes, I can try that, I can do that, but I don’t want that. I’m capable of running a marathon, but I don’t like running a marathon.

[00:30:37] I don’t like running. Why would I do that?

00:30:42 — Senior Staff promo at Google

Ryan:

[00:30:42] You mentioned that you kind of rose through the ranks at Google. I’m kind of curious, what was the project that got you promoted to senior staff at Google?

Igor:

[00:30:52] It was a project in Ads. Ads builds a lot of machine learning infrastructure because the ranking of ads is essentially a recommendation system. And I was working on this machine learning infra, and when I came to the team, they were doing training on CPUs, not on GPUs, but on CPUs. And the TPU was a new thing, new hardware that Google was developing. It was about to be released maybe a year or two after I joined the team, and my role was to make sure that this ads training infrastructure can run on those TPUs.

[00:31:42] And additionally, when you train machine learning models, the scale matters and the utilization matters, and there are lots of nuances, like, you know, your model may be training but things go wrong. It’s online training, which is also something that most people don’t know. Like LLMs don’t usually train in online training mode. So there is a lot of experience on the team. Like the whole infrastructure that existed within that team was built for CPU training and it was polished over, I don’t know, over 10 years.

[00:32:30] It was polished to be super reliable, super, like, you know, well monitored, unit tested, everything is very rigid, very polished. Right. And here I am coming and building a completely new piece of infrastructure to run on TPUs, and it has to be as good as the old one, you know, as polished, as monitored, as tested, as reliable. So it was a big project and, you know, it succeeded. Yeah, I was proud of what I built, and so that got me promoted.

Ryan:

[00:33:08] What was the operating model of that project? Were you TL or were you kind of writing a lot of it yourself or how would you describe the execution?

Igor:

[00:33:18] Yeah, initially it was quite an exploratory project. So it was just two people. It was the TL plus one person building like a prototype, plus a lot of people from other teams helping, like the TPU team, the compiler team. There were lots of other people who were helping us succeed, like the TensorFlow team. We were writing TensorFlow back then, you know, so that’s also where you need credibility.

[00:33:49] That’s why knowing people and them knowing you is very important, like trusting you. If you don’t have that, you cannot succeed. So it’s really a cross-team collaboration, a very big project. And again, with hardware like this, you need to decide how many of these chips you want to order, and they will be delivered like 18 months from now, and right now you don’t have any. It’s not like you can try them out and see.

[00:34:22] And those chips are super expensive, right. So it’s a very risky thing to do for the company. And yeah, it’s a lot of risk taking, a lot of leadership skills that are needed for doing stuff like that.

Ryan:

[00:34:44] So you were on the, well kind of the product team, not actually product, you were machine learning infra for the ads Org and then all the underlying infra teams were helping collaborate with you. You mentioned compilers, maybe some training infra other infrastructure teams.

Igor:

[00:35:04] Yeah. And once things started working a little bit, the team grew quite a bit more. A lot more people joined the effort and started working, polishing. And then again, the first generation TPU comes in and then they already tell you, we already designed the next generation, it will be coming next year. So you’re already rushing to adjust your infrastructure to the next thing, and pretty much every year you’re upgrading your infrastructure.

Ryan:

[00:35:34] How did you test the initial builds of this infrastructure if you didn’t have the chips to begin with?

Igor:

[00:35:41] A lot of things were just about input reading and processing, which you can do without having the chips. The difference with CPU training is that there you have the CPU, an Intel CPU, which can do floating point arithmetic and integer arithmetic in parallel. And the floating point arithmetic is relatively slow. So input reading and processing was free on the same CPU. You’re going from that mode of operation to one where you have a relatively weak computer with eight TPU chips.

[00:36:27] Each one of those TPU chips is orders of magnitude faster than the CPU that you had before. I think it’s like two orders of magnitude faster. Now there is no chance you can do the input reading on the host of those TPUs. So you need to build the input processing pipelines and everything. But that’s something you can test without having TPUs. Actually, many times we did this test where people say the next TPU will be 2 times more performant than the previous one.

[00:37:08] And a nice test for your infrastructure is to say, okay, we will just remove all computation from our models. Let’s just see how fast we can feed the data in, take the outputs out and process this whole thing. Like how fast your system can run if the TPU was infinitely fast. And that’s a nice test to do to see that you don’t have bottlenecks. And you do find a lot of bottlenecks all the time.

[00:37:39] Like the thing that schedules which data to train on previously was never an issue. And then suddenly this is your bottleneck in this model, and many things like that.
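
The "infinitely fast TPU" test Igor describes can be sketched roughly as follows. This is a minimal illustration only, not the actual Google infrastructure: the stage names (`make_batches`, `preprocess`, `real_step`, `null_step`) and the 1 ms per-batch compute time are hypothetical stand-ins. The idea is to keep the input pipeline unchanged, swap the model's compute for a no-op, and measure the throughput ceiling the rest of the system can sustain.

```python
import time
from typing import Callable, Iterator, List


def make_batches(num_batches: int, batch_size: int) -> Iterator[List[int]]:
    """Hypothetical input-reading stage: yields batches of fake records."""
    for i in range(num_batches):
        yield [i * batch_size + j for j in range(batch_size)]


def preprocess(batch: List[int]) -> List[int]:
    """Hypothetical preprocessing stage (stands in for parsing and feature work)."""
    return [x * 2 + 1 for x in batch]


def real_step(batch: List[int]) -> int:
    """Stand-in for the model's compute; assumes 1 ms of accelerator time per batch."""
    time.sleep(0.001)
    return sum(batch)


def null_step(batch: List[int]) -> int:
    """'Infinitely fast' accelerator: consumes the batch but does no computation."""
    return len(batch)


def measure_throughput(step: Callable[[List[int]], int],
                       num_batches: int = 200,
                       batch_size: int = 256) -> float:
    """Run the whole pipeline with the given step and return examples per second."""
    start = time.perf_counter()
    examples = 0
    for batch in make_batches(num_batches, batch_size):
        step(preprocess(batch))
        examples += len(batch)
    elapsed = time.perf_counter() - start
    return examples / elapsed


if __name__ == "__main__":
    with_compute = measure_throughput(real_step)
    ceiling = measure_throughput(null_step)
    print(f"with compute:      {with_compute:,.0f} examples/s")
    print(f"pipeline ceiling:  {ceiling:,.0f} examples/s")
```

If the ceiling is not comfortably above what the next chip generation is expected to consume, the bottleneck is somewhere in input reading, preprocessing, or scheduling rather than in the accelerator.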

Ryan:

[00:37:58] So if I’m understanding correctly, the the TPU consumes data orders of magnitude faster than the CPU one. So everything around the processing unit, like the data loading and maybe I don’t know, the scheduling and all the other things that are around the TPU needed to be scaled and tested. And that’s a lot of what you did to get promoted is that Right, Yeah.

Igor:

[00:38:23] Plus, you know, there are always like funny things that you get where like certain resources are more available than other resources. For example, you’re training on some insane amount of data, petabytes of data. This data is stored on spinning disks because there is not enough SSDs in the world. So spinning disks, if you look at the history of spinning disks, they’re getting bigger in terms of storage space, but the speed at which they rotate is staying constant and the speed at which the head moves staying constant.

[00:39:01] So previously, let’s say we go back to the year 2010, a typical disk would be like maybe 200 gigabytes and it has a certain throughput of how much it can read. Now fast forward to today. The typical disk is like 6 terabytes. So your data center has a lot fewer disks to store the same amount of data, but the total throughput is like an order of magnitude less. And you’re coming to the people responsible for building data centers and you say, I need more disks, not for storage, I need more disks for throughput.

[00:39:44] And you usually would hear an absurd thing that says disks are very cheap compared to everything else in the data center, super cheap. But you cannot get them. You know, you cannot easily go and buy like a few thousand disks and easily install them. Like you need racks, you need power, you need, you know. Yeah, it just, it’s similar to how like during COVID we all ran out of toilet paper, right? Like the stores just couldn’t, couldn’t.

[00:40:15] Why would you run out of toilet paper? Right. It’s just because it’s a bulky item that the stores cannot easily store on the shelves. So that’s roughly.
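The capacity-versus-throughput arithmetic here can be worked through with Igor’s two disk sizes (200 GB and 6 TB). Everything else in this sketch is an assumption chosen for illustration: the 1 PB dataset, the 150 MB/s per-disk read rate held constant across generations, and the function names.

```python
def disks_for_capacity(dataset_gb, disk_gb):
    """Smallest number of disks whose combined capacity holds the dataset."""
    return -(-dataset_gb // disk_gb)  # integer ceiling division


def aggregate_throughput_mb_s(num_disks, per_disk_mb_s):
    """Total read rate if every disk streams in parallel."""
    return num_disks * per_disk_mb_s


def disks_for_throughput(required_mb_s, per_disk_mb_s):
    """Smallest number of disks needed to reach a target read rate."""
    return -(-required_mb_s // per_disk_mb_s)


DATASET_GB = 1_000_000   # 1 PB, hypothetical
PER_DISK_MB_S = 150      # assumed roughly constant over time, illustrative only

old_disks = disks_for_capacity(DATASET_GB, 200)    # 200 GB disks
new_disks = disks_for_capacity(DATASET_GB, 6_000)  # 6 TB disks

old_rate = aggregate_throughput_mb_s(old_disks, PER_DISK_MB_S)
new_rate = aggregate_throughput_mb_s(new_disks, PER_DISK_MB_S)

# Disks needed to get the old read rate back, regardless of capacity.
disks_to_restore = disks_for_throughput(old_rate, PER_DISK_MB_S)
```

Under these assumptions the same petabyte fits on 167 disks instead of 5,000, so the aggregate read rate falls by roughly 30x, and restoring it means buying disks for throughput rather than for space.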

Ryan:

[00:40:25] You mentioned having to predict 18 months out how many TPUs you’d need. I’m curious, when you look back on that prediction of how many you needed, did you over order or under order?

Igor:

[00:40:37] Usually under. We tried to be relatively conservative because your main goal is to make money, right? Like with LLMs. If you look at this current world of LLMs, they just order more. Like we want more gigawatts of data centers. None of these companies is profitable with LLMs, right? They’re all losing money and they’re all fine with that. But when you have a mature business like ads, you need to be making money on that.

[00:41:09] So if you wasted, if you over provisioned and this goes to waste, then it’s not good. But then it constrains what kind of models you can train. Also models don’t stay the same. So when you’re saying 18 months from now we’ll not be training what we’re training today. But you don’t know what you’re going to be training. The attention mechanism wasn’t in use 10 years ago and suddenly it’s needed. So your chip might not be even capable of doing that.

[00:41:43] So TPUs are quite specialized hardware. They are not as generic as GPUs. And there were certain things that we had to work around a lot of times where certain functionality was very difficult to implement.

Ryan:

[00:42:01] Like what?

Igor:

[00:42:02] Yeah, embedding lookups in certain types of embeddings, where essentially the chip is very good at doing matrix multiplication and accessing memory in big chunks. And embeddings are things where you need sparse access, like, you know, random access essentially. And that’s difficult to do in the chip.
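The contrast between sparse lookups and dense matrix multiplication can be shown with a toy example. This is a pure-Python illustration of the access pattern only, with made-up names and sizes. It does not describe how TPUs actually implement embeddings.

```python
def gather_lookup(table, ids):
    """Sparse access: read only the requested rows, wherever they sit in the table."""
    return [table[i] for i in ids]


def one_hot_matmul_lookup(table, ids):
    """Dense access: the same lookup written as a one-hot matrix times the whole table."""
    vocab, dim = len(table), len(table[0])
    out = []
    for i in ids:
        one_hot = [1.0 if r == i else 0.0 for r in range(vocab)]
        # Every row of the table is touched, although only one contributes.
        out.append([sum(one_hot[r] * table[r][c] for r in range(vocab))
                    for c in range(dim)])
    return out


# Toy embedding table: 8 rows (vocabulary) by 4 columns (embedding dimension).
table = [[float(r * 10 + c) for c in range(4)] for r in range(8)]
ids = [6, 0, 3]  # arbitrary, non-contiguous row ids

sparse_result = gather_lookup(table, ids)
dense_result = one_hot_matmul_lookup(table, ids)
```

Both functions return the same rows. The gather touches 3 rows at scattered addresses, while the matrix form reads all 8 rows per lookup, which is friendly to matrix hardware but wasteful once the table has millions of rows.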

00:42:27 — Mentorship stories

Ryan:

[00:42:27] Yeah. One thing I wanted to ask because you said you enjoy mentoring others and I was curious, do you have favorite, maybe stories of mentorship or favorite advice that you like to give when you’re mentoring other people?

Igor:

[00:42:40] Well, I once had an intern on the team. I was also not very senior back then. I was like L4 maybe. And I had an intern that I hosted, and then later that intern converted to a full-timer, and a few years later he became the manager of the team that he interned in.

Ryan:

[00:43:01] That’s funny. Were you reporting to him eventually?

Igor:

[00:43:04] No, by that time I already switched projects. But don’t underestimate your interns. They can be really, really good.

00:43:11 — Biggest career regret

Ryan:

[00:43:11] When you look back on your career, is there any regret that you have that some people could learn from or could help people avoid?

Igor:

[00:43:20] Not everything was very smooth in my career, but I think it was still like a learning experience. Like I wouldn’t be who I am today if I didn’t go through those periods of time. I was very idealistic when I was younger. Like I believed that Google is a really positive force in the world. That, you know, I would work here just because it’s Google, not because it pays more. I really wanted to be in Google because of my personal values and how the company operated.

[00:43:59] And then I got disillusioned over the years that yes, it’s just a corporation. Like maybe 17 years ago it was really a different company. But today it is a corporation just like any other corporation. It has positive things, it has negative things. But what matters to the company at the end of the day is the bottom line on the financial reports and, you know, they will do whatever it takes to increase that number.

Ryan:

[00:44:34] Is there something that led to that disillusionment?

Igor:

[00:44:37] You know, Google had a lot of remote offices, for example, and I was working in the Pittsburgh office, which was a relatively remote office. And they, at the same time, they shut down an office in Atlanta. And it was roughly the same size. And it was quite shocking to everybody. Like, why would you shut down an office with a few hundred people working there? And the answer we got from the leadership, they said, well, we have those big senior vice presidents of the company who decide how to allocate the headcount and where to invest.

[00:45:16] I want to hire in Bay Area, I want to hire in Seattle, I want to hire in this place. And it just so happened that the big lead who was sponsoring that Atlanta office decided to pull out and nobody else was willing to take over the headcount of that office. So it was just like, you feel like, yeah, at the end of the day, I’m just a cell in the spreadsheet. You know, there was very little empathy that the company showed to those people.

[00:45:48] They said, like, yeah, we can help you relocate if you want to go to other places, but if not, here is your exit package and good luck. And that was many, many years ago, before all the layoffs that happened after Covid. And you know, nowadays it’s much more understandable that the company can lay off anybody just because this project here doesn’t make sense anymore. We’ll just shut it down and let people go.

[00:46:20] Like, this is now a common scene. It was not the case as of 10 years ago.

Ryan:

[00:46:27] Yeah, I remember, I think when Google was a lot earlier, it was don’t be evil. And the culture was very, very set on that.

Igor:

[00:46:37] Yeah, I think they’re still trying not to be evil. But again, the bottom line often overrides that decision.

00:46:46 — Advice for younger self

Ryan:

[00:46:46] I guess it’s true for all public companies. Yeah. And then the last question I’d like to ask is if you could go back to when you just entered the industry or you were working at Google and give yourself some advice, knowing everything you know now, what would you say?

Igor:

[00:47:02] I worked on some projects that didn’t make sense to me, especially when I was more junior. I was working on some projects where I thought, why are we building this in the first place? Who needs this? Then a year later, the leadership realizes the same thing and they just shut down the project. Maybe work on what matters. Ask yourself, does my company really benefit from what I’m doing?

[00:47:36] And if not, then maybe you shouldn’t be there.

Ryan:

[00:47:38] In that situation, let’s say you recognized it and your org’s going in that direction and you think it’s useless. But you’re a junior engineer. What could you do to kind of adjust your direction?

Igor:

[00:47:53] I mean, talk to other managers. Hey, find another project. Different companies have different treatment of people who want to switch projects. Before Google I actually briefly worked at Intel. There, switching projects was almost impossible. It’s much easier to just quit and then reapply. Versus Google was very fluid. Like you could easily switch projects within Google. So this advice doesn’t work for every company.

[00:48:23] IBM is another example I’ve heard of where switching projects is impossible. Yeah, some companies have certain reputations. I’ve never worked at IBM, so I don’t know. But that’s the reputation they’ve got. Within Google, it was easy to switch projects. Within Meta, I believe it’s easy to switch projects. Yeah, I don’t know about other companies.

Ryan:

[00:48:47] Yeah, I think most modern day Silicon Valley companies are inspired by Google, Meta, those types of companies. So similar culture on team switches. So awesome. Well, yeah, thanks so much for your time, Igor. I really appreciate it.

Igor:

[00:49:01] Yeah, thank you very much. And yeah, I hope somebody finds it useful.
