The Peterman Pod

Dropbox’s Former Most Senior Eng: Building Great Systems and Advice for the AI Era | James Cowling


Transcript & Audio

Ryan Peterman

May 25, 2026

James Cowling is the CTO at Convex and was previously the most senior engineer at Dropbox. We discussed technical details of his past projects, simplicity vs complexity, and career advice given where AI is today.

Check out the episode wherever you get your podcasts: YouTube, Spotify, Apple Podcasts.

Timestamps

0:53 - Systems work during his PhD

13:05 - Dropbox technical deep dive

21:57 - Why Dropbox migrated from AWS

36:40 - How to do massive migrations

44:31 - Simplicity vs complexity in promos

49:23 - What technical teams should be focused on

1:00:25 - Doing the right thing vs promo hypothetical

1:08:13 - Why he dipped into management sometimes

1:11:36 - Why you shouldn’t lead by example

1:23:23 - How to mentor Senior Staff+ engineers

1:27:30 - Career advice for the AI era

1:37:21 - Why he started his own company

1:46:05 - The most technically challenging work of his career

1:48:10 - How he got involved in Silicon Valley

1:52:16 - Career regrets

1:55:54 - Top technical book recommendation

1:56:36 - Younger self & permanent underclass advice

Transcript

0:53 — Systems work during his PhD

Ryan:

[0:53] Here’s the full episode. First off, your PhD thesis was huge. It’s 156 pages. I didn’t know theses were that long; I thought papers were maybe 10 pages or something like that. It’s its own book.

James:

[1:12] It’s very similar to Spanner ultimately. What’s funny is I did that work; it came out a little bit before Spanner, so I got a citation in the Spanner paper. But then Spanner came out, and everyone kind of forgot about my paper.

Ryan:

[1:27] You know, about the research. If you could kind of describe the problem that Granola was solving and how it solved it, those types of things. We can talk about that.

James:

[1:38] Yeah, absolutely. So my interest in my career has been two parallel threads. One has been abstractions in general, the idea about how to build simple models for complex problems. I find that a very difficult and intellectually stimulating design exercise—how to design APIs, basically. The other has been large-scale transactional systems. I’m a big fan of transactions. A transaction is doing a bunch of things at once, and I think transactions are just one of the most incredible abstractions we’ve invented because they allow us to manage probably the most difficult problem in computer science, which is concurrency.

[2:22] Granola was an algorithm for how to do distributed transaction coordination, particularly for transactions that I described at the time as one-shot transactions. I’m not actually sure if that was a standard term at the time or whether I made it up. A one-shot transaction means you send some code to the server and say, “Run this function, all these reads, all these writes on two, three, whatever nodes at once.”

[2:55] And how to make sure this commits atomically across multiple shards in a distributed system. That was my PhD thesis, and my master’s thesis was on Byzantine fault tolerance, a consensus protocol in the presence of malicious nodes. It focused on how to achieve agreement on state across multiple parties when there are malicious entities. I guess the most well-known work I did as part of grad school was a paper called Viewstamped Replication Revisited.

[3:27] And that was a paper on redefining a protocol called Viewstamped Replication, which predated Paxos. It’s a very similar algorithm. Paxos, Raft, and Viewstamped Replication are all based on virtual synchrony; they’re all basically the same thing. That was a paper I wrote at the time, which ended up being influential to a few really great companies like TigerBeetle.

Ryan:

[3:53] You mentioned transactions, and I saw in the paper there’s this idea of an independent transaction. If I’m understanding correctly, a lot of the efficiency in a distributed system is lost by needing to get consensus and to vote for consensus. In your research, you found a way to avoid needing to vote to reach consensus. How did you do that?

James:

[4:20] Yes, I mean a lot of people think about performance maybe from the wrong angle because they think of performance as just a factor of raw horsepower: how fast are your disks, how fast is your network, etc. No one has particularly faster disks or memory than anybody else. What really matters to performance in a large-scale system is eliminating points of coordination. So it’s how to allow systems to progress without having contention between parties.

[4:53] Where you’re basically reducing parallel throughput to serial throughput across large numbers of transactions. In Granola, we had this idea called an independent transaction. Again, that was the terminology at the time where there were two entities basically processing pure functions. They have independent state and they’ll both come to the same conclusion as a result.

[5:16] And all they need to do then is serialize those transactions. They have to decide if it was to happen atomically across multiple nodes, what timestamp should it get. Granola, the paper, was mostly about how to exchange these timestamps very efficiently. So, how to have multiple parties each propose a timestamp and then choose the maximum of these, basically so you could safely serialize the transaction.
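The timestamp exchange James describes can be sketched roughly as follows. This is a minimal illustration and not Granola’s actual protocol: the `Shard` class, its logical `clock` field, and the `assign_timestamp` helper are hypothetical names, and the sketch ignores failures, replication, and network messaging.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    """Hypothetical participant that keeps a simple logical clock."""
    name: str
    clock: int = 0  # highest timestamp this shard has proposed or committed

    def propose(self) -> int:
        # Propose a timestamp later than anything this shard has seen.
        self.clock += 1
        return self.clock

    def commit(self, final_ts: int) -> None:
        # Advance the local clock so later transactions are ordered after this one.
        self.clock = max(self.clock, final_ts)


def assign_timestamp(shards: List[Shard]) -> int:
    """Each shard proposes a timestamp; the transaction takes the maximum."""
    proposals = [shard.propose() for shard in shards]
    final_ts = max(proposals)
    for shard in shards:
        shard.commit(final_ts)
    return final_ts


a = Shard("a", clock=4)
b = Shard("b", clock=9)
ts = assign_timestamp([a, b])
print(ts, a.clock, b.clock)
```

Here shard `a` proposes 5 and shard `b` proposes 10, so both commit at 10. Taking the maximum means the chosen timestamp is never earlier than anything either shard has already ordered.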

[5:42] And there’s a lot more complexity that goes into this. But really the focus there was about how to maximize throughput in a large distributed system without resorting to the alternative, which is two-phase commit and two-phase locking. Two-phase commit with two-phase locking is basically the standard approach for when you want to have multiple nodes agreeing on the same thing, where they both agree to lock their state, not process any other data, and then commit a transaction.

[6:09] Two-phase commit can be quite low performance because you’re blocking basically the systems for the duration of the transaction. It can be high risk too because you are taking a dependency on another node; you’re basically blocked waiting for another node to return. And so that was what Granola was. Now, it’s funny you asked about Granola because I forgot about it.

[6:33] That’s so far in my history, I kind of forgot about that work. I even forgot the phrase “independent transactions,” to be honest. It’s great to hear you bring it up again. But I guess all this stuff does feed into all the work you do later on in interesting ways.

Ryan:

[6:48] When I think about distributed systems, I think about you’re at a big company and you got a bunch of machines. But when you were at MIT building this thing, how did you build and test it? Were there spare machines that the college had, or was this in the cloud?

James:

[7:05] Yeah, this was a long time ago. Now I’m showing my age. This is pretty early in the days of Amazon Web Services. We had a rack of servers in our office that we could use. I was fortunate enough to be at MIT where we could afford a rack. There was also a service called PlanetLab, which was a big communal set of nodes that academics could use to run tests. But PlanetLab was a communal system.

[7:34] And so it was continually having problems. It’s just a free-for-all. For the longest time, I would just sleep next to my desk and wake up whenever PlanetLab was free. These days, if you need to run some benchmarks, you just spin something up on Amazon Web Services. But back then I would sleep next to my desk, and I would wake up in the middle of the night or at random times, check PlanetLab status, and then kick off a benchmarking job.

[7:58] Now this was right at the cusp of when, in some respects, Google ruined systems research. I say that with affection and respect to Google because before then, most of the research was coming out of academia. A lot of distributed systems research was done on smaller scales, and a lot of the value proposition was the ideas. So, like, hey, I wrote a paper, here’s an interesting new idea, and yeah, it’s all in theory.

[8:28] And so the value that came out of it was the idea. Around this time, you started seeing papers out of Google, Amazon, and these various companies where it wasn’t necessarily a paper about an idea; it was a paper about a system. Hey, I built this giant system, it has a whole bunch of features, some of which are interesting, some of which are not, and by the way, it powers Gmail or whatever. That kicked off a pretty interesting transition because at that point, program committees reviewing papers started to expect to see realistic benchmarks that, frankly, grad students were not able to produce at least at the time.

[9:08] I mean, I wasn’t running Gmail on my system, and I think it did, in some respects, obscure intellectual ideas. I’m all for papers about systems. But I think there’s also value in a paper about an idea. It’s not just, “We built this big thing, and it works, and it’s up to you to figure out what’s interesting about it.” Instead, it’s, “Here’s a new thing.” There’s a paper, an old paper just called Leases, where someone invented the idea of a time-based lock.

[9:37] And it’s just a paper called Leases. That was a cool era of systems research. I think a lot of it now has shifted towards industrial research, where people are building stuff for practical purposes. Whereas I think it’s hard for academia to compete on pragmatism. Academia is really a great place to do impractical work.

Ryan:

[9:59] When you look back on getting the PhD in academia, being enabled to completely explore an idea versus going into industry, maybe working on Spanner at Google or something like that, which path do you think would be better and why?

James:

[10:18] I think that’s a bit of a misconception a lot of folks have about PhD programs. A lot of students are high-achieving students in college, and they think that the PhD program is college. It’s like, hey, I like learning things, so I’m going to go do a PhD, but it’s not. A PhD is training to be a researcher, and for most people, they shouldn’t do that. If someone wants to be a professional software engineer, they probably shouldn’t spend their time training to be a researcher.

[10:46] But with a really important caveat, I think there’s a really interesting and challenging developmental experience you go through in the PhD program at a top university, at least. You reach a certain point in time when you have a problem that you’re facing that no one else in the world knows the answer to. You have a problem that you are the world expert on, and you can chat about it with folks, but you can’t ask your advisor because your advisor doesn’t know either.

[11:12] Right. So you face these difficult challenges where you can’t read a book, you can’t ask Claude. And so you’re forced to go through a quite difficult and, frankly, emotionally challenging experience of being unsure, being uncertain, and learning to think for yourself. I think that is really valuable for all engineers. I think that was really, really valuable in my career.

[11:39] I do see a lot of folks early in their career think that all knowledge comes from reading or absorbing it from someone else, or that there’s a right way to do things. But I think being in a PhD program or being faced with really demanding, open questions does train your mind to be comfortable with that discomfort. I feel a little bit lucky that I went through grad school without large language models existing because I had to deal with this uncertainty.

[12:15] There was no crutch to help me out, which I think was very valuable.

[12:22] I mean, if you want to be a researcher, go do a PhD. If you want to build stuff, go build stuff. Since leaving academia, I’ve done a lot of work that one could call innovative. It advanced the industry in certain ways and did novel stuff. But our goal was never research. Our goal was just to solve problems.

[12:47] And I think that’s the difference. In academia, your goal is to advance knowledge. In industry, your goal is to solve problems. I gravitate more towards just solving problems, and I find that a more comfortable environment within which to work.

13:05 — Dropbox technical deep dive

Ryan:

[13:05] Going to your time in industry, at Dropbox, you became the most senior engineer at the company. Looking through all the projects that you did, I had a series of just technical curiosities. I saw this idea early in your career about multi-homing, and I wasn’t familiar with that concept. What is multi-homing? What’s the problem it solves?

James:

[13:27] Yeah, I mean, multi-homing is the ability to have data in two locations, two homes. Multi-homing can be valuable for a variety of reasons. One is called primary-secondary or lazily replicated multi-homing, whereby all writes commit authoritatively in one region and get replicated to a secondary region. This is normally used for business continuity. One thing we did at Dropbox was to ensure that if the entire West Coast blew up, which hopefully wouldn’t happen, Dropbox would keep running because the data would be replicated in other regions, but with a window of vulnerability, with a window of time where there may be some data lost.

[14:13] So that is kind of primary-secondary replication or multi-homing. There’s something else called active-active multi-homing, where there is truly an authoritative copy of the data in multiple locations. Basically, when you write to a system, you don’t externalize that write as having succeeded until it has landed in all the regions. For example, in the storage system at Dropbox, the block storage system, we truly had a multi-region replicated system where we could take down an entire region, say take down the region in Ashburn, Virginia, where everyone’s data centers are, and there’d be zero downtime for the company, and the data was still safe in multiple regions.

[14:54] I think multi-homing is something a lot of engineers aspire to work on because it seems like the right thing to do. But the reality is for most companies, it is not. Not just because there’s a very high cost, but because there are very high latency costs for this. The speed of light is fixed. If you have to synchronously write data across multiple regions in the United States, you’re going to have, say, 60 milliseconds in the commit path of your protocol, which for most applications is not tenable.
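A back-of-the-envelope calculation shows why the speed of light puts a floor under that commit latency. The figures below are assumptions for illustration and do not come from the interview: light in optical fiber travels at roughly 200,000 km/s, and the two regions are taken to be about 4,000 km apart.

```python
# Assumed figures (not from the interview): signal speed in optical fiber
# and an approximate coast-to-coast distance within the United States.
FIBER_SPEED_KM_PER_S = 200_000
DISTANCE_KM = 4_000


def round_trip_ms(distance_km: float, speed_km_per_s: float = FIBER_SPEED_KM_PER_S) -> float:
    """Best-case round-trip time in milliseconds over a straight fiber path."""
    return 2 * distance_km * 1000 / speed_km_per_s


print(round_trip_ms(DISTANCE_KM))
```

Under these assumptions the best case is about 40 ms per round trip before any routing detours, queuing, or disk writes, which is in the same range as the 60 milliseconds James cites.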

[15:32] So there is a lot of desire; a lot of engineering teams reach out for advice on how to adopt active-active multi-homing for their company. I would normally say don’t do it. Frankly, if US East is down, if Amazon is down that day, that’s okay. If Amazon’s down that day, your company will be down. That’s a shame, right? But by avoiding that complexity, you’re going to be able to move much faster and build a much better product.

[16:00] And so in my mind, systems are all about trade-offs and making the right ones. I would generally recommend most people do not make the trade-off to have partition tolerance or multi-region availability. Even though for a company like Dropbox, yes, that does make sense. When your job is storing data and you have several exabytes of it and hundreds of millions of customers, I think that’s when it starts to make sense.

Ryan:

[16:26] So it’s just another term for, I guess, data replication and having it available in other regions. I imagine also, I mean, the cost of storage is a concern. How many replicas would you keep for something like, let’s just say I stored something in Dropbox? It’s my document. Is that on the order of one or two, or are there multiple?

James:

[16:47] No, it’s on the order of many. Many. If you were to store a file in Dropbox, I modeled the storage, and we advertise at least 12 nines of durability. Internally, the models look around 24 nines of durability. That means the data is safe with a probability of 99.9999...%, where there are 24 nines in total. Right? Which means, at least according to the model, the universe will be extinct before any data is lost.

[17:20] And the way that is done is by a combination of what’s called erasure coding. Erasure coding is how you take several blocks of data and combine them together with an encoding scheme and spread them around in different locations. At that scale, at Dropbox scale, you’re taking things into consideration, like putting data in different racks, different rows of a data center because they’re on different power feeds.

[17:41] You’re taking into consideration different eras of hard drives and different manufacturers of drives because they could have correlated failure patterns and then replication across regions. So you could be looking at 27 fragments, for example. That doesn’t mean you’re storing 27 times the data. It’s kind of encoded in many regions. But the replication schemes get really quite sophisticated at that point.
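To make the idea of erasure coding concrete, here is about the simplest possible scheme: two data blocks plus one XOR parity block. This is only a sketch and is far simpler than the Vandermonde-matrix codes a production system would use; the function names `encode` and `reconstruct` are hypothetical.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(d1: bytes, d2: bytes) -> List[bytes]:
    """Return three fragments: the two data blocks plus their XOR parity."""
    assert len(d1) == len(d2), "blocks must be the same length"
    return [d1, d2, xor_bytes(d1, d2)]


def reconstruct(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Recover both data blocks when at most one fragment is missing (None)."""
    d1, d2, parity = fragments
    if d1 is None:
        d1 = xor_bytes(d2, parity)
    elif d2 is None:
        d2 = xor_bytes(d1, parity)
    return [d1, d2]


fragments = encode(b"drop", b"box!")
fragments[0] = None  # simulate losing the disk that held the first block
recovered = reconstruct(fragments)
print(recovered)
```

The three fragments store 1.5 times the original data yet survive the loss of any one of them, which is the sense in which many fragments do not mean many full copies.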

[18:06] And we actually had our own custom encoding matrix we had developed. It’s called a Vandermonde matrix, where you take a bunch of data and combine it together to produce outputs. We would plug in some variables like how much do disks cost? How much does network bandwidth cost? Because there’s a trade-off. You can either store, if you don’t want to lose your data, more copies on more disks, or you can store fewer copies.

[18:33] With fewer copies, anytime a disk fails, you re-replicate it really fast. Re-replicating really fast costs disk and cross-network bandwidth. These kinds of variables go into this equation. It ends up being an extremely complex field of endeavor, but it’s kind of abstracted away into a part of the system that doesn’t leak into anywhere else.

Ryan:

[18:53] So with erasure coding, my document in Dropbox is fragmented into a bunch of different chunks of data and loaded potentially from many different machines.

James:

[19:03] Yes, absolutely.

Ryan:

[19:05] I mean, the first thought I have is now I might be waiting there, and one of the 27 machines is slow, and I can’t look at the whole document. So how do you protect against that?

James:

[19:17] It’s actually faster than not being replicated. I’ll pick a simplified example. This is not the encoding scheme Dropbox uses, but imagine that to reconstruct a file, you have to read six out of nine fragments. If you only need six out of nine fragments, you can just ask all nine and reconstruct and return the data as soon as you’ve heard from the first six. It’s actually faster.

[19:46] And you can construct these encoding matrices such that it’s actually faster to have erasure-coded data than not.
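The six-out-of-nine example can be put into numbers. In this sketch the response times are made up, and `read_latency_ms` is a hypothetical helper: a read that queries every fragment finishes when the k-th fastest reply arrives, so slow machines stop mattering as long as enough fast ones respond.

```python
from typing import List


def read_latency_ms(fragment_latencies_ms: List[int], k: int) -> int:
    """Latency of a read that asks every fragment and returns after the fastest k reply."""
    return sorted(fragment_latencies_ms)[k - 1]


# Made-up response times for nine fragment servers, two of which are slow.
latencies = [12, 15, 11, 250, 14, 13, 16, 18, 400]

six_of_nine = read_latency_ms(latencies, k=6)
slowest_server = max(latencies)
print(six_of_nine, slowest_server)
```

With these numbers the read completes in 16 ms even though two of the nine servers take 250 ms and 400 ms; a read pinned to a single unlucky server would have waited for that server.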

Ryan:

[19:53] I see. Okay. So you can oversubscribe, over-request, and then you complete on a portion of them being received.

James:

[20:01] Yes. Now in practice, it was a bit more complex than that. We’d often have a copy on a single disk to provide fast access. We’d often try to make sure that you could serve your data out of a region close to your home region. So we’d make sure that your data was mostly served with low latency. But if that region had failed, you could reconstruct it from the remaining regions. So there’s a lot of talk about building your own infrastructure.

[20:28] And about how you can save money by moving off the cloud. Almost definitely you can’t, unless you have very small requirements, very fixed requirements, or very heavy investment. Because if you want to compete with Amazon, if you want to build a more efficient storage system than Amazon, you have to have a supply chain team that’s working with Western Digital and Seagate, constantly negotiating on prices of disks and buying shipments at certain times, along with capacity teams and data center teams.

[21:00] There’s a lot of work that goes into optimizing this because ultimately our desire was to use the disks to 90, 95% of their disk size to maximize storage efficiency. To put it this way, that was, you know, I guess a ballpark figure of a $1 billion project. And it was, as far as I know, the largest ever data migration in history at that time.

[21:28] And so, an extremely large engineering project with very high technical investment. Yes, if you have that scale and you have the engineering team to do it, and you’re willing to keep innovating, if you’re willing to keep optimizing and investing effort in it, then yes, you can do it. But I think the real— I mean, the cloud has been an incredible innovation. Most people are not experts at this, and most people should not be experts at this.

[21:55] Most people should focus on their applications.

21:57 — Why Dropbox migrated from AWS

Ryan:

[21:57] When you say that most people shouldn’t, my immediate thought was, one of the big projects you worked on at Dropbox was migrating away from Amazon Simple Storage Service. So why did Dropbox migrate away from S3?

James:

[22:12] Yeah, that was a desire at the company for a long time, from even before I was there. I started at Dropbox in 2010, and I spoke to Drew, the Dropbox founder, about this project. There was a desire to control the destiny of the company from a strategic perspective. At the time, this was before Dropbox reshaped itself as being more about collaboration.

[22:36] At the time, it was in the file sync and share category. That was the market sector, and owning the file system was really valuable to the company. Ultimately, we saved a huge amount of money. This was before the company went public. We really drove massive cost efficiencies through the project. But it was hard, in a way that I think would be very difficult to emulate without a huge investment.

[23:03] And I do think there is a benefit to an organization from having hard problems to solve. If you have a company with extremely hard technical challenges, you can attract engineers who like working on those hard technical problems. When they’ve solved those problems, they cycle off and work on different parts of the system. After we all worked, we had such a great team.

[23:28] It was a very, very small engineering team. After we shipped the storage system reliably, we all went off, and Jamie redesigned the sync protocol, the desktop client, and I worked on the file system and the distributed databases. There’s value to a business in having that level of technical investment. But it’s like having a baby; you have to raise the baby. You can’t just build a system like this and say, “That’s it, we’re done.”

[23:59] You own it, and you have to keep investing in it.

Ryan:

[24:03] Did S3 do any counter negotiation before you set out to leave them? Did they say, “Oh, we’ll cut you a deal?”

James:

[24:12] I guess I’m allowed to talk about this now. It was a long time ago. For the longest time, I don’t think they were particularly aware that this was happening. But the data center folks talk, and certainly it was noticed that Dropbox was buying up a lot of data center space. So yeah, we had, obviously at the scales that we were at, we were negotiating very good rates with Amazon Web Services.

[24:38] We weren’t paying sticker price. We were paying very, very, very good discounted rates. But at a certain point, they weren’t able to meet our cost efficiency because when we launched the system, it really was more efficient than Amazon Simple Storage Service. And that’s for a variety of reasons. One was that we were using kind of new experimental disks called Shingled Magnetic Recording. We were the first ones, I think, to use these disks at scale.

[25:02] And two, we had a very tight understanding of our workload. We were able to design the system specifically optimized for our workloads, whereas S3 has to design the system for everybody. It got to the point where Amazon would not have been able to offer us a more competitive deal because we had a more efficient system. I wouldn’t recommend another company do this right now, but I think at the time it certainly made sense for us as a company.

Ryan:

[25:28] Can you give an example of a tight understanding of your workload leading to something you could do that Amazon Simple Storage Service couldn’t?

James:

[25:36] Yeah, absolutely. So, for example, I know that, well, I’ll try not to leak any confidential data, right? But when you upload a file to Dropbox, there is a pattern of access. Typically, people access the file very quickly shortly afterwards because you’re sharing it with someone, or maybe Dropbox is processing that file to generate an image preview, and then it decays at a certain rate. We understand in general the average block size, and we also understand the access pattern.

[26:11] So we could do things like, at a certain point, we had these two clusters. One was designed for temporary storage that was kind of storage inefficient but access efficient. It was very cheap to read and write to, but it was inefficient to store. Data would get written to there first, and then in the background, it would get moved in bulk to this colder storage system. This cold storage was far more static.

[26:40] And so it was able to have more efficient algorithms and be written to in bulk. If it went down for writes, that was no problem because it wasn’t in the live path. Again, systems are all about trade-offs. We were able to trade off the live data write path against the long-term read path. That was one of many examples where, by knowing the size of your data, where it’s accessed from, how frequently it gets accessed, and how long it takes to delete that data, you can really tune a system to your workload.
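The two-cluster arrangement can be sketched as a small in-memory model. This is only an illustration of the pattern James describes; the `TieredStore` class and its method names are hypothetical and say nothing about how Dropbox actually implemented it.

```python
from typing import Dict


class TieredStore:
    """Hypothetical model: writes land in a hot tier and move to cold storage in bulk."""

    def __init__(self) -> None:
        self.hot: Dict[str, bytes] = {}    # cheap to read and write, costly to store
        self.cold: Dict[str, bytes] = {}   # storage efficient, written only in bulk
        self.cold_accepting_writes = True  # cold tier may be down for writes

    def put(self, key: str, block: bytes) -> None:
        # The live write path only ever touches the hot tier.
        self.hot[key] = block

    def get(self, key: str) -> bytes:
        # Recently written data is served from hot; older data from cold.
        if key in self.hot:
            return self.hot[key]
        return self.cold[key]

    def migrate_batch(self) -> int:
        """Background job: move everything in hot to cold; returns blocks moved."""
        if not self.cold_accepting_writes:
            return 0  # nothing is lost; data simply stays in the hot tier longer
        moved = len(self.hot)
        self.cold.update(self.hot)
        self.hot.clear()
        return moved


store = TieredStore()
store.put("block-1", b"hello")
store.cold_accepting_writes = False
skipped = store.migrate_batch()      # cold tier down: uploads still succeed
store.cold_accepting_writes = True
moved = store.migrate_batch()
print(skipped, moved, store.get("block-1"))
```

Because uploads never wait on the cold tier, a write outage there delays migration but does not fail user requests.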

[27:13] You know, stuff like looking at even things down to knowing how much power to put in a rack. You have a rack of hardware; there’s a power distribution unit, a PDU. At the top of that rack, it has a circuit breaker which can handle a certain number of amps. We would have to figure out how many amps are required for that rack based on access patterns. There were times where we got it slightly wrong.

[27:37] There was a time when we got maybe a bad batch of hardware, and disks were failing too frequently. As a result, we were re-replicating the data more regularly. I was getting messages from the data center team saying, “Hey, we’re running the racks really hot right now.” That’s the level of optimization you can make when you get to that scale. But again, that’s multi-exabyte scale—million hard drive scale.

Ryan:

[28:03] Yeah. When I was reading about this migration, I saw somewhere in the migration you initially started with Go, and then the racks or something that you’re running on, the hardware itself was out of memory too much or requesting too much memory. And then you migrated to Rust.

James:

[28:20] So initially, the prototype was in Python, if you can believe that. To be fair, Python is actually pretty efficient for I/O. I think people give Python a bad rap for I/O-bound workloads. It’s pretty good at I/O, but obviously not great for concurrency, not great for memory management, and very hard to refactor. It was critical the system was correct. So we migrated everything to Go, and we built most of the storage system in Go.

[28:47] This is before Go was in general availability. We built the system in Go. Go’s a great language for concurrency, a great language for proxies. It’s really well designed for servers that move data from one place to another. At a certain point, though, we would have, let’s just pick a number, let’s say a million. Let’s say we have a million nodes in the system, and every node has some amount of memory, some amount of disk, some amount of sheet metal in the chassis.

[29:18] Right. We would itemize all these things. You’d have a pie chart of how much money is spent on all the things. It would include items like sheet metal and screws. So you’re trying to optimize the storage. A big problem for us was the amount of memory these nodes were using. Not just the memory that we were using, but the unpredictability of it. With Go having a runtime, in a storage system, an out of memory error is pretty bad because if a node runs out of memory and restarts, that looks like a disk failure.

[29:54] So that looks a lot like a disk has failed and has to be re-replicated. A batch of nodes running out of memory can lead to cascading failures throughout the system. I remember a time it was a band; it might have been De La Soul, I can’t remember. There was a band that released an album on Dropbox, which caused a big spike in load to a few files. It was very high bandwidth, and it caused the machines to run out of memory.

[30:24] So those machines, those disks OOMed. As a result, no problem; the system went to try to recover that data from a whole bunch of other replicas. Now all of a sudden, you’ve taken one fire hose worth of load coming in, and you’ve turned this into seven fire hoses worth of load coming in. Because now you have to do a more expensive reconstruction operation, right? So now you’ve 7x’d the load.

[30:47] And these are the kind of cyclical behaviors that can lead to something called congestion collapse. Congestion collapse is when workload to a system crosses a threshold where it all kind of collapses. Designing against congestion collapse is really one of the hardest parts of Magic Pocket, which is the name of the storage system. Ultimately, we switched to Rust for the storage nodes themselves.
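The amplification behind that feedback loop is easy to quantify. The sketch below is a toy model, not Magic Pocket’s behavior: `effective_load` is a hypothetical helper, and the only number taken from the interview is the 7x cost of a reconstruction read.

```python
def effective_load(offered_load: float, failed_fraction: float, amplification: float = 7.0) -> float:
    """Backend load when a fraction of reads must be served by reconstruction.

    offered_load: requests per second arriving from clients
    failed_fraction: share of reads whose primary node is unavailable (0.0 to 1.0)
    amplification: cost of a reconstruction read relative to a normal read
    """
    healthy = offered_load * (1 - failed_fraction)
    degraded = offered_load * failed_fraction * amplification
    return healthy + degraded


for fraction in (0.0, 0.1, 0.5, 1.0):
    print(fraction, effective_load(100, fraction))
```

Every node that falls over pushes more reads onto the expensive path, which raises load on the surviving nodes and makes further failures more likely; that positive feedback is what turns a spike into a collapse.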

[31:13] And this, again, this is before Rust was in GA. So that was a bit of a risky move. But we made the switch to Rust, which coincided with getting rid of the file system entirely on the disks and directly addressing the disk heads. There’s an instruction set, I think it’s called ZBC, Zone Based Block Control, something like that. There’s an instruction set for accessing disks that we were using. The disk manufacturers gave us the draft specs of these new disks, and we were operating off the draft specs and directly controlling the disks.

[31:45] And so all that work was tied up together into a project called Discotech, which was the disk technology project. Ultimately, if you look at that pie chart, congestion collapse stopped and reliability improved. But if you looked at the pie chart of where all the money was going, it really shifted to be almost all disks.

Ryan:

[32:16] Metal, et cetera, the cascading failure you mentioned was there. That happened, and then there was a postmortem?

James:

[32:23] I don’t think we needed a postmortem. I think we knew as it was happening. When I was getting paged in the middle of the night on these things, it was pretty evident. This is, again, an argument in favor of the cloud. Imagine you’ve spent hundreds of millions of dollars on a storage system, and it’s out in production on physical hardware.

[32:47] You can’t just go and put new memory chips in every one of them. I mean, you can, but you have to pay people to come in and swap them out. That’s a tricky place. This is what I love about the industry. Some of the more challenging moments on Magic Pocket were stuff like, in one week, just a weird coincidence, two trucks crashed that were delivering servers.

[33:13] So there are two trucks showing up to deliver racks, and they both crashed. I don’t know, the drivers were okay. So we lost capacity for two weeks. We lost capacity for more than two weeks, probably six weeks. What do you do? What happens is the equivalent of your disk filling up on your laptop, except it’s a million disks, and you can’t tell the customers to go away; you can’t delete their files.

[33:43] A lot of tricky capacity work. Ultimately, what that led to was trying to build in all these protections against the unknowns, making sure we had the right amount of buffer planned out for anything bad that could happen, making sure we designed the system so there couldn’t be congestion collapse and so there wouldn’t be memory spikes. At one point, there’s a process called FMEA, which is a threat modeling process where you have a big spreadsheet and you kind of write down every bad thing that could possibly happen, and then you know how bad it would be if it happened.

[34:21] Existential risk, does someone die, what if there’s a fire in the data center; all these kinds of things get put into this spreadsheet, and you kind of do a bit of a pre-mortem to figure out all the potential failure modes and then design around them. I love that work. I really do like the firefight. I can’t say I like getting paged because I’ve spent my whole life on call, but I do like that rubber-hits-the-road stuff.

[34:52] I like that. Wow. There’s congestion collapse and there’s no one that can help you. You’ve got to think through this problem.
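The FMEA spreadsheet James describes above can be sketched as a list of failure modes ranked by score. This example assumes the conventional FMEA scoring, where a risk priority number is severity times occurrence times detection; James does not say which scoring his team used, and every score below is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    description: str
    severity: int    # 1 (minor) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (caught immediately) to 10 (hard to notice)

    @property
    def risk_priority(self) -> int:
        # Conventional FMEA risk priority number (assumed scoring scheme).
        return self.severity * self.occurrence * self.detection


def rank(modes):
    """Return failure modes ordered from highest to lowest risk priority."""
    return sorted(modes, key=lambda m: m.risk_priority, reverse=True)


# Hypothetical rows; the scenarios echo the conversation, the scores do not.
modes = [
    FailureMode("Fire in the data center", severity=10, occurrence=1, detection=1),
    FailureMode("Server delivery delayed", severity=6, occurrence=3, detection=2),
    FailureMode("Memory spike on storage nodes", severity=8, occurrence=4, detection=5),
]

for mode in rank(modes):
    print(mode.risk_priority, mode.description)
```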

Ryan:

[35:01] When I worked on infrastructure at Instagram, we had this concept of defcon knobs, which are basically these configs that you could flip to gracefully degrade your system while still operating it. But maybe in the case of Dropbox, you store fewer replicas, so you take on a temporary increase of risk in losing data because you need to. Did you have something like that, and did you flip it in that type of case?

James:

[35:29] Never. When it came to durability, we had absolute zero, non-negotiable standards; there was no room for negotiation on user durability. We had those knobs for background processes, for CPU and memory, for example. If there was a spike in load, you could turn off background processes and then turn off the test load on the system. Eventually, we built this system called Trampoline, and what Trampoline did was save us a ton of money.

[36:05] If we ever got too close to the threshold, we would just start writing data to Amazon Simple Storage Service because it’s right there. You can run your capacity way closer to the edge if you’re willing, under worst-case scenarios, just to dump 30 petabytes on S3 and then move it back when it’s done. Now, that didn’t happen very often. We would do it to test it. We would do that just to make sure the system worked. But yeah, being able to have an escape hatch for worst-case scenarios was really nice.
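The escape-hatch idea can be sketched in a few lines. This is a toy model, not Trampoline itself: the class name, the 90% threshold in the usage lines, and the in-memory dictionaries standing in for local disks and S3 are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class OverflowRouter:
    """Toy model: write locally until near capacity, then spill to overflow storage."""
    capacity_bytes: int
    spill_threshold: float = 0.95  # assumed value; the real threshold is not stated
    used_bytes: int = 0
    local: dict = field(default_factory=dict)
    spilled: dict = field(default_factory=dict)  # stands in for the external store

    def _fits(self, size: int) -> bool:
        return (self.used_bytes + size) / self.capacity_bytes <= self.spill_threshold

    def write(self, block_id: str, size: int) -> str:
        if self._fits(size):
            self.local[block_id] = size
            self.used_bytes += size
            return "local"
        self.spilled[block_id] = size
        return "overflow"

    def drain(self) -> list:
        """Move spilled blocks back once local capacity allows it."""
        moved = []
        for block_id, size in list(self.spilled.items()):
            if self._fits(size):
                del self.spilled[block_id]
                self.local[block_id] = size
                self.used_bytes += size
                moved.append(block_id)
        return moved


router = OverflowRouter(capacity_bytes=100, spill_threshold=0.9)
print(router.write("a", 80))   # fits under the threshold
print(router.write("b", 20))   # would cross the threshold, so it spills
router.capacity_bytes = 200    # new racks finally arrive
print(router.drain())          # spilled block moves back
```

The design point is that the overflow path lets the primary system run closer to full, because the worst case becomes a temporary bill instead of a failed write.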

Ryan:

[36:35] I see. So S3 is kind of like elastic storage.

James:

[36:39] Exactly.

36:40 — How to do massive migrations

Ryan:

[36:40] When I looked at this project, it’s such a massive migration, and my first thought is how do you coordinate this whole project without breaking the system as it’s running? What are your thoughts on doing such a large migration without breaking things?

James:

[36:58] Yeah. Now, in terms of the engineers that built the initial version of the system, it was a handful—three, four, five, six engineers. It wasn’t a team of a thousand; it was a very small team. So how do you build such a large system with a small number of people? And then how do you do a high-risk migration? One of the things you do is keep things simple. You try very hard to build cleanly abstracted, simple systems so that their failure modes are very understandable.

[37:33] And so they’re decoupled, so they don’t have congestion collapse, et cetera. Focusing on simplicity gets tricky, by the way, with people doing agentic development. They’re not the best at building simple systems. Simplicity is still the domain of human beings for now, but there’s a big focus on simplicity. The other was this very thick layer of validation checks during this migration.

[37:58] And in fact, when we did the migration off of Amazon Simple Storage Service, we had something called the dark launch where we would be moving data off of S3, but we’d keep it in both locations. We had to demonstrate to the Dropbox founders that we could keep this system running with no incidents, no downtime, and no data loss for six months before we would delete any of the data from S3.

[38:26] So we had it double routed, and there’s one point in time, halfway through this process, where a bug got through to production. Nothing bad happened with the bug, but a bug had slipped through our multiple layers of the release process, etc. I went to the VP and I said, “Hey, a bug made it through to production, and we’re going to reset the launch clock. As a result, it’s going to launch later, and it’s going to cost us some amount of money, let’s say double-digit millions.”

[39:05] And they were like, great, thank you, that’s good, I trust you. That was the cool thing. I mean, Dropbox had a lot of incredible cultural values, but that was like, okay, cool. If you’re prioritizing user safety, that’s the right thing to do. No one was mad about that. It was almost like they were proud, you know? It was just like, yes, you’re operating in accordance with the principles of this company.
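The launch clock James describes is simple to state in code: the old copy may only be deleted after an unbroken clean run, and any incident restarts the count. In this minimal sketch the class name, method names, and the 182-day stand-in for “six months” are assumptions, not details from the conversation.

```python
from datetime import date, timedelta

REQUIRED_CLEAN_DAYS = 182  # rough stand-in for "six months" (assumed)


class DarkLaunchClock:
    """Tracks how long the new system has run without an incident."""

    def __init__(self, started: date):
        self.clean_since = started

    def record_incident(self, day: date) -> None:
        # Any incident, even a harmless one, resets the clock.
        self.clean_since = day

    def safe_to_delete_old_copy(self, today: date) -> bool:
        return today - self.clean_since >= timedelta(days=REQUIRED_CLEAN_DAYS)


clock = DarkLaunchClock(started=date(2001, 1, 1))
clock.record_incident(date(2001, 4, 1))                  # bug reaches production
print(clock.safe_to_delete_old_copy(date(2001, 7, 2)))   # False: clock was reset
print(clock.safe_to_delete_old_copy(date(2001, 9, 30)))  # True: 182 clean days
```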

Ryan:

[39:31] Okay. So it was double writing both systems...

James:

[39:34] ...for a while and then some subset of data.

Ryan:

[39:36] Yeah, okay. And then you switched over reads once...

James:

[39:39] ...we were sure that the system was durable and we had all the validators running in production 24/7. We were just migrating as fast as we could. I think at some point we got to 764 gigabits per second of peering bandwidth between Amazon servers and ours. Certainly, someone on the network team over there noticed that there was that much data moving out. At one point, I got a slightly nasty email from someone saying it was super weird.

[40:08] I think the phrase was “super weird,” like, it’s super weird that you’re doing so many reads and not that many writes. I didn’t respond to that email. But to be fair, Amazon was a great partner for Dropbox. So Dropbox still uses Amazon Web Services. There was no concern that they would do the wrong thing by us as a company. I mean, we only had an excellent experience with Amazon Web Services. But yeah, it was a moment for us.

[40:36] It probably strained the relationship somewhat.

Ryan:

[40:41] You mentioned simplicity and intuition; it makes sense. Do you have a concrete example, though?

James:

[40:48] Yeah, I’ve got a concrete example for you. Maybe it shows the difference between academia and industry. The storage system is a giant distributed system with files stored in various locations. You need a mapping from the file to where it lives on these disks. All we did was have a cluster of 1,000 MySQL nodes, a big giant database. It was indexed by the block ID and indicated that this block is on these disks.

[41:21] And that’s pretty simple. It’s not sophisticated. Every time we’d hire someone out of academia or maybe from other companies, they would say, “Oh, this is not very sophisticated because you could use a Patricia trie or you could use a distributed hash table, and that would map a block to a set of locations.” I think that’s optimizing for the wrong thing because the really nice thing about dumping a list of files and the locations in a giant database is that it is written in one location.

[41:57] If I want to validate what happened, if I want to check all the data is where it’s meant to be, I just walk over the table and check. We had services constantly walking over the table and checking. Whereas if it was a distributed hash table or some giant complex data structure, it’s very hard to validate. Designing for validation is very important. Designing for understanding is very important.
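The “walk over the table and check” idea can be sketched as follows. SQLite stands in for the MySQL cluster so the example is self-contained, and the table name, column names, and the in-memory `disks` mapping are hypothetical; the real schema is not described in the conversation.

```python
import sqlite3


def build_index(rows):
    """Create the block -> disk mapping as one flat table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE block_index ("
        "block_id TEXT, disk_id TEXT, PRIMARY KEY (block_id, disk_id))"
    )
    db.executemany("INSERT INTO block_index VALUES (?, ?)", rows)
    return db


def find_missing(db, disks):
    """Walk the whole table and report entries the disk does not actually hold."""
    missing = []
    query = "SELECT block_id, disk_id FROM block_index ORDER BY block_id, disk_id"
    for block_id, disk_id in db.execute(query):
        if block_id not in disks.get(disk_id, set()):
            missing.append((block_id, disk_id))
    return missing


index = build_index([("b1", "d1"), ("b1", "d2"), ("b2", "d1"), ("b2", "d3")])
# What each disk really contains; d3 has lost its copy of b2.
disks = {"d1": {"b1", "b2"}, "d2": {"b1"}, "d3": set()}
print(find_missing(index, disks))  # [('b2', 'd3')]
```

Because the mapping is written down in one place, the validator is a single scan. With a hash-derived placement there is no list to walk, which is the trade-off James is pointing at.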

[42:19] It’s not about getting a system to work. It’s what do you do when it doesn’t work. Right. And so having a very simple boundary, that’s a very basic example. There are more sophisticated examples that take more time to explain. But the thing is, a lot of engineers will want to do interesting work, will want to advance in their career. They want to be seen as an intellectual problem solver.

[42:49] And so the tendency can be to design complex systems. My argument is always that simple systems are way harder to design than complex systems. Simplicity is so hard. And I think to the untrained eye, a simple system can seem obvious. The best compliment you could ever get about anything you design is when people say, “Oh, isn’t that the obvious way of doing it?” It’s the same as Convex; people say,

[43:20] What’s that? That’s just like the obvious way of structuring. Great. Because it wasn’t obvious when we did it. No one else was doing it. Right, right. Everyone thought we were idiots. If after the fact people think it’s obvious, then you really nailed it. But I think it requires an understanding that simplicity is the hardest thing in systems. And because simplicity is scalable. And yes, simplicity is scalable in terms of numbers of queries per second.

[43:48] But what I really mean about scalability is you can take a simple system and have it run for five years, have people work in it for five years, have all sorts of features added to it, and have requirements changed because the company realized the product didn’t work the way it wanted to work and it wants to change things. It still stands the test of time, whereas a complex, over-optimized system will not.

[44:11] I think that’s the tough thing about distributed systems design, especially LLM-augmented distributed systems design. Just because something works doesn’t mean it’s maintainable over a long period of time, doesn’t mean it’s understandable, and doesn’t mean it’s cleanly architected and abstracted. That stuff’s really very hard.

44:31 — Simplicity vs complexity in promos

Ryan:

[44:31] Absolutely. And I agree with you. I think it’s the long-term beneficial thing to do. One unusual thing, though, in the industry that I’ve seen is the incentive system for engineers. I mean, you mentioned the desire for an engineer to want to be seen as someone who can do something difficult. There’s that, but there’s also the incentive system of promotions. I’ve had many friends whose promotions were rejected because their work wasn’t complex enough.

[45:01] And so that kind of forces complexity, which is kind of unusual. I wanted to know what you thought about that.

James:

[45:08] Yeah, I mean it almost angers me. I just dislike it so much. It’s partly why I started my own company. I think the ideal for anyone is to be doing work where you’re being appreciated for solving the problem. If we get philosophical, this is what it was like going back to the farming days, right? There was no incentive to make it really complicated to milk a cow because the goal is to milk the cow, and then the reward is you got milk.

[45:42] Right. I think it sounds so silly, but at a startup, the goal is to build the system, have it work, have the users like it, have it grow, and everyone gets rewarded and celebrated for solving the problem. It gets hard to scale that. So at large companies, you end up with so many layers of organization that people build alternative incentive structures.

[46:08] Right. It’s like I’m so far away from whatever the hell we’re trying to do over here that my goal now is to get all green check marks on my OKR plan. But who cares about your OKR plan unless it solves the problem? The thing that really drives me insane is when people try to chase artificial goals. I understand that if you’re in a company like this, you may have no choice in the matter.

[46:39] But what I want to tell people is there is a better way. That better way may not be available to you; you may not have job opportunities near where you are, for example. But if you do have the ability to work at a company where you are being appreciated for problem solving, that will make you so much better as an engineer. I see this when I interview people.

[47:07] If I do a deep dive with them, they’ll say they built a system, and I’ll ask, “Why did you build it?” and they’re like, “I don’t know, the VP told me to.” I’ll ask, “How’s the system used?” and they’re like, “I don’t really, I think ads use it.”

[47:32] I’m not sure. This is a caricature, but I think it’s very hard to do good engineering in that environment. You can do competent engineering, but the best engineering comes from a deep understanding of why. This is something we just drill into the team here at Convex. The team embodies this so strongly: everything exists for the why. Don’t build a fancy load balancer unless it’s needed.

[47:57] Turns out we do need a fancy load balancer. We’re building it right now. But you should always start with, why are we doing this? What’s the point? I feel for people stuck in environments that are not like this. But you know what? Try to fight the system a little bit. I do see a lot of nihilism, a lot of defeatedness sometimes amongst junior engineers, a lot of this cynicism, like, what does it matter?

[48:26] Who cares? It’s just a big organization and nothing matters. But I think it does matter. If I think of the happiest times in my life, it’s been dedicating myself to a cause and trying really hard to do the right thing. I felt good when I went home, not trying to get promoted, just trying to do the right thing and assuming I’m going to get promoted. If not, I’ll go somewhere else.

[48:53] I know it does sound quaint when I’m saying this, but I think it’s possible to do this, and especially possible if you surround yourself with people like this. If someone is in a big company and they’re feeling frustrated by politics, look around and see if there’s a team of folks who just seem to want to do the right thing, just seem to want to do good stuff. I don’t think that’s selling out.

[49:13] I think that’s being true to yourself. That’s what real engineering is. Not trying to make a complicated, fancy thing to get promoted. Just build the coolest thing that solves the problem.

49:23 — What technical teams should be focused on

Ryan:

[49:23] This really reminds me of something you had written. I thought it was really good writing. In the writing, there was this idea of system bias. You have this quote in your writing. It says, here are some examples: the team is spending six months to improve performance by 10% when it was completely fine to begin with, or the team is trying desperately to force their tooling on clients who don’t need it.

[49:49] Or the team is riding their outdated system to the grave like the captain going down on the Titanic. I’ve definitely seen examples of all those types of things in industry. And so, yeah, I think it was in the context of your writing about what you should orient your team around. Not systems, but actually missions. And maybe that’s a way to fight system bias.

James:

[50:20] Yeah, I mean, one of my jobs at Dropbox wasn’t the most fun job, but it might have been one of the most impactful jobs. It was shutting down projects. You know, looking around and being like, huh, that thing over there that has had 60 people working on it for two years doesn’t seem to make a lot of sense to me. And then I, you know, it wasn’t a hostile thing, but I’d go and chat with a team and I’d say, hey, what are you all doing?

[50:44] Do you believe in what you’re doing? Does this make sense? The team, in private, would say, “I don’t really know.” But inertia is so strong. This whole desire to not get in trouble, to just keep doing what you were previously doing, is so strong, and talented people can end up doing things that don’t make a lot of sense. One of the things I said in that article is it’s kind of a cheesy story, but when we started building the storage system at Dropbox, the team was called the Magic Pocket Team because that was a silly code name for the system we built.

[51:17] The team was oriented around building that system, but as soon as we shipped it, I renamed the team to the Storage team. That actually took a bit of work because you had to rename all the email addresses, the channels, the repos, and the things. It seems like a waste of time. But my argument to the team was that the responsibility of the storage team is not to advocate for Magic Pocket, the storage system.

[51:46] It’s to solve the needs of storage for the organization. Because who else in the company knows more about storage than the storage team? If there was a point in time where S3 was a better idea, it would make sense to move back. Or maybe there was a different kind of storage system that we were meant to use. Right. It’s the job of the storage team to advocate for moving back. Right. And so I’ve seen this before.

[52:12] You have a team called the Puppet team. People don’t really use Puppet that much anymore, but for the job manager, Puppet versus Chef. They’d be kind of advocating for their team’s thing when really a team should be oriented around what problem they solve. They should not care about the system that survives. Because if you are on the Magic Pocket team and someone says we should move back to S3, that’s pretty threatening to your identity and your career.

[52:44] But if you’re on the storage team and it turns out it makes sense to move back (I don’t think that’s the case), then that’s an exciting new product for you to own. I think it seems like such silly management philosophy, but I think it’s really, really important to orient a team and an identity around solving a problem and not owning and defending a system. You just see this in big companies.

[53:07] Inertia is so strong.

Ryan:

[53:10] Yeah.

James:

[53:10] And you see people doing things they don’t believe in because that’s just what they do.

Ryan:

[53:14] If inertia is so strong, how did you fight it and close down all those projects?

James:

[53:20] I think I got lucky insofar as I was there pretty early on and I worked hard enough. I was putting in probably 16-hour days at the start. I’m not advocating for that, but I was dedicating my life to the company, and I think it became pretty obvious to people that I cared. This is a guy over here that really wants to do the right thing and cares about the company.

[53:42] At that point, you build up enough confidence, capital, that you feel comfortable saying things. I wasn’t afraid for my career; I was afraid of the wrong decisions getting made. So I felt psychologically comfortable making observations. Then at a certain point, there were a lot of engineers. I would mentor many of the Staff+ engineers at the company.

[54:16] And people would sometimes get grumpy because they were like, “Oh, we’re not doing the right thing over here,” or “This is inefficient.” Every engineer listening to this has a story like this. They’re annoyed about some inefficiency at the company. My response was generally, “Do you think we should solve this problem right now? Because if we should, let me know the team to take some engineers off and the product to shut down, and I can redirect resources, and we could solve this problem right now.”

[54:42] And they’d be like, oh, well, we shouldn’t shut down any other stuff. I’m like, cool, well, we just have this many engineers right now. If there’s anything higher or lower priority, let’s stop doing the lower priority thing and do the higher priority thing. This thing. Oftentimes the answer was, oh no, nothing else is lower priority. Then the answer is we just have to accept it. Just have to accept it, right?

[55:06] There’s no point in being angry or upset that we’re not doing the right thing all the time. I think there’s a dimension to “you break it, you bought it.” I don’t think, when I said part of my job was shutting products down, it was just going around causing problems. Ideally, it’s about solving problems, like, “Oh, this product is not going in the right direction, let’s redirect it and do this alternative thing.”

[55:28] And so I think the thing that helped me, I guess, was having a sense of ownership that instead of complaining, I just wanted to go fix problems. That was part of the culture. I mean, when I started at Dropbox, the infrastructure team was, I don’t know, seven, eight, nine people. We’d have someone join from Google, for example, and they’d say, “Well, someone should go build.”

[55:55] I can’t do anything without this logging framework. Couldn’t possibly do anything without this logging framework. I’m like, well, we’re going okay without it. And then someone needs to build this thing. And everyone will be like, well, who is someone? Right? Because it’s just us. It’s just us. We build it or we don’t build it. And they’d pretty quickly come to understand, oh wait, it’s just us.

[56:17] There’s no other idiots out there. We’re the idiots, right? So that was, I don’t know, I loved that time because it was just a time of accountability. Life gets easier and harder when you realize that everyone else is not an idiot. When you realize that everyone else is just dealing with their own stuff, right? I do not think someone will have good luck going around complaining about stuff and just saying this is a dumb idea and being negative.

[56:50] I think people will have—everyone wants problems solved, though. So if you’re someone in an organization who is willing to put their head up and say, “You know what, I think this thing over here is a bad idea, but here’s a different idea, and I’m willing to own it and put the effort behind it,” I think that’s a recipe for success.

Ryan:

[57:11] From that article, you had a great quote or a great question to think through this. It said you went to everyone or a bunch of people, and you would say, if we could be spending these resources working on any project at the company right now, would this still be the best use of time? I feel like it frames exactly what you just described really cleanly.

James:

[57:32] Yeah, I mean, I guess this is like maybe a trick for being a tech lead or a manager. Nothing’s a yes or no question. It’s a prioritization question. It’s not like, should we redesign the database? I don’t know, maybe, I guess. Is it the most important thing to do right now? No. Cool, let’s not do it. I think it’s much easier to have those conversations than to think about it as a yes or no.

[57:54] I often hear, “Oh, my VP won’t let me do blah.” Okay, well, it could be that your VP is uninformed, but it’s probably not. It could be your VP has a different set of priorities. Maybe their VP knows that you need to ship these features, and if you don’t, then the company is going to struggle. I don’t know. But to really frame it around prioritization, the way I think about the career ladder for engineers is that some people—maybe it’s not common—think that becoming a senior engineer means you get better at programming.

[58:27] But I don’t know, I think my programming abilities went down from level four onwards. I think once I got to level four, that was peak programmer for me. Then I probably went downhill from there, and I got wiser, whatever. But I think the real thing that happened is the scope that I cared about increased. At a certain point, you’re just thinking about what matters most at the company for the next five years.

[58:52] And I don’t think a junior engineer in their first year on the job should try to do this because you probably don’t yet have the wisdom, insight, or knowledge to make a good assessment. I think it would be a mistake to try to come up with redirecting company strategy. But as you grow, the real key part about growing is that the IC 6, 7, and 8 engineers are not necessarily the best programmers.

[59:22] They’re just getting better at having a broad perspective in decisions within a company.

1:00:15 — Doing the right thing vs promo hypothetical

Ryan:

[1:00:15] You mentioned this idea of doing the right thing and then, you know, the byproduct, you also get promoted. I think that’s the dream, you know, do the right thing, get promoted.

[1:00:25] But in reality, oftentimes people would have to make the trade-off. Imagine a two-by-two matrix of doing the right thing, doing the wrong thing, getting promoted, and not getting promoted. Obviously, do the right thing, get promoted—great. Do the wrong thing, don’t get promoted—obviously bad. But I’m curious about the other two quadrants. Which one would you have picked when you were earlier in your career?

[1:00:50] Let’s say I came to you, I said, hey, you can do the wrong thing, but you’re going to get promoted, or you can do the right thing and I guarantee you’re not going to get promoted. Which one?

James:

[1:01:00] Absolutely the second one. I’m going to say a really tacky thing, right? There is a set of engineers who at a certain point just make infinity money. The amount of money they can make, you know, going to work at Anthropic is a huge amount of money. Right? And so there’s a certain point where it just doesn’t matter anymore. The money is whatever.

[1:01:31] But if you wanted to maximize long-term income, if you get to a certain level of experience, skill, and seniority, you’ve made it. That’s it, you’re done; the money’s fine. I think there’s this desire among junior engineers early in their careers to be kind of over-optimizing for promotion, income, and salary, etc., as opposed to investing in themselves.

[1:01:59] They probably don’t want to hear me say this because it’s easier for me as an old guy to say this, right? But for the longest time at Dropbox, we just didn’t have a level system. I was tech leading the team and I was making the least amount of money on the team. Because I was at some point a technical manager, I saw everyone’s salary, so I knew I was making the least money, but whatever, right?

[1:02:20] It all worked out in the end, right? Now that was a happy story. But I do really think I benefited so much from working with the best people. If you have two choices, making 20% more money now or working with the best people in the world, just be around the best people because that’s going to set you on the ship to success. Now, not everyone wants to do that.

[1:02:49] Not everyone wants to go on that. There’s no shame in just wanting to have a regular job and just be chilling and getting paid. Fine, there’s nothing wrong with that. But if you do want to maximize your career growth, then the way to maximize that is to maximize your skills. Hopefully, you’re not so cynical to think that there is no correlation between talent and compensation.

[1:03:19] I don’t. If you think that’s the case, okay, I don’t know what to say, but there is a correlation between talent, compensation, and growth. So my advice to people early in their career is to land at the best company with the best people doing the most important problems. Your life will be great because we are so lucky as engineers. Where else can you work solving problems for a job, like solving puzzles for a job and getting paid so well? Engineers get paid so well compared to most jobs.

[1:03:53] The privilege, the real privilege that we get is to work on something cool. And that’s what I advocate for.

Ryan:

[1:04:02] I love the idea you mentioned earlier. That’s kind of counter to the common opinion I often hear. The common opinion I hear is that someone working at a big company is in this machine, and it’s kind of defeatist. They’re going, “I ship this thing, but I hate this” or “I don’t believe in this.” I really liked your perspective on doing the right thing and basically giving a damn that someone outside of you is doing the right thing too and expanding your sense of ownership.

[1:04:35] And I want to know what is your motivation for that? Because there are so many other people who are faced with the same inputs, and they come to a very different conclusion. So what motivates you to actually do the right thing when the machine doesn’t necessarily incentivize you to do so?

James:

[1:04:51] Yeah, well, I think the machine does incentivize you to do so, but I don’t think it’s visible immediately. I think people who do less job hopping really do grow the most. Because I would say very strongly, if you’re not in a job for three years, you’re not going to see whether your decisions were good. You can get more money as a junior engineer, but you cannot become a very talented senior engineer without being around for long enough to own the consequences of your decisions.

[1:05:23] It’s like playing a basketball game and leaving before the game’s over. You’re just not learning. I do think there’s an actual structural incentive towards staying in a job. Now, if you’re in a bad job, leave the bad job. Right? But there’s a structural incentive to be able to stay long enough to have an impact. But also, I don’t know, I guess it’s easy for me to say. I joined the tech industry when it wasn’t a very lucrative field.

[1:05:50] It wasn’t like it is now, but I just like it. The reason I left academia was because I wasn’t confident I was making the right decision. So what would happen, how I was in academia—I’m not saying everyone was like this, but you don’t know what to do. So you have to make up a problem to solve. You make up a problem, and then you make up a solution, hopefully a good one.

[1:06:18] And then you write a paper where you try to convince everyone that it was a really good idea. I hated it because I didn’t want to convince people. I just wanted to build it and see if it was good. I just wanted to build it and ship it and see; I didn’t want to play pretend and argue about whether it was good. That’s what makes me feel good. I don’t know. Obviously, if you’re an engineer who’s living paycheck to paycheck, you should maybe ignore what I’m saying.

[1:06:50] I don’t want to come across as unempathetic to anyone who’s really struggling financially. But if you are not struggling financially, the best thing you can do for your quality of life is to enjoy what you do every day. I don’t know, you could have a fancier car or you could enjoy what you do every day. I love cars. I’m a motorhead. But I promise that enjoying what you do every day is going to have a much bigger impact on your life.

[1:07:22] Personally, I don’t enjoy going to work and working on products that I don’t believe in. Jamie, my co-founder, is the same way. We really get along well in this respect. We can only put up with doing things we actually care about and believe in. I think that’s the luxury. It’s more luxury than taking a first-class flight.

[1:07:49] That’s more luxury than going to a three-Michelin-star restaurant. The luxury is you don’t get up and have to do a terrible job. You get up and go to a place where you like your coworkers, and you solve cool problems, and you go home and feel proud of yourself. If you can construct your career that way, and it’s not easy, it requires very active effort, then you’re going to have a good life.

1:08:13 — Why he dipped into management sometimes

Ryan:

[1:08:13] I saw in your career journey it said that you’re an occasional manager, and you seem like someone who really enjoys the technical aspects of things. So how did you decide to occasionally dip into management?

James:

[1:08:27] I feel like most people should not want to be managers, and I sometimes think it’s a bit of a red flag if someone wants to be a manager too much. Right. Because being a manager is a hard job. I think most people should go into management because they need to, because it’s necessary. Every time I went into management, and I’m in a management role now as well, it was because you have to.

[1:08:50] Right. But someone needed to manage the team, and so I do genuinely enjoy accountability. I like responsibility; I like having weight on my shoulders, I suppose. I went into management several times, but as soon as I had the opportunity to get out, as soon as someone else would come in and manage the team, I would bounce out of that role back into engineering. Now, going into management is tremendously educational.

[1:09:22] Being a manager really lets you see the world in a different way. You realize companies are more complicated than you thought. You realize the engineers are more complicated than you thought. You realize everyone on the team is going through something, and you realize, oh wow, now I understand why that thing happened. So being a manager is very educational. One thing I would strongly caution people against is going into management too early in their career.

[1:09:44] I see this happen a lot with well-intentioned people who want to push others into management as a means of career advancement. I think it’s doing people a disservice. You should let folks take their time in an organization, in a career journey. I don’t think you should be going into management under most circumstances in the first three years of your career.

[1:10:08] I think you should ideally get to staff engineer before you do that. That’s not going to happen for everybody. But ideally, you take the time because if you go into management before you’re an excellent technician, it will limit your ability to influence strategy later in your career.

Ryan:

[1:10:24] How does that play out?

James:

[1:10:25] It plays out with people who are stuck or have roles as people managers where they see their job as making a team happy or maybe coordinating a team or dealing with all the day-to-day challenges of people on a team. But they’re not organizational leaders. They’re not pushing the team towards excellence. They’re not asking, “How can we reframe what this team is about? How can we help influence technical strategy?”

[1:10:59] And again, people management is a fine job. But I think the best companies are ones where all the managers are very technical, so they’re able to make sure the company’s doing well. At a certain point, if you’re not a particularly technical manager, you’re not going to be able to evaluate the work of your team. You’re not going to know whether your team is even doing well. I guess that might contribute to some of the stuff you were saying about cynical organizational attitudes, like “my manager doesn’t understand me.”

[1:11:29] Maybe your manager doesn’t understand you if they haven’t spent enough time developing technical skills and struggling.

1:11:36 — Why you shouldn’t lead by example

Ryan:

[1:11:36] Yeah, there’s another piece that you wrote I thought was really good about leading by example, and you actually say it’s bad to lead by example. But there’s a quote in there that says modern tech workers can be an anti-authoritarian bunch at the best of times. Let’s say you’re a tech lead or manager. How do you strike that balance so you don’t lose credibility as a tech lead?

James:

[1:12:03] Yeah, there are command and control companies where the manager just tells people to do things, and they just do them. Those typically are not excellent companies, and not every company needs to be excellent. But if you want to have a company where your team is really innovating, where the engineers feel very personally responsible for making high-quality decisions, you have to have them believe in what you’re doing.

[1:12:30] And so, at Convex, I’m the founder and CTO. I can’t really go to someone’s desk and say, “Do this thing.” Now, they might do it just because they like me. There’s a good chance they would do it, but not because I’m the boss. They’d be like, “Well, why?” And that’s because that’s our culture. We don’t just do things because we’re told. If I want someone to do something, I’ll spend time with the team talking about why we’re doing it, where the market’s trending, where the gap in our product is, you know.

[1:12:57] And then once we’ve really well articulated why something matters, people are just going to do it anyway. Sometimes you have to have hard conversations, but I think about it in terms of conflict in an organization. There’s this kind of hierarchy of the values you have and then why, what, and how. Engineers are very often debating the how. They’re often debating what algorithm we should use for this.

[1:13:24] And should we use this container service or that container service? These are kind of like the implementation details. Most times when I see organizational conflict, it’s because well-intentioned people are debating as best they can about how to do something, but they don’t agree on why we’re doing it. If one team thinks the most important thing we can do right now is get more features out to expand our customer base, and the other team thinks the most important thing we can do is increase reliability because there’s a risk to the business, they’re both very valid perspectives. However, it’s going to lead to them doing very different things.

[1:13:54] Within an organization, I’m a strong believer that everyone needs to have 100% alignment on why. The stuff we argue about or debate or talk about ad nauseam is why. Then largely, I just trust the team to do the right thing. I think that’s the case for any tech lead. If you want to have credibility on a team, if you think that you can just tell someone to do something and they’ll listen to you because you’re the senior guy, no, they won’t. They won’t listen to you.

[1:14:30] If they listen to me, it’s because I’ve come in and explained it in a way that resonates with them. Again, one of my jobs at Dropbox was to resolve situations. Someone would say, “Oh, that team’s an idiot, they won’t do blah.” Then I’d go talk to that team. That team was not an idiot. We talked through it, and they would decide to do the project. It wasn’t because they were scared of me.

[1:14:51] I hope they weren’t, at least. It was because I took the time to figure out what their motivations are. That’s something you really have to learn. If you’re a tech lead, you don’t really have that much authority over people, and you have to encourage them and get them to believe in what you’re doing. If you are leading through authority, you’re not going to have a culture where good ideas arise from within the organization.

[1:15:17] People are just going to do what they’re told.

Ryan:

[1:15:19] Influence without authority is huge. Even big companies like Meta technically don’t have titles; everyone’s just a software engineer. Is that also how Convex is run?

James:

[1:15:32] I mean, we’re a very flat organization. There are people in tech lead roles. I don’t completely buy that everyone’s a software engineer thing. It’s like, you know, Anthropic. Everyone’s a member of technical staff because it’s kind of like a wink-wink thing. You kind of know, it’s like no one’s mentioned the title, but you kind of know if the most senior person in the company comes to your desk, you’re probably going to notice. I think there’s no point in playing pretend.

[1:16:03] People know. But I will say that you should have an organizational culture where you don’t do something just because a senior person says something. Now, I do think that reputation matters. Like, I certainly think that if a very experienced person at a company comes and tries to explain something to you, you probably should listen to them because they probably have some wisdom.

[1:16:27] There might be something to be learned there. You should be open to it. But ultimately, even the most senior person, I don’t think should be leading through authority. They should use the benefit of their experience to be able to articulate, to win the hearts and minds. And that’s how your engineering culture is so important. If you want a culture of ownership, innovation, drive, and enthusiasm, where people are trying to do the right thing and not just get promoted, you have to have a culture where everyone believes in what they’re doing.

[1:17:00] And no one’s going to believe in what they’re doing if the senior principal engineer comes to the desk and says, “I’m not going to tell you why, but you have to delete this database and do this other thing.” That’s not an empowering statement. The empowering statement is, “Hey, let’s spend some time together to talk about where this product’s trending and how this is probably not going to work out and how there might be a different way we can solve this problem.”

Ryan:

[1:17:20] On the topic of tech leadership, in that article I mentioned, the title was “Don’t Lead by Example.” I think that might be confusing for people. Can you explain why you think you shouldn’t lead by example?

James:

[1:17:33] Yeah, leadership by example is a very passive thing to do. Engineers are passive people at the best of times. I think that’s stereotypically a little bit part of that personality. The very concrete example is when I first started becoming an engineering leader, I was trying to lead by example. I wanted to demonstrate the behaviors I wanted everyone else to have. Very specifically, with regards to on-call and people getting paged, I wanted people to have high ownership.

[1:18:05] I want people to jump on issues as soon as they happen. I would do it. I’d be the first one to respond to a page. I would always be writing up the reports. I would always be jumping on all the bugs and stuff, really falling over myself to show how I want people to be. But from their perspective, all they see is that the lead is just doing all these jobs, and they don’t know. They’re like, “Oh, maybe that’s James’s job,” or “Maybe James knows how to do it, and I don’t know how to do it.”

[1:18:37] Turns out I didn’t know. I was just kind of figuring it out. Or, “Maybe he likes doing those things.” At a certain point, being a leader is about understanding human psychology. I don’t think you can just act a certain way in front of people and wait for them to copy you. I think you should act with integrity and values, and you should own the values of the team.

[1:19:01] But you sometimes have to explain stuff; sometimes you have to tell people very, very specifically. There’s a trajectory almost every high-achieving leader goes through. They become a tech lead and care so much that they become a micromanager. They review every line of code and are involved in every decision, and at a certain point, they become the bottleneck for the team.

[1:19:26] So this person is so overwhelmed; you might have gone through this yourself. They’re so overwhelmed, and they’re like, “Wait, my team doesn’t even seem busy right now. And I’m so busy. I’m reviewing all this code. I’m doing the strategy, what’s going on?” Then a manager will come along and say, “Hey, you’re micromanaging. You got to let your team have more ownership.” So the tech lead says, “Okay, sure, whatever.”

[1:19:49] “I’ll let them own stuff.” And they just take their hands off the wheel. Then the team falls apart because you can’t just stop doing the things you’re doing. You have to go and have a conversation with people. I see leadership as this kind of slider between oversight and accountability. When someone’s new to the team, when they’re very junior, they’re in a mode of oversight, like you’re checking their work.

[1:20:17] Right. But at a certain point, you have to dial down the oversight and, very importantly, dial up the accountability. So instead of saying, “I’m not going to look at what you’re doing anymore,” you say, “Okay, cool, you’ve got this project. Let me know when it’s going to get done. Next Thursday.”

Ryan:

[1:20:34] Cool.

James:

[1:20:35] All right. What’s the plan? This is going to happen. How are you going to know it’s correct? Great, great, great. It’s on you. I expect you to do that. Let me know if there are any issues. The ownership relationship is explicitly on them. What you want to do is basically encourage ownership within teams. You can’t go from owning something to not owning something and expect that to develop.

[1:21:00] You have to go have those conversations. You have to go and give people accountability. Now, what I have found, even though that can feel like an awkward conversation, most people genuinely like accountability. Most people like to own their work. Most people like to say, “Hey, this is on you. We’re all going down with the ship, right? I’m not going to leave you high and dry.”

[1:21:23] I’m the tech lead. I still take responsibility for this project. That’s how you can develop people within your team. A tech lead shouldn’t be about you as the boss and the team as the people who do the work. It’s about this kind of flow where you’re the more experienced person, typically maybe the more organized or more strategic person, and you’re working on developing your team members so they can take your job and then you can do something else.

Ryan:

[1:21:55] That slider you mentioned, what if you give someone accountability and they blow it? Do they go back down the slider to where you start micromanaging?

James:

[1:22:06] You have to recognize it. Right. So this is growth. I mean, this is growth. Right. It doesn’t always work. I had another article about paper cuts or something. So we’d call these paper cuts. Some decisions don’t matter that much. You get them wrong, and maybe it sets you back a week. You get them wrong, and maybe something’s a bit suboptimal. These are great candidates to give people accountability for because they do it.

[1:22:33] And if it doesn’t work, they get to experience it, and it will feel bad. I guess that’s good. It’s good to feel bad. You shouldn’t be demoralized, but it’s good to try something. It doesn’t work, and you’re like, oh, wow, that didn’t work. Then you feed that back into your little LLM in your brain, and you get better next time. Right? That’s a growth experience.

[1:22:56] I think what you don’t want to let people do is lose their arm. Paper cuts are fine, but I would not let someone, a junior person especially, design the replication system at Convex. You have to have safeguards. One of the arts as a CTO, a leader, or tech lead is figuring out the right level of altitude for how to know whether you can trust someone to take on ownership for something.

1:23:23 — How to mentor Senior Staff+ engineers

Ryan:

[1:23:23] You mentioned that, because you were the most senior engineer at Dropbox, at some point you had to mentor other very senior engineers. When I think about mentoring a junior engineer, it’s relatively straightforward patterns. But when I think about, let’s say, needing to mentor a senior staff engineer, maybe even a principal engineer, how do you mentor someone like that who’s already so polished and knows how to take ownership?

[1:23:51] Yeah. How do you mentor someone so senior?

James:

[1:23:53] Yeah. I mean, in two ways. One is there is still just a whole bunch of commonality. Everyone goes through the same problems. They don’t know how to deliver harsh feedback. There is a bunch of just standard stuff people are working on. But I do think that at a certain point everyone needs to become the best version of themselves. Sounds so cheesy, right?

[1:24:18] According to me, there’s not an archetype for a senior principal engineer. You’ve got to be your own brand of engineer, right? For me, I’m like the strategic kind of collaboration, simplicity, abstraction engineer. My coding just fell off a cliff. Some engineers are like the deep science fiction, the super hardcore science, problem-solving engineer.

[1:24:42] So everyone’s going to find their own brand. Oftentimes, what I’ll be working on is a combination of two things. One, finding their strengths and helping them be more spiky—helping them really excel in the area that makes them special. At the same time, I’m almost always working on the personal side of being an engineer, like the organizational and personal understanding, people understanding the why. I don’t know, we’re so mathy as an industry. We think that somehow, even silly things, like for example, I have to tell so many senior engineers that you can’t change someone’s mind in a meeting.

[1:25:23] You just can’t do it. You can disagree with someone, and they’re going to be mad or whatever, right? But to change someone’s mind involves going through a complex series of neurological processes that don’t happen live in front of ten people in a meeting. If you force someone to agree with you about them being wrong, they’re just going to go along with it, and their ego is going to get bruised.

[1:25:52] So, one thing, you know, firstly, meetings generally aren’t for decision making. Most of the time when you identify a disagreement, I would point out the disagreement and why I don’t agree or where the problems are, provide enough information, and then let it sit. Let them go and reflect on it and come back and have a conversation a week later after they’ve thought about the why.

[1:26:16] Right. And so a lot of this is very psychological, but it makes no sense to be like, “Well, they should be able to change their mind.” I mean, well, they won’t, right? “That team should do this.” Well, they didn’t, right? “People should like my API.” Well, they don’t, right? I think it’s almost like a capitalist attitude towards interpersonal behaviors.

[1:26:41] Right. It’s like capitalism rewards success or impact. It doesn’t matter how well-intentioned you were if you didn’t manage to convince. You could be the smartest person in the world, but if you can’t change someone’s mind, then you are kind of useless. Right? And so I think a lot of seniors still struggle with this. It doesn’t matter how smart you are, it doesn’t matter how right you are; are you effective?

[1:27:07] And a lot of being effective at that level is about uncertainty, project management, and simplicity, but also kind of interpersonal dynamics because most hard problems happen with a team. Not always, but generally when you’re at that level, you’re going to have to have 10 or 20 people working with you to get something done. And that’s a whole different ball game.

1:27:30 — Career advice for the AI era

Ryan:

[1:27:30] On the topic of career advice, the industry’s changed a lot in the last five years because of all these agentic tools. I wanted to know if you thought there is any career advice that has majorly changed in the last five years. Is there something that you used to say five years ago that you don’t say anymore, or vice versa?

James:

[1:27:52] No, I don’t know if I’ve changed my perspective much, but I think the industry has changed this perspective dramatically. I mean, let’s just be honest about it. There is incredible demand in Silicon Valley for senior engineers, and it is getting harder for junior engineers to succeed and grow for a variety of reasons. Why is there demand for senior engineers? Well, because large language models can’t do everything.

[1:28:17] Well, the architecture and simplicity in design are still the domain of human beings, despite what you might hear on Twitter. Every company, including the labs, is desperately hiring senior engineers. But junior tasks are getting a little bit commoditized, and that worries me because I do think that wisdom is kind of facts put into practice and then synthesized. People would argue that it’s easy to learn now because of ChatGPT; you can just go ask it a question about how two-phase commit works.

[1:29:28] But what I would say is train your mind. Do not listen to anyone who tells you that there is an advantage to having less knowledge. I’m not sure if you’ve seen people say these ludicrous things, like, oh, maybe in the future, not knowing engineering will be an advantage because you won’t have biases and you’ll just use Claude. I think these are ludicrous statements. Software engineering is an intellectual discipline that helps you think.

[1:29:56] The best software engineering is not about knowing syntax, and it’s not about knowing an algorithm. It’s about being really good at conceptualizing problems, breaking them down into building blocks, and coming up with clean solutions. That requires experience. It’s like doing weights with your mind.

[1:30:17] If you went to the gym and just picked up really light weights, you’re not growing. Also, if you went to the gym and picked up a heavy weight and thought, “Okay, cool, I think I can do it,” but let the robot pick up the weight for you, you’re also not really growing, right? You have to do the reps. It’s tough, but find a way to stress your brain every day.

[1:30:47] Now, obviously, agentic coding is here, right? I could never tell someone to never use a coding agent, because that would be silly. But I can say that it is easy to fall into a passivity trap, where you’re just being passive about your learning. I do think you need to spend some time in the intellectual wilderness of not being able to solve a problem and struggling.

[1:31:16] I would say if you’re running into a new problem, try to think of a solution yourself and then go check with a large language model. It’s going to be hard for you. It’s almost like, for example, I’m not very good at reading anymore. I’ve got to be honest, I find it hard to sit down and read a book because my brain has been fried by the stimulation economy, right?

[1:31:45] I find it hard. I go home and I’m tired, and I watch a YouTube video. I don’t tend to go home and read a novel. I should be better at that, but it takes discipline to do that. I would say similarly, it’s getting harder to solve a difficult problem without reaching for help. It’s getting harder every day to be faced with a very difficult intellectual problem and not think, “Well, I’ll just do a Google search or I’ll just ask Claude.”

[1:32:10] You’re whatever, I’m still doing the work. I don’t know. I would really encourage people to practice using your brain every day.

Ryan:

[1:32:19] If I was just playing devil’s advocate or thinking from the junior engineer perspective, I might think, well, the proof that Claude can do this work today means that I don’t need to know it today or tomorrow or in the future. So why even build that skill in the first place? Also, these agentic tools have positive trajectories too.

James:

[1:32:43] Yes. So firstly, the agents are not particularly good at a lot of parts of engineering currently. Probably everyone should agree Claude is not good at designing distributed systems protocols right now, for example, or managing a 3 million line code base. There are parts of engineering where engineers are valuable. And like I said, you know this because the labs are all hiring engineers.

[1:33:10] No matter what they say, they’re still hiring engineers, right? Desperately hiring engineers. Really, really aggressively hiring engineers. So engineers still have value. And maybe they won’t in the future, but I doubt it. I really do think there’s a role for human ingenuity in engineering. And so here’s the trade-off I would pose to people. I mean, there’s kind of three paths. One, get out, get out of engineering, go mow lawns and do whatever if you want.

[1:33:41] That’s the most nihilistic attitude. I don’t believe in that. I really love engineering. I believe that there’s promising roles for human beings in engineering. So the second is to realize that there’s value right now for humans in engineering. And maybe one day AGI will be here. Let’s say in a year’s time AGI will be here. And there’s two people. One person just gave up on problem solving right now and they’re just feeding the machine. They’re running 17 coding agents in parallel, accepting everything Claude says, and they’ve given up.

[1:34:15] Right. And one person said, you know what, I still want to have an active role in my learning. I want to understand what it’s doing. I want to think hard about problem solving, right? Played out one year, AGI arrives. Who is going to be better placed for the future, right? The person who’s been training their mind, right? You don’t—engineering is not a means to an end. It’s a mechanism for improving your mental processes.

[1:34:41] It’s like, I still do whiteboard coding interviews, and by the way, Anthropic still does whiteboard coding interviews, just in case you were wondering. They don’t say, just use Claude. So I still do whiteboard coding interviews with candidates. Candidates are getting worse at coding. That’s absolutely true. But I don’t know any better vehicle for evaluating someone’s intellectual capacity for problem solving than seeing them solve an engineering problem.

[1:35:08] It’s a great mechanism for doing so. If you were doing a math degree, a big part of doing advanced mathematics is proving theorems. By the way, the theorems are already all proven. Part of the exams is proving a theorem that has already been proven. You might say, well, what’s the point of proving that theorem?

[1:35:31] It’s already been done. Well, the point is that the act of proving that theorem improves your mind, right? And then you can go and do more innovative stuff. So I would say, sure, you may not. I still do think that there is a very, very promising role for human beings in engineering. Maybe not in coding, but coding and engineering are very different things, right? But even if you don’t believe me, you can either give up now, but.

[1:35:56] Or you could just keep trying to improve your brain, and you’ll be better off anyway. Imagine if you’re wrong. Imagine if you take the defeatist view. Imagine if you say, “Well, AGI is coming next week, why bother doing anything?” And imagine you’re wrong. Oh my God, you just got off the ride. You get off the ride, and there’s so much cool stuff. I mean, this is a cool time for engineering, right?

[1:36:19] This is a really cool time. There’s so much cool stuff happening. I hate this talk about how AI means humans don’t have a role anymore, that AI means no one’s going to have a job, or that AI means we’re going to do nothing else new. What I like is, hey, look at this—all this new cool stuff we can build. Look at the ways we can make people’s lives better.

[1:36:42] And to be honest, I really wish my peers and my cohort would stop it with the real doomer, human elimination kind of narrative. I think there is a little bit of a psychological anchoring. As part of Convex, we didn’t start Convex to eliminate jobs; we started Convex to make it easy for people to build cool stuff.

[1:37:10] I think the more we as an industry get behind, let’s do everything we can to make it possible for people to do more cool things. I think it’s a really exciting future we have ahead of us.

1:37:21 — Why he started his own company

Ryan:

[1:37:21] I wanted to talk about what you’re working on now at Convex and why you quit Dropbox to build Convex.

James:

[1:37:27] Dropbox is a great place. At a certain point, I was there eight years. I don’t know if I outgrew the company, but I’d been the most senior engineer for a while. At a certain point, I wanted to grow and do my own stuff, so I left the company very amicably and started Convex. Why Convex? Convex started pre-agentic era because my observation was that the real differentiator in the success of a project, especially a large project, is the quality of the abstractions, the quality of the design, the quality of the architecture.

[1:38:08] Systems that thrive over time and are extensible are systems that are architected well. In my mind, the most difficult challenge in engineering is distributed state management. How do we store state reliably and reason about it while modifying it concurrently with other users? We designed a platform for application building based on our experiences building large-scale systems.

[1:38:34] So Convex is a transactional database where the transactions are written in TypeScript; they run as stored procedures. TypeScript stored procedures are serializable. There’s automatic reactivity, so what the client sees is a consistent view of what’s on the server. I would say Convex is a very designed platform because Convex is designed to be very composable and fit together well. We designed Convex for developers to use.

[1:38:59] In particular, we wanted to make it so that application developers were able to build complex full-stack applications. That was the goal of Convex. Now, all of a sudden, coding agent development came along. That’s been really interesting for us because it turns out that what humans find hard is also what agents find hard. The coding agents are not particularly good with large code bases.

[1:39:24] They’re not particularly good at reasoning about action at a distance or race conditions across services. They’re not particularly good at keeping architectures simple over time. And these are the things that Convex gives you as a developer. So now almost everyone using Convex is using it because they have the coding agent doing their front end, but they need a backend abstraction, a higher-level abstraction than something like Amazon Web Services or hosted PostgreSQL, one that makes their problems go away.

Ryan:

[1:39:56] So this is a layer of abstraction on top of those types of primitives that makes it easier for an application developer.

James:

[1:40:11] It ties into a lot of the stuff I said earlier about making problems go away. Frankly, I watched the interview you did with Barbara Liskov. Barbara was my advisor in grad school, and we worked a lot together on abstraction and the value in clean designs that minimize complexity. AWS is a fine tool. PostgreSQL is a fine tool, although none of the mainstream databases are that great, frankly.

[1:40:36] They’re fine tools, but they don’t make problems go away. So the idea of Convex is a higher-level set of abstractions that you can use and not reason about state management, not reason about concurrency, not reason about scheduling, not reason about transactions, not reason about polling and data sync and type safety and all those things. If you think about the history of engineering, over time the abstraction floor rises.

[1:41:05] You know, when Barbara Liskov was first starting, she was using punch cards. I don’t know if she mentioned it to you, but when she started as a programmer, she never heard the word “programmer” before. That was the first time she heard the word. And then you went from punch cards to having proper operating systems, and then languages like C, and then higher-level languages, and then you had cloud computing.

[1:41:31] And over time, the abstraction floor rises, and you largely forget about what’s going on beneath the surface. Most people don’t think about how Amazon Simple Storage Service is implemented. I do, but that’s what I used to work on. Most people just use it, and it stores your data and gives it back. That’s great; that’s a successful abstraction. But I do strongly believe that the world is and has been overdue for a new abstraction one level up the stack.

[1:41:55] And especially now that people are doing agent development, they don’t want to own a PostgreSQL instance. They don’t want to think about Kafka versus RabbitMQ. They don’t want to think about what set of tools to use. They want it just to work so they can focus on building their application.

Ryan:

[1:42:11] When you talk about the abstraction, there’s obviously a lot of stuff going on behind the scenes in Convex and the technical side. What is it that Convex is building behind the scenes that you’re most excited about and why?

James:

[1:42:27] Basically, Convex is a new operating system in some respects. We have the primitives: queries, mutations, actions, subscriptions. What I think is cool is how we built this. We have our own distributed database that tracks read ranges and write ranges and does very efficient subscriptions over WebSockets, et cetera. So that’s the current operating system set of primitives.
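As a rough illustration of the read-range and write-range idea James describes, here is a minimal TypeScript sketch. It is not Convex’s implementation: `KeyRange`, `SubscriptionTracker`, and the key names are hypothetical, and the only assumption is that a subscription needs to re-run when a write touches a key range its query read.

```typescript
// Toy sketch only: every name here is hypothetical, not Convex's API.

// Half-open key range [start, end) over string keys.
interface KeyRange {
  start: string;
  end: string;
}

// Two half-open ranges overlap when each starts before the other ends.
function overlaps(a: KeyRange, b: KeyRange): boolean {
  return a.start < b.end && b.start < a.end;
}

class SubscriptionTracker {
  // Each subscription remembers the key ranges its query read.
  private subscriptions: { id: string; reads: KeyRange[] }[] = [];

  recordReads(id: string, reads: KeyRange[]): void {
    // Replace any earlier read set recorded for the same subscription.
    this.subscriptions = this.subscriptions.filter((s) => s.id !== id);
    this.subscriptions.push({ id, reads });
  }

  // Returns the ids of subscriptions whose reads intersect the given writes.
  invalidatedBy(writes: KeyRange[]): string[] {
    return this.subscriptions
      .filter((s) => s.reads.some((r) => writes.some((w) => overlaps(r, w))))
      .map((s) => s.id);
  }
}

const tracker = new SubscriptionTracker();
tracker.recordReads("inbox", [{ start: "msg/a", end: "msg/n" }]);
tracker.recordReads("archive", [{ start: "msg/n", end: "msg/z" }]);

// A write inside the inbox range should only wake the inbox subscription.
const stale = tracker.invalidatedBy([{ start: "msg/b", end: "msg/c" }]);
```

In this sketch, only subscriptions whose read ranges intersect a write are re-run, so unrelated clients are left alone. A real system would presumably index the ranges rather than scan every subscription.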

[1:42:56] But Convex is getting much larger workloads now and much more interesting workloads and more high-performance workloads. So we’re in the process of developing a slightly lower-level API for doing very efficient background processes, singletons, and APIs like fork, like operating system primitives. I’m pretty excited about launching these and how much faster it’s going to make various Convex components like the workflow system.

[1:43:28] And to be honest, that’s the thing I find exciting every day, and challenging every day. I still find Convex very hard. I struggle every day. I don’t find my job easy. I feel confident at my job, but it’s not easy. Designing the new API for this is super hard. I can’t just go ask Claude; it’s not going to give a good answer, right? Because it’s innovation, it’s new ideas. I really enjoy it.

[1:43:56] I find it stressful sometimes. I find it challenging and tiring, but I also find it exciting. I would encourage engineers to try to find this kind of stuff to work on, where it’s like you’re on that edge of, “I’m really liking this, but also it’s a bit tricky.”

Ryan:

[1:44:16] You mentioned fork, and in operating systems, I’m familiar with it. You take the existing process and kind of split it. What’s the idea of fork in a distributed system?

James:

[1:44:27] So Convex almost never has scale issues with regards to live traffic. Live traffic is typically bound by user-facing interactions, people clicking on stuff, running a website, acting on a website. Every now and then, someone will come to Convex and want to kick off a million background jobs to do some background processing. It’s a big workload. You can programmatically trigger huge workloads, right?

[1:44:50] And so one of the things we have to scale is kind of these background workloads, and a lot of them involve things like scheduling. There are a lot of workloads in Convex that would be very efficient if you had a background singleton process to perform things like aggregates. I’ll give a very silly example. Let’s say you’re building an election on Convex, a voting system, and every vote is a new row in the table.

[1:45:19] You want to show a tally of the votes. One way of doing this is having a bunch of background processes or cron jobs adding these things up. Another way is doing a table scan, which is the obvious way to use PostgreSQL, which doesn’t scale. The other is to have a background job that, if there are new votes, adds them all up and keeps a tally. If there are no new votes, it goes to sleep and waits on a condition variable to wake up again when there is a new job to perform.
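The vote-tally example can be sketched in TypeScript as follows. This is a toy under stated assumptions, not Convex’s API: `VoteLog`, `TallyWorker`, and `onNextAppend` are hypothetical names, an in-memory array stands in for the table, and a synchronous wake-up callback stands in for the condition variable (a real background job would run asynchronously).

```typescript
// Toy sketch only: every name here is hypothetical, not Convex's API.

interface Vote {
  candidate: string;
}

// In-memory stand-in for the votes table.
class VoteLog {
  private votes: Vote[] = [];
  private sleepers: (() => void)[] = [];

  append(vote: Vote): void {
    this.votes.push(vote);
    // Wake every sleeping worker, like signalling a condition variable.
    const toWake = this.sleepers;
    this.sleepers = [];
    toWake.forEach((wake) => wake());
  }

  // Returns only the votes at positions >= cursor.
  readFrom(cursor: number): Vote[] {
    return this.votes.slice(cursor);
  }

  // The table-scan approach: re-reads every row on every call.
  scanCount(candidate: string): number {
    return this.votes.filter((v) => v.candidate === candidate).length;
  }

  // Registers a callback to run on the next append (the "sleep").
  onNextAppend(wake: () => void): void {
    this.sleepers.push(wake);
  }
}

// Singleton background job that folds in each vote exactly once.
class TallyWorker {
  private log: VoteLog;
  private cursor = 0;
  private tally: { [candidate: string]: number } = Object.create(null);
  votesProcessed = 0;

  constructor(log: VoteLog) {
    this.log = log;
  }

  run(): void {
    // Add up only the votes that arrived since the last run.
    const fresh = this.log.readFrom(this.cursor);
    for (const vote of fresh) {
      this.tally[vote.candidate] = (this.tally[vote.candidate] || 0) + 1;
    }
    this.cursor += fresh.length;
    this.votesProcessed += fresh.length;
    // Nothing left to do: go back to sleep until a new vote arrives.
    this.log.onNextAppend(() => this.run());
  }

  count(candidate: string): number {
    return this.tally[candidate] || 0;
  }
}

const voteLog = new VoteLog();
const tallyWorker = new TallyWorker(voteLog);
tallyWorker.run(); // no votes yet, so the worker goes straight to sleep
voteLog.append({ candidate: "alice" });
voteLog.append({ candidate: "bob" });
voteLog.append({ candidate: "alice" });
```

Here `tallyWorker.count("alice")` agrees with `voteLog.scanCount("alice")`, but the worker touches each vote once, while every call to the scan re-reads the whole log.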

[1:45:45] And so these are the kind of primitives that we’re working on right now. Most people won’t even know they exist, but they allow us to build this very high-performance primitive for scheduling, aggregates, background aggregations, et cetera. I’m pretty excited about the next generation of workloads we can support.

1:46:05 — The most technically challenging work of his career

Ryan:

[1:46:05] It sounds like you’ve done a lot of gnarly technical work across your career: your PhD, Dropbox, which seemed like pretty intense systems work, and Convex, which is also doing a lot of cool stuff. When you look back on your career, what was the most technically stimulating work you’ve ever done? Why was it hard, and what did you learn from it?

James:

[1:46:29] There were certainly times in grad school where we were formally modeling consensus protocols and stuff. I’d be on the phone with Barbara Liskov on weekends, talking through and trying to reason about this in our heads. That was pretty intellectually stimulating and fun. But I think the stuff I found most stimulating was working on a very large storage system with a team where things were going wrong.

[1:46:53] That’s where the rubber hits the road, and this is every day at Convex. We have a compaction process that runs in the background, but it’s running into issues. We might have to redesign it using partitioning, etc. I feel most intellectually stimulated where there’s a really clear constraint in front of me.

[1:47:16] And that to me is engineering. I don’t actually know what the definition of engineering is, so I’m just going to make it up in my mind. Engineering is science with constraints. It’s like, how do you solve problems in the presence of resource constraints? I’m not particularly interested in constraint-free environments. That’s art. I like craft and engineering. The more visceral and difficult the constraints, the more fun that is for me.

[1:47:47] And I’ve been lucky enough to, whether it’s luck or intention, I don’t know, but I’ve always placed myself in those environments. You know, let’s go get on the hardest team and own the hardest problem and then put the effort in to survive.

1:48:10 — How he got involved in Silicon Valley

Ryan:

[1:48:10] This question might be a little bit off topic, but I know you were a consultant for the TV show Silicon Valley. I love that show, and I gotta hear, how’d you get involved with that?

James:

[1:48:21] Yeah, that was a lot of fun. A lot of folks might not know this: I had nothing to do with season one. Many TV shows don’t know whether they’re going to survive as a series. Mike Judge, who wrote Silicon Valley, is also known for Beavis and Butt-Head and Office Space. He started his career as a software engineer at, I think, Lockheed or something. So he actually was a software engineer, which a lot of people don’t realize.

[1:48:48] And so Silicon Valley was like a throwback to the kind of work he did. If anyone’s seen the movie Office Space, you would get this. That’s really a dystopian cubicle era tech industry film. They did season one of Silicon Valley, and then it was very popular, so they got picked up. They had to figure out what to do for season two, but they didn’t know what to do because they had written a storyline that gets to the point where there’s a compression algorithm, and then what happens?

[1:49:18] And so they needed to find an expert on compression. I guess ostensibly that was me. I don’t know whether I was an expert on compression; I guess I was an expert on storage at least. They came to the office, and we just chatted. It was so much fun. I was pretty heavily involved in the show. A lot of it was storyline. So first, like, yeah, sure, what would you do with the compression algorithm?

[1:49:47] What would you do? Could you design a storage system? Coming up with story ideas that are technically accurate, you’d be surprised to know how much they care about accuracy. A lot of people I know can’t watch that show because it’s just so creepily accurate. They find it so cringy. Partly why Silicon Valley can be so cringy is because it’s real. Those stories are almost...

[1:50:14] Almost. Maybe not everyone. So many of the stories in Silicon Valley are just real stories. They went to a bunch of companies, just farmed everyone for stories of crazy things that happened in the tech industry, and they wove them into the series. All the characters are based on real people and real archetypes. But they also cared very much about technical accuracy. So I would also do technical consulting.

[1:50:37] And then they’d say stuff like, “Oh, we’re building a data center in our house. What should the rack look like? And what should the diagram on the wall look like?” Part of me wanted to say, “Oh, well, it doesn’t really matter. No one’s going to care.” But they were like, “No, no, no, it matters.” They really cared to get it right. So, yeah, I love the show, but it can be hard to watch just because of, oh my God, how real it can feel. Yeah.

Ryan:

[1:51:05] Are there any Easter eggs where you look at that and go, that’s unusually accurate? Or, you know, that system diagram actually is very spot on.

James:

[1:51:15] I can’t. I don’t think I can even say them because there were stories of early Dropbox. There were stories of having ideas ripped off by other companies and being tricked into having meetings with folks, only to find it may be the competitor’s team there to steal the information. A lot of those stories are real. And so there are people who watch Silicon Valley and think, oh, wow, that was something I went through.

[1:51:43] And probably it was because it was about that situation.

Ryan:

[1:51:48] That’s such a cool experience. Did you get paid for that, or was it just for fun?

James:

[1:51:52] Yeah, that’s a complicated question. I got paid because I had to get paid; it was like a Hollywood union thing. I didn’t want to get paid because it made my visa more complicated. So I’ve got a green card now; I’m all good. But there was something where they had to pay me $400 anyway. I made a grand sum of $400 off that show.

1:52:16 — Career regrets

Ryan:

[1:52:16] Looking back on your career, is there any regret that comes to mind that maybe other people can learn from?

James:

[1:52:23] Yeah, I mean, I think I underinvested in my personal life, to be honest. People don’t probably say that much because I think you can be all about growth, and sure, I could have grown more. I could have dropped out of grad school, say, three years in or four years in. I probably would have learned just as much. I could have taken a job at Dropbox two or three years earlier and made a lot more money.

[1:52:48] Everyone who has been in the industry long enough has been offered the chance to co-found several billion-dollar companies. Everyone has a story about the times they could have been a billionaire several times over. I don’t really regret those. Yes, it’s been a lot of sacrifice, to be blunt. I’ve been on call my whole career. I’ve carried a laptop almost every day. There are many dinners, parties, and events I’ve had to skip, and there are people in my personal life who have suffered as a result.

[1:53:21] And I really appreciate these people. They love and care about me, and they know that I have a passion for this, so they accept me for who I am. I think this is old person talk, but yeah, I think everyone has to decide how much they want to really drive their career because there’s a trade-off. Absolutely. I made a tremendous amount of sacrifices in my career, and I really prioritized building as probably the number one.

[1:53:52] I mean values first and then building. But if I went back in time, I would have had a good life, but I probably would have done more vacations and just had a bit more of a balanced life. I really do think—I mean I see that with all the 996 stuff and this kind of performative photos of being in a bar with a laptop and stuff, and I’m like that’s not real.

[1:54:16] That’s not real. I mean, sure, I was working more than I do now back then, but I probably still work more than I did then. But I don’t do it as a checkbox; I do it because I really want to be doing stuff. I would caution people against that hustle culture. That’s not, firstly, like, you’re only young once; you should have fun. You know, I’ve got quite a few grays in here.

[1:54:46] But that’s acting. I mean, focus on solving problems, work hard, be passionate. Yeah.

Ryan:

[1:54:55] When I studied your career, there are mentions of you working 16 hours a day in various places. But you like being on call, you like fires, you like ownership; all those things are a recipe for working obscene hours.

James:

[1:55:13] Yeah. I go home and I’m tired, and I wind down by building stuff. I’m lucky enough to have a little workshop at home, so I go home and make things with my hands. You become an infra person; you become an engineer in all aspects of your life. But yeah, I would just say the cool thing is doing cool stuff that humans use.

[1:55:41] The cool thing is not working long hours, the cool thing is not showing off that you were running a coding agent all night. Who cares? The cool stuff is enjoying doing important things.

1:55:54 — Top technical book recommendation

Ryan:

[1:55:54] Do you have a best technical book recommendation for people?

James:

[1:56:00] I have to be honest, I’ve read almost no technical books. I was in academia for a long time, so I read a lot of papers. Learning is awesome, reading is great, but balance it, right? Read something and then go put it into practice. Most of my career has been about doing. Sure, I did a PhD, so I guess that is like the academic side. But after that, most of my learning has been by doing because there’s nothing like being faced with a real problem to really develop as an engineer.

1:56:36 — Younger self & permanent underclass advice

Ryan:

[1:56:36] Yeah, I think if I answered the question, I’d probably say the same. I think learning by doing is really where it matters most. And then the last question for you is, if you could go back to the beginning of your career and give yourself some advice, what would you say?

James:

[1:56:51] I’d say, it’ll be okay. Don’t sweat the small stuff as much. It’s hard because my whole engineering brand is about caring about details, and I love design, so being obsessive is a little bit part of my DNA. But I think I would go back and say careers are long. I mean, any story anyone reads about a 22-year-old billionaire, blah, blah, blah. Just ignore that story.

[1:57:26] That’s not real. That’s not repeatable. That’s not normal. And it’s not that healthy. And it’s not that good for the people either. You probably won’t, ideally, max out your growth for 20 plus years as an engineer. I’m still learning all the time. I’ve been an engineer for several decades. So my advice would be don’t sweat it. There’s time to grow. And I do think that, again, this is a little bit of a modern phenomenon, but there is a feeling right now, oh my God, AGI has come and better max out my growth in the next three months.

[1:58:04] Well, guess what? You ain’t going to do it. It’s not going to happen. You can’t do it. You can’t max your growth out in the next three months. It won’t happen. You don’t have to be running 17 agents at the same time. All you gotta do is orient your career around learning every day, getting better at what you’re doing, and trying to solve things in the most simple ways.

Ryan:

[1:58:26] I mean, on Twitter, I see this take on the idea of a permanent underclass, where if you don’t make it in time for AGI, then you’re going to be part of this permanent underclass.

James:

[1:58:39] Yeah. I mean, look, these are challenging economic times for a lot of folks. I don’t want to be unsympathetic to people who are having financial difficulties. At the same time, it’s just not an instructive attitude. There’s not much you can do with that information other than feel bad about yourself, and I don’t... Someone will argue back:

[1:59:00] No, what you can do is get really good at using Claude. Well, guess what? It’s not very hard to use Claude. Sometimes people tell me, “Oh my God, I’m finding it hard to keep up with all the new models.” The new model drops, and I don’t even know what the new models are. I use them, but I forget the latest model because it doesn’t matter. Right? Like there was this thing called Ralph.

[1:59:23] Right. I guess it’s still Ralph. It’s like a loop thing. I don’t really know what Ralph is, and I haven’t heard anyone mention Ralph in the past few weeks, but it was the biggest thing on Twitter for like a month. And it just doesn’t. This is noise somehow. Sometimes I feel like it’s like tech tabloids. It’s like people think that they’re learning by somehow knowing, like listening to what Jensen said today or like, oh my God, Boris said that Claude writes itself.

[1:59:53] Look, it’s just like reading about Beyoncé, but you’re a nerd, so you’re reading about Jensen, right? But it doesn’t matter, you know? You don’t have to know. It’s okay. The new coding agent could come out, and you could miss it. And then next year, if it turns out it’s the big one, you’ll just use it. It’s not hard. I haven’t seen any skill so far.

[2:00:19] I guess there’s some skill, but it’s not a hard skill. If you’re good at engineering, you can figure out how to use Claude Code or OpenCode. Just watch out for tech tabloidism. It doesn’t matter. Just be building stuff. Just do real work.

Ryan:

[2:00:37] I love that mindset. Yeah, well, thank you for your time.

James:

[2:00:42] I’m on Twitter, too. I’m part of the thing, but just ignore me.

Ryan:

[2:00:51] Oh, God. All right, well, thank you so much for your time, James. I really appreciate it. This was a lot of fun.

James:

[2:00:56] Ryan, it was great. Thank you.
