The Developing Dev
The Peterman Pod
Turing Award Winner: Data Abstraction, Dijkstra, Distributed Systems | Barbara Liskov
0:00
-34:43

Turing Award Winner: Data Abstraction, Dijkstra, Distributed Systems | Barbara Liskov

Barbara Liskov is a Turing Award winner known for her work in programming languages and distributed systems. We discussed the major problems she solved in her career, stories about Dijkstra, getting rejected from Princeton because she was a woman and misc topics around her work.

Check out the episode wherever you get your podcasts: YouTube, Spotify, Apple Podcasts.

Timestamps

1:00 - Getting rejected from Princeton

2:53 - The software crisis

9:03 - The drawbacks of Python

10:17 - Getting into distributed computing

13:09 - Paxos vs Viewstamped replication

21:44 - The significance of Dijkstras letter

25:04 - Why she stayed in academia

30:39 - Why her award was questioned

Transcript

1:00 — Getting rejected from Princeton

Barbara:

[1:00] Okay, so this was when I got my bachelor’s degree and I applied to several graduate programs in math, which is what I majored in as an undergraduate. And I got this little card back. It was a postcard from Princeton saying, we do not admit women. Which was surprising. I knew they didn’t have women in their undergraduate program, but I hadn’t realized it extended to the graduate program. But it was how it was.

[1:33] So what happened was I did get into Berkeley, which is where I did my undergraduate work. But I decided I really wasn’t ready to do a PhD in math and that I should get a job instead and just sort of see, you know, see how things were. And the best job offer I got was as a programmer. So that’s how I got into computer science. By a happy accident.

Ryan:

[2:00] You were consistently, at least at Berkeley, you were consistently a top student. So it is odd that you wouldn’t even get an opportunity to play in some cases.

Barbara:

[2:11] But that’s how it was back then. It was just. I hadn’t realized it was more than an undergraduate thing. And Berkeley was co ed. So, you know, there. What I found was there weren’t very many women in my classes. There were lots of women students, but very few of them majoring in math. And there were only maybe a couple of women in my classes. But the idea that a door was shut and women couldn’t do things, that was not what went on at Berkeley.

[2:42] So it was a different environment. Of course, nowadays in the top schools, the women are about 50% in computer science.

2:53 — The software crisis

Ryan:

[2:53] I want to talk about some of the core problems that you were solving in your career. And so I understand that there’s the software crisis in the 1970s, and I was wondering if you could give the context behind what the problem was at that time.

Barbara:

[3:09] Well, it was a huge problem at the time because people did not know how to build big programs that worked. And so you would often pick up the newspaper and see an article about some company that had spent millions of dollars and hundreds of man years developing some software system for their company. And then in the end they’d have to throw it away because it simply didn’t work. The problem was that to get a big program that works, you need modularity, and you need to break your program up into small pieces.

[3:52] Each piece provides an interface with a hopefully complete description of what service it provides for you. And then inside is an implementation that’s hidden and nobody pays any attention to it on the outside. And if you have a system organized like that, you can actually reason about its correctness one module at a time. But in those days, people didn’t know how. They couldn’t figure out how to design systems that were modular.

[4:25] And the only kind of modularity mechanism present in programming languages was a procedure. And procedures didn’t match the kinds of modules you needed. Because if you think about a file system or a database or, you know, Amazon or whatever, you know, they. They aren’t a procedure where you put something in and get something out. They’re a much more complicated thing. So there was no notion of a module that sort of matched the kind of things that people were looking for.

Ryan:

[4:58] So when you saw that problem, what were the early solutions like?

Barbara:

[5:02] Well, so people were proposing modularity and they were talking about how important modularity was. They didn’t exactly know what a module was. And so it wasn’t like they could give you a rule, this is a module. It was just a chunk of code. In fact, there were papers written that talked about how big a module should be and stuff like that. Just a chunk of code that’s not going to work because you need to to design these systems.

[5:28] So you need a way of thinking of a design that sort of fits into this notion of modularity. They also didn’t really know what the rules should be for modularity. And it turned out that in some of the early work I did after grad school, when I was working at mitre, I had already invented a notion of modularity that was sort of more complete than what people had been talking about. Just because I had a small team of programmers, we were building a complicated system and I wanted to keep us out of trouble.

[6:02] And so I had that sort of sitting there when I started to work on this topic. And it enabled me to see this idea of data abstraction. All of a sudden I sort of saw that this thing, I had this kind of modularity mechanism which consisted of basically what I just described, a bunch of code providing you with an access through a number of what I called operations, so you could call it in various ways.

[6:31] And then inside was all hidden, and whatever data it was using was not accessible to the outside. And then at some point I saw I could see this as a data abstraction. It could be a set, it could be a sequence, it could, you know, and so forth. And that meant we had a new type of module. Now, when I look back at the papers from the time I see that idea, it’s almost there, except people hadn’t managed to pick it out.

[6:58] And so it was probably. I think this happens in science a lot. There’s sort of a time when an idea is ready and I happened to see it.

Ryan:

[7:07] You pieced these together and you put it into the CLU programming language that you’re working on. How did you see it influencing the industry?

Barbara:

[7:17] The first thing that happened was I wrote a paper with a man who was a graduate student at MIT, Steve Zilles, about this idea of data abstraction. And it sketched a notion of what abstract data types would be and how a programming language could support them. And this was very impactful paper. And so there was a big impact on the research community. And then CLU came along, and that involved a lot of additional research in the programming language area.

[7:52] The next thing that happened, so people were watching this in the research community, but of course, people who are in companies that want to write programs, they need a programming language. And I decided I wasn’t going to try and turn CLU into a product because that would have required working in a company in those days. You didn’t just put software out on the Internet and people used it. There wasn’t an Internet yet, for one thing.

[8:22] And I was much more interested in doing research than in working in a company. So I put CLU on the side. It had a user base. But for companies to use a programming language in those days, they wanted a company behind that language. The next thing that happened was the government put out a call for a programming language they could use. This led to the Ada programming language. So that was already a big impact.

[8:52] If you think about it, there was a language explicitly being designed to have data abstraction in it. And then finally in the 90s, Java came along.

9:03 — The drawbacks of Python

Ryan:

[9:03] I thought it’d be interesting because you worked on CLU, you designed that programming language to ask you about other programming languages. You said somewhere that there’s something wrong with Python. I was curious to hear your thoughts of why.

Barbara:

[9:15] Python has modules. But it doesn’t have encapsulation, so it allows code on the outside to muck around with what’s going on on the inside of a module. And that’s all I was talking about. Encapsulation is a crucial part of making modularity work. And when you’re building big programs so you have many programmers working on them, your team is really only as strong as your weakest programmer. So it’s nice if the compiler can enforce things and make certain kinds of bad behavior not possible.

[9:53] I mean, Python is, you know, has another intended use. It’s helping naive programmers learn quickly how to write programs and stuff like that. And people in the programming language world do think about issues like this, like, you know, how to make languages safer and so forth. But since I stopped working in that area, I’m no longer an expert in programming languages.

10:17 — Getting into distributed computing

Ryan:

[10:17] What got you into distributed computing?

Barbara:

[10:20] I read a paper by Bob Kahn, who was with Vint Cerf, considered to be, you know, the founders of the Internet. And Bob talked about his dream of distributed computing, where you would have a program composed of pieces on different computers connected by a network, and nobody knew how to build those programs. And so I just thought, great problem, and I was looking for a new problem. So I jumped into distributed computing and the first project was actually a programming language to write distributed programs in.

[10:57] And then I started looking at other problems in the distributed systems area.

Ryan:

[11:03] What was that first programming language?

Barbara:

[11:05] It was called Argus. It was very strongly influenced by CLU. It was an object oriented language, had a special kind of object called a guardian, which was a module sitting at a single computer. And then guardians could compute, could communicate through remote procedure calls. One of the things you run into in distributed computing is if you have a computation that starts at one guardian and then makes use of other guardians at other nodes in the network, in the end you want that computation to either complete entirely or have no effect at all.

[11:44] And how do you do that? Well, I borrowed the notion of transactions coming out of the database field, and that was part of how Argus worked. It ran computations as atomic transactions.

Ryan:

[11:57] Oh, interesting. Like distributed transactions across those nodes.

Barbara:

[12:02] Yeah, And I think it led right into the work I did on Viewstamped Replication, which was. That was the beginning of cloud storage. I was thinking in terms of a file system, but it doesn’t really matter what it is. You know, how do you have data out on the Internet stored at multiple sites with correct behavior and always accessible as long as enough nodes are up and running and the network is working.

Ryan:

[12:32] I see. And what is a viewstamp in this context?

Barbara:

[12:35] Oh, that had to do with some details of how the system worked. And it was a way of noticing when some nodes failed and other nodes had to take over. You know, could figure out which ones had the most recent state in them. So you could pick up and not lose anything that had happened in the past.

Ryan:

[12:54] What if the clocks are out of sync?

Barbara:

[12:56] It had nothing to do with clocks because it was just numbers. In other words, we were in view 25, the next view was 26, so.

Ryan:

[13:05] Oh, okay. Just incrementing some numbers passed around.

Barbara:

[13:08] Yeah.

13:09 — Paxos vs Viewstamped replication

Ryan:

[13:09] This reminds me a lot of Leslie Lamport’s work, actually.

Barbara:

[13:13] Yes.

Ryan:

[13:14] Did you ever work with him or.

Barbara:

[13:17] No. Leslie and I developed what is essentially the same idea independently and he had the system he called Paxos, and I had this thing called Viewstamped Replication.

Ryan:

[13:28] Oh, interesting. And they’re essentially the same thing.

Barbara:

[13:32] They are, yeah.

Ryan:

[13:34] Are there pros and cons of the two approaches?

Barbara:

[13:37] Yeah, I mean, when you do this kind of system, there’s lots of little details you can play around with, so you could, you know, decide to do it. Gets technical, but, you know, there’s tiny differences. But no, they’re basically the same. In fact, what happened was the first time that I’m aware of that this was actually used in a real system was when the Google file system came along and one of my former students, who was, I think he was a consultant at Google, looked at what was going on in there and he said, oh, he says that’s Viewstamped Replication.

[14:11] So. So the people at Google seem to think it was Paxos, but in fact they are the same system.

Ryan:

[14:17] I also noticed, like when you were, when you came up with abstract data types, maybe it’s just because there wasn’t the Internet wasn’t, as you know, wasn’t there that there was also the object oriented stuff going on on the West Coast?

Barbara:

[14:33] Yes. Alan Kay was developing SmallTalk at the same time that I was working on CLU. And then at Carnegie Mellon, Bill Wolfe and Mary Shaw were working on alphard, which was another data abstraction language. And you’re right, there was no Internet, although there was the arpanet, you know, so we did, but it was like there were two independent streams of research going on and we weren’t talking to each other and so I wasn’t paying much attention to small talk.

[15:04] And on the West Coast, I don’t think they were paying much attention to data abstraction, and this led to the Liskov substitution principle, which, I mean, the story is a sort of a cute one, because in 1986, I think it was, I was asked to give a keynote at uppsla. OOPSLA is the Object Oriented Programming Conference. And this was the second year it was happening. And so I decided, I think I’ll read all those papers about SmallTalk, and there were other languages being developed that were based on SmallTalk and see what’s going on with them.

[15:41] And I discovered. So SmallTalk has this idea of inheritance in it, where you can have a class and a subclass which borrows from the way the class is implemented and changes it a bit. And CLU doesn’t have that CLU, just has what we call clusters, and they’re all independent. So I saw that they were talking about this notion of classes and subclasses, and then I saw that they were also talking about something they called type hierarchy, where they wanted the type implemented by a class to be related in some way that they didn’t understand to the type that was implemented by a subclass.

[16:24] I really thought about modules in terms of their specifications. This partly had to do with a class I developed at MIT that I developed jointly with my colleague John Guttag. And in that class we taught the students how to do design about modularity, data types and so forth, but also how to write specifications, how to reason about correctness. And so it had a big focus on think about the meaning of things first.

[16:52] And the implementation is something that’s kind of hidden inside and you don’t worry about it very much. And so that was a different way of thinking about things than what was going on on the west coast, where they were really thinking in terms of classes and subclasses. And I even saw papers where they would describe the behavior of a class by explaining how its implementation was different from the implementation of the superclass.

[17:18] So they were kind of focused on implementations in a way that we weren’t. And so when I read these papers about type hierarchy and I saw they just couldn’t figure out what it was supposed to mean, I was thinking about it from the terms of meaning. And so I was able to say it has to do with the behavior. And this subclass better behave like the superclass if you use it in an environment where the superclass is expected.

[17:47] And that became what ended up being called the Liskov substitution principle.

Ryan:

[17:59] it get that name? Because you didn’t go off on stage and say, hey, everyone, here’s the Liskov. No, no.

Barbara:

[18:04] But no. As I said, one day in the 90s after the Internet had arrived, I got an email saying, can you say if this is the correct interpretation of the Liskov substitution principle? That’s when I discovered that there was such a name. And I don’t think I had really been aware of how important it was until that point, because I wasn’t thinking about this stuff. I was working on other stuff.

Ryan:

[18:32] In the academic community, it seems very reasonable to have multiple people have similar ideas. But it seems like some ideas stick more than others. For instance, like the Viewstamped Replication versus Paxos. They’re the same thing. I’m wondering, like in that case, for instance, why would Paxos be more known than Viewstamped Replication?

Barbara:

[18:57] Yeah. So what Leslie says is, I went around giving all the talks and she implemented it. Really something to that.

Ryan:

[19:11] Then he got notoriety for speaking about it.

Barbara:

[19:16] Yeah, he did give lots of talks about it and wrote several papers and so forth. And I didn’t realize it was the same system, but finally it became clear that they were the same system.

Ryan:

[19:28] I also wonder how much the name matters, because Paxos is kind of catchy.

Barbara:

[19:33] It is a cute name.

Ryan:

[19:34] It’s a cute name.

Barbara:

[19:35] Yeah, Right.

Ryan:

[19:36] So I could see maybe it has better marketing around the idea, I guess.

Barbara:

[19:41] I also think that this approach to solving this problem, which both Leslie and I picked up independently in my case, came from the work I’d been doing with transactions. Because in transactions there’s a leader that says, now we’re going to commit and ask everybody, can we commit? And if they all say okay. But unlike in transactions, in transactions, if the leader fails, there’s what’s called. This can be a real problem here.

[20:12] We had to have a way of moving to a new leader if the old leader failed. And that’s the big step forward. That happened in both the Viewstamped Replication and Paxos. Now there was also Byzantine fault tolerance, which came along later. That was developed. So view stamp publication. I developed with my student Brian Oki. It was his PhD thesis. And then Paxos I developed with my student Miguel Castro was his PhD student thesis.

[20:43] And Miguel got interested in this problem because there was a DARPA request for proposals and there was one about this problem on the Internet. By then there were Byzantine attacks and malicious attacks and nodes that would purport to be working correctly, but actually they had been compromised and so forth. And so Viewstamped Replication and Paxos only handled crashes. And if there were messages that had been played with, you could tell when they arrived that they were bad.

[21:18] And that was about the extent of what we dealt with. But these malicious attacks with nodes that purported to be working when they weren’t, that was a big step forward. And Miguel saw this request for proposals, and he said to me, why don’t we see whether we can come up with a protocol that works in the presence of Byzantine attacks? And I think, by the way, Leslie is the one that invented the word Byzantine too.

Ryan:

[21:43] I think he did, yeah.

Barbara:

[21:44] Right.

21:44 — The significance of Dijkstras letter

Ryan:

[21:44] I saw in the software crisis stuff, there’s the paper with Dijkstra that you had mentioned was one of the papers you wrote that said need modularity. And that paper was the go to statements considered harmful.

Barbara:

[21:59] Right.

Ryan:

[22:00] And I’m curious because Dijkstra, I mean, when I was studying computer science, we all know his name. Did you ever meet him or work with him?

Barbara:

[22:08] Oh, yeah, many. I never worked with him, but I did meet him multiple times. And, you know, he had very interesting ideas. And that paper, go to statement considered harmful. It’s not actually a paper. It was a letter to the editor of the communications of the ACM. But it was very impactful. And what was really important about that paper was that Dijkstra was talking about how difficult it is to reason about the correctness of code.

[22:42] And this was at a time when, in the sciences, computer science was kind of dismissed as nothing much and anybody can write code. And, you know, Dijkstra was trying to make a point, which I think was actually important, that it’s not as trivial as you think it is. It’s not trivial at all. And he was also pointing out that Go Tos can be misused.

Ryan:

[23:08] And it was controversial at the time.

Barbara:

[23:11] It was, yeah.

Ryan:

[23:12] But today I’ve written code for over a decade.

Barbara:

[23:17] People don’t use Go Tos.

Ryan:

[23:18] I haven’t even seen a Go To. So why was it controversial at the time?

Barbara:

[23:24] The programming languages were different then. First of all, there were people writing programs in assembler. In assembler, you have to use Go To. Programming languages didn’t have some of the constructs in them that we think of today. And so some people were using Go Tos because they had to. And also compilers didn’t do all the kinds of optimizations that they do today. So there was a concern if you didn’t have Go Tos, maybe your program wouldn’t be efficient enough.

[23:57] And then there were people who used gotos and wrote really good code and they were offended that Dijkstra was saying, your code is bad. So there was a whole bunch. And Dijkstra was not the most diplomatic person, so he didn’t write it in a nice. You can imagine writing that paper more nicely where you said, but there were many reasons why it was controversial. People were offended. But then there were also concerns about programming languages don’t have these features.

[24:33] I need, what am I supposed to do? And then there were concerns about what the compiler was doing. And so the world is very different now. But clearly Dijkstra won the day here because no, go to.

Ryan:

[24:48] And is Dykstra in person also like his writing?

Barbara:

[24:55] He was not always as tactful as he might be. Yeah, but, you know, he was a very distinguished researcher.

25:04 — Why she stayed in academia

Ryan:

[25:04] Computer science has had such a huge impact on the industry. Why did you choose to stay in academia instead of going into industry?

Barbara:

[25:14] So I like doing research and I enjoy working with students. And teaching was never my favorite thing. But I always felt teaching and research were very closely connected. But also it was a different time. So this business about how professors are all forming companies and so forth, which goes on today, that wasn’t happening 20 years ago. Or maybe it was happening 20 years ago, but 30 years ago it wasn’t.

[25:48] And so when I was young, it wasn’t the thing that you did. It was sort of a either or sort of thing. So I wasn’t even thinking about doing stuff like that. I did work in a startup briefly at the end of the 90s. I didn’t like it. I much prefer doing research. And as a professor, you have this. It’s a gift and a curse. The gift is you can do whatever you want. The curse is you have to figure out what it is that you’re doing.

[26:19] But I like that freedom and the fact that I just could go off in any. I mean, my career is full of these interesting zigzags where I would switch to something else. I had the freedom to do that.

Ryan:

[26:32] What stops you from going off in a very useless direction?

Barbara:

[26:36] You won’t get tenure.

Ryan:

[26:40] Who’s the judge?

Barbara:

[26:41] The community. Yeah. It’s not, you know, the way that you’re viewed at your university has mostly to do with one very important facet of it is how are you viewed in your research community? So. Because that’s one of the important things they want from their faculty. If you’re in a research university.

Ryan:

[27:07] You mentioned earlier that research and teaching were heavily related. In your opinion. Why is that?

Barbara:

[27:15] Because when you teach, you have to teach from first principles. And if you’re doing good research, you need to understand deeply what’s working, what’s not working, what assumptions you’re making. It’s really the same thought process.

Ryan:

[27:32] It sounds like in research, I mean, this is true in every walk of life. There’s the, I guess, the direction that you take and then the work that you do towards that direction. And it sounds like for research, the direction is maybe the most important thing because you could have a phenomenal researcher doing an idea that has no, you know, even if you did it at 100%, it’s not going to lead to anything.

Barbara:

[28:00] That’s right. So you, you know, you tell students, graduate students don’t do incremental work. Don’t, you know, you’ve got, you. You can’t just keep working on the same thing over and over, just making little teeny improvements. You need to find. You’re right, you have to find a good problem, but you also have to find a problem that’s amenable to a solution and one that matches your skill set. And you have to recognize when you’re going in a good direction versus not going in a good direction.

Ryan:

[28:32] Looking back on your career, you say that it was luck and that it seems like things just fit together.

Barbara:

[28:41] I feel there was luck involved. There was a lot of hard work involved. There was not allowing something negative to cause you great difficulty. So, for example, when I finished my PhD, I would have liked a faculty position, but I didn’t have any good offers. So I went back to mitre, the company I had worked for initially as a researcher. In other words, I just kept on marching along. And it turned out to be a very good choice because I was switching from AI to systems.

[29:26] And this gave me. I worked there for four years. Gave me four years to make that transition without having to worry about students and teaching and all the other stuff. So it worked out well. But it also meant that I didn’t feel really sorry for myself that I didn’t get a job. It meant I just kept on going. And then Title IX was about to be passed, and finally academia was opening up to women and I was ready.

[29:58] But I think when I talk to other people about their careers, they talk about doors opening and doors closing. So I didn’t get the academic job I wanted, but I had this other opportunity. I majored in math, and fortunately I was offered a programming job. I mean, doors opened and then you have to decide, am I going to step through? And this is a pattern that people see in their careers. It’s not just me.

[30:27] Many people have. Have those kinds of choices that you make along the way. Opportunities and things that aren’t so great, you know, and you’re sort of working your way through.

30:39 — Why her award was questioned

Ryan:

[30:39] I’ve interviewed a few people who have won Turing Awards, and I’ve done research and watched a lot of these speeches that people give after they receive the award. And I noticed something different in your speech or some of the stuff that I read about your work, which is that in yours, it seems like when you got the Turing Award, there were people saying, why did she get the Turing Award? It was kind of negative commentary on that.

[31:10] And I didn’t see that in the other cases. Why do you think that they said that about your work?

Barbara:

[31:16] So I’m not so sure. This doesn’t happen to other people, by the way. It was just that my husband was out on the Internet, by the way. There was an Internet by then.

Ryan:

[31:24] Okay, okay.

Barbara:

[31:25] As you know, the Internet is not necessarily nice.

Ryan:

[31:29] I know that.

Barbara:

[31:30] Right. And so I can imagine that other people might have had a similar experience had they bothered to look. But I also think that the world had changed so much and the work that I had done was so basic to what was going on that people really didn’t understand that there was a before. I mean, my graduate students really didn’t understand that there was a before because I discovered that at the time I got the award, it really hadn’t struck them that data abstraction didn’t always exist.

[32:13] So I think that. I mean, I viewed it as, in a way, a huge compliment, not really just to me, but to the group of researchers who all together got us to the point where we had these modular programming languages and we understood about abstract data types and we understood about how to reason about correctness. All that stuff was so fundamental that people thought there was no time when it hadn’t already existed.

Ryan:

[32:44] So they took it for granted.

Barbara:

[32:45] They took it for granted. I mean, the Byzantine fault tolerance, you look at that there’s a protocol, and you can understand that there was a protocol. Somebody invented that protocol, you know, so you get the credit for that. This was much more fundamental. This was the whole mindset you had about how to develop programs, and that’s what people were talking about. And things had changed so much that all these inventions of mine and others were just there in the.

[33:17] Would work. And people didn’t realize that there was a time before.

Ryan:

[33:21] That’s validating then. Yeah, it’s. It’s so impactful, and everyone uses it all the time. That they don’t even know what it was like before what you did.

Barbara:

[33:30] So I thought, you know, it’s a huge. The person who said it didn’t mean it nicely. But I thought really, it’s a huge, A huge compliment and a statement about where we are now as opposed to where we were then. So.

Ryan:

[33:44] Okay, well, that’s all the questions I have for you. Thank you so much for, for your time. Really appreciate it.

Barbara:

[33:49] You’re welcome.

Discussion about this episode

User's avatar

Ready for more?