Interview with Barbara Liskov

1. Hello and welcome. Thanks for joining us, Barbara, Professor at MIT and ACM Turing Award winner of 2008. You’ve just delivered and excellent keynote at QCon London, which will be up on InfoQ website, later on and one of the things that I guess people most know you for is the Liskov Substitution Principle, which has been around. I wonder if you can just tell our viewers a little bit about what that is and what it means?

So it’s a very simple rule, it’s really very intuitive and what it means is that if you have a type that is a subtype of another type and you use an object of that subtype in a context where you expect an object of the supertype, then the object of the subtype ought to behave like you expect. In other words you’re depending upon the specification of the supertype and the object should meet that specification even though it might belong to a subtype.

2. And that specification means more than just a syntax of the methods that supported the semantics as well?

Absolutely. It means the semantics; the language and the compiler will typically give you the syntax, but what really matters is what the different methods do, how they perform.

3. In your keynote, you gave an example of a stack and a queue, how would that apply in this case?

So you might define the stack and a queue to have exactly the same methods with the same names and the same arguments, but a stack is a LIFO behavior and queue is a FIFO and that’s a big difference semantically. So if you are using a queue, you expect a FIFO behavior and you would be very surprised if you’d got the other kind of behavior. So that’s the behavioral part, that’s the important semantics of the definition.

4. Do you think that in those examples it’s worth having a refined subtype to explicitly encode the difference between the LIFO and FIFO semantics as part of a typing system or that would be something that would be annotated externally to the type system, like semantic constraints?

Our compilers today are not powerful enough to support or enforce semantic constraints. So we have to handle the semantics extra language and the typical way we do this is by writing specifications in English, for example you might use JavaDoc to explain the meaning of your type and your methods. And then it’s up to the programmer to make sure that they do the right thing, which means that it’s a little bit on not the firmest ground and requires training and understanding on the part of the programmers. We might hope in the future that the compilers or the verification systems will get better and would be able to do more enforcement.

5. There have been programming languages in the past like Eiffel, which has design by contract, and we’ve had the OCL specification for pre and post conditions as well. Do you think that’s the future for being able to encode those semantic constraints or do you think maybe we’ll be using other approaches or perhaps while we haven’t seen as widespread adoptions of the ideas behind design by contract?

My understanding with design by contract is that there is an enforcement but it’s not really compiler enforcement, it’s more runtime enforcement, which is a lot better than not having anything, but what you’d ideally like is to be able to write a specification that could be proved at compiler time, so that you wouldn’t have to worry about misbehavior at runtime. And I think verification systems are getting better but they are not at a point yet where we can do this on a widespread basis.

6. You actually mentioned about the runtime constraint checking within your presentation. Do you think that there’s always going to be an aspect that needs to happen at runtime, because you can’t know everything at compiler time by virtue of different data come again or do you think that we will be able to encode more and more parts in a static specification?

Well I don’t know when these techniques will become practical, but if you think about what’s going on in most languages or rather programs, typically you know what you’re dealing with and if you are allowing for a set of possibilities then you’ll start out with something that represents the set of possibilities and you explicitly refine, and the refinement step is dynamic but the specifications on both sides are not. So if you are able to refine from a type to some subtype, than after you’ve done that the compiler knows that you’re working with a subtype and if you could do verification you can then verify that you’re using it properly, and before you refine, you can verify that you are using it according to the supertype definition. So I think that you can have what’s basically a static analysis to handle these things merged with a little bit of dynamic mechanism where it’s needed.

7. [...] Do you think that those are in a higher level specification for how a module behaves versus how the types within it behave and maybe has a separate notion of the entry points, if you like, to the module as a means of aggregating types so is the module just an example of a bigger type?

Alex's full question: You also mentioned about modularity, which is really about being able to plug together different subsystems or different modules with a way of separating out, I think in the descriptions you were using earlier the initial part of it was to separate out the teams that were working on it. But subsequently having a clean separation of that interface makes it easier for other modules to use. Do you think that those are in a higher level specification for how a module behaves versus how the types within it behave and maybe has a separate notion of the entry points, if you like, to the module as a means of aggregating types so is the module just an example of a bigger type?

I think the module is actually an example of a bigger type or maybe more specifically if you think about putting distributed systems together it’s actually an object belonging to a bigger type and it has an interface of its own with its own specification and then this specification of course might - other types might show up in this specification, like you might take an object of type T as an argument returning S as a result, but still it’s all specifications and what’s going on inside should be invisible.

8. And the fact that you than have the hidden internals of the module is very much just like a class where you want to hide the behavior to protect the users of that module from knowing about the internals?

Well yes, probably a different way of saying it is you would like to protect the implementation of the module from misbehavior by the users, so that other users who aren’t misbehaving can actually rely on what’s going on.

9. Do you think that there’s a sort of stage forward from the way that current object oriented languages like Java work to have a more module oriented language, where by the modules might be themselves implemented in objects, but treating modules as a first class citizen as things like Jigsaw and OSGi in the Java world for example?

I’m not actually familiar with those, I’ve heard the names but I don’t really know what they are, so I’m not sure how to answer it, I mean I really do believe that functions should be first class arguments, so to that extent, passing modules as parameters seems ok, but modules to me mean just a way of grouping a whole bunch of code, so they don’t seem to correspond so much to abstractions. I think that the currency of our programs is actually abstractions and modules are just a way of putting the code together, at least that’s my understanding of what we’re talking about, when we’re talking about modules.

10. You’ve also mentioned about read-ability versus write-ability of code and Perl is probably one of the classic programming languages which is being described as write-only. What do you see as being the use of new languages coming out, which are very innovative in their syntax, being able to write concise code, perhaps to the detriment of being able to read it?

It seems like this is just a lesson that has to be learned over and over again. I feel like we’ve talked about it 40 years ago, we’ve talked about it 30 years ago, Perl, after all, has been around for quite a long time, but it just seems it happens over and over again, and it just seems to be something that’s hard for people to keep in mind that read-ability matters more then write-ability and cleverness on the part of the compiler is not necessarily a good thing although it’s technically very interesting but it doesn’t mean it’s a good thing from the point of view of the code that results.

11. Is that because it makes the code more difficult to understand or read or is it the actual sheer mental model that you have to adopt to try to understand what the compiler would do when faced with the code?

I think you shouldn’t have to think about what the compiler should do. In other words, you should be able to read a manual for the language, understand the language and read the code based on that understanding, without having to think about the fancy stuff that the compiler is doing.

12. You are interested very much in parallel and distributed computing, as your areas of research and there’s an interesting analogy with CPUs within existing devices, where we’re going from a single CP, to multi-core, to multi-threaded and those things. Where do you think we’re going to go with distributed computing between computers and also parallel computing for getting the most pout of a single computer?

I’ve mostly been working in distributed computing for the last 20 years and distributed computing is a bit different from what we’re encountering on the multi-core machines because in the distributed world there is no shared state. Every computer has its own state, if you want to share, you’d have to explicitly pass stuff back and forth. If you are working on a multi-core machine, there is a shared memory and so it isn’t clear that the paradigms that we use for thinking about distributed systems are going to carry over directly into the multi core machines. You can think of multi core machines as being distributed systems, but it’s not clear that’s the right way to think about them, because then you’re not taking advantage of the shared memory which can have a big impact on performance. So I think we’re right now learning how to use those machines and what will end up is not so clear.

13. Do you think in the parallel computing world, where you have databases that might have eventual consistency and so share the data in an asynchronous fashion, do you think that’s the right way to be able to share a state across parallel systems or do you think parallel systems should be explicitly passing data with the requests?

Well the interesting thing is I actually don’t think eventual consistency is the right model for distributed systems. Because the problem is that it doesn’t give you good semantics and it works fine for certain kind of applications, it’s fine for Facebook, it’s fine when you’re building your shopping cart, it’s not fine when you’re paying your money and so it isn’t a model that’s adequate for all applications and if you try to use it as the basis of your programming of those applications, you end up reinventing how to build real consistency on top of eventual consistency. Now in the case of a parallel machine I absolutely do not understand right now what the right memory model ought to be, because if you use sequential consistency, which is a very strong memory model there’s a big loss in performance and so it’s actually something I’m thinking about right now, how far down the road can you go with using a weaker semantics without ending up having a lot of trouble building your applications at the next level up.

14. [...] If your bank account could apply two transactions in either order than could there be some reasoning of the programs where you’re looking at the deltas, if you like, the changes rather than absolute values and then apply them?

Alex's full question: Is there any issue that databases really enforce some state rather than a transition between states and that if you could encode a transition and compose those transitions than perhaps it wouldn’t matter in which order you do them. You used it as your example of some kind of bank accounts. If your bank account could apply two transactions in either order than could there be some reasoning of the programs where you’re looking at the deltas, if you like, the changes rather than absolute values and then apply them?

Well first of all, if you have operations that are commutative, so it doesn’t matter what order they’re in, then this gives you a lot more flexibility in how you implement things, but many operations are not commutative and in that case the order in which they happen matters. Now whether you capture that by means of a log, and you run the log later to achieve the state or whether you operate on them right away and produce the state, I don’t think that matters. What matters is the semantics and if the semantics depends on an order you have to have a way of enforcing that order, one way or the other.

15. Some large database systems operate in a sharding form, whereby they have one set of details owned by one machine and then another set of details owned by a different machine. Is that another way of tackling where the data is and sort of essentially parallel stripe on the data rather than on the operations?

So if I map this into what I’ve been doing in distributed systems, we think of partitioning the database, so that you take “That table is over there, that table is over there” and everything works beautifully as long as your transactions only use one of those tables, but you always have to take account of the fact that some transactions will use multiple tables and then you have to have the answer for how that works. So it’s really a kind of a performance game that they’re playing. If you mostly have transactions that work on just one partition, everything works beautifully and if it’s just a once in a while that you have to have multiple sites involved, then you’ll pay the price for that, when it happens. So it depends on whether the data allows that kind of distribution or not.

Alex: And the presumably you have to include some sort of third party observer or entity or transaction system, when you have the different systems.

Yes, absolutely, you have to have transactions to really get true consistency.

16. Do you think that within the distributed computing model, we are seeing perhaps client side computers are playing part of that distributed model as well; everyone is carrying around a Facebook app on their phone, for example, and that’s talking with some of the back-end data; does the thoughts behind the distributive computing view the separation with “there are some client-side processing and we can use that as a different node” or is it mainly focussed on the back-end systems, for how they are implemented?

Well I think the work on distributed computing is been all over the map. So there certainly has been work in which they’ve looked at “could we off-load some of the processes into the client machines” and this depends on what the client machines are, whether they’re available for other stuff and so forth. There has certainly also been a lot of work on disconnected operation, which you know, if you start to think about people are going to put stuff on their cell phones, you have to worry that, if they’re storing something essential, nobody else will be able to get hold of it for a while and so I think the model is very flexible and I think there’s been a lot of thought about these kinds of issues over the past 20 years, what’s been changing it’s the technology, so every time the technology changes, you have to sort of rethink things to see whether now, with the new technology, something that didn’t make sense before, makes sense now.

17. [...] Has there been much progress in that as a view point or do we think that the agents themselves almost violate modularity and the fact that they need to have internal knowledge of how other systems work?

Alex's full question: Some time ago, there were discussions of agent-based processes, where you could have an agent on your behalf to go out and execute perhaps on one machine and then move to a different machine and then execute there to get data and then come back to you, has there been much progress in that as a view point or do we think that the agents themselves almost violate modularity and the fact that they need to have internal knowledge of how other systems work?

Well they could be visiting and still using stuff at an interface level, so you could imagine an agent that sort of flows around world, and as long as it runs at an abstract level, it’s not going to be violating modularity. Whether it’s better than “Here I am on my client machine and first I go there and then I go there and then I go there”, that’s less clear and sometimes it doesn’t seem to really matter, it’s more like an implementation, you just change the implementation a little bit, but it’s basically the same thing.

Alex: I think the real key in those things as you say are the interfaces and how you know where those services are and how you talk to them, so in some sense, service discoverability comes into that as well.

Absolutely, but you always need that when you’re dealing with a distributed system and typically you always put on to the client machines some code that’s able to help, figure out where stuff is.

Barbara Liskov, thank you very much.

You’re welcome!

InfoQ Software Architects' Newsletter

Login with:

Don't have an InfoQ account?

Bio

About the conference

This content is in the Architecture topic

Related Topics:

Sponsored Content

Related Editorial

Related Sponsored Content

Popular across InfoQ