
Ralph Johnson, Joe Armstrong on the Future of Parallel Programming

   

1. Let me start off with a question to Ralph Johnson. You are working on parallel programming patterns. Could you explain what that's about?

Ralph Johnson: There is a group of us at the University of California at Berkeley and the University of Illinois, and there are a couple of people from Intel, Microsoft and some of the national labs. It's a group of people collaborating on basically documenting parallel programming patterns, mostly coming from high performance computing. You asked a question earlier about a book, and obviously the long-term goal is to get books out, but that's not really our immediate goal. The immediate goal is to figure out what the patterns are.

We are doing things now like documenting the patterns, but we are also looking at software and trying to ask "How do these patterns play out in this actual software project?" and "Are we getting all the patterns? What sort of things are missing?" Not everyone is an academic, but we are sort of taking an academic view towards it. We expect books and all to follow. In fact, there are a couple of projects from the group to do that, but that isn't really the group's project and not necessarily my personal goal.

   

2. At what level are these patterns?

Ralph Johnson: They are coming at lots of different levels. One of the important things is that we're really motivated by the fact that with multi-cores we're all going to have parallel computers in front of us. We sort of do already, but it's going to be more so in the future, and people don't know what to do with them. How can we figure out what works in parallel programming and get that out to people? It's going to be, I think, an education process as much as anything else. We're focusing on improving the performance of programs by using multiple processors: how you split your program up into pieces and that sort of thing.

   

3. Algorithms in parallel programming

Ralph Johnson: One of the things about parallel programming that is different from the object-oriented patterns is that in parallel programming you often change your algorithm. It has an effect on the algorithm you use. That wasn't really true with object-oriented programming. Some of the patterns are really about the algorithms themselves and some of the patterns are more about the overall architecture of the system, because there are certain ways of putting a program together that lend themselves to a certain type of parallelism.

For example, take a pipes-and-filters type of approach, which usually has been done for reuse, not for parallelism. If you have a parallel machine, it's pretty obvious that the different stages of the pipeline can run on different processors, and you get a certain amount of performance improvement that way. Whereas, on the other hand, if you have this big database in the middle with everything talking to the database, that's going to be a natural bottleneck and you have to do things to prevent that bottleneck.
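
To make the pipes-and-filters idea concrete in message-passing terms, here is a minimal Erlang sketch (the module name, message shapes and stage functions are invented for the illustration, not anything from the pattern catalog): each stage is its own process, so the stages naturally run on different processors.

    %% A pipeline where each stage is a process; data flows as messages.
    -module(pipeline).
    -export([start/2, feed/2]).

    %% Build a pipeline from a list of one-argument stage functions.
    %% Sink is the process that receives the final results.
    start(StageFuns, Sink) ->
        lists:foldr(fun(F, Next) -> spawn(fun() -> stage(F, Next) end) end,
                    Sink, StageFuns).

    stage(F, Next) ->
        receive
            {data, X} -> Next ! {data, F(X)}, stage(F, Next);
            stop      -> Next ! stop
        end.

    feed(Pipe, Items) ->
        [Pipe ! {data, I} || I <- Items],
        Pipe ! stop.

For example, start([fun parse/1, fun render/1], self()) - with hypothetical stage functions - gives a pipeline whose stages can all be busy at once.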

You get this architecture, you get the algorithms you want to use, and then from that you start getting into the real parallelism. Those things aren't so much about parallelism; they are more setting the stage for parallelism. Then we have patterns that focus on whether the parallelism is evenly distributed or whether you put it in certain geometric regions - things like a pipeline or a tree. There are different ways of doing your parallel processing, and you see ways of organizing that.

   

4. Data structures in parallel programming

Ralph Johnson: Then you get down to particular data structures and what techniques you use for synchronization. Message passing is one of those, but shared memory is important for what people are doing. Joe is going to tell us all about why that won't scale, which I agree with, but that's beside the point: that's what people are doing right now. It goes all the way down to this lower level. An important thing is that we use different programming paradigms or programming styles. I think they are more complicated than just patterns, but we call them "patterns" as well.

It is partly a way of organizing knowledge about parallel programming, and the unfortunate part, what makes us unhappy, is that there is just a huge number of different choices. Fortunately, no project uses all of them. People pick one subset of them, and there is going to be a lot of opportunity. In the future we're going to find out that some of them work better than others, and we'll have a winnowing down and not keep all of them, but a lot of them are going to be used.

   

5. The role of GPUs

Ralph Johnson: We talked about multi-cores, but what about people doing GPU programming? That's very highly parallel. It's a different style, a different type of architecture. Maybe some things will turn out not to be so good for the multi-core, but they are going to work fine on the GPU. Then, of course, right now we're just getting 4 cores; maybe you start to see 8 with the multi-cores.

When we start getting up to hundreds of them, it's going to become much more of a message passing world. I think some of the shared memory stuff isn't going to work as well, but that hasn't happened yet. We are focusing on what people are doing right now, what works right now - those are the patterns we're focusing on.

   

6. This brings us to Joe Armstrong. What's your take on this?

Joe Armstrong: Difficult to say. I've approached parallelism from a different point of view; it all started from the point of view of making things fault tolerant. I always thought that to make things fault tolerant you need 2 computers. You can't make something fault tolerant with one computer, obviously: if it crashes, it's not going to work. So you need 2 computers, which means it's parallel and it's distributed. If you want to make something fault tolerant, by nature it must involve distributed computation and it must involve parallel computation.

   

7. Modeling the world with Erlang

Joe Armstrong: I'm much more interested in the architectural aspects of parallelism, especially as they reflect the real world. I think there is a part of the programming community that thinks writing parallel programs is difficult, and another part (the Erlang people) that thinks it's easy. We've done it for 25 years and we just say "It's not difficult. It's really rather easy." The world is parallel, actually, not sequential, and in the real world we have absolutely no problem dealing with the notion of concurrency. There are 5 people sitting in this room - 2 of them on camera and 3 of them behind the camera.

We don't have any problem with the notion that we're all doing things simultaneously and communicating by passing messages. When we glue components together, we also don't have a problem with the notion of message passing between components, because if I've written one program and it's in the UK and some other guy has written a program in Australia and I want to make use of the services of that program, I have no alternative but to send a message to it and then wait for the reply to come back.

We glue components together by using messages as the medium between them. Unfortunately, we don't glue components on the same machine together in the same way that we glue components in a distributed system together. You have the rather absurd situation where people write in one programming language - they could write in Java on the JVM, or they could write in .NET - and it is rather difficult to glue these things together. This is totally absurd! We should be able to glue things together. When you come to the Internet, things are glued together by using sockets, but then, unfortunately, everybody has decided on their own protocols for gluing things together.

   

8. The state of protocols for IPC

Joe Armstrong: If you look at the assigned protocol numbers for TCP (I was just checking this because I've got a lecture tomorrow), I think there are something like 4,800 protocols. One of these protocols is, for example, HTTP, and it's a 120 page document that tells you what that one protocol is. All of these are described in English in an ad-hoc manner, which makes gluing things together rather difficult. But that is still easier than gluing things together on the same machine. We have the rather absurd situation that we can glue things together on the Internet, but we can't glue things together in a small environment.

I'm interested in how you glue things together and how you manage that. You get the performance, of course, by increasing the number of things that you glue together, and you can only get a performance increase if the things are independent. If they're dependent, then you can't get a performance increase. If you've got a single point that everything has to go to, the performance of that single point will dominate. Shared memory is intrinsically evil, because it prevents fault tolerance, it produces a single point of failure and it limits performance.

If we had shared memory, things would be rather difficult, because our brains would be glued together and we would have to walk out the door together and go everywhere together. We don't think like that. We have different views of the world, which we update by sending messages. My notion of concurrent algorithms is just based on breaking problems down into sending messages backwards and forwards between things, and I think we pioneered that way of programming.

   

9. How language support can help with agent programming

Joe Armstrong: We come from agent programming, and Erlang became the pioneering language for that. It's finding its way into other languages as well now. I don't think it's actually very difficult. If you don't have the right abstractions, you can make things artificially difficult. For example, if I was going to teach arithmetic and I only knew about Roman numerals, you might get the idea that multiplication is extremely difficult. Given Arabic numerals, it becomes a lot easier. And with Roman numerals, the Romans had no way to express zero.

It was a concept that simply didn't exist, so a whole branch of mathematics was not only difficult, it was impossible. If we have the wrong abstractions, we can make things which are intrinsically rather simple very difficult. I think that's what's happened in parallel programming. We're using the wrong abstractions, and that's making things artificially difficult.

That's possibly why patterns have come in, as a way to get around this: given that we have the wrong abstractions, how do we use them in a way that nevertheless lets us get things done? By changing the way you view the world, it becomes less difficult.

   

10. Parallel Patterns in Erlang

Ralph Johnson: Some of the patterns that are in our catalog (particular data structures and so on) might be like that, but the patterns of parallelism, the algorithmic strategies, are things like: you break a problem into 2 pieces, then you break those subparts into 2 pieces, and you end up with this whole tree of tasks. When you calculate the values at the leaves you put them back together, and that's a great way to put a lot of parallelism, a lot of concurrency, into a problem pretty fast.
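
To make that concrete, here is a minimal hedged sketch of the divide-and-conquer strategy in Erlang (summing a list; the function is invented for the illustration): split the problem in two, solve one half in a newly spawned process, and join the leaf values on the way back up.

    %% Divide and conquer: each split spawns a process for one half,
    %% building the tree of tasks described above.
    psum([]) -> 0;
    psum([X]) -> X;
    psum(List) ->
        {Left, Right} = lists:split(length(List) div 2, List),
        Parent = self(),
        Pid = spawn(fun() -> Parent ! {self(), psum(Left)} end),
        RightSum = psum(Right),                    % solve the other half ourselves
        receive
            {Pid, LeftSum} -> LeftSum + RightSum   % join the two results
        end.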

That's perfectly doable in Erlang, as the sketch shows. Some of the patterns are pretty generic across any type of problem, but there are others that are much more specialized. Like I said, pattern work is more like being an archaeologist looking for what's there than trying to invent things. Every so often we get papers submitted which are very cool, but which really are inventions; they are not patterns. If it's a great idea and a lot of people copy it, then it will become a pattern, but the whole point of patterns is trying to see what people do and why they do it.

Sometimes, the "why" is because you don't know any better, but usually there is actually some reason. That reason might go away, technology changes, so it what used to be a good reason to do it it's not a good reason any more. But there are usually good reasons why people do things. We're really trying to study what it is that the people are doing. There are plenty of patterns in Erlang, too, so they have their own set of patterns.

   

11. Are there Parallel Patterns in Erlang?

Joe Armstrong: It's funny you say that, because I don't see the patterns. I see one or 2 patterns. For example, you're doing distributed programming: a remote procedure call is you send a message and then you wait for the reply to come back - and then knowing that something has crashed, why it crashed and what the error was - so that's a pattern. I find it difficult to imagine a lot more patterns. I do see very complex behaviors which could be put into some sort of framework.

An example is failover. You've got the pattern where, if one machine is doing something and it fails, a client who is using it should just be able to go to some other machine and carry on as if nothing had happened. I see that as a sort of high level pattern. You don't actually need many of them to build a system. It's very good to have a small number of components that you can glue together in interesting ways. If you have too many of them, it just gets confusing.
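
A hedged sketch of those two patterns together in Erlang (the message shapes are invented): a call that monitors the server, so the caller finds out that it crashed and why, and then simply fails over to the next machine on the list.

    %% Remote procedure call with crash detection and failover.
    call([], _Request) ->
        {error, no_servers_left};
    call([Server | Backups], Request) ->
        Ref = erlang:monitor(process, Server),
        Server ! {self(), Ref, Request},
        receive
            {Ref, Reply} ->                            % normal reply
                erlang:demonitor(Ref, [flush]),
                {ok, Reply};
            {'DOWN', Ref, process, Server, _Why} ->    % it crashed, and we know why
                call(Backups, Request)                 % carry on elsewhere
        end.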

   

12. Is that the situation in OTP and the behaviors you have there?

Joe Armstrong: Yes, we have 6 or 7 patterns and we just use them for everything and it seems to be enough.

Ralph Johnson: Those are patterns about fault tolerance mostly.

Joe Armstrong: One of them is about fault tolerance. It just creates a tree of things and says that if something dies, it's reported upwards in the tree and the parent tries to correct the faults in the children. We have a client-server, an event logger, and we have some rather complicated patterns which virtually nobody uses apart from the people who designed them, which are the system upgrade patterns. We can release the system, then roll it forwards and backwards. If it goes wrong, it just rolls back. These are fairly difficult to understand.
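
That first one is OTP's supervisor behavior. A minimal sketch (the child module my_server is hypothetical): the supervisor is one node of the tree and restarts children that die; if restarting keeps failing, it gives up and dies itself, reporting the fault one level further up.

    -module(my_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        %% one_for_one: restart only the child that died,
        %% at most 5 times in 10 seconds.
        {ok, {{one_for_one, 5, 10},
              [{my_server,
                {my_server, start_link, []},   % hypothetical worker module
                permanent, 5000, worker, [my_server]}]}}.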

These are patterns in Erlang, but they are used very infrequently. You'll have one system upgrade pattern for your entire system, because you've only got one system. Client-server is used a lot; it's the dominant pattern. Then, there are interesting patterns you could make, like a client-server where somebody else replies: I ask you a question, but I get an answer from somebody else, and yet to me it looks like you replied. That's a very powerful way of doing things, because it allows us to delegate.

The things that are inside these - I don't know if you'd call them patterns - basically boil down to protocols and the messages we respond to. One message everything has to respond to is code change: if the message says code change, we've got some new code and you're supposed to just switch what you are doing over to this new code. It's very generic. That's what we use to upgrade systems. One can imagine a system (I'll be talking about it tomorrow) where you just put empty servers on every node - you could put an empty server on every Internet node in the world.

They don't do anything at all, and then you send them messages: could you become an HTTP server? Could you become an FTP server? Could you become an IRC server or something like that? After a while one team might say "There's a mistake in your code, could you become a new FTP server? A new HTTP server?" In a sense, if you start with a fixed server like Apache, you can put modules in it to change its behavior, but you could back off one level before that.

You could start with a server that doesn't do anything; it's just a framework that allows you to put code into it, and then you put in something that turns this empty thing into Apache, for example. Then you could put things into that to give it behaviors. The idea is that the bottom level of the system is extremely general and undedicated, and we just send code around networks and the servers become what you want them to become. They do that for a while, then you tell them to do something else.

I don't see where the problem is; why is this difficult? It's like being a boss with your employees: your people know how to do things and you say "OK, now you can paint the wall. When you've done it, you can lay a carpet or something." This is how we write programs. We say "Now you become an HTTP server for a while, and when you're done, you become a storage server". It's not difficult.
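
A minimal sketch of that empty server in Erlang (the become message and the storage_loop referenced in the comment are invented for the illustration):

    %% An empty server: it does nothing until you send it code to run.
    universal_server() ->
        receive
            {become, ServerFun} -> ServerFun()
        end.

    %% Usage: spawn an empty server anywhere, then tell it what to be.
    %% Pid = spawn(fun universal_server/0),
    %% Pid ! {become, fun() -> storage_loop(#{}) end}.

The code change message described above is the same trick: a message carrying new code tells the server what to become next.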

Ralph Johnson: There are these sorts of frameworks or libraries (I'm not sure what to call them) that are becoming popular. Intel has this thing called Threading Building Blocks; Microsoft in Visual Studio 2010 has got the TPL and the PPL - one of them is C++ and the other is C#, and I can't remember which is which. In Java they have the concurrency library.

   

13. Lightweight Tasks vs I/O

Ralph Johnson: They're all pretty similar; they are based on lightweight tasks. But one of the irritating things is that there is no I/O in them. When you describe your parallel algorithm this way, if you try any I/O you end up blocking the thread, and these are just lightweight tasks, which means that a single thread is going to be switching back and forth between different tasks. So, to get good performance, you want that thread to be focused on just running your tasks, not on blocking.

These are, in some sense, anti-I/O: if you try to do I/O with them, you get bad performance. People are looking at how you could put I/O into them, but basically they have a different purpose, which is just running your algorithm in parallel. The question, when you do want to do I/O, is how you should structure things. There, there is much more of a tendency to have a message passing model.

I think these libraries are popular because people have large bodies of C or Java code that uses shared memory, and they want to reuse that code and put parallelism into it. These libraries are something you can add to that code to get parallelism with fewer problems than usual.

   

14. The problem with shared memory

Ralph Johnson: There is no question that shared memory and parallelism cause a lot of trouble. At Illinois we use the phrase "wild shared memory", which means just using semaphores and threads and trying to synchronize things. This has sort of been the tradition for what operating systems people do, and some place somewhere there's got to be a little bit of that - I assume the VM in Erlang has a few places where it does that.

When you are programming in Erlang you never have to worry about that. In general, there are lots of different projects at Illinois; people have different ideas about how to solve the problem. There are people who are into message passing; other people, though, are trying to use type systems to prove non-interference between things, so you could write shared-memory multi-process code and not have to worry about explicit synchronization, because a compiler would just make sure it all happens automatically for you.

It's a wide range of things, but generally people acknowledge that the old style of explicit synchronization with shared memory is just too prone to error. Basically you have interference: 2 processes, one trying to read memory while the other one is trying to write it at the same time. You don't know what order things are going to happen in. Then, when you put enough synchronization in to fix that, you're liable to have deadlock. Those have just been the traditional problems for many decades. I can remember Dijkstra had a paper in 1968 on the semaphore.

The paper was actually on the THE operating system, and it gave the whole structure of operating systems, which we have been using pretty much for the 40 years since that paper, and it had a 2 page appendix that said "One of the guys on my project had this cool idea. Let me explain it to you", and it explained the semaphore.

I say it's pretty cool to write a paper where the little goody at the end was the semaphore. The semaphore has been really important in building operating systems, but it's just something that only gurus should use. It's not something that you want to put in regular applications. I think that's something people have learnt, and a conclusion the whole community has basically come to.

   

15. Shared memory vs program correctness

Joe Armstrong: I agree, although I think it's far worse than you suggest, because the problem with shared memory is not so much deadlock and things like that; it's a problem of incorrect code. Say you have 2 programs that share memory, and one of them gets something wrong: it corrupts the memory, its computation is wrong, it puts some bad values into memory and then it releases its semaphores and so on. The other program doesn't know the memory is corrupted, and there is actually no way around that problem.

One of the basic tenets of connecting things together is that my program should not muck up your program. It's a basic imperative. I've written a program and believe it to be correct, and you've written a program and believe it to be correct. We put them both on the same machine; if one of them can destroy the other, then we can just forget all about programming and quality control and everything. The only way to achieve this is to make sure they're isolated. Error recovery and the notion of shared memory just don't work together. There is absolutely no way of making them work.

The other thing is, I think we're seeing a shift in programming towards creating things and then gluing together things that have already been created. The way to glue things together - well, you could use shared memory, but I think that's a bad way to connect them - is by sending messages.

Now we're in a very sensitive area, because right at the fundamental level we don't really agree on the data types inside our programming languages. C has got these things called integers and Erlang has got these things called integers, but an integer in Erlang can be a million digits long, while in C it's got to be 32 bits or 64 bits long, and we have to do quite a lot of heavy coercion to convert between them.
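
A small sketch of that coercion problem (the function name is invented): Erlang integers are arbitrary precision, so forcing one into a C-style 32-bit word can silently destroy information.

    %% Keep only the low 32 bits, the way a C uint32_t would.
    to_c_uint32(N) ->
        <<Truncated:32/unsigned>> = <<N:32/unsigned>>,
        Truncated.

    %% to_c_uint32(1 bsl 40) returns 0: the value no longer fits,
    %% and the loss is silent.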

   

16. The role of XML and s-exprs for communication protocols

Joe Armstrong: If we can't even agree on what an integer is, we have to be very careful when we say what an integer is. We're beginning to get into deep problems. When we have closures and higher-order functions, how the heck can we send them? We can't send a function from Erlang into Fortran, because Fortran won't understand what a function is.

Although we should be able to use some sort of transparent messaging, we're forced to go down to something like XML, which is completely painful because it's very verbose. And again, it doesn't solve the fundamental problem of agreeing on what the data types are. What we don't see is a programming language which describes the protocols themselves. We need a separate family of new languages which describe protocols.

Ralph Johnson: People have described such languages in the past, but the problem always is that you all have to agree on them. Are we saying we're going to solve the problem of heterogeneity - too many standards, and we can't agree on them all - by making one more standard that everyone is going to meet? Like CORBA was going to be that standard 20 years ago.

Joe Armstrong: If we'd used s-expressions for things communicating through sockets, we could have thrown away the parsing problem, and instead of having 4,800 protocols we could have had one. That could have been done years ago. That's why XML is becoming very popular.

Ralph Johnson: How is XML any different from s-expressions?

Joe Armstrong: Instead of having round brackets you've got funny weird brackets. As long as they're well nested, I don't see any difference. If you've got a weak memory you can put a tag at the end; if you've got a good memory, you can match the brackets. But XML has also got a grammar - it's got DTDs and things like that to describe what the type of the data structure is - which unfortunately people don't use. If your XML is a typed data structure with a given syntax that you send over the net, that's a good starting point.

But we still haven't defined the protocols, because we don't know in which order the messages should come. What people normally call protocols are packet structures: they say "If I send you this packet, you'll send me that one back". But there is nothing that says "If I send you an A, you're not allowed to send me a B until you've sent me a C", and languages like that don't really exist and they should exist.
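
As a hedged sketch of what such a language might boil down to (states and messages invented for the example): the ordering rule becomes an explicit state machine that a checker on either end of the wire could enforce.

    %% "If I send you an A, you're not allowed to send me a B
    %%  until you've sent me a C."
    next(start,  {me,  a}) -> sent_a;   % I send an A
    next(sent_a, {you, c}) -> got_c;    % you must send a C first...
    next(got_c,  {you, b}) -> done;     % ...only then is your B legal
    next(State,  Event)    -> {protocol_error, State, Event}.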

Ralph Johnson: People have written languages like that, but they are certainly not standard. I think it's because the industry has not taken message passing really seriously, so there hasn't been the belief that we need to standardize. It's been something that researchers have done.

   

17. API protocols

Joe Armstrong: I think it boils down to APIs. If you look at, say, a file system, you can open a file and close a file and read and write a file. I have never ever seen a document that says that if you open a file and then close it, you're not allowed to read it. It doesn't say that in the API. You've got the file handle in your program and you can read it. The documentation doesn't say you're not allowed to read a closed file, because everybody knows that, so it's fine.

It's fine for everybody to know it, but a theorem prover doesn't know it and the bit of mathematics doesn't know it; you have to tell it. You could add this information as a state machine and say "When you are in the state 'open' you are allowed to read" or "When you close the file you change the state to 'closed', and when you are in 'closed' you are not allowed to read".
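
A minimal sketch of exactly that state machine (names invented): the legal orderings of the file API written down so that a tool, not just a human, can check them.

    %% Which operation is legal in which state, and the state it leads to.
    file_op(closed, open)  -> {ok, open};
    file_op(open,   read)  -> {ok, open};
    file_op(open,   write) -> {ok, open};
    file_op(open,   close) -> {ok, closed};
    file_op(State,  Op)    -> {error, {Op, not_allowed_in, State}}.

    %% file_op(closed, read) is an error: you may not read a closed file.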

But APIs don't tell us this kind of stuff. They tell us all sorts of stuff we don't want to know, and the vital stuff - which order you are allowed to do things in - they don't tell us. That's called a protocol, and we are very bad at describing protocols. In practice, engineers put packet sniffers on and see what happens, and that's not very good.

   

18. What approaches have there been for describing protocols? You mentioned some.

Joe Armstrong: I think Tony Hoare's communicating sequential processes (CSP) made an algebra for describing protocols. Actually, when I first made Erlang I was trying to do CSP in a way, and it didn't really work out properly. Erlang is kind of a failure, because it should have been CSP in Prolog but it wasn't, and it's got a life of its own. Perhaps we can move it back in that direction.

Ralph Johnson: I know there have been research systems - I believe one was named Statemate - that added states to type systems, so you could talk about how the state of an object would change over time. There was an IBM project at least 10 years ago. These ideas come up over and over again, but you get people in research labs and universities who build slightly more than a toy system, they do a couple of applications, everyone ignores them and the world goes on.

Some years later, somebody says the same thing all over again. There is a huge amount of really cool stuff that's been ignored, and that's what we're talking about. The difficulty is, when you have good stuff - I think of Erlang: something that does things nothing else can do, and it's been around for over a decade, so it's very reliable - how difficult it is to get the word out about it.

If you're Microsoft, all you have to do is make an announcement at your big conference, everyone pays attention to you and you have 10,000 programmers using it the next day. But if you are somebody else, it's really hard to get your good ideas out. It's just nature.

   

19. The 30 year gap between research results and the mainstream

Joe Armstrong: It seems to be one of the paradoxes of computing. Did you see Dan Ingalls' lecture last night? That was really good, because he showed examples from back when he developed Smalltalk, from early Smalltalk-76. He showed some little drop-down menus - I guess they invented drop-down menus at Xerox PARC - and that found its way into conventional operating systems; you find them everywhere: you click on something and you get a drop-down menu.

What he was doing in possibly '76 or round about then, the late '70s - not only could he make a drop-down menu, he could, as he showed us, just rotate the menu. The entire drop-down menu rotated a bit, or you could make it spin, or make it spin and fly around the screen, things like that. There he was, in about 1978, doing stuff which you can't do today on your computer. It's totally absurd!

What happened then? The guys at Xerox made the windowing system, they made menus and things like that, and then a small fraction of those ideas migrated into mainstream computing, leaving most of the ideas behind. It's absurd that what was developed 30 years ago - and it's absolutely great - hasn't made it into mainstream computing today. Look at Prolog, for example; it's the same story. Compare Prolog queries with SQL or something like that. It's just crazy.

Predicate logic is an incredibly powerful programming tool. We've got all these islands of programmers: at these developer conferences we have great armies of people doing .NET, and you've got another army of people doing the JVM, and these 2 crowds of people don't talk to each other and their applications don't inter-work. I think we're going to move into a world where we want to glue together things that already exist, because if we've got a million lines of code, we don't want to re-implement it in another framework. We're seeing that already.

   

20. Commoditization of Services

Joe Armstrong: I think the big breakthrough there is in storage. Things like Amazon S3 just make storage really simple. You can buy a gigabyte-month of storage for 15 cents. It doesn't make sense to do it yourself. Slowly we pick off these things. Let's take storage: we won't do that in our enterprise system, we'll buy it from somebody else. Then we'll pick off something else - databases with queries - and put them somewhere else, and we'll pick off something else and put that out too. We're forced to use communication protocols to do that, and we're seeing the trend already.

What we're not seeing is a commoditization of those services. I was talking to Ralph earlier: can we buy, say, a gigabyte-month of storage with a liability of a million dollars if you lose the stuff? I think it's going to become commoditized. You can insure a boat - if you've got a freight liner, you can insure it against losing the contents of your boat because it sinks - so why can't I insure my company's data? Because then I'd store it somewhere and not pay a million dollars for it.

If you look at the real cost, do you want to build storage yourself? You can do it yourself - you go out and you buy a disk, it costs you 100 pounds for a terabyte. Buy a few disks, hire a programmer, run it around the clock and it's going to cost you a fortune.

Ralph Johnson: It's not the disks; it's the backup, the management and everything else.

Joe Armstrong: We are seeing a commoditization of storage. I always said, a few years ago, that there were some fundamental problems, and one of them is storage - how do you store stuff. The next problem is how do you find stuff, because we've got so much stuff, how the heck can we find it? The Google solution: just index absolutely everything - if you've got enough storage and enough computers, that's fine.

We can find stuff and we can store stuff, and then the next problem is going to be that most of the stuff we find is rubbish, so we need to filter the stuff. Look at what you get over the web: I was looking at house advertisements because I wanted to buy a flat. You click on one ad and you get all these pages back, and I was thinking "I wonder what's going on under the covers", so I traced everything - I found a little proxy and captured it all - and to get one house advertisement I sent 137 URL requests.

Most of it was full of advertisements. They even set pictures as non-cacheable to force the browser to refresh and pull the advertisements in. I'm not actually interested in that; I'm interested in the price of the flat, the number of rooms, the number of square meters and what it costs. I think what happened is that the early web - just hand-written HTML - had quite a high information content. The signal-to-noise ratio is going down, and it's not just spam: the house ad has got 5 numbers of information, and that's hidden in several megabytes of stuff which I'm not interested in.

Then what you see with these aggregators and mashups is that, through reverse engineering, they extract the information content. By Shannon's law, the amount of information is proportional to the log of the amount of data. The first problem was to get data to everybody, and we've solved that: we can store the data and we can find the stuff in it. But now we have to filter it, so that it has good content.

This is all a kind of distributed thing, with the commoditization of all these services, and I don't think we need that many protocols, actually. I mean, we need storage - the storage API. What more can you do with storage than get and put, and store the stuff forever?
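
A hedged sketch of how small that storage protocol can be (a toy in-memory server, not any real storage API):

    %% A toy storage server: the entire protocol is get and put.
    storage_loop(Store) ->
        receive
            {From, {put, Key, Value}} ->
                From ! {self(), ok},
                storage_loop(Store#{Key => Value});
            {From, {get, Key}} ->
                From ! {self(), maps:find(Key, Store)},  % {ok, V} or error
                storage_loop(Store)
        end.

    %% Start one with: spawn(fun() -> storage_loop(#{}) end).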

Ralph Johnson: Everybody says "We don't actually need that many whatever", and then people keep inventing more and more things to do with it, and before you know it, you've got to use them.

Joe Armstrong: That's because they've got too much time.

Ralph Johnson: No, it's because you say "storage", but there are going to be all the medical information systems, keeping track of your health record, etc.

Joe Armstrong: But you are still just doing get and put, aren't you? You know: get Joe Armstrong's data, put Joe Armstrong's data. And then you've got a sub-key: put his weight, or the length of his hair.

Ralph Johnson: But you start having services, you start asking it to do more.

Joe Armstrong: Yes, but that's decoupled from storage, unless you're going to put the service in the storage.

Ralph Johnson: If you are saying all you have is storage ...

Joe Armstrong: No, I'm not saying all you have is storage. I'm saying that if you factor out storage, you can commoditize it: just put it there and have a nice protocol, and that problem is solved. Then you have the other problems, like how to interpret the data.

Ralph Johnson: I agree that commoditizing the storage is important. All I'm saying is that you say you're going to have a fixed protocol for storage - fine! But when you say we're only going to have this fixed number of protocols, it's not going to be a small number of protocols, there is a lot more to life than storage.

Storage is fundamental, but think of HTTP: we put all this stuff over HTTP, but then people layer things on top of HTTP, like SOAP, or conventions for how you interpret URLs - the very way of decoding URLs is a kind of protocol - and people just keep adding more and more. That's because they are doing more and more. They will have specialized needs.

   

21. Protocols - and the world's most used language

Ralph Johnson: I think that's why it's important not just to have a fixed set of protocols, but to have protocol definition languages, because people are going to keep inventing new ones.

Joe Armstrong: Yes and no. Probably the most used programming language in the world (I'll just guess it) is Postscript. Every time you print something on a modern printer you are sending it a program, a Postscript program, which it then executes, and out comes the page.

Postscript hasn't changed in the slightest. For every document you are sending megabytes of Postscript, and that's completely frozen. It hasn't changed for years. The fact that it's frozen provides an awful lot of benefit.

Ralph Johnson: But it's also that programmers don't write programs in Postscript; programmers write programs that generate Postscript.

Joe Armstrong: Some of them.

Ralph Johnson: You have been known to do that, but that's not normal. Normal people do not write programs in Postscript.

Joe Armstrong: A friend of mine wrote directly in Postscript, and he'd send documents out to people he was working with. They'd say "Could you send me the source, please?" and he'd say "You've got it". "Am I expected to edit all the Postscript to revise your documents?" "Yes". I thought that was slightly extreme, but he was a control freak.

Jul 21, 2010
