Bio: Crista Lopes is a Professor of Informatics at the University of California, Irvine. Her research focuses on software engineering for large-scale data and systems. She was a founding member of the team at Xerox PARC that developed AOP. She is also a core developer of OpenSimulator, a virtual world server, and a founder of Encitra, a company specializing in online VR.
GOTO Aarhus is a premier software development conference designed for developers, team leads, architects, and project managers. At GOTO, the program is created for developers by developers. Our concept has always been to present the latest developments as they become relevant and interesting for the software development community.
Hi, thank you. I’m Crista, I’m a professor at the University of California, Irvine. I teach courses in Computer Science and Informatics, and I’ve been doing a lot of different things over my career.
Not languages as such. I think it’s about something even more fundamental than the languages themselves, which we don’t do a very good job of teaching in CS curricula. The talk is about programming styles, so different ways of thinking about computational tasks. What we end up doing when we teach these things is a good job of teaching certain programming languages. More recently there was this massive switch to Object Oriented Programming, and we have a whole generation of developers out there who have been indoctrinated in OOP. They have learned to decompose and solve problems in terms of objects, but that is just one way of thinking about the problem itself. There is a whole other class of people who have been indoctrinated in the functional style; certain universities in particular are very strong in teaching functional programming languages.
And so those people are sort of indoctrinated in a way of thinking about the problem, and they very strongly believe this is the right way of doing things. What I’m trying to do here is step back from those kinds of religious approaches and make everyone realize that what is going on here is just a matter of style. Throughout history, style has been the reason for lots of people having very heated fights, in the arts, in literature, in everything; different styles of approaching things are actually quite dear to people. But really, if you just step back, what is going on is just different ways of thinking about the problem. And yes, some ways of thinking about problems are better than others for specific kinds of things. As far as I know there is no single way that is better than the others in general; it depends on what you are trying to achieve, what your goal is when you are developing.
And so what I think is really important, more than making people versed in one or two styles, is making people realize that there is all this variety of styles. When I say styles I mean something very concrete: what I mean is constraints. A style is a set of constraints with which you solve your problem. If you give one set of constraints, you end up with one style; if you give another set of constraints, you end up with another style. It is important to realize that there is this whole collection of constraints that we have accumulated over 70 years, and we are not teaching that very well. So that is what I’m trying to do here: come up with a way of teaching these lessons, which are actually more important than the programming languages themselves. I have a whole collection on GitHub that right now has about 23 different styles; I’m going up to 33, and then I’ll stop and actually write the book. So if people want to contribute, that would be great.
Werner: So we'll find that on your GitHub account.
My GitHub account is “crista”, and the project you will see there is Exercises in Programming Style.
So yes, I’m getting away from giving the styles technical names, because people get into these religious discussions about their favorite style, so I’m giving them all fun names. The first one is called Old Time Style, and it tries to emulate the style of programming that people had in 1951. What does this mean? It's not just a name, it's actually a very clear set of constraints, and there are two of them. First, you have a very small primary memory, so typically the size of the memory is much smaller than the size of the data that you need to process. Second, around that time you also did not have symbols: all you had was addressable memory, so there were no variable names, no names at all. So you have two constraints: very small memory, and no labels, no names. So I can tell you, write some program with these constraints, and you can do it now.
This is one style; another one would be Code Golf Style. Code Golf is where people try to write programs in as few lines of code as possible, so what you try to optimize is the number of lines of code. Another style would be Things, the kingdom of nouns style. The constraint there is that you divide your problem in terms of things, and these things do actions and they wrap data around those actions, so that is sort of what is going on in that style. Another one that comes to mind is REST; I have an example there that illustrates the REST style, which is the underpinning of the web. That is actually quite a complicated style; there are many constraints to it.
So some styles have a very short list of constraints, others have a long list of constraints, but what is really going on is that you write programs using these constraints, and if you obey these constraints then you end up programming in this style. That is how it works. I stop at 23, but it’s unbounded; the number of constraints that you can come up with is really unlimited. What I’m trying to capture is the things I know that have accumulated over 70 years and that are important for people to know, but many more kinds of styles are going to be developed as we go forward.
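To make the idea of style as a set of constraints concrete, here is a minimal sketch of one small task, finding the most frequent words in a text, written under two of the constraint sets mentioned above. The task, the function and class names, and the sample text are illustrative assumptions and are not taken from the repository.

```python
import collections
import re

# Code Golf style: the constraint is as few lines of code as possible.
def top_words_golf(text, n): return collections.Counter(re.findall(r"[a-z]+", text.lower())).most_common(n)

# Things style: the problem is divided into things that wrap data and expose actions.
class WordSource:
    """Holds the text and knows how to split it into words."""
    def __init__(self, text):
        self._text = text

    def words(self):
        return re.findall(r"[a-z]+", self._text.lower())

class FrequencyTable:
    """Holds the counts and knows how to rank them."""
    def __init__(self):
        self._counts = {}

    def record(self, word):
        self._counts[word] = self._counts.get(word, 0) + 1

    def top(self, n):
        ranked = sorted(self._counts.items(), key=lambda item: item[1], reverse=True)
        return ranked[:n]

def top_words_things(text, n):
    table = FrequencyTable()
    for word in WordSource(text).words():
        table.record(word)
    return table.top(n)

sample = "the cat and the dog and the bird"
print(top_words_golf(sample, 2))    # [('the', 3), ('and', 2)]
print(top_words_things(sample, 2))  # [('the', 3), ('and', 2)]
```

Both versions compute the same answer; only the constraints they were written under differ, which is the sense in which these are styles rather than different algorithms.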
I mean, you can call these “patterns”; there is no reason not to call them patterns. What I want to zoom in on is the fact that what is really going on is constraints. Even patterns are constraints, and the patterns are constraints that serve specific purposes for the design of your application. The ones that I’m trying to illustrate are constraints that don’t necessarily serve a functional purpose but that are the service or the skeleton on which you develop your program, so in a way they are sort of lower level than what people think of as design patterns.
The book is supposed to be out in March.
That is correct, that is one of the styles that I have too.
It’s again another way of thinking about programs. I’m trying to get away from the technical names because I don’t want to step on people’s interpretations of names, but there is a style in the GitHub repo that is given by the following constraints. You decompose your problem into abstractions of some sort (functions, procedures, objects, you name it), and then there are maybe other things that you want from that program, and you have them latch on to those abstractions without changing the source code of your abstractions. So that is the constraint: you cannot change the source code of your abstractions, but you want those things to act even on some internals of your abstractions. You can do things like profiling or debugging or tracing, that kind of thing, and you have to find a way of latching that extra functionality on top of the abstractions that you have without changing their code. Aspect Oriented Programming was something that I was involved with starting back in 1995.
So AOP had a bunch of different roots. Gregor Kiczales' group was doing the metaobject protocol, reflection, and this idea of open implementations, which was the idea of having a primary decomposition of your problem, using whatever language you choose, but then also having these other programs on the side that would affect the boxes. Before going to PARC I had started my PhD. I had a background in distributed systems, and I wanted to find a way of specifying certain non-functional requirements of distributed systems on the side. Take, for example, marshalling of data to pass between remote calls: how would you specify how deep the data should be copied? You don’t want to pass the entire graph, so you should be able to specify that somehow declaratively on the side. Or take concurrency: you might also want to define your concurrency policies sort of on the side of the main functionality.
The goal was that you would have a program that would be sequential, but then magic would happen and you’d be able to run that program on a parallel machine just by doing the control on the side. It’s kind of hard to do, but that would be sort of the idea behind that approach. When I went to PARC, there were many other people doing variations of this idea; the people at IBM Research, Harold Ossher and Peri Tarr, were doing Subject Oriented Programming. That was also sort of separating different things, so this idea of separating things was a pervasive idea at that time in the mid-nineties. We went strongly with that idea, and then we came up with the name Aspect Oriented Programming about a year after I arrived at PARC. I think that was a good name.
Werner: So I came in to this interview thinking that Aspect Oriented Programming was a big topic in the first half of the 2000’s and it sort of disappeared from my view, but now with your ideas of styles I'm thinking, there is no paradigm shift but that all these paradigms exist in parallel, so it’s very interesting.
Absolutely, I think so. Ideas never die; they may be forgotten and then they are rediscovered eventually, and it happens a lot. Ideas were proposed back in the 60’s and for one reason or another they never made it to the mainstream, and then people discover them again 20 years later. With Aspect Oriented Programming there was a lot of excitement at the time, and then other shinier toys came along and some people fled a little bit, but I end up finding it in unsuspected places. Sometimes I'm looking for this or that library and I find libraries in different languages that do Aspect Oriented style things, which is kind of interesting. I think it’s fun to see that the concept, the constraints, right, have survived and people have adopted them for their own purposes, and ultimately I think that is what matters.
Werner: The question is, of course, about Aspect Oriented languages, which didn’t take off because they need more adoption. The question now is how we can support these things more easily in existing languages: is it metaprogramming, is it something else?
Usually, if you don’t want to change languages and your language has reflection, it is very easy to do AOP. In fact, for the style I mentioned that is based on these constraints, the example program that I have uses Python's reflection to achieve that. Most languages these days have some form of reflection, so it’s not that hard to do aspects without having any special-purpose constructs in the language. I think that is good because, as I said, ultimately what matters is the concepts and the ideas.
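As a rough illustration of the reflection approach described here, the sketch below latches timing and tracing onto two existing functions without editing their source. It is not the example program from the repository; the function names, the `traced` advice, and the `weave` helper are hypothetical and chosen only for this sketch.

```python
import functools
import time

# The "abstractions": ordinary functions whose source we are not allowed to change.
def extract_words(text):
    return text.lower().split()

def count_words(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

# The side concern: record which function ran and how long it took.
call_log = []

def traced(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        call_log.append((func.__name__, time.perf_counter() - start))
        return result
    return wrapper

def weave(namespace, names, advice):
    """Look each function up by name and rebind the name to a wrapped version."""
    for name in names:
        namespace[name] = advice(namespace[name])

# Reflection step: the module's own namespace is treated as data.
weave(globals(), ["extract_words", "count_words"], traced)

counts = count_words(extract_words("the quick the lazy the dog"))
print(counts["the"])                      # 3
print([name for name, _ in call_log])     # ['extract_words', 'count_words']
```

The design choice worth noting is that the tracing lives entirely on the side: removing the `weave` call returns the program to its original behavior with no other change.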
Werner: So moving on to some other concepts and ideas, your research is in, you do some research in mining software repositories.
Correct, this has been more recent; I started doing that in the mid-2000’s. I have been collecting very large numbers of projects, like tens of thousands of projects, and then doing data mining and statistical analyses over what is in the source code. Let me tell you where these ideas came from. I entered academia as a professor after I left PARC. I went through my tenure process and I started realizing that a lot of what we do in computer science is, first of all, not science; it’s a very big misnomer. A lot of what is going on is people having these design ideas and kind of defending them somehow. Now, design is very hard to defend, because how are you going to measure things? And that is sort of one of the cornerstones of doing science, if that is what you want to do: you have to gather data somehow and show with your data that one thing is better than the other. The data needs to speak for it; it’s not your arguments that speak for it, it’s the data.
And so I started realizing that the small-scale evaluations that we had been doing just didn’t do it for me if I wanted to try to be a scientist, and I thought that being in academia was a good opportunity for me to actually try to be a scientist. I thought that maybe I should look more carefully into evaluations, and that is when I started realizing there is all this source code now available. In the mid-2000’s open source was happening, and lots and lots of code was being put out there. So I started realizing that the situation for assessment might actually be much better now: you could get the code and somehow test your hypotheses about what was going on in the code with that code. There are limitations to what you can do. You cannot really test new design ideas with data that already exists, because that data does not use your ideas, so you cannot really validate new things. But you can validate old things; there are things that you can validate by looking retroactively at how people write things. So that is when I started collecting these very large amounts of code, dissecting the code into relations, and doing some data mining on it, both to find interesting facts and to serve as feed for interesting development tools.
So there is one research question, and this one is not published yet, so I’m giving you a little preview here. The PhD dissertation of one of my students was about the following question. We have software metrics, things like coupling, cohesion, defects per line of code, things like that, which you can compute with static analysis. So we have these internal quality metrics, and we teach people that they should write good programs that score either higher or lower on these metrics. And then we have libraries out there that are used or not used by people. So the question is: is there a correlation between the popularity of a library and its internal technical quality? In other words, is it true that if you write really good code your library is going to be used a lot, or does it not matter?
Before I tell you what we found out, there are important consequences. If you find that there is a positive correlation, if you say that yes, the libraries that are most popular are really good internally and the engineers do really good work, then you say: you really should pay attention to these quality metrics. If you don’t find any correlation, then you say there is something else going on. It is not that you should ignore those quality metrics, but maybe you should be paying attention to something else; maybe there is more to it than putting these constructions together the right way. So those are sort of the implications, and there are implications for strategic management of projects and all sorts of things. What we found out is that there is actually no correlation, and the little correlation that we did find, and when I say little I mean technically, statistically, is actually negative. Again, there is very little signal, but in some of the slices that we studied it looks like there is actually an inverse correlation between the popularity of a library and its internal quality. What this suggests is that maybe those are the projects that have a lot of activity in them, and with activity comes the introduction of bugs and things that are not done the right way.
That must be what is going on in there, and maybe the other projects that are not so popular have less activity and therefore can solidify a little better. So there are interesting findings that you can make if you have a very large amount of code. That is one thing. Another thing that we also started doing, more on the tool side, is program synthesis, but in the modern machine learning style. What I mean by that: program synthesis is the idea that you can generate a program by giving very high-level, possibly incomplete specifications, by example or in other ways. The traditional techniques that you use to do program synthesis are very much based on logic; we have theorem solvers, we have all that good stuff.
So our idea was: there is all this code out there, and pretty much anything that you might want to do has already been done. The composition of things is different, but your little functions are probably all there. So could we somehow generate the program simply by finding the pieces in this very large body of code? What you give is a specification that could, for example, be a test case. You give a test case, so the inputs and the outputs, and now you go and search for code that passes that test case, and you get implementation candidates, programs based on things that have already been written. That is called Test Driven Code Search. It is a research project that I have also been working on with some colleagues in Brazil, and it leverages this very large source code base that we have collected.
Werner: So it’s actually searching the codebase for the actual function.
Yes, it’s already there; “all” we need to do is try to find things that do what you want. So that is the definition of synthesis here: we already have this very large body of data, and you are just trying to match the functionality that you want, with very fuzzy specifications, against the things that already exist, instead of trying to generate it from first principles.
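The search idea described above can be sketched in a few lines. This is a toy illustration only, assuming a tiny in-memory list of functions as a stand-in for the mined code base; the `CORPUS`, `passes`, and `search` names are hypothetical and do not describe the actual research tool.

```python
# A stand-in for the very large body of existing code: a handful of small functions.
def reverse_text(s):
    return s[::-1]

def upper_text(s):
    return s.upper()

def strip_text(s):
    return s.strip()

def title_text(s):
    return s.title()

CORPUS = [reverse_text, upper_text, strip_text, title_text]

def passes(candidate, examples):
    """A candidate passes if it maps every example input to the expected output."""
    for args, expected in examples:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # Code that crashes on the inputs is simply not a match.
            return False
    return True

def search(corpus, examples):
    """Return every function in the corpus that satisfies the test case."""
    return [func for func in corpus if passes(func, examples)]

# The specification is a test case: inputs paired with expected outputs.
precise_spec = [(("abc",), "ABC"), (("Hi there",), "HI THERE")]
fuzzy_spec = [(("ABC",), "ABC")]

print([f.__name__ for f in search(CORPUS, precise_spec)])  # ['upper_text']
print([f.__name__ for f in search(CORPUS, fuzzy_spec)])    # ['upper_text', 'strip_text']
```

The second query shows why the results are candidates rather than answers: an incomplete specification is satisfied by more than one existing function, and it is up to the developer to pick or to add examples.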
Werner: It’s definitely interesting, I think it’s an old hope to do that, to do program synthesis. Have you also looked at the temporal development of a repository, so do they get worse, better.
My students and I have not done those kinds of studies; there are a lot of people doing them. The studies that we did tended to focus on single snapshots of a lot of projects. There are things that you can study that way, and there are other things that you cannot study that way because you need the temporal information, and we haven’t done those kinds of studies.
Werner: That is very interesting, I should look into that more. I think you also have an open source project.
That is right, that is one of the things that I do on the side to keep me sane: actually getting to write code in a system that is used by lots of people. It’s called OpenSimulator, and it’s the server side for the Second Life client, so basically you can have your own virtual world running on your own computer. I’ve been involved in that project since the end of 2008. I’m a core developer there and I’ve been doing lots of cool stuff; I don’t publish a lot of it, but it’s just a lot of fun to go from concepts to actual implementation and things that work. The project has a very strong following among some of the Second Life aficionados who don’t necessarily like to play in Second Life itself; they want to have their own server, so there are a lot of indie deployments. I like it because it has a very indie feel to it; it is not mainstream at all. There are all these people who have their own servers. Honestly, I don’t know how many; there are hundreds of thousands of downloads, so there is a very large community that uses this, but it’s also sort of an indie movement.
They do different things. There is very strong adoption among people who do education, people who are exploring alternative ways of delivering education. There are a lot of people out there who think that the future of classrooms is in virtual reality, because you can reach people who are not physically present while still maintaining the real-time nature of education. So it’s not the MOOC concept; it’s real-time education, but it reaches out to people who aren’t physically present. So there is very strong adoption in colleges and high schools. There are also people who are trying to make money out of this, people who are trying to emulate the Second Life business model, so it’s for playing: a lot of role-playing games, virtual casinos, gaming and things like that. So those are sort of the major indie deployments that I know of.
Werner: So this is the server side aspect, so do you have, are there any interesting problems that pop up, I think I remember Second Life had problems with the size of areas, how many people could be in the same place.
So that is true for all multi-user servers in general. The question always is how you can serve more people with one server, basically, how you can minimize the number of machines that you have on the server side to serve more people. That is one of the engineering challenges that I found really fascinating here. These real-time multi-user systems all have the n^2 property, which means that one person who joins is not just going to linearly increase the complexity, it's going to quadratically increase it. This person generates events, and these events have to be sent to all the others that are already connected, and all the events that the others are generating have to be sent to this one more person. So it’s an n^2 increase in data and in computational complexity; it’s very, very nasty.
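The arithmetic behind the n^2 property can be sketched as follows, under the simplifying assumptions that every user generates the same number of events per tick and that every event is forwarded to every other connected user, with no filtering by distance or interest. The function names are made up for this illustration.

```python
def deliveries_per_tick(users, events_per_user=1):
    """Messages the server must send if every event goes to every other user."""
    return users * (users - 1) * events_per_user

def extra_deliveries_on_join(users_before):
    """Additional messages per tick caused by one more user joining."""
    return deliveries_per_tick(users_before + 1) - deliveries_per_tick(users_before)

for n in (10, 100, 1000):
    print(n, deliveries_per_tick(n), extra_deliveries_on_join(n))
# 10 90 20
# 100 9900 200
# 1000 999000 2000
```

Each newcomer adds 2n deliveries (n outgoing for their own events, n incoming for everyone else's), so the total grows as n(n-1): ten times the users means roughly a hundred times the traffic.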
So there is a lot of engineering that has gone into this, especially in gaming, I found out. The web is now rediscovering all of these techniques that have been applied to multi-user games, and especially the massively multi-user games, so there are different ways. OpenSimulator right now is actually better than Second Life. We just had a virtual conference recently, in the beginning of September, and we were able to host about 380 people, not all on the same server, but we are now at a point where we can very comfortably serve about 100 people doing real stuff on the same server without lagging it very much. So that is an improvement; in Second Life, if you pass 80 people, everything basically sort of crawls. In OpenSimulator we are now starting to pay attention to performance because we wanted to be able to do the conference.
Werner: Well, Crista, you’ve given us a lot to look into. Thank you very much!