1. We here at Code Mesh 2013 in London and I am sitting here with Will Byrd. Will, who are you?
My name is Will. I am a programmer. I have a long history of programming in Scheme and in doing functional programming. Over the last 10 years I have been doing relational or pure logic programming in a language called miniKanren. I got my PhD in programming languages from Indiana University in 2009 and now I am a research associate at the University of Utah and I just applied for some professorship positions so hopefully soon I will have a more permanent position and get to share the glory of relational programming with the world.
Logic programming is the idea that first order of logic is a very nice way to express rules or knowledge and this has been used in artificial intelligence for many decades. Prolog is the best-known logic programming language from the early 70’s, but the idea of using logic to represent knowledge goes back to the ancient Greeks. The idea is if you want to have a knowledge-based system, something symbolic, first order logic is a great way to represent it. Prolog is the most popular logic programming language and in some sense it had a lot of success in knowledge representation and artificial intelligence and the promise of logic programming is that you should be able to just say what you want to happen and instead of how it should happen. It is the notion of declarative programming and what declarative programming means depends on who you talk to, but I think that most people agree that logic programming is pretty high up there in terms of declarativeness and if you look at Prolog books they usually begin with this idea of declarative programming and using logic and it's all very nice and clean.
You can take a first order logic formula and encode it as Prolog clauses with some issues, really negation and some other things where you can do it. The issue is that when you are writing as a declarative style you give up a lot of control. You are giving up control over how the answer is found and what Prolog uses as a search, it uses a depth first search, which is incomplete which means you might not ever find an answer, even if it exists and that is for efficiency, for better memory usage. There are other choices that Prolog makes for efficiency - for example, the fundamental operation in logic programming is a thing called unification, which you can see as a type of pattern matching. It is a two-way pattern matching where you can have variables on both sides of the patterns instead of just one side. That is the fundamental operation.
In order for unification to be correct, to be sound, you have to perform something called the occurs check which is making sure that one of the terms you are unifying on one side does not appear inside of another term on the other side. Prolog does not do this check for efficiency because this is an expensive check, so that is another compromise that Prolog has made for efficiency. Often you can get away with this, but there are a bunch of other compromises that are made either in the Prolog language or more likely in how Prolog is used. In particular, in order to control the search, Prolog programmers like to use something called “cut” which is a way of pruning the search tree, saying “do not do any backtracking search over this part of the tree” and instead we are going to make sure that if you go down this branch, we do not go down this other branch.
That is inherently impure or inherently non-relational or non-logic or, as logic programmers call it, extra-logical. And there are other extra-logical features that Prolog programmers tend to use in practice. One called “assert”, one is called “retract” - those are ways to add and remove information from a global database that Prolog has. So there's a whole slew of these extra-logical or impure operators that people in Prolog use and they use it for good reasons, not because they are naïve. They understand the trade-offs of using these. Your program can be more efficient, you can have more expressive control but if you are not careful, your program can be unsound. In other words: there may be an answer to your query that you do not find. So an analogy would be in a database lookup. If you are doing a database query, maybe there is actually information in a table that you should find, but you do not find it.
That is the danger of trying to prune the search tree by hand, if you do not do it carefully. Even if you do it carefully, you are still giving up in general on this view of just specifying the logic or the high-level rules and then letting the system, figure out what the answers are. In particular, one of the great advantages of pure logic programming is that you can treat your programs as relations, instead of, say, functions. The traditional example is append, appending to list. If you had the list of the numbers 1,2,3 and you are appending to that the list 4, 5 then you will get back a list 1, 2, 3, 4, 5. So, if you call append in Scheme, let’s say, a functional language, you can pass in list 1, 2, 3 as the first argument and list 4,5 as a second argument you will get back the list 1, 2, 3, 4, 5. That is the answer. If you treat appends as a relation instead of a function now it takes three arguments instead of two. The third argument would be basically that answer, that output, which is list 1, 2, 3, 4, 5. The interesting thing about treating append as a relation is that now you can put a variable in a different position, one of the argument positions to represent unknown information and this is similar to a database query. So you can say: appending 1, 2, 3 to 4, 5 is z, is my logic variable and therefore that variable will become bound or associated with 1, 2, 3, 4, 5. That is equivalent to having the function, but you can do more interesting things.
You can say the list 1, 2, 3 appended to y gives you 1, 2, 3, 4, 5 and the system will figure out that y has to be list 4, 5. You can say: append the variable x to the variable y to get 1, 2, 3, 4, 5 and now you get back all of the pairs x,y of the list of numbers that when appended give you 1, 2, 3, 4, 5. So you get multiple answers. In fact, you can even generate programs that have infinitely many answers and have a stream of answers. So it gives you a lot of flexibility, the ability to get back multiple answers, the ability to find unknown data pretty much anywhere in your program and this relational approach is what I am interested in and exploring. Unfortunately, in Prolog once you have started using these extra-logical operators like cut, or assert and retract, or copy term or rejection of numbers using “is” or any of these other slews of extra-logical operators you lose that ability to treat your program as a true relation. So my interest is in trying to push that relational aspect as far as possible, recognizing that that gives up efficiency in control in some cases.
3. So relational programming? What is the nature of it? How does it differ?
In my mind relational programming and logic programming have a couple of differences. One is that in relational programming – that is a term that Dan Friedman and I like to use for the style of programming, I think it is more maybe in tune with how people think of relational databases – but the idea of logic programming is that normally you start out with knowledge representation as formulas and logic and, say, first order logic and then from that you will write a logic program in Prolog - it would be in terms of Horn clauses which is the syntax for the type of logic they can handle. Then usually, in logic programming, once you have your declarative representation of a problem, then you start adding these extra-logical or impure features to get the performance you need or to get the expressivity you need to make sure you can express certain forms of negation, for example. That is usually how I see logic programming, starting from a logical specification of the program.
The relational programming approach I see is starting with maybe a function, something like “append” as a function and then saying “What happens if we treat the output argument, or the output from the functional standpoint as just an extra argument and now we can put the variables anywhere?” The relational programming approach and the logic programming approach will often give you the same sort of thing in the end, like “append” is the same in Prolog and miniKanren really, but I think it's a sort of philosophical difference of how you are starting, like we tend to start more from taking a Scheme or functional program and then trying to extend it to be relational. Then the other thing is the commitment to making sure we can always treat our program as a relation and not using any extra-logical features. That means we are not using cut, we are not trying to prune the search tree, that sort of thing.
So, that is a trade-off and in order to make that trade-off make sense, in order to express more complicated programs, we have to have some extra tools and one of those tools, probably the biggest tool we have, is constraint logic programming. So adding extra constraints besides unification, besides the two-way pattern matching, we have other constraints that can help us express certain parts of the problem. The way both Prolog and core miniKanren work, you have unification – a two-way pattern matching, you have disjunction – the ability to express an “or” basically and you have conjunction – the ability to express an “and” logically. Those are the main features you have and that is efficient, that is Turing complete, along with the search inside of the implementations. That is Turing complete – you can express any computable function with that. However, in practice, you want more, just like you want more than just the Lambda calculus if you are doing functional programming.
So, constraints tend to be very handy. For example, doing arithmetic. If you want to do arithmetic relationally, you really need some form of constraint or you have to encode an entire arithmetic system in a relational manner, Oleg Kiselyov came up with this very clever way of doing it. But once again, you have trade-offs. In Oleg's system, it can handle our arbitrarily large integers but it is not very efficient and it is awkward the way you express the numbers. You have to express the numbers as a list of binary digits in a little Endian order versus using constraints over finite domains of integers, saying this variable x is between 22 and 73 in which case you can use just normal representation of numbers in the language and the underlying operations are more efficient but it has other restrictions. All these things are trade-offs and really the art of writing these programs relationally is trying to figure out what constraints you need or what representations of the data you need in order to come up with a reasonable program that is relatively efficient and it is expressive enough and in general negation turns out to be a real problem.
So that is what we have been spending more of our time looking at: what sort of constraints do we need to solve interesting problems? By their very nature these constraints tend to be very powerful tools for solving very specific problems like how to represent certain types of arithmetic declaratively but, if you cannot map your problem onto these constraints you have to go to the underlying system and use standard logic programming which may not be very expressive or maybe slow for your problem, or whatever. Or you have to decide: is it worth actually implementing my own constraint to solve this part of my problem and then you have to go through that work and that maybe non-trivial defining the constraint, maybe non-trivial implementing that constraint, may not even be clear that there is a declarative way to implement these things. So there are many problems I would like to solve where I am not even sure that it makes sense to solve these in a relational way. I do not know what that means. These are the overall differences, as I see it.
That is a good question. I do not know the answer to that. It is one of the reasons I am going to work in this area, trying to figure that out. I can tell you the problems that jump out to me. Rule-based problems – if you have a set of high-level declarative rules explaining what needs to be done – that is usually a good fit. If you have something like a rewriting system, for example a source-to-source optimizer for a language where you are looking at some code and if you see certain patterns in the code and you want to rewrite that as another pattern, either simpler or more efficient or more idiomatic, for example - that is also a good use for this type of system. One set of rules that comes up a lot in programming languages are type inference rules. You have these typing judgments and in logic they are written with a bar and you have some formulas above and some formulas below saying in order to prove the formula below, you have to prove the things above the bar. This comes up a lot. Not just in type inference systems, but also in interpreters and various term rewriting systems and so forth.
So much of theoretical computer science is written this way and there are also problems of pragmatic importance that people write these rule-based systems or inference based systems or typing judgment style systems. Anything like that tends to be something that is very well handled by a relational style system. The advantage is that you have extreme flexibility. For example the type inferencer – if you write down typing rules for a language, if you look at the rules the right way, you can see that, in a way, they actually do not have a direction, you can run them backwards. If you have a language like Java, Java has a type checker but it does not do type inference. A language like ML does type inference where you can say the value of x, you can let the compiler figure out that x is going to be an integer. You do not have to write that annotation yourself. The interesting thing to me is that if you write down the typing rules for these systems and you implement it in a relational way, not only do you get a type checker, but you also get a type inferencer from that, for free. Not only do you get a type inferencer but you have the ability to do what is called type inhabitation where you can say here is a type, give me a program that has this type. You can generate programs that have this type. This turns out to be very powerful, it has a very deep connection to mathematical logic and programming language theory and type theories called the Curry-Howard isomorphism where you can look at the programs that have that type as proofs and the types themselves as propositions in constructive logic and it turns out that many prover systems or many or theorem provers work based on that. Another example – maybe a little bit more concrete to many people - is the idea of an interpreter.
Probably most people have used some sort of an interpreter. It could be something like a calculator, but more likely or more interesting examples are languages that have REPLs like Python or Lisp or Clojure. You would type in (+ 3 4) in Scheme, because it uses prefix annotation and then the evaluator and the interpreter would return 7. You have an input expression - (+3 4) or 3+4 in Java – and then the output would be 7. When you write an interpreter or evaluator in this relational style, you can run it forwards just like normal, give it an input expression and it'll give you the answer, but you can also give it the answer you want. You can say the answer is 6, the value is 6, give me expressions or programs that evaluate to 6 and it will give you answers like (+3 4) or a procedure applied to some other complex argument and so forth. So you will get not just well typed programs, but programs that if you were to run in Scheme or Clojure or Java or whatever, actually give you the expected answer. One thing that I am interested in using this for and I have not figured out how to do this exactly but I feel like I am close, is using it for fuzzing which is basically testing a program, maybe an interpreter compiler feeding it random inputs and make sure it actually works correctly. So, one of the problems with fuzzing is that – the traditional way is to generate lots of programs that may not even be syntactically correct, you might throw out the ones that are not syntactically correct and then does this even make sense type-wise and throw out the ones that do not and then eventually you will have some sub-set of the programs you generate and then run those through the system or if you are using a sign like QuickCheck in Haskell or one of the systems derived from that, you basically have some axioms you specify that talk about program properties and you use that to generate data. The neat thing about writing a relational interpreter or evaluator is that, in a sense, you will have encoded the entire semantics. Not just in a sense, you really have encoded the entire semantics for the programming language as the evaluator so all the programs it generates are well-typed, they evaluate back to a specific value if you give it that value and you should be able to use that for fuzzing and other testing related things, especially if you put probabilities or weights on the generation of the expressions. You should be able to do all sorts of cool things with that. That is something I am interested in looking at.
Werner: So if our audience wants to try logic programming, I think you have a solution for it – a language.
I am not really here to just push miniKanren. I think logic programming, in general, is interesting. Look at Prolog.
Werner: Let us push miniKanren anyway.
Let us push away. My thesis is that if you want to learn about logic programming, look at Prolog. If you want to learn about relational programming, you really want a language that commits to it and has support for it. Andy Pitts in his group at Cambridge has a language called MLSOS which is a nice language for that. Our language is miniKanren and we have been actively adding constraints and other features to make the relational programming very nice. I think miniKanren is a good language to look at. If you look at http://miniKanren.org you will see a list of languages that it has been ported to because miniKanren's implementation is very short, it is designed to be readable and hackable and all that. So it has been ported to many languages at this point. Most major languages have some implementation of miniKanren or you could port one, port it to one of the languages you want. The main languages with major implementations of miniKanren are Scheme, which was the original language we started with and Clojure, which David Nolen took the code in my dissertation and created the core.logic library which is now extended and cleaned up in many nice ways. Most people using miniKanren are actually using the core.logic version in Clojure. So if you are a Java programmer, this is a chance to explore Clojure, which is a cool language that runs on the JVM.
Then the other main implementation is this thing called cKanren for constraint Kanren and that runs in Racket, which used to be DrScheme and now it is DrRacket. So those three implementations I would advise people to look at. But if you are a Python programmer, there is a Python implementation and so forth. Actually, I do not know if there is a Java implementation – there probably is. If not, I know that in Java there are things like I think Jess is a rule-based language in Java. I guess one way to think about it is, if you are kind of looking for a motivating application, the sorts of things that you might use a rule-based engines like Jess. When you are starting to think, rules or rewriting expressions into other expressions, those are the times when you would want something relational and that would be a good time to explore this area.
Werner: I think you have already mentioned your interest in program synthesis to run the interpreter backwards.
That’s right. When you are running these programs backwards, be it a type inferencer, be it an evaluator, you are performing program synthesis. You are basically giving a specification for the program, as a type, as a value, you want to produce, and you are letting miniKanren find, through its search or through its constraint solving, program expressions that satisfy that constraint. What is most interesting to me is that you don't even have to say what output value you want or what output type you expect. You can have a logic variable in that position as well, also. You can say, for my evaluator or interpreter, both the expression I want to generate and the value that expression evaluates to, they can both be logic variables and then the system would just sit there and generate pairs of expressions and values that the expression evaluates to.
But more interestingly, you can sometimes give shared structure between both the expression and the value. So the simplest example for an interpreter would be to say that I want the expression and the value to be the same. So for using variables you could say that evaluating x gives you back x. So we want a program that evaluates to itself and there is a name for this is “quine”, a self evaluating program and there is a long tradition in C, for example of generating quines. It is kind of an intellectual exercise. There is a mathematical theory of how to do it – Kleene’s second recursion theorem - and all these sorts of things. But the amazing thing to me is that if you write a Scheme interpreter in a relational style and that Scheme interpreter has list and quote as part of the language, if you do this simple query where you say: “I have the same variable for both the expression and the value”, you will get back quines right away.
You will get back these non-trivial programs that evaluate to themselves and in fact, if you look online for the quine page you will see the canonical quine in Scheme or Lisp and that is the first quine that is generated by this system and it is something that requires some ingenuity to come up with. It comes back in about 20 milliseconds on my system. So you generate about 20 quines in a second and you can also generate all sorts of variations on quines and these twin quines called twines and things like that. So that is something you get when you have shared structure between the input expression and the output value and that shared structure helps prune the search tree so you can exclude many of the branches. Another thing that I have been dealing with recently or looking into recently is synthesis of programs in combinatory logic which is a system equivalent to Lambda calculus which is the basis for functional programming.
So this is another system that is used as a basis for functional programming and the idea there is that your programs actually do not have variables in them at all. You just have these constants called combinators like S and K. All you have are these constant letters and then parenthesis which represent procedure application basically, application as combinators and that is all you have in the whole language and amazingly that is Turing complete that can express any computable function. The interesting thing there is that first of all it simplifies your code because you do not have to deal with variable binding but the neat thing is that you can express lots of structure between the input expression and the output expression, the output term. For example, you can enter a mathematical specification for a fixed point combinator which is used to specify recursion in functional languages.
You have heard of the Y combinator – it is not just a Paul Graham company, it is also a mathematical construct used to give you recursion in languages that do not have recursion built in. In order to specify that I want a fixed point combinator I can just type in the mathematical formulas or logical formulas. It is formulas like: Exist some fixed point function F such that for all x, F(x) equals x of F(x). I just type that in and I can generate fixed point combinators equivalent to the Y combinator so it gives me this long string of Ss and Ks and parens which is not very useful but then there is an algorithm for turning that into code that runs in Lisp, for example, or any other functional language. That is an example of non-trivial program synthesis that I think is pretty cool.
The problems with all these things is that first of all, there is an expressivity problem. You have to figure out how you can express these things in logic or as a relation. It is not always clear how to do that. The other problem is that the search space is generally exponential as you get bigger and bigger terms. The interesting thing to me is that, that logical specification for the fixed point combinator says on the left hand side is F(x) and the right hand side is x applied to F(x). So F(x) appears on both sides. Once again that shared structure prunes the search tree really aggressively so even though it is exponential, the search is exponential, the search space is exponential in terms of the size of the combinators, we can actually find fairly efficiently these fixed point combinators. So I am trying to figure out if there are ways we can extend this by exploiting the shared structure in the input and the output. How far that can be pushed? I do not know the answer to that but I was shocked that we could even do these examples so far.
I think those are really good examples of the promise of either functional programming or relational programming that you can just type in a mathematical or logical specification of your problem and then the program should be generated by the system. But it is difficult to do that and it is also expensive to do that so generally with this program synthesis stuff we can do small programs and that might be helpful for doing part of your program. You can look at it as a glass half full, also. Sometimes you will do this search and it will not come back so you just kill the search and then other times it comes back right away and if you win 1/3 of the time, that is still 1/3 of the code that you do not have to write. So, that is the promise and I am hoping to push that forward. By the way, there is really no reason this stuff could not have been done in Prolog decades ago. To my knowledge it has not been done, I have never seen a relational interpreter that can generate quines or things like that. You do need to use a few extensions, a few constraint extensions but I think the main thing is sort of philosophical, that Prolog programmers are used to thinking: “Yes, we can do things in logic, we can do things relationally” but whenever push comes to shove they can use these extra-logical features and cheat.
It is a similar situation to Scheme or Lisp or ML, these functional languages which have been around for a long time where in theory, you are supposed to do everything purely and you are supposed to write everything in this nice style where you are not doing any mutation, but those languages have mutation that you can change variable state and stuff like that all the time, whenever you get stuck. When Haskell came along and made a commitment to purity because of the laziness of the language required it and because of that, there was a lot of innovation in using monads to represent mutable state in a more pure way. I see a similarity here with what is happening with miniKanren relational programming versus, say, Prolog and logic programming.
It is not that you could not do any of this stuff in Prolog, you might need some extensions in Prolog to do it and you actually do, but you could do it and there are Prologs out there that have these extensions but I think it is more of a philosophical issue that by committing to purity we are forcing ourselves to find creative solutions to problems and we are going back and revisiting problems that have been looked at before by Prolog programmers but I think that they had a different take on it. By committing to purity I think we are able to look at these old problems with a new vision and that is a lot of fun to take a look at something as simple as a little Lambda calculus interpreter which is only a few lines long and it has been studied for 50 years and try to find a little twist on that. That is what I like doing. Instead of writing a 100.000 line program I like trying to write a three-line program that does something surprising because other people have written that three-line program, but if you have the right perspective, sometimes you can find something deep in there that maybe other people have not seen.
Werner: I think that should be a motivation for our audience to study miniKanren, core.logic and relational programming in general. Thank you, Will.
Thank you.