Learn some RegEx for the New Year
If you are looking for something to learn in the new year, it’s hard to go wrong with regular expressions. With dialects available on every major platform, it’s a skill that will continue to pay dividends throughout your career. One way to get started is Firas Dib’s Regular Expressions 101.
Besides the usual quizzes, RegEx 101 features a regular expression explainer. It breaks down standard regular expressions and shows how each component works. This should prove useful not only to novices, but also to experienced developers who are trying to understand a regular expression they found in some legacy code.
InfoQ. What inspired you to create regex101.com?
Firas Dib: For many years now I have had a bot on IRC which would explain regular expressions users sent to it. Not too long ago, before creating regex101, I also implemented the quiz on said bot. This was all a fun project in the beginning, but as time passed, things got out of hand and became hard to manage, given that this was all written in mIRC's own scripting language. This was, however, not the only limitation. As all communication was done over IRC (which is text-only), it was very hard to make it intuitive and attract people to it. The explanation part of the bot was widely appreciated but also quite annoying, as it would spit out quite a bit of information to the channel. So this got me thinking: what would be a better way of providing this service without the limitations I was facing? A website, of course! This was when I started converting my poorly written mIRC code to a tiny bit less poorly written PHP code.
InfoQ. How long have you been working on the site?
I started working on the website around the beginning of last summer.
I would also like to mention that I am well aware that the website has many design similarities to Rubular. This was not my intention when I started designing it. The design did, however, slowly converge to something similar to Rubular (because it's a very obvious but good layout), and in the end I ended up using the quick reference and some other minor things from that website. I did contact its author to explain the situation, and if there is ever any problem he just has to send me an email and I'll deal with it. I also added what I have stated here to the credits of the website. I am mentioning this because I don't want people to think I have committed plagiarism or something of that nature; all code on the website is my own and I have spent many hours working on it (aside from the quick reference).
InfoQ. It has been said that regular expressions are a "write-only" language. That is to say, once written they are usually too hard for most programmers to understand. What is your opinion of that?
This is probably very true, and also partly why I created this service. You see, people would join IRC and ask the bot to explain their regex, and then they would leave. Perhaps come back an hour later and do the same thing again. Now with the website, they can either leave the tab open or create a permalink with their regular expression. Not only that, they can use the website with their test cases and the automated explanation as a reference to show their co-workers or even use in the comments in their code! Thus you kind of eliminate or at least help solve the problem with regular expressions being 'write-only'.
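As an illustration of keeping an explanation next to the expression in source code, here is a minimal sketch in Python using the standard `re.VERBOSE` flag, which allows whitespace and `#` comments inside a pattern. The date pattern and the name `DATE` are assumptions made up for this example; they do not come from regex101 or from the interview.

```python
import re

# Hypothetical example pattern: an ISO-style date such as 2014-01-31.
# re.VERBOSE ignores the layout whitespace and treats "#" as a comment marker,
# so each component can carry its own explanation.
DATE = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -                  # literal separator
    (?P<month>\d{2})   # two-digit month
    -                  # literal separator
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

match = DATE.fullmatch("2014-01-31")
print(match.group("year"), match.group("month"), match.group("day"))
```

An annotated pattern like this one stays readable for the next developer without requiring an external tool.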
InfoQ. I'm not familiar with the bot you speak of. Can you tell us more about it?
The bot was just an extra mIRC client I had a friend of mine run on his server. It was nothing fancy really. The reason I even used mIRC to begin with was because I was very familiar with the client and when I was younger I used to do a lot of scripting in it.
InfoQ. How was the explanation feature written?
The explanation feature actually uses regular expressions to break down the regular expressions (inception!). Well, not only with regular expressions but I use them where I can. This might sound weird, but because PCRE is such a powerful library with support for recursion among other things, it allows me to parse the input very accurately. In order to implement this though, I had to read through the PCRE manual several times. I will have you know, that is no fun task. And even then, my service is not fully PCRE-compatible. There are a few things I have just decided to skip since they are used by more or less nobody.
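To give a rough feel for the idea of using regular expressions to break down a regular expression, here is a small sketch in Python. It is not the site's code: the token table, the names `TOKEN_SPEC`, `TOKENIZER`, and `explain`, and the supported subset are all assumptions for illustration. Python's `re` module has no recursion support, so this sketch uses a flat tokenizer rather than the recursive PCRE approach described above, and syntax outside its small subset is either rejected or labeled only approximately.

```python
import re

# Hypothetical, deliberately tiny token table: (description, pattern).
# Order matters: earlier entries win when several alternatives could match.
TOKEN_SPEC = [
    ("escape sequence", r"\\."),
    ("character class", r"\[(?:\\.|[^\]\\])*\]"),
    ("quantifier", r"[*+?]|\{\d+(?:,\d*)?\}"),
    ("group start", r"\((?:\?(?::|<?[=!]))?"),
    ("group end", r"\)"),
    ("alternation", r"\|"),
    ("anchor", r"[\^$]"),
    ("any character", r"\."),
    ("literal", r"[^\\\[\](){}*+?|^$.]"),
]

# One named group per table entry, joined into a single alternation.
TOKENIZER = re.compile(
    "|".join(f"(?P<t{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC))
)


def explain(pattern):
    """Return (token, description) pairs for a small subset of regex syntax."""
    result = []
    pos = 0
    while pos < len(pattern):
        m = TOKENIZER.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at position {pos}")
        # lastgroup is the name of the table entry that matched, e.g. "t2".
        index = int(m.lastgroup[1:])
        result.append((m.group(), TOKEN_SPEC[index][0]))
        pos = m.end()
    return result


for token, description in explain(r"^\w+@[a-z]+\.com$"):
    print(f"{token!r:10} {description}")
```

A real explainer would also need to track nesting, named groups, flags, and the many PCRE-specific constructs, which is where recursion in the pattern language becomes useful.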
InfoQ. The site was written with PHP's version of regular expressions in mind. For those who use other implementations (.NET, Java, etc.) are there any particularities you think they should be aware of?
You are correct, it is written in PHP, which uses the (very powerful) PCRE library. This library supports a lot of things many others don't, for example recursion, lookarounds and conditional statements. People simply need to be aware of their own language's limitations regarding regular expressions. I can, however, try to help users with easier issues, such as escaping and similar. For example, Java requires you to escape your backslashes: \w would have to be typed out as \\w. I have had many users today request this feature and I will definitely consider it.
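The Java escaping rule mentioned above can be sketched as a small helper. This is only an illustration in Python, not a feature of regex101; the function name `to_java_literal` is hypothetical. It assumes the only characters that need attention are backslashes and double quotes, which covers typical patterns but not, for example, embedded newlines.

```python
def to_java_literal(pattern):
    """Return the pattern as the source text of a Java string literal.

    Assumption: only backslashes and double quotes need escaping.
    """
    # Double the backslashes first, then escape the quotes, so the
    # backslash added in front of a quote is not doubled again.
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(to_java_literal(r"\w+\s\d{2}"))  # prints "\\w+\\s\\d{2}"
```

The printed text is what would be pasted into Java source, for instance as the argument to `Pattern.compile`.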