
Modern WAF Bypass Scripting Techniques for Autonomous Attacks


Summary

Johnny Xmas talks about how the various forms of “bot detection” out there work, and the philosophies behind how to modify/spoof the necessary client environments to bypass nearly all of them using anything from Python and JavaScript to Selenium, Puppeteer and beyond.

Bio

Johnny Xmas is a prominent personality in the Information Security community, best known for his work on the TSA Master Key leaks between 2014 and 2018. He is currently working with the Australian firm Kasada to defend against the automated abuse of web infrastructure.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

[Note: please be advised that this transcript contains strong language]

Xmas: I am Johnny Xmas. I have done pretty much everything there is to do in the security space. I have a rule; whenever I switch companies, I want to move into a role that's something I've never done before so that I can come out of it with a way more well-rounded understanding of what the boots on the ground are doing in all these places. I started out as a systems engineer and then moved into network engineering because I was really drawn to that. I've been a hobbyist hacker for all of my life. I've gotten in plenty of trouble for it. I have whole other talks on that. I moved into systems engineering, network engineering.

I was an information security engineer at a Fortune 500 company, Global 1000. That was amazing, I really got to cut my teeth in an amazing space when you're dealing with something that's at that level of an enterprise that really gave me a great understanding of what's going on on the business side as well as the IT side. From there I was able to move into information consultancy and help other Fortune 500s etc. deal with the security problems in a way that would help IT be able to talk to the business and really cobble things together and remediate things in a way that everyone liked and everyone could understand.

From there, I moved into penetration testing specifically. That's where all the fun stuff starts. Those are the guys who break into your company and hopefully give you a really lengthy report on what they did to break into your company, what you need to do to fix it, instead of just running in and hacking everything up and giving you the finger and leaving. Who's had penetration tests done at their company? Who's had application testers come in? Good. Every year I see that hand count go up and that's very pleasing, because when I first started doing that, it was one hand in the back and it was terrifying.

Then in my last position, I was an industrial security researcher. That was super fun, the industrial security scene is terrifying. Anyone here working in the ICS space? Probably not, it's very empty. It's like going back into the past. The average staying time for a piece of hardware in industrial IT is 19 years, because they're very much into stability over anything else. Don't fix it if it ain't broke. That means you're still dealing with 20-year-old software out there in the field, you're dealing with 20-year-old hardware, you're dealing with firmware that hasn't had an update in 20 years. The security implications of that are terrifying, it's a huge problem and we're not focusing enough on it as a country, and that might just be to keep the fear and the raving masses at bay. But yes, industrial security is a terrifying place to be.

I moved out of there because this company, Kasada, had developed a really freaking cool product for defending web applications from automated attacks. I'll touch base on that maybe towards the end of the talk if I have time. I am the director of field engineering for Kasada. I'm also a blade runner, meaning I spend a good chunk of my time just hunting bots and doing research on the bots that are out there on the Internet attacking all of your stuff. The product we have sucks up all of that general white noise that you get whenever you connect anything to the Internet. Because of that, we have a lot of really cool data to play with to see what's going on at the Internet as a whole.

This talk here is for developers. I try to bring something that would be for this audience. Normally, I'm talking to hackers at hacker conferences. I'm a hacker, I know how to code, I know how to write scripts, but I'm not a dev. Everything I do is for speed and not for stability. I don't do unit testing when I'm hammering out my scripts to get through a situation. Especially as a penetration tester, when I had a very limited amount of time to break into things, I had to just write stuff and go. Speaking here makes me really nervous, because you're the ultimate coders. You are the people who do everything right, and I'm the people who do everything wrong. I'm the one who wrote that goofy script you found in a GitHub repo that doesn't even have comments in it and didn't properly use classes and functions, and you're, "What five-year-old wrote this?" That was me, and it was a five-minute script I needed to hack through and get a job done.

One of the reasons we as hackers write a lot of these scripts is to automate something we're trying to do, because normally if we were doing these attacks by hand, it would take us potentially hundreds of years. I'm talking about trying to brute force a login where I have huge word lists of usernames and passwords and I have to find the right combination. I'm not going to sit there and do that by hand. I'm going to sit there and hammer this out. I'm going to have Python do it, I'm going to have JavaScript do it. I don't have time for this crap, my computer has time for this crap.

Over the years of doing penetration testing and specifically web application testing, I developed this bag of tricks for getting past a lot of the defenses that exist out there that are stopping us from doing these brute forcings, from doing our scraping. I guess everyone here probably has a need for what they think this talk is about, which is very interesting to me. I didn't realize how big that need was until I started working for Kasada and really realized that there is a massive amount of what I've always called corporate espionage going on. That's the, "I work for one company and I need to figure out what another company is doing, but that company won't just tell me, so I'm going to have to go find out, sometimes based on publicly available information, such as aggregating pricing data for all my competitors, or figuring out what they're selling all of their stuff at so I can make sure that I'm competitive in my pricing."

There are a lot of defenses out there to try and stop this sort of stuff from happening, and a lot of them are really bad, and a lot of them are really old, and that's why they're really bad.

Web Application Firewalls

There are two main things out there. There are these basic WAFs. WAF means Web Application Firewall. The reason it's called a firewall comes from the fact that, for the longest time, the majority of them worked based off of IP address-based rules. The most they could do, as a defensive mechanism, was blacklist your IP. Some of them, later on down the road, and we're talking mid-2000s, started to be able to do some very basic behavioral analysis, and we're talking about timing. If something's coming in, if you get X number of requests in X number of seconds, well that's clearly not a human and we're going to shut that down. That's about the max you can get out of a basic WAF. It's super easy to bypass these, simply even just by rotating your IP address. The fact that they blacklist your IP as their only real mechanism for defense is, again, trivial to bypass. I put this up here twice because these don't do much and there's not much to say about getting around them, it's all right there. Simply rotating your IPs is really all you need to get around these basic WAFs.
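To make the "X requests in X seconds" rule concrete, here is a minimal sketch of the defender's side. The class name, thresholds, and addresses are all assumptions for illustration and do not come from any particular product. It counts requests per source IP in a sliding window and blacklists the address once the limit is exceeded.

```python
from collections import defaultdict, deque


class PerIpRateRule:
    """Block an IP once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blacklist = set()

    def allow(self, ip, now):
        if ip in self.blacklist:
            return False
        stamps = self.recent[ip]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        stamps.append(now)
        if len(stamps) > self.max_requests:
            self.blacklist.add(ip)
            return False
        return True


rule = PerIpRateRule(max_requests=5, window_seconds=10)

# Ten requests in one second from a single address: the rule trips.
single = [rule.allow("198.51.100.7", now=i * 0.1) for i in range(10)]

# The same ten requests spread over ten addresses: nothing trips.
rotated = [rule.allow(f"203.0.113.{i}", now=i * 0.1) for i in range(10)]
```

Because the only key is the source address, the rule has nothing to count once every request arrives from a different IP, which is the weakness described above.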

These basic WAFs are what's out there in most cases as well. These are usually on-prem. These are usually appliances you have in your data center. You set them up so your network routes all of your incoming web traffic to the WAF first, it takes a look at what's going on and that determines whether or not it's going to send those requests over to your origin server or not. In nearly all cases, these aren't in-line like that. In nearly all cases, these corporations are putting these in a monitor mode where they'll alert if something wacky is going on, but they still won't block. The business never lets these block because their false positive rate is so freaking high, that it's near impossible to convince the business to allow you to actually block these.

Your security folks will tell you, "Yes, the business never lets us block in-line. We always just have to alert and then it floods our inboxes with these alerts and then we don't care because we have a trillion alerts a day and we're overwhelmed and here we are." So even tripping the alerts on these things often doesn't get you caught before you've accomplished what you're trying to accomplish because it takes so long for that security team to come through and actually take a look at what's going on and do something about it.

sqlmap

One of my favorite ways of bypassing those old-style WAFs is this tool called sqlmap. You can take a look at that repo to see the exact mechanisms it uses. The other thing that these basic WAFs are doing is POST data inspection, and they're trying to identify specific types of attacks that come in through that HTTP POST data. They do that by looking for specific groupings of characters or groupings of words. That's all they can do. sqlmap has a really cool obfuscation technique that it uses to still be able to send in injection commands while fooling the thing into not seeing the characters that it's sending through. It's what you can see that's going on here. It's effectively putting a lot of these null characters in place, while still doing union SQL calls. If you want to see some cool obfuscation, just check out the sqlmap repo.
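As a rough illustration of why "groupings of characters" rules are brittle, the sketch below models a signature check as a plain substring match. The phrase list and function names are hypothetical and this is not sqlmap's code. It shows a padded string slipping past the check, and the same string being caught once the input is normalized first. Whether a given backend still executes the padded form is outside what this sketch shows.

```python
BLOCKED_PHRASES = ("union select", "or 1=1")  # hypothetical signature list


def naive_rule_blocks(body: str) -> bool:
    """Flag the body if any listed phrase appears verbatim."""
    lowered = body.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def normalized_rule_blocks(body: str) -> bool:
    """Same rule, but strip null characters and collapse whitespace first."""
    cleaned = body.replace("\x00", "")
    cleaned = " ".join(cleaned.split())
    return naive_rule_blocks(cleaned)


plain = "id=1 UNION SELECT name FROM users"
padded = "id=1 UN\x00ION  SEL\x00ECT name FROM users"

plain_caught = naive_rule_blocks(plain)
padded_caught = naive_rule_blocks(padded)
padded_caught_after_cleanup = normalized_rule_blocks(padded)
```

The design point is that the matcher and the system behind it disagree about what the bytes mean. Normalizing before matching narrows that gap but does not close it for every encoding.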

Between that and just rotating your IPs, getting past these basic WAFs, given that nobody responds to those alerts anyway, is actually really trivial, and you're going to see that that triviality is a common theme in what we're doing here. You're going to be really surprised that most of the things I recommend are going to be simple things that you can add in one, two, three lines into any of your scripts that you're writing. There's not going to be any devastating, "Holy crap, this is a really complex attack, I can't believe somebody thought of this" kind of stuff in this talk. It's going to be surprisingly basic and you're going to be really upset at the state of things in the defensive universe right now. Forgive me if I'm also flying through this - there is a lot of data we have here.

There are these more modern WAFs coming out now that I call sophisticated WAFs. These often exist in the cloud as a kind of reverse proxy. They operate similarly to how a CDN operates: you'll have your DNS send all the traffic up to them first. They'll figure out what requests are good, what requests are bad. They'll send the good requests to your origin, and they'll do something about the bad requests, and that something varies based on whichever sophisticated WAF is in use. These often partially rely on JavaScript execution. This is usually to fingerprint the client environment.

What we're doing here is we're actually taking a look at the connecting client and not just the post data. We're not just looking at HTTP headers, we're not just looking at post data. We're not just looking at IP address information. We're actually seeing what's going on in that client environment. Is this a real browser that's actually trying to connect to my server? Or is it somebody pretending to be a real browser? Unfortunately, in most of the cases, that fingerprinting is still not very good. What I'm going to do with the majority of this talk is tell you how to get past this more sophisticated stuff because the first stuff's really easy.

Bare Minimums

This is the bare minimum stuff that you're going to do. These next few things I'm going to tell you in this section are going to be like, "Please at least do this." This is the, "At least you showed up to work today." At least you tried, you're going through the motions. Everything you write should, at the very least, be doing these few things. Rotate your IP - you should be rotating your IP all the time. When to rotate your IP is going to vary based on what you're attacking. There is an art to this; you may have to rotate it with every single GET request. If you're bypassing something that's really good at doing behavioral analysis, you might get flagged after a single GET request. And if you're trying to scrape data off a webpage, getting stopped after one GET is super irritating. But if you can rotate your IP after each GET, that's devastating. That's going to bypass so much stuff out there, and all you're doing is that one thing and that's something that's super easily scriptable.
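A minimal sketch of what "rotate with every request" can look like in a script, assuming a pool of proxy addresses is already in hand. The pool below uses placeholder addresses from documentation ranges, and the function name is made up for the example. The generator only hands out the next proxy mapping and sends nothing itself.

```python
from itertools import cycle

# Placeholder addresses from documentation ranges; a real pool would come from a provider.
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def proxy_settings(pool):
    """Yield one proxies mapping per request, cycling through the pool forever."""
    for address in cycle(pool):
        yield {"http": address, "https": address}


rotation = proxy_settings(PROXY_POOL)
first_four = [next(rotation)["http"] for _ in range(4)]
```

Each yielded mapping has the shape that the `proxies` argument of `requests.get` accepts, so calling `next(rotation)` before every request gives per-request rotation. Rotating less often, if the target tolerates it, only means calling `next` less often.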

Obtaining IP addresses for this is really easy. There is no end to free proxy sites out there. Just Google free proxies; you can look up VPNs as well, which is what they sometimes call themselves. The paid services of course are going to be much more reliable, whereas the free ones often get blacklisted by IP reputation services relatively quickly, but the paid services are pretty cheap. We're talking 15 bucks a month for thousands of IPs that you can use. There are really cool services out there that will let you lease residential IPs. This is super devastating if you're dealing with the old-style WAFs, which, like I said, is most WAFs out there, in that no business is going to allow their security team to block residential ASNs. Whose company is going to let you block all of Comcast? Especially if you're an eCommerce site. If you're a company where a significant portion of your business comes from access to your website, or you simply run customer portals, if you're say a health insurance website, anything like that where you absolutely need individual people with residential IPs to access your website, these will never get blocked. There are a lot of really cool services that'll rent these to you relatively cheap for what we're trying to do here.

Where they get these IPs is super sketchy, because normally where do you get a residential IP? You call up Comcast or AT&T, whoever, and you get your one IP address. If you need a second IP address, what's that cost? What's that cost in New York? Probably 10 bucks? 10 bucks an IP, if you can even get another static IP from them. What these companies do is they run these side hustles. Luminati is my personal favorite. Who knows Hola VPN? Maybe you use it to watch TV in other countries. That's its most common thing that it pushes, it's a free VPN service. Have you read the terms of service for Hola VPN? No, of course not, nobody does that. If you skim through there - it's not that long of a terms of service - you'll see that Hola VPN says if you're using their free service, you are agreeing to also allow them to use your home IP address as an exit node for other services.

This other company they run is called Luminati. If you check out luminati.com, they lease residential IP addresses. By using this free software, this Hola VPN, you're an exit node for botnets. You are literally hosting malware from your house traceable to your IP, mind you. But, as people who need some residential IPs, that's a great place to go get them.

Monkey Socks is another one. Monkey Socks leases mobile IPs, cellular network IP addresses. It gets those from an SDK that it offers to effectively anyone writing a mobile app that is capable of establishing a network connection. Same thing: it just ties in there and it says, "Use our SDK. Throw our little blurb in your terms of service that says, by using our free app that we wrote for Android, you also agree to let us use your network connections for whatever the hell we see fit." That's terrifying. For any free app that you're using on your phone, take a look at the terms of service. Especially if it's not ad supported, they're getting money from somewhere. This is where they're getting that money from. But you, as the attacker, can go ahead and use an entire ASN full of mobile IPs, which nobody's going to block.

Aside from rotating your IP addresses, we're going to start getting into what the more complicated, the more sophisticated WAFs out there are looking for. Again, they can really only rely on the data that you send them. This is the medium sophistication stuff. Make sure when you're writing the HTTP scripts, that you're sending the usual HTTP headers that any browser always sends to them. You can take a look; just go into Chrome or Firefox or whatever. Go in the inspect panel and just look at the normal request headers that get sent in there. Specifically, I call out the Accept: */* header; that bypasses so many bot detections, it's hilarious. Most scripts, most binaries that do HTTP, your curl, your wget, it doesn't send that header, so these rules that exist on the defensive side just go, "Does it have that 'accept anything'?" If that Accept header is missing, it goes, "Nope, that's a bot. Shut it down." You can bypass that rule just by adding your Accept: */* and you get right in. Sometimes that's the only bot detection going on in these mid-grade WAFs.
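A small sketch of sending "the usual headers", using only the standard library. The user agent string is an example and is likely outdated by the time you read this. The header set is a minimal assumption, and the real reference is whatever your own browser shows in its inspect panel for the site in question. The request is built but never sent.

```python
from urllib.request import Request

# Example desktop Chrome string; an assumption, and likely outdated.
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def browser_like_headers(user_agent=EXAMPLE_USER_AGENT):
    """Return a minimal set of headers a browser normally sends."""
    return {
        "User-Agent": user_agent,
        "Accept": "*/*",
        "Accept-Language": "en-US,en;q=0.9",
    }


def build_request(url):
    """Build (but do not send) a GET request carrying those headers."""
    return Request(url, headers=browser_like_headers(), method="GET")


request = build_request("https://example.com/")
```

Passing `request` to `urllib.request.urlopen` would send it. Keeping header construction in one function makes it easy to compare against a real browser capture and adjust one header at a time.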

The Do Not Track header is another one that they'll use. It's more of a weighted thing since it's not always sent in the first place, but you can use that to weight in your favor because they're going to go, "All right, we also saw the Do Not Track header, so probably a real browser." Sometimes, depending on the type of communications you're doing and what you're up to, there'll be X headers. You guys know what the X headers are, like X-Forwarded-For. That's one of the most common ones. These are optional headers where you can pretty much invent any kind of header you want. By spec, you just start it with an X and you add whatever data. Look at the X headers that are coming in and being sent when you're doing normal communications on that website and figure out if this is something that you should be adding to your script or not. Watch out for the X-Forwarded-For if you're using free proxies, because some of those free proxies will add an X-Forwarded-For that will put your source IP, the actual IP you're coming from, in there, and then that just completely blows your cover. Most of the paid proxies don't do that. You want to look for something called a transparent proxy or an invisible proxy. Most of those are going to not add any header data and will also forward any header data that you add. That's really critical for this.
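One way to check a proxy for that kind of leak is to send a request through it to a header-echo endpoint you control, then inspect what arrived. The sketch below covers only the inspection step. The function name, the list of suspect headers, and the sample values are assumptions for illustration.

```python
import re


def leaked_headers(echoed_headers, real_ip):
    """Return names of forwarding headers whose value includes real_ip.

    echoed_headers is assumed to be the header mapping the far end received.
    """
    suspects = ("x-forwarded-for", "x-real-ip", "via", "forwarded")
    found = []
    for name, value in echoed_headers.items():
        if name.lower() not in suspects:
            continue
        # Split on separators so 198.51.100.23 does not match 198.51.100.230.
        tokens = re.split(r"[,\s;=]+", value)
        if real_ip in tokens:
            found.append(name)
    return found


seen_by_server = {
    "User-Agent": "example",
    "X-Forwarded-For": "198.51.100.23, 203.0.113.10",
    "Via": "1.1 proxy.example",
}
leaks = leaked_headers(seen_by_server, real_ip="198.51.100.23")
no_leaks = leaked_headers({"X-Forwarded-For": "198.51.100.230"}, real_ip="198.51.100.23")
```

An empty result for a given proxy means none of the listed headers carried the address. It does not rule out leaks through headers that are not on the list.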

Your user agents - definitely send a valid user agent, something from a modern browser, and you can just go look at what the current Chrome one is. Look at what the current Firefox one is. When you're copying and pasting your user agents, don't include the quotes. Everyone includes the quotes and that's the easiest way to detect if somebody's up to something sketchy, because their user agent still has those single quotes in there, and you go, "No, no browser actually sends the single quotes," but they copied it and pasted it right from Chrome because when you view it in the inspect panel, it's got those single quotes and they just pasted it and it's a dead giveaway that this is not somebody using a real browser. Watch the quotes.

Sometimes you want to use session cookies. This is something you're going to have to experiment with. Really, this is all stuff you're going to have to experiment with. Try it on, try it off. Generally, the top two are something you're always going to want in there. The other three and the user agent. X headers, session cookies, you want to play with. Session cookies often allow you free access to everything. They often eliminate throttling, especially in an authenticated scenario. Sometimes you're going to go in by hand, authenticate to the website, then grab your session cookie and add that to your script, then the remote server is going to have no rules throttling authenticated people because no way they're fake, they're authenticated.

Sometimes they're just non-authenticated session cookies, but because, by default, things like Python requests, or curl, or wget don't even deal with those cookies, the fact that they aren't there is a great tip-off, and these mid-grade systems will use that to just block you right out of the gate. Again, real simple stuff. We're not hacking anything, all we're doing at this point is just abiding by the HTTP protocol. That's going to be the overarching theme of what we're doing here. Just make sure that you're mimicking a real browser as much as you possibly can by hand.

There's a really cool tool called Postman. The code option in Postman, in the upper right, is the tiniest thing on the planet. Whenever you send a request out, there are a bunch of links just above the window that has all your data in it, and one of them says code. Click that code link. That code link gives you this drop-down that lets you pick the language that you're scripting in - I just picked Node.js here - and it will generate the request that you just sent in whatever language that you're scripting in. You can copy and paste this and it includes all your really cool stuff. There are your cookies right down there and there's your Accept: */*. This is just right out of a request I dumped in, just go to google.com/maps. It's going to take a lot of that default stuff that the remote server is expecting, throw it right into the script for you so you don't have to spend a ton of time doing this by hand. Definitely check out Postman, even though you guys already are.

Rotate your user agents. This gets past so many things. A combination of changing your UA and your IP address is one of the most devastating things you can do. Rotating your user agent is often a great way of getting more usage out of a single IP address. A lot of requests may come in from a single IP address for an organization, such as from a university, or workplace. You could have 10,000 people all using the same public IP because they all come out that same exit. Rotating these user agents really gives the look of it actually being an organization with a ton of different users. You can find lists of every user agent in existence anywhere on the Internet. If you look in my GitHub in the scripts I wrote from hacking Venmo last year, I have a flat file in the Venmo script. It's like 4,000 user agents and you can just grab that and use it. Again, super simple, you're just taking a flat file with the user agents in it and telling your script, "Go get the next one," or, "Go get a random one and just rotate through this." It looks like it's a bunch of people at some company, or some university campus, or something at a bunch of different computers because, again, the defenses for this stuff aren't that complex in nearly all cases.
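The "flat file, go get a random one" idea can be sketched in a few lines. The function names and sample strings below are made up for illustration, and the lines stand in for what reading a user agent file would return. The cleaning step also strips stray wrapping quotes, which is the copy-and-paste giveaway mentioned earlier.

```python
import random


def clean_user_agents(lines):
    """Strip whitespace and stray wrapping quotes; skip blanks and duplicates."""
    seen = set()
    cleaned = []
    for line in lines:
        agent = line.strip().strip("'\"").strip()
        if agent and agent not in seen:
            seen.add(agent)
            cleaned.append(agent)
    return cleaned


def pick_user_agent(agents, rng=random):
    """Return a random entry from the cleaned list."""
    return rng.choice(agents)


# Stand-in for the lines of a flat file such as open("user_agents.txt").
raw_lines = [
    "'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0'\n",
    "Mozilla/5.0 (Macintosh) ExampleBrowser/2.0\n",
    "\n",
    "Mozilla/5.0 (Macintosh) ExampleBrowser/2.0\n",
]
agents = clean_user_agents(raw_lines)
chosen = pick_user_agent(agents, random.Random(0))
```

Passing a seeded `random.Random` makes a run repeatable while debugging. Leaving the default gives a different sequence each run.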

You can also do this if you're trying to fuzz a whitelist. A lot of WAFs, and the more sophisticated WAFs, they'll have whitelists that people will make the poor decision of just basing off of a user agent. They'll say, "All right, if we see this specific user agent come in, that's fine. Just let it through." You can go through and fuzz every single user agent until you get one that actually lets you in. Definitely make sure when you're writing these scripts and you're trying this stuff out, you're logging what's going on, you're seeing what's working and you're seeing what's not, because there's a lot more going on in the background than just a binary, "Let this person in", "Don't let this person in." You're going to be fuzzing cookies. You're going to be fuzzing user agents, things like that. Again, cookies are the same thing.

Sometimes you have to provide a session cookie to even get where you're trying to go. Sometimes you can eliminate the session cookie in order to eliminate the throttling. You can write your script so that every time it gets a cookie, it doesn't provide that cookie on the next GET request, because the remote server is watching how many times that session has made a request. They'll do that to try and get around you rotating your IP, because if you're rotating IP and still providing the same session cookie, you're literally providing the same ID to them over and over and saying, "It's the same person still." Make sure you're not doing that. Sometimes you actually do have to provide that cookie, so it's an art, it's something you're going to want to try both ways, see what works for you.
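To see why a reused session cookie undoes IP rotation, here is a minimal sketch of the server-side view, with made-up log entries and a hypothetical function name. It counts requests by session ID and ignores the source address entirely.

```python
from collections import Counter


def requests_per_session(request_log):
    """Count requests by session ID, ignoring the source IP entirely."""
    return Counter(session_id for _ip, session_id in request_log)


# (source IP, session cookie value) pairs; all values are made up.
log = [
    ("203.0.113.10", "sess-abc"),
    ("203.0.113.11", "sess-abc"),
    ("203.0.113.12", "sess-abc"),
    ("203.0.113.13", "sess-xyz"),
]
counts = requests_per_session(log)
```

Three different addresses collapse into one identity the moment they share a cookie value, so a throttle keyed on the session sees through the rotation.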

Watch out for sneaky WAF cookies. The more sophisticated WAFs will often drop identifier cookies that you have to provide. These are the ones that will run a bit of JavaScript on your end, do that fingerprinting, post that telemetry up to their server and then they'll respond with a cookie that definitely IDs you as you. Sometimes, you have to provide that every single time or it's going to block you outright, or you're going to get caught in this fingerprinting circle and not actually get anywhere. Or, sometimes you can get fingerprinted in a real browser. Get that fingerprinting cookie out of the way manually and then just copy and paste that fingerprinted authorization cookie into your script as a cookie replay attack. That's a super common one.

If you're having trouble getting your script to convince the remote server to generate the necessary cookies you need, do it in a real browser, and then just copy and paste that cookie and see what happens. A lot of products are susceptible to that. That's an old hack, it's called Cookie Replay, it works against a ridiculous number of things. Go ahead and try that. In fact, everything you're doing here, you should be doing manually in a real browser first to understand how the application you're attacking works, and then you're going to write a script to do whatever you need to do. Don't be afraid to copy and paste as much crap as possible.

Let's talk about the super serious stuff now. This is when we're dealing with the modern sophisticated WAFs. This is where we're dealing with the really expensive bot defense stuff: the stuff that is able to effectively force you to fingerprint your browser, and does a really good job at making sure your browser is in fact a real browser, and not just a super, super snarky Python script that you spent three months writing to really mimic a real browser.

Edge Enumeration

Occam's razor with this one. Try to bypass that WAF entirely, try to find another way into the website that you're attacking. I wrote a script - it's on my GitHub and there's a link at the end of this talk to where my GitHub is - called ScanCannon. It's like a hundred-line bash script, and all it does is enumerate the edge of whatever ASN you give it. It finds all the servers that are running out there, it'll find a bunch of other stuff, but for this purpose it finds all the web servers that are running. Hopefully, that's going to find some of the edge servers, and then hopefully those edge servers have crappy firewall rules around them that let you connect directly to them instead of going through this cloud WAF they have set up. This sounds dumb, but I see this all the time. You're literally just bypassing the WAF because normally you'll have this scenario where you punch in the web address, the domain you're trying to get to, and the DNS says, "Go to this IP address," and that IP address is the cloud WAF. Then you have to try and figure out how to bypass this cloud WAF before it'll let you get to the actual origin server.

What we're doing here is just finding the IP address of that origin server and just connecting directly to that IP. Now we're not dealing with tricking this WAF; we're literally bypassing this WAF. This is really common because people are really bad at writing firewall rules. I don't know why; firewalls are not complicated, but it's in the top three number of things I exploited as a penetration tester, was just these bad firewall rules.

ARIN is the American Registry for Internet Numbers. This is the full public list of who owns what IP addresses for American companies - InterNIC takes care of other international ones - and all of these registries are public. You can go in and you can say, "What IP space does QCon own?" You can get that full ASN, that full list. Then you can go to town and say, "Let's see what's out there. Let's see who's running web servers on each of these IP addresses." This can take you a while, which is why I wrote a script to do it.
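The first step of walking a registered IP space is turning the listed blocks into individual addresses. Here is a minimal sketch with the standard `ipaddress` module. The blocks are documentation ranges standing in for a real registry lookup, the function name is made up, and this is not how the speaker's own script does it.

```python
import ipaddress


def hosts_in_blocks(cidr_blocks):
    """Expand CIDR blocks into individual usable host addresses."""
    for block in cidr_blocks:
        # strict=False tolerates blocks written with host bits set.
        network = ipaddress.ip_network(block, strict=False)
        for host in network.hosts():
            yield str(host)


# Documentation ranges stand in for blocks copied out of a registry lookup.
example_blocks = ["192.0.2.0/30", "198.51.100.8/30"]
targets = list(hosts_in_blocks(example_blocks))
```

The resulting list is what a port scanner would then be pointed at to see which addresses answer on 80 and 443. A generator keeps memory flat when the blocks are large.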

Sometimes you'll come across the website you're looking for, or the web application you're looking for. Sometimes you'll come across a dev instance that has different firewall rules, but in the end still gets you access to the data and the backend that you were looking for. That's really common. They'll have prod to go through the complicated cloud WAF, but dev doesn't have to because it's just dev, but they screwed up on the backend and let dev still access whatever you're trying to get. Or the dev site will use live data. We as devs know that this is a stupid idea, it happens all the time. It's common, don't be afraid to go look for it. You can save yourself a lot of time.

If you're forced to go through this cloud WAF, a lot of times they're using a CDN in front of it, and CDNs have path rules that will pass certain requests for certain paths through the WAF, and other requests will bypass it. Start fuzzing the paths. That can be hard, you're literally just punching random words into the URL path. See if you can find a URL path that gets you to the place you're trying to get to and doesn't force you to go through that WAF. This is some more advanced stuff. This is a lot of last resort things, but that's where we're at at this point in the game. A lot of times you'll find that accessing an application via a different URL path or different means has different rules associated with it, because somebody forgot to add them into the CDN. CDN is 6,000 rules as it is, and nobody knows how it works, and now you have found a way in that they weren't aware of.

Start smashing their DNS, find other domains that that company owns. Do you guys know what DNS zone transfers are? That allows you to dump every domain name that's registered within their DNS server. Look through those domain names. Does anything have the word "origin" in it? Here's a really common one. Look up if a company has www-origin.companydomain.com or whatever the TLD is - that's a freebie. There's someone very popular out there using that as a way of hiding the origin servers. Look for the word "origin" or something that looks like an origin server just within all the DNS names that their DNS server has and just start hitting that. It might work, you'd be surprised. I told you this wasn't going to be any devastating, insane, complicated hacks, this is simple stuff. You just have to think outside the box and find other ways in.
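Once you have a list of names, looking for the word "origin" is a simple filter. The sketch below assumes the hostnames are already collected in a list. The records and the function name are made up for the example.

```python
def origin_candidates(hostnames):
    """Return hostnames with a label containing the word 'origin'."""
    matches = []
    for name in hostnames:
        labels = name.lower().rstrip(".").split(".")
        if any("origin" in label for label in labels):
            matches.append(name)
    return matches


# Made-up names standing in for a list of DNS records.
records = [
    "www.example.com",
    "www-origin.example.com",
    "mail.example.com",
    "Origin2.example.com.",
]
found = origin_candidates(records)
```

The match is deliberately loose, so a name like `original.example.com` would also be returned. For a shortlist that a person reviews by hand, that is usually acceptable.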

If you're able to get in, but you're being throttled and you can't figure out how to defeat this throttling that's happening to you, find all the edge nodes, find all the IPs that are hosting whatever it is you're attacking and attack only one. Attack only one until it stops you, then try attacking the next one. There are certain WAFs out there that have a really long window where blocked IP info gets synced up. Some of them are as long as 15 minutes, and so you can attack one and then you've got 15 minutes before that one tells all the other ones to start blocking you. Then you can just wreck the next one, and then wreck the next one, and then rotate your IP and start all over again.

Look for an API that hosts the data that you're looking for. Oftentimes this is just domain name/API. Go look, just type that in your browser and see if it exists. APIs are almost always less protected than the actual websites themselves. That's often because of the systems that need to interact with the API not being able to interact with it in the same way that a browser provides. You're not able to fingerprint the connecting system because maybe it's a mobile application, so you can't protect the API in the same manner. Look for that API, because that API may be providing the pricing data you're looking for, or whatever else you're trying to aggregate. That API may also allow you to authenticate. If you're trying to brute force some login credentials, you could possibly brute force against that API, and it will have completely different rules associated with it. Don't assume that everything on their website has the same rules. Don't assume that every URL path has the same rules associated with it. Every single page and every single means of getting to that page can have its own rule applied. Don't assume anything, always try everything.

Sophisticated WAFs

Look for UUIDs or really complicated DNS names. Look for something that's super long and you go, "What is this?" especially in a scenario where it stands out, where you go, "These three names are really long hashes, and then everything else is very obviously named." Those are probably obfuscated origin servers. They're definitely obfuscated, and they're obfuscated for some reason. See what they are, check those IPs. Use a tool like Nmap to see what ports are hosting services and that'll help you figure out what that server is. Just connect to it on 80 and 443. It might be what you're looking for. There's a good chance that's an origin server and that might be what your target is for this particular situation.
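The two naming tips (the word "origin", and labels that look like long hashes or UUIDs) can be sketched as a small triage function over a list of hostnames you already have. This is a minimal sketch under stated assumptions: the 16-character hex threshold, the regular expressions, and the `flag_hostname` and `triage` names are illustrative choices, not an established heuristic, and the function only inspects the leftmost DNS label.

```python
import re

# Assumed heuristics, not a standard: a UUID-shaped label, or 16+ hex characters.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)
HEX_RE = re.compile(r"^[0-9a-f]{16,}$")


def flag_hostname(hostname):
    """Return a reason string if the leftmost label looks interesting, else None."""
    label = hostname.lower().split(".")[0]
    if "origin" in label:
        return "origin-named"
    if UUID_RE.match(label) or HEX_RE.match(label):
        return "hash-like"
    return None


def triage(hostnames):
    """Map each flagged hostname to the reason it was flagged."""
    flagged = {}
    for name in hostnames:
        reason = flag_hostname(name)
        if reason:
            flagged[name] = reason
    return flagged
```

Running `triage` over your own zone's names is also a quick way to see which records stand out to anyone who can enumerate them.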

We talked about the WAF cookies. These more complicated WAFs are going to throw down JavaScript snippets at you. Sometimes these snippets are inline in the page template itself and every page has the same one. Sometimes, you'll send a GET request and your response will be just a blank page with a JavaScript fingerprinting snippet in it and then you have to process that JavaScript, send back whatever response it's looking for, and then you'll get a valid fingerprinting cookie that lets you continue. Take a look, see if those are happening. Sometimes it's as simple as just not running that JavaScript. Block that specific snippet and it'll fail open. There's definitely a pair of products that just simply fail open if you don't run their JavaScript. That seems ridiculous to me, but that's a thing that happens. Or, run that JavaScript, run in a regular browser, take the resulting cookie, dump it in your script, cookie replay. Say, "Thanks for the cookie. I'm going to put this here now." Everything works fine a lot of the time.

Automate a Real Browser

Failing all of that, this is where it gets devastating. Just automate an actual real browser. Like I've been saying, what you should always be doing when you're trying to do anything that interacts with a website is you go interact with the website yourself manually. You see how everything works, you make some notes and then you write a script that's going to accurately recreate what you're trying to do to make this website work for the computer. Why don't you automate that? That's where things get really fun. That's what bypasses so much stuff. It is, however, a bit more complex and there is a bit of a learning curve to it.

ZombieJS and PhantomJS are kind of deprecated more or less, and neither of those are being maintained by the original creators anymore. I think Phantom is still at least community-maintained. I believe Zombie is totally dead. Anyone heard of Arachni? It's a web application vulnerability scanner. That one, I believe, runs PhantomJS in the background and it uses that to increase its ability to access more pages within the browser by actively running the JavaScript. Those used to be the way to go for a long time, because they were tools that would run JavaScript and pretend to be browsers by the mere fact of, "I run JavaScript, therefore I must be a browser." When we're dealing with these modern WAFs now, they're actually doing a lot more hardcore fingerprinting than just, "Can you run JavaScript or not?" Well, some of them are. There are still a ton that are just seeing if you can run JavaScript or not, and that's a terrible way of fingerprinting anything to determine if it's a real human in a real browser.

These days, the things you want to be using are Selenium - Selenium is super popular in QA testing - and Puppeteer in Headless Chrome. That's my all-time favorite, I absolutely love Puppeteer Headless Chrome. I'm not going to get into a super lengthy how-to on how to use Puppeteer here. It's got a learning curve to it, but honestly, you can watch an hour-long YouTube video that you can Google up and get the gist of how it operates. The deal is that Selenium, Puppeteer, they're running Headless Chrome. Headless Chrome is Chrome, more or less. Headless Chrome claims that it's a clone of Chrome and it does all the same things that Chrome does and it looks like Chrome. It's not exactly true, there are a lot of things going on under the hood of Headless Chrome that kind of give it away.

If you're dealing with something super complex that's really digging deep into that browser, it's going to identify this stock Puppeteer, Selenium, Headless Chrome setup right out of the gate. But it's kind of rare that you're going to come up against that. Definitely, even just try the stock config and see where you get. Automating it is super easy, it's very similar to JavaScript. That's going to get you a really long length of the way. It looks like human activity because you're using a real browser, you're using Chrome, and you can set your timing in there. You can set your throttling just like any other method of scripting.

It executes JavaScript to the fullest extent. It executes JavaScript because it uses the same JavaScript engine that Chrome uses. It properly leverages cookies, it does that whole tradeoff the way it should. It stores cookies, it does sessions properly. You can run multiple instances of it per IP because it's just a browser, like you can open multiple browsers on your computer.

Realistic WebDriver

If you're going to be doing this, go into your WebDriver settings for Selenium or for a Headless Chrome or for Puppeteer. Change at the least these aspects of it to make them look more realistic. Depending on what you're using, sometimes these will be inherently discerned by the WebDriver, and the WebDriver is this automation tool. Literally someone driving your web browser. If you're running this on your AWS instance that has 12 cores in it, your WebDriver is going to report that you're running a 12 core CPU. The average user of a website probably isn't running 12 cores on their desktop computer, so you want to fix that.

These are the things that a lot of those complex WAFs out there look for even in the automated browsers like these. If you change these things to be more reasonable for what an average user would have, like screen resolution, a lot of these default to 320 by 240 for the screen resolution. Some of them default to 1024 by 768. Nobody's running that these days. That's an insanely tiny screen resolution, change that. Go through these, and once you see them, they're obvious what they are. Set these to normal, what I call normal human values. Set these to what your mom would use. That's going to bypass a ton of the most sophisticated WAFs that are out there today.
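As a rough illustration of "normal human values", here is a minimal Python sketch that checks a reported browser profile against assumed bounds for an ordinary desktop. The field names (`hardware_concurrency`, `screen`), the 8-core ceiling, and the 1280 by 720 floor are all assumptions made for this example, not values from the talk or from any product. The same check reads equally well from the detection side.

```python
# Assumed "ordinary desktop" bounds; tune these to your own traffic data.
MAX_CORES = 8
MIN_SCREEN = (1280, 720)


def implausible_settings(profile):
    """Return the names of profile fields that fall outside the assumed bounds."""
    problems = []
    if profile.get("hardware_concurrency", 0) > MAX_CORES:
        problems.append("hardware_concurrency")
    width, height = profile.get("screen", (0, 0))
    if width < MIN_SCREEN[0] or height < MIN_SCREEN[1]:
        problems.append("screen")
    return problems
```

Under these assumptions, the 12-core cloud instance with a 320 by 240 screen from the examples above is flagged on both fields, while a 4-core machine at 1920 by 1080 passes cleanly.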

Now that you at least have that thing to do in your bag of tricks, you're going to get past a lot, like 90% of the things that have been stopping you up to date. Again, I'm Johnny Xmas, I'm super active on Twitter. If you want to ping me and ask any questions, there's my company, Kasada. We at Kasada do sell a product that is not susceptible to literally anything I've discussed in this talk, because that'd be weird if it was. If you're at a company who's trying to get around people who are launching these more sophisticated attacks, you go ahead and give me a call, we can talk about that as well.

 


Recorded at:

Sep 23, 2019
