Safari Content Blockers Under the Hood

Sep 23, 2015 9 min read

At WWDC 2015, Apple introduced iOS 9. Although the new SDK does not introduce as many new or enhanced features as iOS 8, which included more than 4,000 new APIs, it does still provide a wealth of new functionality and enhancements. Along with the new SDK, iOS 9 is also marked by new developer tools to support some of its features, and new releases of Apple’s major programming languages, Swift and Objective-C.

This series aims to introduce everything developers need to know about building apps for the latest release of Apple’s mobile OS. It comprises five articles covering what’s new in the iOS 9 SDK, new features in Swift, Objective-C and the developer tools, and Apple’s new bitcode.
This InfoQ article is part of the series “iOS 9 For Developers”.

With iOS 9, Apple introduced a content blocking mechanism into Safari. Rather than providing a hook through which a content blocker gives a yes/no answer for each individual URL or resource as it loads, a content blocker supplies a configuration list up front: essentially a set of URL patterns, each paired with an action. Safari can then compile this list into whatever form is most efficient to evaluate, and check each load against it without consulting the blocker every time. The next version of OS X, El Capitan, will use the same content blocker engine.

The other benefit is that the content blocker is kept separate from the history and application state of the websites being browsed. A content blocker can say ‘Don’t go here!’, but it never learns which sites you visit; if it did, there would be a strong temptation for the content blocker itself to monetise the user’s personal browsing history.

Finally, because the content blocker runs as an extension in a separate process, the browser can terminate, relaunch or simply disable it if it takes up too much memory or is too slow to respond. In addition, because Safari is based on the open-source WebKit project, anyone can audit the source code of the blocking engine, which gives users confidence that content blockers aren’t leaking personal information.

Configuration File

The content blocker configuration is a simple JSON file containing an array of rules at the top level. Each rule is an object with two fields: a trigger, which says when the rule applies (for example, based on the URL being loaded), and an action, which says what Safari should do about it.

An iOS application is needed to host the content blocker extension; this application typically also provides a place to configure the blocker. In Xcode 7, a simple Single View Application project can be created to host an ad blocker.

Once the host application project has been created, a Content Blocker Extension target can be added using the File → New → Target menu and choosing the appropriate extension template.

This adds three files to the project: an Info.plist for the extension (which provides the name shown in Safari’s list of content blockers); an ActionRequestHandler Swift class, which returns the configuration file when Safari requests it; and a blockerList.json file containing a template to get started:

[
  {
    "action": {
      "type": "block"
    },
    "trigger": {
      "url-filter": "webkit.org/images/icon-gold.png"
    }
  }
]
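Because the file is plain JSON, a host app could also generate it rather than have it written by hand. The sketch below mirrors the trigger/action structure with Codable types and reproduces the template rule. The type names Trigger, Action and Rule are hypothetical helpers, not part of any Apple framework; only the JSON keys come from the rules shown in this article.

```swift
import Foundation

// Hypothetical model types mirroring the rule format: each rule pairs a
// trigger (when the rule applies) with an action (what Safari should do).
struct Trigger: Codable {
    var urlFilter: String
    var resourceType: [String]? = nil
    var loadType: [String]? = nil
    var unlessDomain: [String]? = nil

    // Map Swift property names onto the hyphenated JSON keys.
    enum CodingKeys: String, CodingKey {
        case urlFilter = "url-filter"
        case resourceType = "resource-type"
        case loadType = "load-type"
        case unlessDomain = "unless-domain"
    }
}

struct Action: Codable {
    var type: String              // e.g. "block" or "css-display-none"
    var selector: String? = nil   // only used by css-display-none
}

struct Rule: Codable {
    var action: Action
    var trigger: Trigger
}

// Reproduce the template rule from blockerList.json.
let template = [
    Rule(action: Action(type: "block"),
         trigger: Trigger(urlFilter: "webkit.org/images/icon-gold.png"))
]

let encoder = JSONEncoder()
encoder.outputFormatting = .prettyPrinted
// Optional fields that are nil are omitted from the output. JSONEncoder may
// escape "/" as "\/", which is still valid JSON.
let data = try! encoder.encode(template)
print(String(data: data, encoding: .utf8)!)
```

Using typed models means a misspelt key becomes a compile error in the app rather than a rule that Safari rejects at reload time.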

Building and Running a content blocker

Once the template has been created, the application can be run with Cmd+R in Xcode. This launches the placeholder application and installs the content blocker extension at the same time. Installed blockers can be seen in the Settings app: under Safari, a Content Blockers entry appears once one or more blockers are installed.

Screenshot showing the content blocker under Safari preferences

To make Safari reload the JSON file, the blocker’s toggle must be turned off and then on again. Applications can also trigger a reload by calling the SFContentBlockerManager method reloadContentBlockerWithIdentifier, passing the blocker’s bundle identifier (as stored in the blocker’s own Info.plist). To make testing easier, add a reload to the AppDelegate’s application(_:didFinishLaunchingWithOptions:) method; the list will then be reloaded automatically every time the application launches:

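import UIKit
import SafariServices

// In the host application's AppDelegate (Swift 2 syntax, as used by Xcode 7)
func application(application: UIApplication,
  didFinishLaunchingWithOptions launchOptions: [NSObject: AnyObject]?) -> Bool {
  // Ask Safari to reload the blocker's JSON on every launch (useful while testing).
  // The identifier is the bundle identifier of the content blocker extension.
  SFContentBlockerManager.reloadContentBlockerWithIdentifier(
    "example.AdBlocker.AdBlockerHandler") {
    (error: NSError?) in
    // error is nil if the list compiled; otherwise it describes the problem
    NSLog("Completed with \(error) for example.AdBlocker.AdBlockerHandler")
  }
  return true
}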

Not only is this how ad blocker applications ask Safari to reload the block list, it also provides a way to find out whether there are any errors in the block list itself. There is no easy way to debug the ActionRequestHandler directly, because it runs in a separate process that isn’t attached to the debugger, and log output (generated with print or NSLog) doesn’t appear in the Xcode console. The error passed to the callback is nil if there are no errors; otherwise it contains enough information to identify the problem.
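Since errors only surface through that callback, it can help to catch simple mistakes before the list ever reaches Safari. The following is a hypothetical pre-flight check (validateBlockerList is not an Apple API) that could run in a unit test. It only checks structure: valid JSON, a trigger with a url-filter that compiles as a regular expression, an action with a type, and a selector for css-display-none. Safari’s own compiler remains the final authority, and it may reject patterns that NSRegularExpression accepts.

```swift
import Foundation

// Hypothetical structural check for blockerList.json. Returns a list of
// human-readable problems; an empty list means nothing obvious was found.
func validateBlockerList(_ data: Data) -> [String] {
    guard let json = try? JSONSerialization.jsonObject(with: data) else {
        return ["not valid JSON"]
    }
    guard let rules = json as? [[String: Any]] else {
        return ["top level must be an array of rule objects"]
    }
    var problems: [String] = []
    for (index, rule) in rules.enumerated() {
        guard let trigger = rule["trigger"] as? [String: Any],
              let filter = trigger["url-filter"] as? String else {
            problems.append("rule \(index): missing trigger.url-filter")
            continue
        }
        // url-filter is a regular expression, so it must at least compile.
        if (try? NSRegularExpression(pattern: filter)) == nil {
            problems.append("rule \(index): url-filter is not a valid regex")
        }
        guard let action = rule["action"] as? [String: Any],
              let type = action["type"] as? String else {
            problems.append("rule \(index): missing action.type")
            continue
        }
        // css-display-none needs a CSS selector to know what to hide.
        if type == "css-display-none" && (action["selector"] as? String) == nil {
            problems.append("rule \(index): css-display-none without selector")
        }
    }
    return problems
}

let good = #"[{"action":{"type":"block"},"trigger":{"url-filter":".*"}}]"#
// Typographic quotes, e.g. pasted from a word processor, break the JSON.
let curly = "[{\u{201C}action\u{201D}: {}}]"
print(validateBlockerList(Data(good.utf8)))   // []
print(validateBlockerList(Data(curly.utf8)))  // ["not valid JSON"]
```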

Blocking scripts and images outside a domain

Triggers support further conditions that restrict a rule to certain kinds of resources. The url-filter acts as a quick test of whether the rule may apply to a particular resource; additional fields can then restrict the rule to particular resource types, or to resources loaded from the current site or from elsewhere.

[
  {
    "action": {
      "type": "block"
    },
    "trigger": {
      "url-filter": ".*",
      "resource-type": [ "script", "image" ],
      "load-type": [ "third-party" ] 
    }
  }
]

Since the url-filter is a regular expression, a * on its own is not sufficient: the dot means ‘any character’ and * means ‘zero or more’, so .* matches any URL. This rule fires when the resource being loaded is a script or an image (i.e. loaded by <script> or <img> tags) and it comes from a third party. The load-type can be either first-party (loaded from the site itself) or third-party (loaded from other sites). Sites that load their resources from a content delivery network (CDN) on another domain may break; but for the most part, sites’ own first-party scripts will continue to work, while third-party loads, which are frequently used by ad networks, are blocked.
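To make the order of tests concrete, here is a simplified, hypothetical model of how a trigger like the one above could be evaluated: the url-filter regex first, then the optional resource-type and load-type restrictions, where an omitted field places no restriction. It is not how WebKit implements it (Safari compiles the whole list ahead of time), and Safari’s definition of “third-party” may differ; this sketch simply compares host names exactly.

```swift
import Foundation

// A resource load as seen by the (hypothetical) evaluator.
struct Request {
    let url: URL            // the resource being loaded
    let pageURL: URL        // the page that is loading it
    let resourceType: String
}

struct SimpleTrigger {
    let urlFilter: String
    let resourceTypes: [String]?   // nil means "all resource types"
    let loadTypes: [String]?       // nil means both first- and third-party

    func fires(for request: Request) -> Bool {
        // 1. Quick test: the url-filter regex must match somewhere in the URL.
        let url = request.url.absoluteString
        guard let regex = try? NSRegularExpression(pattern: urlFilter),
              regex.firstMatch(in: url, range: NSRange(url.startIndex..., in: url)) != nil
        else { return false }
        // 2. Optional resource-type restriction.
        if let types = resourceTypes, !types.contains(request.resourceType) {
            return false
        }
        // 3. Optional load-type restriction (simplified: exact host comparison).
        if let loads = loadTypes {
            let load = request.url.host == request.pageURL.host ? "first-party" : "third-party"
            if !loads.contains(load) { return false }
        }
        return true
    }
}

// The rule from the JSON above: third-party scripts and images.
let rule = SimpleTrigger(urlFilter: ".*", resourceTypes: ["script", "image"],
                         loadTypes: ["third-party"])
let page = URL(string: "https://www.infoq.com/articles/")!
let adScript = Request(url: URL(string: "https://ads.example.net/tag.js")!,
                       pageURL: page, resourceType: "script")
let ownScript = Request(url: URL(string: "https://www.infoq.com/app.js")!,
                        pageURL: page, resourceType: "script")
print(rule.fires(for: adScript))   // true: third-party script
print(rule.fires(for: ownScript))  // false: same host as the page
```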

The resource-type can include a number of other elements, including:

document - HTML content
style-sheet - CSS files
font - Font definitions
raw - untyped loads, such as XHR requests
svg-document - SVG documents
media - Sound, Video
popup - Pop-ups
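A host app that builds rules in code could represent these values as an enum, so that a typo such as “stylesheet” is caught when the list is decoded rather than when Safari compiles it. This ResourceType enum is a hypothetical helper; its raw values are the strings listed above plus script and image from the earlier rule.

```swift
import Foundation

// Hypothetical enum of resource-type values; raw values are the exact
// strings used in blockerList.json.
enum ResourceType: String, CaseIterable, Codable {
    case document
    case image
    case styleSheet = "style-sheet"
    case script
    case font
    case raw
    case svgDocument = "svg-document"
    case media
    case popup
}

let decoder = JSONDecoder()
// A correctly spelt resource-type array decodes into enum cases...
let parsed = try! decoder.decode([ResourceType].self,
                                 from: Data(#"["script", "image"]"#.utf8))
// ...while an unknown string makes decoding fail.
let typo = try? decoder.decode([ResourceType].self,
                               from: Data(#"["stylesheet"]"#.utf8))
print(parsed, typo == nil)
```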

If resource-type is omitted, the rule applies to all resource types. As a result, Do Not Track can effectively be replaced by a rule that blocks all third-party loads:

[
  {
    "action": {
      "type": "block"
    },
    "trigger": {
      "url-filter": ".*",
      "load-type": [ "third-party" ] 
    }
  }
]

With this rule, every third-party resource will fail to load. This isn’t always desirable; for example, InfoQ hosts a number of heavy resources (such as videos and presentations) on a content delivery network at cdn.infoq.com. To allow these to load, particular sites can be explicitly exempted from the rule:

[
  {
    "action": {
      "type": "block"
    },
    "trigger": {
      "url-filter": ".*",
      "load-type": [ "third-party" ],
      "unless-domain": [ "www.infoq.com" ]
    }
  }
]

Note that each unless-domain entry is a plain string, not a regular expression, and subdomains aren’t matched (so listing infoq.com would not exempt www.infoq.com). It also names the site being visited, not the site the resource is loaded from; in effect, it lets particular sites opt out of the blocking rule.
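The matching behaviour described above can be sketched as a small function. This is a hypothetical model of the article’s description (exact string comparison against the host of the page being visited, with no regex and no subdomain matching), not WebKit’s implementation.

```swift
import Foundation

// Hypothetical model of unless-domain: each entry is compared as a plain
// string with the host of the page being visited. The host of the resource
// being loaded plays no part, and "infoq.com" does not cover "www.infoq.com".
func exempted(pageURL: URL, unlessDomain: [String]) -> Bool {
    guard let host = pageURL.host else { return false }
    return unlessDomain.contains(host)
}

let article = URL(string: "https://www.infoq.com/articles/")!
print(exempted(pageURL: article, unlessDomain: ["www.infoq.com"]))  // true
print(exempted(pageURL: article, unlessDomain: ["infoq.com"]))      // false
```

Under this model, a page on www.infoq.com can load from cdn.infoq.com, while the same CDN request made from any other site is still blocked.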

Selectively fixing sites

It’s also possible to use the content blocker to hide certain parts of a site. For example, navigating to anything on Blogspot currently shows a header saying how it’s important to enable Google to use cookies to track what you do. This can be filtered using a blocking rule as well:

[
  {
    "action": {
      "type": "css-display-none",
      "selector": "#cookieChoiceInfo"
    },
    "trigger": {
      "url-filter": ".*.blogspot.*"
    }
  }
]

This matches any of the Blogspot domains (people visiting blogspot.com tend to be redirected to a country-specific domain such as blogspot.co.uk), and the css-display-none action then hides the matching element. This is similar to the way some ad blockers work today, by hiding elements with a known ID; as long as the element’s ID in the DOM doesn’t change, the rule will keep working. Multiple selectors can be given as a comma-separated string, as in CSS, using the syntax of the CSS Selectors Level 4 draft.
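Two details of this rule are worth checking in code: which URLs the url-filter actually matches, and how a comma-separated selector string is built. The sketch uses NSRegularExpression as a stand-in for Safari’s matcher (an assumption; Safari supports only a subset of regex syntax). It shows that the unescaped dot makes the pattern looser than it looks, since it also matches hosts that merely contain “blogspot”. The selector .ad-banner is a hypothetical second element to hide.

```swift
import Foundation

// The url-filter from the rule above, compiled with NSRegularExpression.
let pattern = try! NSRegularExpression(pattern: ".*.blogspot.*")

// True if the pattern matches anywhere in the URL.
func matches(_ url: String) -> Bool {
    pattern.firstMatch(in: url, range: NSRange(url.startIndex..., in: url)) != nil
}

for url in ["http://example.blogspot.com/",
            "http://example.blogspot.co.uk/",
            "http://notblogspotting.com/",   // also matches: "." is any character
            "http://example.com/"] {
    print(url, matches(url))
}

// Several elements can be hidden by one rule by joining selectors with commas.
// "#cookieChoiceInfo" comes from the rule above; ".ad-banner" is hypothetical.
let selector = ["#cookieChoiceInfo", ".ad-banner"].joined(separator: ", ")
let hideRule: [String: Any] = [
    "action": ["type": "css-display-none", "selector": selector],
    "trigger": ["url-filter": ".*.blogspot.*"],
]
let data = try! JSONSerialization.data(withJSONObject: [hideRule], options: [])
print(String(data: data, encoding: .utf8)!)
```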

Summary

Content blocking in Safari provides a powerful way of controlling which resources a user is prepared to load, without exposing the URLs the user visits to the content blocker. Rules use regular expressions to match URLs, and exclusions allow particular sites to continue loading resources. As with other ad blockers, selected elements can also be hidden from a page.

It’s unlikely that advertising networks will sit still. For example, they may ask the sites that carry their adverts to host copies of the ad scripts themselves (instead of including them dynamically from remote servers, as they do today), or to set up a network proxy through which ad requests are routed. A blacklist of hard-coded hostnames (such as doubleclick.net) is unlikely to last; advertisers will simply burn through temporary domain names to evade blockers. Starting with a blacklist that blocks everything, and then selectively allowing certain domains or sites, is likely to be the only way to stop this from spreading. And whilst editing JSON and reloading the list whenever it changes won’t appeal to many users, the content blockers that rise to the top of the App Store will provide easy ways of configuring these rules.
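As an illustration of that “block everything, then allow selected sites” approach, a host app might turn a user-edited allowlist into a single rule. The function defaultDenyRules is a hypothetical helper built only from the fields described in this article (url-filter, load-type and unless-domain); it leaves out how the generated file would reach the extension and how the reload is triggered.

```swift
import Foundation

// Hypothetical helper: block all third-party loads except on the sites the
// user has chosen. Entries are the sites being visited (matched exactly by
// unless-domain), not the servers resources are loaded from.
func defaultDenyRules(allowedSites: [String]) throws -> Data {
    var trigger: [String: Any] = [
        "url-filter": ".*",
        "load-type": ["third-party"],
    ]
    // Only include unless-domain when there is at least one exempted site.
    if !allowedSites.isEmpty {
        trigger["unless-domain"] = allowedSites
    }
    let rule: [String: Any] = ["action": ["type": "block"], "trigger": trigger]
    return try JSONSerialization.data(withJSONObject: [rule], options: [.prettyPrinted])
}

let list = try! defaultDenyRules(allowedSites: ["www.infoq.com"])
print(String(data: list, encoding: .utf8)!)
```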

About the Author

Dr Alex Blewitt was introduced to object-oriented programming with Objective-C on a NeXTstation over twenty years ago and has been using the platform ever since. With the release of Swift showing the future of the OS X platform, Alex has written a book, Swift Essentials. In his spare time and if the weather is nice, he has been known to go flying from the local Cranfield airport.
