InfoQ

AntiSamy 1.0 Released - Protecting web applications from malicious HTML and CSS

Posted by Gavin Terrill on Dec 03, 2007 10:00 PM

Community: Architecture, .NET, Java
Topics: Security

Cross-site scripting (XSS) is a major security issue for developers who allow their users to submit content containing HTML and CSS. AntiSamy, a new OWASP project, aims to offer a comprehensive, policy-driven API that validates and sanitizes input and gives users feedback on the filtering process. The project's home page describes the API:

Technically, it is an API for ensuring user-supplied HTML/CSS is in compliance within an application's rules. Another way of saying that could be: It's an API that helps you make sure that clients don't supply malicious cargo code in the HTML they supply for their profile, comments, etc. that gets persisted on the server. The term malicious code in terms of web applications is usually regarded only as JavaScript. Cascading Stylesheets are only considered malicious when they invoke the JavaScript engine. However, there are many situations where "normal" HTML and CSS can be used in a malicious manner.

What sets this API apart, according to lead developer Arshan Dabirsiaghi, is its user-friendly approach:

The methodology of AntiSamy is unique in that it is built on a positive security model in both the format of the HTML document and the content within the document. It's also unique in that it attempts to help the user tune their input to pass validation in a cooperative spirit, rather than treating users as potential attackers which is how all contemporary security mechanisms work.

In the paper "Towards Malicious Code Detection and Removal" (PDF), Dabirsiaghi describes the phases involved in the filtering process:

  1. Pre-Processing. Use of NekoHTML to perform HTML Sanitization.
  2. Processing. Tag/CSS validation rules are applied depth-first using three processing modes: Filter, Truncate and Validate. Filtering removes tags that are not allowed but retains their content. Truncating removes forbidden tag attributes and child nodes. Validation matches tag/attribute combinations against the rules in the policy file, ensuring only valid tags are permitted.
  3. Remediation. If validation fails during processing, the policy file is consulted to determine how to handle the tag and its contents. Options include removing the tag and its content, filtering out the tag and leaving the content, and removing the attribute from the tag.
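
The processing and remediation actions described above can be illustrated with a small, self-contained sketch. This is not AntiSamy's implementation or API: the class PolicySketch, the Element node type, the TAG_ACTIONS and ALLOWED_ATTRS tables, and the policy entries are all hypothetical, and truncation is simplified here to keeping only the bare tag. It assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Toy model of policy-driven tag handling. All names here are hypothetical;
// this is not AntiSamy's implementation or API.
class PolicySketch {
    enum Action { VALIDATE, FILTER, TRUNCATE, REMOVE }

    // Minimal element node: tag name, attributes, and children (String text or Element).
    static final class Element {
        final String tag;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Object> children = new ArrayList<>();
        Element(String tag) { this.tag = tag; }
        Element attr(String name, String value) { attrs.put(name, value); return this; }
        Element add(Object child) { children.add(child); return this; }
    }

    // Assumed policy: one action per known tag; unknown tags are removed with their content.
    static final Map<String, Action> TAG_ACTIONS = Map.of(
            "div", Action.VALIDATE,
            "b", Action.VALIDATE,
            "a", Action.VALIDATE,
            "font", Action.FILTER,
            "span", Action.TRUNCATE);

    // Assumed whitelist: "tag@attribute" mapped to a pattern the whole value must match.
    static final Map<String, Pattern> ALLOWED_ATTRS = Map.of(
            "a@href", Pattern.compile("https?://[\\w./-]+"));

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String cleanChildren(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Object child : e.children) sb.append(clean(child));
        return sb.toString();
    }

    // Depth-first walk: each node is handled according to the action its tag maps to.
    static String clean(Object node) {
        if (node instanceof String) return escape((String) node);
        Element e = (Element) node;
        switch (TAG_ACTIONS.getOrDefault(e.tag, Action.REMOVE)) {
            case REMOVE:   // drop the tag and its content
                return "";
            case FILTER:   // drop the tag, keep its (cleaned) content
                return cleanChildren(e);
            case TRUNCATE: // keep the bare tag, drop attributes and children
                return "<" + e.tag + "></" + e.tag + ">";
            default:       // VALIDATE: keep the tag, drop attributes that fail the whitelist
                StringBuilder sb = new StringBuilder("<").append(e.tag);
                for (Map.Entry<String, String> a : e.attrs.entrySet()) {
                    Pattern allowed = ALLOWED_ATTRS.get(e.tag + "@" + a.getKey());
                    if (allowed != null && allowed.matcher(a.getValue()).matches()) {
                        sb.append(' ').append(a.getKey()).append("=\"")
                          .append(escape(a.getValue())).append('"');
                    }
                }
                return sb.append('>').append(cleanChildren(e))
                         .append("</").append(e.tag).append('>').toString();
        }
    }

    static String demo() {
        Element root = new Element("div")
                .add("Hi ")
                .add(new Element("b").attr("onclick", "evil()").add("bold"))
                .add(new Element("font").add(" kept "))
                .add(new Element("script").add("alert(1)"))
                .add(new Element("span").attr("style", "x").add("dropped"))
                .add(new Element("a").attr("href", "javascript:alert(1)").add("bad"))
                .add(new Element("a").attr("href", "http://example.com/").add("ok"));
        return clean(root);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch follows a positive security model: a tag or attribute survives only if the policy explicitly allows it, so the script element and the javascript: link target are dropped without the code needing to recognize them as attacks.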

The first release of AntiSamy includes a Java implementation, with .NET and PHP versions to be available soon.

Integration into a Java application is simple:

import org.owasp.validator.html.*;

Policy policy = Policy.getInstance(POLICY_FILE_LOCATION); // static factory method; the Policy constructor is private
AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(dirtyInput, policy);
MyUserDAO.storeUserProfile(cr.getCleanHTML()); // some custom function

The CleanResults class provides methods to access useful information about the filtering process:

  • getErrorMessages() - a list of String error messages
  • getCleanHTML() - the clean, safe HTML output
  • getCleanXMLDocumentFragment() - the clean, safe XMLDocumentFragment which is reflected in getCleanHTML()
  • getScanTime() - returns the scan time in seconds
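
One way an application might use these accessors to give users cooperative feedback is sketched below. The ScanOutcome interface is a hypothetical stand-in that mirrors two of the methods listed above so the example runs without the library; it is not the real CleanResults class, and the return types and message wording are assumptions.

```java
import java.util.List;

// Hypothetical sketch: ScanOutcome stands in for a scan result and mirrors two of
// the accessors listed above. It is not AntiSamy's CleanResults class, and the
// return types are assumed.
class FeedbackSketch {
    interface ScanOutcome {
        List<String> getErrorMessages();
        String getCleanHTML();
    }

    // Builds a notice for the submitting user so they can adjust their markup.
    static String userNotice(ScanOutcome outcome) {
        List<String> errors = outcome.getErrorMessages();
        if (errors.isEmpty()) {
            return "Your profile was saved.";
        }
        StringBuilder sb =
                new StringBuilder("Your profile was saved, but some markup was altered:");
        for (String error : errors) {
            sb.append("\n - ").append(error);
        }
        return sb.toString();
    }

    // Convenience factory for the stand-in, used only by this demonstration.
    static ScanOutcome outcome(String cleanHtml, List<String> errors) {
        return new ScanOutcome() {
            public List<String> getErrorMessages() { return errors; }
            public String getCleanHTML() { return cleanHtml; }
        };
    }

    public static void main(String[] args) {
        ScanOutcome result = outcome("<b>hi</b>",
                List.of("The script tag is not allowed and was removed."));
        System.out.println("Stored: " + result.getCleanHTML());
        System.out.println(userNotice(result));
    }
}
```

Only the cleaned HTML is stored; the error messages are shown to the user rather than silently discarded.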

AntiSamy is released under a BSD-style license and can be downloaded from the Google Code project page.

    problems running the code

    Dec 4, 2007 10:15 AM by Vojtech Kolencik

    Well, I wanted to try this and I haven't found the predefined policy files anywhere. Also, I think the only place to find out how the policy files should be written is the source code. Also, the Policy class must be instantiated using a static factory method, as the constructor used in this article (as well as on the project's homepage) is declared private.

    Re: problems running the code

    Dec 4, 2007 10:39 AM by Arshan Dabirsiaghi

    Vojtech, Sorry you've been having issues. You're right about the constructor, I will have to change those code snippets. Also, the policy files are linked from a page on my blog. Or you can navigate directly to the test page which contains the actual policy files. The Google Project page is quite buggy, and I haven't been able to upload anything since I uploaded the rest of the project. As soon as I can upload the policy files - I will upload them to the project. Until then, please use the URLs above. Feel free to email me (arshan.dabirsiaghi [at the] gmail.com) directly if you have any issues. Cheers, Arshan

    Why not simple CSS and HTML validator?

    Dec 4, 2007 10:58 AM by Balaji D Loganathan

    Hi, For HTML/CSS malicious code ? Why not simply use the online W3C CSS and HTML validator? Or even firefox extension like CSE HTML Validator? Thank you Regards Balaji D Loganathan

    Re: problems running the code

    Dec 4, 2007 11:00 AM by Vojtech Kolencik

    Great, it works now. Thanks for the info (and for a useful library, also).

    Re: Why not simple CSS and HTML validator?

    Dec 4, 2007 11:02 AM by Arshan Dabirsiaghi

    This is for preventing XSS and phishing attacks, not for validating the format of an HTML document or a stylesheet. Check out the project description.
