AntiSamy 1.0 Released - Protecting web applications from malicious HTML and CSS - InfoQ

Cross-Site Scripting (XSS) is a major security issue facing developers who wish to allow their users to submit content containing HTML and CSS. A new OWASP project, AntiSamy, aims to offer a comprehensive, policy-driven API that validates and sanitizes input and gives users feedback on the filtering process. The project's home page describes the API:

Technically, it is an API for ensuring user-supplied HTML/CSS is in compliance within an application's rules. Another way of saying that could be: It's an API that helps you make sure that clients don't supply malicious cargo code in the HTML they supply for their profile, comments, etc. that gets persisted on the server. The term malicious code in terms of web applications is usually regarded only as JavaScript. Cascading Stylesheets are only considered malicious when they invoke the JavaScript engine. However, there are many situations where "normal" HTML and CSS can be used in a malicious manner.

What sets this API apart, according to lead developer Arshan Dabirsiaghi, is its user-friendly approach:

The methodology of AntiSamy is unique in that it is built on a positive security model in both the format of the HTML document and the content within the document. It's also unique in that it attempts to help the user tune their input to pass validation in a cooperative spirit, rather than treating users as potential attackers which is how all contemporary security mechanisms work.

In the paper "Towards Malicious Code Detection and Removal" (PDF), Dabirsiaghi describes the phases involved in the filtering process:

Pre-Processing. Use of NekoHTML to perform HTML Sanitization.
Processing. Tag/CSS validation rules are applied depth-first using three processing modes: Filter, Truncate and Validate. Filtering removes tags that are not allowed but retains their content. Truncating removes forbidden tag attributes and child nodes. Validation matches rules in the policy file against tag/attribute combinations, ensuring only valid tags are permitted.
Remediation. If validation fails during processing, the policy file is consulted to determine how to handle the tag and its contents. Options include removing the tag and its content, filtering out the tag and leaving the content, and removing the attribute from the tag.
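
To make the three processing modes and the remediation options more concrete, here is a minimal, self-contained sketch of a depth-first walk over a toy node tree. It does not use AntiSamy or NekoHTML. The `Node` class, the `Action` enum, the `clean` method and the rule that unknown tags are filtered are all assumptions made for illustration, and the sketch skips output escaping and CSS handling entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RemediationSketch {
    // Hypothetical action names, modelled on the modes described in the article.
    enum Action { VALIDATE, FILTER, TRUNCATE, REMOVE }

    // Toy stand-in for a parsed node: an element (tag != null) or a text node.
    static final class Node {
        final String tag;   // null for a text node
        final String text;  // null for an element
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String text) { this.tag = tag; this.text = text; }

        static Node element(String tag, Node... kids) {
            Node n = new Node(tag, null);
            n.children.addAll(List.of(kids));
            return n;
        }
        static Node text(String text) { return new Node(null, text); }
        Node attr(String name, String value) { attrs.put(name, value); return this; }
    }

    // Depth-first clean-up of one node; returns the resulting markup.
    // Assumption of this sketch: tags missing from the policy are filtered.
    static String clean(Node node, Map<String, Action> tagActions, Set<String> allowedAttrs) {
        if (node.tag == null) return node.text;  // real code would escape text here
        Action action = tagActions.getOrDefault(node.tag, Action.FILTER);
        if (action == Action.REMOVE) return "";  // drop the tag and its content
        if (action == Action.TRUNCATE) {
            // Simplification: keep only the bare tag, without attributes or children.
            return "<" + node.tag + "></" + node.tag + ">";
        }
        StringBuilder inner = new StringBuilder();
        for (Node child : node.children) inner.append(clean(child, tagActions, allowedAttrs));
        if (action == Action.FILTER) return inner.toString();  // drop the tag, keep content
        // VALIDATE: keep the tag, but only attributes the policy allows.
        StringBuilder open = new StringBuilder("<" + node.tag);
        for (Map.Entry<String, String> a : node.attrs.entrySet()) {
            if (allowedAttrs.contains(a.getKey())) {
                open.append(' ').append(a.getKey()).append("=\"").append(a.getValue()).append('"');
            }
        }
        return open + ">" + inner + "</" + node.tag + ">";
    }

    // Builds a small document and cleans it with an illustrative policy.
    static String demo() {
        Node doc = Node.element("div",
            Node.element("b", Node.text("hello")).attr("onclick", "evil()").attr("title", "greeting"),
            Node.element("script", Node.text("alert(1)")),
            Node.element("blink", Node.text(" world")),
            Node.element("a", Node.text("link")).attr("href", "javascript:x"));
        Map<String, Action> tagActions = Map.of(
            "div", Action.VALIDATE, "b", Action.VALIDATE,
            "script", Action.REMOVE, "a", Action.TRUNCATE);
        return clean(doc, tagActions, Set.of("title"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // prints: <div><b title="greeting">hello</b> world<a></a></div>
    }
}
```

In this toy policy the `script` element disappears along with its content, the unlisted `blink` tag is filtered so only its text survives, the `a` element is truncated to a bare tag, and the `onclick` attribute is removed from `b` while the allowed `title` attribute is kept.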

The first release of AntiSamy includes a Java implementation, with .NET and PHP versions to follow soon.

Integration into a Java application is simple:

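import org.owasp.validator.html.*;

// Load the policy file that defines which HTML and CSS is allowed
Policy policy = new Policy(POLICY_FILE_LOCATION);
AntiSamy as = new AntiSamy();
// Scan the untrusted input against the policy
CleanResults cr = as.scan(dirtyInput, policy);
// Persist only the sanitized output
MyUserDAO.storeUserProfile(cr.getCleanHTML()); // some custom function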

The CleanResults class provides methods to access useful information about the filtering process:

getErrorMessages() - a list of String error messages
getCleanHTML() - the clean, safe HTML output
getCleanXMLDocumentFragment() - the clean, safe XMLDocumentFragment which is reflected in getCleanHTML()
getScanTime() - returns the scan time in seconds
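
As a rough illustration of the cooperative feedback described earlier, the sketch below turns a scan result into a message for the user. So that it compiles without the AntiSamy jar, it uses a hypothetical `ScanOutcome` interface that mirrors only two of the accessors listed above. The `describe` helper and the message wording are likewise assumptions, not part of the library.

```java
import java.util.List;

public class FeedbackSketch {
    // Hypothetical stand-in mirroring two accessors named in the article.
    interface ScanOutcome {
        List<String> getErrorMessages();
        String getCleanHTML();
    }

    // Builds the message shown to the user after their input has been scanned.
    static String describe(ScanOutcome outcome) {
        List<String> errors = outcome.getErrorMessages();
        if (errors.isEmpty()) {
            return "Your content was saved unchanged.";
        }
        StringBuilder sb = new StringBuilder(
            "Your content was saved, but " + errors.size() + " item(s) were altered:\n");
        for (String error : errors) {
            sb.append(" - ").append(error).append('\n');
        }
        return sb.toString();
    }
}
```

With the real library, an adapter around `CleanResults` could presumably fill this role, letting the application store `getCleanHTML()` while showing the user why their markup was changed.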

AntiSamy is released under a BSD-style license and can be downloaded from the Google Code project page.
