
Problem

You have a chunk of arbitrary data taken from user input, from a file, or from a database. You wish to include it in the HTML output generated by a Quixote handler. This means that '<', '>', and '&' characters in the data need to be turned into the HTML entities '&lt;', '&gt;', and '&amp;'.

Solution

You can do the escaping manually, putting in calls to cgi.escape() as needed:

from quixote import get_field
import cgi

...

def page():
    paragraph = get_field('para') or 'No text provided'
    return '<p>%s</p>' % cgi.escape(paragraph)
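
Note that cgi.escape() was deprecated in Python 3.2 and removed in Python 3.8. As a rough sketch of the same manual approach on a current interpreter, html.escape() with quote=False performs the same three substitutions. The render_paragraph name and its text argument are illustrative stand-ins for the handler above (so the example runs without Quixote); they are not part of Quixote.

```python
from html import escape

def render_paragraph(text):
    # Hypothetical stand-in for page(): the text is passed in directly
    # instead of being fetched with get_field().
    paragraph = text or 'No text provided'
    # quote=False escapes only '&', '<' and '>', as cgi.escape(s) did.
    return '<p>%s</p>' % escape(paragraph, quote=False)

print(render_paragraph('1 < 2 & 3 > 2'))
```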

Or, you can use the htmlescape() function to get an instance of the htmltext type; this type will take care of escaping data when it's combined with strings:

from quixote import get_field
from quixote.html import htmlescape, htmltext

def page():
    paragraph = get_field('para') or 'No text provided'
    return htmltext('<p>') + htmlescape(paragraph) + htmltext('</p>')

Discussion

Most non-Quixote applications take the manual approach: every use of data that needs escaping is wrapped in a function that performs the required substitution. The major problem is that this is error-prone: it's easy to forget the call in one location. Worse, you won't notice the error until the data actually contains an HTML tag or an ampersand. The omission opens your application to a "cross-site scripting" attack, in which an attacker inserts JavaScript into your web site and uses it to steal data or cause damage.
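
The failure mode can be sketched with the standard library alone. The two render functions below are hypothetical, written only for this illustration; html.escape() stands in for whichever escaping function the application uses.

```python
from html import escape

def render_comment_unsafe(comment):
    # The escaping call was forgotten here.
    return '<p>%s</p>' % comment

def render_comment_safe(comment):
    return '<p>%s</p>' % escape(comment, quote=False)

# With harmless input the two functions agree, so the bug goes unnoticed.
harmless = 'hello world'
same_output = render_comment_unsafe(harmless) == render_comment_safe(harmless)

# With hostile input the unsafe version emits a live script tag.
payload = '<script>alert("x")</script>'
unsafe = render_comment_unsafe(payload)
safe = render_comment_safe(payload)

print(same_output)
print(unsafe)
print(safe)
```

Only the last line of output is inert text; the unescaped version would be executed by a browser as script.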

A second problem is the opposite error: quoting too many times, so that users end up seeing &lt;p&gt;&amp;... on your pages. In a complicated application you'll have utility functions that format a specific type of object or generate some frequently used text. Should these utility functions return already-escaped data, or should escaping be the caller's responsibility? It's easy to get it wrong and run the data through html_quote() twice. This error is merely embarrassing; unlike forgetting to escape, it doesn't open any security holes.
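
A small illustration of the double-quoting mistake, again using html.escape() from the standard library in place of a Quixote helper. format_name and page_with_bug are hypothetical functions invented for this sketch.

```python
from html import escape

def format_name(name):
    # Utility function that returns already-escaped markup.
    return '<b>%s</b>' % escape(name, quote=False)

def page_with_bug(name):
    # Bug: the caller assumes format_name() returns raw data
    # and escapes its result a second time.
    return '<p>%s</p>' % escape(format_name(name), quote=False)

once = format_name('Smith & Sons')
twice = page_with_bug('Smith & Sons')

print(once)
print(twice)
```

A browser rendering the second result would show the literal text <b>Smith &amp; Sons</b> instead of a bold name.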

The htmltext type simplifies the problem: if a value is an htmltext instance, it has already been quoted and is safe to output. htmlescape() is a counterpart of cgi.escape() that returns htmltext instead of a regular Python string, so data can be passed through htmlescape() any number of times but will only be escaped once.
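
To make the "escaped once" behaviour concrete, here is a minimal sketch of an htmltext-like type built on the standard library. SafeText and escape_once are hypothetical names standing in for htmltext and htmlescape(). This is not Quixote's implementation, only an illustration of the idea: already-escaped values pass through untouched, while plain strings are escaped when they are combined with safe text.

```python
from html import escape

class SafeText(str):
    """Hypothetical htmltext-like type: instances count as already escaped."""

    def __add__(self, other):
        # safe + anything: escape the right-hand side unless it is safe too.
        return SafeText(str.__add__(self, escape_once(other)))

    def __radd__(self, other):
        # plain + safe: escape the left-hand side, keep the result safe.
        return SafeText(str.__add__(escape_once(other), self))

def escape_once(value):
    """Hypothetical htmlescape() analogue: escape unless already SafeText."""
    if isinstance(value, SafeText):
        return value
    return SafeText(escape(str(value), quote=False))

first = escape_once('a < b & c')
second = escape_once(first)          # no further escaping happens
combined = SafeText('<p>') + 'x < y' + SafeText('</p>')
prefixed = 'a & b' + SafeText('<br>')

print(first)
print(second)
print(combined)
print(prefixed)
```

Making the safe type "contagious" under concatenation is the design choice that matters here: once any operand is marked safe, every plain string joined to it is escaped automatically, so neither a missing nor a repeated call can slip through.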

If you decide to use the htmltext constructor, be sure to use it only with string literals, or with data you're very sure is safe, because passing a string to the constructor says that the string is safe to output.

You can either create htmltext instances manually, or you can use the HTML template type in PTL, which turns the second example above into the following PTL code:

def page [html] ():
    paragraph = get_field('para') or 'No text provided'
    '<p>'
    paragraph
    '</p>'

In an HTML template, string literals in the code are automatically turned into htmltext instances. Properly written HTML templates will never need to use the htmltext() constructor explicitly.

Consult the doc/PTL file in the Quixote source distribution for more details about PTL and the htmltext type.




2010-09-22 22:14