Module z_html

Utility functions for HTML processing.

Copyright © 2009-2014 Marc Worrell

Date: 2009-04-17

Authors: Marc Worrell (marc@worrell.nl).

Description

Utility functions for HTML processing. Also used for property filtering (by m_rsc_update).

Function Index

abs_links/2: Make all links (href/src) in the HTML absolute to the base URL. For now this takes a shortcut by checking all ' (src|href)=".."'.
br2nl/1: Translate any HTML <br> elements to newlines.
ensure_escaped_amp/1: Ensure that &-characters are properly escaped inside an HTML string.
escape/1: Escape a string so that it is valid within HTML/XML.
escape_check/1: Escape a string so that it is valid within HTML/XML.
escape_html_comment/2: Escape less-than and greater-than characters (for use in comments).
escape_html_text/2: Escape less-than, greater-than, single-quote and double-quote characters in texts (& is already removed or escaped).
escape_link/1: Escape a text.
escape_props/1: Escape all properties used for an update statement.
escape_props/2
escape_props_check/1: Check whether all properties are properly escaped.
escape_props_check/2
flatten_attr/1: Flatten an attribute to a binary, filtering URLs and CSS.
nl2br/1: Translate any newlines to HTML <br> elements.
noscript/1: Filter a URL, removing any "javascript:" and "data:" (as data can be text/html).
sanitize/1: Sanitize a (X)HTML string.
sanitize/2
sanitize/4: Sanitize a mochiweb parse tree.
sanitize_uri/1: Ensure that a URI is (quite) harmless by removing any script reference.
scrape_link_elements/1: Given an HTML string (list), scrape all <link> elements and return their attributes.
strip/1: Strip all HTML elements from the text.
truncate/2: Truncate a previously sanitized HTML string.
truncate/3
unescape/1: Unescape; reverses the effect of escape.

Function Details

abs_links/2

abs_links(Html, Base) -> any()

Make all links (href/src) in the HTML absolute to the base URL. For now this takes a shortcut by checking all ' (src|href)=".."'.

br2nl/1

br2nl(B) -> any()

Translate any HTML <br> elements to newlines.

ensure_escaped_amp/1

ensure_escaped_amp(B) -> any()

Ensure that &-characters are properly escaped inside an HTML string.

escape/1

escape(L::iolist()) -> binary()

Escape a string so that it is valid within HTML/XML.
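
The kind of escaping described here can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the module name escape_sketch is hypothetical, and the exact set of characters and entity forms (for example &#39; for a single quote) is an assumption.

```erlang
-module(escape_sketch).
-export([escape/1]).

%% Hypothetical stand-in for an HTML escape function: flatten the iolist
%% to a binary, then replace the five characters that are special in
%% HTML/XML text and attribute values.
-spec escape(iolist()) -> binary().
escape(IoList) ->
    escape(iolist_to_binary(IoList), <<>>).

escape(<<>>, Acc) -> Acc;
escape(<<$&, T/binary>>, Acc) -> escape(T, <<Acc/binary, "&amp;">>);
escape(<<$<, T/binary>>, Acc) -> escape(T, <<Acc/binary, "&lt;">>);
escape(<<$>, T/binary>>, Acc) -> escape(T, <<Acc/binary, "&gt;">>);
escape(<<$", T/binary>>, Acc) -> escape(T, <<Acc/binary, "&quot;">>);
escape(<<$', T/binary>>, Acc) -> escape(T, <<Acc/binary, "&#39;">>);
%% Any other byte is copied through unchanged.
escape(<<C, T/binary>>, Acc) -> escape(T, <<Acc/binary, C>>).
```

Because the input is flattened first, the sketch accepts strings, binaries, and nested mixes of both, matching the iolist() argument type in the signature above.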

escape_check/1

escape_check(L::list() | binary() | {trans, list()}) -> binary() | undefined

Escape a string so that it is valid within HTML/XML.

escape_html_comment/2

escape_html_comment(X1, Acc) -> any()

Escape less-than and greater-than characters (for use in comments).

escape_html_text/2

escape_html_text(X1, Acc) -> any()

Escape less-than, greater-than, single-quote and double-quote characters in texts (& is already removed or escaped).

escape_link/1

escape_link(Text) -> binary()

Escape a text. Expands any URLs into links with a nofollow attribute.

escape_props/1

escape_props(Props::list()) -> list()

Escape all properties used for an update statement. Only leaves the body property intact.

escape_props/2

escape_props(Props::list(), Options::list()) -> list()

escape_props_check/1

escape_props_check(Props::list()) -> list()

Check whether all properties are properly escaped.

escape_props_check/2

escape_props_check(Props::list(), Options::list()) -> list()

flatten_attr/1

flatten_attr(X1) -> any()

Flatten an attribute to a binary, filtering URLs and CSS.

nl2br/1

nl2br(B) -> any()

Translate any newlines to HTML <br> elements.
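
The newline and <br> conversions done by nl2br/1 and br2nl/1 can be illustrated with the standard re module. This is only a sketch under assumptions: the module name nl2br_sketch is hypothetical, binary input is assumed, and the exact output form (`<br />`) and the set of recognised line endings may differ from what z_html produces.

```erlang
-module(nl2br_sketch).
-export([nl2br/1, br2nl/1]).

%% Replace every line ending with a <br /> element. CRLF is listed first
%% in the alternation so that it yields one break rather than two.
-spec nl2br(binary()) -> binary().
nl2br(Bin) when is_binary(Bin) ->
    re:replace(Bin, <<"\r\n|\n|\r">>, <<"<br />">>,
               [global, {return, binary}]).

%% Replace <br>, <br/> and <br /> (in any letter case) with a newline.
-spec br2nl(binary()) -> binary().
br2nl(Bin) when is_binary(Bin) ->
    re:replace(Bin, <<"<br\\s*/?>">>, <<"\n">>,
               [global, caseless, {return, binary}]).
```

Note that the two functions are not exact inverses in this sketch: after a round trip, all line endings are normalised to a single "\n".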

noscript/1

noscript(Url) -> any()

Filter a URL, removing any "javascript:" and "data:" (as data can be text/html).
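
One way to approach this kind of URL filtering is sketched below. It is an illustration only: the module name noscript_sketch is hypothetical, and returning an empty binary for a rejected URL is an assumption of the sketch, not necessarily what z_html:noscript/1 returns.

```erlang
-module(noscript_sketch).
-export([noscript/1]).

%% Reject URLs whose scheme is "javascript:" or "data:".
%% Whitespace and control characters are dropped and the result is
%% lowercased before the check, so that obfuscated schemes such as
%% "Java\tScript:" are still recognised.
-spec noscript(binary()) -> binary().
noscript(Url) when is_binary(Url) ->
    Normalized = string:lowercase(<< <<C>> || <<C>> <= Url, C > 32 >>),
    case Normalized of
        <<"javascript:", _/binary>> -> <<>>;   % assumption: rejected URLs become empty
        <<"data:", _/binary>> -> <<>>;
        _ -> Url                               % harmless URLs pass through unchanged
    end.
```

The check runs on a normalised copy while the original URL is what gets returned, so accepted URLs keep their original case and spacing.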

sanitize/1

sanitize(Html::binary()) -> binary()

Sanitize a (X)HTML string. Remove elements and attributes that might be harmful.

sanitize/2

sanitize(Html, Options) -> any()

sanitize/4

sanitize(ParseTree::mochiweb_html:html_node(), ExtraElts::binary() | list(), ExtraAttrs::binary() | list(), Options::any()) -> any()

Sanitize a mochiweb parse tree. Remove harmful elements and attributes.

sanitize_uri/1

sanitize_uri(Uri) -> any()

Ensure that a URI is (quite) harmless by removing any script reference.

scrape_link_elements/1

scrape_link_elements(Html::string()) -> [LinkAttributes]

Given an HTML string (list), scrape all <link> elements and return their attributes. Attribute names are lowercased.

strip/1

strip(Html::iolist()) -> iolist()

Strip all HTML elements from the text. Simple parsing is applied to find the elements. Does not escape the end result.
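
The "simple parsing" idea can be sketched as a two-state scan over the input: copy bytes while in text, skip bytes while inside a tag. The module name strip_sketch is hypothetical, and this simplified version does not handle a ">" inside a quoted attribute value, comments, or other cases the real function may cover.

```erlang
-module(strip_sketch).
-export([strip/1]).

%% Remove everything between "<" and ">" and keep the remaining text.
%% Entities such as &amp; are left as they are; the result is neither
%% escaped nor unescaped.
-spec strip(iolist()) -> binary().
strip(Html) ->
    strip(iolist_to_binary(Html), in_text, <<>>).

strip(<<>>, _State, Acc) -> Acc;
%% A "<" in text opens a tag; a ">" in a tag closes it.
strip(<<$<, T/binary>>, in_text, Acc) -> strip(T, in_tag, Acc);
strip(<<$>, T/binary>>, in_tag, Acc) -> strip(T, in_text, Acc);
%% Bytes inside a tag are dropped, bytes in text are kept.
strip(<<_, T/binary>>, in_tag, Acc) -> strip(T, in_tag, Acc);
strip(<<C, T/binary>>, in_text, Acc) -> strip(T, in_text, <<Acc/binary, C>>).
```

Since the output is not escaped, it should be escaped again before being placed back into an HTML page.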

truncate/2

truncate(Html, Length) -> any()

Truncate a previously sanitized HTML string.

truncate/3

truncate(Html, Length, Append) -> any()

unescape/1

unescape(L::iolist()) -> binary()

Unescape; reverses the effect of escape.


Generated by EDoc