Welcome to uTidylib’s documentation!

The Tidy wrapper, and the main interface to TidyLib. This package supports processing HTML with Tidy, using all the options that the tidy command-line tool supports.

For more information on the tidy options, see the reference. These options can be given as keyword arguments to parse and parseString by changing dashes (-) to underscores (_).

For example:

>>> from __future__ import print_function
>>> import tidy
>>> print(tidy.parseString(
...     '<Html>Hello Tidy!',
...     output_xhtml=1, add_xml_decl=1, indent=1, tidy_mark=0,
...     doctype='transitional', char_encoding='ascii'
... ))
<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title></title>
  </head>
  <body>
    Hello Tidy!
  </body>
</html>
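When options are already written in command-line form, the dash-to-underscore rule can be applied mechanically. The helper below, option_to_kwarg, is hypothetical and not part of uTidylib; it is a minimal sketch that only performs the renaming and does not check whether an option actually exists.

```python
def option_to_kwarg(option):
    """Turn a tidy command-line option name into a Python keyword name.

    Hypothetical helper, not part of uTidylib: it only renames.
    """
    # Dashes are not valid in Python identifiers, so they become underscores.
    return option.replace("-", "_")


# Options as they would be spelled on the tidy command line.
cli_options = {"output-xhtml": 1, "add-xml-decl": 1, "tidy-mark": 0}

# Rename every option so the dict can be unpacked as keyword arguments.
kwargs = {option_to_kwarg(name): value for name, value in cli_options.items()}
print(kwargs)
```

The resulting dictionary could then be passed on as tidy.parseString(text, **kwargs).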

For options like newline and output_encoding, which must be set to one of a fixed number of choices, you can provide either the numeric or the string version of the choice; so both tidy.parseString('<HTML>foo</html>', newline=2) and tidy.parseString('<HTML>foo</html>', newline='CR') do the same thing.
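The equivalence between numeric and string choices amounts to looking up a name's position in an ordered list of choices. The sketch below assumes the ordering of TidyLib's line-ending enumeration (0 = LF, 1 = CRLF, 2 = CR), which agrees with newline=2 meaning 'CR'. normalize_choice and NEWLINE_CHOICES are hypothetical names, and this is not how uTidylib is claimed to implement the conversion internally.

```python
# Assumed ordering, following TidyLib's line-ending enumeration:
# index 0 = LF, 1 = CRLF, 2 = CR.
NEWLINE_CHOICES = ("LF", "CRLF", "CR")


def normalize_choice(value, choices=NEWLINE_CHOICES):
    """Return the numeric index for a choice given as an int or a string.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, int):
        # Numeric form: accept it as long as it names a valid choice.
        if not 0 <= value < len(choices):
            raise ValueError("choice index out of range: %r" % (value,))
        return value
    try:
        # String form: its position in the tuple is the numeric value.
        return choices.index(value)
    except ValueError:
        raise ValueError("unknown choice: %r" % (value,)) from None


# Both spellings resolve to the same numeric value.
print(normalize_choice(2), normalize_choice("CR"))
```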

There are no plans to support other features of TidyLib, such as document-tree traversal, since Python has several quality DOM implementations. (The author uses Twisted’s implementation, twisted.web.microdom).

tidy.parse(filename, **kwargs)
Parameters:
  • filename – the name of the file to process.
  • kwargs – named options to pass to TidyLib for processing the input file.
Returns:

a Document object

Open and process filename as an HTML file, returning a processed document object.

tidy.parseString(text, **kwargs)
Parameters:
  • text – the string to parse.
  • kwargs – named options to pass to TidyLib for processing the input text.
Returns:

a Document object

Process text as HTML, returning a Document object.

class tidy.Document

Document object as returned by parseString() or parse().

errors

List of errors, as ReportItem objects.

get_errors()

Returns the list of errors as ReportItem objects.

write(stream)
Parameters: stream – writable file-like object.

Writes the document to stream.

class tidy.ReportItem(err)

Error report item as returned by tidy.

Attribute severity: W, E or C, indicating the severity
Attribute line: line where the error was fired (can be None)
Attribute col: column where the error was fired (can be None)
Attribute message: the error message itself
Attribute err: the whole error message as returned by tidy

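
To make the relationship between err and the other attributes concrete, here is a minimal sketch that splits a raw tidy message into severity, line, col and message. ReportItemSketch is a hypothetical stand-in, not uTidylib's ReportItem implementation; it assumes positional messages of the form "line N column M - Kind: text" and takes the severity as the first letter of Kind. The non-positional message used in the usage note is likewise a made-up example.

```python
import re

# Assumed shape of positional tidy messages, e.g.
#   "line 1 column 1 - Warning: missing <!DOCTYPE> declaration"
_POSITIONAL = re.compile(r"^line (\d+) column (\d+) - (\w+): (.*)$")


class ReportItemSketch:
    """Hypothetical stand-in mirroring the ReportItem attributes."""

    def __init__(self, err):
        self.err = err  # whole message, kept verbatim
        match = _POSITIONAL.match(err)
        if match:
            self.line = int(match.group(1))
            self.col = int(match.group(2))
            kind, self.message = match.group(3), match.group(4)
        else:
            # No position information: split "Kind: text" once.
            self.line = self.col = None
            kind, _, self.message = err.partition(": ")
        # Severity is the first letter of the kind (W, E, C, ...).
        self.severity = kind[:1]


item = ReportItemSketch("line 1 column 1 - Warning: missing <!DOCTYPE> declaration")
print(item.severity, item.line, item.col, item.message)
```

With a list of such items, only the errors can be kept with [i for i in items if i.severity == "E"], and an item like ReportItemSketch("Config: unknown option: foo") ends up with line and col set to None.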
exception tidy.TidyLibError

Generic Tidy exception.

exception tidy.InvalidOptionError

Exception for invalid option.

exception tidy.OptionArgError

Exception for invalid parameter.

Installing

To use uTidylib, you need to have the HTML Tidy library installed. Check <http://www.html-tidy.org/> for instructions on how to obtain it.
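Before installing uTidylib, it can help to check whether the system can locate a shared library named tidy at all. The sketch below uses the standard library's ctypes.util.find_library; the bare name "tidy" is an assumption (matching files such as libtidy.so or libtidy.dylib). This is only a rough heuristic and says nothing about how uTidylib itself searches for the library.

```python
import ctypes.util


def find_tidy_library():
    """Return what the platform reports for a library named "tidy", or None.

    find_library takes the bare name, without a "lib" prefix or extension;
    "tidy" is an assumed name and may differ on some systems.
    """
    return ctypes.util.find_library("tidy")


location = find_tidy_library()
if location is None:
    print("libtidy not found; install HTML Tidy first")
else:
    print("found libtidy:", location)
```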

Contributing

You are welcome to contribute on GitHub at <https://github.com/nijel/utidylib>, which we use for source code management, issue tracking and patch submission.

Running testsuite

The testsuite can be executed using either py.test or setuptools; choose whichever approach you prefer:

./setup.py test
py.test tidy

Building documentation

To build the documentation, run:

make -C docs html

This requires that you have Sphinx installed.

The API documentation will be built in the docs/_build/html/ directory.

License

The MIT License

Copyright (c) 2003 Cory Dodt <corydodt@twistedmatrix.com>

Copyright (c) 2014-2016 Michal Čihař <michal@cihar.com>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
