Abstract

Vilistextum is an HTML to ASCII converter written specifically to produce ASCII text suitable for reading.

Some features:

INSTALL:

make or gmake

It should compile on any platform with a decent gcc.

DOWNLOAD:

vilistextum-v2.3.1.tar.gz
vilistextum-v2.3.1.tar.bz2
vilistextum_v2.3.0.tar.gz
vilistextum_v2.22.tar.gz

USAGE:

vilistextum [OPTIONS] [inputfile|-] [outputfile|-]

inputfile, outputfile, -
Use '-' in place of inputfile to read from standard input, and '-' in place of outputfile to write to standard output.
--version
Reports version number and release date.
-h,--help
Prints a list of the command line options.
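
The usage line above can be driven from a script. The following is a minimal sketch, assuming a vilistextum binary is available on the PATH; the helper names build_argv and convert are hypothetical, and only the -w and -l options documented in this section are used.

```python
import subprocess

def build_argv(infile="-", outfile="-", width=None, links=False):
    """Build an argument list of the form: vilistextum [OPTIONS] infile outfile."""
    argv = ["vilistextum"]
    if width is not None:
        argv += ["-w", str(width)]   # -w, --width number
    if links:
        argv.append("-l")            # -l, --links
    argv += [infile, outfile]        # '-' selects stdin / stdout
    return argv

def convert(html_bytes, **options):
    """Pipe HTML through vilistextum using '-' for both files.

    Assumes the vilistextum binary is installed; raises CalledProcessError
    if the program exits with a non-zero status.
    """
    result = subprocess.run(build_argv("-", "-", **options),
                            input=html_bytes, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

For example, convert(b"<B>Hi</B>", width=60) would run "vilistextum -w 60 - -" and return whatever the program writes to standard output.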

-c, --convert-tags
Some formatting tags are converted to special characters.
E.g.: "<B>Bold</B> isn't <I>italic</I> isn't <U>underlined</U> isn't <EM>emphasized</EM> but is like <STRONG>strong</STRONG>."
will be output as "*Bold* isn't /italic/ isn't _underlined_ isn't /emphasized/ but is like *strong*."
-p, --palm
Outputs text more suitable for reading on a PDA.
Palm text readers do their own word wrapping, so the width is set to infinity and the program doesn't right-justify or center the text.
-w, --width number
The width of the output text.
Default: 72.
-m, --nomicrosoft
The Windows-1252 entities in the range &#128 - &#159, and their named forms, will not be converted.
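
The tag-to-character mapping shown for --convert-tags can be sketched in a few lines of Python. This is an illustration of the mapping only, not vilistextum's actual code; the names MARKERS, TAG_RE and convert_tags are hypothetical, and the sketch handles only attribute-free tags.

```python
import re

# Marker character per tag, taken from the --convert-tags example above.
MARKERS = {"b": "*", "strong": "*", "i": "/", "em": "/", "u": "_"}

# Matches an opening or closing B, STRONG, I, EM or U tag without attributes.
TAG_RE = re.compile(r"</?(b|strong|i|em|u)>", re.IGNORECASE)

def convert_tags(html):
    """Replace each matched tag with its marker character."""
    return TAG_RE.sub(lambda m: MARKERS[m.group(1).lower()], html)

sample = ("<B>Bold</B> isn't <I>italic</I> isn't <U>underlined</U> isn't "
          "<EM>emphasized</EM> but is like <STRONG>strong</STRONG>.")
print(convert_tags(sample))
# prints: *Bold* isn't /italic/ isn't _underlined_ isn't /emphasized/ but is like *strong*.
```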

-i, --defimage string
IMG tags without alt attribute are output as [string].
Default: Image.
-r, --remove-empty-alt
If an IMG tag has an empty ALT attribute (e.g. <IMG src="..." alt="">), don't output '[]'.
-s, --shrinklines
Runs of more than two consecutive newlines are reduced to two, so there is at most one completely empty line in a row.
-l, --links
Numbers the links in the document and prints the corresponding addresses at the end of the file. Similar to 'lynx -dump'. Note: Relative URIs are not resolved and won't be printed.
-e, --errorlevel number
Sets the level of verbosity for error messages.
 0: No error messages
 1: Show unrecognized entities
 2: Show unknown tags
>2: Mostly debugging information

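The behaviour described for --shrinklines and --links can be approximated as follows. This is a rough sketch and not vilistextum's implementation: the function names are hypothetical, the "[n]" marker and footer layout are assumptions modelled loosely on 'lynx -dump', and leaving relative links unnumbered is one possible reading of the note above.

```python
import re
from urllib.parse import urlparse

def shrink_lines(text):
    """Collapse runs of three or more newlines into two (one empty line)."""
    return re.sub(r"\n{3,}", "\n\n", text)

# Simplified anchor pattern: only <a href="..."> with a double-quoted href.
A_RE = re.compile(r'<a\s+href="([^"]*)"\s*>(.*?)</a>', re.IGNORECASE | re.DOTALL)

def number_links(html):
    """Number absolute links in place and list their addresses at the end."""
    refs = []

    def repl(match):
        url, label = match.group(1), match.group(2)
        if not urlparse(url).scheme:      # relative URI: not resolved, not listed
            return label
        refs.append(url)
        return f"{label}[{len(refs)}]"    # assumed marker format

    body = A_RE.sub(repl, html)
    footer = "".join(f"[{n}] {url}\n" for n, url in enumerate(refs, 1))
    return body + ("\n\n" + footer if refs else "")

print(shrink_lines("a\n\n\n\nb"))
print(number_links('See <a href="http://example.org/">this</a> and '
                   '<a href="local.html">that</a>.'))
```
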
BUGS and similar features:

The handling of OL is broken: the program treats it as UL, and more than 6 levels of nested lists confuse it.
Text is never justified.

Bug reports or comments:

You can send your comments or bug reports to this address. If you've discovered a bug, please give the link or attach a copy of the HTML file that caused that particular bug.
Patric Müller
Last modified: Thu May 3 23:06:07 CEST 2001