Abstract
Vilistextum is an HTML to ASCII converter specifically programmed to get the best out of incorrect HTML.
Some features:
- small and fast
- creates footnotes for links
- can swallow multiple empty lines
- removes empty ALT attributes
- converts characters and entities between 128 and 159 from the Windows-1252 charset to meaningful strings in ISO 8859-1, e.g. 0x93 is converted to '"'.
- output can be optimized for ebook reading
- GUI-frontend using kaptain
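The Windows-1252 conversion listed above can be illustrated with a short Python sketch. This is not vilistextum's own code, and the function name and most of the replacement strings are assumptions chosen for illustration; only the mapping of 0x93 to '"' is taken from the feature list.

```python
# Illustrative table: a few Windows-1252 bytes in the 128-159 range mapped
# to strings that exist in ISO 8859-1. Only 0x93 -> '"' comes from the
# feature list; the other replacements are assumptions.
CP1252_TO_LATIN1 = {
    0x85: "...",  # horizontal ellipsis
    0x91: "'",    # left single quotation mark
    0x92: "'",    # right single quotation mark
    0x93: '"',    # left double quotation mark
    0x94: '"',    # right double quotation mark
    0x96: "-",    # en dash
    0x97: "--",   # em dash
}


def replace_cp1252(data: bytes) -> str:
    """Decode bytes as ISO 8859-1, replacing mapped bytes in 128-159."""
    out = []
    for byte in data:
        if 0x80 <= byte <= 0x9F and byte in CP1252_TO_LATIN1:
            out.append(CP1252_TO_LATIN1[byte])
        else:
            # ISO 8859-1 maps every byte value to the same code point.
            out.append(chr(byte))
    return "".join(out)


print(replace_cp1252(b"\x93quoted\x94"))
```

Bytes in the range that are missing from the table pass through unchanged in this sketch; a complete table would need an entry for each code point that Windows-1252 defines there.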
REQUIREMENTS:
For the main program a decent gcc installation suffices.
If you want to use the GUI-frontend, you need to have kaptain installed.
INSTALL:
./configure
make
make install (as root)
DOWNLOAD:
vilistextum-2.4.1.tar.gz
vilistextum-2.4.1.tar.bz2
USAGE:
vilistextum [OPTIONS] [inputfile|-] [outputfile|-]
This is the command line program.
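Because '-' may stand in for both file arguments, the program can be used as a filter from a script. The following Python sketch assumes that vilistextum is installed and on the PATH; the helper names build_command and html_to_text are hypothetical, and Latin-1 is used for the pipe only because other character sets are listed as not fully supported.

```python
import shutil
import subprocess


def build_command(options=()):
    """Return an argv list that reads HTML from stdin and writes text to stdout."""
    # The two '-' arguments stand for inputfile and outputfile.
    return ["vilistextum", *options, "-", "-"]


def html_to_text(html, options=()):
    """Pipe an HTML string through vilistextum and return the converted text."""
    result = subprocess.run(
        build_command(options),
        input=html.encode("latin-1", errors="replace"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("latin-1")


# Only call the program when it is actually installed.
if shutil.which("vilistextum"):
    print(html_to_text("<p>Hello, <b>world</b>!</p>"))
else:
    print("vilistextum not found on PATH; command would be:", build_command())
```

Options from the list of command line arguments can be passed through, for example html_to_text(page, ["-w", "60", "-l"]).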
kilistextum
GUI-frontend using kaptain. Its usage should be obvious, even if you haven't read this manual.
Start it with "kilistextum". The makefile tries to guess where kaptain resides. If it fails, you can add something like "#!/pathto/kaptain" as the first line of the script, or start it with "kaptain kilistextum".
Command line arguments
- inputfile|-, outputfile|-
- Use '-' in place of inputfile to read from standard input, and in place of outputfile to write to standard output.
- --version
- Reports version number and release date.
- -h,--help
- Prints a list of the command line options.
- -c, --convert-tags
- Some tags are converted to special characters.
E.g. "<B>Bold</B> isn't <I>italic</I> isn't <U>underlined</U> isn't <EM>emphasized</EM> but is like <STRONG>strong</STRONG>."
will be output as "*Bold* isn't /italic/ isn't _underlined_ isn't /emphasized/ but is like *strong*."
- -p, --palm
- Outputs text more suitable for reading on a PDA.
Palm text readers do their own word wrapping, so the width is set to infinity and the program doesn't right-justify or center the text.
- -w, --width number
- The width of the output text.
Default: 72.
- -m, --nomicrosoft
- The Windows-1252 entities in the range 128 to 159 (€ through Ÿ) and their proper names are not converted.
- -i, --defimage string
- IMG tags without alt attribute are output as [string].
Default: Image.
- -r, --remove-empty-alt
- If an IMG tag has an empty ALT attribute (e.g. <IMG src="..." alt="">), don't output '[]'.
- -s, --shrink-lines [number]
- If there are more than number consecutive empty lines, output only number of them.
Default: 1.
- -l, --links
- Numbers the links in the document and creates a footnote for each link at the end of the file, similar to 'lynx -dump'. Note: Relative URIs are not resolved and won't be printed.
- -e, --errorlevel number
- Increase level of verbosity for error messages.
0: No error messages
1: Show unrecognized entities
2: Show unknown tags
>2: Mostly debugging information
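The effect of -c, --convert-tags described above can be imitated in a few lines of Python. This is only a sketch of the visible behaviour, not the program's actual parser: the marker characters are read off the example given for -c, the names TAG_MARKERS and convert_tags are hypothetical, and dropping every other tag is a simplification.

```python
import re

# Marker characters as they appear in the -c example: B and STRONG become
# '*', I and EM become '/', U becomes '_'.
TAG_MARKERS = {"b": "*", "strong": "*", "i": "/", "em": "/", "u": "_"}

# Matches an opening or closing tag and captures the tag name.
_TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def convert_tags(html: str) -> str:
    """Replace B/STRONG, I/EM and U tags with markers; drop all other tags."""
    def marker(match):
        # Unknown tags map to the empty string (a simplification).
        return TAG_MARKERS.get(match.group(1).lower(), "")
    return _TAG_RE.sub(marker, html)


print(convert_tags("<B>Bold</B> isn't <I>italic</I>."))
# prints: *Bold* isn't /italic/.
```

Opening and closing tags map to the same marker, which is why the sketch does not need to track nesting.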
BUGS and similar features:
The parsing of tables is not very good.
Character sets other than latin1 are not yet fully supported.
The handling of OL is broken. The program treats it as UL and more than 6 nested lists confuse it.
Text is never justified.
How to read HTML mail with gnus or mutt using vilistextum
If you want to use vilistextum to convert HTML mail to ASCII automatically, read HTMLMAIL.
Bug reports or comments:
You can send your comments or bug reports in English or German to this address. If you've discovered a bug, please give the link to the HTML file that caused it, or attach a copy of the file.
Patric Müller
Last modified: Mon Sep 3 18:42:59 CEST 2001