Welcome to Web Indexer. This program indexes your website, or a group of web pages, and produces an HTML document that lists the HTML files with a description of each one. You can tell the program to ignore a single file by making the description of that HTML page "IGNOREINDEX", or to ignore ALL files in a directory by putting a file called ".ignore_index" in that directory.
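The directory rule can be pictured with a short Python sketch. This is only an illustration, not the code in web_index.pl or Recurse.pm: the function name indexable_html_files is hypothetical, matching on .html/.htm extensions is an assumption, and so is the choice to skip the subdirectories of an ignored directory (the README only says the files in that directory are ignored).

```python
import os


def indexable_html_files(root):
    """Yield HTML files under `root`, skipping any directory that
    contains a ".ignore_index" file.

    Assumption: subdirectories of an ignored directory are skipped too.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if ".ignore_index" in filenames:
            dirnames[:] = []      # do not descend any further
            continue
        dirnames.sort()           # visit subdirectories in a stable order
        for name in sorted(filenames):
            # Assumption: HTML pages are recognised by extension
            if name.lower().endswith((".html", ".htm")):
                yield os.path.join(dirpath, name)
```

Calling list(indexable_html_files("/usr/home/dion/www")) would then give the candidate pages before any per-file IGNOREINDEX check is applied.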
Contents of Web Indexer
README : file explaining the usage of the web indexer
web_index.pl : The Indexer program
web_index.conf : A sample default configuration for the program
Recurse.pm : Module to process files recursively through dirs
images/ : a directory holding graphics needed for the graphics version
web_index2.0.zip or web_index2.0.tar.gz : The full package (Save this link to get the file... there is no ftp available)
How does it get the description?
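You tell the program which method to use to look for the description in each HTML document. Each of the three current methods requires you to add one of the following pieces of HTML code to your HTML page (preferably in the <HEAD> area), where [description] is a description of the page (of course):
<WINDEX "[description]">
<META NAME="description" CONTENT="[description]">
<TITLE>[description]</TITLE>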
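As a rough illustration of the lookup (a Python sketch, not the code in web_index.pl), the three methods can be written as regular expressions over the page text. The names get_description and is_ignored, the patterns themselves, and the case-insensitive matching are assumptions. The fallback from "m" to "t" when no method is chosen follows the GET_DESCRIPTION_FROM setting described in the configuration section.

```python
import re

# Hypothetical patterns for the three documented methods; the real
# web_index.pl may match these tags differently.
PATTERNS = {
    "w": re.compile(r'<WINDEX\s+"([^"]*)"\s*>', re.IGNORECASE),
    "m": re.compile(
        r'<META\s+NAME="description"\s+CONTENT="([^"]*)"\s*/?>',
        re.IGNORECASE,
    ),
    "t": re.compile(r"<TITLE>(.*?)</TITLE>", re.IGNORECASE | re.DOTALL),
}


def get_description(html, method=""):
    """Return the description found by `method` ("w", "m" or "t"), or None.

    An empty method tries "m" first and falls back to "t".
    """
    order = [method] if method else ["m", "t"]
    for key in order:
        match = PATTERNS[key].search(html)
        if match:
            return match.group(1).strip()
    return None


def is_ignored(description):
    """A page whose description is exactly IGNOREINDEX is left out."""
    return description == "IGNOREINDEX"
```

A page carrying both a TITLE and a META description would yield the META text when no method is given, and the TITLE text only when "t" is requested or the META tag is missing.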
Configuration of the program
There are two ways to run this program: using command line arguments (e.g. web_index.pl -d /usr/home/dion ... ) or via the configuration file (e.g. see web_index.conf).
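To make the configuration-file route more concrete, here is a minimal Python sketch of reading a file in the style of web_index.conf. It assumes one "NAME: value" setting per line and multi-line HTML_HEADER / HTML_FOOTER blocks closed by END_HTML_HEADER / END_HTML_FOOTER, matching the variables listed further down. The function name read_config and the exact parsing rules are illustrative and may not match how web_index.pl reads the file.

```python
def read_config(text):
    """Parse web_index.conf-style text into a dict (illustrative only)."""
    settings = {}
    block_name = None   # set while inside HTML_HEADER / HTML_FOOTER
    block_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if block_name is not None:
            if stripped == "END_" + block_name:
                # Closing marker: store the collected HTML as one string
                settings[block_name] = "\n".join(block_lines)
                block_name, block_lines = None, []
            else:
                block_lines.append(line)
        elif stripped in ("HTML_HEADER", "HTML_FOOTER"):
            block_name = stripped
        elif ":" in stripped:
            # Split on the first colon only, so values may contain colons
            name, value = stripped.split(":", 1)
            settings[name.strip()] = value.strip()
    return settings
```

With this shape, read_config(open("web_index.conf").read())["ROOT_URL"] would return the configured base URL as a string.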
COMMAND LINE ARGS:
-----------------
h        = Show the usage help
w        = Get the description via <WINDEX "description here">
m        = Get the description via <META NAME="description" CONTENT="description here">
t        = Get the description via <TITLE>Grab this part</TITLE>
i        = If the description = IGNOREINDEX then ignore the file
I        = If a ".ignore_index" file is in a directory, ignore all files in the directory and move to the next
T        = Text output ONLY (not using the graphics)
c        = Read configuration from web_index.conf
d [dir]  = Start indexing from [dir]
u [url]  = Set the base URL to http://[url]
C [file] = Read configuration from [file]
o [file] = HTML filename to output to

EXAMPLE: To set up an index of my web pages (starting at dion) I would run

% web_index.pl -iI -d /usr/home/dion/www -u /dion

which would produce HTML: /dion/web_index.html

USING THE CONFIGURATION FILE (Default: web_index.conf)
------------------------------------------------------
If you wish to use the default web_index.conf then you would call

% web_index.pl -c

If you want to specify a different filename then use

% web_index.pl -C /path/to/file.conf

The config file... and the different variables you can set

1. -> ROOT_INDEX_DIRECTORY: /usr/home/dion/www
   Set the directory for the indexer to start looking through to compile its Site Index.

2. -> ROOT_URL: /dion
   Set the base URL for the index (basically the URL which points to the ROOT_INDEX_DIRECTORY).

3. -> IMAGES_URL: /images
   Set the relative URL where the images are stored (e.g. if you have your images at /images you would have the above setting).

4. -> OUTPUT_FILE: /usr/home/dion/www/windex.html
   Set the HTML doc which will have the Index in it.

5. -> GET_DESCRIPTION_FROM: w or m or t
   Here you select the method for the program to get the [description]:
   w = description via <WINDEX "[description]">
   m = description via <META NAME="description" CONTENT="[description]">
   t = description via <TITLE>[description]</TITLE>
   If you leave it blank it will try to use "m", and if it doesn't get a match it will try "t".

6. -> IGNORE_DIRECTORIES: yes
   If you have "yes" there, then if you make a file ".ignore_index" in a directory the program will ignore all files in it and move to the next.

7. -> IGNORE_FILES: yes
   If you have "yes" there, then if you make a description "IGNOREINDEX" (e.g. if you are using the "m" method) that file will be ignored.

8. -> TEXT_OUTPUT_ONLY: yes
   If you have "yes" there then it will not print out any of the nice images that are in the "images/" directory. Personally I like the images :)

9. -> HTML_HEADER [put html here] END_HTML_HEADER
   All the HTML in between the two tags HTML_HEADER and END_HTML_HEADER will be printed at the top of the HTML output file (OUTPUT_FILE).

10. -> HTML_FOOTER [put html here] END_HTML_FOOTER
   All the HTML in between the two tags HTML_FOOTER and END_HTML_FOOTER will be printed at the bottom of the HTML output file (OUTPUT_FILE).

Configuration of web_search.cgi
Now to set up the Web Search part of the package. It is also simple.
1. Place the program in a place where http:// can get to it, e.g. in /cgi-bin or in your web directory.
2. Now make sure the web_index.pl has its FORM ACTION pointing to the cgi