Introduction

    Welcome to Web Indexer. This program indexes your website, or a group of web pages, and produces an HTML document that lists the HTML files with a description of each. You can tell the program to ignore a single file by making the description of that HTML page "IGNOREINDEX", or to ignore ALL files in a directory by putting a file called ".ignore_index" in that directory.

    Here is an example... using my pages

    Contents of Web Indexer

    README : file explaining the usage of the web indexer
    web_index.pl : The Indexer program
    web_index.conf : A sample default configuration for the program
    Recurse.pm : Module to process files recursively through dirs
    images/ : a directory holding graphics needed for the graphics version
    web_index2.0.zip or web_index2.0.tar.gz : The full package (Save this link to get the file... there is no ftp available)

    How does it get the description?

    You tell the program which way to look for the description in each HTML document. The three current methods require you to add one of the following HTML tags to your HTML page (preferably in the <HEAD> area), where [description] is a description of the page.

    1. <WINDEX "[description]">
    2. <META NAME="description" CONTENT="[description]">
    3. <TITLE>[description]</TITLE>
    4. Smart checking: look for 2; if it isn't found, look for 3.
      1. = used only by this web indexer (#2 is preferable)
      2. = HTML 3.0 compliant tag, used not only by this program but also by other search engines/web spiders
      3. = the standard HTML tag
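
    As an illustration only, here is a minimal Python sketch of how the four lookup methods above could work. It is not the code from web_index.pl (which is a Perl program); the function name get_description and the regular expressions are assumptions made for this example.

```python
import re

# Hypothetical patterns mirroring the three tags described above.
PATTERNS = {
    "w": re.compile(r'<WINDEX\s+"([^"]*)"\s*>', re.IGNORECASE),
    "m": re.compile(
        r'<META\s+NAME="description"\s+CONTENT="([^"]*)"\s*>', re.IGNORECASE),
    "t": re.compile(r"<TITLE>(.*?)</TITLE>", re.IGNORECASE | re.DOTALL),
}


def get_description(html, method=""):
    """Return the description found by the given method, or None.

    method is "w", "m" or "t". An empty method means smart checking:
    try the META tag first and fall back to the TITLE tag.
    """
    methods = [method] if method else ["m", "t"]
    for name in methods:
        match = PATTERNS[name].search(html)
        if match:
            return match.group(1).strip()
    return None


page = """<HTML><HEAD>
<TITLE>Dion's Home Page</TITLE>
</HEAD><BODY>Hello</BODY></HTML>"""

print(get_description(page, "t"))  # the TITLE text
print(get_description(page, "m"))  # None: this page has no META description
print(get_description(page))       # smart checking falls back to the TITLE
```

    With smart checking, a page that has only a TITLE still gets a description, while a page with both tags is described by its META tag.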

    Configuration of the program

    There are two ways to run this program: with command line arguments (e.g. web_index.pl -d /usr/home/dion ... ) or via the configuration file (e.g. see web_index.conf).

        COMMAND LINE ARGS:
        -----------------
        h          =  Show the usage help
        w          =  Get the description via <WINDEX "description here">
        m          =  Get the description via
                      <META NAME="description" CONTENT="description here">
        t          =  Get the description via <TITLE>Grab this part</TITLE>
        i          =  If the description = IGNOREINDEX then ignore the file
        I          =  If a ".ignore_index" file is in a directory, ignore
                      all files in the directory and move to the next
        T          =  Text output ONLY (not using the graphics)
        c          =  Read configuration from web_index.conf
        d [dir]    =  Start indexing from [dir]
        u [url]    =  Set the base URL to http://[url]
        C [file]   =  Read configuration from [file]
        o [file]   =  HTML filename to output to
    
        EXAMPLE: To set up an index of my web pages (starting at dion) I would run
        % web_index.pl -iI -d /usr/home/dion/www -u /dion
    
        which would produce HTML: /dion/web_index.html
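
        To make the -i, -I, -d and -u options more concrete, the following Python sketch walks a directory tree, skips directories that contain ".ignore_index", skips pages whose description is "IGNOREINDEX", and maps each file to a URL under the base URL. This is not how web_index.pl is written; collect_pages, read_title and demo are hypothetical names, only the TITLE method is used to keep it short, and it assumes that ".ignore_index" affects only the directory it sits in and that pages end in .html or .htm.

```python
import os
import re
import tempfile


def read_title(path):
    """Stand-in description lookup: the text of the <TITLE> tag, or ""."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        match = re.search(r"<TITLE>(.*?)</TITLE>", handle.read(), re.I | re.S)
    return match.group(1).strip() if match else ""


def collect_pages(root_dir, root_url, describe=read_title,
                  ignore_dirs=True, ignore_files=True):
    """Return a list of (url, description) for the HTML files under root_dir."""
    pages = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # visit subdirectories in a stable order
        # -I: a ".ignore_index" file hides every file in this directory.
        if ignore_dirs and ".ignore_index" in filenames:
            continue
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            description = describe(path)
            # -i: a description of IGNOREINDEX hides this one file.
            if ignore_files and description == "IGNOREINDEX":
                continue
            rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
            pages.append((root_url.rstrip("/") + "/" + rel, description))
    return pages


def demo():
    """Build a tiny throwaway site and index it under the base URL /dion."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "private"))
        files = {
            "index.html": "<TITLE>Home</TITLE>",
            "draft.html": "<TITLE>IGNOREINDEX</TITLE>",
            os.path.join("private", ".ignore_index"): "",
            os.path.join("private", "secret.html"): "<TITLE>Secret</TITLE>",
        }
        for rel, text in files.items():
            with open(os.path.join(root, rel), "w", encoding="utf-8") as handle:
                handle.write(text)
        return collect_pages(root, "/dion")


print(demo())  # [('/dion/index.html', 'Home')]
```

        Only index.html is listed: draft.html is dropped by its IGNOREINDEX description and private/ is dropped by its ".ignore_index" file.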
    
        USING THE CONFIGURATION FILE (Default: web_index.conf)
        ------------------------------------------------------
    
        If you wish to use the default web_index.conf then you would call
        % web_index.pl -c
        If you want to specify a different filename then use
        % web_index.pl -C /path/to/file.conf
    
        The config file... and the different variables you can set
    
        1. -> ROOT_INDEX_DIRECTORY: /usr/home/dion/www
           Set the directory for the indexer to start looking through to 
           compile its Site Index
       
        2. -> ROOT_URL: /dion
           Set the base URL for the index (basically the URL which points to
           the ROOT_INDEX_DIRECTORY)
    
        3. -> IMAGES_URL: /images
           Set the relative URL where the images are stored (e.g. if you have
           your images at /images you would have the above setting)
    
        4. -> OUTPUT_FILE: /usr/home/dion/www/windex.html
           Set the HTML doc which will have the Index in it
    
        5. -> GET_DESCRIPTION_FROM: w or m or t
           Here you select the method for the program to get the [description]
           w =  description via <WINDEX "[description]">
           m =  description via <META NAME="description" CONTENT="[description]">
           t =  description via <TITLE>[description]</TITLE>
           
           If you leave it blank it will try "m" first and, if it doesn't get
           a match, it will try "t"
    
        6. -> IGNORE_DIRECTORIES: yes
           If you have "yes" there then if you make a file ".ignore_index" in
           a directory the program will ignore all files in it and move to the
           next.
    
        7. -> IGNORE_FILES: yes
           If you have "yes" there then if you make a description "IGNOREINDEX"
           (e.g. <META NAME="description" CONTENT="IGNOREINDEX"> if you are
           using the "m" method) then that file will be ignored
    
        8. -> TEXT_OUTPUT_ONLY: yes
           If you have "yes" there then it will not print out any of the nice
           images that are in the "images/" directory. Personally I like the images :)
    
        9. -> HTML_HEADER
              [put html here]
              END_HTML_HEADER

           All the HTML between the two tags HTML_HEADER and END_HTML_HEADER
           will be printed at the top of the HTML output file (OUTPUT_FILE)
    
       10. -> HTML_FOOTER
              [put html here]
              END_HTML_FOOTER

           All the HTML between the two tags HTML_FOOTER and END_HTML_FOOTER
           will be printed at the bottom of the HTML output file (OUTPUT_FILE)
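
        For illustration, here is a small Python sketch that reads settings laid out like the ones above. It assumes the file holds plain "NAME: value" lines (without the "1. ->" numbering used in this README) and that the HTML_HEADER and HTML_FOOTER markers sit on lines of their own; check the sample web_index.conf for the real layout. parse_config and SAMPLE are names made up for this example.

```python
def parse_config(text):
    """Parse 'NAME: value' lines plus HTML_HEADER / HTML_FOOTER blocks."""
    settings = {}
    block_name = None   # name of the HTML block being collected, if any
    block_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if block_name is not None:
            if stripped == "END_" + block_name:
                # Closing marker: store the collected HTML under the block name.
                settings[block_name] = "\n".join(block_lines)
                block_name = None
                block_lines = []
            else:
                block_lines.append(line)
            continue
        if stripped in ("HTML_HEADER", "HTML_FOOTER"):
            block_name = stripped
            continue
        if ":" not in stripped:
            continue  # blank or unrecognised line
        # Split on the first colon only, so values may contain "http://".
        name, value = stripped.split(":", 1)
        settings[name.strip()] = value.strip()
    return settings


# Assumed layout, using the example values from this README.
SAMPLE = """\
ROOT_INDEX_DIRECTORY: /usr/home/dion/www
ROOT_URL: /dion
IMAGES_URL: /images
OUTPUT_FILE: /usr/home/dion/www/windex.html
GET_DESCRIPTION_FROM:
IGNORE_DIRECTORIES: yes
IGNORE_FILES: yes
TEXT_OUTPUT_ONLY: yes
HTML_HEADER
<HTML><BODY><H1>Site Index</H1>
END_HTML_HEADER
HTML_FOOTER
</BODY></HTML>
END_HTML_FOOTER
"""

config = parse_config(SAMPLE)
print(config["ROOT_URL"])                    # /dion
print(repr(config["GET_DESCRIPTION_FROM"]))  # '' (blank: try "m", then "t")
print(config["HTML_HEADER"])                 # <HTML><BODY><H1>Site Index</H1>
```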
    
     Configuration of web_search.cgi
    
    Now to set up the Web Search part of the package. It is also simple.

    1. Place the program in a place where http:// can get to it, e.g. in /cgi-bin or in your web directory.
    2. Now make sure the web_index.pl has its FORM ACTION pointing to the cgi.
    3. Edit the web_search.cgi itself and change the following:
         $image_dir = "images";
       to point to the directory where the "images/" one is.
    4. Change the PrintHeader and PrintFooter functions to customise the HTML that you want, and change the following, which holds the default search:
         <INPUT TYPE="hidden" NAME="IGNORE" VALUE="yes">
         <INPUT TYPE="hidden" NAME="boolean" VALUE="OR">
         <INPUT TYPE="hidden" NAME="case" VALUE="Insensitive">
    5. Edit the web_search.html file. Change all the <A HREF> and <IMG> tags to point to your images etc. Change the <FORM ACTION> so that it again points to the web_search.cgi. Now set the following variables:
         <INPUT TYPE="hidden" NAME="DOC_ROOT" VALUE="/usr/home/dion/www">
         <INPUT TYPE="hidden" NAME="URL_ROOT" VALUE="/dion">
         <INPUT TYPE="hidden" NAME="IGNORE" VALUE="yes">
       These are the same as for the web_index.pl.
    6. Celebrate, you are done :)

    * ------------------------------------------------------------------------ *
    * If you have any questions or comments contact Dion Almaer                *
    * ------------------------------------------------------------------------ *
    * Email Address | [email protected]                                        *
    * WWW Page      | /dion                                                    *
    * ------------------------------------------------------------------------ *
    *    -=< M E M B E R   S E R V I C E S   I N T E R N A T I O N A L >=-     *
    * ------------------------------------------------------------------------ *