KSEARCH VERSION 1.5a
Copyright (C) 2000 - 2008, KScripts - www.kscripts.com
Parts of this script are Copyright: www.perlfect.com (C)2000 N.Moraitakis & G.Zervas.
All rights reserved

==============================================================================
== GNU GENERAL PUBLIC LICENSE: ==

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place, Suite 330, Boston, MA 02111-1307

==============================================================================
== READ THIS: ==

This file contains general instructions on how to install and use KSearch.
For troubleshooting information read the attached FAQs.html file and/or visit
our Discussion Forum site at: http://www.kscripts.com/discus/

==============================================================================
==============================================================================
== GENERAL INSTALLATION INSTRUCTIONS: ==

Nine steps, approximately 45 minutes.
You will need a text editor, and access to your server to edit and run scripts.
If you want to index PDF files, you will need to install Xpdf from
www.foolabs.com/xpdf/. See FAQs.html for details.

1. Open indexer.cgi + indexer.pl + ksearch.cgi and set the Path to Perl on Line 1.
   =The default path is: #!/usr/bin/perl
   =Unix OS: To determine the path, type 'which perl' at the command line prompt.
   =Windows OS: The absolute path: #!C:/where_ever_perl_installed/perl.exe

2. Open search_form.html (Line 18) and search_tips.html (Line 19).
   In each file, find "../index.html" and change it to your home page name
   (../index.php, etc)

3. Change parameters in the configuration.pl file (located in the /configuration folder).

   = CHANGES THAT ARE NECESSARY =
   Line 13: $INDEXER_START is the starting path of files to begin indexing.
   Line 17: $BASE_URL is the URL corresponding to the path in Line 13.
   Line 20: $SEARCH_URL is the absolute location of the ksearch.cgi file after upload.
   Line 23: $KSEARCH_DIR is the path to the search directory after upload.
   Line 28: @VALID_REFERERS is the domain you must be in to run indexer.cgi
            NOTE: There is a difference between http://www.mydomain.com and http://mydomain.com
            If you DO NOT use www in your URL to access indexer.cgi then don't include it here.
   Line 29: $INDEXER_URL is the absolute URL path to indexer.cgi after upload.
   Line 30: $PASSWORD is required to access indexer.cgi (whatever you want it to be)
   Line 69: $LOG_SEARCH is the absolute URL path to search_log.txt after upload.

   All other configuration.pl changes are optional.
   If you don't know what they are, then don't change them.
   If you have questions, contact us:
     Web:   www.kscripts.com/contact_us.html
     Forum: www.kscripts.com/discus/index.php
     Email: email@kscripts.com

4. Ignore Files and Folders:
   Add the full path of files/folders YOU DO NOT WANT to index to the ignore files list,
   on separate lines -- configuration/ignore_files.txt
   =NOTE=: After indexing, you may discover files/folders you don't want to include in your search engine.
   You may later come back and add files/folders -- however, you'll need to re-index your website using indexer.cgi

5. Stop Terms:
   Add terms YOU WANT TO IGNORE to the search engine stop terms list,
   on separate lines -- configuration/stop_terms.txt
   =NOTE=: After indexing, you may discover terms you don't want to include in your search engine.
   You may later come back and add terms to the file -- however, you'll need to re-index your website using indexer.cgi

6. Upload the folder SEARCH and all its contents to your website.
   Maintain the folder/file structure without changes.
   =NOTE=: If you change the file/folder structure, some or all of your search engine
   will NOT function properly, if it functions at all.
   =NOTE=: You don't have to upload the 5 files not included in the SEARCH folder
   (CHANGELOG.txt, GNU.txt, HISTORY.txt, README.txt, and FAQs.html).
   These are for your personal reference, troubleshooting, and future use.

7. Set permissions for each file/directory, after uploading.

   package includes       file name                           set permissions
   indexer script:        indexer.pl                          755
   cgi indexer script:    indexer.cgi                         755
   search script:         ksearch.cgi                         755
   search form:           search_form.html                    744
   configuration file:    configuration/configuration.pl      744
   ignore files list:     configuration/ignore_files.txt      744
   stop terms list:       configuration/stop_terms.txt        744
   HTML template:         templates/search.html               744
   database directory:    database/                           744
   database:              database/database.txt               744
   search log:            database/search_log.txt             744
   images:                ks_images/                          755
   images:                ks_images/ks-forum.gif              644
   images:                ks_images/ks-kscripts.gif           644
   images:                ks_images/KSlogo.gif                644
   images:                ks_images/uparrow.gif               644
   images:                ks_images/valid_xhtml.gif           644
   style sheet:           ks_images/style.css                 644

   (For Unix Hosts)       (type at command line)              (type 'ls -l' for file list)
   read/exec              'chmod 755 filename'                -rwxr-xr-x
   read                   'chmod 744 filename'                -rwxr--r--
   image                  'chmod 644 filename'                -rw-r--r--

8. Run the INDEXER:
   When you're finished with permissions (above), open your browser and run the indexer script:
   http://www.MyWebsite.com/search/indexer.cgi
   The time required will depend on the size of your site and your server's CPU.
   =NOTE=: You need to use the same URL path as specified in configuration.pl -- Line 28, @VALID_REFERERS.

9.
Test it out:
   Open the search_form.html (http://www.MyWebsite.com/search/search_form.html)
   Run a search.
   Questions or problems, FIRST read the enclosed FAQs.html file.

==============================================================================
==============================================================================
Additional Info:

A. You may change the search_form.html HTML template to match your web site design preference.
   You may also edit the templates/search.html for a custom Search Results page.
   =NOTE=: At the bottom of the search.html (templates/search.html) file are descriptions
   of all the possible fields you may add to the template. By default, all possible fields are added.

B. This search engine is designed to handle very large websites if configured correctly.
   There is a tradeoff between search speed and disk space: if configured for speed,
   this search engine will use more disk space.

   To configure for speed (larger websites) change the following settings in
   'configuration/configuration.pl':

   1. $IGNORE_COMMON_TERMS = 90;
      Set this to the maximum percentage of files that indexed terms can exist in.
      This will remove common terms not present in 'stop_terms.txt'.
   2. $SAVE_CONTENT = 1;
      Set this to 1 to index the processed contents of each file in the database.
      If set to 0, the search engine will perform an on-the-fly search of all files.
   3. It is recommended to use DB_File. The search engine will be slightly faster.
      See below for details.

   To configure to save disk space (smaller web sites) use the following settings in
   'configuration/configuration.pl':

   1. $IGNORE_COMMON_TERMS = 90;
      Set this to the maximum percentage of files that indexed terms can exist in.
      This will remove common terms not present in 'stop_terms.txt'.
   2. $SAVE_CONTENT = 0;
      Set this to 0 to search on-the-fly. This could be extremely slow for large websites.
   3. $MAKE_LOG = 0;
      Set this to 0 so you do not create a logfile of the indexing routine.

C.
DBM Issues:
   This search engine relies on a Perl DBM database library to create the search index
   and will automatically use the best available DBM database on your system.
   SDBM is the standard library bundled with Perl, and has a limitation that may affect
   the performance of the search engine. The SDBM, ODBM, and NDBM libraries have block size
   limits (memory limits) that may terminate the indexing routine if your site is too large.
   If you have DB_File or GDBM_File, this will not be an issue.

   Also, if you use "$SAVE_CONTENT = 1" and have DB_File or GDBM_File, the search script
   will perform slightly faster because the data is stored in the DBM database instead of
   separate files. If SDBM, ODBM, or NDBM is used and you use "$SAVE_CONTENT = 1", the indexer
   will save the processed file contents to separate files to avoid memory limits.
   However, you may still reach the block size limits if you have a very large site.
   For most sites, this will not be an issue.

D. If you receive the error:
   'dbm store returned -1, errno 28, key "trap" at - line 3.'
   while running the indexer, you have reached the block size limit.
   To fix this problem, you will need either DB_File (install from CPAN) together with
   the Berkeley DB (http://www.sleepycat.com), or GDBM_File (install from CPAN).
   We recommend DB_File.

E. See:
   FAQs page (enclosed in the KSearch1.5a.zip download)
   Our Discussion Forum page at: http://www.kscripts.com/discus/index.php
   Perl DB_File documentation: http://www.perl.com/pub/doc/manual/html/lib/DB_File.html
   Sleepycat Software documentation for the Berkeley DB: http://www.sleepycat.com/

F. Credits:
   www.perlfect.com - N.Moraitakis & G.Zervas - Thanks to the people at www.perlfect.com
   for much of the structure behind this script.
   For a more robust search script please visit www.perlfect.com.
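
G. Example: setting the step 7 permissions from a script (Unix hosts):
   The permissions listed in step 7 can be applied in one pass instead of file by file.
   What follows is only a minimal sketch, not part of the KSearch package: the function
   names chmod_if_present and apply_ksearch_perms are hypothetical, and it assumes the
   uploaded folder keeps the file/folder structure described in step 6. Files that are
   missing are skipped.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with KSearch): chmod each path that exists.
chmod_if_present() {
    mode=$1
    shift
    for path in "$@"; do
        if [ -e "$path" ]; then
            chmod "$mode" "$path"
        fi
    done
}

# Hypothetical helper: apply the permissions from the step 7 table to the
# uploaded search folder given as the first argument (default: current folder).
apply_ksearch_perms() {
    dir=${1:-.}

    # read/exec (755): the three scripts and the images folder
    chmod_if_present 755 \
        "$dir/indexer.pl" "$dir/indexer.cgi" "$dir/ksearch.cgi" \
        "$dir/ks_images"

    # read (744): search form, configuration, template, database
    chmod_if_present 744 \
        "$dir/search_form.html" \
        "$dir/configuration/configuration.pl" \
        "$dir/configuration/ignore_files.txt" \
        "$dir/configuration/stop_terms.txt" \
        "$dir/templates/search.html" \
        "$dir/database" \
        "$dir/database/database.txt" \
        "$dir/database/search_log.txt"

    # image (644): images and the style sheet
    chmod_if_present 644 \
        "$dir/ks_images/ks-forum.gif" \
        "$dir/ks_images/ks-kscripts.gif" \
        "$dir/ks_images/KSlogo.gif" \
        "$dir/ks_images/uparrow.gif" \
        "$dir/ks_images/valid_xhtml.gif" \
        "$dir/ks_images/style.css"
}

# Usage (the path is a placeholder for wherever you uploaded the folder):
#   apply_ksearch_perms /path/to/search
```

   Afterwards, type 'ls -l' in the search folder to compare the result with the
   permission strings shown in step 7.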