omindex - Index static website data via the filesystem
omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY
DIRECTORY is the directory to start indexing from.
BASEDIR is the directory corresponding to URL (default: DIRECTORY).
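The relationship between BASEDIR, DIRECTORY and the base URL can be sketched in shell. This is an illustration only, not omindex's own code: the helper url_for and the example paths are hypothetical, the file is assumed to be an absolute path below BASEDIR, and no percent-encoding of special characters is attempted.

```shell
#!/bin/sh
# Sketch: map a file path to the URL expected for it, given the base URL
# and BASEDIR. url_for is a hypothetical helper, not part of omindex.
# Assumptions: FILE is an absolute path below BASEDIR; no percent-encoding.
url_for() {
    base_url=$1   # base URL that BASEDIR corresponds to, e.g. /docs/
    basedir=$2    # the BASEDIR argument
    file=$3       # a file found while walking DIRECTORY

    # Make sure the base URL ends with '/' and BASEDIR does not.
    base_url=${base_url%/}/
    basedir=${basedir%/}

    case $file in
        "$basedir"/*)
            # Strip the BASEDIR prefix and append the rest to the base URL.
            rel=${file#"$basedir"/}
            printf '%s%s\n' "$base_url" "$rel"
            ;;
        *)
            echo "url_for: $file is not under $basedir" >&2
            return 1
            ;;
    esac
}

url_for /docs/ /srv/www /srv/www/manual/intro.html
# prints /docs/manual/intro.html
```

With BASEDIR omitted it defaults to DIRECTORY, so the same mapping applies with the start directory itself standing in for the base URL.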
-d, --duplicates=ARG
set duplicate handling: ARG can be 'ignore' or 'replace' (default: replace)
-p, --no-delete
skip the deletion of documents corresponding to deleted files (--preserve-nonduplicates is a deprecated alias for --no-delete)
-e, --empty-docs=ARG
how to handle documents we extract no text from: ARG can be index, warn (issue a diagnostic and index), or skip (default: warn)
-D, --db=DATABASE
path to database to use
-U, --url=URL
base URL that BASEDIR corresponds to (default: /)
-M, --mime-type=EXT:TYPE
assume any file with extension EXT has MIME Content-Type TYPE, instead of using libmagic (empty TYPE removes any existing mapping for EXT; other special TYPE values: 'ignore' and 'skip')
-G, --mime-type-match=GLOB:TYPE
assume any file with leaf name matching shell wildcard pattern GLOB has MIME Content-Type TYPE (special TYPE values: 'ignore' and 'skip')
-F, --filter=M[,[T][,C]]:CMD
process files with MIME Content-Type M using command CMD, which produces output (on stdout or in a temporary file) with format T (Content-Type or file extension; currently txt (default), html or svg) in character encoding C (default: UTF-8). E.g. -Fapplication/octet-stream:'strings -n8' or -Ftext/x-foo,,utf-16:'foo2utf16 %f %t'
--read-filters=FILE
bulk-load --filter arguments from FILE, which should contain one such argument per line (e.g. text/x-bar:bar2txt --utf8). Lines starting with # are treated as comments and ignored.
-l, --depth-limit=LIMIT
set recursion limit (0 = unlimited)
-f, --follow
follow symbolic links
-i, --ignore-exclusions
ignore meta robots tags and similar exclusions
-S, --spelling
index data for spelling correction
-m, --max-size=N[SUFFIX]
maximum size of file to index (in bytes or with a suffix of 'K'/'k', 'M'/'m', 'G'/'g') (default: unlimited)
--sample=SOURCE
what to use for the stored sample of text for HTML documents: SOURCE can be 'body' or 'description' (default: 'body')
-E, --sample-size=SIZE
maximum size for the document text sample (supports the same formats as --max-size; default: 512)
-T, --title-size=SIZE
maximum size for the document title (supports the same formats as --max-size; default: 128)
-R, --retry-failed
retry files which omindex failed to extract text from on a previous run
--opendir-sleep=SECS
sleep for SECS seconds before opening each directory. Sleeping for 2 seconds seems to reliably work around problems with indexing files on Microsoft DFS shares.
-C, --track-ctime
track each file's ctime so we can detect changes to ownership or permissions.
--date-terms
accepted but ignored, for forward compatibility with Omega 1.5.x.
--no-date-terms
don't index D, M and Y prefixed terms to support date range filtering using terms (we now recommend using a value slot for this instead).
-v, --verbose
show more information about what is happening
--overwrite
create the database anew (the default is to update if the database already exists)
-s, --stemmer=LANG
set the stemming language (default: english). Possible values: arabic armenian basque catalan danish dutch earlyenglish english finnish french german german2 hungarian indonesian irish italian kraaij_pohlmann lithuanian lovins nepali norwegian porter portuguese romanian russian spanish swedish tamil turkish (pass 'none' to disable stemming)
-h, --help
display this help and exit
-V, --version
output version information and exit
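As an illustration of how several of the options above might be combined, the following sketch writes a filter file in the one-argument-per-line form described for --read-filters and then prints an omindex command line that uses it. The database path, document root and size limit are hypothetical, bar2txt and foo2utf16 are the placeholder converters from the option descriptions rather than real programs, and the command is echoed instead of executed so the sketch can be tried without omindex installed.

```shell
#!/bin/sh
# Hypothetical locations; adjust for the site being indexed.
db=/var/lib/omega/data/default
docroot=/srv/www
filters=$(mktemp)

# One --filter argument per line; lines starting with '#' are comments.
# No shell quoting is used inside the file.
cat > "$filters" <<'EOF'
# MIME type[,format[,charset]]:command
text/x-bar:bar2txt --utf8
text/x-foo,,utf-16:foo2utf16 %f %t
EOF

# Count the entries that are not comments.
entries=$(grep -c -v '^#' "$filters")
echo "$entries filter(s) defined"

# Dry run: print the command instead of running it.
echo omindex --db "$db" --url / --read-filters="$filters" \
     --max-size=10M --empty-docs=skip --stemmer=english "$docroot"

# Remove the scratch file; a real setup would keep it at a stable path.
rm -f "$filters"
```

Keeping the filters in a file rather than repeating -F on the command line avoids a layer of shell quoting and lets the same set be reused across runs.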
Please report bugs at: https://xapian.org/bugs