omindex - Index static website data via the filesystem

NAME

omindex − Index static website data via the filesystem

SYNOPSIS

omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY

DESCRIPTION

DIRECTORY is the directory to start indexing from.

BASEDIR is the directory corresponding to the base URL set with −−url (default: DIRECTORY).
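The relationship between BASEDIR and the base URL can be sketched in shell. This is only an illustration under the assumption that a file's URL is the base URL followed by the file's path relative to BASEDIR; the function name url_for and the paths are hypothetical, and any URL escaping omindex may perform is ignored.

```shell
# Sketch only: map a file path to the URL it would presumably be indexed
# under, assuming URL + (path relative to BASEDIR). Escaping is ignored.
url_for() {
    basedir=${1%/}      # BASEDIR without a trailing slash
    baseurl=${2%/}      # base URL without a trailing slash
    file=$3             # absolute path of a file below BASEDIR
    rel=${file#"$basedir"/}
    printf '%s/%s\n' "$baseurl" "$rel"
}

url_for /var/www/html / /var/www/html/docs/a.html
url_for /var/www/html http://example.org/ /var/www/html/press/2024.html
```

The two calls print /docs/a.html and http://example.org/press/2024.html respectively.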

OPTIONS

−d, −−duplicates=ARG

set duplicate handling: ARG can be ’ignore’ or ’replace’ (default: replace)

−p, −−no−delete

skip the deletion of documents corresponding to deleted files (−−preserve−nonduplicates is a deprecated alias for −−no−delete)

−e, −−empty−docs=ARG

how to handle documents we extract no text from: ARG can be ’index’, ’warn’ (issue a diagnostic and index), or ’skip’ (default: warn)

−D, −−db=DATABASE

path to database to use

−U, −−url=URL

base URL that BASEDIR corresponds to (default: /)

−M, −−mime−type=EXT:TYPE

assume any file with extension EXT has MIME Content−Type TYPE, instead of using libmagic (empty TYPE removes any existing mapping for EXT; other special TYPE values: ’ignore’ and ’skip’)

−G, −−mime−type−match=GLOB:TYPE

assume any file with leaf name matching shell wildcard pattern GLOB has MIME Content−Type TYPE (special TYPE values: ’ignore’ and ’skip’)
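To see which files a GLOB would select, the leaf-name matching can be approximated with the shell's own pattern matching. This is a rough sketch: matches_leaf is a hypothetical helper, and it assumes omindex's wildcard rules behave like those of a shell case pattern, which may differ in corner cases.

```shell
# Sketch only: test a wildcard against the leaf name (last path component).
matches_leaf() {
    leaf=${2##*/}           # strip everything up to the final '/'
    case $leaf in
        $1) return 0 ;;     # unquoted, so $1 is treated as a pattern
        *)  return 1 ;;
    esac
}

matches_leaf '*~' /var/www/html/notes.txt~ && echo "editor backup: would match"
matches_leaf '*.bak' /var/www/a.bak/index.html || echo "directory names are not considered"

# A corresponding invocation might look like this (paths are placeholders):
#   omindex --db /path/to/db -G '*~:ignore' /var/www/html
```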

−F, −−filter=M[,[T][,C]]:CMD

process files with MIME Content−Type M using command CMD, which produces output (on stdout or in a temporary file) with format T (Content−Type or file extension; currently txt (default), html or svg) in character encoding C (default: UTF−8). E.g. −Fapplication/octet−stream:’strings −n8’ or −Ftext/x−foo,,utf−16:’foo2utf16 %f %t’
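The M[,[T][,C]]:CMD syntax is easier to read once split into its parts. The following sketch does that with plain parameter expansion and applies the documented defaults (txt and UTF-8); parse_filter is a hypothetical helper, not part of omindex, and the examples are typed with ASCII hyphens as they would be on a command line.

```shell
# Sketch only: split a --filter argument into MIME type, output format,
# character encoding and command.
parse_filter() {
    spec=${1%%:*}           # part before the first ':'
    cmd=${1#*:}             # part after the first ':'
    mime=${spec%%,*}
    rest=
    case $spec in *,*) rest=${spec#*,} ;; esac
    format=${rest%%,*}
    charset=
    case $rest in *,*) charset=${rest#*,} ;; esac
    printf 'mime=%s format=%s charset=%s cmd=%s\n' \
        "$mime" "${format:-txt}" "${charset:-UTF-8}" "$cmd"
}

parse_filter 'application/octet-stream:strings -n8'
parse_filter 'text/x-foo,,utf-16:foo2utf16 %f %t'
```

The second call shows that an empty T between the two commas leaves the format at its default while still setting the encoding.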

−−read−filters=FILE

bulk−load −−filter arguments from FILE, which should contain one such argument per line (e.g. text/x−bar:bar2txt −−utf8). Lines starting with # are treated as comments and ignored.
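A filter file of this kind could be prepared as follows. The sketch writes two rules taken from the examples in the option descriptions (bar2txt is the placeholder converter used there, not a real tool) and counts the lines that are not comments; the omindex call itself is left as a comment because the database and directory paths are placeholders.

```shell
# Sketch only: write a filter file, one --filter argument per line.
filters_file=$(mktemp)
cat > "$filters_file" <<'EOF'
# Placeholder converter from the option description.
text/x-bar:bar2txt --utf8
# Extract printable strings from otherwise unknown binary data.
application/octet-stream:strings -n8
EOF

# Count the lines that are not comments.
active=$(grep -c -v '^#' "$filters_file")
echo "$active filter rules in $filters_file"

# Then, for example:
#   omindex --db /path/to/db --read-filters="$filters_file" /var/www/html
```

Note that inside the file the command is written without the shell quoting that the same rule would need on the command line.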

−l, −−depth−limit=LIMIT

set recursion limit (0 = unlimited)

−f, −−follow

follow symbolic links

−i, −−ignore−exclusions

ignore meta robots tags and similar exclusions

−S, −−spelling

index data for spelling correction

−m, −−max−size=N[SUFFIX]

maximum size of file to index (in bytes or with a suffix of ’K’/’k’, ’M’/’m’, ’G’/’g’) (default: unlimited)
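The size suffixes can be illustrated with a small conversion helper. This sketch assumes the suffixes denote binary multiples (K = 1024 bytes, and so on), which the text above does not state explicitly; size_to_bytes is a hypothetical name, and only whole numbers are handled.

```shell
# Sketch only: convert N[SUFFIX] to bytes, ASSUMING 1024-based multiples.
size_to_bytes() {
    num=${1%[KkMmGg]}       # numeric part without the suffix
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

size_to_bytes 512
size_to_bytes 2k
size_to_bytes 10M
```

Under that assumption the three calls print 512, 2048 and 10485760.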

−−sample=SOURCE

what to use for the stored sample of text for HTML documents − SOURCE can be ’body’ or ’description’ (default: ’body’)

−E, −−sample−size=SIZE

maximum size for the document text sample (supports the same formats as −−max−size). (default: 512)

−T, −−title−size=SIZE

maximum size for the document title (supports the same formats as −−max−size). (default: 128)

−R, −−retry−failed

retry files which omindex failed to extract text from on a previous run

−−opendir−sleep=SECS

sleep for SECS seconds before opening each directory − sleeping for 2 seconds seems to reliably work around problems with indexing files on Microsoft DFS shares.

−C, −−track−ctime

track each file’s ctime so we can detect changes to ownership or permissions.

−−date−terms

accepted but ignored, for forward compatibility with Omega 1.5.x.

−−no−date−terms

don’t index the D, M and Y prefixed terms which support date range filtering using terms (we now recommend using a value slot for this instead).

−v, −−verbose

show more information about what is happening

−−overwrite

create the database anew (the default is to update if the database already exists)
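The choice between updating and rebuilding could be wrapped in a small helper for scripts. The sketch below only prints the command line it would use, so it is safe to try; omindex_cmd and the paths are hypothetical, and the printed text is for display, not for re-parsing by a shell (paths containing spaces are not quoted).

```shell
# Sketch only: print the omindex command for an update or a full rebuild.
omindex_cmd() {
    mode=$1 db=$2 dir=$3
    case $mode in
        full)   printf 'omindex --overwrite --db %s %s\n' "$db" "$dir" ;;
        update) printf 'omindex --db %s %s\n' "$db" "$dir" ;;
        *)      echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

omindex_cmd update /var/lib/omega/data/default /var/www/html
omindex_cmd full /var/lib/omega/data/default /var/www/html
```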

−s, −−stemmer=LANG

set the stemming language (default: english). Possible values: arabic armenian basque catalan danish dutch earlyenglish english finnish french german german2 hungarian indonesian irish italian kraaij_pohlmann lithuanian lovins nepali norwegian porter portuguese romanian russian spanish swedish tamil turkish (pass ’none’ to disable stemming)
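Before starting a long indexing run, a script might check the requested language against the list above. This is a minimal sketch; is_known_stemmer is a hypothetical helper, and the list is copied from this page, so it may not match other Omega versions.

```shell
# Sketch only: check a stemmer name against the values listed on this page.
stemmers="arabic armenian basque catalan danish dutch earlyenglish english
finnish french german german2 hungarian indonesian irish italian
kraaij_pohlmann lithuanian lovins nepali norwegian porter portuguese
romanian russian spanish swedish tamil turkish none"

is_known_stemmer() {
    for s in $stemmers; do          # word splitting is intended here
        [ "$s" = "$1" ] && return 0
    done
    return 1
}

is_known_stemmer german2 && echo "german2 is listed"
is_known_stemmer klingon || echo "klingon is not listed"
```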

−h, −−help

display this help and exit

−V, −−version

output version information and exit

Please report bugs at: https://xapian.org/bugs

Updated 2024-01-29 - jenkler.se | uex.se