hxextract - extract selected elements from an HTML or XML file

NAME

hxextract − extract selected elements from an HTML or XML file

SYNOPSIS

hxextract [ -h | -? ] [ -x ] [ -s text ] [ -e text ] [ -b base ] element-or-class [ -c configfile | file-or-URL ]

DESCRIPTION

hxextract outputs all elements that have a given name, a given class, or both.

Input must be well-formed, since no HTML heuristics are applied.
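
Because no HTML heuristics are applied, input with unclosed tags may need to be cleaned up first. The following is a minimal sketch, assuming that the hxnormalize tool from the same html-xml-utils package is installed; the file name messy.html and its content are made up for illustration.

```shell
# Hypothetical input with an unclosed <p> tag (not well-formed).
workdir=$(mktemp -d)
cat > "$workdir/messy.html" <<'EOF'
<html><body>
<h2>First heading</h2>
<p>Unclosed paragraph
<h2>Second heading</h2>
</body></html>
EOF

# hxnormalize -x rewrites the input using XML conventions, so that
# hxextract receives well-formed markup on standard input ("-").
if command -v hxnormalize >/dev/null 2>&1 && command -v hxextract >/dev/null 2>&1; then
  hxnormalize -x "$workdir/messy.html" | hxextract h2 -
else
  echo "hxnormalize or hxextract not found; skipping" >&2
fi
```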

OPTIONS

The following options are supported:

-x

Use XML format conventions.

-s text

Insert text at the start of the output.

-e text

Insert text at the end of the output.
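
As an illustration of how the start and end text options might be combined, the sketch below wraps the extracted elements in a container element. The file name items.html, its content, and the choice of wrapper tags are invented for the example.

```shell
# Hypothetical well-formed input file.
workdir=$(mktemp -d)
cat > "$workdir/items.html" <<'EOF'
<html><body>
<ul><li>one</li><li>two</li></ul>
<p>Not a list item.</p>
</body></html>
EOF

# The -s text is written before the extracted elements and the
# -e text after them; here they open and close an <ol> wrapper.
if command -v hxextract >/dev/null 2>&1; then
  hxextract -s '<ol>' -e '</ol>' li "$workdir/items.html"
else
  echo "hxextract not found; skipping" >&2
fi
```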

-b base

URL base

-c configfile

Read @chapter lines from configfile (lines must be of the form "@chapter filename") and extract elements from each of those files.
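
A config file of this form could be set up as in the sketch below. The names chapter1.html, chapter2.html and book.conf are hypothetical. Absolute paths are used in the @chapter lines because this page does not say how relative file names are resolved.

```shell
# Two hypothetical well-formed chapter files.
workdir=$(mktemp -d)
for n in 1 2; do
  cat > "$workdir/chapter$n.html" <<EOF
<html><body><h2>Heading from chapter $n</h2><p>Text.</p></body></html>
EOF
done

# The config file: one "@chapter filename" line per file to process.
cat > "$workdir/book.conf" <<EOF
@chapter $workdir/chapter1.html
@chapter $workdir/chapter2.html
EOF

# Following the SYNOPSIS, -c configfile takes the place of file-or-URL.
if command -v hxextract >/dev/null 2>&1; then
  hxextract h2 -c "$workdir/book.conf"
else
  echo "hxextract not found; skipping" >&2
fi
```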

-h, -?

Print command usage.

OPERANDS

The following operands are supported:

element-or-class

The name of an element to extract (e.g., "H2"), the name of a class preceded by "." (e.g., ".example"), or a combination of both (e.g., "H2.example").

file-or-URL

A file name or a URL. To read from standard input, use "-".
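
The operand forms described above can be tried out with a sketch like the following. The file sample.html and its content are invented for the example, and the element names are written in the same case in the file and on the command line.

```shell
# Hypothetical well-formed input file.
workdir=$(mktemp -d)
cat > "$workdir/sample.html" <<'EOF'
<html><body>
<h2 class="example">Heading with class</h2>
<h2>Plain heading</h2>
<p class="example">Paragraph with class</p>
</body></html>
EOF

if command -v hxextract >/dev/null 2>&1; then
  hxextract h2 "$workdir/sample.html"          # select by element name
  hxextract .example "$workdir/sample.html"    # select by class
  hxextract h2.example "$workdir/sample.html"  # select by name and class
  hxextract h2 - < "$workdir/sample.html"      # read from standard input
else
  echo "hxextract not found; skipping" >&2
fi
```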

ENVIRONMENT

To retrieve remote files through a proxy, set the environment variables http_proxy and ftp_proxy, e.g., http_proxy="http://localhost:8080/".

BUGS

Remote files (specified with a URL) are currently only supported for HTTP. Password-protected files or files that depend on HTTP "cookies" are not handled. (You can use tools such as curl(1) or wget(1) to retrieve such files.)
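
One way to apply the curl(1) workaround is to let curl fetch the page and pipe it into hxextract on standard input. The sketch below only defines a helper function. The function name extract_remote, its argument order, and the cookie file are assumptions made for illustration, and the fetched page still has to be well-formed.

```shell
# extract_remote URL ELEMENT-OR-CLASS COOKIEFILE  (hypothetical helper)
# Fetches URL with curl, sending the cookies stored in COOKIEFILE,
# and passes the response to hxextract via standard input ("-").
extract_remote() {
  url=$1
  selector=$2
  cookie_file=$3
  # -f: fail on HTTP errors, -s: no progress meter, -S: still show errors,
  # -b FILE: read cookies from FILE.
  curl -fsS -b "$cookie_file" "$url" | hxextract "$selector" -
}

# Example call (placeholder URL and file name):
# extract_remote 'https://example.org/page.html' h2 cookies.txt
```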

SEE ALSO

hxselect(1)


Updated 2024-01-29 - jenkler.se | uex.se