Table of Contents
pdftotext - Portable Document Format (PDF) to text converter (version
0.92)
pdftotext [options] [PDF-file [text-file]]
Pdftotext
converts Portable Document Format (PDF) files to plain text.
Pdftotext
reads the PDF file, PDF-file, and writes a text file, text-file. If text-file
is not specified, pdftotext converts file.pdf to file.txt. If text-file is
'-', the text is sent to stdout.
- -f number
- Specifies the first page
to convert.
- -l number
- Specifies the last page to convert.
- -ascii7
- Convert
the text to 7-bit ASCII; the default is to use the 8-bit ISO Latin-1 character
set.
- -latin2
- Convert the text to the Latin-2 (ISO-8859-2) character set. (This
will only be useful if the font encodings are specified correctly in the
PDF file.)
- -latin5
- Convert the text to the Latin-5 (ISO-8859-9) character
set. (This will only be useful if the font encodings are specified correctly
in the PDF file.)
- -eucjp
- Convert Japanese text to EUC-JP. This is currently
the only option for converting Japanese text -- the only effect is to switch
to 7-bit ASCII for non-Japanese text, in order to fit into the EUC-JP encoding.
(This option is only available if pdftotext was compiled with Japanese
support.)
- -raw
- Keep the text in content stream order. This is a hack which
often "undoes" column formatting, etc. This option will likely be replaced
with something more sophisticated when pdftotext is rewritten to use a
smarter text placement algorithm.
- -upw password
- Specify the user password
for the PDF file.
- -q
- Don't print any messages or errors.
- -v
- Print copyright
and version information.
- -h
- Print usage information. (-help is equivalent.)
Some PDF files contain fonts whose encodings have been mangled beyond
recognition. There is no way (short of OCR) to extract text from these
files.
The pdftotext software and documentation are copyright 1996-2000
Derek B. Noonburg (derekn@foolabs.com).
xpdf(1)
, pdftops(1)
, pdfinfo(1)
,
pdftopbm(1)
, pdfimages(1)
http://www.foolabs.com/xpdf/
Table of Contents