The Reader class serves as an entry point for parsing a PDF file.
PDF is a page-based file format. Some data is associated with the document as a whole (metadata, bookmarks, etc.), but all visible content is stored under a Page object.
In most use cases for extracting and examining the contents of a PDF, it makes sense to traverse the information page by page.
In addition to the documentation here, check out the PDF::Reader::Page class.
  # File metadata
  reader = PDF::Reader.new("somefile.pdf")
  puts reader.pdf_version
  puts reader.info
  puts reader.metadata
  puts reader.page_count

  # Iterating over all pages
  reader = PDF::Reader.new("somefile.pdf")
  reader.pages.each do |page|
    puts page.fonts
    puts page.images
    puts page.text
  end

  # Extracting all text
  reader = PDF::Reader.new("somefile.pdf")
  reader.pages.map(&:text)
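Because pages returns an array, the usual Enumerable methods apply to it. As a minimal sketch, the helper below returns the 1-based numbers of the pages whose text matches a pattern. The name pages_matching is hypothetical and not part of PDF::Reader; the helper assumes only that its argument responds to pages and that each page responds to text.

```ruby
# Hypothetical helper, not part of PDF::Reader.
# reader  - any object that responds to #pages (e.g. a PDF::Reader instance)
# pattern - a Regexp or String to look for in each page's text
# Returns the 1-based numbers of the pages whose text matches.
def pages_matching(reader, pattern)
  numbers = []
  reader.pages.each_with_index do |page, index|
    numbers << index + 1 if page.text.match?(pattern)
  end
  numbers
end
```

With the pdf-reader gem loaded, this could be called as pages_matching(PDF::Reader.new("somefile.pdf"), /invoice/i).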
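  # Extracting content from a single page
  reader = PDF::Reader.new("somefile.pdf")
  page = reader.page(1)
  puts page.fonts
  puts page.images
  puts page.text

  # Low-level callbacks; receiver is an object you supply that
  # implements the callbacks it is interested in
  reader = PDF::Reader.new("somefile.pdf")
  page = reader.page(1)
  page.walk(receiver)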
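The receiver passed to walk is an object you write yourself. The sketch below shows one possible receiver that collects the strings handed to the text-showing callbacks. The class name TextCollector is hypothetical. The callback names show_text and show_text_with_positioning are the ones PDF::Reader uses for the PDF Tj and TJ operators, but consult the PDF::Reader::Page docs for the authoritative list. The collected strings are raw content-stream data and may need font-aware decoding, which page.text already handles for you.

```ruby
# Hypothetical receiver for page.walk; the class name is illustrative.
class TextCollector
  attr_reader :strings

  def initialize
    @strings = []
  end

  # Called for the Tj operator with a single string.
  def show_text(string)
    @strings << string
  end

  # Called for the TJ operator with an array that mixes strings and
  # numeric position adjustments; keep only the strings.
  def show_text_with_positioning(params)
    params.each { |item| @strings << item if item.is_a?(String) }
  end
end
```

With the pdf-reader gem loaded, creating receiver = TextCollector.new and calling page.walk(receiver) would be expected to fill receiver.strings in content-stream order.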
Depending on the algorithm, it may be possible to parse an encrypted file. For standard PDF encryption you'll need the :password option:
  reader = PDF::Reader.new("somefile.pdf", :password => "apples")
objects [R]: low-level, hash-like access to all objects in the underlying PDF
DEPRECATED: this method was deprecated in version 1.0.0 and will eventually be removed.
Parse the file with the given name, sending events to the given receiver.
Creates a new document reader for the provided PDF.
The input can be a filename or an IO-like object (StringIO, File, etc.) containing a PDF:
  reader = PDF::Reader.new("somefile.pdf")

  File.open("somefile.pdf", "rb") do |file|
    reader = PDF::Reader.new(file)
  end
If the source file is encrypted, you can provide a password for decrypting it:
  reader = PDF::Reader.new("somefile.pdf", :password => "apples")
DEPRECATED: this method was deprecated in version 1.0.0 and will eventually be removed.
Parse the given string, sending events to the given receiver.
Returns a single PDF::Reader::Page for the specified page. Use this instead of the pages method when you need to access just a single page.
  reader = PDF::Reader.new("somefile.pdf")
  page = reader.page(10)
  puts page.text
See the docs for PDF::Reader::Page to read more about the methods available on each page.
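When the page number comes from user input, it can help to check it against page_count before asking for the page. The helper below is a sketch under that assumption. The name page_text_or_nil is hypothetical and not part of PDF::Reader, and it relies on page numbers starting at 1, as in the examples above.

```ruby
# Hypothetical helper, not part of PDF::Reader.
# Returns the text of page `number` (1-based), or nil when the
# document has no such page.
def page_text_or_nil(reader, number)
  return nil unless number.is_a?(Integer) && number.between?(1, reader.page_count)

  reader.page(number).text
end
```

Returning nil keeps the decision about a missing page with the caller, instead of depending on how a particular library version reacts to an out-of-range number.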
Returns an array of PDF::Reader::Page objects, one for each page in the source PDF.
  reader = PDF::Reader.new("somefile.pdf")
  reader.pages.each do |page|
    puts page.fonts
    puts page.images
    puts page.text
  end
See the docs for PDF::Reader::Page to read more about the methods available on each page.