Class PDFStreamParser

java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.PDFStreamParser

public class PDFStreamParser extends BaseParser
Parses a PDF byte stream and extracts its tokens (operators and their operands).
  • Field Details

    • LOG

      private static final org.apache.commons.logging.Log LOG
      Log instance.
    • streamObjects

      private final List<Object> streamObjects
    • MAX_BIN_CHAR_TEST_LENGTH

      private static final int MAX_BIN_CHAR_TEST_LENGTH
    • binCharTestArr

      private final byte[] binCharTestArr
  • Constructor Details

  • Method Details

    • parse

      public void parse() throws IOException
      Parses all the tokens in the stream and closes the stream when parsing is finished. The parsed tokens can then be retrieved with getTokens().
      Throws:
      IOException - If there is an error while parsing the stream.
    • getTokens

      public List<Object> getTokens()
      This will get the tokens that were parsed from the stream by the parse() method.
      Returns:
      All of the tokens in the stream.
    • parseNextToken

      public Object parseNextToken() throws IOException
      This will parse the next token in the stream.
      Returns:
      The next token in the stream or null if there are no more tokens in the stream.
      Throws:
      IOException - If an I/O error occurs while parsing the stream.
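      The relationship between parse(), getTokens() and parseNextToken() can be sketched without PDFBox on the classpath. The class below is a hypothetical stand-in: ToyTokenParser and its whitespace-splitting rule are assumptions made for illustration and are not PDFBox code. It only shows the documented contract, where parseNextToken() returns null once the input is exhausted and parse() collects every token for later retrieval.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; this is not PDFBox code.
final class ToyTokenParser {
    private final String input;
    private int pos = 0;
    private final List<Object> tokens = new ArrayList<>();

    ToyTokenParser(String input) {
        this.input = input;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    Object parseNextToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return input.substring(start, pos);
    }

    // Drains the input by calling parseNextToken() until it returns null.
    void parse() {
        Object token;
        while ((token = parseNextToken()) != null) {
            tokens.add(token);
        }
    }

    // Returns the tokens collected by parse().
    List<Object> getTokens() {
        return tokens;
    }
}
```

      With the real class the same choice applies: call parse() once and read the result from getTokens(), or call parseNextToken() in a loop when tokens should be handled one at a time.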
    • hasNoFollowingBinData

      private boolean hasNoFollowingBinData(SequentialSource pdfSource) throws IOException
      Reads ahead a number of bytes and checks whether they contain only ASCII characters (no control sequences etc.) and whether these characters begin with a sequence of 1-3 non-blank characters between blanks.
      Returns:
      true if next bytes are probably printable ASCII characters starting with a PDF operator, otherwise false
      Throws:
      IOException
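      The check described for hasNoFollowingBinData can be approximated as follows. This is a simplified sketch and not the PDFBox source: the class name BinaryDataHeuristic, the set of bytes treated as blanks, and the requirement of a leading and a trailing blank are assumptions, and the sketch works on a plain byte array rather than a SequentialSource.

```java
// Simplified sketch of the heuristic described above; not the PDFBox source.
final class BinaryDataHeuristic {

    // Blank bytes for this sketch: NUL, tab, LF, CR, space (an assumption).
    static boolean isBlank(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0D || b == 0x20;
    }

    // True if the first len bytes look like printable ASCII that starts with
    // a token of 1-3 non-blank characters enclosed by blanks.
    static boolean looksLikeOperatorText(byte[] buf, int len) {
        // Any non-blank control byte or any byte above 0x7E counts as binary data.
        for (int i = 0; i < len; i++) {
            int b = buf[i] & 0xFF;
            if (!isBlank(b) && (b < 0x20 || b > 0x7E)) {
                return false;
            }
        }
        // Require at least one leading blank, then skip the run of blanks.
        int start = 0;
        while (start < len && isBlank(buf[start] & 0xFF)) {
            start++;
        }
        if (start == 0) {
            return false;
        }
        // Measure the first run of non-blank characters.
        int end = start;
        while (end < len && !isBlank(buf[end] & 0xFF)) {
            end++;
        }
        int tokenLength = end - start;
        // The token must be 1-3 characters long and be followed by a blank.
        return tokenLength >= 1 && tokenLength <= 3 && end < len;
    }
}
```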
    • readOperator

      protected String readOperator() throws IOException
      This will read an operator from the stream.
      Returns:
      The operator that was read from the stream.
      Throws:
      IOException - If there is an error reading from the stream.
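      Reading an operator amounts to skipping leading whitespace and then collecting characters until the token ends. The sketch below illustrates that idea with the JDK only. It is not the PDFBox implementation: OperatorReaderSketch, the use of PushbackInputStream, and the exact stop conditions (PDF whitespace and delimiter characters) are assumptions made for this example.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Illustrative sketch only; not the PDFBox implementation of readOperator().
final class OperatorReaderSketch {

    // PDF whitespace characters: NUL, tab, LF, form feed, CR, space.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // PDF delimiter characters.
    static boolean isDelimiter(int c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    // Skips leading whitespace, then reads characters up to the next
    // whitespace or delimiter. The terminating byte is pushed back.
    static String readOperator(PushbackInputStream in) throws IOException {
        int c = in.read();
        while (c != -1 && isWhitespace(c)) {
            c = in.read();
        }
        StringBuilder operator = new StringBuilder();
        while (c != -1 && !isWhitespace(c) && !isDelimiter(c)) {
            operator.append((char) c);
            c = in.read();
        }
        if (c != -1) {
            in.unread(c); // leave the terminator for the next read
        }
        return operator.toString();
    }
}
```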
    • isSpaceOrReturn

      private boolean isSpaceOrReturn(int c)
    • hasNextSpaceOrReturn

      private boolean hasNextSpaceOrReturn() throws IOException
      Checks if the next char is a space or a return.
      Returns:
      true if the next char is a space or a return
      Throws:
      IOException - if something went wrong
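      Checking the next character without consuming it is a peek operation. A minimal JDK-only sketch follows. It assumes that "space or return" means space, carriage return, or line feed, and it uses PushbackInputStream in place of the parser's own source; PeekSketch and both method bodies are illustrative and are not taken from PDFBox.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Illustrative sketch only; the character set below is an assumption.
final class PeekSketch {

    // Treats line feed (10), carriage return (13) and space (32) as matches.
    static boolean isSpaceOrReturn(int c) {
        return c == 10 || c == 13 || c == 32;
    }

    // Reads one byte, pushes it back, and reports whether it was a space
    // or a return. At end of stream nothing is pushed back and the result is false.
    static boolean hasNextSpaceOrReturn(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c != -1) {
            in.unread(c);
        }
        return isSpaceOrReturn(c);
    }
}
```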
    • close

      public void close() throws IOException
      Close the underlying resource.
      Throws:
      IOException - if something went wrong