Package org.apache.pdfbox.multipdf
Class PDFMergerUtility
java.lang.Object
org.apache.pdfbox.multipdf.PDFMergerUtility
This class takes a list of PDF documents and merges them, saving the
result in a new document.
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static enum PDFMergerUtility.AcroFormMergeMode
    The mode to use when merging AcroForm between documents:
    PDFMergerUtility.AcroFormMergeMode.JOIN_FORM_FIELDS_MODE: fields with the same fully qualified name will be merged into one, with the widget annotations of the merged fields becoming part of the same field.
    Although the API is finalized, processing of different form field types is still in development. Currently only (nested) text fields work, with intermediate nodes being present.
static enum PDFMergerUtility.DocumentMergeMode
    The mode to use when merging documents:
    PDFMergerUtility.DocumentMergeMode.OPTIMIZE_RESOURCES_MODE: optimizes resource handling such as closing documents early.
-
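The joining behaviour described for JOIN_FORM_FIELDS_MODE can be pictured with a small stand-alone sketch. It deliberately does not use PDFBox: the Field class and the joinFields method are hypothetical names, and widget annotations are reduced to plain strings. It only illustrates the idea that fields sharing a fully qualified name collapse into one field that carries all of their widgets; it is not how the library implements the merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JoinFieldsSketch {

    /** Hypothetical stand-in for a form field: a fully qualified name plus its widgets. */
    static final class Field {
        final String fullyQualifiedName;
        final List<String> widgets;

        Field(String fullyQualifiedName, List<String> widgets) {
            this.fullyQualifiedName = fullyQualifiedName;
            this.widgets = widgets;
        }
    }

    /**
     * Joins destination and source fields by fully qualified name. A source field
     * whose name already exists adds its widgets to the existing entry instead of
     * creating a second field.
     */
    static Map<String, List<String>> joinFields(List<Field> destination, List<Field> source) {
        Map<String, List<String>> joined = new LinkedHashMap<>();
        for (Field f : destination) {
            joined.computeIfAbsent(f.fullyQualifiedName, k -> new ArrayList<>()).addAll(f.widgets);
        }
        for (Field f : source) {
            joined.computeIfAbsent(f.fullyQualifiedName, k -> new ArrayList<>()).addAll(f.widgets);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Field> dest = Arrays.asList(new Field("person.name", Arrays.asList("widget-A")));
        List<Field> src = Arrays.asList(
                new Field("person.name", Arrays.asList("widget-B")),
                new Field("person.age", Arrays.asList("widget-C")));
        // "person.name" ends up as one field with two widgets; "person.age" stays separate.
        System.out.println(joinFields(dest, src));
    }
}
```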
Field Summary
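Fields
Modifier and Type    Field    Description
private PDDocumentInformation destinationDocumentInformation
private String destinationFileName
private PDMetadata destinationMetadata
private OutputStream destinationStream
private boolean ignoreAcroFormErrors
private static final org.apache.commons.logging.Log LOG
    Log instance.
private int nextFieldNum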
-
Constructor Summary
Constructors
PDFMergerUtility()
    Instantiate a new PDFMergerUtility.
-
Method Summary
Modifier and Type    Method    Description
private void acroFormJoinFieldsMode(PDFCloneUtility cloner, PDAcroForm destAcroForm, PDAcroForm srcAcroForm)
private void acroFormLegacyMode(PDFCloneUtility cloner, PDAcroForm destAcroForm, PDAcroForm srcAcroForm)
void addSource(File source)
    Add a source file to the list of files to merge.
void addSource(InputStream source)
    Add a source to the list of documents to merge.
void addSource(String source)
    Add a source file to the list of files to merge.
void addSources(List<InputStream> sourcesList)
    Add a list of sources to the list of documents to merge.
void appendDocument(PDDocument destination, PDDocument source)
    Append all pages from source to destination.
private void cleanupFieldCOSDictionary(COSDictionary fieldCos)
private void cleanupWidgetCOSDictionary(COSDictionary widgetCos, boolean removeDAEntry)
PDFMergerUtility.AcroFormMergeMode getAcroFormMergeMode()
    Get the merge mode to be used for merging AcroForms between documents (PDFMergerUtility.AcroFormMergeMode).
PDDocumentInformation getDestinationDocumentInformation()
    Get the destination document information that is to be set in mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting).
String getDestinationFileName()
    Get the name of the destination file.
PDMetadata getDestinationMetadata()
    Get the destination metadata that is to be set in mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting).
OutputStream getDestinationStream()
    Get the destination OutputStream.
PDFMergerUtility.DocumentMergeMode getDocumentMergeMode()
    Get the merge mode to be used for merging documents (PDFMergerUtility.DocumentMergeMode).
(package private) static Map<String, PDStructureElement> getIDTreeAsMap(PDNameTreeNode<PDStructureElement> idTree)
(package private) static Map<Integer, COSObjectable> getNumberTreeAsMap(PDNumberTreeNode tree)
private boolean hasOnlyDocumentsOrParts(COSArray kLevelOneArray)
private boolean isDynamicXfa(PDAcroForm acroForm)
    Test for dynamic XFA content.
boolean isIgnoreAcroFormErrors()
    Indicates if acroform errors are ignored or not.
private void legacyMergeDocuments(MemoryUsageSetting memUsageSetting)
    Merge the list of source documents, saving the result in the destination file.
private void mergeAcroForm(PDFCloneUtility cloner, PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
    Merge the contents of the source form into the destination form for the destination file.
void mergeDocuments()
    Deprecated.
void mergeDocuments(MemoryUsageSetting memUsageSetting)
    Merge the list of source documents, saving the result in the destination file.
private void mergeFields(PDFCloneUtility cloner, PDField destField, PDField srcField)
private void mergeIDTree(PDFCloneUtility cloner, PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree)
private void mergeInto(COSDictionary src, COSDictionary dst, Set<COSName> exclude)
    Adds all keys/values of the source dictionary to the destination dictionary, but only if they are not in an exclusion list and do not already exist.
private void mergeKEntries(PDFCloneUtility cloner, PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree)
private void mergeLanguage(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
private void mergeMarkInfo(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
private void mergeOutputIntents(PDFCloneUtility cloner, PDDocumentCatalog srcCatalog, PDDocumentCatalog destCatalog)
private void mergeRoleMap(PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree)
private void mergeViewerPreferences(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
private void optimizedMergeDocuments(MemoryUsageSetting memUsageSetting)
void setAcroFormMergeMode(PDFMergerUtility.AcroFormMergeMode theAcroFormMergeMode)
    Set the merge mode to be used for merging AcroForms between documents (PDFMergerUtility.AcroFormMergeMode).
void setDestinationDocumentInformation(PDDocumentInformation info)
    Set the destination document information that is to be set in mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting).
void setDestinationFileName(String destination)
    Set the name of the destination file.
void setDestinationMetadata(PDMetadata meta)
    Set the destination metadata that is to be set in mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting).
void setDestinationStream(OutputStream destStream)
    Set the destination OutputStream.
void setDocumentMergeMode(PDFMergerUtility.DocumentMergeMode theDocumentMergeMode)
    Set the merge mode to be used for merging documents (PDFMergerUtility.DocumentMergeMode).
void setIgnoreAcroFormErrors(boolean ignoreAcroFormErrorsValue)
    Set to true to ignore acroform errors.
private void updatePageReferences(PDFCloneUtility cloner, Map<Integer, COSObjectable> numberTreeAsMap, Map<COSDictionary, COSDictionary> objMapping)
    Update the Pg and Obj references to the new (merged) page.
private void updatePageReferences(PDFCloneUtility cloner, COSArray parentTreeEntry, Map<COSDictionary, COSDictionary> objMapping)
private void updatePageReferences(PDFCloneUtility cloner, COSDictionary parentTreeEntry, Map<COSDictionary, COSDictionary> objMapping)
    Update the Pg and Obj references to the new (merged) page.
private void updateParentEntry(COSArray kArray, COSDictionary newParent, COSName newStructureType)
    Update the P reference to the new parent dictionary.
private void updateStructParentEntries(PDPage page, int structParentOffset)
    Update the StructParents and StructParent values in a PDPage.
-
Field Details
-
LOG
private static final org.apache.commons.logging.Log LOG
Log instance.
-
sources
-
destinationFileName
-
destinationStream
-
ignoreAcroFormErrors
private boolean ignoreAcroFormErrors
-
destinationDocumentInformation
-
destinationMetadata
-
documentMergeMode
-
acroFormMergeMode
-
nextFieldNum
private int nextFieldNum
-
-
Constructor Details
-
PDFMergerUtility
public PDFMergerUtility()
Instantiate a new PDFMergerUtility.
-
-
Method Details
-
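getAcroFormMergeMode
public PDFMergerUtility.AcroFormMergeMode getAcroFormMergeMode()
Get the merge mode to be used for merging AcroForms between documents (PDFMergerUtility.AcroFormMergeMode).
-
setAcroFormMergeMode
public void setAcroFormMergeMode(PDFMergerUtility.AcroFormMergeMode theAcroFormMergeMode)
Set the merge mode to be used for merging AcroForms between documents (PDFMergerUtility.AcroFormMergeMode).
-
setDocumentMergeMode
public void setDocumentMergeMode(PDFMergerUtility.DocumentMergeMode theDocumentMergeMode)
Set the merge mode to be used for merging documents (PDFMergerUtility.DocumentMergeMode).
-
getDocumentMergeMode
public PDFMergerUtility.DocumentMergeMode getDocumentMergeMode()
Get the merge mode to be used for merging documents (PDFMergerUtility.DocumentMergeMode).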
-
getDestinationFileName
public String getDestinationFileName()
Get the name of the destination file.
Returns:
The name of the destination file.
-
setDestinationFileName
public void setDestinationFileName(String destination)
Set the name of the destination file.
Parameters:
destination - The destination to set.
-
getDestinationStream
public OutputStream getDestinationStream()
Get the destination OutputStream.
Returns:
The destination OutputStream.
-
setDestinationStream
public void setDestinationStream(OutputStream destStream)
Set the destination OutputStream.
Parameters:
destStream - The destination to set.
-
getDestinationDocumentInformation
public PDDocumentInformation getDestinationDocumentInformation()
Get the destination document information that is to be set in mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting). The default is null, which means that it is ignored.
Returns:
The destination document information.
-
setDestinationDocumentInformation
public void setDestinationDocumentInformation(PDDocumentInformation info)
Set the destination document information that is to be set in mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting). The default is null, which means that it is ignored.
Parameters:
info - The destination document information.
-
getDestinationMetadata
public PDMetadata getDestinationMetadata()
Get the destination metadata that is to be set in mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting). The default is null, which means that it is ignored.
Returns:
The destination metadata.
-
setDestinationMetadata
public void setDestinationMetadata(PDMetadata meta)
Set the destination metadata that is to be set in mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting). The default is null, which means that it is ignored.
Parameters:
meta - The destination metadata.
-
addSource
public void addSource(String source) throws FileNotFoundException
Add a source file to the list of files to merge.
Parameters:
source - Full path and file name of source document.
Throws:
FileNotFoundException - If the file doesn't exist
-
addSource
public void addSource(File source) throws FileNotFoundException
Add a source file to the list of files to merge.
Parameters:
source - File representing source document
Throws:
FileNotFoundException - If the file doesn't exist
-
addSource
public void addSource(InputStream source)
Add a source to the list of documents to merge.
Parameters:
source - InputStream representing source document
-
addSources
public void addSources(List<InputStream> sourcesList)
Add a list of sources to the list of documents to merge.
Parameters:
sourcesList - List of InputStream objects representing source documents
-
mergeDocuments
public void mergeDocuments() throws IOException
Deprecated. Use mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting) instead.
Merge the list of source documents, saving the result in the destination file.
Throws:
IOException - If there is an error saving the document.
-
mergeDocuments
public void mergeDocuments(MemoryUsageSetting memUsageSetting) throws IOException
Merge the list of source documents, saving the result in the destination file. The source list is not reset after the merge. If you want to merge one document at a time, it is better to use appendDocument(org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.pdmodel.PDDocument).
Parameters:
memUsageSetting - defines how memory is used for buffering PDF streams; if null, unrestricted main memory is used
Throws:
IOException - If there is an error saving the document.
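A typical call sequence for the methods documented above might look like the following sketch. It assumes a PDFBox 2.x jar on the classpath (where MemoryUsageSetting.setupMainMemoryOnly() is available), and the file names first.pdf, second.pdf and merged.pdf are hypothetical placeholders.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

public class MergeExample {
    public static void main(String[] args) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();

        // Hypothetical input files; addSource(File) throws FileNotFoundException
        // if a file does not exist.
        merger.addSource(new File("first.pdf"));
        merger.addSource(new File("second.pdf"));

        // Hypothetical output file name.
        merger.setDestinationFileName("merged.pdf");

        // Buffer PDF streams in main memory only (assumes PDFBox 2.x).
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
    }
}
```

Because the source list is not reset after the merge, reusing the same instance for an unrelated merge would include the earlier sources again; creating a fresh PDFMergerUtility per merge avoids that.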
-
optimizedMergeDocuments
private void optimizedMergeDocuments(MemoryUsageSetting memUsageSetting) throws IOException
Throws:
IOException
-
legacyMergeDocuments
private void legacyMergeDocuments(MemoryUsageSetting memUsageSetting) throws IOException
Merge the list of source documents, saving the result in the destination file.
Parameters:
memUsageSetting - defines how memory is used for buffering PDF streams; if null, unrestricted main memory is used
Throws:
IOException - If there is an error saving the document.
-
appendDocument
public void appendDocument(PDDocument destination, PDDocument source) throws IOException
Append all pages from source to destination.
Parameters:
destination - the document to receive the pages
source - the document originating the new pages
Throws:
IOException - If there is an error accessing data from either document.
-
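mergeViewerPreferences
private void mergeViewerPreferences(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
-
mergeLanguage
private void mergeLanguage(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
-
mergeMarkInfo
private void mergeMarkInfo(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
-
mergeKEntries
private void mergeKEntries(PDFCloneUtility cloner, PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree) throws IOException
Throws:
IOException
-
hasOnlyDocumentsOrParts
private boolean hasOnlyDocumentsOrParts(COSArray kLevelOneArray)
-
updateParentEntry
private void updateParentEntry(COSArray kArray, COSDictionary newParent, COSName newStructureType)
Update the P reference to the new parent dictionary.
Parameters:
kArray - the kids array
newParent - the new parent
newStructureType - the new structure type in /S, or null so it doesn't get replaced
-
mergeIDTree
private void mergeIDTree(PDFCloneUtility cloner, PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree) throws IOException
Throws:
IOException
-
getIDTreeAsMap
static Map<String,PDStructureElement> getIDTreeAsMap(PDNameTreeNode<PDStructureElement> idTree) throws IOException
Throws:
IOException
-
getNumberTreeAsMap
static Map<Integer,COSObjectable> getNumberTreeAsMap(PDNumberTreeNode tree) throws IOException
Throws:
IOException
-
mergeRoleMap
private void mergeRoleMap(PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree)
-
mergeOutputIntents
private void mergeOutputIntents(PDFCloneUtility cloner, PDDocumentCatalog srcCatalog, PDDocumentCatalog destCatalog) throws IOException
Throws:
IOException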
-
mergeAcroForm
private void mergeAcroForm(PDFCloneUtility cloner, PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog) throws IOException
Merge the contents of the source form into the destination form for the destination file.
Parameters:
cloner - the object cloner for the destination document
destAcroForm - the destination form
srcAcroForm - the source form
Throws:
IOException - If an error occurs while adding the field.
-
acroFormJoinFieldsMode
private void acroFormJoinFieldsMode(PDFCloneUtility cloner, PDAcroForm destAcroForm, PDAcroForm srcAcroForm) throws IOException
Throws:
IOException
-
mergeFields
private void mergeFields(PDFCloneUtility cloner, PDField destField, PDField srcField)
-
cleanupFieldCOSDictionary
private void cleanupFieldCOSDictionary(COSDictionary fieldCos)
-
cleanupWidgetCOSDictionary
private void cleanupWidgetCOSDictionary(COSDictionary widgetCos, boolean removeDAEntry)
-
acroFormLegacyMode
private void acroFormLegacyMode(PDFCloneUtility cloner, PDAcroForm destAcroForm, PDAcroForm srcAcroForm) throws IOException
Throws:
IOException
-
isIgnoreAcroFormErrors
public boolean isIgnoreAcroFormErrors()
Indicates if acroform errors are ignored or not.
Returns:
true if acroform errors are ignored
-
setIgnoreAcroFormErrors
public void setIgnoreAcroFormErrors(boolean ignoreAcroFormErrorsValue)
Set to true to ignore acroform errors.
Parameters:
ignoreAcroFormErrorsValue - true if acroform errors should be ignored
-
updatePageReferences
private void updatePageReferences(PDFCloneUtility cloner, Map<Integer, COSObjectable> numberTreeAsMap, Map<COSDictionary, COSDictionary> objMapping) throws IOException
Update the Pg and Obj references to the new (merged) page.
Throws:
IOException
-
updatePageReferences
private void updatePageReferences(PDFCloneUtility cloner, COSDictionary parentTreeEntry, Map<COSDictionary, COSDictionary> objMapping) throws IOException
Update the Pg and Obj references to the new (merged) page.
Parameters:
parentTreeEntry -
objMapping - mapping between old and new references
Throws:
IOException
-
updatePageReferences
private void updatePageReferences(PDFCloneUtility cloner, COSArray parentTreeEntry, Map<COSDictionary, COSDictionary> objMapping) throws IOException
Throws:
IOException
-
updateStructParentEntries
private void updateStructParentEntries(PDPage page, int structParentOffset) throws IOException
Update the StructParents and StructParent values in a PDPage.
Parameters:
page - the new page
structParentOffset - the offset which should be applied
Throws:
IOException
-
isDynamicXfa
private boolean isDynamicXfa(PDAcroForm acroForm)
Test for dynamic XFA content.
Parameters:
acroForm - the AcroForm
Returns:
true if there is a dynamic XFA form.
-
mergeInto
private void mergeInto(COSDictionary src, COSDictionary dst, Set<COSName> exclude)
Adds all keys/values of the source dictionary to the destination dictionary, but only if they are not in an exclusion list and do not already exist there. If a key already exists in the destination dictionary, nothing is changed for that key.
Parameters:
src - The source dictionary to get the keys/values from.
dst - The destination dictionary to merge the keys/values into.
exclude - Names of keys that shall be skipped.
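The rule described for mergeInto (copy a key only if it is neither excluded nor already present) can be sketched with plain java.util collections. This is an illustration of the documented behaviour, not the library's code: it uses Map and Set in place of COSDictionary and COSName, and the class name MergeIntoSketch and the sample keys are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MergeIntoSketch {

    /**
     * Copies entries from src into dst, skipping keys that are listed in
     * exclude or that dst already contains. Existing dst values are never
     * overwritten.
     */
    static <K, V> void mergeInto(Map<K, V> src, Map<K, V> dst, Set<K> exclude) {
        for (Map.Entry<K, V> entry : src.entrySet()) {
            K key = entry.getKey();
            if (!exclude.contains(key) && !dst.containsKey(key)) {
                dst.put(key, entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> src = new LinkedHashMap<>();
        src.put("Lang", "en-US");
        src.put("Pages", "src-pages");
        src.put("Title", "Source");

        Map<String, String> dst = new LinkedHashMap<>();
        dst.put("Title", "Destination");

        Set<String> exclude = new HashSet<>(Arrays.asList("Pages"));
        mergeInto(src, dst, exclude);

        // "Title" is kept as-is, "Pages" is excluded, "Lang" is copied.
        System.out.println(dst); // prints {Title=Destination, Lang=en-US}
    }
}
```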
-