Reading PDF File Attachment Annotations with iTextSharp

Question

I have the following issue: my PDF has an XML file attached to it as an annotation, not as an embedded file. I tried to read it with the code from the following link:

iTextSharp - How to open/read/extract a file attachment?

It works for embedded files but not for file attachments stored as annotations.

I searched Google for ways to extract annotations from a PDF and found the following link: Reading PDF Annotations with iText

So the annotation type is "File Attachment Annotations".

Could someone show a working example?

Thanks in advance for any help.

Recommended answer

As is so often the case with questions concerning iText and iTextSharp, one should first look at the keyword list on itextpdf.com. There you find the entries "File attachment, extract attachments", which reference two Java samples from iText in Action, 2nd Edition:

(The old keyword list no longer exists; the itextpdf.com site now offers other ways of searching for examples, but I won't describe them lest the site change again and I end up with dead links once more...)

The samples based on iText in Action, 2nd Edition are:

  • part4.chapter16.KubrickDvds
      • Java, iText 5.x
      • Java, iText 7.x
      • .Net, iText 5.x
  • part4.chapter16.KubrickDocumentary
      • Java, iText 5.x
      • Java, iText 7.x
      • .Net, iText 5.x

Here are the samples from iText 5.

(I haven't found ports of the samples to .Net and iText 7, but based on the other sources such a port should not be too difficult...)

KubrickDvds contains the following method, extractAttachments (Java) / ExtractAttachments (C#), to extract file attachment annotations:

Java, iText 5.x:

/**
 * Extracts attachments from an existing PDF.
 * @param src   the path to the existing PDF
 */
public void extractAttachments(String src) throws IOException {
    PdfReader reader = new PdfReader(src);
    PdfArray array;
    PdfDictionary annot;
    PdfDictionary fs;
    PdfDictionary refs;
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        array = reader.getPageN(i).getAsArray(PdfName.ANNOTS);
        if (array == null) continue;
        for (int j = 0; j < array.size(); j++) {
            annot = array.getAsDict(j);
            if (PdfName.FILEATTACHMENT.equals(annot.getAsName(PdfName.SUBTYPE))) {
                fs = annot.getAsDict(PdfName.FS);
                refs = fs.getAsDict(PdfName.EF);
                for (PdfName name : refs.getKeys()) {
                    FileOutputStream fos
                        = new FileOutputStream(String.format(PATH, fs.getAsString(name).toString()));
                    fos.write(PdfReader.getStreamBytes((PRStream)refs.getAsStream(name)));
                    fos.flush();
                    fos.close();
                }
            }
        }
    }
    reader.close();
}
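
The Java sample above formats its output file name with a `PATH` constant that the excerpt does not show, and the name itself comes from the PDF. A minimal sketch of how such a pattern and the write step might look, using only the Java standard library; the pattern value `results/%s`, the class name `AttachmentWriter`, and the name clean-up in `safeName` are assumptions for illustration and are not part of the iText samples:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class AttachmentWriter {
    // Assumed pattern; the original samples define their own PATH constant.
    static final String PATH = "results/%s";

    /** Keeps only the last path segment so a name stored in the PDF cannot leave the target folder. */
    static String safeName(String rawName) {
        String name = rawName.replace('\\', '/');
        name = name.substring(name.lastIndexOf('/') + 1);
        boolean unusable = name.isEmpty() || name.equals(".") || name.equals("..");
        return unusable ? "attachment.bin" : name;
    }

    /** Writes the attachment bytes to the file named by the pattern and returns that file's path. */
    static Path write(String pattern, String rawName, byte[] content) throws IOException {
        Path target = Paths.get(String.format(pattern, safeName(rawName)));
        if (target.getParent() != null) {
            Files.createDirectories(target.getParent());
        }
        return Files.write(target, content);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes that the iText stream call would return.
        byte[] xml = "<order id=\"1\"/>".getBytes(StandardCharsets.UTF_8);
        Path written = write(PATH, "../data/order.xml", xml);
        System.out.println("Wrote " + written);
    }
}
```

In the extraction loop, the bytes returned for each stream would be passed as `content` in place of the hand-made array used here.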

Java, iText 7.x:

public void extractAttachments(String src) throws IOException {
    // The PdfDocument wraps the only PdfReader needed; no second reader is opened.
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src));
    PdfArray array;
    PdfDictionary annot;
    PdfDictionary fs;
    PdfDictionary refs;
    for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
        // Annotations are stored per page in the /Annots array.
        array = pdfDoc.getPage(i).getPdfObject().getAsArray(PdfName.Annots);
        if (array == null) continue;
        for (int j = 0; j < array.size(); j++) {
            annot = array.getAsDictionary(j);
            if (PdfName.FileAttachment.equals(annot.getAsName(PdfName.Subtype))) {
                // /FS is the file specification; its /EF entry holds the embedded streams.
                fs = annot.getAsDictionary(PdfName.FS);
                refs = fs.getAsDictionary(PdfName.EF);
                for (PdfName name : refs.keySet()) {
                    FileOutputStream fos
                            = new FileOutputStream(String.format(PATH, fs.getAsString(name).toString()));
                    fos.write(refs.getAsStream(name).getBytes());
                    fos.flush();
                    fos.close();
                }
            }
        }
    }
    pdfDoc.close();
}

C#, iText 5.x:

/**
 * Extracts attachments from an existing PDF.
 * @param src the bytes of the existing PDF
 * @param zip the ZipFile object to which the extracted files are added
 */
public void ExtractAttachments(byte[] src, ZipFile zip) {
  PdfReader reader = new PdfReader(src);
  for (int i = 1; i <= reader.NumberOfPages; i++) {
    PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
    if (array == null) continue;
    for (int j = 0; j < array.Size; j++) {
      PdfDictionary annot = array.GetAsDict(j);
      if (PdfName.FILEATTACHMENT.Equals(
          annot.GetAsName(PdfName.SUBTYPE)))
      {
        PdfDictionary fs = annot.GetAsDict(PdfName.FS);
        PdfDictionary refs = fs.GetAsDict(PdfName.EF);
        foreach (PdfName name in refs.Keys) {
          zip.AddEntry(
            fs.GetAsString(name).ToString(), 
            PdfReader.GetStreamBytes((PRStream)refs.GetAsStream(name))
          );
        }
      }
    }
  }
}

KubrickDocumentary contains the following method, extractDocLevelAttachments (Java) / ExtractDocLevelAttachments (C#), to extract document-level attachments:

Java, iText 5.x:

/**
 * Extracts document level attachments
 * @param filename     a file from which document level attachments will be extracted
 * @throws IOException
 */
public void extractDocLevelAttachments(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    PdfDictionary root = reader.getCatalog();
    PdfDictionary documentnames = root.getAsDict(PdfName.NAMES);
    PdfDictionary embeddedfiles = documentnames.getAsDict(PdfName.EMBEDDEDFILES);
    PdfArray filespecs = embeddedfiles.getAsArray(PdfName.NAMES);
    PdfDictionary filespec;
    PdfDictionary refs;
    FileOutputStream fos;
    PRStream stream;
    for (int i = 0; i < filespecs.size(); ) {
      filespecs.getAsString(i++);
      filespec = filespecs.getAsDict(i++);
      refs = filespec.getAsDict(PdfName.EF);
      for (PdfName key : refs.getKeys()) {
        fos = new FileOutputStream(String.format(PATH, filespec.getAsString(key).toString()));
        stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
        fos.write(PdfReader.getStreamBytes(stream));
        fos.flush();
        fos.close();
      }
    }
    reader.close();
}
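
The loop above advances `i` twice per iteration because the `Names` array of a PDF name tree stores alternating keys and values: a name string followed by the file specification that belongs to it. A small stand-alone sketch of that pairing, with plain Java lists standing in for `PdfArray`; the class name `NameTreePairs` and the sample entries are made up for illustration:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NameTreePairs {
    /** Turns a flat [key, value, key, value, ...] list into an ordered map. */
    static Map<String, Object> toMap(List<?> flat) {
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of entries");
        }
        Map<String, Object> pairs = new LinkedHashMap<>();
        for (int i = 0; i < flat.size(); ) {
            String key = (String) flat.get(i++); // even index: the name
            Object value = flat.get(i++);        // odd index: stands in for the file specification
            pairs.put(key, value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical entries; a real array would hold PDF strings and dictionaries.
        List<Object> flat = Arrays.asList("invoice.xml", "filespec-1", "notes.txt", "filespec-2");
        toMap(flat).forEach((name, spec) -> System.out.println(name + " -> " + spec));
    }
}
```

The sample method discards the key with `filespecs.getAsString(i++)` and only uses the value, which is why that call's result is never assigned.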

Java, iText 7.x:

public void extractDocLevelAttachments(String src) throws IOException {
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src));
    PdfDictionary root = pdfDoc.getCatalog().getPdfObject();
    PdfDictionary documentnames = root.getAsDictionary(PdfName.Names);
    PdfDictionary embeddedfiles = documentnames.getAsDictionary(PdfName.EmbeddedFiles);
    PdfArray filespecs = embeddedfiles.getAsArray(PdfName.Names);
    PdfDictionary filespec;
    PdfDictionary refs;
    FileOutputStream fos;
    PdfStream stream;
    for (int i = 0; i < filespecs.size(); ) {
        filespecs.getAsString(i++);
        filespec = filespecs.getAsDictionary(i++);
        refs = filespec.getAsDictionary(PdfName.EF);
        for (PdfName key : refs.keySet()) {
            fos = new FileOutputStream(String.format(PATH, filespec.getAsString(key).toString()));
            stream = refs.getAsStream(key);
            fos.write(stream.getBytes());
            fos.flush();
            fos.close();
        }
    }
    pdfDoc.close();
}

C#, iText 5.x:

/**
 * Extracts document level attachments
 * @param pdf the bytes of the PDF from which document level attachments will be extracted
 * @param zip the ZipFile object to which the extracted files are added
 */
public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) {
  PdfReader reader = new PdfReader(pdf);
  PdfDictionary root = reader.Catalog;
  PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
  PdfDictionary embeddedfiles = 
      documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
  PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
  for (int i = 0; i < filespecs.Size; ) {
    filespecs.GetAsString(i++);
    PdfDictionary filespec = filespecs.GetAsDict(i++);
    PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
    foreach (PdfName key in refs.Keys) {
      PRStream stream = (PRStream) PdfReader.GetPdfObject(
        refs.GetAsIndirectObject(key)
      );
      zip.AddEntry(
        filespec.GetAsString(key).ToString(), 
        PdfReader.GetStreamBytes(stream)
      );
    }
  }
}

(For some reason the C# examples put the extracted files into a ZIP file while the Java versions write them to the file system... oh well...)
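
If the Java versions should collect the extracted files in a ZIP archive as the C# samples do, the standard library's `java.util.zip` package is enough. The following is only a sketch under that assumption: the class name `AttachmentZip` and the in-memory map of names to bytes are illustrative, and the iText calls that would supply the bytes are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class AttachmentZip {
    /** Packs name -> content pairs into an in-memory ZIP archive. */
    static byte[] zip(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey())); // one entry per attachment
                zos.write(file.getValue());
                zos.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; in the samples these bytes would come from the embedded streams.
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("order.xml", "<order/>".getBytes(StandardCharsets.UTF_8));
        byte[] archive = zip(files);
        System.out.println("ZIP size in bytes: " + archive.length);
    }
}
```

Two attachments with the same name would make `putNextEntry` fail with a duplicate-entry error, so a real port would need to make the names unique first.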
