Turn a PDF into an array of PNGs using JavaScript (with pdf.js)

Problem description

I'm trying to develop front-end code that asks the user to provide a PDF and then, entirely in the user's browser, produces an array of PNGs (as data URLs), where each entry in the array corresponds to one page of the PDF:

dat[0] = png of page 1
dat[1] = png of page 2
...

When I test the code below, the pages are somehow rendered on top of each other and rotated.

<script src="http://cdnjs.cloudflare.com/ajax/libs/processing.js/1.4.1/processing-api.min.js"></script><html>
<!--
  Created using jsbin.com
  Source can be edited via http://jsbin.com/pdfjs-helloworld-v2/8598/edit
-->
<body>
  <canvas id="the-canvas" style="border:1px solid black"></canvas>
  <input id='pdf' type='file'/>

  <!-- Use latest PDF.js build from Github -->

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
  <script src="pdf.js"></script>
  <script src="pdf.worker.js"></script>
  <script type="text/javascript">
    //
    // Asynchronous download PDF as an ArrayBuffer
    //
    dat = [];
    

    var pdf = document.getElementById('pdf');
    pdf.onchange = function(ev) {
      if (file = document.getElementById('pdf').files[0]) {
        fileReader = new FileReader();
        fileReader.onload = function(ev) {
          //console.log(ev);
          PDFJS.getDocument(fileReader.result).then(function getPdfHelloWorld(pdf) {
            //
            // Fetch the first page
            //
            number_of_pages = pdf.numPages;

            for(i = 1; i < number_of_pages+1; ++i) {
              pdf.getPage(i).then(function getPageHelloWorld(page) {

              var scale = 1;
              var viewport = page.getViewport(scale);

              //
              // Prepare canvas using PDF page dimensions
              //
              var canvas = document.getElementById('the-canvas');
              var context = canvas.getContext('2d');
              canvas.height = viewport.height;
              canvas.width = viewport.width;

              //
              // Render PDF page into canvas context
              //
              var renderContext = {
                canvasContext: context,
                viewport: viewport};
              page.render(renderContext).then(function() {
                dat.push(canvas.toDataURL('image/png'));
              });
              });
            }
            //console.log(pdf.numPages);
            //console.log(pdf)

          }, function(error){
            console.log(error);
          });
        };
        fileReader.readAsArrayBuffer(file);
      }
    }

  </script>


<style id="jsbin-css">

</style>
<script>

</script>
</body>
</html>

I'm only interested in the array dat. When I render the images in the array, I see:

dat[0] = png of page 1 (correct)
dat[1] = png of page 1 and png of page 2, rotated 180 degrees, on top of each other
...

How do I ensure that each entry of the array contains a correct rendering of a single page?

Recommended answer

Try rendering each page on its own canvas. The code in the question draws every page into the same 'the-canvas' element, so the asynchronous renders draw over one another. You can create a canvas and append it to a container using

var canvasdiv = document.getElementById('canvas');      
var canvas = document.createElement('canvas');
canvasdiv.appendChild(canvas);

var url = 'https://file-examples-com.github.io/uploads/2017/10/file-sample_150kB.pdf';

var PDFJS = window['pdfjs-dist/build/pdf'];

PDFJS.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';

var loadingTask = PDFJS.getDocument(url);

loadingTask.promise.then(function(pdf) {

  var canvasdiv = document.getElementById('canvas');
  var totalPages = pdf.numPages
  var data = [];

  for (let pageNumber = 1; pageNumber <= totalPages; pageNumber++) {
    pdf.getPage(pageNumber).then(function(page) {

      var scale = 1.5;
      var viewport = page.getViewport({ scale: scale });

      var canvas = document.createElement('canvas');
      canvasdiv.appendChild(canvas);

      // Prepare canvas using PDF page dimensions
      var context = canvas.getContext('2d');
      canvas.height = viewport.height;
      canvas.width = viewport.width;

      // Render PDF page into canvas context
      var renderContext = { canvasContext: context, viewport: viewport };

      var renderTask = page.render(renderContext);
      renderTask.promise.then(function() {
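        // Store by page index so data[0] is always page 1, even if renders finish out of order
        data[pageNumber - 1] = canvas.toDataURL('image/png');
        console.log('page ' + pageNumber + ' loaded in data');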
      });
    });
  }

}, function(reason) {
  // PDF loading error
  console.error(reason);
});

canvas {
  border: 1px solid black;
  margin: 5px;
  width: 25%;
}

<script src="//mozilla.github.io/pdf.js/build/pdf.js"></script>

<div id="canvas"></div>
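
If only the array is needed, the same idea can be wrapped in a function that resolves once every page has been rendered. The following is a minimal sketch, not part of the original answer. The names pdfToPngArray, renderPageToPng, fileToPdfSource and the createCanvas parameter are hypothetical. It assumes the same PDF.js calls used in the answer above (getPage, getViewport({ scale }), render(...).promise) and a browser that supports Blob.arrayBuffer() for reading the selected file.

```javascript
// Read a user-selected File (or any Blob) into the { data: Uint8Array } form
// that PDF.js getDocument() accepts. Assumes Blob.arrayBuffer() is available.
async function fileToPdfSource(file) {
  const buffer = await file.arrayBuffer();
  return { data: new Uint8Array(buffer) };
}

// Render one page into its own canvas and return it as a PNG data URL.
async function renderPageToPng(pdf, pageNumber, scale, createCanvas) {
  const page = await pdf.getPage(pageNumber);
  const viewport = page.getViewport({ scale: scale });

  // A fresh canvas per page, so concurrent renders never share a drawing surface
  const canvas = createCanvas();
  canvas.width = viewport.width;
  canvas.height = viewport.height;

  await page.render({
    canvasContext: canvas.getContext('2d'),
    viewport: viewport
  }).promise;

  return canvas.toDataURL('image/png');
}

// Resolve to an array where index 0 is page 1, index 1 is page 2, and so on.
// createCanvas is a hypothetical factory; by default it makes a detached <canvas>.
function pdfToPngArray(pdf, scale = 1.5,
                       createCanvas = () => document.createElement('canvas')) {
  const jobs = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    jobs.push(renderPageToPng(pdf, pageNumber, scale, createCanvas));
  }
  // Promise.all keeps results in input order, whatever order the renders finish in
  return Promise.all(jobs);
}
```

In the question's setup, this could be wired to the file input by passing the result of fileToPdfSource(file) to PDFJS.getDocument(...), waiting for its promise, and then calling pdfToPngArray on the loaded document. Because the canvases are never appended to the page, nothing is displayed; only the array of data URLs is produced.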
