Creating a web app that rotates or deletes PDF pages and saves the result

Screen shot

PDF - page rotation / remove page

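The app rotates or deletes specified pages of a PDF file and reassembles the result as a new PDF file. The aim is to achieve this with a deliberately limited feature set and a minimal configuration.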
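As a rough illustration of the per-page decision described above, the sketch below turns a list of per-page choices into a list of pages to keep and the clockwise angle for each. The choice strings (`nc`, `cw`, `ccw`, `180`, `del`) are the radio-button values used in main.html; the function name `plan_pages` and the `ANGLES` table are hypothetical and not part of the original code.

```python
from typing import Dict, List, Tuple

# Clockwise rotation in degrees for each radio-button value used by the form.
# "nc" means "no change"; "del" is handled separately as a removal.
ANGLES: Dict[str, int] = {"nc": 0, "cw": 90, "ccw": 270, "180": 180}


def plan_pages(choices: List[str]) -> List[Tuple[int, int]]:
    """Return (page_index, clockwise_angle) for every page that is kept."""
    plan = []
    for index, choice in enumerate(choices):
        if choice == "del":
            continue  # removed pages never reach the output
        # Unknown values fall back to "no change", as main.py does.
        plan.append((index, ANGLES.get(choice, 0)))
    return plan


print(plan_pages(["nc", "cw", "del", "180"]))  # [(0, 0), (1, 90), (3, 180)]
```

Each tuple can then be applied with pypdf by calling `page.rotate(angle)` when the angle is non-zero and `writer.add_page(page)`, which is what main.py does inline.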

Sample Site

An example deployed on the Sakura Internet rental server, Standard plan.
(Because it is intended for operational testing, the PDF file size is limited to about 1 MB.)
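The listing in this post does not show how the size limit is enforced, so the following is only one possible approach: read the upload in chunks and stop as soon as the limit is exceeded. The name `read_limited` and the exact value of `MAX_PDF_BYTES` are assumptions made for illustration.

```python
import io
from typing import BinaryIO

MAX_PDF_BYTES = 1_000_000  # assumed value for "about 1 MB"


def read_limited(stream: BinaryIO, limit: int = MAX_PDF_BYTES) -> bytes:
    """Read the whole stream, raising ValueError once it exceeds `limit` bytes."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(64 * 1024)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise ValueError(f"upload exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


data = read_limited(io.BytesIO(b"%PDF-1.7 tiny"))
print(len(data))
```

In a FastAPI handler the same function could be given `file.file` from an `UploadFile`, with the `ValueError` translated into an error response.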

https://yanmos.jpn.org/editpdf/editpdf

In the local environment the app runs under uvicorn; on the rental server it runs via CGI using a2wsgi.
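The CGI entry script itself is not shown in this post. The sketch below only illustrates the mechanism with the standard library: a WSGI application is run once per request by a CGI-style handler that writes the response to standard output. `demo_wsgi_app` and `run_as_cgi` are hypothetical names; on the server, the WSGI application would presumably be `a2wsgi.ASGIMiddleware(app)` wrapping the FastAPI app, run with `wsgiref.handlers.CGIHandler()`.

```python
import io
from wsgiref.handlers import BaseCGIHandler


def demo_wsgi_app(environ, start_response):
    """Stand-in WSGI app; in deployment this would be the wrapped FastAPI app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def run_as_cgi(wsgi_app, environ: dict) -> bytes:
    """Serve one CGI-style request and return the raw bytes written to stdout."""
    stdout = io.BytesIO()
    handler = BaseCGIHandler(
        stdin=io.BytesIO(), stdout=stdout, stderr=io.StringIO(), environ=environ
    )
    handler.run(wsgi_app)
    return stdout.getvalue()


raw = run_as_cgi(demo_wsgi_app, {"REQUEST_METHOD": "GET", "PATH_INFO": "/"})
print(raw.decode("iso-8859-1"))
```

The response starts with a `Status:` line rather than an HTTP status line, which is how a CGI program reports its status to the web server.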

Structure

flowchart LR
  subgraph fastAPI
  cgi[main.py]
  end
  subgraph uikit
  html[main.html]
  end
  infile((.pdf file))
  outfile((.pdf file))
  html-->cgi-->outfile
  infile-->html

Library, Framework

The libraries and frameworks used are listed below.

UIkit・・・Page appearance
FastAPI・・・Python web framework
Ghostscript・・・PostScript interpreter
pypdf・・・Python PDF library
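To show where Ghostscript fits in, the sketch below builds the argument list that main.py passes to `subprocess.run` when rendering page thumbnails. The flags are the ones used in main.py; wrapping them in a function named `thumbnail_command` with a `dpi` parameter is an assumption made for illustration.

```python
from typing import List


def thumbnail_command(gs_cmd: str, pdf_name: str, dpi: int = 16) -> List[str]:
    """Build the Ghostscript argument list that renders one PNG per page."""
    return [
        gs_cmd,
        "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",  # quiet, restricted file access, non-interactive
        "-sDEVICE=png256",                        # 8-bit colour PNG output device
        f"-r{dpi}",                               # resolution in dpi; a low value gives small thumbnails
        "-sOutputFile=page_%03d.png",             # %03d is replaced by the page number
        pdf_name,
    ]


cmd = thumbnail_command("gs", "input.pdf")
print(" ".join(cmd))
```

Passing the command as a list, rather than a single shell string, avoids quoting problems when the uploaded file name contains spaces.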

Directory Tree

Directory layout at deployment time.

editpdf/
├── static/
│   ├── css/
│   │   └── uikit.min.css
│   └── js/
│       ├── uikit-icons.min.js
│       └── uikit.min.js
├── templates/
│   └── main.html
├── work/
└── main.py
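The `work/` directory is empty at deployment; main.py creates one uniquely named subdirectory per upload inside it with `tempfile.mkdtemp`. The sketch below shows that step in isolation. Creating `work/` automatically when it is missing is an addition not present in the original code, and `make_upload_dir` is a hypothetical name.

```python
import os
import tempfile


def make_upload_dir(base_dir: str, work_dir: str = "work") -> str:
    """Create `work/` if missing, then a unique subdirectory for one upload."""
    work_root = os.path.join(base_dir, work_dir)
    os.makedirs(work_root, exist_ok=True)
    # mkdtemp creates a new, uniquely named directory and returns its path.
    return tempfile.mkdtemp(dir=work_root)


with tempfile.TemporaryDirectory() as base:
    upload_dir = make_upload_dir(base)
    print(os.path.basename(os.path.dirname(upload_dir)))  # work
```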

Source code

The source code is shown below. It differs in some details from the version actually in operation.

main.html

<!DOCTYPE html>
<html lang="ja">

<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link href="{{ url_for('static', path='/css/uikit.min.css') }}" rel="stylesheet">
  <script type=text/javascript src="{{ url_for('static', path='/js/uikit.min.js') }}"></script>
  <script type=text/javascript src="{{ url_for('static', path='/js/uikit-icons.min.js') }}"></script>
  <link rel="shortcut icon" href="#">
  <title>Edit PDF Pages - .pdf</title>
  <style>
    .uk-navbar {
      background-color: darkred;
    }
    .uk-navbar * {
      color: white;
    }
  </style>
</head>

<body class="text-center">

  <header>
    <nav class="uk-container uk-navbar">
      <div class="uk-navbar-left">
        <a class="uk-navbar-item uk-logo" href="#">PDF</a>
        <ul class="uk-navbar-nav uk-visible@s"></ul>
      </div>
    </nav>
  </header>

  <main>
    <div class="uk-container uk-width-expand">

      <div class="uk-container">

        <form id="work" class="uk-form">
          <p class="uk-margin-top">Drag and drop the .pdf file to the area below or click the icon to select it</p>

          <div class="js-upload uk-placeholder uk-text-center uk-background-muted uk-padding-small">
            <div class="uk-text-right">
              <div uk-form-custom>
                <input id="inputpdf" type="file" />
                <span class="upload-icon"><span uk-icon='icon: cloud-upload; ratio: 1.2'></span></span>
              </div>
            </div>
          </div>

          <progress id="js-progressbar" class="uk-progress" value="0" max="100" hidden></progress>
          <div id="downloadlink" class="uk-margin-top"></div>
        </form>

        <form id="exec" class="uk-form" action="/editpdf/editpdf/output" method="POST" hidden="hidden">
          <div class="uk-margin-top">
            <p id="outputtext" class="uk-margin-top uk-text-small"></p>
            <table class="uk-table">
              <tbody id="thums">
              </tbody>
            </table>
            <button id="execbutton" class="uk-button uk-button-default" type="submit">EXECUTE</button>
          </div>
        </form>

      </div>
    </div>
  </main>

  <script>

    function setParameter(formid, id, name, value) {
      if (document.getElementById(id) == null) {
        var el = document.createElement('input');
        el.setAttribute('type', 'hidden');
        el.setAttribute('id', id);
        el.setAttribute('name', name);
        el.setAttribute('value', value);
        document.getElementById(formid).appendChild(el);
      } else {
        var el = document.getElementById(id);
        el.setAttribute('name', name);
        el.setAttribute('value', value);
      }
    }

    function display_thum(c) {
      document.getElementById("work").setAttribute('hidden', 'hidden');
      document.getElementById("exec").removeAttribute('hidden');
      var tbody = document.getElementById("thums");
      var files = JSON.parse(c)["files"]
      var thums = files["thums"];
      var thumsp = files["thumsp"];
      for (var i = 0; i < thums.length; i++) {
        const bname = thums[i].substring(0, thums[i].lastIndexOf("."));
        var tr = document.createElement("tr");
        var tdi = document.createElement("td");
        var tdf = document.createElement("td");
        var tds = "<div class=\"uk-margin uk-grid-small uk-child-width-auto uk-grid\">\n";
        var imghtml = "<img class=\"uk-box-shadow-small\" src=\"" + thumsp[i] + "\">\n";
        var radiohtml = `
          <label><input id="${bname}_nc" class="uk-radio" type="radio" name="${bname}_r" value="nc" checked> N.C.</label>
          <label><input id="${bname}_cw" class="uk-radio" type="radio" name="${bname}_r" value="cw"> CW 90 deg.</label>
          <label><input id="${bname}_ccw" class="uk-radio" type="radio" name="${bname}_r" value="ccw"> CCW 90 deg.</label>
          <label><input id="${bname}_180" class="uk-radio" type="radio" name="${bname}_r" value="180"> Turn 180 deg.</label>
          <label><input id="${bname}_del" class="uk-radio" type="radio" name="${bname}_r" value="del"> Remove</label>
        `;
        var tde = "</div>\n";
        tdi.innerHTML = tds + imghtml + tde;
        tdf.innerHTML = tds + radiohtml + tde;
        tr.appendChild(tdi);
        tr.appendChild(tdf);
        tbody.appendChild(tr);
      }
      setParameter("exec", "infoname", "infoname", files["infoname"]);
    }

    // Disable drag & drop
    window.addEventListener('dragover', function (e) {
      e.preventDefault();
    }, false);
    window.addEventListener('drop', function (e) {
      e.preventDefault();
      e.stopPropagation();
    }, false);

    var bar = document.getElementById('js-progressbar');
    UIkit.upload('.js-upload', {
      url: '/editpdf/editpdf/upload',
      method: 'post',
      multiple: true,
      allow: '*.pdf',
      mime: 'application/pdf',
      name: 'file',
      error: function (e) {
        console.log('Error', arguments, e);
      },
      loadStart: function (e) {
        bar.removeAttribute('hidden');
        bar.max = e.total;
        bar.value = e.loaded;
      },
      progress: function (e) {
        bar.max = e.total;
        bar.value = e.loaded;
      },
      loadEnd: function (e) {
        bar.max = e.total;
        bar.value = e.loaded;
        display_thum(e.target.response)
      },
      completeAll: function (e) {
        setTimeout(function () {
          bar.setAttribute('hidden', 'hidden');
        }, 1000);
      }
    });

  </script>

</body>

</html>

main.py

import os
import socket
import platform
import pypdf
import shutil
import subprocess
import glob
import json
import tempfile
from fastapi import FastAPI, Request, UploadFile, File
from fastapi.responses import HTMLResponse, FileResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

WORK_DIR = r'work'
INFO_FILE = r'info.json'

BASE_DIR = os.path.dirname(os.path.abspath(__file__))

PLATFORM = platform.system()
if PLATFORM == 'Windows':       # windows
    HOSTNAME = socket.gethostname()
    GS_CMD = r'C:\Program Files\gs\gs9.52\bin\gswin64c.exe'
    DIR_SEP = '\\'
elif PLATFORM == 'Darwin':      # local mac
    HOSTNAME = 'localhost'
    GS_CMD = r'gs'
    DIR_SEP = '/'
elif PLATFORM == 'FreeBSD':     # freebsd
    HOSTNAME = 'yanmos.jpn.org'
    GS_CMD = r'gs'
    DIR_SEP = '/'
elif PLATFORM == 'Linux':       # Linux
    HOSTNAME = socket.gethostname()
    GS_CMD = r'gs'
    DIR_SEP = '/'
else:
    # Raising a plain string is a TypeError in Python 3; raise a real exception.
    raise RuntimeError('Error: Unknown platform.')

app = FastAPI()
app.mount("/static", StaticFiles(directory="static"), name="static")
templates = Jinja2Templates(directory="templates")

@app.get("/", response_class=JSONResponse)
async def read_root():
    return {"Hello": "World"}

@app.get("/editpdf", response_class=HTMLResponse)
async def start_editpdf(request: Request):
    os.chdir(BASE_DIR)
    return templates.TemplateResponse("main.html", {"request": request})

@app.post("/editpdf/upload", response_class=JSONResponse)
async def upload_file_pdf(file: UploadFile = File(...)):
    os.chdir(BASE_DIR)
    uploadf = file
    files = dict()
    fnames = dict()
    pnames = dict()

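    fnames, pnames = makeFilePath(uploadf)
    if pnames is None:
        # makeFilePath returns (None, None) when no file name was supplied.
        return JSONResponse(status_code=400, content={"error": "No file name was supplied."})
    files['fnames'] = fnames
    if pnames['workFilePath'] != '':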
        with open(pnames['workFilePath'], 'wb') as fp:
            fp.write(uploadf.file.read())

        shutil.copy2(pnames['workFilePath'], pnames['outFilePath'])
        thums = getThums(pnames, fnames['inputFilename'])
        files['thums'] = thums
        thumsp = [f"/editpdf/img/{pnames['workDirPath'].split(DIR_SEP)[-1]}/" + th for th in thums]
        files['thumsp'] = thumsp

    infopath = os.path.join(pnames['workDirPath'], INFO_FILE)
    files['infoname'] = os.path.join(fnames['workDirname'], INFO_FILE)
    with open(infopath, "w") as finfo:
        json.dump(files, finfo)

    return {"files": files}

@app.get("/editpdf/img/{dname}/{fname}", response_class=FileResponse)
async def download_img_file(dname, fname, request: Request):
    os.chdir(BASE_DIR)
    fpath = os.path.join(BASE_DIR, WORK_DIR, dname, fname)
    return FileResponse(fpath)

@app.post("/editpdf/output", response_class=FileResponse)
async def rotate_page(request: Request):
    form = await request.form()
    infopath = os.path.join(BASE_DIR, WORK_DIR, form['infoname'])
    with open(infopath, "r") as finfo:
        files = json.load(finfo)

    pnames = reMakeFilePath(files['fnames'])
    os.chdir(pnames['workDirPath'])
    with open(pnames['workFilePath'], "rb") as inpdf:
        pdfr = pypdf.PdfReader(inpdf)
        pdfw = pypdf.PdfWriter()
        for i in range(pdfr.get_num_pages()):
            page = pdfr.get_page(i)
            rname = os.path.splitext(files['thums'][i])[0] + "_r"
            if form[rname] != 'del':
                if form[rname] != 'nc':
                    if form[rname] == 'cw':
                        angle = 90
                    elif form[rname] == 'ccw':
                        angle = 270
                    elif form[rname] == '180':
                        angle = 180
                    else:
                        angle = 0

                    if angle != 0:
                        page.rotate(angle)

                pdfw.add_page(page)

        with open(pnames['outFilePath'], "wb") as outpdf:
            pdfw.write(outpdf)

    return FileResponse(pnames['outFilePath'])

def getThums(pnames:dict, inputFilename:str):
    os.chdir(pnames['workDirPath'])
    pdfreader = pypdf.PdfReader(pnames['workFilePath'])
    npage = len(pdfreader.pages)
    outopt = '-sOutputFile=page_%03d.png'
    cmd = [
        GS_CMD,
        '-q', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-sDEVICE=png256', '-r16',
        outopt,
        inputFilename
    ]
    subprocess.run(cmd)
    fs = sorted(glob.glob('*.png'))
    return fs

def makeFilePath(srcfile):
    if not (srcfile.filename == ''):
        # os.path.basename handles both '/' and '\\' separators, unlike split('/').
        dirName = os.path.basename(tempfile.mkdtemp(dir=os.path.join(BASE_DIR, WORK_DIR)))
        if PLATFORM == 'Windows':       # windows
            inputFilename = os.path.basename(srcfile.filename)
        elif PLATFORM == 'Darwin':      # mac
            inputFilename = srcfile.filename
        elif PLATFORM == 'FreeBSD':     # freebsd
            inputFilename = os.path.basename(srcfile.filename)
        elif PLATFORM == 'Linux':       # Linux
            inputFilename = os.path.basename(srcfile.filename)
        else:
            raise RuntimeError('Unknown platform error.')

        base, ext = os.path.splitext(inputFilename)
        outputFilename = base + '_ep' + ext
        workDirPath = os.path.join(BASE_DIR, WORK_DIR, dirName)
        workFilePath = os.path.join(workDirPath, inputFilename)
        outFilePath = os.path.join(workDirPath, outputFilename)
        fnames = {
            'workDirname': dirName,
            'inputFilename': inputFilename,
            'outputFilename': outputFilename
        }
        pnames = {
            'workDirPath': workDirPath,
            'workFilePath': workFilePath,
            'outFilePath': outFilePath
        }
    else:
        fnames = None
        pnames = None

    return fnames, pnames

def reMakeFilePath(fnames):
    if fnames is not None:
        dirName = fnames['workDirname']
        inputFilename = fnames['inputFilename']
        outputFilename = fnames['outputFilename']
        workDirPath = os.path.join(BASE_DIR, WORK_DIR, dirName)
        workFilePath = os.path.join(workDirPath, inputFilename)
        outFilePath = os.path.join(workDirPath, outputFilename)
        pnames = {
            'workDirPath': workDirPath,
            'workFilePath': workFilePath,
            'outFilePath': outFilePath
        }
    else:
        pnames = None

    return pnames

if __name__ == "__main__":
    pass
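Both `/editpdf/output` (through the `infoname` form field) and `/editpdf/img/{dname}/{fname}` join client-supplied names onto the work directory. The listing does not validate those names, so a production version might want a guard along the following lines. This is a sketch under that assumption; `resolve_inside` is a hypothetical helper and not part of the original code.

```python
import os


def resolve_inside(root: str, relative: str) -> str:
    """Join `relative` onto `root`, rejecting results that escape `root`."""
    root_real = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_real, relative))
    # commonpath equals root_real only when target lies inside (or is) root_real.
    if os.path.commonpath([root_real, target]) != root_real:
        raise ValueError(f"path escapes work directory: {relative!r}")
    return target


print(resolve_inside("work", os.path.join("tmpabc123", "info.json")))
try:
    resolve_inside("work", os.path.join("..", "main.py"))
except ValueError as exc:
    print("rejected:", exc)
```

In main.py the first argument would be `os.path.join(BASE_DIR, WORK_DIR)`, and a `ValueError` would be turned into an error response instead of opening the file.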

2024/05/06 Fixed a malfunction in the Windows environment
2024/05/04 First edition
