掌上百科 - PDAWIKI

[Discussion] Crop PDF pages along the spine direction

    Posted on 2018-12-15 22:58:03
    Last edited by GL_n on 2018-12-16 02:24
    Sometimes we need to split double-page PDF spreads along the spine (i.e., the middle line) into two separate pages: for example, two A4 book pages scanned onto one A3 sheet, or two A5 book pages laid out on one A4 sheet. Some cutting tools can split such two-up pages, but the procedure is not automatic, and repeating the crop by hand for every page is tedious.

    How can the repeated cropping be automated? A Python program is a good choice for this task. The following code is one example that crops double-page PDF spreads, whatever the page format.

    #coding=utf-8
    # Uses the legacy PyPDF2 API (camelCase names such as PdfFileReader and
    # getPage); these names were removed in PyPDF2 3.0.
    from PyPDF2 import PdfFileWriter, PdfFileReader
    from copy import copy
    from os import listdir
    import math

    OUTPUT_SUFFIX = '-cut_myself_revised.pdf'

    def op(pdfInputFileName):
        """Split every two-page spread in the PDF into two pages along the spine."""
        # The reader loads pages lazily, so the input stays open until writing is done.
        with open(pdfInputFileName, 'rb') as pdfFileObj:
            pdfReader = PdfFileReader(pdfFileObj)
            pdfWriter = PdfFileWriter()

            for i in range(pdfReader.getNumPages()):
                p = pdfReader.getPage(i)
                q = copy(p)                    # shallow copy for the second half
                q.mediaBox = copy(p.mediaBox)  # q needs its own media box

                x_1, x_2 = p.mediaBox.lowerLeft
                x_3, x_4 = p.mediaBox.upperRight
                x_1, x_2 = math.floor(x_1), math.floor(x_2)
                x_3, x_4 = math.floor(x_3), math.floor(x_4)
                width, height = x_3 - x_1, x_4 - x_2

                # Midlines, measured from the lower-left corner so that a
                # non-zero origin is handled, plus a 5% overlap per half.
                x_5, x_6 = math.floor((x_1 + x_3) / 2), math.floor((x_2 + x_4) / 2)
                dx, dy = (x_5 - x_1) * 5 / 100, (x_6 - x_2) * 5 / 100

                # If your scanned pages already display upright in Adobe Acrobat,
                # this "if" block can be deleted.
                if width < height:
                    p = p.rotateClockwise(90)
                    q = q.rotateClockwise(90)

                if width > height:
                    # Landscape media box (typically an editable page):
                    # cut at the vertical midline x = x_5.
                    p.mediaBox.lowerLeft = (x_1, x_2)         # left half
                    p.mediaBox.upperRight = (x_5 + dx, x_4)
                    q.mediaBox.lowerLeft = (x_5 - dx, x_2)    # right half
                    q.mediaBox.upperRight = (x_3, x_4)
                else:
                    # Portrait media box (typically a scanned image page):
                    # cut at the horizontal midline y = x_6.
                    p.mediaBox.lowerLeft = (x_1, x_2)         # left half after rotation
                    p.mediaBox.upperRight = (x_3, x_6 + dy)
                    q.mediaBox.lowerLeft = (x_1, x_6 - dy)    # right half after rotation
                    q.mediaBox.upperRight = (x_3, x_4)

                pdfWriter.addPage(p)
                pdfWriter.addPage(q)

            pdfOutputFileName = pdfInputFileName[:-4] + OUTPUT_SUFFIX
            with open(pdfOutputFileName, 'wb') as pdfOutputFile:
                pdfWriter.write(pdfOutputFile)

    # Crop every PDF (editable pages and image pages alike) in the current directory.
    if __name__ == '__main__':
        for pdfInputFileName in listdir('.'):
            name = pdfInputFileName.lower()
            # Skip files produced by an earlier run of this script.
            if name.endswith('.pdf') and not name.endswith(OUTPUT_SUFFIX):
                op(pdfInputFileName)
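
The cropping geometry in the script can be checked without any PDF library. The sketch below is only an illustration: `split_box` and its `overlap` parameter are hypothetical names that do not appear in the script above. It assumes the cut runs across the longer side of the page box and that each half keeps roughly 5% of extra margin past the midline, which is how the script's scaling factors behave when the box starts at the origin.

```python
def split_box(left, bottom, right, top, overlap=0.05):
    """Return the two half-rectangles of a two-page spread.

    Each rectangle is (left, bottom, right, top) in PDF points.
    `overlap` is a hypothetical parameter: the fraction of a half page
    that each half extends past the midline.
    """
    width, height = right - left, top - bottom
    if width > height:
        # Landscape box: cut at the vertical midline.
        mid = left + width / 2
        margin = overlap * (width / 2)
        first = (left, bottom, mid + margin, top)
        second = (mid - margin, bottom, right, top)
    else:
        # Portrait (or square) box: cut at the horizontal midline.
        mid = bottom + height / 2
        margin = overlap * (height / 2)
        first = (left, bottom, right, mid + margin)
        second = (left, mid - margin, right, top)
    return first, second


if __name__ == "__main__":
    # A landscape sheet of 1200 x 800 points, chosen only for round numbers.
    for box in split_box(0, 0, 1200, 800):
        print(box)
```

Measuring the midline from the lower-left corner, rather than halving the upper-right coordinates, keeps the cut centered even when a page box does not start at (0, 0).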

    Posted on 2020-2-3 15:05:30
    Thanks for your great work.