This post was last edited by GL_n on 2018-12-16 02:24
Sometimes we need to split a PDF in which each page holds two book pages side by side, cutting along the spine (i.e., the middle line) to produce two new pages. Typical cases are two A4 book pages scanned onto one A3 sheet, or two A5 book pages laid out on one A4 sheet. Some tools already exist for cutting such double-page spreads. However, their cropping procedure is not automatic, and repeating the same crop page after page is tedious.
How can the repeated cropping be done automatically? Writing a Python program is a good choice for this task. The following code is one working example that crops double-page PDF pages, whatever the page format is.
#coding=utf-8
# This script uses the legacy PyPDF2 names (PdfFileReader, PdfFileWriter,
# getPage, rotateClockwise, ...), which later releases renamed.
from PyPDF2 import PdfFileWriter, PdfFileReader
from copy import copy
from os import listdir
import math


def op(pdfInputFileName):
    # Split every double page of one PDF into two pages and save the result.
    pdfFileObj = open(pdfInputFileName, 'rb')
    pdfReader = PdfFileReader(pdfFileObj)
    pdfWriter = PdfFileWriter()

    for page in [pdfReader.getPage(i) for i in range(pdfReader.getNumPages())]:
        p = page
        q = copy(p)
        q.mediaBox = copy(p.mediaBox)  # q needs its own media box so p and q can be cropped separately

        x_1, x_2 = p.mediaBox.lowerLeft
        x_3, x_4 = p.mediaBox.upperRight

        x_1, x_2 = math.floor(x_1), math.floor(x_2)
        x_3, x_4 = math.floor(x_3), math.floor(x_4)
        # Midpoints of the page; this assumes the lower-left corner is at (0, 0).
        x_5, x_6 = math.floor(x_3/2), math.floor(x_4/2)

        if x_3 < x_4:  # If your scanned page is displayed normally in Adobe Acrobat, this "if" statement can be deleted.
            p = p.rotateClockwise(90)
            q = q.rotateClockwise(90)

        # The 105/100 and 95/100 factors let each half extend 5% past the middle line.
        if x_3 > x_4:  # For editable page
            # Vertical cut at x_5, i.e., splitting the X-axis range
            p.mediaBox.lowerLeft = (x_1, x_2)  # Left part of two-page-rectangle
            p.mediaBox.upperRight = (x_5 * 105/100, x_4)

            q.mediaBox.lowerLeft = (x_5 * 95/100, x_2)
            q.mediaBox.upperRight = (x_3, x_4)  # Right part of two-page-rectangle

        else:  # For image page
            # Cut at x_6, i.e., splitting the Y-axis range
            p.mediaBox.lowerLeft = (x_1, x_2)
            p.mediaBox.upperRight = (x_3, x_6 * 105/100)  # Left part of two-page-rectangle

            q.mediaBox.lowerLeft = (x_1, x_6 * 95/100)
            q.mediaBox.upperRight = (x_3, x_4)  # Right part of two-page-rectangle

        pdfWriter.addPage(p)
        pdfWriter.addPage(q)

    pdfOutputFileName = pdfInputFileName[:-4] + '-cut_myself_revised.pdf'
    pdfOutputFile = open(pdfOutputFileName, 'wb')
    pdfWriter.write(pdfOutputFile)
    pdfFileObj.close()
    pdfOutputFile.close()


# Automatically crop every PDF (both editable pages and image pages) in the current directory.
for pdfInputFileName in listdir('.'):
    if pdfInputFileName.endswith(('.pdf', '.PDF')):
        op(pdfInputFileName)
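The cropping arithmetic in the listing above can be tried out on its own, without PyPDF2 or a PDF file. The following is only a minimal sketch and not part of the original post: the helper name `split_box` and its `overlap_percent` parameter are hypothetical. It reproduces the 105/100 and 95/100 factors of the listing, but measures the middle line from the lower-left corner, so it should also work for a page whose media box does not start at (0, 0).

```python
import math


def split_box(x1, y1, x2, y2, overlap_percent=5):
    """Split a page rectangle into two halves that overlap at the spine.

    (x1, y1) is the lower-left corner and (x2, y2) the upper-right corner,
    in PDF points. The longer side is the one that gets cut, as in the
    listing. overlap_percent is how far each half extends past the middle
    line, as a percentage of the half-page length.
    """
    width, height = x2 - x1, y2 - y1
    if width > height:
        # Landscape sheet: split the X range at the middle line.
        mid = x1 + math.floor(width / 2)
        margin = (mid - x1) * overlap_percent / 100
        first = (x1, y1, mid + margin, y2)
        second = (mid - margin, y1, x2, y2)
    else:
        # Portrait sheet: split the Y range at the middle line.
        mid = y1 + math.floor(height / 2)
        margin = (mid - y1) * overlap_percent / 100
        first = (x1, y1, x2, mid + margin)
        second = (x1, mid - margin, x2, y2)
    return first, second


# A landscape sheet of 1190 x 842 points (roughly A3):
left, right = split_box(0, 0, 1190, 842)
print(left)   # (0, 0, 624.75, 842)
print(right)  # (565.25, 0, 1190, 842)
```

Each returned tuple is (lower-left x, lower-left y, upper-right x, upper-right y), so the two corner pairs could be assigned to `mediaBox.lowerLeft` and `mediaBox.upperRight` in the same way as in the listing.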
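One thing to watch when running the script more than once in the same directory: the loop picks up every PDF, including the `-cut_myself_revised.pdf` files written by an earlier run, so those would be cut again. The following is a minimal sketch of one way to filter the file names first. The helper `pending_pdfs` is hypothetical and not part of the original post; it only assumes the output suffix used by `op()`.

```python
OUTPUT_SUFFIX = '-cut_myself_revised.pdf'  # suffix that op() appends to its output


def pending_pdfs(names, suffix=OUTPUT_SUFFIX):
    """Return the PDF file names in `names` that still need cropping.

    Skips names that are not PDFs, files produced by an earlier run,
    and inputs whose output file is already present in `names`.
    """
    present = set(names)
    todo = []
    for name in names:
        if not name.lower().endswith('.pdf'):
            continue  # not a PDF
        if name.endswith(suffix):
            continue  # output of an earlier run
        if name[:-4] + suffix in present:
            continue  # this input has already been cropped
        todo.append(name)
    return todo


print(pending_pdfs(['a.pdf', 'a-cut_myself_revised.pdf', 'b.PDF', 'notes.txt']))
# ['b.PDF']
```

With this helper, the last loop of the script could read `for pdfInputFileName in pending_pdfs(listdir('.')): op(pdfInputFileName)`. Note that the extension check here ignores case, so it also accepts mixed-case names such as `.Pdf`, which the original condition does not.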