Posted on 2018-11-16 21:27:49
    '''
    Based on xmllarge.py
    '''
    # from pyquery import PyQuery as pq
    from pathlib import Path


    def xml_iter(file, tag):
        '''
        Process huge xml files
        <tag> </tag> need to be in separate lines
        # TODO: in the middle of lines

        :file: file path
        :tag: element to retrieve
        '''
        tagb1 = '<' + tag + '>'
        tagb1 = tagb1.encode()

        tagb2 = '<' + tag + ' '
        tagb2 = tagb2.encode()

        tagb3 = '</' + tag + '>'
        tagb3 = tagb3.encode()

        with open(file, 'rb') as inputfile:
            append = False
            for line in inputfile:
                #~ if b'<tu>' in line or b'<tu ' in line:
                if tagb1 in line:
                    inputbuffer = line[line.index(tagb1):]
                    append = True
                elif tagb2 in line:
                    inputbuffer = line[line.index(tagb2):]
                    append = True
                #~ elif b'</tu>' in line:
                elif tagb3 in line:
                    inputbuffer += line[:line.index(tagb3) + len(tagb3)]
                    append = False
                    yield inputbuffer
                    #~ docitem = process_buffer(inputbuffer, id_num)
                    #~ print(id_num)
                    #~ id_num += 1
                    inputbuffer = b''
                elif append:
                    inputbuffer += line
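The TODO in the docstring (tags that are not on separate lines) could be handled by reading fixed-size chunks instead of lines. The following is only a rough sketch under that assumption, not the original author's code: the name `xml_iter_chunks` and the `chunk_size` parameter are made up for illustration, and nested elements with the same name or self-closing tags are not handled.

```python
import re


def xml_iter_chunks(file, tag, chunk_size=64 * 1024):
    """Yield each <tag>...</tag> element as bytes, wherever the tags fall.

    Hypothetical variant of xml_iter: it scans a byte buffer instead of
    lines, so opening and closing tags may share a line.
    """
    tag_bytes = tag.encode()
    # '<tag' followed by whitespace or '>' (so 'tu' does not match '<tuv>').
    open_re = re.compile(b'<' + re.escape(tag_bytes) + b'[\\s>]')
    close_tag = b'</' + tag_bytes + b'>'
    buffer = b''
    with open(file, 'rb') as inputfile:
        while True:
            chunk = inputfile.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            while True:
                start = open_re.search(buffer)
                if start is None:
                    # Keep only a tail that could hold a split opening tag.
                    buffer = buffer[-(len(tag_bytes) + 1):]
                    break
                end = buffer.find(close_tag, start.end())
                if end == -1:
                    # Element incomplete: drop the text before it, read more.
                    buffer = buffer[start.start():]
                    break
                end += len(close_tag)
                yield buffer[start.start():end]
                buffer = buffer[end:]
```

It is called the same way as `xml_iter`, for example `for elm in xml_iter_chunks(filename, 'tu'): ...`. The buffer never grows beyond one element plus one chunk, so memory use still depends on element size rather than file size.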
So many people are looking for this? I'll package it up as a small tool and post it in a while.
Usage of the Python 3 function above:

    resu = b''
    for elm in xml_iter(filename, 'tu'):
        resu += elm  # elm is bytes, so resu must start as b'', not ''
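The generator's memory footprint is tiny no matter how large the file is, since it only ever holds one element at a time.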
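To actually keep memory low, each element is best processed as soon as it is yielded rather than concatenated into one big result. A minimal sketch of that idea follows, assuming TMX-style `<tu>` elements containing `<seg>` children and no namespace prefixes declared outside the element. The helper name `iter_segments` and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def iter_segments(elements):
    """Yield the list of <seg> texts for each element, one element at a time."""
    for raw in elements:
        # Each item is one complete element as bytes, e.g. b'<tu>...</tu>'.
        node = ET.fromstring(raw)
        yield [seg.text for seg in node.iter('seg')]


# Stand-in for xml_iter(filename, 'tu'): any iterable of bytes works here.
sample = [
    b'<tu><tuv><seg>Hello</seg></tuv><tuv><seg>Bonjour</seg></tuv></tu>',
    b'<tu><tuv><seg>Bye</seg></tuv></tu>',
]
segments = list(iter_segments(sample))
```

With a real file, `iter_segments(xml_iter(filename, 'tu'))` would chain the two generators so that only one parsed element is alive at any moment.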