Posted on 2018-12-26 10:02:26
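This problem is far less of a headache with XPath; using regular expressions just overcomplicates a simple task. A Python implementation is attached below; it depends on the lxml library.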
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
File: replace_tilde_with_title.py
Author: zzhirong
Email: [email protected]
Description: Replace ~ inside span elements with the d:title attribute of the enclosing d:entry
"""
from lxml import etree

s = """<?xml version="1.0" encoding="UTF-8"?>
<d:dictionary xmlns="http://www.w3.org/1999/xml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="_38ja" d:title="xxx">
<d:index d:value="steal" d:title="steal"/><span class="hw">steal</span><br/>
<span class="ex">~ a visit <span class="tag1">(an interview)</span> </span><span class="ex_c">测试<span class="tag1">(测试)</span></span>
<span class="ex">~ a kiss </span><span class="ex_c">测试</span>
<span class="ex">~ rides on the train </span><span class="ex_c">测试</span>
</d:entry>
</d:dictionary>
"""

# Under Python 2, s is a byte string. Under Python 3, pass bytes instead,
# because lxml rejects str input that carries an encoding declaration.
xml = etree.XML(s)
D_NS = xml.nsmap["d"]      # namespace URI bound to the d: prefix
XML_NS = xml.nsmap[None]   # default namespace URI (used by the span elements)

for entry in xml.xpath("//d:entry", namespaces={"d": D_NS}):
    title = entry.get("{%s}title" % D_NS, "")
    # Only the direct span children of each entry are visited.
    for span in entry.iterfind("./{%s}span" % XML_NS):
        if span.text:  # span.text is None when a span starts with a child element
            span.text = span.text.replace("~", title)
print(etree.tostring(xml))
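If lxml is not installed, the same idea can be sketched with the standard library's `xml.etree.ElementTree` on Python 3. This is a swapped-in variant, not the script above: the function name `replace_tilde_with_title`, the shortened `SAMPLE` document, and the restriction to `class="ex"` spans are assumptions made for illustration. ElementTree has no `nsmap`, so the namespace URIs are hardcoded from the sample.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample document in the post.
D_NS = "http://www.apple.com/DTDs/DictionaryService-1.0.rng"
XML_NS = "http://www.w3.org/1999/xml"

# Shortened, hypothetical sample; encoded to bytes so the XML
# encoding declaration is honored by the parser.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<d:dictionary xmlns="http://www.w3.org/1999/xml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="_38ja" d:title="xxx">
<span class="hw">steal</span>
<span class="ex">~ a kiss </span><span class="ex_c">测试</span>
<span class="ex"><span class="tag1">(no leading text)</span></span>
</d:entry>
</d:dictionary>
""".encode("utf-8")


def replace_tilde_with_title(xml_bytes):
    """Hypothetical helper: parse, replace ~ in example spans, return the root."""
    root = ET.fromstring(xml_bytes)
    for entry in root.iter("{%s}entry" % D_NS):
        title = entry.get("{%s}title" % D_NS, "")
        # Direct span children with class="ex" only (an assumption of this sketch).
        for span in entry.findall("{%s}span[@class='ex']" % XML_NS):
            if span.text:  # None when the span starts with a child element
                span.text = span.text.replace("~", title)
    return root


if __name__ == "__main__":
    print(ET.tostring(replace_tilde_with_title(SAMPLE), encoding="unicode"))
```

Note that ElementTree may rewrite the namespace prefixes when serializing (for example to `ns0`, `ns1`) unless they are registered first with `ET.register_namespace`; the element content itself is unaffected.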