Posted on 2018-12-26 10:02:26
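With XPath this problem takes far less head-scratching; regular expressions turn a simple task into a complicated one. A Python implementation follows. It depends on the lxml library.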
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
File: replace_tilde_with_title.py
Author: zzhirong
Email: [email protected]
Description: Replace "~" in the span elements of a d:entry with that entry's d:title attribute.
"""

from lxml import etree

# Sample input. It is encoded to bytes because lxml rejects str input
# that carries an XML encoding declaration.
s = """<?xml version="1.0" encoding="UTF-8"?>
<d:dictionary xmlns="http://www.w3.org/1999/xml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="_38ja" d:title="xxx">
    <d:index d:value="steal" d:title="steal"/><span class="hw">steal</span><br/>
    <span class="ex">~ a visit <span class="tag1">(an interview)</span> </span><span class="ex_c">测试<span class="tag1">(测试)</span></span>
    <span class="ex">~ a kiss </span><span class="ex_c">测试</span>
    <span class="ex">~ rides on the train </span><span class="ex_c">测试</span>
</d:entry>
</d:dictionary>
""".encode("utf-8")

xml = etree.XML(s)
D_NS = xml.nsmap["d"]      # namespace bound to the "d" prefix
XML_NS = xml.nsmap[None]   # default namespace, which the span elements belong to

for entry in xml.xpath("//d:entry", namespaces={"d": D_NS}):
    title = entry.get("{%s}title" % D_NS, "")
    # Only the direct child spans of the entry are visited.
    for span in entry.iterfind("./{%s}span" % XML_NS):
        # span.text is None when a span starts with a child element.
        if span.text:
            span.text = span.text.replace("~", title)
print(etree.tostring(xml, encoding="unicode"))
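The script above only rewrites the leading text of spans that are direct children of the entry. If a "~" can also sit inside a nested span, or after one, a variant along the following lines might help. This is only a sketch under assumptions that are not in the original post: the function name `replace_tilde` is made up, and limiting the replacement to spans with `class="ex"` is a guess at what is wanted. It relies on lxml's "smart strings", where each XPath `text()` result knows its parent element and whether it is that element's text or tail.

```python
from lxml import etree


def replace_tilde(xml_bytes):
    """Replace "~" in all text inside span.ex elements with the entry's d:title.

    Hypothetical helper: expects UTF-8 bytes whose root element declares both
    a "d" prefix and a default namespace, like the sample document.
    """
    root = etree.XML(xml_bytes)
    # XPath needs a prefix for the default namespace; "x" is an arbitrary choice.
    ns = {"d": root.nsmap["d"], "x": root.nsmap[None]}
    title_attr = "{%s}title" % ns["d"]

    for entry in root.xpath("//d:entry", namespaces=ns):
        title = entry.get(title_attr, "")
        # Every text node below a span.ex, including the tails of nested spans.
        for text in entry.xpath('.//x:span[@class="ex"]//text()', namespaces=ns):
            if "~" not in text:
                continue
            owner = text.getparent()
            if text.is_text:
                owner.text = text.replace("~", title)
            else:
                # The node is the tail following a nested element.
                owner.tail = text.replace("~", title)

    return etree.tostring(root, encoding="unicode")
```

Calling `replace_tilde(s)` with the sample bytes from the script should give the same replacements as the original loop, while also covering any "~" nested deeper inside an example span. Spans with other classes, such as `hw`, are left untouched.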