Posted on 2018-12-26 10:02:26
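With XPath this problem does not need to be such a brain-teaser; using regular expressions turns a simple task into a complicated one. A Python implementation is attached below. It depends on the lxml library.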
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
File: replace_tilde_with_title.py
Author: zzhirong
Email: [email protected]
Description: Replace "~" in the span children of each d:entry with that entry's d:title attribute
"""

from lxml import etree

s = """<?xml version="1.0" encoding="UTF-8"?>
<d:dictionary xmlns="http://www.w3.org/1999/xml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="_38ja" d:title="xxx">
<d:index d:value="steal" d:title="steal"/><span class="hw">steal</span><br/>
<span class="ex">~ a visit <span class="tag1">(an interview)</span> </span><span class="ex_c">测试<span class="tag1">(测试)</span></span>
<span class="ex">~ a kiss </span><span class="ex_c">测试</span>
<span class="ex">~ rides on the train </span><span class="ex_c">测试</span>
</d:entry>
</d:dictionary>
"""

# On Python 3, lxml rejects a str that carries an encoding declaration, so parse bytes.
xml = etree.XML(s.encode("utf-8"))
D_NS = xml.nsmap["d"]       # namespace URI bound to the "d" prefix
XML_NS = xml.nsmap[None]    # default namespace URI (used by the span elements)

for entry in xml.xpath("//d:entry", namespaces={"d": D_NS}):
    # Attribute names use Clark notation: {namespace-uri}localname
    title = entry.get("{%s}title" % D_NS, "")
    # Only the direct span children of the entry are visited
    for span in entry.iterfind("./{%s}span" % XML_NS):
        if span.text:  # text is None when a span starts with a child element
            span.text = span.text.replace("~", title)
print(etree.tostring(xml, encoding="unicode"))
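The loop above only rewrites `span.text`, meaning the text before a span's first child element, and only for spans that are direct children of `d:entry`. If a `~` can also appear after a nested tag or inside a nested span, one possible variation is to let XPath select the text nodes themselves. The following is a minimal sketch under that assumption, not part of the original program. The function name `replace_tilde` and the `x` prefix for the default namespace are made up for illustration.

```python
from lxml import etree


def replace_tilde(root):
    """Replace "~" with the enclosing entry's d:title in every text node under a span."""
    # "x" is an arbitrary prefix chosen here for the document's default namespace.
    ns = {"d": root.nsmap["d"], "x": root.nsmap[None]}
    title_attr = "{%s}title" % ns["d"]
    for entry in root.xpath("//d:entry", namespaces=ns):
        title = entry.get(title_attr, "")
        # text() yields lxml "smart strings" that remember which element they belong to.
        nodes = entry.xpath(".//x:span//text()[contains(., '~')]", namespaces=ns)
        for node in nodes:
            owner = node.getparent()
            if node.is_text:
                owner.text = node.replace("~", title)
            elif node.is_tail:
                owner.tail = node.replace("~", title)
    return root
```

It would be called as `replace_tilde(etree.XML(s.encode("utf-8")))`, after which `etree.tostring(..., encoding="unicode")` serializes the result as before.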