Posted on 2016-10-13 18:42:44
lxchen2001 posted on 2016-10-13 17:44
I understand your question now. You want to split the article into individual sentences.
In the page's HTML the text is kept together; it only becomes two columns after processing ...
This should do it:
import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it with the lxml parser.
r = requests.get('http://www.cuyoo.com/article-30928-1.html')
soup = BeautifulSoup(r.text, 'lxml')

# .strings is a generator over every text node inside the element.
en = soup.find(id='en')
enstring = en.strings
cn = soup.find(id='cn')
cnstring = cn.strings

# Write English and Chinese text nodes alternately, each followed by a newline.
with open('/30928.txt', 'w', encoding='utf-8') as file:
    while True:
        try:
            ensentence = next(enstring)
            # print(ensentence)
            file.write(ensentence)
            file.write('\n')
            cnsentence = next(cnstring)
            # print(cnsentence)
            file.write(cnsentence)
            file.write('\n')
        except StopIteration:
            # One of the two columns has run out of text nodes.
            print('Finished')
            break
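One thing to watch with the script above: `.strings` also yields whitespace-only text nodes, so a stray blank node in one column could shift the English/Chinese pairing. Below is a minimal sketch of a possible variation, not a tested fix for this particular page. The helper name `interleave` and the sample strings are made up for illustration, and it assumes both columns hold roughly the same number of non-blank text nodes.

```python
from itertools import zip_longest


def interleave(en_strings, cn_strings):
    """Return the lines of two text columns alternately: English, then Chinese."""
    # Drop whitespace-only text nodes so blank nodes cannot shift the pairing.
    en = [s.strip() for s in en_strings if s.strip()]
    cn = [s.strip() for s in cn_strings if s.strip()]
    lines = []
    # zip_longest keeps the tail of the longer column instead of dropping it.
    for en_sentence, cn_sentence in zip_longest(en, cn, fillvalue=''):
        lines.append(en_sentence)
        lines.append(cn_sentence)
    return lines


# Made-up sample input standing in for the page's text nodes.
sample_en = ['Hello.', '\n', 'How are you?']
sample_cn = ['你好。', '\n  ', '你好吗?']
print('\n'.join(interleave(sample_en, sample_cn)))
```

With the script above, you could probably pass `en.strings` and `cn.strings` (or BeautifulSoup's `.stripped_strings`, which already skips blank nodes) to this helper and write `'\n'.join(...)` of the result to the file in one call.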