Posted on 2021-10-6 09:52:48
(3) Where should this be added?

```python
current_path = os.path.dirname(__file__)
current_path + "/OALD4_azure.txt"
```
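One common way to do this, offered only as a sketch: compute the script's own folder once near the top of the file, then join it with the file name wherever the script opens a data file, so the result no longer depends on the directory the script is launched from. `os.path.join` inserts the separator for you. The helper name `resolve_data_path` is hypothetical, not something from the original script.

```python
import os

def resolve_data_path(script_file, filename):
    """Return `filename` located in the same directory as `script_file` (hypothetical helper)."""
    # abspath makes the result independent of the current working directory
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, filename)

# In the script this would sit near the top, before the io.open(...) call, e.g.:
# f = io.open(resolve_data_path(__file__, 'OALD4_azure.txt'), 'r', encoding='utf-8')
```

The same helper would cover `about_OX4.txt` and the `output_ox4/` folder.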
I'm on Python 3.9.7, and so far I haven't run into the `No such file or directory` error. Also, some of the Chinese text in `genMDX_ox4.py` is garbled:
```python
# -*- coding: utf-8 -*-
# encoding=utf8

from __future__ import unicode_literals, print_function, absolute_import, division

import re
import copy
import chardet

import os
import io
import sys
# reload(sys)
# sys.setdefaultencoding('utf-8')
import collections
from collections import defaultdict

from writemdict import MDictWriter, encrypt_key
from ripemd128 import ripemd128

head = 0
new_mean = []
f = io.open('OALD4_azure.txt', 'r', encoding='utf-8')
#f = io.open('oxford2_original.txt', 'r', encoding='utf-8')
d = defaultdict(list)  # create an empty dictionary ({} could also be used)
for line in f:  # read one line from f at a time
    line = line.rstrip('\n')  # strip the trailing newline
    if line == '</>':
        if head == 2:
            new_mean[0:] = ["".join(new_mean[0:])]
            d[word].append(new_mean[0])
        head = 1
        new_mean = []
    elif head == 1:
        word = line
        head = 2
    elif head == 2:
        new_mean.append(line)
        head = 2
f.close()

ff = io.open('about_OX4.txt', 'r', encoding='utf-8')  # the dictionary's "about" text file must be saved as UTF-8
about = []
for line in ff:  # read one line at a time
    about.append(line)
ff.close()
about[0:] = ["".join(about[0:])]

#outfile = open("example_output/��ţ��Beta_V2.2.1.mdx", "wb")
#writer = MDictWriter(d, "��ţ��Beta_V2.2.1", about[0])
outfile = open("output_ox4/OALD4_Ex.mdx", "wb")
# title: Oxford Advanced Learner's English-Chinese Dictionary (Chinese edition)
writer = MDictWriter(d, "牛津高阶双解(中文版)", about[0])
writer.write(outfile)
outfile.close()
```
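The `head` loop above treats the source text as repeated blocks of: a headword line, one or more definition lines, then a `</>` separator. Below is a minimal sketch of the same grouping as a standalone function, assuming that layout; `parse_mdx_source` is a hypothetical name, not part of the script or of `writemdict`. One difference: the script starts with `head = 0`, so it ignores every line before the first `</>`, whereas this sketch treats the very first line as a headword.

```python
from collections import defaultdict

def parse_mdx_source(lines):
    """Group MDX source lines into {headword: [definition, ...]} (hypothetical helper).

    Assumed layout:
        headword
        definition line(s)
        </>
    """
    entries = defaultdict(list)
    word = None  # current headword; None means "expecting a headword"
    body = []    # definition lines collected for the current headword
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "</>":
            if word is not None:
                # Like the script: join the definition lines with no separator
                entries[word].append("".join(body))
            word, body = None, []
        elif word is None:
            word = line
        else:
            body.append(line)
    return entries
```

Passing it an opened file, e.g. `parse_mdx_source(io.open('OALD4_azure.txt', encoding='utf-8'))`, would yield a `defaultdict(list)` of the same shape as the `d` the script hands to `MDictWriter`. Repeated headwords keep one list item per entry.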
Could I take a look at your file? What do the garbled Chinese parts say?
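The garbled comments (`ÿ`, `δ`, `һ` mixed with `�`) look like Chinese text saved in GBK and then opened as UTF-8, where every invalid byte sequence is replaced by U+FFFD. A minimal sketch reproducing this, assuming the comment on the `for line in f:` line originally read 每次从f中读入一行 ("read one line from f at a time"); that text is a reconstruction, not confirmed by the script's author.

```python
# Hypothesis: the comments were written in GBK, then the file was opened as UTF-8.
original = "每次从f中读入一行"  # assumed original text of the comment on `for line in f:`

gbk_bytes = original.encode("gbk")                     # bytes as (presumably) saved on disk
garbled = gbk_bytes.decode("utf-8", errors="replace")  # what a UTF-8 editor displays
print(garbled)  # ÿ�δ�f�ж���һ��

# Decoding the original bytes with the right codec recovers the text
print(gbk_bytes.decode("gbk"))  # 每次从f中读入一行

# Once the U+FFFD characters have been saved, the lost bytes cannot be rebuilt:
# re-encoding the garbled string does not give back the GBK bytes.
print(garbled.encode("utf-8") == gbk_bytes)  # False
```

So if the author still has the original file, opening it with `encoding='gbk'` should show the comments correctly; a copy that was already re-saved with `�` in it cannot be repaired this way.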