Original poster | Posted on 2021-7-10 13:04:29
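As I said clearly earlier, KSDRIP is not open source, and the DA3 files it generates are automatically converted from UTF-16LE to GB2312, which loses the index and special characters. It also cannot parse the voice library.
My code parses the format at the low level. It is only meant to show that this is technically feasible, and I did it for fun. If you don't like it, feel free to ignore it.
With this source code, Kingsoft PowerWord (金山词霸) DIC and ADIC files can be supported on any platform.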
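To illustrate the kind of loss mentioned above when UTF-16LE text is converted to GB2312, here is a minimal Python sketch. It is only an illustration of the encoding limitation, not KSDRIP's actual conversion code (which is closed source); the helper names `chars_lost_in` and `roundtrip` are made up for this example, and the sample string is arbitrary.

```python
def chars_lost_in(text, encoding="gb2312"):
    """Return the characters of `text` that `encoding` cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


def roundtrip(text, encoding="gb2312"):
    """Encode with replacement and decode again, as a lossy converter might."""
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    # Chinese text survives, but phonetic symbols such as U+0283 and U+0259
    # are outside GB2312 and get replaced.
    sample = "词霸 ʃ ə"
    print(chars_lost_in(sample))
    print(roundtrip(sample))
```

Phonetic symbols of this kind are common in dictionary entries, which is why keeping the original UTF-16LE data matters.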
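So far I have worked out the dictionary file formats of most dictionary software in China, including Youdao (有道), Haidi (海笛), Eudic (欧路), Lingoes (灵格斯), Kingsoft PowerWord, MDICT and others. The only thing left unfinished is Haidi's offline audio and image library: there is too little documentation and the encryption is fairly complex, so I will study it properly when I have time.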
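Generating a DIC file and parsing one are two separate projects. As things stand, several fields in the 120-byte file header still have an unknown meaning. I don't need DIC generation myself, so I can't spare the time for it.
Here is the file header, take a look for yourself: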
Option Explicit

'Kingsoft PowerWord (金山词霸) DIC dictionary parsing
'Kingsoft PowerWord Dic file format:
'Offset    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
'00000000  4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00  KSDIpW......R...
'00000016  68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00  hX"I........x.%.
'00000032  00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00  .@..............
'00000048  04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00  ........ .......
'00000064  F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00  ........x...x...
'00000080  F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00  ....p....?..hH..
'00000096  E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00  ....P9......(...
'00000112  50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00  P.".....<.d.i.c.
'Each zlib block decompresses to 16384 bytes
Type TCibaDIC
    lSign As Long           '0x4944534b, i.e. "KSDI"
    lFileSize As Long       'file size
    lFileSize1 As Long
    lFileSize2 As Long
    lFileCRC32 As Long      'crc32?
    lNum1 As Long           '8
    lNum2 As Long           '1
    lFileSizeOrig As Long   'original file size of decription [sic]
    lBlockSize As Long      '0x00004000
    lNum4 As Long           '0x00020001
    lSource_lcid As Long    '0x00000804
    lTarget_lcid As Long    '0x00000409
    lNum5 As Long           '0x00000804
    lNumWords As Long       '0x1e1d
    lNum6 As Long           '0x20
    lNum7 As Long           '0x11
    lNum8 As Long           '0x01f4
    lNum9 As Long           '0x00
    lOffStart As Long       '0x78
    lOffXML As Long         '0x78
    lLenXML As Long         '0x07f8
    'The values below are read from the dump above
    lOffIdxTable As Long    '0x0870
    lLenIdxTable As Long    '0x3ff8
    lOffIdxTable1 As Long   '0x4868
    lLenIdxTable1 As Long   '0xf0e8
    lOffIndexTable As Long  '0x013950
    lLenIndexTable As Long  '0x01cbd8
    lOffWordsTable As Long  '0x030528
    lLenWordsTable As Long  '0x22a350
End Type
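As a cross-check of the header layout above, here is a minimal Python sketch that unpacks the 120 sample bytes from the dump using the field order of `TCibaDIC`. Some assumptions are mine and not from the post: the Type declares 29 Long fields (116 bytes), so the last 4 bytes of the 120-byte header are read under the hypothetical name `reserved`; the fields are read as unsigned little-endian 32-bit values (VB6 `Long` is signed); and `parse_header` and `sections` are illustrative helper names.

```python
import struct

# Field order copied from the TCibaDIC Type; "reserved" is a hypothetical
# name for header bytes 116..119, which the Type does not declare.
FIELDS = [
    "lSign", "lFileSize", "lFileSize1", "lFileSize2", "lFileCRC32",
    "lNum1", "lNum2", "lFileSizeOrig", "lBlockSize", "lNum4",
    "lSource_lcid", "lTarget_lcid", "lNum5", "lNumWords", "lNum6",
    "lNum7", "lNum8", "lNum9", "lOffStart", "lOffXML", "lLenXML",
    "lOffIdxTable", "lLenIdxTable", "lOffIdxTable1", "lLenIdxTable1",
    "lOffIndexTable", "lLenIndexTable", "lOffWordsTable", "lLenWordsTable",
    "reserved",
]
HEADER = struct.Struct("<30I")  # 30 little-endian uint32 values = 120 bytes

# First 120 bytes of the hex dump shown in the post.
SAMPLE_HEADER = bytes.fromhex(
    "4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00 "
    "68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00 "
    "00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00 "
    "04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00 "
    "F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00 "
    "F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00 "
    "E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00 "
    "50 A3 22 00 00 00 00 00"
)


def parse_header(data):
    """Unpack the 120-byte header into a dict keyed by field name."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 120 bytes")
    header = dict(zip(FIELDS, HEADER.unpack_from(data, 0)))
    if header["lSign"] != 0x4944534B:  # b"KSDI" read as little-endian
        raise ValueError("missing KSDI signature")
    return header


def sections(header):
    """Return (name, offset, length) for the five offset/length pairs."""
    names = ["XML", "IdxTable", "IdxTable1", "IndexTable", "WordsTable"]
    return [(n, header["lOff" + n], header["lLen" + n]) for n in names]


if __name__ == "__main__":
    hdr = parse_header(SAMPLE_HEADER)
    print("words:", hdr["lNumWords"], "block size:", hdr["lBlockSize"])
    for name, off, length in sections(hdr):
        print(f"{name:<11} off=0x{off:06x} len=0x{length:06x}")
```

In this one sample, each section's offset plus its length equals the next section's offset, and the end of the last section equals `lFileSizeOrig` (0x0025A878). That suggests the offsets describe the uncompressed layout, but it is an observation from a single header, not a confirmed rule of the format.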