Original poster | Posted on 2021-7-10 13:04:29
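As I made clear earlier, KSDRIP is not open source, and the DA3 files it generates are automatically converted from UTF-16LE to GB2312, which loses the index and special characters. It also cannot parse the voice library.
What I'm doing is low-level parsing. It only demonstrates that this is technically feasible, and I did it for fun, so feel free to ignore it if it's not your thing.
With this source code, Kingsoft PowerWord (金山词霸) DIC and ADIC files can be supported on any platform.
So far I have worked out the dictionary file formats of most Chinese dictionary apps, including Youdao, Haidi (海笛), Eudic (欧路), Lingoes, Kingsoft PowerWord, and MDict. The only piece left is Haidi's offline voice and image library: there is very little documentation and the encryption is fairly complex, so I'll dig into it properly when I have time.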
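To illustrate the UTF-16LE to GB2312 loss mentioned above, here is a minimal Python sketch. It is not KSDRIP's actual conversion code (that is closed source); the function name `gb2312_round_trip` and the sample entry are made up for the demonstration. It only shows that phonetic symbols which UTF-16LE stores without trouble have no GB2312 code point.

```python
def gb2312_round_trip(text):
    """Encode to GB2312 and back, replacing anything GB2312 cannot represent.

    Hypothetical helper: it imitates a lossy converter, it is not KSDRIP's code.
    """
    return text.encode("gb2312", errors="replace").decode("gb2312")


# Made-up dictionary entry with IPA symbols (ʃ, ɪ, ə, ʊ) and Chinese glosses.
entry = "ship /ʃɪp/ 船; about /əˈbaʊt/ 关于"

# UTF-16LE keeps every character.
assert entry.encode("utf-16-le").decode("utf-16-le") == entry

# A strict GB2312 encode fails because the IPA symbols are not in the charset.
try:
    entry.encode("gb2312")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A lenient encode "works" but turns each missing character into '?'.
lossy = gb2312_round_trip(entry)
print(strict_ok)
print(lossy)
```

The Chinese glosses survive the round trip while the phonetic symbols come back as question marks, which is the kind of loss described for the converted DA3 files.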
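Generating a DIC and parsing one are two separate projects. For now, the 120-byte file header still has several fields whose meaning I don't know. I don't personally need DIC generation, so I can't spare the time for it.
Here's the file header so you can look for yourself: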
Option Explicit

'Kingsoft PowerWord DIC dictionary parser
'Kingsoft PowerWord Dic file format:
'Offset    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
'00000000 4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00 KSDIpW......R...
'00000016 68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00 hX"I........x.%.
'00000032 00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00 .@..............
'00000048 04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00 ........ .......
'00000064 F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00 ........x...x...
'00000080 F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00 ....p....?..hH..
'00000096 E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00 ....P9......(...
'00000112 50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00 P.".....<.d.i.c.
'Each zlib block decompresses to 16384 bytes
'Values in the comments are the ones found in the sample dump above.
'Note: these 29 Longs cover 116 bytes; bytes 116-119 of the 120-byte
'header (zero in the sample) are not mapped to a field here.
Type TCibaDIC
    lSign As Long           '0x4944534B, i.e. "KSDI"
    lFileSize As Long       'file size
    lFileSize1 As Long
    lFileSize2 As Long
    lFileCRC32 As Long      'crc32?
    lNum1 As Long           '8
    lNum2 As Long           '1
    lFileSizeOrig As Long   'original file size after decryption
    lBlockSize As Long      '0x00004000
    lNum4 As Long           '0x00020001
    lSource_lcid As Long    '0x00000804
    lTarget_lcid As Long    '0x00000409
    lNum5 As Long           '0x00000804
    lNumWords As Long       '0x1e1d
    lNum6 As Long           '0x20
    lNum7 As Long           '0x11
    lNum8 As Long           '0x01f4
    lNum9 As Long           '0x00
    lOffStart As Long       '0x78
    lOffXML As Long         '0x78
    lLenXML As Long         '0x07f8
    lOffIdxTable As Long    '0x0870
    lLenIdxTable As Long    '0x3ff8
    lOffIdxTable1 As Long   '0x4868
    lLenIdxTable1 As Long   '0xf0e8
    lOffIndexTable As Long  '0x00013950
    lLenIndexTable As Long  '0x0001cbd8
    lOffWordsTable As Long  '0x00030528
    lLenWordsTable As Long  '0x0022a350
End Type
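For anyone who wants to poke at the header outside VB6, here is a rough Python sketch that reads the same 120 bytes. The field names mirror the Type above; the trailing name `unmapped` is my own placeholder for bytes 116-119, and `parse_header`, `sections` and `is_contiguous` are hypothetical helpers, not part of any existing tool. The bytes are copied from the hex dump in this post, and fields are read as unsigned 32-bit little-endian values (VB6 `Long` is signed, which makes no difference for this sample).

```python
import struct

# The 128 bytes from the hex dump in this post (the header is the first 120).
SAMPLE = bytes.fromhex(
    "4B534449 70570500 958B0000 52A30000"
    "68582249 08000000 01000000 78A82500"
    "00400000 01000200 04080000 09040000"
    "04080000 1D1E0000 20000000 11000000"
    "F4010000 00000000 78000000 78000000"
    "F8070000 70080000 F83F0000 68480000"
    "E8F00000 50390100 D8CB0100 28050300"
    "50A32200 00000000 3C006400 69006300"
)

HEADER_SIZE = 120
FIELDS = [
    "lSign", "lFileSize", "lFileSize1", "lFileSize2", "lFileCRC32",
    "lNum1", "lNum2", "lFileSizeOrig", "lBlockSize", "lNum4",
    "lSource_lcid", "lTarget_lcid", "lNum5", "lNumWords", "lNum6",
    "lNum7", "lNum8", "lNum9", "lOffStart", "lOffXML",
    "lLenXML", "lOffIdxTable", "lLenIdxTable", "lOffIdxTable1", "lLenIdxTable1",
    "lOffIndexTable", "lLenIndexTable", "lOffWordsTable", "lLenWordsTable",
    "unmapped",  # placeholder name: bytes 116..119 are not in the posted Type
]


def parse_header(data):
    """Unpack the 120-byte header into a dict keyed by the Type's field names."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 120 bytes")
    header = dict(zip(FIELDS, struct.unpack_from("<30I", data, 0)))
    if header["lSign"] != 0x4944534B:  # "KSDI" read as a little-endian Long
        raise ValueError("not a KSDI file")
    return header


def sections(header):
    """Return (name, offset, length) for the five offset/length pairs."""
    names = ("XML", "IdxTable", "IdxTable1", "IndexTable", "WordsTable")
    return [(n, header["lOff" + n], header["lLen" + n]) for n in names]


def is_contiguous(header):
    """True if each section starts where the previous one ends and the
    last one ends exactly at lFileSizeOrig."""
    pos = header["lOffXML"]
    for _, off, length in sections(header):
        if off != pos:
            return False
        pos = off + length
    return pos == header["lFileSizeOrig"]


header = parse_header(SAMPLE)
for name, off, length in sections(header):
    print(f"{name:<11} offset=0x{off:08X} length=0x{length:08X}")
print("contiguous:", is_contiguous(header))
```

In this one sample the five offset/length pairs chain together and end exactly at `lFileSizeOrig`, which suggests (but does not prove) that they describe the layout of the unpacked data. The two LCIDs, 0x0804 and 0x0409, are the standard Windows identifiers for zh-CN and en-US.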
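The header comment says each zlib block decompresses to 16384 bytes, matching `lBlockSize` (0x4000). The post does not say how the compressed blocks are located inside the file, so the sketch below makes a loud assumption: that the blocks are plain zlib streams stored back to back. If the real format uses a block table instead, only the splitting part changes. `split_zlib_streams` is a hypothetical helper, and the test data is generated on the spot rather than taken from a real DIC.

```python
import zlib

BLOCK_SIZE = 0x4000  # 16384, the lBlockSize value from the header


def split_zlib_streams(buf):
    """Yield the decompressed payload of each zlib stream in buf.

    ASSUMPTION: streams are stored back to back with nothing in between.
    This is not confirmed for the DIC format.
    """
    while buf:
        d = zlib.decompressobj()
        out = d.decompress(buf) + d.flush()
        if not d.eof:
            raise ValueError("truncated zlib stream")
        yield out
        buf = d.unused_data  # whatever follows the stream that just ended


# Build synthetic test data: 2.5 blocks of payload, compressed block by block.
payload = bytes(range(256)) * 160  # 40960 bytes
chunks = [payload[i:i + BLOCK_SIZE] for i in range(0, len(payload), BLOCK_SIZE)]
packed = b"".join(zlib.compress(c) for c in chunks)

blocks = list(split_zlib_streams(packed))
sizes = [len(b) for b in blocks]
print(sizes)
```

With this synthetic input every block except the last comes back at exactly `BLOCK_SIZE` bytes, and joining the blocks restores the original payload. Whether the last block of a real DIC is padded or short is something I have not verified.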