OP | Posted on 2021-7-10 13:04:29
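As I already made clear above, KSDRIP is not open source, and the DA3 files it generates are automatically converted from UTF-16LE to GB2312, which loses the index and special characters. It also cannot parse the audio libraries.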
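A small Python sketch of why a UTF-16LE to GB2312 conversion is lossy. This only illustrates the encoding issue; it is not what KSDRIP does internally, and the helper `lossy_chars` and the sample entry are made up for the example.

```python
def lossy_chars(text, encoding="gb2312"):
    """Return the characters of `text` that `encoding` cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


# Hypothetical dictionary entry containing an IPA symbol and Chinese text.
entry = "schwa: ə, 中文释义"

# Simulate the conversion: text stored as UTF-16LE, re-encoded as GB2312.
utf16_bytes = entry.encode("utf-16-le")
roundtrip = (
    utf16_bytes.decode("utf-16-le")
    .encode("gb2312", errors="replace")  # unencodable characters become "?"
    .decode("gb2312")
)

print(lossy_chars(entry))
print(roundtrip)
```

Phonetic symbols like this are common in dictionary data, so keeping the text in UTF-16LE (or converting to UTF-8) avoids the loss.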
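What I am doing is low-level parsing. It only demonstrates that the approach is technically feasible, and I did it for fun; if you don't like it, feel free to ignore it.
With this source code, Kingsoft PowerWord (金山词霸) DIC and ADIC files can be supported on any platform.
So far I have worked out the dictionary file formats of most Chinese dictionary products, including Youdao (有道), Haidi (海笛), Eudic (欧路), Lingoes (灵格斯), Kingsoft PowerWord and MDict. The only thing left is Haidi's offline audio and image library: there is very little material on it and the encryption is fairly complex, so I will look into it properly when I have time.
Generating a DIC file and parsing one are two separate projects. As things stand, the 120-byte file header contains several fields whose meaning I do not know. I have no personal need for a generator, so I cannot spare the time.
Here is the file header so you can take a look yourself: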
Option Explicit

'Kingsoft PowerWord (金山词霸) DIC dictionary parsing
'Kingsoft PowerWord Dic file format:
'Offset    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
'00000000 4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00  KSDIpW......R...
'00000016 68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00  hX"I........x.%.
'00000032 00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00  .@..............
'00000048 04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00  ........ .......
'00000064 F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00  ........x...x...
'00000080 F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00  ....p....?..hH..
'00000096 E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00  ....P9......(...
'00000112 50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00  P.".....<.d.i.c.
'Each zlib block decompresses to 16384 bytes
Type TCibaDIC
    lSign As Long            '0x4944534B, i.e. "KSDI"
    lFileSize As Long        'file size
    lFileSize1 As Long
    lFileSize2 As Long
    lFileCRC32 As Long       'CRC32?
    lNum1 As Long            '8
    lNum2 As Long            '1
    lFileSizeOrig As Long    'original file size after decryption
    lBlockSize As Long       '0x00004000
    lNum4 As Long            '0x00020001
    lSource_lcid As Long     '0x00000804
    lTarget_lcid As Long     '0x00000409
    lNum5 As Long            '0x00000804
    lNumWords As Long        '0x1E1D
    lNum6 As Long            '0x20
    lNum7 As Long            '0x11
    lNum8 As Long            '0x01F4
    lNum9 As Long            '0x00
    lOffStart As Long        '0x78
    lOffXML As Long          '0x78
    lLenXML As Long          '0x07F8
    'The values below are read from the sample dump above
    lOffIdxTable As Long     '0x00000870
    lLenIdxTable As Long     '0x00003FF8
    lOffIdxTable1 As Long    '0x00004868
    lLenIdxTable1 As Long    '0x0000F0E8
    lOffIndexTable As Long   '0x00013950
    lLenIndexTable As Long   '0x0001CBD8
    lOffWordsTable As Long   '0x00030528
    lLenWordsTable As Long   '0x0022A350
End Type
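For anyone who wants to try this outside VB6, here is a minimal Python sketch that reads the same header. It assumes every field is a little-endian 32-bit value in the order of the TCibaDIC type (29 fields, 116 bytes; the last 4 bytes of the 120-byte header are not named in the type and are zero in the sample, so the sketch ignores them). The names `parse_header` and `sections` are made up for this example, and the field meanings are still the guesses given in the comments above.

```python
import struct

# Field names copied from the TCibaDIC type; their meanings are partly unknown.
FIELDS = [
    "lSign", "lFileSize", "lFileSize1", "lFileSize2", "lFileCRC32",
    "lNum1", "lNum2", "lFileSizeOrig", "lBlockSize", "lNum4",
    "lSource_lcid", "lTarget_lcid", "lNum5", "lNumWords",
    "lNum6", "lNum7", "lNum8", "lNum9", "lOffStart",
    "lOffXML", "lLenXML", "lOffIdxTable", "lLenIdxTable",
    "lOffIdxTable1", "lLenIdxTable1", "lOffIndexTable", "lLenIndexTable",
    "lOffWordsTable", "lLenWordsTable",
]

# Assumption: 29 little-endian unsigned 32-bit integers (VB6 Long is signed,
# unsigned is used here only to keep offsets and sizes readable).
HEADER_STRUCT = struct.Struct("<29I")

# The first 128 bytes of the sample file, taken from the hex dump in the post.
SAMPLE = bytes.fromhex(
    "4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00 "
    "68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00 "
    "00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00 "
    "04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00 "
    "F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00 "
    "F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00 "
    "E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00 "
    "50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00"
)


def parse_header(data):
    """Unpack the named header fields into a dict, checking the KSDI signature."""
    if len(data) < HEADER_STRUCT.size:
        raise ValueError("need at least %d bytes" % HEADER_STRUCT.size)
    header = dict(zip(FIELDS, HEADER_STRUCT.unpack_from(data, 0)))
    if header["lSign"] != 0x4944534B:  # bytes "KSDI" read as little-endian
        raise ValueError("not a KSDI file")
    return header


def sections(header):
    """Return (name, offset, length) for the five offset/length pairs."""
    names = ["XML", "IdxTable", "IdxTable1", "IndexTable", "WordsTable"]
    return [(n, header["lOff" + n], header["lLen" + n]) for n in names]


header = parse_header(SAMPLE)
for name, off, length in sections(header):
    print(f"{name:<11} offset=0x{off:06X} length=0x{length:06X}")
```

In this sample the five sections sit back to back: each offset plus its length equals the next offset, and the last section ends exactly at lFileSizeOrig (0x0025A878). That fits the idea that the offsets refer to the decoded data rather than to the file on disk, but this is an observation from one file, not a confirmed rule.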