PDAWIKI (掌上百科, "Handheld Encyclopedia")

Views: 535 | Replies: 9

[English-English] [Image dictionary] (rough matching) The American College Dictionary, 1953

    Posted 2019-12-2 01:08:03
    Last edited by 喬治兄 on 2019-12-2 02:11

    For the background and history of this dictionary, see the rough introduction in this thread:
    [Dictionary primer] The American College Dictionary
    https://www.pdawiki.com/forum/thread-36850-1-1.html

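    First, thanks to Klwo for providing this PDF. Without his generous contribution of the text, I would never have seen this rather old dictionary.
    Choosing which word list to match against was a struggle. After weighing the options several times, I chose the World Book Dictionary word list.
    Matching against the Random House Webster's Unabridged Dictionary (RHWUD) word list instead would raise the base match rate to about 96%.
    I did not use RHWUD because it has too many entries, probably over 400,000,
    while I roughly estimate that The American College Dictionary has about 100,000.
    I was afraid that with RHWUD every page would be swamped with words the dictionary does not contain, which would get in the way of lookups.
    The work is done, but I am not very satisfied with it. Perhaps I will redo the matching if I later find a better way to filter the word list.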
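The post does not define "match rate". One possible reading is the share of page headwords (the first word of each page) that also appear in the candidate word list after stripping spaces and punctuation. The sketch below follows that reading only; the function names and the sample words are hypothetical.

```python
def normalize(word):
    """Lowercase the word and keep only the letters a-z."""
    return "".join(ch for ch in word.lower() if "a" <= ch <= "z")


def match_rate(page_headwords, wordlist):
    """Fraction of page headwords found in the word list (after normalizing)."""
    keys = {normalize(w) for w in wordlist}
    hits = sum(1 for h in page_headwords if normalize(h) in keys)
    return hits / len(page_headwords)


# Hypothetical sample data, not taken from either dictionary.
page_headwords = ["a", "abandon", "able-bodied", "about"]
small_list = ["a", "abandon", "about", "zebra"]
large_list = small_list + ["able bodied", "aardvark", "abacus"]

print(match_rate(page_headwords, small_list))  # 0.75
print(match_rate(page_headwords, large_list))  # 1.0
```

A larger list raises the rate here, but it also adds entries such as "aardvark" and "abacus" that the scanned dictionary may not contain, which is the trade-off described above.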

    I originally wanted to filter the word list by word frequency,
    but gave up because the word-frequency distribution does not follow a normal distribution, so that approach did not really fit.
    If you have an idea that might resolve this, please leave a comment. Thanks.
    [Discussion] Quantifying index matching, and random sampling to quantify the word list range
    https://www.pdawiki.com/forum/thread-37305-1-1.html

    [Image dictionary] (rough matching) The American College Dictionary, 1953
    https://share.weiyun.com/5xGcUjW


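    Posted 2019-12-2 08:54:30
    "Without the diligence of bees, even the largest sea of flowers yields no honey." Thank you, 乔治兄, for preserving, accumulating, reproducing, and passing on knowledge!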

    Posted 2019-12-2 16:19:58
    Thanks a lot for your effort.

    Posted 2019-12-2 16:42:45
    Hello my friend. Thanks again for the nice work. Can you explain, in English, how to convert a PDF for MDict? My Chinese is really weak!

    Posted 2019-12-2 17:25:39 from mobile
    Thanks for sharing.

    Posted 2019-12-2 20:53:57
    Everything 乔治兄 posts is a classic.

    Original poster | Posted 2019-12-3 19:14:18
    Last edited by 喬治兄 on 2019-12-3 19:24

    tarzan1200, posted 2019-12-2 16:42:
    Hello my friend..Thanks again for nice working..Can you explain that how convert pdf for mdict in en ...

    tarzan1200:
    1. Export the PDF pages as PNG image files.
    2. Use ABBYY to crop the word in the top-left corner of each page (the first word on each page).
    3. OCR every cropped word, then create a page index sheet.
    4. Find a suitable word list and distribute its words across the pages via the page index sheet, using Excel's VLOOKUP function.
    5. Before you distribute the words, you have to normalize them so that they can be matched.
    6. Set aside a column for these normalized words in both the page index sheet and the word list sheet; this is the matching column. Ref. https://www.pdawiki.com/forum/thread-35890-1-1.html
    7. Once the VLOOKUP assignment of pages is complete, copy the words and page numbers into the VBA tool to generate the source text for MdxBuilder. https://www.pdawiki.com/forum/thread-33574-1-1.html
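Steps 4 to 7 can be sketched outside Excel. This is not the author's VBA tool. It is a minimal Python sketch that assumes the page index is already in dictionary order, uses `bisect` in place of VLOOKUP's approximate match, and writes entries as headword, body, and a `</>` separator line, which is my understanding of the plain-text source MdxBuilder accepts. The image file name pattern `page_0001.png` and all function names are hypothetical.

```python
import bisect


def normalize(word):
    """Matching key: lowercase letters a-z only."""
    return "".join(ch for ch in word.lower() if "a" <= ch <= "z")


def assign_pages(page_index, wordlist):
    """Give each word the page whose first word is the last one <= the word.

    page_index: list of (first_word, page_no), already in dictionary order.
    """
    keys = [normalize(first) for first, _ in page_index]
    pages = [page for _, page in page_index]
    result = []
    for word in wordlist:
        i = bisect.bisect_right(keys, normalize(word)) - 1
        # Words sorting before the first page's headword go to the first page
        # (VLOOKUP would return #N/A here instead).
        result.append((word, pages[max(i, 0)]))
    return result


def to_mdx_source(assignments, image_pattern="page_{:04d}.png"):
    """Build one entry per word: headword, image tag, '</>' separator."""
    lines = []
    for word, page in assignments:
        lines.append(word)
        lines.append('<img src="{}">'.format(image_pattern.format(page)))
        lines.append("</>")
    return "\n".join(lines)


# Hypothetical page index and word list.
page_index = [("a", 1), ("abdomen", 2), ("abrupt", 3)]
wordlist = ["abandon", "able-bodied", "about", "absent"]

assignments = assign_pages(page_index, wordlist)
print(assignments)
# [('abandon', 1), ('able-bodied', 2), ('about', 2), ('absent', 3)]
print(to_mdx_source(assignments))
```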

    Posted 2019-12-3 21:26:09
    I thought I was going to miss another of 乔治兄's works, but it turns out klwo2 uploaded the .mdd file separately.
    If the .mdx file is small enough to upload as one or two attachments, could I trouble you once more?

    Posted 6 days ago
    喬治兄, posted 2019-12-3 19:14:
    tarzan1200 :
    1. export pdf to png format picture file.
    2. use aabby to crop each page top-left cor ...

    Hello again my friend, and THANK YOU for the perfect description.

    Original poster | Posted 6 days ago
    Last edited by 喬治兄 on 2019-12-9 17:13

    tarzan1200, posted 2019-12-9 15:50:
    Hello again my friend and THANK YOU for perfect descriptions

    You're welcome, my friend. The main point is to create the spare column (the matching column) and adjust each word in it so that sorting matches the dictionary's sequence. The dictionary's word order does not follow Excel's sort order. So you have to remove spaces and delete every non-English character, such as - , . ' / and so on. Then sort all columns by the matching column and check the page numbers by subtracting the previous one from each. If the result is not 0 or 1, there may be an error, so check it and find the reason. Once you understand the logic, are familiar with Excel's sorting, and strip the non-English characters properly, building the index for the dictionary will succeed.
    This method is suitable for distributing the words to their proper pages, where each interval is delimited by the first word of each page. So you don't need the exact word list; you can still pick the right page and land near the word you are looking for.
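The page-number check described above can be sketched as follows. This is a minimal illustration and not the author's spreadsheet: the function names and sample rows are hypothetical, and the matching key simply drops everything except the letters a to z.

```python
import re


def matching_key(word):
    """Matching column: lowercase, with every non a-z character removed."""
    return re.sub(r"[^a-z]", "", word.lower())


def suspicious_rows(rows):
    """Sort rows by matching key and flag page jumps other than 0 or 1.

    rows: list of (word, page_no). Returns pairs of neighbouring rows
    whose page difference suggests an OCR or ordering error.
    """
    ordered = sorted(rows, key=lambda row: matching_key(row[0]))
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] - prev[1] not in (0, 1):
            flagged.append((prev, cur))
    return flagged


# Hypothetical page index; "obrupt" stands in for an OCR misread of "abrupt".
clean = [("a", 1), ("abdomen", 2), ("abrupt", 3), ("accent", 4)]
noisy = [("a", 1), ("abdomen", 2), ("obrupt", 3), ("accent", 4)]

print(suspicious_rows(clean))  # []
print(suspicious_rows(noisy))
# [(('abdomen', 2), ('accent', 4)), (('accent', 4), ('obrupt', 3))]
```

A flagged pair only says where the sequence breaks; as the post notes, each one still has to be checked by hand to find the reason.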