Phrase-splitting script (splitting phrases, extracting phrases)
Last edited by henices on 2019-11-19 09:08
I noticed the following passage in @lgmcw's 牛9 (OALD9) thread:
https://www.pdawiki.com/forum/thread-35803-1-1.html?x=48936
https://s2.ax1x.com/2019/09/29/u8pUDx.png
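I wrote a script for this a long time ago; it is probably old enough that most people no longer remember it. https://www.pdawiki.com/forum/thread-18376-1-1.html
Take two entries from the screenshot as examples, get to like/know/understand somebody/something and get (sombody) somewhere/anywhere/nowhere: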
➜/tmp cat test.txt
get to like/know/understand somebody/something
get (sombody) somewhere/anywhere/nowhere
➜/tmp python2 test.py test.txt
➜/tmp cat lnk.mdict
get to like somebody
@@@LINK=get to like/know/understand somebody/something
</>
get to like something
@@@LINK=get to like/know/understand somebody/something
</>
get to know somebody
@@@LINK=get to like/know/understand somebody/something
</>
get to know something
@@@LINK=get to like/know/understand somebody/something
</>
get to understand somebody
@@@LINK=get to like/know/understand somebody/something
</>
get to understand something
@@@LINK=get to like/know/understand somebody/something
</>
get (sombody) somewhere
@@@LINK=get (sombody) somewhere/anywhere/nowhere
</>
get (sombody) anywhere
@@@LINK=get (sombody) somewhere/anywhere/nowhere
</>
get (sombody) nowhere
@@@LINK=get (sombody) somewhere/anywhere/nowhere
</>
get (sombody) somewhere still needs to be split further into get somebody somewhere and get somewhere; the script does not handle this case.
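A minimal sketch of how that remaining case might be handled. It is written in Python 3, not the python2 of test.py, and since the source of test.py is not shown in this post it should not be read as the author's implementation. The names expand_phrase and to_link_entries are made up for illustration, and the sketch assumes every optional group is a single parenthesised token such as (sombody).

```python
import itertools
import re


def expand_phrase(phrase):
    """Return every concrete phrase encoded by a/b alternatives and (optional) words."""
    slots = []
    for token in phrase.split():
        optional = re.fullmatch(r"\((.+)\)", token)
        if optional:
            # "(sombody)" means the word may be present or absent,
            # so the empty string is added as one more alternative.
            slots.append(optional.group(1).split("/") + [""])
        else:
            slots.append(token.split("/"))

    results = []
    for combo in itertools.product(*slots):
        # Drop the empty alternative so no double spaces are left behind.
        text = " ".join(word for word in combo if word)
        if text not in results:
            results.append(text)
    return results


def to_link_entries(phrase):
    """Format the expansions as MDict source entries that redirect to the headword."""
    return "".join(
        f"{variant}\n@@@LINK={phrase}\n</>\n"
        for variant in expand_phrase(phrase)
        if variant != phrase  # an entry linking to itself would be a loop
    )


if __name__ == "__main__":
    print(to_link_entries("get (sombody) somewhere/anywhere/nowhere"), end="")
```

For get (sombody) somewhere/anywhere/nowhere this produces six entries: three with sombody and three without it. A multi-word optional group, for example (to somebody), is not recognised by this sketch and would be left as literal text.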
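Gyngreenlie posted on 2019-11-18 21:12
I suggest changing somebody to sb and something to sth, which makes lookup easier.
In practice we rarely type sb or sth when looking up a phrase. For a phrase like get (sombody) somewhere, what we type is almost always get somewhere, with the sb left out.
Gyngreenlie posted on 2019-11-18 21:12
I suggest changing somebody to sb and something to sth, which makes lookup easier.
Ideally, all three forms should be added: with somebody, with sb, and with neither.
How about adding a few more keywords to the title, such as "splitting phrases" or "extracting phrases", so that people can find the thread by searching later?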
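The three lookup forms suggested in the replies above (somebody, sb, and neither) could be generated along these lines. This is only a sketch in Python 3: lookup_variants and the ABBREVIATIONS table are hypothetical names, not part of the original script, and only the correctly spelled placeholders somebody and something are matched, so a misspelling like sombody passes through unchanged.

```python
import re

# Assumed mapping, taken from the sb/sth suggestion in the thread.
ABBREVIATIONS = {"somebody": "sb", "something": "sth"}
PLACEHOLDER = re.compile(r"\b(somebody|something)\b")


def lookup_variants(phrase):
    """Return the full, abbreviated, and placeholder-free forms of a phrase."""
    abbreviated = PLACEHOLDER.sub(lambda m: ABBREVIATIONS[m.group(1)], phrase)
    # Remove the placeholders, then collapse the whitespace they leave behind.
    bare = " ".join(PLACEHOLDER.sub("", phrase).split())

    variants = []
    for candidate in (phrase, abbreviated, bare):
        # Skip empty strings and forms identical to an earlier one.
        if candidate and candidate not in variants:
            variants.append(candidate)
    return variants


if __name__ == "__main__":
    for form in lookup_variants("get to know somebody"):
        print(form)
```

Each form can then be written out as its own @@@LINK entry pointing at the original headword, in the same way as the entries in lnk.mdict.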
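I have highlighted the thread for you.
I suggest changing somebody to sb and something to sth, which makes lookup easier.
Thanks, OP.
I struggled for a long time with having to process these by hand one at a time; it turns out a ready-made wheel to build on has existed all along.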