henices posted on 2019-11-18 16:38:13

Phrase decomposition script (splitting phrases, extracting phrases)

Last edited by henices on 2019-11-19 09:08

I saw this passage in @lgmcw's 牛9 (OALD 9th edition) thread:
https://www.pdawiki.com/forum/thread-35803-1-1.html?x=48936

https://s2.ax1x.com/2019/09/29/u8pUDx.png


I wrote a script for this a while ago. It was long enough ago that many people have probably forgotten it: https://www.pdawiki.com/forum/thread-18376-1-1.html

Take the two phrases in the image, get to like/know/understand somebody/something and get (sombody) somewhere/anywhere/nowhere, as examples:



➜/tmp cat test.txt

get to like/know/understand somebody/something
get (sombody) somewhere/anywhere/nowhere

➜/tmp python2 test.py test.txt

➜/tmp cat lnk.mdict

get to like somebody
@@@LINK=get to like/know/understand somebody/something
</>
get to like something
@@@LINK=get to like/know/understand somebody/something
</>
get to know somebody
@@@LINK=get to like/know/understand somebody/something
</>
get to know something
@@@LINK=get to like/know/understand somebody/something
</>
get to understand somebody
@@@LINK=get to like/know/understand somebody/something
</>
get to understand something
@@@LINK=get to like/know/understand somebody/something
</>
get (sombody) somewhere
@@@LINK=get (sombody) somewhere/anywhere/nowhere
</>
get (sombody) anywhere
@@@LINK=get (sombody) somewhere/anywhere/nowhere
</>
get (sombody) nowhere
@@@LINK=get (sombody) somewhere/anywhere/nowhere
</>
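The original script (linked above and run with python2) is not reproduced in this thread. Below is a minimal Python 3 sketch of the same idea, not the original code: it assumes alternatives are separated by "/" inside a single space-delimited word, expands every combination, and writes MDict @@@LINK redirect entries in the same layout as lnk.mdict. The function names expand_slashes, to_mdict_links and main are hypothetical.

```python
import sys
from itertools import product


def expand_slashes(phrase):
    """Expand each slash-separated word group into every combination.

    Assumption: alternatives live inside one whitespace-delimited token,
    e.g. 'like/know/understand'; multi-word alternatives are not handled.
    """
    options = [token.split("/") for token in phrase.split()]
    return [" ".join(combo) for combo in product(*options)]


def to_mdict_links(phrase):
    """Build MDict redirect entries pointing every variant back to the original headword."""
    entries = []
    for variant in expand_slashes(phrase):
        # A phrase without slashes expands to itself; skip that self-link.
        if variant != phrase:
            entries.append(f"{variant}\n@@@LINK={phrase}\n</>")
    return "\n".join(entries)


def main(src, dst="lnk.mdict"):
    """Read one phrase per line from src and write all redirects to dst."""
    with open(src, encoding="utf-8") as f:
        phrases = [line.strip() for line in f if line.strip()]
    with open(dst, "w", encoding="utf-8") as out:
        for phrase in phrases:
            block = to_mdict_links(phrase)
            if block:
                out.write(block + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage mirrors the transcript: python3 sketch.py test.txt, then inspect lnk.mdict. itertools.product yields combinations in the same order as the output above (like somebody, like something, know somebody, ...).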



get (sombody) somewhere should be split further into get somebody somewhere and get somewhere; the script does not handle this.
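One way to cover this case is to treat a word wrapped in parentheses as optional, so it produces one variant with the word and one without. The sketch below assumes exactly that; token_options and expand_phrase are hypothetical names, and a multi-word parenthesized group such as "(to somebody)" is not handled because it spans several tokens.

```python
from itertools import product


def token_options(token):
    """Return the alternatives for one whitespace-delimited token.

    Assumption: a token fully wrapped in parentheses, e.g. '(somebody)',
    is optional, so it yields its word(s) plus an empty choice.
    Slash alternatives such as 'somewhere/anywhere' are split as usual.
    """
    if len(token) > 2 and token.startswith("(") and token.endswith(")"):
        return token[1:-1].split("/") + [""]
    return token.split("/")


def expand_phrase(phrase):
    """Expand optional and slash-separated tokens into every distinct variant."""
    options = [token_options(token) for token in phrase.split()]
    # Skip empty choices when joining so no double spaces appear.
    joined = (" ".join(word for word in combo if word) for combo in product(*options))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [variant for variant in dict.fromkeys(joined) if variant]
```

With this, get (somebody) somewhere/anywhere/nowhere yields both the get somebody ... forms and the get somewhere/anywhere/nowhere forms, each of which can then be passed to the same @@@LINK writer.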










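kyletruman posted on 2021-4-3 14:08:14

Gyngreenlie posted on 2019-11-18 21:12
I suggest changing somebody to sb and something to sth, to make lookups easier.

Actually, when we look things up we almost never type sb or sth at all. For a phrase like get (sombody) somewhere, what we type is nearly always get somewhere (leaving out sb).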

lgmcw posted on 2019-11-19 17:52:11

Gyngreenlie posted on 2019-11-18 21:12
I suggest changing somebody to sb and something to sth, to make lookups easier.
Ideally, all three variants should be added: somebody, sb, and nothing at all.
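That suggestion can be sketched as a lookup table mapping each placeholder word to its full form, its abbreviation and an empty choice, then taking every combination. The PLACEHOLDERS table and placeholder_variants function are hypothetical and only cover somebody and something.

```python
from itertools import product

# Hypothetical table: each placeholder maps to (full form, abbreviation, omitted).
PLACEHOLDERS = {
    "somebody": ["somebody", "sb", ""],
    "something": ["something", "sth", ""],
}


def placeholder_variants(phrase):
    """Return every distinct variant with placeholders spelled out, abbreviated, or dropped."""
    options = [PLACEHOLDERS.get(token, [token]) for token in phrase.split()]
    # Skip empty choices when joining so no double spaces appear.
    combos = (" ".join(word for word in combo if word) for combo in product(*options))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [variant for variant in dict.fromkeys(combos) if variant]
```

Dropping every placeholder can leave a very short form (for example get to know), which may collide with an existing headword, so it may be worth filtering such variants before writing redirects.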

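klwo2 posted on 2019-11-18 17:32:45

How about adding a few more keywords to the title, such as "splitting phrases" (拆分词组) and "extracting phrases" (提取词组), so people can find it by searching later?

I've highlighted the thread for you.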

Gyngreenlie posted on 2019-11-18 21:12:06

I suggest changing somebody to sb and something to sth, to make lookups easier.

careykwok1 posted on 2020-1-3 16:19:46

Thanks, OP.

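wjl posted on 2021-8-9 14:14:59

I struggled for a long time with having to handle these by hand, one entry at a time. Turns out a ready-made wheel to build on was here all along.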