Posted on 2017-6-4 20:46:11
Last edited by skywind3000 on 2017-6-4 20:51

Regex alone won't get you there. You need a lemma list: a table that maps each word to its inflected forms, for example:
- be/4109826 -> is,was,are,were,'s,been,being,'re,'m,am,m
- have/1315648 -> had,has,'ve,having,'s,'d,of,d,ve
- it/1213224 -> its,they
- he/1196022 -> his,him,they
- i/1133697 -> my,me,we,is
- they/841960 -> their,them,'em
- you/804279 -> your,ya,ye
- not/767330 -> n't
- she/653505 -> her
- do/535646 -> did,does,done,doing,du,d'
- we/503360 -> our,us
- will/334612 -> 'll,wo,ll
- say/317317 -> said,says,saying
- would/278414 -> 'd
- can/263138 -> ca,cans,can,could
- go/227247 -> going,went,gone,goes,goin'
- get/212569 -> got,getting,gets,gotten
- make/209818 -> made,making,makes
- up/206976 -> ups,upping,upped
- see/184969 -> seen,saw,seeing,sees
- other/181277 -> others
- time/181080 -> times,timed,timing
- know/177717 -> knew,known,knows,knowing
- take/172773 -> took,taken,taking,takes
- year/161649 -> years
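
Each entry above has the shape `lemma/number -> form1,form2,...`, where the number after the slash looks like a frequency count. Here is a minimal Python sketch for parsing one such line, assuming the downloadable file uses the same layout with one entry per line. The name `parse_lemma_line` is made up for illustration, and the leading "- " is stripped only because it appears in the listing here.

```python
def parse_lemma_line(line):
    """Parse 'lemma/count -> form1,form2' into (lemma, count, [forms]).

    Returns None for lines that do not contain '->'.
    """
    line = line.strip()
    if line.startswith("- "):          # tolerate the list marker from the listing
        line = line[2:]
    head, arrow, tail = line.partition("->")
    if not arrow:
        return None                    # not an entry line
    lemma, _, count_text = head.strip().partition("/")
    count_text = count_text.strip()
    # Assumption: the number is a frequency count; fall back to 0 if it is missing.
    count = int(count_text) if count_text.isdigit() else 0
    forms = [f.strip() for f in tail.split(",") if f.strip()]
    return lemma.strip(), count, forms


print(parse_lemma_line("- go/227247 -> going,went,gone,goes,goin'"))
```

Plain string splitting is enough here because the separators `/`, `->` and `,` are fixed. The apostrophes inside forms such as `n't` or `goin'` pass through untouched.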
Then a small script takes care of the rest. Click to download:
lemma.en.txt
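
One way such a script might look is to invert the table into a form-to-lemma index and then look words up in it. This is a sketch under assumptions, not the author's script. The `TABLE` sample is copied from a few of the entries listed earlier, and `build_index` and `lemmatize` are hypothetical names. Some forms are ambiguous (`'d` is listed under both have and would, `'s` under both be and have); resolving them by picking the lemma with the larger count is a choice made here purely for illustration.

```python
# Sample entries copied from the listing: lemma -> (count, forms)
TABLE = {
    "be": (4109826, ["is", "was", "are", "were", "'s", "been", "being",
                     "'re", "'m", "am", "m"]),
    "have": (1315648, ["had", "has", "'ve", "having", "'s", "'d", "of", "d", "ve"]),
    "would": (278414, ["'d"]),
    "go": (227247, ["going", "went", "gone", "goes", "goin'"]),
    "take": (172773, ["took", "taken", "taking", "takes"]),
}


def build_index(table):
    """Map each inflected form to one lemma.

    A form that is itself a headword is left alone, and when a form is
    listed under several lemmas the one with the larger count wins.
    """
    index = {}
    for lemma, (count, forms) in table.items():
        for form in forms:
            if form in table:              # a headword stays its own lemma
                continue
            best = index.get(form)
            if best is None or count > table[best][0]:
                index[form] = lemma
    return index


def lemmatize(word, index):
    """Return the lemma for a word, or the lowercased word if it is unlisted."""
    word = word.lower()
    return index.get(word, word)


index = build_index(TABLE)
print([lemmatize(w, index) for w in "She went and has taken it".split()])
# prints ['she', 'go', 'and', 'have', 'take', 'it']
```

A plain dictionary lookup cannot tell apart cases that depend on context, such as `'s` meaning "is" versus "has", so any counts built on top of it are approximate for those forms.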