|
|
Last edited by 5dhtml on 2018-11-13 11:01
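I've recently been analyzing and organizing the data from several monolingual English dictionaries, and a question came up: even an elementary dictionary does not limit its headwords to elementary vocabulary. For example, one dictionary has only about 20,000 entries, yet it includes a large number of headwords whose frequency rank falls beyond 20,000 (according to combined ANC/BNC/COCA data). So, apart from giants like the OED, how do ordinary dictionaries decide which words to include when they are compiled? A related question I've long had: for exams such as CET-4 and CET-6, on what basis is the scope of the vocabulary syllabus chosen?

While I'm at it: does anyone know of a ready-made list of English inflections (such as the four verb forms), plus noun plurals, derivations, and other word-form variations?
For example: work, works, worked, working, ...

Found it: a list of inflected forms for 84,487 English lemmas, ordered by BNC frequency. It can be saved directly as a .txt file:

https://raw.githubusercontent.com/skywind3000/ECDICT/master/lemma.en.txt
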
; En Lemma Database (version 1.0.2)
; Compiled by Lin Wei (https://github.com/skywind3000), Mar 28, 2017
; by referencing the 100M+ words in the British National Corpus (BNC),
; NodeBox Linguistics and Yasumasa Someya's lemma list.
; This lemma list is provided "as is" and is free to use for any research
; and/or educational purposes.
; The list currently contains 186,523 words (tokens) in 84,487 lemma groups.
; If you have any questions or comments about this lemma list, feel free
; to contact me ([email protected]), at any time..
;
be/4109826 -> is,was,are,were,'s,been,being,'re,'m,am,m
have/1315648 -> had,has,'ve,having,'s,'d,of,d,ve
it/1213224 -> its,they
he/1196022 -> his,him,they
i/1133697 -> my,me,we,is
they/841960 -> their,them,'em
you/804279 -> your,ya,ye
not/767330 -> n't
she/653505 -> her
do/535646 -> did,does,done,doing,du,d'
we/503360 -> our,us
will/334612 -> 'll,wo,ll
say/317317 -> said,says,saying
would/278414 -> 'd
can/263138 -> ca,cans,can,could
go/227247 -> going,went,gone,goes,goin'
get/212569 -> got,getting,gets,gotten
make/209818 -> made,making,makes
up/206976 -> ups,upping,upped
see/184969 -> seen,saw,seeing,sees
other/181277 -> others
time/181080 -> times,timed,timing
know/177717 -> knew,known,knows,knowing
take/172773 -> took,taken,taking,takes
year/161649 -> years
well/156075 -> better,wells,welling,welled
like/154975 -> liked,likes,liking
then/154443 -> thens
think/145268 -> thought,thinking,thinks
come/144107 -> came,coming,comes
now/138986 -> nows
use/137498 -> used,using,uses
over/130163 -> overs
good/128437 -> best,better,goods
work/126290 -> working,worked,works,wrought
give/125727 -> given,gave,giving,gives
new/124872 -> newer,newest
people/123156 -> peoples,peopling,peopled
look/119946 -> looked,looking,looks
one/116568 -> ones
way/110362 -> ways
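
For anyone who wants to use the list programmatically, here is a rough sketch of how the format shown above could be parsed. It assumes every data line looks like `lemma/frequency -> form1,form2,...` and that lines starting with `;` are comments, which is only what the excerpt suggests. The function names `parse_lemma_line` and `build_form_index` are made up for this illustration and are not part of ECDICT.

```python
from collections import defaultdict


def parse_lemma_line(line):
    """Parse one 'lemma/freq -> form1,form2' line.

    Returns (lemma, frequency, forms), or None for blank lines,
    ';' comment lines, and lines without a '->' separator.
    """
    line = line.strip()
    if not line or line.startswith(";"):
        return None
    head, sep, tail = line.partition("->")
    if not sep:
        return None
    lemma, _, freq = head.strip().partition("/")
    forms = [f.strip() for f in tail.split(",") if f.strip()]
    # Assumption: a missing or non-numeric frequency is treated as 0.
    return lemma, int(freq) if freq.isdigit() else 0, forms


def build_form_index(lines):
    """Map each surface form (and each lemma itself) to its candidate lemmas."""
    index = defaultdict(set)
    for line in lines:
        parsed = parse_lemma_line(line)
        if parsed is None:
            continue
        lemma, _freq, forms = parsed
        index[lemma].add(lemma)
        for form in forms:
            index[form].add(lemma)
    return index


# A few lines copied from the excerpt above, used as sample input.
sample = """\
; En Lemma Database (version 1.0.2)
be/4109826 -> is,was,are,were,'s,been,being,'re,'m,am,m
have/1315648 -> had,has,'ve,having,'s,'d,of,d,ve
work/126290 -> working,worked,works,wrought
good/128437 -> best,better,goods
well/156075 -> better,wells,welling,welled
""".splitlines()

index = build_form_index(sample)
print(sorted(index["worked"]))  # ['work']
print(sorted(index["better"]))  # ['good', 'well']
print(sorted(index["'s"]))      # ['be', 'have']
```

Note that a form can belong to more than one lemma group ("better" appears under both good and well), so the index stores a set of lemmas per form rather than a single value. To run it on the whole file, save the URL above as a .txt file and pass the open file object to `build_form_index` in place of `sample`.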
|
|