PDAWIKI (掌上百科, "Handheld Encyclopedia")


Views: 2765 | Replies: 3

[Dictionary Collation] COCA word frequency: the ranking is not based on frequency alone


Posted on 2016-2-27 15:37:56
This started with fuxy526's version of the list, which adds part-of-speech percentages.

Only then did I notice that some parts of speech have a higher frequency yet are ranked lower.

There are also cases where a word's frequency is more than ten times lower, yet it is ranked ahead.

I couldn't find anything on the COCA website explaining what the ranking is based on.


Posted on 2016-2-27 20:44:01

http://www.wordandphrase.info/h_dispersion.asp
Why doesn't the frequency ranking follow the absolute frequency of a word?
DISPERSION AND RANKING (1-60,000)

As you browse through the frequency listing, you may notice that words with a lower frequency than other nearby words have a higher ranking (1-60,000). This is because the ranking is a function of two numbers: [frequency x dispersion]. Dispersion is a score (0.00-1.00) that measures how "evenly" the word is spread across the entire corpus (with 1.00 being the most even). The idea is that if a word is concentrated in just one or maybe two genres (or worse, even just a few sub-genres or texts in that genre), then the word is more specialized, and shouldn't be ranked as high in the overall list 1-60,000.
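To make the [frequency x dispersion] rule concrete, here is a minimal sketch in Python. It assumes the ranking score is exactly raw frequency multiplied by the 0.00-1.00 dispersion score, with higher scores ranked first; the words (wordA, wordB, wordC), counts, and dispersion values are hypothetical, not COCA data. It shows how a word that is 12.5 times rarer can still rank ahead of a frequent word that is concentrated in a few genres.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    word: str
    frequency: int     # raw corpus count (hypothetical numbers)
    dispersion: float  # 0.00-1.00, where 1.00 = most evenly spread


def rank_entries(entries: list[Entry]) -> list[tuple[int, Entry, float]]:
    """Rank entries by frequency x dispersion, highest score first.

    Assumption: the score is a plain product, as the FAQ describes it.
    """
    scored = [(e, e.frequency * e.dispersion) for e in entries]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, e, score) for rank, (e, score) in enumerate(scored, start=1)]


# Made-up data for illustration only
entries = [
    Entry("wordA", 50_000, 0.05),  # frequent, but concentrated in one genre
    Entry("wordB", 4_000, 0.95),   # 12.5x rarer, but spread evenly
    Entry("wordC", 10_000, 0.30),
]

for rank, e, score in rank_entries(entries):
    print(f"{rank}. {e.word:6} freq={e.frequency:>6} D={e.dispersion:.2f} score={score:,.0f}")
```

With these made-up numbers the most frequent word, wordA, ends up last, which is the kind of ordering described in the first post.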
Most people won't need to see the dispersion score. If you do, you might consider downloading the data that contains this information. (See a sample (every seventh word, 1-60,000) with dispersion in the right column).
Also, please be aware that there are still some isolated "issues" with the frequency list, especially with words that occur mainly as a proper noun or in proper nouns (e.g. cook, ray, frost, savage). In most cases, these are already marked in the frequency list with parentheses, to let you know that there might be problems. But even with these issues, we believe that the frequency list here is more accurate than any other large frequency listing of English.

https://en.wikipedia.org/wiki/Statistical_dispersion

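The FAQ does not say which measure produces its 0.00-1.00 dispersion score. A common one in corpus linguistics is Juilland's D, D = 1 - V / sqrt(n - 1), where V is the coefficient of variation (population standard deviation divided by the mean) of a word's counts across n equally sized corpus parts. The sketch below assumes that measure and equally sized parts (for example, genres); the function name juilland_d and all counts are hypothetical.

```python
import math
from statistics import mean, pstdev


def juilland_d(part_counts: list[float]) -> float:
    """Juilland's D for one word's counts in n equally sized corpus parts.

    Assumption: this is only one possible dispersion measure; the source
    does not name the one it uses. Returns 1.0 for a perfectly even spread
    and 0.0 when every occurrence falls in a single part.
    """
    n = len(part_counts)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    m = mean(part_counts)
    if m == 0:
        raise ValueError("word does not occur in the corpus")
    cv = pstdev(part_counts) / m  # coefficient of variation V
    return 1.0 - cv / math.sqrt(n - 1)


# Hypothetical counts of two words in five equally sized genres
even = [200, 210, 190, 205, 195]
concentrated = [950, 10, 15, 20, 5]
print(f"even:         D = {juilland_d(even):.3f}")
print(f"concentrated: D = {juilland_d(concentrated):.3f}")
```

Dividing by sqrt(n - 1) normalizes V, whose maximum for n parts is sqrt(n - 1) (all occurrences in one part), so D stays between 0 and 1. Both words above have the same total count (1,000), but the concentrated one gets a much lower D.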
Comment:

Impressive! I combed through the whole website and never found this. (posted 2016-2-27 23:32)


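Posted on 2016-3-8 12:30:10
What software did the original poster use to open the file in that screenshot? When I open the file fuxy526 made in Eudic (欧路), it shows up as garbled text.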