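Traditional Chinese Word Embeddings (Word2Vec)
Compressed model files in 300, 200, 100, and 50 dimensions (trained with the gensim Python package)
Example of loading a word embedding model file: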
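import gensim

# Load pre-trained vectors stored in the word2vec binary format.
model = gensim.models.KeyedVectors.load_word2vec_format(
    'path/to/word_embedding_model_file',  # replace with the path to the model file
    unicode_errors='ignore',  # skip bytes in the vocabulary that cannot be decoded
    binary=True)  # the model files are binary, not plain text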
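The downloads listed on this page are gzip-compressed (`.BIN.GZ`). If loading the compressed file directly does not work in a given environment, one option is to decompress it first with the Python standard library. The following is a minimal sketch; the helper name `decompress_gz` and its default output path are assumptions made for illustration, not part of the original page.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dst=None):
    """Decompress a .gz file and return the path of the decompressed copy.

    `decompress_gz` is a hypothetical helper name used for illustration.
    If `dst` is omitted, the trailing ".gz" suffix (any letter case)
    is dropped from `src` to form the output path.
    """
    src = Path(src)
    if dst is None:
        if src.suffix.lower() != ".gz":
            raise ValueError("expected a .gz file, got: %s" % src)
        dst = src.with_suffix("")
    dst = Path(dst)
    # Stream the data in chunks so a large model is never held fully in memory.
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

The returned path can then be passed to `load_word2vec_format` with `binary=True`, as in the loading example above.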
Download pre-trained word vectors
@ Word based: word-level pre-trained word embedding models
TMUNLP_1.6B_WB_50DIM_2020V1.BIN.GZ
COMMON CRAWL
1.6B TOKENS, 1.7M VOCAB, 50D VECTORS, 320.7 MB
TMUNLP_1.6B_WB_100DIM_2020V1.BIN.GZ
COMMON CRAWL
1.6B TOKENS, 1.7M VOCAB, 100D VECTORS, 626.4 MB
TMUNLP_1.6B_WB_200DIM_2020V1.BIN.GZ
COMMON CRAWL
1.6B TOKENS, 1.7M VOCAB, 200D VECTORS, 1.21 GB
TMUNLP_1.6B_WB_300DIM_2020V1.BIN.GZ
COMMON CRAWL
1.6B TOKENS, 1.7M VOCAB, 300D VECTORS, 1.8 GB
@ Character based: character-level pre-trained word embedding models
TMUNLP_1.6B_CB_50DIM_2020V1.BIN.GZ
COMMON CRAWL
4.45B TOKENS, 14K VOCAB, 50D VECTORS, 2.6 MB
TMUNLP_1.6B_CB_100DIM_2020V1.BIN.GZ
COMMON CRAWL
4.45B TOKENS, 14K VOCAB, 100D VECTORS, 5.1 MB
TMUNLP_1.6B_CB_200DIM_2020V1.BIN.GZ
COMMON CRAWL
4.45B TOKENS, 14K VOCAB, 200D VECTORS, 10 MB
TMUNLP_1.6B_CB_300DIM_2020V1.BIN.GZ
COMMON CRAWL
4.45B TOKENS, 14K VOCAB, 300D VECTORS, 15 MB
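As a rough sanity check on the sizes listed above, the raw size of the vector matrix can be estimated as vocabulary size × dimensions × 4 bytes, since the word2vec binary format stores each component as a 32-bit float. The sketch below ignores the stored vocabulary strings and the gzip compression, and it uses the rounded vocabulary counts from the list (1.7M and 14K), so the results are approximations only. `estimate_matrix_mb` is a hypothetical helper name.

```python
def estimate_matrix_mb(vocab_size, dims, bytes_per_value=4):
    """Approximate size of a dense embedding matrix in MB (10**6 bytes).

    Assumes 4 bytes per value (float32) unless told otherwise.
    """
    return vocab_size * dims * bytes_per_value / 1e6


# Rounded vocabulary sizes taken from the list above (assumed, not exact).
WORD_VOCAB = 1_700_000
CHAR_VOCAB = 14_000

for dims in (50, 100, 200, 300):
    word_mb = estimate_matrix_mb(WORD_VOCAB, dims)
    char_mb = estimate_matrix_mb(CHAR_VOCAB, dims)
    print(f"{dims}D: word-based ~{word_mb:.0f} MB, character-based ~{char_mb:.1f} MB")
```

These estimates fall in the same range as the listed download sizes, which suggests how much memory to expect once a model is loaded.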