Traditional Chinese Word Embeddings (Word2Vec)

Compressed model files in 300, 200, 100, and 50 dimensions (trained with the gensim Python package)
Example of loading a word embedding model file:

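import gensim

# Load the pre-trained vectors stored in binary word2vec format.
# Replace the placeholder string with the path to the embedding model file.
model = gensim.models.KeyedVectors.load_word2vec_format('path to the word embedding model file', unicode_errors='ignore', binary=True)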
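Before loading one of the larger files, it can help to confirm that the download matches the advertised vocabulary size and dimensionality. The following is a minimal sketch that relies only on the Python standard library. It assumes the files use the standard binary word2vec layout, whose first line is the text "<vocab_size> <vector_size>". The function name `read_word2vec_header` is hypothetical and not part of gensim.

```python
import gzip


def read_word2vec_header(path):
    """Return (vocab_size, vector_size) read from a word2vec-format file.

    Assumption: the file follows the standard word2vec layout, in which
    the first line is ASCII text of the form "<vocab_size> <vector_size>".
    Paths ending in .gz are opened through gzip; any other path is read
    as a plain binary file.
    """
    opener = gzip.open if str(path).lower().endswith(".gz") else open
    with opener(path, "rb") as f:
        header = f.readline()  # only the first line is read, not the vectors
    vocab_size, vector_size = (int(field) for field in header.split())
    return vocab_size, vector_size
```

Calling `read_word2vec_header` with the path of a downloaded file should return a pair such as the vocabulary size and 50 for a 50-dimensional model; a parsing error here usually suggests an incomplete download or a file in a different format.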


Download pre-trained word vectors
@ Word-based: pre-trained word-level embedding models

TMUNLP_1.6B_WB_50DIM_2020V1.BIN.GZ

COMMON CRAWL
1.6B TOKENS, 1.7M VOCAB, 50D VECTORS, 320.7 MB

TMUNLP_1.6B_WB_100DIM_2020V1.BIN.GZ

COMMON CRAWL
1.6B TOKENS, 1.7M VOCAB, 100D VECTORS, 626.4 MB

TMUNLP_1.6B_WB_200DIM_2020V1.BIN.GZ

COMMON CRAWL
1.6B TOKENS, 1.7M VOCAB, 200D VECTORS, 1.21 GB

TMUNLP_1.6B_WB_300DIM_2020V1.BIN.GZ

COMMON CRAWL
1.6B TOKENS, 1.7M VOCAB, 300D VECTORS, 1.8 GB


@ Character-based: pre-trained character-level embedding models

TMUNLP_1.6B_CB_50DIM_2020V1.BIN.GZ

COMMON CRAWL
4.45B TOKENS, 14K VOCAB, 50D VECTORS, 2.6 MB

TMUNLP_1.6B_CB_100DIM_2020V1.BIN.GZ

COMMON CRAWL
4.45B TOKENS, 14K VOCAB, 100D VECTORS, 5.1 MB

TMUNLP_1.6B_CB_200DIM_2020V1.BIN.GZ

COMMON CRAWL
4.45B TOKENS, 14K VOCAB, 200D VECTORS, 10 MB

TMUNLP_1.6B_CB_300DIM_2020V1.BIN.GZ

COMMON CRAWL
4.45B TOKENS, 14K VOCAB, 300D VECTORS, 15 MB
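The sizes listed above are for the compressed archives. For a rough idea of how much memory the vectors occupy once loaded, one can multiply vocabulary size by dimensionality by bytes per value. The sketch below assumes 32-bit floats (4 bytes per value, as used by the binary word2vec format) and uses the rounded vocabulary figures from the listing, so the results are approximations only. The function name `estimate_vector_bytes` is hypothetical.

```python
def estimate_vector_bytes(vocab_size, dims, bytes_per_value=4):
    """Approximate size of the raw vector matrix in bytes.

    Assumption: every value is a 32-bit float (4 bytes). The estimate
    ignores the vocabulary strings and any index structures a library
    builds on top of the matrix.
    """
    return vocab_size * dims * bytes_per_value


# Rounded vocabulary sizes taken from the listing (1.7M words, 14K characters).
WORD_VOCAB = 1_700_000
CHAR_VOCAB = 14_000

for dims in (50, 100, 200, 300):
    word_mb = estimate_vector_bytes(WORD_VOCAB, dims) / 1e6
    char_mb = estimate_vector_bytes(CHAR_VOCAB, dims) / 1e6
    print(f"{dims}D: word-based ~{word_mb:.0f} MB, character-based ~{char_mb:.1f} MB")
```

Under these assumptions the 50-dimensional word-based matrix comes to about 340 MB and the 300-dimensional one to about 2 GB, which is in the same range as the archive sizes listed, while the character-based models stay well under 20 MB.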