HanLp的预训练模型
HanLP2.1支持包括简繁中英日俄法德在内的104种语言上的10种联合任务:
- 分词(粗分、细分2个标准,强制、合并、校正3种词典模式)
- 词性标注(PKU、863、CTB、UD四套词性规范)
- 命名实体识别(PKU、MSRA、OntoNotes三套规范)
- 依存句法分析(SD、UD规范)
- 成分句法分析
- 语义依存分析(SemEval16、DM、PAS、PSD四套规范)
- 语义角色标注
- 词干提取
- 词法语法特征提取
- 抽象意义表示(AMR)
hanlp的预训练模型
通过
import hanlp
hanlp.pretrained.ALL
列出hanlp中所有的预训练模型 。
{'SIGHAN2005_PKU_CONVSEG': 'https://file.hankcs.com/hanlp/tok/sighan2005-pku-convseg_20200110_153722.zip', 'SIGHAN2005_MSR_CONVSEG': 'https://file.hankcs.com/hanlp/tok/convseg-msr-nocrf-noembed_20200110_153524.zip', 'CTB6_CONVSEG': 'https://file.hankcs.com/hanlp/tok/ctb6_convseg_nowe_nocrf_20200110_004046.zip', 'PKU_NAME_MERGED_SIX_MONTHS_CONVSEG': 'https://file.hankcs.com/hanlp/tok/pku98_6m_conv_ngram_20200110_134736.zip', 'LARGE_ALBERT_BASE': 'https://file.hankcs.com/hanlp/tok/large_cws_albert_base_20200828_011451.zip', 'SIGHAN2005_PKU_BERT_BASE_ZH': 'https://file.hankcs.com/hanlp/tok/sighan2005_pku_bert_base_zh_20201231_141130.zip', 'CTB5_BIAFFINE_DEP_ZH': 'https://file.hankcs.com/hanlp/dep/biaffine_ctb5_20191229_025833.zip', 'CTB7_BIAFFINE_DEP_ZH': 'https://file.hankcs.com/hanlp/dep/biaffine_ctb7_20200109_022431.zip', 'PTB_BIAFFINE_DEP_EN': 'https://file.hankcs.com/hanlp/dep/ptb_dep_biaffine_20200101_174624.zip', 'SEMEVAL16_NEWS_BIAFFINE_ZH': 'https://file.hankcs.com/hanlp/sdp/semeval16-news-biaffine_20191231_235407.zip', 'SEMEVAL16_TEXT_BIAFFINE_ZH': 'https://file.hankcs.com/hanlp/sdp/semeval16-text-biaffine_20200101_002257.zip', 'SEMEVAL15_PAS_BIAFFINE_EN': 'https://file.hankcs.com/hanlp/sdp/semeval15_biaffine_pas_20200103_152405.zip', 'SEMEVAL15_PSD_BIAFFINE_EN': 'https://file.hankcs.com/hanlp/sdp/semeval15_biaffine_psd_20200106_123009.zip', 'SEMEVAL15_DM_BIAFFINE_EN': 'https://file.hankcs.com/hanlp/sdp/semeval15_biaffine_dm_20200106_122808.zip', 'GLOVE_6B_50D': 'http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip#glove.6B.50d.txt', 'GLOVE_6B_100D': 'http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip#glove.6B.100d.txt', 'GLOVE_6B_200D': 'http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip#glove.6B.200d.txt', 'GLOVE_6B_300D': 'http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip#glove.6B.300d.txt', 'GLOVE_840B_300D': 'http://nlp.stanford.edu/data/glove.840B.300d.zip', 'CTB5_POS_RNN': 'https://file.hankcs.com/hanlp/pos/ctb5_pos_rnn_20200113_235925.zip', 'CTB5_POS_RNN_FASTTEXT_ZH': 'https://file.hankcs.com/hanlp/pos/ctb5_pos_rnn_fasttext_20191230_202639.zip', 'CTB9_POS_ALBERT_BASE': 'https://file.hankcs.com/hanlp/pos/ctb9_albert_base_zh_epoch_20_20201011_090522.zip', 'PTB_POS_RNN_FASTTEXT_EN': 'https://file.hankcs.com/hanlp/pos/ptb_pos_rnn_fasttext_20200103_145337.zip', 'FLAIR_LM_FW_WMT11_EN_TF': 'https://file.hankcs.com/hanlp/lm/flair_lm_wmt11_en_20200211_091932.zip#flair_lm_fw_wmt11_en', 'FLAIR_LM_BW_WMT11_EN_TF': 'https://file.hankcs.com/hanlp/lm/flair_lm_wmt11_en_20200211_091932.zip#flair_lm_bw_wmt11_en', 'FLAIR_LM_WMT11_EN': 'https://file.hankcs.com/hanlp/lm/flair_lm_wmt11_en_20200601_205350.zip', 'CONVSEG_W2V_NEWS_TENSITE': 'https://file.hankcs.com/hanlp/embeddings/convseg_embeddings.zip', 'CONVSEG_W2V_NEWS_TENSITE_WORD_PKU': 'https://file.hankcs.com/hanlp/embeddings/convseg_embeddings.zip#news_tensite.pku.words.w2v50', 'CONVSEG_W2V_NEWS_TENSITE_WORD_MSR': 'https://file.hankcs.com/hanlp/embeddings/convseg_embeddings.zip#news_tensite.msr.words.w2v50', 'CONVSEG_W2V_NEWS_TENSITE_CHAR': 'https://file.hankcs.com/hanlp/embeddings/convseg_embeddings.zip#news_tensite.w2v200', 'SEMEVAL16_EMBEDDINGS_CN': 'https://file.hankcs.com/hanlp/embeddings/semeval16_embeddings.zip', 'SEMEVAL16_EMBEDDINGS_300_NEWS_CN': 'https://file.hankcs.com/hanlp/embeddings/semeval16_embeddings.zip#news.fasttext.300.txt', 'SEMEVAL16_EMBEDDINGS_300_TEXT_CN': 'https://file.hankcs.com/hanlp/embeddings/semeval16_embeddings.zip#text.fasttext.300.txt', 'CTB5_FASTTEXT_300_CN': 'https://file.hankcs.com/hanlp/embeddings/ctb.fasttext.300.txt.zip', 'TENCENT_AI_LAB_EMBEDDING': 'https://ai.tencent.com/ailab/nlp/data/Tencent_AILab_ChineseEmbedding.tar.gz#Tencent_AILab_ChineseEmbedding.txt', 'RADICAL_CHAR_EMBEDDING_100': 'https://file.hankcs.com/hanlp/embeddings/radical_char_vec_20191229_013849.zip#character.vec.txt', 'SUBWORD_ENCODING_CWS_ZH_WIKI_BPE_50': 'https://file.hankcs.com/hanlp/embeddings/subword_encoding_cws_20200524_190636.zip#zh.wiki.bpe.vs200000.d50.w2v.txt', 'SUBWORD_ENCODING_CWS_GIGAWORD_UNI': 'https://file.hankcs.com/hanlp/embeddings/subword_encoding_cws_20200524_190636.zip#gigaword_chn.all.a2b.uni.ite50.vec', 'SUBWORD_ENCODING_CWS_GIGAWORD_BI': 'https://file.hankcs.com/hanlp/embeddings/subword_encoding_cws_20200524_190636.zip#gigaword_chn.all.a2b.bi.ite50.vec', 'SUBWORD_ENCODING_CWS_CTB_GAZETTEER_50': 'https://file.hankcs.com/hanlp/embeddings/subword_encoding_cws_20200524_190636.zip#ctb.50d.vec', 'MSRA_NER_BERT_BASE_ZH': 'https://file.hankcs.com/hanlp/ner/ner_bert_base_msra_20200104_185735.zip', 'MSRA_NER_ALBERT_BASE_ZH': 'https://file.hankcs.com/hanlp/ner/ner_albert_base_zh_msra_20200111_202919.zip', 'CONLL03_NER_BERT_BASE_UNCASED_EN': 'https://file.hankcs.com/hanlp/ner/ner_conll03_bert_base_uncased_en_20200104_194352.zip', 'CHNSENTICORP_BERT_BASE_ZH': 'https://file.hankcs.com/hanlp/classification/chnsenticorp_bert_base_20200104_164655.zip', 'SST2_BERT_BASE_EN': 'https://file.hankcs.com/hanlp/classification/sst2_bert_base_uncased_en_20200210_090240.zip', 'SST2_ALBERT_BASE_EN': 'https://file.hankcs.com/hanlp/classification/sst2_albert_base_20200122_205915.zip', 'EMPATHETIC_DIALOGUES_SITUATION_ALBERT_BASE_EN': 'https://file.hankcs.com/hanlp/classification/empathetic_dialogues_situation_albert_base_20200122_212250.zip', 'EMPATHETIC_DIALOGUES_SITUATION_ALBERT_LARGE_EN': 'https://file.hankcs.com/hanlp/classification/empathetic_dialogues_situation_albert_large_20200123_142724.zip', 'FASTTEXT_DEBUG_EMBEDDING_EN': 'https://elit-models.s3-us-west-2.amazonaws.com/fasttext.debug.bin.zip', 'FASTTEXT_CC_300_EN': 'https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz', 'FASTTEXT_WIKI_NYT_AMAZON_FRIENDS_200_EN': 'https://elit-models.s3-us-west-2.amazonaws.com/fasttext-200-wikipedia-nytimes-amazon-friends-20191107.bin', 'FASTTEXT_WIKI_300_ZH': 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh.zip#wiki.zh.bin', 'FASTTEXT_WIKI_300_ZH_CLASSICAL': 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh_classical.zip#wiki.zh_classical.bin', 'OPEN_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH': 'https://file.hankcs.com/hanlp/mtl/open_tok_pos_ner_srl_dep_sdp_con_electra_small_20201223_035557.zip', 'OPEN_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_BASE_ZH': 'https://file.hankcs.com/hanlp/mtl/open_tok_pos_ner_srl_dep_sdp_con_electra_base_20201223_201906.zip', 'CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH': 'https://file.hankcs.com/hanlp/mtl/close_tok_pos_ner_srl_dep_sdp_con_electra_small_20210111_124159.zip', 'CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_BASE_ZH': 'https://file.hankcs.com/hanlp/mtl/close_tok_pos_ner_srl_dep_sdp_con_electra_base_20210111_124519.zip', 'UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_MT5_SMALL': 'https://file.hankcs.com/hanlp/mtl/ud_ontonotes_tok_pos_lem_fea_ner_srl_dep_sdp_con_mt5_small_20210228_123458.zip', 'UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_XLMR_BASE': 'https://file.hankcs.com/hanlp/mtl/ud_ontonotes_tok_pos_lem_fea_ner_srl_dep_sdp_con_xlm_base_20210114_005825.zip', 'UD_CTB_EOS_MUL': 'https://file.hankcs.com/hanlp/eos/eos_ud_ctb_mul_20201222_133543.zip'}
试用
import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH)
HanLP(["2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。"])
会有警告:
/usr/local/lib/python3.7/site-packages/torch/autograd/__init__.py:204: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)