Update README.md

commit ce72444db2 (parent 81d6b43b0c)
Author: Zeyao Du
Date: 2019-12-07 19:09:55 +08:00
Committed by: GitHub


@@ -5,6 +5,10 @@
- Chinese version of GPT2 training code, using the BERT tokenizer or a BPE tokenizer. It is based on the extremely awesome [Transformers](https://github.com/huggingface/transformers) repository from the HuggingFace team. It can write poems, news, and novels, or train general language models. Supports char-level, word-level, and BPE-level tokenization, as well as large training corpora.
- Chinese GPT-2 training code, using the BERT tokenizer or SentencePiece's BPE model (thanks to [kangzhonghua](https://github.com/kangzhonghua) for the contribution; BPE mode requires minor changes to the train.py code). It can write poems, news, and novels, or train general language models. Supports char-level, word-segmentation, and BPE modes, and supports training on large corpora. A minimal tokenizer sketch follows this list.
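
As a quick illustration of the char-level path described above, here is a minimal sketch using the HuggingFace `BertTokenizer`; the pretrained vocabulary name `bert-base-chinese` is an assumption for illustration, not necessarily the vocab file that train.py loads.

```python
# Minimal sketch of char-level Chinese tokenization with the BERT tokenizer.
# The vocab name "bert-base-chinese" is an assumption for illustration;
# train.py may load its own vocab file instead.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
text = "今天天气不错。"
tokens = tokenizer.tokenize(text)              # one token per Chinese character
ids = tokenizer.convert_tokens_to_ids(tokens)  # map tokens to vocab indices
print(tokens)  # ['今', '天', '天', '气', '不', '错', '。']
print(ids)
```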
## NEWS 12.7.2019
- A new project, [Decoders-Chinese-TF2.0](https://github.com/Morizeyao/Decoders-Chinese-TF2.0), also supports GPT-2 training for Chinese. It is simpler to use and less prone to problems. It is still in the testing stage, and feedback is welcome.
## NEWS 11.9
- [GPT2-ML](https://github.com/imcaspar/gpt2-ml) (not directly affiliated with this project) has released a 1.5B-parameter Chinese GPT-2 model. If you are interested, you can convert it to the PyTorch format supported by this project for further training or generation tests (see the sketch below).
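
For the conversion, the sketch below shows the generic TensorFlow-to-PyTorch path via `load_tf_weights_in_gpt2` from transformers. This is a hedged sketch, not a verified recipe: the loader assumes the original OpenAI GPT-2 variable layout, and GPT2-ML's Grover-style checkpoint may need its variable names remapped first. All file paths are placeholders.

```python
# Hedged sketch: generic TF-checkpoint -> PyTorch conversion with transformers.
# load_tf_weights_in_gpt2 expects the original OpenAI GPT-2 variable layout;
# GPT2-ML's checkpoint may need its variable names remapped before this works.
# All file paths below are placeholders.
import torch
from transformers import GPT2Config, GPT2LMHeadModel, load_tf_weights_in_gpt2

config = GPT2Config.from_json_file("config.json")     # model hyperparameters
model = GPT2LMHeadModel(config)
load_tf_weights_in_gpt2(model, config, "model.ckpt")  # TF checkpoint prefix
torch.save(model.state_dict(), "pytorch_model.bin")   # PyTorch weights
```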