English | 中文

Pre-training has become an essential part for NLP tasks. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpus and fine-tuning on downstream task. UER-py maintains model modularity and supports research extensibility. It facilitates the use of existing pre-training models, and provides interfaces for users to further extend upon. With UER-py, we build a model zoo which contains pre-trained models of different properties. See the UER-py project Wiki for full documentation.
🚀 We have open-sourced the TencentPretrain, a refactored new version of UER-py. TencentPretrain supports multi-modal models and enables training of large models. If you are interested in text models of medium size (with parameter sizes of less than one billion), we recommend continuing to use the UER-py project.
Table of Contents