Wikipedia biography dataset
Citation Credit
Neural Text Generation from Structured Data with Application to the Biography Domain
Rémi Lebret, David Grangier and Michael Auli, EMNLP 2016
http://arxiv.org/abs/1603.07771
This publication provides further information about the data, and we kindly ask you to cite this paper when using the data.
The data was extracted from the English Wikipedia dump (enwiki-20150901), relying on the articles referenced by WikiProject Biography.
Dataset Description
For each article, we extracted the first paragraph (text) and the infobox (structured data).
Each infobox is encoded as a list of (field name, field value) pairs.
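As an illustration, one record might be represented in Python as shown below; the field names and values here are invented placeholders, not entries taken from the dataset:

    # Illustrative only: the person, field names and values are invented.
    infobox = [
        ("name", "jane doe"),
        ("birth_date", "1 january 1900"),
        ("occupation", "botanist"),
        ("nationality", "british"),
    ]

    # The paired text is the tokenized first paragraph of the article.
    first_paragraph = "jane doe ( born 1 january 1900 ) was a british botanist ."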
We used Stanford CoreNLP to preprocess the data: the text was broken into sentences, and both the text and the field values were tokenized. The dataset was randomly split into three subsets: train (80%), valid (10%), and test (10%). We strongly recommend using the test set only for the final evaluation.
The data is organised into three subdirectories: train, valid, and test.
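For example, a minimal Python sketch for locating the three subset directories; the root directory name wikipedia-biography-dataset is an assumption, so adjust it to wherever the download was extracted:

    from pathlib import Path

    # Hypothetical root directory; adjust to where the dataset was extracted.
    DATASET_ROOT = Path("wikipedia-biography-dataset")

    # The data is split 80% / 10% / 10% across these three subdirectories.
    SUBSETS = ("train", "valid", "test")

    subset_dirs = {name: DATASET_ROOT / name for name in SUBSETS}

    for name, path in subset_dirs.items():
        if not path.is_dir():
            raise FileNotFoundError(f"Expected subset directory not found: {path}")
        print(f"{name}: {path}")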