Wikipedia biography dataset



    Citation Credit

    Neural Text Generation from Structured Data with Application to the Biography Domain
    Rémi Lebret, David Grangier and Michael Auli, EMNLP 2016
    http://arxiv.org/abs/1603.07771

    This publication provides further information about the data, and we kindly ask you to cite this paper when using the data.

    The data was extracted from the English Wikipedia dump (enwiki-20150901), using the articles listed by WikiProject Biography.

    Dataset Description

    For each article, we extracted the first paragraph (text) and the infobox (structured data).

    Each infobox is encoded as a list of (field name, field value) pairs.
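
    As an illustration of the (field name, field value) encoding, the sketch below models one infobox as a Python list of pairs. The record, the field names and the helper functions `field_values` and `group_by_field` are hypothetical and not part of the dataset or its tooling; the on-disk file format is not described here, so the sketch covers only the in-memory structure.

```python
from typing import Dict, List, Tuple

# An infobox is an ordered list of (field name, field value) pairs.
Infobox = List[Tuple[str, str]]

# Hypothetical record: these names and values are illustrative only
# and are not taken from the dataset.
infobox: Infobox = [
    ("name", "jane doe"),
    ("birth_date", "1 january 1900"),
    ("occupation", "writer"),
    ("occupation", "translator"),
]


def field_values(box: Infobox, field_name: str) -> List[str]:
    """Return every value stored under field_name, in original order."""
    return [value for name, value in box if name == field_name]


def group_by_field(box: Infobox) -> Dict[str, List[str]]:
    """Group values by field name, keeping repeated fields as lists."""
    grouped: Dict[str, List[str]] = {}
    for name, value in box:
        grouped.setdefault(name, []).append(value)
    return grouped


print(field_values(infobox, "occupation"))
print(group_by_field(infobox))
```

    A list of pairs, rather than a dictionary, preserves field order and allows the same field name to appear more than once.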

    We used Stanford CoreNLP to preprocess the data: we broke the text into sentences and tokenized both the text and the field values. The dataset was randomly split into three subsets: train (80%), valid (10%) and test (10%). We strongly recommend using test only for the final evaluation.
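
    For readers who want to reproduce a comparable partition on their own data, a random 80/10/10 split can be sketched as follows. This is a minimal illustration, not the procedure used to build the released dataset: the function name `split_dataset`, the seed and the rounding behaviour are all assumptions.

```python
import random
from typing import Dict, Iterable, List, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Iterable[T],
    seed: int = 0,            # assumed seed; the original seed is not documented
    train_frac: float = 0.8,
    valid_frac: float = 0.1,
) -> Dict[str, List[T]]:
    """Shuffle items and cut them into train/valid/test partitions."""
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)

    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # Whatever remains after train and valid goes to test.
        "test": shuffled[n_train + n_valid:],
    }


splits = split_dataset(range(1000))
print({name: len(part) for name, part in splits.items()})
```

    Because the three partitions are slices of one shuffled list, they are disjoint, and every item lands in exactly one of them.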

    The data is organised in three subdirectories for train, valid and test.