File Format

Each data file we host has a unique identifier in the format [XX]-YYYYY-ZZZZZZZZ.EXT. These numbers are broken down as:

We have developed a small set of lightweight tools in Python 3 for working with PrefLib data and generating synthetic data. Please download the current version of the tools and check the README for full details. The PrefLib tools are covered by the BSD License and are available on the PrefLib-Tools GitHub page.

We currently use three formats, described below.

Election Data

The format for all ranked preferences (orders over candidates or sets of candidates) is as follows, with each element on a new line. Files with the extensions SOC, SOI, TOC, TOI, TOG, MJG, WMG, and PWG use this format.

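Votes are sorted by decreasing count in the individual data files. Each field is described below:

<Number of Candidates>: a single integer on the first line.
<Candidate Number>, <Candidate Name>: one line per candidate.
<Number of Voters>, <Sum of Vote Count>, <Number of Unique Orders>: a single summary line.
<Count>, <Preference List>: one line per unique order, where Count is the number of voters who submitted that order and Preference List gives candidate numbers from most to least preferred, with tied candidates grouped in braces, as in {1, 2, 4}.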

Here are the first 25 lines of a data file of complete orders with ties (TOC), taken from the Debian election dataset.

4
1, Branden Robinson
2, Raphael Hertzog
3, Bdale Garbee
4, None Of The Above
475, 475, 41
60, 3, 1, 2, 4
50, 1, 3, 2, 4
40, 3, 1, 2, 4
34, 3, 2, 1, 4
31, 3, 2, 4, 1
29, 2, 3, 1, 4
29, 1, 3, 2, 4
24, 2, 1, 3, 4
22, 1, 2, 3, 4
20, 3, 2, 1, 4
15, 1, 3, 4, 2
14, 2, 3, 1, 4
11, 3, 1, 4, 2
9, 2, 3, 4, 1
9, 3, {1, 2, 4}
8, 1, 2, 3, 4
7, 1, {2, 3, 4}
5, 3, 4, {1, 2}
5, 3, 2, {1, 4}
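The layout of this example is simple enough to parse by hand. The following is a minimal Python sketch, assuming the field order visible in the example (candidate count, one line per candidate, a summary line, then one count-and-order line per unique order) and braces as the only tie notation. `parse_order` and `parse_election` are hypothetical helpers for illustration, not part of PrefLib-Tools.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

# One preference order: tiers from most to least preferred. A singleton tier is
# a strictly ranked candidate; a larger tier is a group of tied candidates.
Order = Tuple[FrozenSet[int], ...]


def parse_order(text: str) -> Order:
    """Parse e.g. '3, {1, 2, 4}' into (frozenset({3}), frozenset({1, 2, 4}))."""
    tiers = []
    # Each match is either a braced group of tied candidates or a lone number.
    for group, single in re.findall(r"\{([^}]*)\}|(\d+)", text):
        if single:
            tiers.append(frozenset({int(single)}))
        else:
            tiers.append(frozenset(int(c) for c in group.split(",") if c.strip()))
    return tuple(tiers)


def parse_election(lines: List[str]):
    """Hypothetical reader; the layout is assumed from the TOC example above."""
    rows = (line.strip() for line in lines if line.strip())
    n_candidates = int(next(rows))
    candidates: Dict[int, str] = {}
    for _ in range(n_candidates):
        # Split only on the first comma so names may themselves contain commas.
        number, name = next(rows).split(",", 1)
        candidates[int(number)] = name.strip()
    # Summary line (assumed): voters, sum of vote counts, unique orders.
    summary = tuple(int(x) for x in next(rows).split(","))
    votes: List[Tuple[int, Order]] = []
    for row in rows:
        count, order = row.split(",", 1)
        votes.append((int(count), parse_order(order)))
    return candidates, summary, votes
```

For a file on disk, `parse_election(path.read_text().splitlines())` works for a `pathlib.Path` named `path`. Representing every position as a frozenset treats strict and tied rankings uniformly: a strict order is just a sequence of singleton tiers. The sketch does not cross-check the summary line; assuming, as the example suggests, that its second number is the sum of vote counts, a complete file should satisfy `sum(count for count, _ in votes) == summary[1]`.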

Weighted Matching Data

The format for all weighted matching data is as follows, with each element on a new line. Only files with the WMD extension use this format.

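The edges are sorted by source, so that all edges leaving the same vertex are grouped together. Each field is described below:

<Number of Vertices>, <Number of Edges>: a single line at the top of the file.
<Vertex Number>, <Vertex Name>: one line per vertex.
<Source>, <Target>, <Weight>: one line per edge, from the source vertex to the target vertex.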

Here are the first 25 lines of a weighted matching data file (WMD), taken from the kidney matching dataset.

16, 26
1, Pair 1
2, Pair 2
3, Pair 3
4, Pair 4
5, Pair 5
6, Pair 6
7, Pair 7
8, Pair 8
9, Pair 9
10, Pair 10
11, Pair 11
12, Pair 12
13, Pair 13
14, Pair 14
15, Pair 15
16, Pair 16
0, 4, 1
2, 7, 1
2, 3, 1
3, 4, 1
5, 7, 1
5, 3, 1
6, 7, 1
6, 3, 1
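A similar sketch for WMD files, assuming the layout visible in the example: a vertices-and-edges header, one line per vertex, then one source, target, weight line per edge. `parse_wmd` is a hypothetical name, and reading weights as floats is an assumption made here so that non-integer weights would also parse.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_wmd(lines: List[str]):
    """Hypothetical reader; the layout is assumed from the WMD example above."""
    rows = [line.strip() for line in lines if line.strip()]
    # Header line: number of vertices, number of edges.
    n_vertices, n_edges = (int(x) for x in rows[0].split(","))
    vertices: Dict[int, str] = {}
    for row in rows[1:1 + n_vertices]:
        # Split only on the first comma so vertex names may contain commas.
        number, name = row.split(",", 1)
        vertices[int(number)] = name.strip()
    # Adjacency list: source -> [(target, weight), ...]. Weights are read as
    # floats (an assumption) so that non-integer weights would also parse.
    adjacency: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for row in rows[1 + n_vertices:]:
        source, target, weight = row.split(",")
        adjacency[int(source)].append((int(target), float(weight)))
    return vertices, n_edges, dict(adjacency)
```

Because it builds the adjacency list in a dictionary, this sketch does not depend on edges being sorted by source. A streaming reader could instead rely on that ordering and emit each source's edges as soon as the source number changes.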

Extra Data File

When miscellaneous data are needed, we use the DAT file extension; a DAT file is always a simple CSV file with a header row.

DAT files are generally paired with another data file and provide information that cannot be expressed in the basic data formats.
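Since a DAT file is plain CSV with headers, Python's standard `csv` module is enough to read it. The sketch below also maps a file name to one of the three formats by its extension, using the extension lists on this page. `format_for` and `read_dat` are hypothetical helpers, and treating extensions case-insensitively is an assumption.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List

# Extensions listed on this page, grouped by the format they use.
ELECTION_EXTENSIONS = {"soc", "soi", "toc", "toi", "tog", "mjg", "wmg", "pwg"}
MATCHING_EXTENSIONS = {"wmd"}
EXTRA_EXTENSIONS = {"dat"}


def format_for(filename: str) -> str:
    """Map a data file name to one of the three formats (case-insensitive)."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext in ELECTION_EXTENSIONS:
        return "election"
    if ext in MATCHING_EXTENSIONS:
        return "matching"
    if ext in EXTRA_EXTENSIONS:
        return "extra"
    raise ValueError(f"unknown extension: {ext!r}")


def read_dat(text: str) -> List[Dict[str, str]]:
    """Read DAT text (CSV with a header row) into one dict per data row."""
    return list(csv.DictReader(io.StringIO(text)))
```

`csv.DictReader` takes the column names from the header row, so every value comes back as a string; converting numeric columns is left to the caller, since the columns differ from one DAT file to the next.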