Data Structure
PrefLib is about sharing data that represents preferences. We have tried to unify the formatting of the data as much as possible. Each format stays as close to a comma-separated values (CSV) format as possible to improve portability. All the details about the file format can be found on the data format page.
Categories
Our data is separated into the following categories:
- Election Data (ED): This category contains data that either comes from an actual election or can be interpreted as election data. Generally speaking, the datasets consist of preference relations (votes) over a set of objects (candidates). We have data from actual elections, movie rankings, and competitor rankings from various sporting competitions.
- Matching Data (MD): This category contains data where agents express preferences over items (and vice versa) in order to pair agents with items. We have synthetic data from organ and kidney matching in the USA as well as bidding data from large conferences. We hope to obtain data from a variety of domains, including two-sided matching markets such as residents bidding on hospitals and one-sided markets such as students bidding on dorm rooms.
- Rating and Combinatorial Preference Data (CD): This category contains data from a broad set of domains that can be viewed as combinatorial and/or multidimensional. This includes multi-attribute ratings, CP-nets, GAI-nets and lexicographical preferences for multi-attribute objects.
Data Files, Patches and Sets
When it comes to classifying our data, the database is organized in three layers: datasets, data patches and data files. Their meanings are as follows:
- Each of the data categories described above is a collection of datasets.
- A dataset consists of a collection of data patches.
- A data patch is a set of data files, all representing the same original data interpreted in different ways.
- Each data file is of one of the following types:
  - SOC - Strict Orders - Complete List: complete, transitive and asymmetric preference relations.
  - SOI - Strict Orders - Incomplete List: transitive and asymmetric preference relations which might not be complete.
  - TOC - Orders with Ties - Complete List: complete and transitive preference relations where preferences are over sets of objects to indicate possible ties.
  - TOI - Orders with Ties - Incomplete List: transitive, possibly incomplete preference relations where preferences are over sets of objects to indicate possible ties.
  - TOG - Tournament Graph: tournament graphs indicating who wins against whom in every match of the tournament.
  - MJG - Majority Graph: majority graphs indicating the winner of every pairwise majority contest.
  - WMG - Weighted Majority Graph: weighted majority graphs indicating the winner's margin in every pairwise majority contest.
  - PWG - Pairwise Graph: pairwise graphs indicating, for every ordered pair of candidates, the number of agents who ranked the candidates in that order.
  - WMD - Weighted Matching Data: directed weighted graphs with a source and a sink.
  - DAT - Extra Data File: miscellaneous data.
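The four order-based types (SOC, SOI, TOC, TOI) differ along two axes: whether ties are allowed and whether every voter ranks every alternative. The Python sketch below illustrates that distinction. It assumes a hypothetical in-memory representation (an order is a list of tiers, best first, each tier a set of tied alternatives) rather than the actual PrefLib file format, and `order_type` is an illustrative name, not part of any PrefLib tool.

```python
from typing import Iterable, Sequence, Set

# Hypothetical in-memory representation (not the PrefLib file format):
# an order is a sequence of tiers, best first; each tier is a set of
# alternatives that the voter considers tied.
Order = Sequence[Set[int]]


def order_type(orders: Iterable[Order], alternatives: Set[int]) -> str:
    """Return the most specific of SOC, SOI, TOC, TOI that fits all orders."""
    has_ties = False
    is_complete = True
    for order in orders:
        ranked = [a for tier in order for a in tier]
        # Each alternative may appear at most once, which keeps the
        # relation transitive and asymmetric between tiers.
        if len(ranked) != len(set(ranked)):
            raise ValueError(f"alternative repeated in {order!r}")
        if not set(ranked) <= alternatives:
            raise ValueError(f"unknown alternative in {order!r}")
        # A tier holding more than one alternative expresses a tie.
        if any(len(tier) > 1 for tier in order):
            has_ties = True
        # The order is incomplete if some alternative is left unranked.
        if set(ranked) != alternatives:
            is_complete = False
    prefix = "TO" if has_ties else "SO"
    suffix = "C" if is_complete else "I"
    return prefix + suffix


if __name__ == "__main__":
    alts = {1, 2, 3}
    print(order_type([[{1}, {2}, {3}], [{3}, {1}, {2}]], alts))  # SOC
    print(order_type([[{1}, {2}]], alts))                        # SOI
    print(order_type([[{1, 2}, {3}]], alts))                     # TOC
    print(order_type([[{1, 2}]], alts))                          # TOI
```

Because a strict, complete profile also satisfies the weaker definitions (SOI, TOC, TOI), the sketch reports the most specific type that fits. It does not cover the graph-based types (TOG, MJG, WMG, PWG) or the WMD and DAT files.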
Metadata
We have tried to provide useful metadata for the data files we host. The metadata describe properties that the preferences may satisfy: the number of voters, the number of alternatives, the existence of a Condorcet winner, and so on. For more details, see the metadata page.
You can also use these metadata in the search engine we provide.
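Properties such as the existence of a Condorcet winner come from pairwise majority comparisons, the same information that PWG, WMG and MJG files encode. The Python sketch below computes, for a profile of strict, complete orders, the pairwise counts (as in a PWG), the winners' margins (as in a WMG, whose keys are the MJG edges), and a few metadata-style values. The function names, the list-of-integers input format and the dictionary keys are hypothetical illustrations, not PrefLib's own code or metadata field names.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def pairwise_counts(orders: Sequence[Sequence[int]]) -> Dict[Pair, int]:
    """For each ordered pair (a, b), count the voters ranking a above b.

    Assumes strict, complete orders given as lists of alternatives, best first.
    """
    counts: Dict[Pair, int] = {}
    for order in orders:
        # combinations() preserves list order, so a is always ranked above b.
        for a, b in combinations(order, 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def majority_margins(counts: Dict[Pair, int],
                     alternatives: Sequence[int]) -> Dict[Pair, int]:
    """Map (winner, loser) to the winner's margin; tied pairs are omitted."""
    margins: Dict[Pair, int] = {}
    for a, b in combinations(alternatives, 2):
        diff = counts.get((a, b), 0) - counts.get((b, a), 0)
        if diff > 0:
            margins[(a, b)] = diff
        elif diff < 0:
            margins[(b, a)] = -diff
    return margins


def summarize(orders: Sequence[Sequence[int]],
              alternatives: Sequence[int]) -> dict:
    """Compute a few metadata-style properties of a strict, complete profile."""
    margins = majority_margins(pairwise_counts(orders), alternatives)
    winner: Optional[int] = None
    for c in alternatives:
        # A Condorcet winner beats every other alternative by a strict majority.
        if all((c, d) in margins for d in alternatives if d != c):
            winner = c
            break
    return {
        "num_voters": len(orders),
        "num_alternatives": len(alternatives),
        "condorcet_winner": winner,
    }
```

For example, `summarize([[1, 2, 3], [1, 3, 2], [2, 3, 1]], [1, 2, 3])` reports 3 voters, 3 alternatives and Condorcet winner 1, while the cyclic profile `[[1, 2, 3], [2, 3, 1], [3, 1, 2]]` has no Condorcet winner (`None`).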