PrefLib: A Library for Preferences

Data Sets

Our data is separated into four categories: Election Data, Matching Data, Rating and Combinatorial Preference Data, and Optimization Data.

Election Data

ED-00001: Irish Election Data

The Dublin North, Dublin West, and Meath data sets contain a complete record of votes for the 2002 elections held in these three Irish constituencies. The votes were posted online but have since been removed.

ED-00002: Debian Project Data

The Debian Project Leader Elections are held yearly with most of the ballots available online.

ED-00003: Mariner Path Selection

The Mariner Trajectory Selection Data Set consists of the votes cast by the science teams responsible for selecting the trajectory of the 1977 interplanetary spacecraft. A total of 10 science teams voted over 32 possible paths. All votes are complete, but voters were allowed to express indifference between some of the paths.

ED-00004: Netflix Prize Data

The Netflix Prize was a competition devised by Netflix to improve the accuracy of its recommendation system. To facilitate this, Netflix released real movie ratings from users of the system. Any set of movies can be transformed into an election via a process outlined by Mattei, Forshee, and Goldsmith (reference below).
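
The exact conversion is the one described by Mattei, Forshee, and Goldsmith; the sketch below only illustrates the general idea under our own assumptions: a user contributes a ballot only if they rated every movie in the chosen set, movies are ordered by star rating, and equal ratings become ties. The input format (a dict from user to a dict of movie ratings) and the function name are hypothetical.

```python
from collections import defaultdict

def ratings_to_ballots(ratings, movies):
    """Turn star ratings into weak-order ballots (hypothetical input format).

    ratings: dict mapping user -> dict mapping movie -> rating.
    Each ballot is a list of tiers, most preferred first; a tier holds tied movies.
    """
    ballots = []
    for user_ratings in ratings.values():
        # Assumption: only users who rated every movie in the set cast a ballot.
        if not all(m in user_ratings for m in movies):
            continue
        # Group movies by rating so that equal ratings become ties.
        by_score = defaultdict(list)
        for m in movies:
            by_score[user_ratings[m]].append(m)
        # Higher rating means more preferred.
        ballots.append([sorted(by_score[s]) for s in sorted(by_score, reverse=True)])
    return ballots

ratings = {
    "u1": {"A": 5, "B": 3, "C": 3},
    "u2": {"A": 2, "B": 4},           # did not rate C, so no ballot
    "u3": {"A": 1, "B": 4, "C": 5},
}
print(ratings_to_ballots(ratings, ["A", "B", "C"]))
# [[['A'], ['B', 'C']], [['C'], ['B'], ['A']]]
```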

ED-00005: Burlington Election Data

The 2009 Burlington, Vermont Mayoral Election Data is posted online at www.rangevoting.org. It has several interesting features when evaluated with instant-runoff voting (IRV); notably, the candidate who leads in first-round (plurality) votes does not emerge as the winner of the election.
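
To see how a first-round leader can lose under IRV, here is a minimal IRV sketch run on a small made-up profile (not the Burlington ballots). Ballots are (count, ranking) pairs, rankings may be truncated, and ties for last place are broken alphabetically; that tie-breaking rule is an assumption of this sketch, not part of any official procedure.

```python
from collections import Counter

def irv(ballots):
    """Instant-runoff voting over (count, ranking) pairs; rankings may be truncated."""
    remaining = {c for _, ranking in ballots for c in ranking}
    while True:
        tally = Counter({c: 0 for c in remaining})
        for count, ranking in ballots:
            # A ballot counts for its highest-ranked candidate still in the race.
            for c in ranking:
                if c in remaining:
                    tally[c] += count
                    break
        active = sum(tally.values())  # exhausted ballots are not counted
        leader = max(remaining, key=lambda c: (tally[c], c))
        if 2 * tally[leader] > active or len(remaining) == 1:
            return leader
        # Eliminate the weakest candidate (ties broken alphabetically, an assumption).
        remaining.remove(min(remaining, key=lambda c: (tally[c], c)))

# Made-up profile: A leads round one with 36 votes, but C's 31 votes transfer to B.
ballots = [(36, ["A", "C", "B"]), (33, ["B", "C", "A"]), (31, ["C", "B", "A"])]
print(irv(ballots))  # B
```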

ED-00006: Skate Data

This dataset contains figure skating rankings from various competitions during the 1998 season, including the World Juniors, the World Championships, and the Olympics. These data sets generally have 10-25 candidates (skaters) and 8-10 judges (voters).

ED-00007: Electoral Reform Society (ERS) Data

This dataset contains the results of 86 separate elections held by non-profit organizations, trade unions, and professional organizations. They were originally donated by Nicolaus Tideman, who secured NSF funding to have the ballots tabulated. The ballots come from elections held under various voting rules and are incomplete strict orders. The tabulated results were initially collected by the Electoral Reform Society in the UK to support the adoption of STV and other preferential voting methods.

ED-00008: Glasgow City Council

This data set contains the results of the 2007 Glasgow City Council elections, separated by ward. There are 21 wards, each with its own candidates and voters. These files report the results of all the ward-level elections, which were originally held under STV. Each ward has between 8 and 13 candidates and between 5,199 and 12,744 voters.

ED-00009: AGH Course Selection

This dataset contains the results of surveying students at AGH University of Science and Technology about their course preferences. Each student provided a rank ordering over all the courses, with no missing elements. There are 9 courses to choose from in 2003 and 7 in 2004.

ED-00010: F1 and Skiing

This dataset contains the results of F1 Racing competitions between 1961 and 2008, as well as Cross Country Skiing and Ski Jumping results from the 2006-2009 World Championships. This data is provided by Robert Bredereck at TU Berlin. Robert provides tools to compute Kemeny rankings on this data at his website at TU Berlin.
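
A Kemeny ranking is a ranking that minimizes the total Kendall tau distance (the number of pairwise disagreements) to all votes. Finding one is NP-hard in general, so the brute-force sketch below is only practical for a handful of candidates and is not the tool Robert provides. It assumes every vote is a complete strict ranking over the same candidates.

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of candidate pairs that two complete rankings order differently."""
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def kemeny(votes):
    """Brute-force Kemeny ranking: tries every permutation of the candidates."""
    candidates = votes[0]
    best = min(
        permutations(candidates),
        key=lambda r: sum(kendall_tau(r, v) for v in votes),
    )
    return list(best)

votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(kemeny(votes))  # ['a', 'b', 'c'], total distance 2
```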

ED-00011: Web Search

This dataset contains the results of comparing web searches across Bing, Google, Yahoo, and Ask. This data is provided by Robert Bredereck at TU Berlin. Robert provides tools to compute Kemeny rankings on this data at his website at TU Berlin.

ED-00012: T Shirt

This dataset contains complete rank orderings of T-Shirt designs voted on by members of the Optimization Research Group at NICTA. There are 11 designs (candidates) and 30 votes over these designs. Voters were required to submit complete strict orders.

ED-00013: American National Election Studies Data

This dataset contains the results of the American National Election Studies thermometer polls taken between 1970 and 2008 (not including 2006). The data presented here was derived from datasets assembled by Nicolaus Tideman and Florenz Plassmann.

ED-00014: Sushi Data

This dataset contains the results of a series of surveys conducted by Toshihiro Kamishima asking 5000 individuals for their preferences about various kinds of sushi. There are three different datasets that were elicited in different ways:

  • Element Series 00000001 contains complete strict rank orders of 10 different kinds of sushi.
  • Element Series 00000002 contains individuals' strict rank orderings over 100 different kinds of sushi (candidates).
  • Element Series 00000003 contains individuals' scores for sushi items on a scale of 0-4, with repeated scores allowed.
This dataset contains 14 files in total including soc, soi, toi, and toc files.
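
As a rough guide to the file types, assuming soc and soi stand for strict orders (complete and incomplete) and toc and toi for orders with ties (complete and incomplete), the hypothetical helper below picks the most specific label that covers a set of ballots. Each ballot is written as a list of tiers, most preferred first, where a tier with more than one item is a tie.

```python
def classify(ballots, candidates):
    """Return 'soc', 'soi', 'toc', or 'toi' for a list of tiered ballots.

    Assumed meaning: so = strict orders, to = orders with ties,
    c = every ballot ranks all candidates, i = some ballot is incomplete.
    """
    has_ties = any(len(tier) > 1 for ballot in ballots for tier in ballot)
    complete = all(
        {c for tier in ballot for c in tier} == set(candidates) for ballot in ballots
    )
    return ("to" if has_ties else "so") + ("c" if complete else "i")

cands = ["a", "b", "c"]
print(classify([[["a"], ["b"], ["c"]]], cands))   # soc
print(classify([[["a", "b"], ["c"]]], cands))     # toc
print(classify([[["a"], ["c"]]], cands))          # soi
```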

Note that this dataset was originally converted incorrectly; it was fixed in January 2016, so please re-download it.

ED-00015: Clean Web Search

This dataset contains the results of comparing web searches across Bing, Google, Yahoo, and Ask. This data is provided by Robert Bredereck at TU Berlin. Robert provides tools to compute Kemeny rankings on this data at his website at TU Berlin.

ED-00016: Aspen Election Data

The 2009 Aspen Data contains the results from the mayoral and city council elections held in Aspen, CO in 2009. The data contains two elections, each with about 2,500 votes, over 5 and 11 candidates respectively.

ED-00017: Berkeley Election Data

The 2010 Berkeley Data contains the results from a city council election (District 7) in Berkeley, CA. The set contains about 4,000 votes over 4 candidates.

ED-00018: Minneapolis Election Data

The 2009 Minneapolis Data contains the results from the elections for Parks and Rec Commissioner and Tax Assessor in Minneapolis, MN. The set contains about 30,000 votes over 7-400 candidates. The full data sets contain the ballots along with write-in candidates (Mickey Mouse and Yoda are well represented). The No Write In files contain the same votes with all write-ins removed and the votes modified accordingly.
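
The text does not spell out how the votes were modified for the No Write In files. One plausible approach, shown here purely as an assumption, is to delete write-in names from each ranked ballot, keep the remaining candidates in their original relative order, and drop ballots that end up empty. The candidate names in the example are hypothetical.

```python
def remove_write_ins(ballots, official):
    """Drop write-in (non-official) candidates from ranked ballots.

    Assumptions: the remaining candidates keep their relative order and
    ballots left empty are discarded; the actual PrefLib procedure may differ.
    """
    cleaned = []
    for ranking in ballots:
        kept = [c for c in ranking if c in official]
        if kept:
            cleaned.append(kept)
    return cleaned

official = {"Smith", "Jones", "Lee"}  # hypothetical official candidates
ballots = [["Mickey Mouse", "Smith", "Lee"], ["Yoda"], ["Jones", "Yoda", "Smith"]]
print(remove_write_ins(ballots, official))
# [['Smith', 'Lee'], ['Jones', 'Smith']]
```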

ED-00019: Oakland Election Data

The 2010 Oakland Data contains the results from the city council and mayoral elections held in Oakland, CA in 2010. The set contains 7 distinct elections with between 4 and 11 candidates and between 900 and 145,000 voters.

ED-00020: Pierce Election Data

The 2008 Pierce Data contains the results from several elections, including county executive, held in Pierce County, WA in 2008. The set contains 4 distinct elections with between 4 and 7 candidates and between 40,000 and 300,000 voters.

ED-00021: San Francisco Election Data

The San Francisco data contains the results from several elections, including board of supervisors, district attorney, and mayoral elections, held in San Francisco, CA between 2008 and 2012. The set contains 14 distinct elections with between 4 and 25 candidates and between 18,000 and 195,000 voters.

ED-00022: San Leandro Election Data

The San Leandro data contains the results from several elections, including mayor and city council elections, held in San Leandro, CA between 2010 and 2012. The set contains 3 distinct elections with between 4 and 7 candidates and about 25,000 voters each.

ED-00023: Takoma Park Election Data

The Takoma Park Data contains the results from the 2007 Takoma Park, MD special election for city council. The set contains one election with 4 candidates and about 400 voters.

ED-00024: Mechanical Turk Dots

The Mechanical Turk Dots datasets come from Andrew Mao and were collected using Mechanical Turk. These data sets each contain elections with 794-800 voters over 4 candidates.

ED-00025: Mechanical Turk Puzzle

The Mechanical Turk Puzzle datasets come from Andrew Mao and were collected using Mechanical Turk. These data sets each contain elections with 793-797 voters over 4 candidates.

ED-00026: 2002 French Presidential Election

The 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier and Karine Van der Straeten. It consists of 2,597 approval ballots collected in parallel to the actual election in 6 different districts in France.
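
An approval ballot is simply the set of candidates a voter approves of. As an illustration (the dataset itself does not prescribe a rule), the sketch below applies approval voting, where each candidate's score is the number of ballots approving them; the ballots are made up.

```python
from collections import Counter

def approval_scores(ballots):
    """Count how many ballots approve each candidate."""
    return Counter(c for ballot in ballots for c in ballot)

def approval_winners(ballots):
    """All candidates with the highest approval score (ties are kept)."""
    scores = approval_scores(ballots)
    top = max(scores.values())
    return sorted(c for c, s in scores.items() if s == top)

ballots = [{"A", "B"}, {"B"}, {"A", "C"}, {"B", "C"}]  # made-up ballots
print(approval_winners(ballots))  # ['B']
```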

ED-00027: Proto French Election

This analog dataset to the 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier, Karine Van der Straeten and Michel Balinski. It consists of 398 approval ballots over potential candidates for the 2002 French Presidential election, cast by students at Institut d’Etudes Politiques de Paris.

ED-00028: APA Election Data

This dataset contains the results of the elections of the American Psychological Association held between 1998 and 2009. Each election has 5 candidates and between 13,318 and 20,239 voters; voters are allowed to rank any number of the 5 candidates, without ties.

ED-00029: Netflix Prize Data - No Condorcet Winners

The Netflix Prize was a competition devised by Netflix to improve the accuracy of its recommendation system. To facilitate this, Netflix released real movie ratings from users of the system. Any set of movies can be transformed into an election via a process outlined by Mattei, Forshee, and Goldsmith (reference below). This is a new slice of the Netflix Prize Dataset containing only those 71,943 instances that do not have a Condorcet winner.
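
A Condorcet winner is a candidate who beats every other candidate in a pairwise majority comparison. The sketch below checks for one, assuming complete strict rankings for simplicity; the Netflix ballots can contain ties, which this sketch does not handle. The example profile is the classic three-voter cycle, which has no Condorcet winner.

```python
def condorcet_winner(ballots):
    """Return the candidate who beats every other by strict majority, or None.

    ballots: list of complete strict rankings, most preferred first.
    """
    candidates = ballots[0]

    def beats(a, b):
        wins = sum(1 for r in ballots if r.index(a) < r.index(b))
        return 2 * wins > len(ballots)

    for a in candidates:
        if all(beats(a, b) for b in candidates if b != a):
            return a
    return None

cycle = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
print(condorcet_winner(cycle))  # None: a > b, b > c, c > a, each by 2 votes to 1
print(condorcet_winner([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]))  # a
```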

ED-00030: UK Labour Party Leadership Vote

The 2010 UK Labour Party Leadership Vote is posted at www.rangevoting.org. This set contains the votes cast by all 266 MPs over the 5 leadership candidates. The votes are incomplete strict orders; we have also posted extensions that place all unranked candidates tied at the end, as well as the corresponding pairwise graphs.
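
The extension described above can be sketched as follows: each ranked candidate gets its own tier, and every unranked candidate is placed in one final tied tier. From the extended ballots one can then count, for each ordered pair (a, b), how many voters strictly prefer a to b; such counts are the raw material for a pairwise graph. The candidate names and ballots are hypothetical, and the layout of PrefLib's own pairwise-graph files is not reproduced here.

```python
from itertools import combinations

def extend(ranking, candidates):
    """Incomplete strict order -> tiers, with all unranked candidates tied last."""
    tiers = [[c] for c in ranking]
    unranked = [c for c in candidates if c not in ranking]
    if unranked:
        tiers.append(unranked)
    return tiers

def pairwise_counts(ballots, candidates):
    """counts[(a, b)] = number of extended ballots that strictly prefer a to b."""
    counts = {(a, b): 0 for a in candidates for b in candidates if a != b}
    for ranking in ballots:
        level = {c: i for i, tier in enumerate(extend(ranking, candidates)) for c in tier}
        for a, b in combinations(candidates, 2):
            if level[a] < level[b]:
                counts[(a, b)] += 1
            elif level[b] < level[a]:
                counts[(b, a)] += 1
            # Candidates in the same tier (both unranked) add nothing.
    return counts

cands = ["A", "B", "C", "D", "E"]          # hypothetical candidates
ballots = [["A", "B"], ["C"], ["B", "A", "D"]]
counts = pairwise_counts(ballots, cands)
print(counts[("A", "C")], counts[("C", "A")])  # 2 1
```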

ED-00031: Vermont District Races

This dataset contains votes for 15 different races for various public offices held in Vermont in 2014. This data was collected and donated by Jeremy A Hansen. There are 3 to 6 candidates and 532 to 1,960 voters in these data files. Not all races were competitive, so not every race is reported for every district.

ED-00032: Education Surveys in Informatics (Cujae)

This dataset contains the results of surveying students and professors in the Faculty of Informatics, Instituto Superior Politécnico José Antonio Echeverría (Cujae, Havana, Cuba) about their preferences on courses and on the most important aspects affecting their performance as students and professionals. Answers include ties and missing elements. These surveys, conducted in 2015, cover between 6 and 32 aspects (candidates) as well as 13 courses.

ED-00033: San Sebastian Poster Competition

Approval ballots from the San Sebastian Poster Competition, held during the Summer School on Computational Social Choice organized by COST Action IC1205 at the Miramar Palace in San Sebastian in July 2016. This set has two elections of approval ballots with 17 alternatives and about 60 voters each. The data on this page was donated by Ulle Endriss.

ED-00034: Cities Survey

This dataset contains noisy input from two surveys, one about cost of living and one about population, of 392 individuals over 36 alternatives for cost of living and 48 alternatives for population. Each individual provided a ranking of six given cities in terms of cost of living and a ranking of six countries in terms of population.

Matching Data

MD-00001: Kidney Data

This dataset contains 310 instances of synthetic kidney donor pools. The data was generated using a state-of-the-art donor pool generation method (described in Saidman et al., Increasing the opportunity of live kidney donation by matching for two- and three-way exchanges. Transplantation 81(5), 2006) and was donated by John Dickerson. John has recently posted his generator as well as his exchange-solving code online.
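
Kidney exchange is commonly modeled as a directed compatibility graph whose nodes are patient-donor pairs, with an edge u → v when u's donor can give to v's patient; two- and three-way exchanges are then cycles of length 2 and 3. The sketch below only enumerates those short cycles on a made-up graph. Choosing a disjoint set of cycles that maximizes transplants, which is what exchange solvers do, is not shown, and this is not John's code.

```python
def short_cycles(compatible, max_len=3):
    """Enumerate exchange cycles of length 2..max_len in a directed graph.

    compatible[u] is the set of pairs whose patient can receive from u's donor.
    Each cycle is reported once, starting at its smallest node.
    """
    cycles = []

    def grow(path):
        for nxt in sorted(compatible.get(path[-1], ())):
            if nxt == path[0] and len(path) >= 2:
                cycles.append(tuple(path))          # closed a cycle
            elif nxt > path[0] and nxt not in path and len(path) < max_len:
                grow(path + [nxt])                  # extend with a larger node

    for start in sorted(compatible):
        grow([start])
    return cycles

compatible = {1: {2}, 2: {1, 3}, 3: {1}, 4: {1}}  # made-up pool
print(short_cycles(compatible))  # [(1, 2), (1, 2, 3)]
```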

MD-00002: Computer Science Conference Bidding Data

This dataset contains the bidding data from 3 Computer Science Conferences. It contains the bids of all reviewers (aside from a small number of opt-outs) over a subset of the papers at each conference.

MD-00003: Project Bidding Data

This dataset contains students' bids over a set of projects for student/project allocation at the School of Computing Science, University of Glasgow. Each project is supervised by an individual who has a maximum supervision capacity. The set covers 8 years of data, with between 31 and 51 students and between 56 and 155 projects. This data was collected and kindly donated by David Manlove.
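
To illustrate how supervisor capacities constrain an allocation, here is serial dictatorship, a simple mechanism swapped in for illustration and not necessarily the method used at Glasgow. Students are processed in a fixed priority order and each takes their most preferred project that is still free and whose supervisor has spare capacity. The student, project, and supervisor names are hypothetical, and each project is assumed to go to at most one student.

```python
def serial_dictatorship(order, prefs, supervisor_of, capacity):
    """Assign students, in priority order, to their best still-feasible project.

    Assumptions: one student per project; capacity[s] is the maximum number
    of students supervisor s can take. Students with no feasible project stay unassigned.
    """
    load = {s: 0 for s in capacity}
    taken = set()
    assignment = {}
    for student in order:
        for project in prefs[student]:
            sup = supervisor_of[project]
            if project not in taken and load[sup] < capacity[sup]:
                assignment[student] = project
                taken.add(project)
                load[sup] += 1
                break
    return assignment

prefs = {"s1": ["p1", "p2"], "s2": ["p2", "p3"], "s3": ["p2", "p1", "p3"]}
supervisor_of = {"p1": "L1", "p2": "L1", "p3": "L2"}
capacity = {"L1": 1, "L2": 1}
print(serial_dictatorship(["s1", "s2", "s3"], prefs, supervisor_of, capacity))
# {'s1': 'p1', 's2': 'p3'}  (s3 gets nothing: L1 and L2 are full)
```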

MD-00004: AAMAS Bidding Data

This dataset contains the bids of reviewers over papers from the 2015 and 2016 Autonomous Agents and Multiagent Systems Conference. Inclusion in these data sets was explicitly opt-in. The 2015 data contains 9,817 bids of 201 reviewers over 613 papers, which represents about 44% of the actual 22,360 bids of 281 reviewers over 670 papers. The 2016 data contains bids from 161 of 393 reviewers over 442 of 550 papers.

Rating and Combinatorial Preference Data

CD-00001: Trip Advisor Data

This dataset contains 675,069 reviews of 1,851 hotels across the world scraped from Trip Advisor. The data was scraped and donated by Hongning Wang.

CD-00002: Proto French Election

This analog dataset to the 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier, Karine Van der Straeten and Michel Balinski. It consists of 398 approval ballots and subjective ratings on a 20-point scale over potential candidates for the 2002 French Presidential election, cast by students at Institut d’Etudes Politiques de Paris.

CD-00003: Social Recommendation

This dataset contains the Facebook Social Graph and full ratings of 16 restaurants and 23 pubs by 93 users.

Optimization Data

No sets yet. Please donate!
