Election data sets

Here is a list of all the data sets in the election category.

2002 French Presidential Election

The 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier and Karine Van der Straeten. It consists of 2,597 approval ballots collected alongside the actual election in 6 districts in France.

The approval votes were collected at a set of polling stations in France during the first round of the 2002 French presidential election. Voters in these districts were informed before the election that they could cast an approval ballot in addition to their official ballot. Overall, more than 75% of those who turned out to vote took part in the experiment. Each file represents one district voting in the same election. Each district has between 367 and 476 voters (2,597 in all), and there are 16 candidates. Additional details about the method used to collect the data, and the results of its analysis, can be found in the citation required for use of this dataset.
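An approval ballot simply marks a set of acceptable candidates, so the standard approval tally counts, for each candidate, how many ballots approve of it. A minimal Python sketch, assuming each ballot has already been parsed into a set of candidate ids (the parsing step and the toy ballots below are hypothetical, not taken from the dataset):

```python
from collections import Counter


def approval_scores(ballots):
    """Count, for each candidate, how many ballots approve of it."""
    scores = Counter()
    for ballot in ballots:
        # set() ensures a candidate is counted at most once per ballot
        scores.update(set(ballot))
    return scores


def approval_winners(ballots):
    """Return the sorted list of candidates with the most approvals."""
    scores = approval_scores(ballots)
    if not scores:
        return []
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)


# Hypothetical toy ballots; each is the set of approved candidate ids.
ballots = [{1, 3}, {3}, {2, 3, 5}, {1, 2}]
print(approval_scores(ballots)[3])  # 3
print(approval_winners(ballots))    # [3]
```

Returning a list of winners, rather than a single candidate, keeps ties visible instead of hiding them behind an arbitrary tie-break.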

For more details, see this page.

Click here to download the dataset

AGH Course Selection

This dataset contains the results of surveying students at AGH University of Science and Technology about their course preferences. Each student provided a complete ranking of all the courses, with no missing elements. There were 9 courses to choose from in 2003 and 7 in 2004.

The data on this page has been donated by Piotr Faliszewski.

For more details, see this page.

Click here to download the dataset

APA Election Data

This dataset contains the results of the American Psychological Association elections held between 1998 and 2009. Voters are allowed to rank any number of the candidates, without ties. Each of these elections has 5 candidates and between 13,318 and 20,239 voters.

These data were donated by Michal Regenwetter and Anna Popova from the University of Illinois at Urbana-Champaign. The work that analyzed this data was supported by National Science Foundation grants SES # 08-20009, ICES # 1216016 (PI: M. Regenwetter), the University Library at the University of Illinois at Urbana-Champaign (PI: A. Popova), and the Basic Research Program at HSE (S. Popov). We thank the American Psychological Association for permitting access to its election ballot data. More information and tools for analyzing this data can be found at the website of the Decision Making Laboratory at the University of Illinois.

For more details, see this page.

Click here to download the dataset

Aspen Election Data

The 2009 Aspen Data contains the results of the mayoral and city council elections held in Aspen, CO in 2009. The data cover two elections with about 2,500 votes each, one over 5 candidates and one over 11 candidates.

Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
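Ignoring blanks means that a candidate listed after an empty slot moves up to fill it, while the relative order of the ranked candidates is preserved. A minimal sketch for a single ballot, assuming the ballot is read as a sequence of ranking slots in which a blank is `None` or an empty string (this representation is an assumption, not the raw file format):

```python
from typing import List, Optional, Sequence


def compact_ballot(slots: Sequence[Optional[str]]) -> List[str]:
    """Drop blank slots (None or empty string) from a ranked ballot,
    keeping the relative order of the candidates that remain."""
    return [c for c in slots if c not in (None, "")]


# A hypothetical ballot whose second choice was left blank:
# the third choice moves up to second place.
print(compact_ballot(["A", None, "C"]))  # ['A', 'C']
```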

The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

For more details, see this page.

Click here to download the dataset

Berkeley Election Data

The 2010 Berkeley Data contains the results of a city council election (District 7) in Berkeley, CA. The set contains about 4,000 votes over 4 candidates.

Note that this election was conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

For more details, see this page.

Click here to download the dataset

Burlington Election Data

The 2009 Burlington, Vermont Mayoral Election Data is posted online at www.rangevoting.org. This election has a number of interesting features when evaluated with the IRV method. Notably, the candidate who led in first-round votes (with a plurality, not a majority) did not win the election.
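Instant-runoff voting (IRV) counts each ballot for its highest-ranked remaining candidate and eliminates the weakest candidate round by round, so a first-round leader without a majority can still lose. A simplified Python sketch of that process, run on hypothetical toy ballots rather than the Burlington data; it breaks last-place ties arbitrarily and does not model any other details of official counting rules:

```python
from collections import Counter


def irv_winner(ballots):
    """Simplified instant-runoff voting over ranked ballots (lists, best first).

    Repeatedly eliminates the candidate with the fewest first-place votes
    among the remaining candidates until one holds a majority of the
    ballots that still rank someone. Ties for last place are broken
    arbitrarily, unlike real election rules.
    """
    remaining = {c for b in ballots for c in b}
    if not remaining:
        return None
    while True:
        tally = Counter({c: 0 for c in remaining})
        for b in ballots:
            # Count the ballot for its top choice still in the race, if any.
            for c in b:
                if c in remaining:
                    tally[c] += 1
                    break
        active = sum(tally.values())
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > active or len(remaining) == 1:
            return leader
        remaining.discard(min(remaining, key=lambda c: tally[c]))


# Toy ballots: A leads the first round 4-3-2,
# but C's ballots transfer to B, who then beats A 5-4.
ballots = 4 * [["A", "B", "C"]] + 3 * [["B", "C", "A"]] + 2 * [["C", "B", "A"]]
print(irv_winner(ballots))  # B
```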

The 2006 Burlington, Vermont Mayoral data presented here was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

For more details, see this page.

Click here to download the dataset

Cities Survey

This dataset contains noisy input from 392 individuals in two surveys: one about cost of living, over 36 alternatives (cities), and one about population, over 48 alternatives (countries). Each individual ranked six given cities by cost of living and six given countries by population.

The data were collected among participants of the 3rd PatrasIQ research and technology exhibition, in Patras, Greece in April 2016. We received input from 392 volunteers; each of them was given a random bundle of six cities (from a pool of 36) and a random bundle of six countries (from a pool of 48), and was asked to give a strict ranking of the given cities and countries, in decreasing order of his or her estimate of their cost of living index and population, respectively.

In the cost of living treatment, each city appears in at least 57 and at most 70 bundles/votes. The alternative ids define a ground truth, i.e., a strict ranking of all 36 cities according to cost of living index data retrieved from numbeo.com in April 2016. In the population treatment, each country appears in at least 47 and at most 52 bundles/votes. The alternative ids define a ground truth, i.e., a strict ranking of all 48 countries according to population data retrieved from wikipedia.org in April 2016.
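Because the alternative ids encode the ground truth, the error in a single vote can be measured by counting the pairs of alternatives in its bundle that the voter ordered against the truth (a Kendall tau distance restricted to the bundle). A minimal sketch, assuming that a smaller id means a higher true cost of living or population, so that a perfectly correct vote lists ids in increasing order; if the ids run the other way, flip the comparison. The example bundle is hypothetical:

```python
from itertools import combinations


def inversions_vs_truth(vote):
    """Count the pairs in a vote that contradict the ground truth.

    `vote` lists alternative ids in the voter's order (highest estimate
    first). Assumes a smaller id means a higher true value, so a correct
    vote lists ids in increasing order.
    """
    return sum(1 for a, b in combinations(vote, 2) if a > b)


# Hypothetical six-city bundle with two inverted pairs: (3, 1) and (12, 9).
vote = [3, 1, 7, 12, 9, 20]
print(inversions_vs_truth(vote))  # 2
```

Dividing the count by 15, the number of pairs in a bundle of six, gives a normalized error between 0 (fully correct) and 1 (fully reversed).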

The data on this page has been donated by Iannis Caragiannis.

For more details, see this page.

Click here to download the dataset

Clean Web Search

This dataset contains the results of comparing web searches across Bing, Google, Yahoo, and Ask. This data is provided by Robert Bredereck at TU Berlin, who provides tools to compute Kemeny rankings on this data at his website at TU Berlin.

These data files differ from the other set of web data in that they are forced to be complete: the results are restricted to only those candidates (sites) that appear in all three datasets. The data files marked big contain around 200 (max 242) candidates each, while the data files marked small contain between 10 and 50 candidates. The search queries are shown in the names of the individual data files. For the WebImpact files, the number of search results for a particular term was used to create a complete ranking over the search terms; these files measure the web impact of various world cities and countries. We have extended this data into tournament graphs and weighted majority graphs.
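A weighted majority graph has a directed edge from a to b whenever more voters rank a above b than the reverse; a tournament graph keeps only the edge directions. A minimal sketch for complete strict rankings such as these, assuming edge weights are the majority margins (one common convention; this is not claimed to match PrefLib's file encoding). The toy rankings are hypothetical:

```python
from itertools import combinations


def weighted_majority_graph(rankings):
    """Build a weighted majority graph from complete strict rankings.

    Returns {(a, b): margin} for each pair where more voters rank a above b
    than b above a; margin is that difference. Pairs with a tie are omitted.
    """
    candidates = sorted({c for r in rankings for c in r})
    positions = [{c: i for i, c in enumerate(r)} for r in rankings]
    edges = {}
    for a, b in combinations(candidates, 2):
        a_wins = sum(1 for p in positions if p[a] < p[b])
        b_wins = len(positions) - a_wins  # complete rankings: every voter compares a and b
        if a_wins > b_wins:
            edges[(a, b)] = a_wins - b_wins
        elif b_wins > a_wins:
            edges[(b, a)] = b_wins - a_wins
    return edges


def tournament_graph(edges):
    """Drop the weights, keeping only the direction of each majority edge."""
    return set(edges)


rankings = [["a", "b", "c"], ["b", "c", "a"], ["a", "c", "b"]]
print(weighted_majority_graph(rankings))
# {('a', 'b'): 1, ('a', 'c'): 1, ('b', 'c'): 1}
```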

For more details, see this page.

Click here to download the dataset

Debian Project Data

The Debian Project Leader Elections are held yearly with most of the ballots available online.

We have captured several years of data, including the vote for the Debian logo. In some years there were only a few candidates, and we have omitted those years. The included data sets have between 4 and 9 candidates, depending on the instance, and about 400 individual votes per instance.

For more details, see this page.

Click here to download the dataset

Education Surveys in Informatics (Cujae)

This dataset contains the results of surveying students and professors in the Faculty of Informatics, Instituto Superior Politécnico José Antonio Echeverría (Cujae, Havana, Cuba) about their preferences on courses and the most important aspects affecting their performance as students and professionals. Answers include ties and missing elements. These surveys, conducted in 2015, include criteria about different numbers of aspects (6 to 32 candidates) and 13 courses.

This dataset was donated by Alejandro Rosete Suárez and Milton García Borroto and may be augmented with new surveys in the future.

For more details, see this page.

Click here to download the dataset

Electoral Reform Society (ERS) Data

This dataset contains the results of 86 separate elections held by non-profit organizations, trade unions, and professional organizations. The data were originally donated by Nicolaus Tideman, who secured NSF funding to have the ballots tabulated. The ballots come from elections held under various voting rules that elicit incomplete strict orders. The tabulated results were initially collected by the Electoral Reform Society in the UK to support the adoption of STV and other ranked voting methods.

The files contain vote records with as few as 3 and as many as 29 candidates; the number of voters ranges from 9 to 3,419. The toc files have all unranked candidates tied at the end of the order. Some of these files are complete sets of ballots from the given elections, and some are random samples from the set of all ballots.
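The toc conversion turns an incomplete strict order into a total order with ties: ranked candidates keep their positions, and every unranked candidate joins a single tied group at the bottom. A minimal sketch, representing an order as a list of tiers (a representation chosen for illustration, not the PrefLib file syntax); the example ballot is hypothetical:

```python
def to_toc(partial, candidates):
    """Extend a strict partial ranking to a total order with ties.

    Ranked candidates keep their positions as singleton tiers; every
    unranked candidate is placed in one shared tier at the end.
    """
    tiers = [[c] for c in partial]
    unranked = sorted(set(candidates) - set(partial))
    if unranked:
        tiers.append(unranked)
    return tiers


# Hypothetical ballot ranking only candidates 4 and 2 out of 1..5.
print(to_toc([4, 2], range(1, 6)))  # [[4], [2], [1, 3, 5]]
```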

For more details, see this page.

Click here to download the dataset

F1 and Skiing

This dataset contains the results of F1 racing competitions between 1961 and 2008, as well as cross-country skiing and ski jumping results from the 2006-2009 World Championships. This data is provided by Robert Bredereck at TU Berlin, who provides tools to compute Kemeny rankings on this data at his website at TU Berlin.

The data in this set contains between 10 and 351 candidates (racers, cars) and 4 to 19 voters (competitions). The results of each competition in the season provide a ranking over the candidates (competitors). We have extended this data into tournament graphs and weighted majority graphs, and created a toc dataset in which all unranked candidates are tied at the end of each ranking.
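A Kemeny ranking is an order that minimizes the total Kendall tau distance (number of pairwise disagreements) to all the votes. The sketch below is a brute-force illustration, not the TU Berlin tools: it assumes complete strict rankings over one common candidate set and tries all m! orders, so it only suits a handful of candidates, far fewer than the larger instances here. The toy rankings are hypothetical:

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Number of candidate pairs that two complete rankings order differently."""
    pos = {c: i for i, c in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2) if pos[a] > pos[b])


def kemeny(rankings):
    """Brute-force Kemeny aggregation over complete rankings of the same
    candidates. Returns (best total distance, list of optimal rankings).
    Tries all m! orders, so it is only practical for a handful of candidates."""
    best_score, best = None, []
    for order in permutations(sorted(rankings[0])):
        score = sum(kendall_tau(order, r) for r in rankings)
        if best_score is None or score < best_score:
            best_score, best = score, [list(order)]
        elif score == best_score:
            best.append(list(order))
    return best_score, best


rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(kemeny(rankings))  # (2, [['a', 'b', 'c']])
```

Returning every optimal order makes ties among Kemeny rankings explicit.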

For more details, see this page.

Click here to download the dataset

Glasgow City Council

This data set contains the results of the 2007 Glasgow City Council elections, separated by ward. There are 21 wards, each with different candidates and voters. These files report the results of all the ward-level elections, which were originally held under STV. Each ward election has between 8 and 13 candidates and between 5,199 and 12,744 voters.

The data presented here was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

For more details, see this page.

Click here to download the dataset

Irish Election Data

The Dublin North, Dublin West, and Meath data sets contain a complete record of votes for elections held in these three Irish constituencies in 2002. The votes were posted online but have since been removed.

The individual votes are not complete: many ballots rank only part of the candidate set. The North data set contains 43,942 votes over 12 candidates, the West data set contains 29,988 votes over 9 candidates, and the Meath data set contains 64,081 votes over 14 candidates.

The Meath data presented here was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

For more details, see this page.

Click here to download the dataset

Mariner Path Selection

The Mariner Trajectory Selection Data Set consists of the votes cast by the science teams responsible for selecting the trajectory of the 1977 Mariner interplanetary spacecraft. A total of 10 science teams voted over 32 possible trajectories. All of these votes are complete, but indifference (ties) was allowed between some of the trajectories.

For more details, see this page.

Click here to download the dataset

Mechanical Turk Dots

The Mechanical Turk Dots datasets come from Andrew Mao and were collected using Mechanical Turk. These data sets each contain elections with 794-800 voters over 4 candidates.

Each candidate corresponds to a set of random dots presented to a user on Mechanical Turk, who is asked to rank the items from the one containing the fewest dots (first) to the one containing the most dots (last). Thus, for all of these data sets there is a ground truth ranking, which corresponds to the candidate names in sorted order. In the Dots task, each task contains items with 200, 200+i, 200+2i, and 200+3i dots, where i ∈ {3, 5, 7, 9}. Varying i introduces different levels of noise across iterations of the task. For each i, 40 task instances were placed on Mechanical Turk and each was ranked by 20 users. At the data owner's request, these 160 individual trials have been aggregated into a single file for each i. The individual trial runs are available upon request.
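Since the candidate names sort into the ground truth, a natural check is whether an aggregation rule recovers that order from the noisy votes. A sketch using the Borda count as one illustrative rule (the choice of rule, the integer candidate names, and the toy votes are assumptions; the dataset itself does not prescribe an aggregation method):

```python
def borda_ranking(rankings):
    """Aggregate complete rankings with the Borda count (m-1 points for first
    place, down to 0 for last) and return candidates from highest to lowest score."""
    m = len(rankings[0])
    scores = {}
    for r in rankings:
        for i, c in enumerate(r):
            scores[c] = scores.get(c, 0) + (m - 1 - i)
    # Break score ties by candidate name so the result is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


votes = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 2, 4, 3]]
ground_truth = sorted({c for v in votes for c in v})
print(borda_ranking(votes) == ground_truth)  # True
```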

For more details, see this page.

Click here to download the dataset

Mechanical Turk Puzzle

The Mechanical Turk Puzzle datasets come from Andrew Mao and were collected using Mechanical Turk. These data sets each contain elections with 793-797 voters over 4 candidates.

Each candidate corresponds to an instance of the sliding puzzle game presented to a user on Mechanical Turk, who is asked to rank the items from the one closest to being solved (first) to the one requiring the most moves to complete (last). Thus, for all of these data sets there is a ground truth ranking, which corresponds to the candidate names in sorted order. In the Puzzle task, each task contains puzzles requiring d, d+3, d+6, and d+9 moves to complete, where d ∈ {5, 7, 9, 11}. Varying d introduces different levels of noise across iterations of the task. For each d, 40 sets of puzzles were placed on Mechanical Turk and each was ranked by 20 users. At the data owner's request, these 160 individual trials have been aggregated into a single file for each d. The individual trial runs are available upon request.

For more details, see this page.

Click here to download the dataset

Minneapolis Election Data

The 2009 Minneapolis Data contains the results of the elections for Parks and Rec Commissioner and Tax Assessor in Minneapolis, MN. The set contains about 30,000 votes over 7 to 400 candidates. The full data sets contain ballots along with write-in candidates (Mickey Mouse and Yoda are well represented). The No Write In files contain the same votes with all write-ins removed and the votes adjusted accordingly.

Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

For more details, see this page.

Click here to download the dataset

Netflix Prize Data

The Netflix Prize was a competition devised by Netflix to improve the accuracy of its recommendation system. To support it, Netflix released real movie ratings from users of the system. Any set of movies can be transformed into an election via a process outlined by Mattei, Forshee, and Goldsmith (reference below).
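The exact procedure of Mattei, Forshee, and Goldsmith is not reproduced here. One plausible reading, sketched under our own assumptions: keep only users who rated every movie in the chosen set, and let each user's star ratings induce a weak order (higher rating first, equally rated movies tied). The user ids, movie ids, and dictionary layout below are hypothetical:

```python
def ratings_to_vote(user_ratings, movies):
    """Convert one user's ratings over a fixed movie set into a weak order.

    Returns a list of tiers, highest rating first, with equally rated movies
    tied in the same tier, or None if the user did not rate every movie.
    """
    if any(m not in user_ratings for m in movies):
        return None
    tiers = {}
    for m in movies:
        tiers.setdefault(user_ratings[m], []).append(m)
    return [sorted(tiers[r]) for r in sorted(tiers, reverse=True)]


def build_election(all_ratings, movies):
    """Keep one vote per user who rated every movie in `movies`."""
    votes = (ratings_to_vote(r, movies) for r in all_ratings.values())
    return [v for v in votes if v is not None]


# Hypothetical ratings: u2 skipped m3, so only u1 yields a vote.
all_ratings = {
    "u1": {"m1": 5, "m2": 3, "m3": 5},
    "u2": {"m1": 2, "m2": 4},
}
print(build_election(all_ratings, ["m1", "m2", "m3"]))
# [[['m1', 'm3'], ['m2']]]
```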

The data sets posted here correspond to 100 random 3- and 4-candidate elections drawn from Data Set 1 in the paper "An Empirical Study of Voting Rules and Manipulation with Large Datasets." The elements numbered 1-100 are all 3-candidate elections, and the elements numbered 101-201 are all 4-candidate elections.

For more details, see this page.

Click here to download the dataset

Oakland Election Data

The 2010 Oakland Data contains the results of the city council and mayoral elections held in Oakland, CA in 2010. The set contains 7 distinct elections with between 4 and 11 candidates and between 900 and 145,000 voters.

Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.

The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.

For more details, see this page.

Click here to download the dataset