Trip Advisor Data (00040)

This dataset contains 675,069 reviews of 1,851 hotels across the world scraped from Trip Advisor. The data was scraped and donated by Hongning Wang.

One file contains the numerical aspect ratings provided by the users, along with other information about the hotel. The second file contains the text of the users' reviews. These reviews have been slightly modified: all excess spaces and tabs have been removed, and all commas have been changed to semicolons.

Both files are zipped due to their size. Each file is encoded in the dat format, and its first line explains the fields within the file. Some of the usernames are encoded in Unicode, so please be careful when parsing the files!
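
As a rough illustration of the parsing notes above, the following Python sketch reads one of the dat files with the first line treated as the field list. It rests on two assumptions that this page does not state: that fields are separated by commas (suggested by the commas in the review text having been replaced) and that the files are UTF-8 encoded. The helper names read_dat and read_dat_from_zip are hypothetical.

```python
import csv
import io
import zipfile


def read_dat(stream):
    """Yield one dict per record from a text stream.

    Assumes (not confirmed by the dataset page) that fields are
    comma-separated and that the first line lists the field names.
    """
    # QUOTE_NONE keeps any quote characters in review text literal.
    reader = csv.reader(stream, quoting=csv.QUOTE_NONE, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    for row in reader:
        if not row:
            continue  # skip blank lines
        yield dict(zip(header, (value.strip() for value in row)))


def read_dat_from_zip(zip_path, member):
    """Read a dat file stored inside a zip archive.

    Decodes as UTF-8 (an assumption) so that usernames containing
    non-ASCII characters are not mangled.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from read_dat(text)
```

If the archive holds the ratings file under its listed name, a call such as read_dat_from_zip("00040.zip", "00040-00000001.dat") would iterate over its records; the archive name here is a placeholder. If decoding fails, the files may use a different encoding and the encoding argument would need to be adjusted.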

Selected studies: Hongning Wang, Yue Lu, and ChengXiang Zhai. Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach. The 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010. | Hongning Wang, Yue Lu, and ChengXiang Zhai. Latent Aspect Rating Analysis without Aspect Keyword Supervision. The 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2011. | Marco Costantini, Carla Groenland, and Ulle Endriss. Judgment Aggregation under Issue Dependencies. The 30th AAAI Conference on Artificial Intelligence (AAAI), 2016.

Download the dataset [zip, 77.3 MB]

Details

  • Number of files: 2
  • Total size: 221.5 MB
  • Data types: dat.
  • Publication date: Aug. 17, 2013
  • Last modification: Sept. 22, 2022
Ratings — 00040-00000001.dat
Review Texts — 00040-00000002.dat