Hotel Review and Response for Customer Satisfaction Analysis Corpus

This research proposes a theoretical framework for developing deep learning-based natural language processing (NLP) models that identify and learn managerial response strategies and predict potential consumers' satisfaction. The method first predicts which strategies the hotel manager uses and then infers customer satisfaction from the predicted strategies.
To the best of our knowledge, no public dataset exists for these tasks. We therefore randomly sampled 2,000 instances from the collected data (1,000 positive and 1,000 negative hotel reviews with their responses, drawn from nine hotels) to establish the corpus for model development and performance evaluation (see Figure 1). We recruited three human coders, each with over five years of online hotel booking experience, to take part in compiling the data. All three coders are proficient in English, and each received a $200 monetary reward for completing the study within one month. Before the labeling task, the coders were trained on sample reviews and responses and were given the definition of each strategy. Given a hotel review, the response, and the number of days taken to respond, each coder marked the response strategies used in the hotel response and rated their satisfaction with that response as a customer. Satisfaction was measured on a 5-point Likert scale (1 = strongly unsatisfied, 5 = strongly satisfied). We used Cohen's kappa coefficient to measure inter-rater reliability. The kappa value is .75, which indicates substantial agreement among the three raters. The data distribution is shown in Table 1.
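As an illustration of the reliability check, the sketch below computes Cohen's kappa for each pair of coders and averages the results. Cohen's kappa is defined for two raters, and the text does not state how it was applied to three. Averaging the pairwise values is one common convention and is an assumption here; Fleiss' kappa would be an alternative. The function names and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: derived from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all pairs of raters (assumed convention)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy labels (1 = strategy present, 0 = absent); not corpus data.
coder_1 = [1, 0, 1, 1, 0, 1, 0, 0]
coder_2 = [1, 0, 1, 0, 0, 1, 0, 1]
coder_3 = [1, 0, 1, 1, 0, 1, 1, 0]
print(round(mean_pairwise_kappa([coder_1, coder_2, coder_3]), 3))
```

The same function could be applied per response strategy or to the 1-to-5 satisfaction ratings, although for ordinal ratings a weighted kappa may be more appropriate than the unweighted form sketched here.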

Table 1. The statistics of the corpus.
