Paper link: Deep Recurrent Neural Networks for OYO Hotels Recommendation


  • A hybrid model with two parts:
    1. Embedding generation: generate implicit embeddings of properties.
    2. Deep prediction and ranking model.
  • The model outperformed the existing collaborative-filtering model.


  • OYO’s current recommendation system
    • Graph-based Collaborative filtering model
    • Optimised on browsing data as user feedback
    • Objective: CTR
  • DL provides an opportunity to improve the system.

Lit Review


  • Embedding generation: generates embeddings of the hotels (intermediate output of the next step).
  • Prediction and ranking model: gets top-n recommendations based on the following inputs:
    • The sequence of browsed hotels
    • Embeddings of the browsed hotels
    • Rating tokens of the browsed hotels
    • Realisation tokens of the browsed hotels
OYO RecSys Schema
  • What was the candidate list of hotels? High-rated hotels?

Embedding Gen

  • Explicit feedback requires effort from the customers; hence, ratings are sparse.
  • Browsing data is implicit feedback that users generate without extra effort; thus, it does not suffer from this sparsity.
  • In this work, implicit features were derived using an RNN.
    • Embeddings were the intermediate output of the model training process.

Prediction and Ranking Model

  • Objective: realised bookings (conversion along with the realization of bookings)
  • Implemented the following four methods: RNN, GRU, LSTM, and BiLSTM.
  • Training data:
    • 1 million users
    • Sequences of their clicked hotels within a session
  • Pre-processing: padded and limited to 15 hotels.
  • Model objective: the probability that the user makes a realised booking at a high-rated hotel.
  • Proposed architecture (disclaimer: I couldn’t grok it from the paper)
    1. Embedding layer: 100 dim

       torch.nn.Embedding(num_embeddings=all_hotels, embedding_dim=100)
    2. Embedding concat layer
      • Not sure why they concatenated the embeddings. The embedding tensor should have been fed directly to the RNN layer; otherwise, no recurrence happens over the sequence.
    3. 2 BiLSTM layers

    4. Flatten layer (torch.nn.Flatten())
    5. 4 ReLU dense layers

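       import torch
       # torch.nn.ReLU does not wrap a layer, so chain each Linear with its activation.
       dense = torch.nn.Sequential(
           torch.nn.Linear(512 * 2, 512), torch.nn.ReLU(),
           torch.nn.Linear(512, 256), torch.nn.ReLU(),
           torch.nn.Linear(256, 128), torch.nn.ReLU(),
           torch.nn.Linear(128, 1), torch.nn.ReLU(),
       )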
    6. Softmax layer
    7. Output layer
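
One possible reading of the layer list above, sketched in PyTorch. This is not the paper's implementation: the class name `SessionRanker`, the helper `pad_session`, the hidden size of 512, the 128-to-128 fourth dense layer, the left-padding policy, and the one-logit-per-hotel output are all assumptions. The rating and realisation tokens are left out, and the embedded sequence is fed straight into the BiLSTM stack instead of going through a concat layer.

```python
import torch
from torch import nn


def pad_session(hotel_ids, max_len=15, pad_id=0):
    """Keep the most recent `max_len` clicks and left-pad shorter sessions (assumed policy)."""
    recent = list(hotel_ids)[-max_len:]
    return [pad_id] * (max_len - len(recent)) + recent


class SessionRanker(nn.Module):
    """Hypothetical embedding -> 2x BiLSTM -> flatten -> dense -> softmax stack."""

    def __init__(self, num_hotels, seq_len=15, emb_dim=100, hidden=512):
        super().__init__()
        # Hotel ids are assumed to run from 1..num_hotels; id 0 is the padding token.
        self.embed = nn.Embedding(num_hotels + 1, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.flatten = nn.Flatten()
        self.dense = nn.Sequential(
            nn.Linear(seq_len * 2 * hidden, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),  # assumed width for the fourth layer
        )
        self.out = nn.Linear(128, num_hotels + 1)  # one logit per hotel id

    def forward(self, hotel_ids):
        x = self.embed(hotel_ids)        # (batch, seq_len, emb_dim)
        x, _ = self.lstm(x)              # (batch, seq_len, 2 * hidden)
        x = self.dense(self.flatten(x))  # (batch, 128)
        return torch.softmax(self.out(x), dim=-1)
```

Under these assumptions, `SessionRanker(num_hotels=50, hidden=8)` applied to a batch of two padded sessions returns a `(2, 51)` tensor of probabilities, and the top-n recommendations are the n largest entries per row. After training, `model.embed.weight` would hold the 100-dimensional hotel embeddings.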

Embedding Evaluation

  • Embedding dimension: 100
  • Get the top 10 similar hotels for all the hotels in the training dataset using cosine similarity.
  • Four accuracy metrics:
    1. Location
    2. Distance
    3. Price
    4. Ratings
  • Metric formulation

    \[\text{Sim Index @ x} = \frac{\sum_{i=1}^{H} \text{sim@x}(\text{top-10 hotels}, i)}{H}\]
    • \(x\) can be any of the following: Location, Distance, Price, Ratings
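    • \(H\) is the number of query hotels;
    • \(\text{sim@x}(\text{top-10 hotels}, i)\) is the similarity score for metric \(x\).
    • Ranges between 0 and 10 when \(\text{sim@x}\) counts the matching hotels among the top 10; with the fraction-based definitions below, it ranges between 0 and 1.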
  • \(\text{sim@Location}\): fraction of top-10 hotels lying in the same city as the query hotel \(i\).
  • \(\text{sim@Distance}\): fraction of top-10 hotels that are within a 20km radius of the query hotel \(i\).
  • \(\text{sim@Price}\): fraction of top-10 hotels that are within +/-15% of the price of the query hotel \(i\).
  • \(\text{sim@Ratings}\): fraction of top-10 hotels that are +/-1 rating from the query hotel \(i\).
  • Following are the evaluation results with the winner highlighted:
  • Qualitative eval also yielded positive results.
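
The Sim Index computation described above can be sketched in plain Python. This is a minimal illustration, not the paper's code: the function names (`cosine`, `top_k_similar`, `sim_index`), the dictionary-based embedding store, and the `is_match` callback are hypothetical, and the fraction-based form of \(\text{sim@x}\) is assumed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(embeddings, query, k=10):
    """Ids of the k hotels closest to `query` by cosine similarity (query excluded)."""
    scores = [(cosine(embeddings[query], vec), hid)
              for hid, vec in embeddings.items() if hid != query]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [hid for _, hid in scores[:k]]


def sim_index(embeddings, is_match, k=10):
    """Mean over all query hotels of the fraction of top-k neighbours that match."""
    total = 0.0
    for query in embeddings:
        neighbours = top_k_similar(embeddings, query, k)
        if neighbours:
            total += sum(is_match(query, n) for n in neighbours) / len(neighbours)
    return total / len(embeddings)


# Toy data: two hotels per city, with embeddings that cluster by city.
embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
city = {"a": "Delhi", "b": "Delhi", "c": "Goa", "d": "Goa"}


def same_city(query, neighbour):
    """sim@Location test: is the neighbour in the query hotel's city?"""
    return city[query] == city[neighbour]


location_index = sim_index(embeddings, same_city, k=1)
```

The other three metrics only swap the `is_match` callback, for example a price check such as `abs(price[n] - price[q]) <= 0.15 * price[q]` for \(\text{sim@Price}\), where `price` is a hypothetical lookup table.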

Ranking Model Evaluation

  • Offline evaluation metrics: Hit Ratio and MRR (Mean Reciprocal Rank)
  • \(\text{Precision@k}\) or Hit Ratio: fraction of users for which the booked hotel was among the top-k recommendations.

    \[\text{Precision@k} = \frac{U_{hit}^k}{U_{all}}\]
  • 15 total model variants:
    • 3 variants with basic RNN
    • 4 variants each with LSTM, GRU, and BiLSTM
  • Selected one variant from each model type based on validation results.
  • Created a dataset aligned with the real-time environment. (Session logs?)
  • Out-of-time validation on this dataset.
  • The BiLSTM variant was the best-performing model.
  • Online evaluation metrics:
    • Realized bookings at high-rated hotels.
    • C*R (product of booking conversion and realization of bookings) at high-rated hotels.
  • Observed lifts of 3% to 6% in realized hotel bookings across different geographies.
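
The two offline metrics can be sketched as follows. The Hit Ratio follows the \(\text{Precision@k}\) formula above; MRR is written in its standard form (mean of 1/rank of the booked hotel, counting 0 when it is not recommended), which is assumed to match the paper. The function names and the dictionary layout are hypothetical.

```python
def hit_ratio_at_k(recommendations, booked, k):
    """Fraction of users whose booked hotel appears in their top-k recommendations."""
    hits = sum(1 for user, recs in recommendations.items()
               if booked[user] in recs[:k])
    return hits / len(recommendations)


def mean_reciprocal_rank(recommendations, booked):
    """Mean of 1/rank of the booked hotel; users with no hit contribute 0."""
    total = 0.0
    for user, recs in recommendations.items():
        if booked[user] in recs:
            total += 1.0 / (recs.index(booked[user]) + 1)
    return total / len(recommendations)


# Toy example: ranked recommendations per user and the hotel each user booked.
recommendations = {
    "u1": ["h1", "h2", "h3"],
    "u2": ["h4", "h5", "h6"],
    "u3": ["h7", "h8", "h9"],
}
booked = {"u1": "h1", "u2": "h5", "u3": "h0"}

hr_at_2 = hit_ratio_at_k(recommendations, booked, k=2)  # u1 and u2 hit: 2/3
mrr = mean_reciprocal_rank(recommendations, booked)     # (1 + 1/2 + 0) / 3 = 0.5
```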

Review Conclusion

  • The paper proposed building a DL model with two parts: embedding gen and ranking model.
  • The embeddings are the intermediate output of the ranking model. Not sure why it is called a separate model in the paper.
  • The model is an important part of this paper, yet
    • It does not discuss the training data construction in detail.
    • A few missing details about the architecture made it difficult to follow.
    • There was no discussion about inference or the candidate set of hotels to rank.
  • The embedding evaluation framework was comprehensive and quantified the effectiveness of the embeddings.
  • The model evaluation methodology followed the standard process of train-time validation and then out-of-time validation.
  • One thing lacking was a comparison with tree-based models such as gradient-boosted trees, which have shown good performance on recommendation tasks in both industry and research.