Rethinking the ratings and reviews

How we enhanced the existing ratings system and made it more reliable for our users.

Figure 1: Left - submitting a review. Middle and right - reading reviews.

Just Eat (FTSE 100) is a leading global marketplace for online food delivery, serving millions of customers across 13 countries.

Goal

There are 22.5 million reviews on the website and our goal was to ensure that we provide a reliable and trustworthy ratings system. 

My role

I was the UX designer and researcher for this work. I led the qualitative research studies, ideation and prototyping. My team consisted of a product manager, a technical manager and 5 software engineers.

The existing review system

The existing system allows a customer to submit a review for a restaurant after they have placed and received an order from that restaurant. An overall rating score is calculated as the average of the ratings given for 3 criteria: Food Quality, Delivery Time and Restaurant Service. A 6-star rating scale is currently used.

What the data tell us

  • 4% - 4.5% of customers submit feedback about their order
  • 50% of those write a review, while the other 50% only rate
  • 10% of users look at the reviews page

We want to encourage more people to write reviews but, more importantly, we want to steer people towards writing richer reviews that better inform other customers (as well as the Restaurant Partners).

Observing how people use reviews

To understand the ratings and reviews further, I conducted usability testing studies with 5 people who were recruited through a third-party company. The participants were a mix of people (3 female and 2 male), aged 18 to 45, who had all used Just Eat. Each participant was given the same activities to do and was asked to think out loud, explaining what they were doing and thinking. Participants first had to place an order from any restaurant that they'd genuinely be interested in (so we could observe how they make decisions). Later on, participants were given activities focused on the ratings and reviews, to understand what they mean to people and how they use them. This study was done in our UX research lab. Each session was recorded so that I could watch it back later to analyse the findings.

Findings from the research

Reading reviews

Ratings and reviews were used to help make decisions about restaurants that were unfamiliar to the person. If a person was familiar with a restaurant (e.g. they had eaten from it before or it was a known chain) then they did not feel the need to read the reviews.

Top 3 insights
  • Ratings without a review (the comments) were ignored. People skipped past ratings with no comments and did not take them into account. People wanted words to back up a reviewer’s rating.
  • Comments were scanned to find out what was good and bad.
  • Recent reviews outweigh the older ones. Reviews older than 6 months were less meaningful. Ones older than 12 months were not taken into account. Most people felt that a restaurant can change a lot over time (e.g. new ownership).

Leaving reviews

Memorable experiences prompt people to leave reviews. People had left reviews when the outcome exceeded their expectation or failed to meet it (e.g. surprisingly good or extremely bad). If the experience met expectations, people felt that there was less need to leave a review.

Top 3 insights
  • Restaurant Service is confusing. No one knew what Restaurant Service meant. Everyone felt that it was not applicable when having food delivered. 3 of the 5 people ignored it and didn’t provide a rating for it. One person went down the middle and gave 3 stars, and the other person gave it an average score based on their food and delivery ratings.
  • At first most people didn't notice the ratings were marked out of 6 stars. When a few people realised it was out of 6, they didn't understand why. They showed a preference for a 5-star rating, which was more commonly understood.
  • Delivery time was not understood well in terms of a 6 star rating. People felt that delivery was either early, on time or late.

The problem with the star rating

It's not always clear to the reader what a rating represents. For example, what does 4/6 mean for ordering a takeaway?

It's not clear to the reader how this rating is being generated. To illustrate the problem further, the image below shows how rating inputs can differ widely but the rating outcomes can look the same.

Figure 2: Three different inputs resulting in similar outputs.
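
To make the point concrete, here is a minimal sketch of how different inputs can collapse into the same overall score. It assumes the overall rating is a plain unweighted mean of the three criteria (the exact production formula is not stated here), and the function name and the star values are invented for illustration.

```python
# Hypothetical helper: unweighted mean of the three criteria on the 6-star scale.
# The real calculation used on the site may differ; this is an assumption.
def overall_rating(food_quality, delivery_time, restaurant_service):
    return (food_quality + delivery_time + restaurant_service) / 3


# Invented example inputs: (Food Quality, Delivery Time, Restaurant Service).
orders = {
    "consistent experience": (4, 4, 4),
    "great food, poor service": (6, 4, 2),
    "great food, abysmal service": (6, 5, 1),
}

for name, stars in orders.items():
    # Every order prints the same overall score even though the inputs differ.
    print(f"{name}: inputs={stars} overall={overall_rating(*stars):.1f}")
```

All three orders come out at 4.0, so a reader who only sees the combined score cannot tell a steady experience from one with a very weak criterion.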

How might we..

The findings from the research studies were framed into how might we (HMW) statements, so that we could think about how we could turn insights into opportunities for improvement.

1. Improve the way we capture ratings?

The image above highlighted the problems of inputting a rating. We also saw from our research that people felt that star ratings were subjective, e.g. what 4/6 means to one person can differ from what it means to someone else.

I felt that one way to improve this was to add descriptor labels for each star, so that every customer would rate based on the same scale. When analysing the reviews, we were able to highlight the top adjectives used by customers to describe those 1 to 6 star experiences. This led to the following descriptor labels:

  • 1 star = abysmal
  • 2 stars = poor
  • 3 stars = average
  • 4 stars = good
  • 5 stars = great
  • 6 stars = perfect (later changed to 'outstanding' based on user testing)
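
As a rough sketch, the labels above can be held as a simple lookup from star value to descriptor. The names `STAR_DESCRIPTORS` and `describe` are hypothetical, and this is not the code used on the live site; it uses 'outstanding' for 6 stars, following the later change.

```python
# Descriptor labels taken from the list above, keyed by star value.
STAR_DESCRIPTORS = {
    1: "abysmal",
    2: "poor",
    3: "average",
    4: "good",
    5: "great",
    6: "outstanding",  # originally 'perfect'
}


def describe(stars: int) -> str:
    """Return the descriptor label for a star value on the 6-star scale."""
    if stars not in STAR_DESCRIPTORS:
        raise ValueError(f"stars must be between 1 and 6, got {stars}")
    return STAR_DESCRIPTORS[stars]


print(describe(4))  # good
```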

2. Make ratings more meaningful to readers?

Usability testing showed that people skipped past ratings that didn't have comments. People wanted words to back up the star rating to understand what was good or bad.

I thought that for ratings without comments, we could show the reader the star descriptions for each of the criteria (Food Quality, Delivery Time, Restaurant Service), to let them know what specifically was good or bad (as opposed to only showing them the combined star rating). In addition to this, ratings and reviews should be displayed newest first, giving more prominence to ones from the past 6 months.
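
The ordering part of this idea could look something like the sketch below. The `Review` type, the `order_for_display` function and the 183-day window (standing in for "6 months") are all assumptions made for illustration, not a description of how the site works.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Roughly 6 months; the exact cut-off is an assumption.
RECENT_WINDOW = timedelta(days=183)


@dataclass
class Review:
    submitted: date
    overall: float
    comment: str = ""


def order_for_display(reviews, today):
    """Sort newest first and flag reviews that fall inside the recent window."""
    newest_first = sorted(reviews, key=lambda r: r.submitted, reverse=True)
    return [(r, today - r.submitted <= RECENT_WINDOW) for r in newest_first]


reviews = [
    Review(date(2016, 1, 1), 5.0, "Lovely curry"),
    Review(date(2017, 5, 1), 3.0),
    Review(date(2017, 1, 15), 4.0, "Arrived early"),
]

for review, is_recent in order_for_display(reviews, today=date(2017, 6, 1)):
    marker = "recent" if is_recent else "older"
    print(review.submitted, marker, review.comment or "(rating only)")
```

The flag leaves it to the interface to decide what "more prominence" means, for example a highlighted section or simply a visual separator before older reviews.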

3. Encourage more people to share their experience?

We've seen that people are typically more motivated to write a review if the experience has exceeded their expectations or failed to meet them. We felt that it was important to encourage more people to leave reviews. One of the ways I thought of tackling this challenge was to automate the start of a review based on the star rating given by the customer.

The question that I wanted to answer was, would a person be more inclined to write a review if we started one for them?
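
A minimal sketch of what "starting a review" could mean in practice is shown below. The sentence template and the `start_review` function are invented for illustration; the wording actually generated in the prototype is not documented here, so treat this purely as an assumption.

```python
# Descriptor labels for the 6-star scale (6 stars uses the later label 'outstanding').
DESCRIPTORS = {
    1: "abysmal",
    2: "poor",
    3: "average",
    4: "good",
    5: "great",
    6: "outstanding",
}


def start_review(food_quality: int, delivery_time: int) -> str:
    """Build a hypothetical opening sentence that the customer can edit or extend."""
    return (
        f"The food was {DESCRIPTORS[food_quality]} "
        f"and the delivery was {DESCRIPTORS[delivery_time]}. "
    )


print(start_review(5, 3))  # The food was great and the delivery was average.
```

The opener ends with a trailing space so that the customer can carry on typing straight after it.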

What if we..

There were a few questions raised from the research studies, which needed input from the software engineers to understand the feasibility and impact. These questions were framed as 'what if we' and are discussed below.

1. Went from 6 stars to 5 stars?

People showed a preference for a 5-star rating, which was more commonly understood. The engineers investigated this, but unfortunately there were too many challenges to make the change easily. It meant changing the current rating scores for thousands of restaurants, which was not a light task. It posed a lot of difficulties and unknowns for the existing data, so we shelved it as something to look into at a later time and focused on what we could improve in the near term.

2. Remove the restaurant service?

No one from the user testing knew what Restaurant Service meant. Everyone felt that it was not applicable when having food delivered. This was analysed by the engineers but removing it would have a huge impact on the existing ratings data. Again, this task was too big to do for our upcoming sprint.

3. Change how people rated the delivery?

People felt that delivery was either early, on time or late. Although we couldn't remove the star ratings for this, due to the impact on legacy ratings, we were able to add new functionality to capture whether an order was early, on time or late.

Ideating

Taking what was learned from the research and understanding the user needs, I began sketching out ideas for all possible solutions.

Figure 3: Sketching out all possibilities and critiquing them.

Once I had explored a range of ideas, I iterated in Sketch on the ones that showed the most potential.

Figure 4: Designing and iterating in Sketch.

Prototyping design concepts

I created design concepts in Sketch and built them into functional prototypes using Framer. I deployed the prototypes onto a device and tested them with people to see what worked best. Results from this fed back into the design process, to iterate towards a better solution.

Figure 5: Prototyping in Framer.

Adding descriptor labels to stars

Motivating people to write a review by automating the beginning

Future concept

Outcomes

  • Assigning descriptor labels to stars was a 50/50 experiment on web in Ireland for both new and existing customers.
  • We saw a greater spread of ratings - a key initial step to ensuring that restaurant rating is a reliable differentiator. 
  • On the capture side, in 2016 Just Eat globally generated 7.35 million ratings. With a shift in the experiment of 5 in every 100 ratings at the capture stage, we could deduce that we introduced a better level of clarity for around 368,000 user sessions per year.
  • On the rating usage side, research confirmed that star ratings do factor into user choice. We removed the impact of 7% of 6-star ratings which were not genuinely 6 stars, allowing customers to better identify the genuinely better experiences that make them reorder.
  • Descriptor labels are now a standardised feature across all platforms (iOS, Android, Web - UK and International).
  • User testing for automated reviews showed that people understood what was happening and felt comfortable with what was generated on their behalf. People felt that it was a fair reflection of the star rating that they gave - "It's actually better. I think it helps when that comes up." -- Research participant.
  • My code module for creating a star rating (with descriptors) in CoffeeScript can be accessed from GitHub. Note that this is purely for prototyping purposes (e.g. in Framer) and wasn't used on the live site.
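
For reference, the capture-side estimate in the outcomes above follows from simple arithmetic: 5 in every 100 of the 7.35 million ratings generated in 2016. The variable names below are only illustrative.

```python
ratings_2016 = 7_350_000  # ratings generated globally in 2016
shift_per_100 = 5         # ratings shifted per 100 in the experiment

# Integer arithmetic avoids any floating-point rounding in the estimate.
affected = ratings_2016 * shift_per_100 // 100

print(f"{affected:,}")  # 367,500, i.e. around 368,000 per year
```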

Selected Works

Just Eat | Rethinking the rating and reviews: How we enhanced the existing ratings system and made it more reliable for our customers

Just Eat | Fixing the minimum spend: How we improved the usability of ordering food and increased conversion by 0.25% (270,000 extra orders per year)

Severn Trent | Digital transformation: How we transformed the digital experience, helping customers to self-serve better

ivDripRate: iPhone app for managing drug infusions, which has generated more than 25,000 downloads and is used by healthcare practitioners globally

About

Hi, my name is Mark. I am a UX designer and researcher. Read more..

See my résumé

© 2019 Mark Davies