How we enhanced the existing ratings system and made it more reliable for our users.
Figure 1: Left - submitting a review. Middle and right - reading reviews.
Just Eat (FTSE 100) is a leading global marketplace for online food delivery, serving millions of customers across 13 countries.
With 22.5 million reviews on the website, our goal was to ensure that we provide a reliable and trustworthy ratings system.
I was the UX designer and researcher for this work. I led the qualitative research studies, ideation and prototyping. My team consisted of a product manager, technical manager and 5 software engineers.
The existing system allows a customer to submit a review for a restaurant after they have placed and received an order from that restaurant. An overall rating score is calculated as the average of the ratings given for three criteria: Food Quality, Delivery Time and Restaurant Service. A 6-star rating scale is currently used.
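As a minimal sketch, the overall score described above is a simple arithmetic mean of the three criteria. The function and parameter names below are illustrative, not Just Eat's actual implementation.

```python
# Sketch: average the three 1-6 star criteria into one overall score.
def overall_rating(food_quality: int, delivery_time: int, restaurant_service: int) -> float:
    """Return the mean of the three criteria ratings, rounded to 1 decimal place."""
    return round((food_quality + delivery_time + restaurant_service) / 3, 1)

print(overall_rating(6, 5, 4))  # 5.0
```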
We want to encourage more people to write reviews but, more importantly, we want to steer people towards adding richer content to their reviews that can better inform other customers (as well as the Restaurant Partners).
To understand the ratings and reviews further, I conducted usability testing studies with 5 people recruited through a third-party company. The participants were a mix of people (3 female and 2 male), aged between 18 and 45, who had all used Just Eat. Each participant was given the same activities and asked to think out loud, explaining what they were doing and thinking. Participants first had to place an order from any restaurant that they'd genuinely be interested in (so we could observe how they make decisions). Later on, participants were given activities focused on the ratings and reviews, to understand what they mean to people and how they use them. The study was done in our UX research lab. Each session was recorded so that I could watch it back later to analyse the findings.
Ratings and reviews were used to help make decisions for restaurants that were unfamiliar to the person. If a person was familiar with a restaurant (e.g. eaten from a restaurant before or the restaurant was a known chain) then they would not feel the need to read the reviews.
Memorable experiences caused people to leave reviews. People had left reviews when the outcome exceeded their expectations or failed to meet them (e.g. surprisingly good or extremely bad). If the experience met expectations, people felt there was less need to leave a review.
Restaurant service is confusing. No one knew what Restaurant Service meant. Everyone felt that it was not applicable when having food delivered. 3/5 people ignored it and didn’t provide a rating for this. One person went down the middle and gave 3 stars, and the other person gave it an average score based on the food and delivery ratings.
It's not always clear to the reader what a rating represents. For example, what does 4/6 mean for ordering a takeaway?
It's not clear to the reader how this rating is being generated. To illustrate the problem further, the image below shows how rating inputs can differ widely but the rating outcomes can look the same.
Figure 2: Three different inputs resulting in similar outputs.
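The averaging behaviour behind Figure 2 can be shown with a small sketch: three very different sets of criteria ratings all collapse to the same overall score (the values here are illustrative).

```python
# Three different rating inputs (food quality, delivery time,
# restaurant service) that all average to the same overall score.
inputs = [(6, 6, 3), (5, 5, 5), (4, 5, 6)]
for stars in inputs:
    overall = round(sum(stars) / len(stars), 1)
    print(stars, "->", overall)  # each prints an overall of 5.0
```

To the reader, all three of these reviews look identical, even though one of them hides a 3-star restaurant-service experience.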
The findings from the research studies were framed into how might we (HMW) statements, so that we could think about how we could turn insights into opportunities for improvement.
The image above highlighted the problems of inputting a rating. We also saw from our research that people felt star ratings were subjective, e.g. what 4/6 means to one person can differ from what it means to someone else.
I felt that one way to improve this was to add descriptor labels for each star, so that every customer would rate against the same scale. When analysing the reviews, we identified the top adjectives used by customers to describe those 1 to 6 star experiences. This led to the following descriptor labels:
Usability testing showed that people skipped past ratings that didn't have comments. People wanted words to back up the star rating, to understand what was good or bad.
I thought that for ratings without comments, we could show the star descriptions for each of the criteria (Food quality, Delivery service, Restaurant service) to the reader, to let them know what specifically was good or bad (as opposed to only showing them the combined star rating). In addition, ratings and reviews should be displayed newest first, giving more prominence to ones from the past 6 months.
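A hedged sketch of that ordering rule, assuming each review carries a submission date (the field names and the 182-day cutoff are hypothetical, not the production logic):

```python
from datetime import datetime, timedelta

def order_reviews(reviews, now=None):
    """Sort reviews newest first and flag those from roughly the past 6 months."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=182)  # ~6 months
    ordered = sorted(reviews, key=lambda r: r["date"], reverse=True)
    return [{**r, "recent": r["date"] >= cutoff} for r in ordered]
```

The `recent` flag could then drive whatever extra prominence the design gives to fresh reviews, without hiding older ones entirely.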
We've seen that people are typically more motivated to write a review if the experience has exceeded their expectations or failed to meet them. We felt that it was important to encourage more people to leave reviews. One of the ways I thought of tackling this challenge was to automate the start of a review based on the star rating given by the customer.
The question that I wanted to answer was, would a person be more inclined to write a review if we started one for them?
There were a few questions raised from the research studies, which needed input from the software engineers to understand the feasibility and impact. These questions were framed as 'what if we' and are discussed below.
People showed a preference for a 5-star rating scale, which was more commonly understood. The engineers investigated this, but it posed too many challenges to change easily: it would mean converting the current rating scores for thousands of restaurants, with a lot of difficulties and unknowns around the existing data. This was shelved to look into at a later time, and we focused on what we could improve in the near term.
No one from the user testing knew what Restaurant Service meant. Everyone felt that it was not applicable when having food delivered. The engineers analysed this, but removing it would have a huge impact on the existing ratings data. Again, this task was too big for our upcoming sprint.
People felt that delivery was either early, on time or late. As we couldn't remove the star ratings for this, due to the impact on legacy ratings, we instead added new functionality to capture whether an order was early, on time or late.
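One way to model this is to keep the legacy star rating untouched and capture timeliness as a separate, explicit field on the review. The sketch below is an assumption about how that could look, not the actual data model.

```python
from enum import Enum

class DeliveryTimeliness(Enum):
    """New, explicit signal captured alongside the legacy star rating."""
    EARLY = "early"
    ON_TIME = "on_time"
    LATE = "late"

def delivery_feedback(star_rating: int, timeliness: DeliveryTimeliness) -> dict:
    """Record both the legacy star rating and the new timeliness signal."""
    return {"delivery_stars": star_rating, "timeliness": timeliness.value}
```

Keeping the stars alongside the new field means existing aggregate scores stay valid while the richer signal accumulates.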
Taking what was learned from the research and understanding the user needs, I began sketching out ideas for all possible solutions.
Figure 3: Sketching out all possibilities and critiquing them.
Once I had explored a range of ideas, I iterated in Sketch on the ones that showed the most potential.
Figure 4: Designing and iterating in Sketch.
I created design concepts in Sketch and built them into functional prototypes using Framer. I deployed the prototypes onto a device to test them with people and see what worked best. Results from this fed back into the design process, iterating towards a better solution.
Figure 5: Prototyping in Framer.
© 2019 Mark Davies