Negative reviews sat for 3–5 days. The COO was firefighting on her phone.
With 40+ locations across two countries, the chain was getting ~120 reviews per week on Google plus another ~200 complaints flowing through Wolt and Uber Eats. There was no central response team — each location manager was supposed to handle their own. In practice nobody did, and average response time on negatives was 3–5 days. Some never got answered at all.
The COO was personally replying to the loudest 1-star reviews from her phone in the evenings. Average rating had drifted from 4.5 to 4.2 over 18 months. The chain's strongest food-quality items were getting buried under negative sentiment about poor service recovery.
We classified 600 historical reviews to find the patterns that actually matter.
Across 600 reviews we tagged complaint type (food, service, delivery, billing), severity, location, and whether a recovery offer would be appropriate. The set of complaint types turned out to be small: 11 categories covered 94% of cases. The right reply differed dramatically by category: "cold food" needs an apology + voucher, "slow service" needs an apology + a manager mention, and "missing item from delivery" needs an immediate refund, not a voucher.
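To make the category-to-recovery idea concrete, here is a minimal sketch of a lookup table covering only the three categories named above. The key names, the action labels, and the "human_review" fallback are our illustrative assumptions, not the production taxonomy; the other eight categories are deliberately left out.

```python
from typing import Dict, Tuple

# Only the three categories spelled out in the text. Key names and action
# labels are hypothetical; the real system has 11 categories.
RECOVERY_PLAYBOOK: Dict[str, Tuple[str, ...]] = {
    "cold_food": ("apology", "voucher"),
    "slow_service": ("apology", "manager_mention"),
    "missing_delivery_item": ("refund",),  # a refund, never a voucher
}

# Assumption: anything outside the playbook is routed to a person.
FALLBACK: Tuple[str, ...] = ("human_review",)


def recovery_actions(category: str) -> Tuple[str, ...]:
    """Return the recovery actions for a complaint category."""
    return RECOVERY_PLAYBOOK.get(category, FALLBACK)
```

Keeping the mapping as data rather than branching logic means a new category is one more row, and the rule that a missing delivery item never gets a voucher is visible at a glance.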
We also looked at what the COO was actually doing — and her replies were genuinely good. We trained the system on her tone and her recovery decisions.
"I needed to be in a thousand places at once. Now the first reply happens before I even see the review."
Webhooks in, drafted reply out, voucher attached when warranted.
The pipeline is simple. Reviews and delivery complaints arrive via webhooks. A classifier scores severity and category. Three things happen in parallel: a personalized reply gets drafted (in the customer's language, with the COO's voice), a voucher is generated against the chain's POS system if the case warrants it, and a Slack alert goes to the location manager + COO if severity crosses a threshold.
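The fan-out described above can be sketched with asyncio. Everything here is a stand-in under stated assumptions: the classifier is a keyword stub, the 0 to 1 severity scale and the 0.7 alert threshold are invented for illustration, and the voucher and Slack functions only return values instead of calling a POS system or Slack.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

ALERT_THRESHOLD = 0.7  # assumed value; the real threshold is not stated


@dataclass
class Complaint:
    text: str
    language: str
    location: str


@dataclass
class Classification:
    category: str
    severity: float  # assumed 0..1 scale
    voucher_warranted: bool


async def classify(c: Complaint) -> Classification:
    # Keyword stub standing in for the real classifier.
    cold = "cold" in c.text.lower()
    return Classification("cold_food" if cold else "other", 0.8 if cold else 0.2, cold)


async def draft_reply(c: Complaint, cls: Classification) -> str:
    # Placeholder for the drafting step (customer's language, COO's voice).
    return f"[{c.language}] draft reply for {cls.category}"


async def issue_voucher(c: Complaint, cls: Classification) -> Optional[str]:
    # Placeholder for the POS call; returns a fake voucher code.
    return f"VOUCHER-{c.location}" if cls.voucher_warranted else None


async def alert_slack(c: Complaint, cls: Classification) -> bool:
    # Placeholder for the Slack alert; reports whether one would be sent.
    return cls.severity >= ALERT_THRESHOLD


async def handle(c: Complaint) -> dict:
    cls = await classify(c)
    # The three downstream steps run concurrently once classification is done.
    reply, voucher, alerted = await asyncio.gather(
        draft_reply(c, cls), issue_voucher(c, cls), alert_slack(c, cls)
    )
    return {"reply": reply, "voucher": voucher, "alerted": alerted}


result = asyncio.run(handle(Complaint("My pizza arrived cold", "en", "loc-07")))
```

Classification has to finish first because all three branches depend on it; after that, none of the branches waits on another, so a slow POS call does not delay the draft or the alert.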
Replies aren't auto-published — the location manager confirms with one tap on mobile. Most do so within minutes; if no one taps within 30 minutes, the system publishes a safer-toned default and flags the location for follow-up.
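The confirm-or-fallback rule reduces to a small decision function. This is a sketch, not the production logic: the function name, its arguments, and the return shape are hypothetical, and only the 30-minute window comes from the description above.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

CONFIRM_WINDOW = timedelta(minutes=30)  # the 30-minute window from the text


def resolve_reply(
    drafted_at: datetime,
    confirmed_at: Optional[datetime],
    now: datetime,
    draft: str,
    safe_default: str,
) -> Tuple[Optional[str], bool]:
    """Return (text_to_publish, flag_location).

    text_to_publish is None while the manager still has time to confirm.
    """
    deadline = drafted_at + CONFIRM_WINDOW
    if confirmed_at is not None and confirmed_at <= deadline:
        return draft, False  # manager tapped in time: publish the draft
    if now >= deadline:
        return safe_default, True  # nobody tapped: safer default, flag the site
    return None, False  # still inside the window: keep waiting
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the rule easy to test against fixed timestamps.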
3 weeks across 12 pilot locations. 4.2 → 4.6 average rating.
We ran the pilot at 12 locations. 90% of reviews got a reply within 1 hour (vs 12% baseline). Severity-flagged cases got a manager involved within 8 minutes on average. The voucher recovery rate — customers who came back and used the voucher within 60 days — was 12%, well above the chain's prior re-engagement campaigns.
Average rating across the 12 pilot locations climbed from 4.2 to 4.6 in 3 weeks. The COO got her evenings back.
Numbers that survived the pilot.
- Average rating (12 pilot sites): 4.2 → 4.6 (+0.4)
- Reviews replied in <1h: 12% → 90% (+78 pt)
- Manager response time on severe cases: 8 min
- Voucher recovery rate: 12%
- COO hours / week on reviews: ~14h → <1h (−93%)