How to Measure UX Quality in Numbers: SUS, HEART, and PURE Without the Boredom
“The design is good” is not an argument. “The design is good because SUS = 78, the task completion rate is 87%, and none of the eight users made a critical error” is.
Three frameworks, SUS, HEART and PURE, solve one problem: how to translate UX quality into numbers you can work with. Each suits different situations. This article covers how they work and when to use each.
Why measure UX at all
There is a school of thought: “Good design is visible without metrics, bad too.” There’s some truth to this – an experienced designer often feels a problem before it’s confirmed by the data.
But without measurement, you can't:
**Prioritize.** There are three problems. Which one should you fix first? Severity and frequency data answer that question.
**Prove progress.** “It got better” is not an argument for the business. “SUS grew from 54 to 71 after the redesign” is.
**Compare options.** Two designs, pick one. Without measurement, the choice is subjective.
**Fix the baseline.** Before you improve something, you need to know where it started. Otherwise it is unclear whether anything improved.
SUS (System Usability Scale)
What is it
SUS is a 10-question questionnaire developed by John Brooke in 1986. It is one of the most popular usability measurement tools in the world, and one of the simplest.
Each question is a statement that the user rates on a scale from 1 (“Strongly disagree”) to 5 (“Strongly agree”).
SUS questions
1. I think that I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I think that I would need the support of a technical person to be able to use this system
5. I found the various functions in this system were well integrated
6. I thought there was too much inconsistency in this system
7. I would imagine that most people would learn to use this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system
How to count
Odd questions (1, 3, 5, 7, 9) are positive: subtract 1 from the user's score. Even questions (2, 4, 6, 8, 10) are negative: subtract the score from 5.
Multiply the sum of all 10 results by 2.5. The result is a number from 0 to 100.
**Example:**
- Question 1 (positive): the user answered 4 → 4 − 1 = 3
- Question 2 (negative): the user answered 2 → 5 − 2 = 3
- ...and so on for all 10 questions
- Sum = 28, result = 28 × 2.5 = 70
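The scoring rule is easy to get wrong by hand, so here is a minimal Python sketch of it. The function name `sus_score` and the sample answers are illustrative assumptions: only the first two answers (4 and 2) come from the example, and the remaining eight are made up so that the sum comes out at 28.

```python
def sus_score(answers: list[int]) -> float:
    """Return the 0-100 SUS score for one respondent's ten answers (each 1-5)."""
    if len(answers) != 10:
        raise ValueError("SUS needs exactly 10 answers")
    if any(a < 1 or a > 5 for a in answers):
        raise ValueError("each answer must be between 1 and 5")
    total = 0
    for number, answer in enumerate(answers, start=1):
        if number % 2 == 1:   # odd question, positive statement
            total += answer - 1
        else:                 # even question, negative statement
            total += 5 - answer
    return total * 2.5


# Hypothetical respondent: only the first two answers come from the example.
answers = [4, 2, 4, 2, 4, 2, 4, 2, 3, 3]
print(sus_score(answers))  # 70.0
```

To get a group score, compute `sus_score` for each participant and average the results.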
Interpretation
| Score | Grade | Percentile |
|---|---|---|
| 85–100 | Excellent (A) | Top 10% |
| 72–84 | Good (B) | Top 25% |
| 52–71 | Satisfactory (C/D) | Average |
| < 52 | Poor (F) | Below average |
The industry average SUS is about 68. If your score is lower, usability is worse than the market average. Above 80 is a great result.
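The bands from the table can be turned into a small lookup. This is a sketch under one assumption: the table leaves gaps between bands (for example between 71 and 72), so each band's lower bound is treated here as the threshold. The function name `sus_grade` is illustrative.

```python
def sus_grade(score: float) -> str:
    """Map a 0-100 SUS score to the grade bands from the interpretation table."""
    if not 0 <= score <= 100:
        raise ValueError("SUS score must be between 0 and 100")
    if score >= 85:
        return "Excellent (A)"
    if score >= 72:
        return "Good (B)"
    if score >= 52:
        return "Satisfactory (C/D)"
    return "Poor (F)"


print(sus_grade(78))  # Good (B)
print(sus_grade(68))  # Satisfactory (C/D)
```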
When to use
- Usability test with 5-8 users
- Comparison of two product versions
- Benchmark before redesign
- Rapid evaluation of the prototype
**Minimum sample:** 5-8 users give stable results. With fewer than 5, the data is too unstable.
SUS limitations
- It doesn’t say what is wrong, only that “something is wrong”
- It is subjective: different users interpret “complex” differently
- It does not account for the context of use: SUS scores for a banking app and for a game are not directly comparable
- The same SUS score can hide different problems
HEART (Google)
What is it
HEART is a framework developed at Google to measure UX at scale. The acronym stands for Happiness, Engagement, Adoption, Retention, Task Success.
Unlike SUS, HEART is not a single questionnaire – it is a system for choosing the right metrics for a specific product. It is used in conjunction with the "Goals - Signals - Metrics" matrix.
Five dimensions of HEART
**Happiness.** The user's subjective attitude toward the product: how satisfied they are, how much they trust it, how ready they are to recommend it.
Metrics: NPS, CSAT, App Store rating, satisfaction survey results.
When it matters: when satisfaction is directly tied to retention (B2C), or when the product competes in a crowded market and the difference comes down to “feel”.
**Engagement.** How deeply and regularly users use the product.
Metrics: DAU/MAU ratio, sessions per week, time in the app, number of key actions per period.
When it matters: social products, media platforms, productivity tools. If engagement falls, the product loses its place in the user’s life.
**Adoption.** How quickly new users start using a product or feature.
Metrics: number of new users, adoption rate of a new feature, time to first key action, onboarding completion rate.
When it matters: when launching a new product or a significant new feature. If adoption is low, the problem lies in discoverability, onboarding, or understanding of the value.
**Retention.** Whether users come back.
Metrics: D1/D7/D30 retention, churn rate, subscription renewal rate.
When it matters: always, but especially for products whose model depends on repeat use. Retention is the foundation of everything else.
**Task Success.** How successfully users complete specific tasks.
Metrics: task completion rate, time on task, error rate, success rate without help.
When it matters: utility products and tools built around specific tasks. If users can’t do what they came for, there’s no point in discussing the other dimensions.
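As an illustration of the Task Success metrics, here is a sketch that derives them from usability-test observations. The `Attempt` record and its fields are hypothetical, and “error rate” is taken here to mean average errors per attempt, which is only one of several possible definitions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Attempt:
    """One participant's attempt at one task (hypothetical record layout)."""
    completed: bool
    seconds: float
    errors: int
    needed_help: bool


def task_success_metrics(attempts: list[Attempt]) -> dict[str, Optional[float]]:
    """Compute completion rate, unassisted success, time on task and errors."""
    if not attempts:
        raise ValueError("need at least one attempt")
    n = len(attempts)
    completed = [a for a in attempts if a.completed]
    unassisted = [a for a in completed if not a.needed_help]
    return {
        "completion_rate": len(completed) / n,
        "unassisted_success_rate": len(unassisted) / n,
        # Time on task is averaged over successful attempts only.
        "mean_time_on_task": mean(a.seconds for a in completed) if completed else None,
        "errors_per_attempt": mean(a.errors for a in attempts),
    }


# Illustrative data, not real test results.
attempts = [
    Attempt(True, 40.0, 0, False),
    Attempt(True, 60.0, 1, True),
    Attempt(False, 90.0, 3, False),
    Attempt(True, 50.0, 0, False),
]
metrics = task_success_metrics(attempts)
```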
The Goals-Signals-Metrics matrix
HEART works in conjunction with this matrix. For each of the five dimensions, answer three questions:
**Goals:** What do we want to achieve for the user in this dimension?
**Signals:** How will user behavior change if we achieve the goal?
**Metrics:** How can we measure this signal?
Example for the “Search” feature in a document service:
| Dimension | Goal | Signal | Metric |
|---|---|---|---|
| Happiness | Users are satisfied with the results | They do not contact support about search | Percentage of sessions without a support request |
| Engagement | They use search regularly | They open search several times per session | Average number of searches per session |
| Adoption | New users start using search right away | They open search in the first week | % of new users who used search by D7 |
| Retention | Users come back for search | Sessions that use search show better retention | Retention of users who use search vs. those who do not |
| Task Success | They find what they need on the first try | They refine the query no more than twice | % of queries with ≤ 2 iterations |
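To show how a signal becomes a number, here is a sketch of the Task Success metric from the last row: the share of searches resolved within two query iterations. The input format (one iteration count per search) and the function name are assumptions, and real analytics data would need to be aggregated into this shape first.

```python
def share_within_two_iterations(iterations_per_search: list[int]) -> float:
    """Share of searches where the user needed at most two query iterations."""
    if not iterations_per_search:
        raise ValueError("need at least one search")
    within = sum(1 for n in iterations_per_search if n <= 2)
    return within / len(iterations_per_search)


# Illustrative counts, not real analytics data.
print(share_within_two_iterations([1, 1, 2, 3, 5, 1, 2, 4]))  # 0.625
```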
When to use HEART
- When launching a new feature or product – to determine success metrics
- When reviewing what to measure at all
- To align with the team: different roles often look only at their own dimension
- To evaluate a large product with multiple flows
HEART is not a substitute for SUS or PURE; these are different tools for different tasks. SUS and PURE measure what is already there. HEART helps you choose what to measure and why.
PURE (Pragmatic Usability Rating by Experts)
What is it
PURE is an expert usability assessment method introduced by Christian Rohrer and colleagues. It lets you evaluate an interface quickly, without user tests.
PURE evaluates tasks according to three criteria:
- Ease: how easy it is to complete the task
- Navigation: how easy it is to find the right features
- Wording: how clear the texts, labels, and instructions are
Each criterion is scored on a 3-point scale: 1 (good), 2 (some difficulty), 3 (very difficult or impossible).
How to perform PURE assessment
Step 1. Make a list of tasks to evaluate. These should be specific tasks of a real user (“Create a new project”, “Change password”, “Find transaction history”).
Step 2. One expert (or several, independently) walks through each task and scores it against the three criteria.
Step 3. If there are several experts, average the results.
Step 4. A task with a score of 3 on any criterion is a critical problem. A score of 2 is a medium problem. A score of 1 is fine.
Example of PURE assessment
| Task | Ease | Navigation | Wording | Result |
|---|---|---|---|---|
| Create account | 1 | 1 | 2 | Medium |
| Change email | 2 | 3 | 2 | Critical |
| Export data | 3 | 3 | 3 | Critical |
| View history | 1 | 2 | 1 | Medium |
From this table you can immediately see: “Change email” and “Export data” require immediate attention.
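The rule from Step 4 (the worst criterion decides the result) can be sketched in a few lines. The function name `pure_verdict` is illustrative, and the sketch covers a single expert's integer scores only: the source does not say how averaged scores such as 2.5 should be rounded, so that case is left out.

```python
def pure_verdict(ease: int, navigation: int, wording: int) -> str:
    """Classify a task by its worst PURE criterion score (1, 2 or 3)."""
    scores = (ease, navigation, wording)
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each score must be 1, 2 or 3")
    worst = max(scores)
    return {1: "OK", 2: "Medium", 3: "Critical"}[worst]


# Scores from the example table above.
tasks = {
    "Create account": (1, 1, 2),
    "Change email": (2, 3, 2),
    "Export data": (3, 3, 3),
    "View history": (1, 2, 1),
}
verdicts = {name: pure_verdict(*scores) for name, scores in tasks.items()}
critical = [name for name, verdict in verdicts.items() if verdict == "Critical"]
print(critical)  # ['Change email', 'Export data']
```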
When to use PURE
- Quick product audit with no budget for user tests
- Prioritizing problems in the existing interface
- Evaluation of the prototype before the test with users
- Comparison of two versions by experts
PURE is faster than SUS - you can evaluate the entire product in 2-4 hours without recruiting users. But PURE is more subjective: the expert is not the user, and may not notice what the real person notices.
PURE limitations
- It is an expert's estimate, not observed user behavior. An expert may find a task simple when users do not.
- PURE does not provide quantitative data on user behavior
- The result depends on the evaluator's expertise
How to choose: SUS, HEART or PURE?
| Situation | Tool |
|---|---|
| You need a quick usability assessment of a prototype | SUS after a test with 5–8 users |
| You need to prioritize problems in an existing product | PURE (quick expert audit) |
| You need to define metrics for a new product or feature | HEART (Goals–Signals–Metrics matrix) |
| You need to compare two design options | SUS for each option |
| You need to understand overall UX “health” at the product level | HEART + regular SUS surveys |
| No budget for user tests | PURE |
You don’t have to choose one tool forever. They complement each other:
- PURE → find what's broken
- SUS → confirm with users
- HEART → decide how to measure progress
Practice: How to Incorporate UX Measurement into Team Work
Start with the baseline
Before you improve anything, measure the current state. Running a SUS survey with five users and getting a baseline takes 2-3 hours of work. After that, each change can be evaluated objectively.
Include metrics in the definition of done
A task is “done” not when the design is approved, but when measurement shows the problem has been solved. This raises the bar and makes progress visible.
Make measurements regular
A one-time SUS tells you little. A quarterly SUS gives you a trend, and the trend shows whether you are moving in the right direction.
Tell the team
UX data should be visible to the entire team, not buried in the designer’s folder. A regular “UX health check”, 5 minutes of key metrics at a recurring team meeting, builds a shared picture and gets the team involved.
Checklist: which tool and when
- Starting a new project → HEART to define success metrics
- Running a usability test → add a SUS questionnaire at the end
- No time or budget for a test, but you need a quick assessment → PURE
- Completed a redesign → compare SUS before and after
- Launching a new feature → use HEART Adoption and Task Success
- Need to justify the UX budget to management → show SUS against the industry benchmark
How to run a SUS test correctly: details that matter
The theory behind SUS is simple, but some details of running the test affect the quality of the result.
Recruiting participants
For SUS it is important to test with actual or potential users of the product. Designers from the next desk are not a valid sample. Their professional bias makes them notice some problems and ignore others.
At least 5 participants for a stable result. The optimum is 8-12. Beyond 12, returns diminish.
Test procedure
- Give the user tasks (not instructions for use - tasks)
- Observe the execution without prompting
- After completing the tasks - present the SUS questionnaire
- Do not explain how to use the product correctly before completing the questionnaire
Common mistake: giving the SUS questionnaire after explaining to the user how the interface works. The score will be inflated, because the explanation has removed all the problems.
Neutrality of wording
The SUS questions are worded deliberately. Do not paraphrase them to make them “clearer”: changing the wording changes the psychometric properties of the questionnaire.
If you translate the questionnaire into Russian, use an official or validated translation, not your own.
How to Compare SUS Results
SUS is reliable as a comparison tool:
- Before and after changes
- Version A vs. Version B
- Your product vs. a competitor (if you test with the same participants)
As an absolute value, SUS is less reliable: it depends on the context (complexity of tasks, audience, device).
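A before/after comparison can be summarized with nothing more than means and standard deviations. The sketch below uses hypothetical per-participant scores and an illustrative function name. With samples this small, treat the difference as an indication rather than proof, since the sketch does not run a significance test.

```python
from statistics import mean, stdev


def compare_sus(before: list[float], after: list[float]) -> dict[str, float]:
    """Summarize two sets of per-participant SUS scores (0-100)."""
    return {
        "mean_before": mean(before),
        "mean_after": mean(after),
        "difference": mean(after) - mean(before),
        "stdev_before": stdev(before),  # spread shows how much users disagree
        "stdev_after": stdev(after),
    }


# Hypothetical scores for illustration only.
before = [50.0, 55.0, 60.0, 52.5, 57.5]
after = [70.0, 72.5, 67.5, 75.0, 65.0]
summary = compare_sus(before, after)
print(summary["difference"])  # 15.0
```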
HEART in practice: how to hold a session in 2 hours
HEART is a conceptual framework. But it can be turned into a working tool in one working meeting.
Workshop format
**Participants:** Designer, product manager, analyst. Optionally, a developer.
**Duration:** 2 hours.
**What you need:** A large board (physical or in Miro/FigJam), sticky notes.
**Structure:**
Step 1 (20 min) - Define the scope. What exactly are we evaluating? The whole product? A specific feature? A specific flow? It's important to fix this at the start, otherwise the team will be talking about different things.
Step 2 (30 min) - Goals. For each of the 5 HEART dimensions: what is our goal for the user? One sticky note per dimension, one sentence. Not a KPI, but what we want the user to feel or do.
Step 3 (30 min) - Signals. How will user behavior change if we achieve the goal? For each goal, 1-2 signals. Specific, observable actions.
Step 4 (30 min) - Metrics. How can we measure each signal? What is already in the analytics? What needs to be added?
Step 5 (10 min) - Prioritization. Which dimensions are most critical right now? What will be the focus of the next quarter?
Outcome: a completed Goals-Signals-Metrics matrix agreed on by the team.
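One way to keep the workshop output usable afterwards is to store the matrix as data and check it for gaps. This is only a sketch: the `Row` structure and the `incomplete_dimensions` helper are hypothetical, and the sample rows reuse the search example.

```python
from dataclasses import dataclass, field

DIMENSIONS = ("Happiness", "Engagement", "Adoption", "Retention", "Task Success")


@dataclass
class Row:
    """One HEART dimension in the Goals-Signals-Metrics matrix."""
    goal: str = ""
    signals: list[str] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)


def incomplete_dimensions(matrix: dict[str, Row]) -> list[str]:
    """Return the dimensions that still lack a goal, a signal or a metric."""
    missing = []
    for dimension in DIMENSIONS:
        row = matrix.get(dimension)
        if row is None or not row.goal or not row.signals or not row.metrics:
            missing.append(dimension)
    return missing


# A partially filled matrix, as it might look midway through the workshop.
matrix = {
    "Task Success": Row(
        goal="Users find what they need on the first try",
        signals=["They refine the query no more than twice"],
        metrics=["% of queries with <= 2 iterations"],
    ),
    "Adoption": Row(goal="New users start using search right away"),
}
print(incomplete_dimensions(matrix))
# ['Happiness', 'Engagement', 'Adoption', 'Retention']
```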
How to choose a measurement tool for the task: detailed criteria
A simple table “when to use” does not convey all the nuances. Here's a more detailed guide.
Choose SUS when
- You need a **quantitative** result, not a list of problems
- It is possible to conduct a usability test (in person or moderated remotely)
- You need a **comparison**: before/after, or version A vs. B
- The audience is **heterogeneous**: SUS works for a wide range of users
Choose PURE when
- There is no time or budget to test with users (you need a quick result today or tomorrow)
- The goal is to prioritize problems, not to measure the overall level of usability
- You need a quick **expert** evaluation of multiple products or screens
- The team wants to develop a shared understanding of the problems (two or three people do PURE independently, then compare)
Choose HEART when
- You are starting a **new product or a significant feature** and need to define success metrics
- You need to align the team on what to measure and how
- Current metrics are not tied to user goals
- You need to look at the whole product, not at a specific flow
Combinations
- Feature launch: HEART → define metrics → SUS after launch → evaluate the result
- Product audit: PURE → find problems → SUS → measure with users → HEART → define metrics for monitoring
- A/B test: SUS for each variant → objective comparison
What to Do Next: From Data to Action
Tools give data. But data alone doesn't change anything. The process of converting data into action is important.
After SUS
SUS came out at 58 (satisfactory, below average). What's next?
**Don't stop at the score.** SUS doesn't say what's wrong. Look at the individual questions: which ones scored worst? Strong agreement with question 2 (the system is too complex) points to complexity. Strong agreement with question 6 (too much inconsistency) points to inconsistency.
**Combine with observations.** What did users do during the test? Where did they get stuck, where did they make mistakes? The SUS score plus qualitative observations gives the full picture.
**Form hypotheses** for each problem identified.
**Prioritize** by frequency and severity.
**Document** the results and schedule a re-test in 2-3 months.
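Looking at individual questions can also be done mechanically. The sketch below converts each answer to its 0-4 contribution (the same rule as in the scoring formula), averages it per question across participants, and returns the weakest questions. The function names and the sample responses are hypothetical.

```python
from statistics import mean


def question_contributions(responses: list[list[int]]) -> list[float]:
    """Average 0-4 contribution per question; lower means a weaker area."""
    per_question = []
    for q in range(10):
        # Index 0 is question 1, so even indexes are the positive statements.
        contribs = [(r[q] - 1) if q % 2 == 0 else (5 - r[q]) for r in responses]
        per_question.append(mean(contribs))
    return per_question


def weakest_questions(responses: list[list[int]], count: int = 3) -> list[int]:
    """Return the 1-based numbers of the questions with the lowest contribution."""
    contribs = question_contributions(responses)
    ranked = sorted(range(10), key=lambda q: contribs[q])
    return [q + 1 for q in ranked[:count]]


# Hypothetical raw answers from three participants.
responses = [
    [4, 4, 3, 2, 4, 4, 4, 2, 4, 2],
    [3, 5, 3, 1, 4, 4, 4, 2, 3, 2],
    [4, 4, 4, 2, 3, 4, 4, 3, 4, 1],
]
print(weakest_questions(responses, count=2))  # [2, 6]
```

In this made-up data, questions 2 and 6 come out weakest, which would point to complexity and inconsistency as the first areas to investigate.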
After PURE
PURE produced a list of tasks with scores of 3 (critical) and 2 (medium). What's next?
Critical tasks (score 3) → straight into the high-priority backlog.
For each critical problem, analyze which criterion (Ease, Navigation, Wording) causes it. This determines the type of solution.
Medium tasks (score 2) → next sprint or next quarter.
PURE results are best confirmed with 2-3 real users: expert review is no substitute for user testing.
After HEART
The Goals-Signals-Metrics matrix is ready. What's next?
Make sure all the metrics are actually being collected. If not, ask the analyst to add tracking.
Record the **baseline** for each metric.
Add key metrics to the dashboard.
Set **goals** per quarter for each dimension.
At the next review, check whether the metrics are moving in the right direction.
AI and UX Metrics: How to Use Claude to Measure Interface Quality
AI doesn’t replace users in tests, but it helps prepare for them, process results quickly, and formulate conclusions.
Prompt: Prepare a SUS test
Before the test, you need to choose tasks. AI helps to formulate them correctly:
I am planning a usability test with a SUS questionnaire for [product description].
Target audience: [who the users are]
Key flows I want to test: [list]
Create 5 tasks for the test in “user scenario” format (situations, not instructions).
Task requirements:
- Must not hint at how to complete the task
- Must describe a real situation, not a product function
- Must have a clear ending (the user knows when it is finished)
Prompt: interpret SUS results
Here are the results of the SUS test for [product]:
User 1: [10 ratings]
User 2: [10 ratings]
...
Calculate the final SUS score for each user and the average for the group.
Interpret the result: what this score means and how it compares to industry benchmarks.
Also look at individual questions: which ones got the worst scores? What does that suggest about UX problems?
Claude will calculate the SUS scores using the formula and interpret them, so you do not need to maintain a spreadsheet in Excel.
Prompt: conduct PURE-assessment using AI
This is a non-obvious but powerful scenario. Claude can act as a “second expert” in the PURE assessment:
I am doing a PURE evaluation of the following flow:
Task: [description of the user task]
Current interface: [description of screens, steps, elements, or uploaded screenshots]
Rate the task on the three PURE criteria (1 = good, 2 = difficult, 3 = very difficult):
- Ease: how easy is it to complete the task?
- Navigation: how easy is it to find the right features?
- Wording: how clear are the texts and labels?
For each score other than 1, explain exactly what causes the problem.
This doesn’t replace evaluation with real users, but it gives a quick second opinion and helps calibrate your own scores.
Prompt: Build a HEART matrix
Product: [Description]
Specific context: [feature/flow/whole product]
Help me fill in the Goals–Signals–Metrics matrix of the HEART framework.
For each of the 5 dimensions (Happiness, Engagement, Adoption, Retention, Task Success):
- Offer a Goal: what we want for the user
- Offer a Signal: how behavior will change when the goal is reached
- Offer a Metric: how to measure the signal
Finally, state which 2-3 dimensions are most critical for this product right now and why.
Prompt: process qualitative data from the test
After the test, you often end up with a pile of notes: quotes, observations, “the user said that...”. AI helps systematize them:
Here are my notes from the usability test (5 users, testing [product name]):
[paste notes in any format, even chaotic]
Systematize the data:
1. Group problems by frequency (how many of the 5 users encountered each one)
2. Classify by severity: critical (blocks task execution) / major (slows down) / minor (irritates but does not block)
3. Offer a prioritized list of problems to correct
4. Separately highlight patterns in what users like