How to Measure UX Quality in Numbers: SUS, HEART, and PURE Without the Boring Parts

◷ 18 min read 6/14/2026


“Design is good” is not an argument. “The design is good because SUS = 78, the task completion rate is 87%, and none of the eight users made a critical error” is one.

The three frameworks, SUS, HEART, and PURE, solve the same problem: how to translate UX quality into numbers you can work with. Each suits different situations. This article covers how each one works and when to use it.

Why measure UX at all

There is a school of thought that says: “Good design is obvious without metrics, and so is bad design.” There is some truth to this: an experienced designer often senses a problem before the data confirms it.

But without measurement, you can't:

Prioritize. You have three problems. Which should you fix first? Severity and frequency data answer that question.

Prove progress. “It got better” is not an argument for the business. “SUS grew from 54 to 71 after the redesign” is.

Compare options. You have two designs and must choose one. Without measurement, the choice is subjective.

Set a baseline. Before you improve something, you need to know where it starts. Otherwise it is unclear whether anything actually improved.

SUS (System Usability Scale)

What is it

SUS is a 10-question questionnaire developed by John Brooke in 1986. It is one of the most widely used usability measurement tools in the world, and one of the simplest.

Each question is a statement that the user rates on a scale from 1 (“Strongly disagree”) to 5 (“Strongly agree”).

SUS questions

1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.

How to count

Odd-numbered questions (1, 3, 5, 7, 9) are positively worded: subtract 1 from the user's rating. Even-numbered questions (2, 4, 6, 8, 10) are negatively worded: subtract the user's rating from 5. After this step, each question contributes 0 to 4 points.

Multiply the sum of all 10 contributions by 2.5. The result is a score from 0 to 100.

Example:

Question 1 (positive): the user gave 4 → 4 − 1 = 3
Question 2 (negative): the user gave 2 → 5 − 2 = 3
...and so on for all 10 questions
Sum = 28, score = 28 × 2.5 = 70
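The same arithmetic fits in a short Python sketch. It assumes each participant's answers arrive as a list of ten integers (1–5) in questionnaire order. The function name `sus_score` is our own and is not part of any SUS tooling.

```python
# Hypothetical helper: scores one participant's SUS questionnaire.
def sus_score(responses: list[int]) -> float:
    """Convert ten SUS answers (1-5, in questionnaire order) into a 0-100 score."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 answers")
    if not all(isinstance(r, int) and 1 <= r <= 5 for r in responses):
        raise ValueError("each answer must be an integer from 1 to 5")

    total = 0
    for number, answer in enumerate(responses, start=1):
        if number % 2 == 1:
            total += answer - 1  # odd question, positively worded
        else:
            total += 5 - answer  # even question, negatively worded
    # Each question contributes 0-4, so the sum is 0-40; scale it to 0-100
    return total * 2.5


# Answers chosen so the contributions sum to 28, as in the example above
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 3, 3]))  # 70.0
```

For a group score, average the individual scores.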

Interpretation

Score	Grade	Percentile
85–100	Excellent (A)	Top 10%
72–84	Good (B)	Top 25%
52–71	Satisfactory (C/D)	Average
< 52	Poor (F)	Below average

The industry average SUS is about 68. A lower score means usability is worse than the market average. A score above 80 is a great result.
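A small lookup can apply the bands from the interpretation table automatically. This sketch takes the band limits exactly as printed and uses the ~68 average quoted above. `sus_grade` and `describe` are illustrative names. Group means that fall between the printed bounds (for example 71.5) go into the lower band; that choice is ours, not part of the table.

```python
# Hypothetical helpers; band limits copied from the interpretation table.
INDUSTRY_AVERAGE = 68  # approximate industry average SUS


def sus_grade(score: float) -> str:
    """Map a SUS score (0-100) to a band from the interpretation table."""
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    # Values between printed bounds (e.g. a group mean of 71.5) go to the lower band
    if score >= 85:
        return "Excellent (A)"
    if score >= 72:
        return "Good (B)"
    if score >= 52:
        return "Satisfactory (C/D)"
    return "Poor (F)"


def describe(score: float) -> str:
    """One-line summary relative to the industry average."""
    position = "above" if score > INDUSTRY_AVERAGE else "at or below"
    return f"{score:.1f}: {sus_grade(score)}, {position} the ~{INDUSTRY_AVERAGE} average"


print(describe(70.0))  # 70.0: Satisfactory (C/D), above the ~68 average
```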

When to use

A usability test with 5–8 users
Comparing two versions of a product
A benchmark before a redesign
A quick evaluation of a prototype

Minimum sample: 5–8 users give reasonably stable results. With fewer than 5, the data is too unstable.

SUS limitations

It tells you that something is wrong, but not what is wrong
It is subjective: different users interpret “complex” differently
It ignores the context of use: SUS scores for a banking app and for a game cannot be compared directly
The same SUS score can hide different problems

HEART (Google)

What is it

HEART is a framework developed at Google for measuring UX at scale. The acronym stands for Happiness, Engagement, Adoption, Retention, and Task Success.

Unlike SUS, HEART is not a single questionnaire. It is a system for choosing the right metrics for a specific product, and it is used together with the Goals–Signals–Metrics matrix.

Five dimensions of HEART

Happiness. The user's subjective attitude toward the product: how satisfied they are, how much they trust it, and how willing they are to recommend it.

Metrics: NPS, CSAT, App Store rating, satisfaction survey results.

When it matters: when satisfaction directly drives retention (B2C), or when the product operates in a competitive market and the difference lies in how it “feels.”

Engagement. How deeply and regularly users use the product.

Metrics: DAU/MAU ratio, number of sessions per week, time spent in the app, number of key actions per period.

When it matters: social products, media platforms, productivity tools. If engagement falls, the product loses its place in the user’s life.
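DAU/MAU is usually computed as the average number of daily active users over a period divided by the number of distinct users active in that period. The sketch below assumes a hypothetical event log of `(user_id, date)` pairs and a 30-day window. Real analytics tools may define "active" and the window differently.

```python
from datetime import date, timedelta


# Hypothetical helper; "active" = at least one event that day.
def dau_mau_ratio(events: list[tuple[str, date]], end: date, window_days: int = 30) -> float:
    """Average daily active users over the window divided by distinct users in the window."""
    start = end - timedelta(days=window_days - 1)
    daily: dict[date, set[str]] = {}
    for user_id, day in events:
        if start <= day <= end:
            daily.setdefault(day, set()).add(user_id)

    monthly = set().union(*daily.values())  # distinct users active in the window
    if not monthly:
        return 0.0
    # Days without events are simply absent, so they count as zero DAU
    avg_dau = sum(len(users) for users in daily.values()) / window_days
    return avg_dau / len(monthly)


end = date(2026, 6, 30)
events = [("a", end - timedelta(days=i)) for i in range(30)]      # active every day
events += [("b", end - timedelta(days=i)) for i in (0, 7, 14)]   # active on 3 days
events.append(("c", end - timedelta(days=45)))                   # outside the window
print(round(dau_mau_ratio(events, end), 2))  # 0.55
```

A ratio close to 1 means most monthly users come back almost every day.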

Adoption. How quickly new users start using a product or feature.

Metrics: number of new users, adoption rate of a new feature, time to first key action, onboarding completion rate.

When it matters: when launching a new product or a significant new feature. If adoption is low, the problem usually lies in discoverability, onboarding, or understanding of the value.
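Two of the adoption metrics, adoption rate and time to first key action, can be sketched as follows. The data shapes are assumptions for illustration: a timestamp for when each user got access, and a timestamp for their first key action, both keyed by user ID. The sketch uses the median rather than the mean so that a few very slow users do not skew the result; that choice is also ours.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


# Hypothetical helper; data shapes are illustrative assumptions.
def adoption_metrics(
    exposed: dict[str, datetime],           # user -> when they got access (signup or release)
    first_key_action: dict[str, datetime],  # user -> first time they did the key action
) -> tuple[float, Optional[float]]:
    """Return (adoption rate, median hours to first key action among adopters)."""
    if not exposed:
        return 0.0, None
    adopters = [u for u in exposed if u in first_key_action]
    hours = [(first_key_action[u] - exposed[u]).total_seconds() / 3600 for u in adopters]
    return len(adopters) / len(exposed), (median(hours) if hours else None)


start = datetime(2026, 6, 1, 9, 0)
exposed = {u: start for u in ("u1", "u2", "u3", "u4")}
first_action = {
    "u1": start + timedelta(hours=2),
    "u2": start + timedelta(hours=10),
    "u3": start + timedelta(hours=4),
}  # u4 never performed the key action
rate, median_hours = adoption_metrics(exposed, first_action)
print(f"adoption {rate:.0%}, median time to first key action {median_hours} h")
```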

Retention. Whether users come back.

Metrics: D1/D7/D30 retention, churn rate, subscription renewal rate.

When it matters: always, but especially for products built on repeat use. Retention underpins everything else.
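D1/D7/D30 retention has several definitions in practice. This sketch assumes the "classic" one: the share of a signup cohort that is active exactly N days after signing up. Other definitions count activity on or after day N. The data shapes and the function name are hypothetical.

```python
from datetime import date, timedelta


# Hypothetical helper: "classic" day-N retention (active exactly on day N).
def day_n_retention(signups: dict[str, date], activity: dict[str, set[date]], n: int) -> float:
    """Share of the signup cohort that was active exactly n days after signing up."""
    if not signups:
        return 0.0
    retained = sum(
        1
        for user, signup_day in signups.items()
        if signup_day + timedelta(days=n) in activity.get(user, set())
    )
    return retained / len(signups)


cohort = {u: date(2026, 6, 1) for u in ("a", "b", "c", "d")}
activity = {
    "a": {date(2026, 6, 2), date(2026, 6, 8)},
    "b": {date(2026, 6, 2)},
    "c": {date(2026, 6, 8), date(2026, 7, 1)},
}  # "d" never came back
for n in (1, 7, 30):
    print(f"D{n}: {day_n_retention(cohort, activity, n):.0%}")
```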

Task Success. How successfully users complete specific tasks.

Metrics: task completion rate, time on task, error rate, unassisted success rate.

When it matters: for utility products and task-oriented tools. If users cannot do what they came for, there is no point in discussing the other dimensions.
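The Task Success metrics can be computed from per-attempt observations. In this sketch, the `TaskAttempt` record is hypothetical. Error rate is taken as the share of attempts with at least one error, and time on task is the median over successful attempts. Other teams count errors per attempt or include failed attempts in timing.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskAttempt:
    """One participant's attempt at one task (hypothetical record)."""
    completed: bool
    seconds: float     # time on task
    errors: int        # errors observed during the attempt
    needed_help: bool  # moderator or documentation had to step in


def task_success_summary(attempts: list[TaskAttempt]) -> dict[str, float]:
    if not attempts:
        raise ValueError("no attempts recorded")
    n = len(attempts)
    completed = [a for a in attempts if a.completed]
    return {
        "completion_rate": len(completed) / n,
        "unassisted_success_rate": sum(1 for a in completed if not a.needed_help) / n,
        # Assumption: error rate = share of attempts with at least one error
        "error_rate": sum(1 for a in attempts if a.errors > 0) / n,
        # Assumption: time on task is the median over successful attempts only
        "median_time_on_task_s": median(a.seconds for a in completed) if completed else float("nan"),
    }


attempts = [
    TaskAttempt(True, 60, 0, False),
    TaskAttempt(True, 90, 1, True),
    TaskAttempt(False, 200, 3, False),
    TaskAttempt(True, 120, 0, False),
]
print(task_success_summary(attempts))
```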

Goals–Signals–Metrics matrix

HEART works together with this matrix. For each of the five dimensions, answer three questions:

Goals: What do we want to achieve for the user in this dimension?

Signals: How will user behavior change if we achieve the goal?

Metrics: How can we measure this signal?

Example for the Search feature in a document service:

Dimension	Goal	Signal	Metric
Happiness	Users are satisfied with the results	They don't contact support about search	% of sessions without a support request
Engagement	They use search regularly	They open search several times per session	Average number of searches per session
Adoption	New users use search right away	They open search in their first week	% of new users who used search by D7
Retention	Users come back for search	Sessions that use search have better retention	Retention of users who use search vs. those who don't
Task Success	They find what they need on the first try	They don't refine the query more than twice	% of queries with ≤ 2 iterations
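If the team keeps the matrix as structured data rather than sticky notes, gaps are easy to spot. This is a minimal sketch that assumes one Goal/Signal/Metric row per HEART dimension. `GSMRow` and `missing_cells` are illustrative names, and the sample rows come from the Search example.

```python
from dataclasses import dataclass

HEART_DIMENSIONS = ("Happiness", "Engagement", "Adoption", "Retention", "Task Success")


@dataclass
class GSMRow:
    """One row of the Goals-Signals-Metrics matrix (hypothetical structure)."""
    goal: str
    signal: str
    metric: str


def missing_cells(matrix: dict[str, GSMRow]) -> list[str]:
    """List every HEART dimension or cell that has not been filled in yet."""
    gaps = []
    for dim in HEART_DIMENSIONS:
        row = matrix.get(dim)
        if row is None:
            gaps.append(f"{dim}: whole row")
            continue
        for field_name in ("goal", "signal", "metric"):
            if not getattr(row, field_name).strip():
                gaps.append(f"{dim}: {field_name}")
    return gaps


# Work in progress on the Search example: two rows drafted, one metric still open
search = {
    "Happiness": GSMRow(
        "Users are satisfied with the results",
        "They don't contact support about search",
        "% of sessions without a support request",
    ),
    "Task Success": GSMRow(
        "They find what they need on the first try",
        "They don't refine the query more than twice",
        "",
    ),
}
print(missing_cells(search))
```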

When to use HEART

When launching a new feature or product, to define success metrics
When rethinking what to measure in the first place
To align the team: different roles often look only at their own dimension
To evaluate a large product with multiple flows

HEART is not a substitute for SUS or PURE; they are different tools for different tasks. SUS and PURE measure what already exists. HEART helps you choose what to measure and why.

PURE (Practical Usability Rating by Experts)

What is it

PURE is an expert usability assessment method developed by Christian Rohrer and colleagues. It lets you evaluate an interface quickly, without user tests.

PURE rates tasks on three criteria:

Ease: how easy it is to complete the task
Navigation: how easy it is to find the right features
Wording: how clear the texts, labels, and instructions are

Each criterion is rated on a 3-point scale: 1 (good), 2 (some difficulty), 3 (very difficult or impossible).

How to run a PURE assessment

Step 1. List the tasks to evaluate. These should be specific tasks that real users perform (“Create a new project”, “Change password”, “Find transaction history”).

Step 2. One expert (or several, working independently) goes through each task and rates it on the three criteria.

Step 3. If there are several experts, average their ratings.

Step 4. A task with a 3 on any criterion is a critical problem. A task whose worst rating is 2 is a medium problem. A task rated 1 on every criterion is fine.

Example of a PURE assessment

Task	Ease	Navigation	Wording	Result
Create an account	1	1	2	Medium
Change email	2	3	2	Critical
Export data	3	3	3	Critical
View history	1	2	1	Medium

The table makes it immediately clear that “Change email” and “Export data” need urgent attention.
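Steps 3 and 4 can be sketched in Python: average each criterion across experts, then classify the task by its worst criterion. The method description does not say how to treat averaged values such as 2.5, so this sketch assumes they round half up (2.5 counts as critical). `pure_verdict` is a hypothetical helper. Run on the example table, it reproduces the verdicts shown there.

```python
from statistics import mean

CRITERIA = ("Ease", "Navigation", "Wording")


# Hypothetical helper. Assumption: averaged ratings round half up (2.5 -> critical).
def pure_verdict(expert_ratings: list[dict[str, int]]) -> tuple[dict[str, float], str]:
    """Average each criterion across experts, then classify by the worst criterion."""
    if not expert_ratings:
        raise ValueError("need at least one expert's ratings")
    averages = {c: mean(r[c] for r in expert_ratings) for c in CRITERIA}
    worst = max(averages.values())
    if worst >= 2.5:
        verdict = "Critical"
    elif worst >= 1.5:
        verdict = "Medium"
    else:
        verdict = "Fine"
    return averages, verdict


# Single-expert ratings from the example table
example = {
    "Create an account": {"Ease": 1, "Navigation": 1, "Wording": 2},
    "Change email": {"Ease": 2, "Navigation": 3, "Wording": 2},
    "Export data": {"Ease": 3, "Navigation": 3, "Wording": 3},
    "View history": {"Ease": 1, "Navigation": 2, "Wording": 1},
}
for task, ratings in example.items():
    print(task, "->", pure_verdict([ratings])[1])
```

The `averages` dictionary also shows which criterion drives the verdict, which points to the type of fix needed.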

When to use PURE

A quick product audit with no budget for user tests
Prioritizing problems in an existing interface
Evaluating a prototype before testing it with users
Comparing two versions through expert review

PURE is faster than SUS: you can evaluate an entire product in 2–4 hours without recruiting users. But PURE is more subjective: the expert is not the user and may miss what a real user would notice.

PURE limitations

It is an expert's estimate of user behavior: an expert may find a task easy while users do not
PURE does not provide quantitative data on user behavior
Results depend on the evaluator's expertise

How to choose: SUS, HEART or PURE?

Situation	Tool
You need a quick usability assessment of a prototype	SUS after a test with 5–8 users
You need to prioritize problems in an existing product	PURE (quick expert audit)
You need to define metrics for a new product or feature	HEART (Goals–Signals–Metrics matrix)
You need to compare two design options	SUS for each option
You need to understand the overall UX “health” of the product	HEART + regular SUS surveys
There is no budget for user tests	PURE

You don’t have to choose one tool forever. They complement each other:

PURE → find what's broken
SUS → confirm with users
HEART → measure progress

Practice: How to Incorporate UX Measurement into Team Work

Start with the baseline

Before you improve anything, measure the current state. Running a SUS survey with five users to get a baseline takes 2–3 hours of work. After that, every change can be evaluated objectively.

Include metrics in the definition of done

A problem is “done” not when the design is approved, but when measurement shows that it has been solved. This raises the bar and makes progress visible.

Make measurements regular

A one-off SUS measurement is just a snapshot. A quarterly SUS gives you a trend, and the trend shows whether you are moving in the right direction.

Share with the team

UX data should be visible to the whole team, not buried in the designer's folder. A regular “UX health check” (five minutes on key metrics at a recurring team meeting) gives everyone the big picture and gets the team involved.

Checklist: what instrument and when

Starting a new project → HEART, to define success metrics
Running a usability test → add a SUS questionnaire at the end
No time or budget for a test, but you need a quick assessment → PURE
Finished a redesign → compare SUS before and after
Launched a new feature → track HEART Adoption and Task Success
Need to justify the UX budget to management → show SUS compared with the industry benchmark

How to run a SUS test correctly: details that matter

The theory behind SUS is simple, but several details of how you run the test affect the quality of the result.

Recruiting participants

For SUS, test with actual or potential users of the product. Designers from the next desk are not a valid sample: their professional bias makes them notice some problems and overlook others.

Use at least 5 participants for a stable result; 8–12 is optimal. Beyond 12, returns diminish.

Procedure

Give the user tasks (tasks, not instructions for use)
Observe how they complete them without prompting
After the tasks are done, present the SUS questionnaire
Do not explain how to use the product correctly before the questionnaire is completed

Common mistake: giving the SUS questionnaire after explaining to the user how the interface works. The score will be inflated because your explanation has removed the problems.

Neutrality of wording

The SUS questions are worded deliberately. Do not rephrase them to make them “clearer”: changing the wording changes the psychometric properties of the questionnaire.

If you run SUS in another language (for example, Russian), use an official or validated translation, not your own.

How to Compare SUS Results

SUS is most reliable as a comparison tool:

Before and after changes
Version A vs. version B
Your product vs. a competitor's (if you test with the same participants)

The absolute SUS value is less reliable because it depends on context (task complexity, audience, device).
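For a before/after or A/B comparison, the difference in mean SUS alone can mislead with small samples. This minimal sketch assumes two independent groups of participants and computes the difference together with Welch's t statistic and its degrees of freedom. Turning t into a p-value needs a t distribution. One option is `scipy.stats.ttest_ind(a, b, equal_var=False)`, which runs the same Welch test. With 5–8 users per version, only fairly large differences will stand out. The helper name and sample scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


# Hypothetical helper; assumes two independent groups of participants.
def compare_sus(scores_a: list[float], scores_b: list[float]) -> dict[str, float]:
    """Difference in mean SUS (B minus A) with Welch's t statistic and degrees of freedom."""
    if len(scores_a) < 2 or len(scores_b) < 2:
        raise ValueError("need at least two participants per version")
    mean_a, mean_b = mean(scores_a), mean(scores_b)
    # Squared standard error of each mean: sample variance / n
    se2_a = variance(scores_a) / len(scores_a)
    se2_b = variance(scores_b) / len(scores_b)
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean_b - mean_a) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(scores_a) - 1) + se2_b ** 2 / (len(scores_b) - 1)
    )
    return {"mean_a": mean_a, "mean_b": mean_b, "diff": mean_b - mean_a, "t": t, "df": df}


before = [70, 65, 72.5, 60, 67.5]
after = [80, 77.5, 85, 72.5, 75]
print(compare_sus(before, after))
```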

HEART in practice: how to run a session in 2 hours

HEART is a conceptual framework, but you can turn it into a working tool in a single meeting.

Workshop format

Participants: designer, product manager, analyst; optionally, a developer.

Duration: 2 hours.

What you need: a large board (physical or in Miro/FigJam) and sticky notes.

Structure:

Step 1 (20 min): Define the scope. What exactly are we evaluating? The whole product? A specific feature? A specific flow? Fix this at the start, otherwise the team will end up talking about different things.

Step 2 (30 min): Goals. For each of the 5 HEART dimensions: what is our goal for the user? One sticky note per dimension, one sentence each. Not a KPI, but what we want the user to feel or do.

Step 3 (30 min): Signals. How will user behavior change if we achieve the goal? For each goal, 1–2 signals: specific, observable actions.

Step 4 (30 min): Metrics. How can we measure each signal? What is already in the analytics? What do we need to add?

Step 5 (10 min): Prioritization. Which dimensions are most critical right now? What will the next quarter focus on?

Outcome: a completed Goals–Signals–Metrics matrix that the team has agreed on.

How to choose a measurement tool for the task: detailed criteria

A simple “when to use” table does not capture every nuance. Here is a more detailed guide.

Choose SUS when

You need a quantitative result, not a list of problems
You can run a usability test (in person or remote, moderated)
You need a comparison: before/after, or version A vs. B
The audience is heterogeneous: SUS works for a wide range of users

Choose PURE when

There is no time or budget to test with users (you need a result today or tomorrow)
The goal is to prioritize problems, not to measure the overall level of usability
You need a quick expert evaluation of several products or screens
The team wants to build a shared understanding of the problems (two or three people run PURE independently, then compare)

Choose HEART when

You are launching a new product or a significant feature and need to define success metrics
You need to align the team on what to measure and how
Current metrics are not tied to user goals
You need to look at the whole product, not at a specific flow

Combinations

Feature launch: HEART (define metrics) → SUS after launch (evaluate the result)
Product audit: PURE (find problems) → SUS (measure with users) → HEART (define metrics for monitoring)
A/B test: SUS for each variant → objective comparison

What to Do: From Data to Action

Tools produce data, but data alone doesn't change anything. What matters is the process of turning data into action.

After SUS

Your SUS score is 58 (satisfactory, below average). What next?

Don't stop at the score. SUS doesn't say what's wrong, so look at individual questions: which ones scored poorly? A high rating on question 2 (“unnecessarily complex”) points to complexity; a high rating on question 6 (“too much inconsistency”) points to inconsistency.
Combine the score with observations. What did users do during the test? Where did they get stuck, and where did they make mistakes? SUS score plus qualitative observations gives the full picture.
Formulate a hypothesis for each problem identified.
Prioritize by frequency and severity.
Document the results and schedule a review in 2–3 months.
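Raw answers on positive and negative questions point in opposite directions, so comparing them directly is confusing. This sketch converts each answer into its 0–4 contribution, where higher is always better, and lists the weakest questions. The helper names and sample answers are hypothetical.

```python
from statistics import mean


# Hypothetical helpers for per-question SUS diagnosis.
def item_contributions(responses: list[int]) -> list[int]:
    """Per-question contribution (0-4, higher is always better) for one participant."""
    return [r - 1 if q % 2 == 1 else 5 - r for q, r in enumerate(responses, start=1)]


def weakest_items(all_responses: list[list[int]], k: int = 3) -> list[tuple[int, float]]:
    """Return the k question numbers with the lowest average contribution."""
    per_user = [item_contributions(r) for r in all_responses]
    averages = [(q + 1, mean(user[q] for user in per_user)) for q in range(10)]
    return sorted(averages, key=lambda pair: pair[1])[:k]


answers = [
    [4, 4, 4, 2, 4, 4, 4, 2, 4, 2],
    [4, 5, 4, 2, 4, 3, 4, 2, 4, 2],
]
print(weakest_items(answers, k=2))  # [(2, 0.5), (6, 1.5)]
```

Here questions 2 and 6 stand out, which matches the complexity and inconsistency reading described above.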

After PURE

PURE gave a list of tasks with a score of 3 (critical) and 2 (medium). What's next?

Critical tasks (score 3) → straight into the high-priority backlog
For each critical problem, identify which criterion (Ease, Navigation, Wording) it falls under. This determines the type of solution.
Medium tasks (score 2) → next sprint or next quarter
Confirm PURE results with 2–3 real users: expert review is no substitute for user testing

After HEART

The Goals-Signals-Metrics matrix is ready. What's next?

Make sure all the metrics are actually collected. If not, ask the analyst to add tracking.
Set a baseline for each metric.
Add key metrics to the dashboard.
Set quarterly goals for each dimension.
Then review regularly: are the metrics moving in the right direction?
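Once each metric has a baseline and a quarterly goal, "moving in the right direction" can be expressed as the share of the baseline-to-goal gap closed so far. This sketch uses a hypothetical `TrackedMetric` record with illustrative numbers. Because it divides by the signed gap, it works both for metrics that should rise (retention) and for metrics that should fall (churn).

```python
from dataclasses import dataclass


@dataclass
class TrackedMetric:
    """Hypothetical record: a HEART metric with its baseline and quarterly goal."""
    name: str
    baseline: float
    target: float
    current: float

    def progress(self) -> float:
        """Share of the baseline-to-target gap closed so far (negative = moving the wrong way)."""
        gap = self.target - self.baseline
        if gap == 0:
            raise ValueError(f"{self.name}: target equals baseline")
        return (self.current - self.baseline) / gap


metrics = [
    TrackedMetric("D7 retention", baseline=0.20, target=0.25, current=0.23),
    TrackedMetric("Monthly churn", baseline=0.08, target=0.06, current=0.07),  # lower is better
    TrackedMetric("SUS", baseline=58, target=68, current=55),
]
for m in metrics:
    print(f"{m.name}: {m.progress():.0%} of the way to the goal")
```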

AI and UX Metrics: How to Use Claude to Measure Interface Quality

AI doesn’t replace users in tests, but it helps you prepare for them, process the results quickly, and formulate conclusions.

Prompt: Prepare a SUS test

Before the test, you need to choose tasks. AI can help you formulate them well:


I'm planning a usability test with a SUS questionnaire for [product description].

Target audience: [who the users are]
Key flows I want to test: [list]

Create 5 tasks for the test in the form of user scenarios (situations, not instructions).

Task requirements:
- The task must not tell the user how to complete it
- Describe a real situation, not a product function
- It must have a clear end point (the user knows when they are done)

Prompt: interpreting the results of SUS


Here are the results of the SUS test for [product]:

User 1: [10 ratings]
User 2: [10 ratings]
...

Calculate the SUS score for each user and the average for the group.
Interpret the result: what this score means and how it compares with industry benchmarks.

Also look at individual questions: which ones received the worst ratings? What UX problems does that point to?

Claude will calculate the SUS scores with the formula and interpret them, so you do not need to maintain a spreadsheet in Excel. It is still worth spot-checking the arithmetic.

Prompt: run a PURE assessment with AI

This is a less obvious but powerful scenario. Claude can act as a “second expert” in a PURE assessment:


I'm doing a PURE evaluation of the following flow:

Task: [description of the user task]
Current interface: [description of screens, steps, elements] (or attach screenshots)

Rate the task on the three PURE criteria (1 = good, 2 = difficult, 3 = very difficult):
- Ease: How easy is it to complete the task?
- Navigation: How easy is it to find the right features?
- Wording: How clear are the texts and labels?

For every rating other than 1, explain exactly what causes the problem.

This doesn't replace testing with real users, but it gives you a quick second opinion and helps calibrate your own ratings.

Prompt: Build a HEART matrix


Product: [description]
Specific context: [feature/flow/whole product]

Help fill in the Goals–Signals–Metrics matrix of the HEART framework.

For each of the 5 dimensions (Happiness, Engagement, Adoption, Retention, Task Success):
- Propose a Goal: what we want for the user
- Propose a Signal: how behavior will change when the goal is reached
- Propose a Metric: how to measure the signal

Finally, name the 2–3 dimensions that are most critical for this product right now, and explain why.

Prompt: process qualitative data from the test

After a test you often end up with a pile of notes: quotes, observations, “the user said that...”. AI can help organize them:


Here are my notes from a usability test (5 users, testing [product name]):

[Paste notes in any format, even unstructured]

Organize the data:
1. Group problems by frequency (how many of the 5 users encountered each one)
2. Classify them by severity: critical (blocks the task) / major (slows the user down) / minor (annoying but not blocking)
3. Propose a prioritized list of problems to fix
4. Separately highlight patterns in what users liked
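If you tag your notes yourself, you can do the same grouping without AI, or use it to double-check the AI's output. This minimal sketch assumes each observation is a hypothetical `(user_id, problem, severity)` tuple. It orders problems by severity first and by number of affected users second, which is one common convention rather than a fixed rule.

```python
from collections import defaultdict

SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}  # lower rank = more severe


# Hypothetical helper; assumes notes were tagged as (user_id, problem, severity).
def prioritize(observations: list[tuple[str, str, str]]) -> list[tuple[str, str, int]]:
    """Return (problem, severity, users affected), most severe first, then most frequent."""
    users_by_problem: dict[str, set[str]] = defaultdict(set)
    severity_by_problem: dict[str, str] = {}
    for user_id, problem, severity in observations:
        users_by_problem[problem].add(user_id)
        # If observations disagree, keep the most severe rating
        current = severity_by_problem.get(problem)
        if current is None or SEVERITY_RANK[severity] < SEVERITY_RANK[current]:
            severity_by_problem[problem] = severity
    rows = [(p, severity_by_problem[p], len(users)) for p, users in users_by_problem.items()]
    return sorted(rows, key=lambda row: (SEVERITY_RANK[row[1]], -row[2]))


notes = [
    ("u1", "Can't find export", "critical"),
    ("u2", "Can't find export", "critical"),
    ("u1", "Unclear Save label", "minor"),
    ("u2", "Unclear Save label", "minor"),
    ("u3", "Unclear Save label", "minor"),
    ("u4", "Filter setup is slow", "major"),
    ("u5", "Filter setup is slow", "minor"),
]
for problem, severity, count in prioritize(notes):
    print(f"{severity:<8} {count}/5  {problem}")
```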
