I showed the prototype to real people, and they broke everything in ten minutes. Here's what I learned

I was sure the prototype was ready. Three weeks of iterations, polished screens, neat microanimations, a thoughtful onboarding. I brought five people in for a hallway test, and in ten minutes they tore it to pieces. Not because they were mean. Because they weren't me.

The worst part is that I could have found most of the problems myself, if I had stopped looking at the layout through the author's eyes and tried, at least once, to go through it as a person who sees the product for the first time: without context, without patience, and without any desire to "figure it out".

This article is not about usability-testing methodology. That has been described well enough without me. It is a breakdown of what actually happens when a designer shows their work to real users for the first time: which illusions fall apart, which patterns repeat across teams, and what to do about it in the next iteration, not in six months but this week.

Why showing it to real people hurts so much

In the designer's head, a prototype is a complete story. You remember why the button sits exactly there, which scenario led to this screen, and what happens next. The user has nothing in their head but a task. They haven't seen the layouts, haven't sat in the workshops, haven't read the brief.

This asymmetry is the main reason tests break prototypes. Not bad design, not stupid users. Two different contexts simply meet for the first time, and the one who pays for the product wins.

What breaks most often

  • Section titles that seemed obvious: no one understands what is inside
  • The main action on the screen: the user doesn't see it and clicks the logo instead
  • Onboarding: users flick through it in two seconds and forget everything
  • Icons without labels: users guess them far worse than the team assumes
  • Error states and blank screens: they turn out to be the main experience, not the exception

When you see this live, the first impulse is to make excuses. "They didn't read it through", "they were in a hurry", "a real user would have been more attentive". No. A real user will be even less attentive, because you won't be standing behind them.

Preparation: What to Do Before Calling People

Most failed tests fail not during the session but before it. The team invites people to "look at the prototype" without agreeing on what exactly is being tested.

Formulate a hypothesis, not "let's test it"

"Test the onboarding" is not a task. A task looks like this:

  • Check whether a new user understands what our product does within the first 30 seconds
  • Check whether a person reaches their first project without prompts
  • Check that the pricing block does not scare people off at the registration stage

Each hypothesis comes with a specific scenario and a specific sign of failure. If you can't describe in advance what "failed" looks like, you are not testing, you are collecting opinions.

Checklist: is the prototype ready to meet people

  • There is one main scenario that can actually be clicked through from start to finish
  • The data is realistic, not "Lorem ipsum" or "Last Name"
  • Error states and empty screens are drawn, not left as "we'll finish them later"
  • Stubs and inactive buttons are marked so they don't confuse the user
  • The prototype opens on the device it will be viewed on
  • You have a separate document for notes, not Figma comments on top of the layout

Anti-patterns in the preparation phase

  • Showing "everything you managed to draw": the user will drown and you will get noise
  • Testing with colleagues from the next team over: they already know the product better than they let on
  • Preparing a script that leads the user by the hand ("now click here"): that is a demo, not a test
  • Inviting only loyal customers: they will be polite and useless

The first ten minutes of the session decide everything

In a real test, the valuable information comes not from answers to questions but from what a person does silently in the first minutes: where they look, where the cursor goes, where they freeze, where they frown. If you talk during this time, you drown out the signal with your own voice.

Minimal session protocol

  • Explain that you are testing the product, not the person, in exactly those words
  • Ask them to think out loud, but don't push if they go quiet
  • Phrase the task in the user's words, not yours: "you need to book a doctor's appointment", not "open the Appointments section"
  • Don't give hints, even if it hurts to watch. A 15-second pause feels like an eternity, but that is where the insight lives
  • Record observations, not conclusions: "spent 3 seconds looking for the Next button", not "bad navigation"

Questions to Ask After the Task

  • What did you expect to see on this screen?
  • At what point did it become unclear?
  • If this was your real-life scenario, would you continue or close?
  • What’s missing here to get you to trust the product?

A short summary of the first block: the test breaks the prototype not because the prototype is bad, but because it meets, for the first time, the context in which it will live. The sooner you arrange this meeting, the cheaper the conclusions will be.

Analysis: How to Turn Sessions into Decisions

After five to seven sessions, you have a pile of notes, screenshots, and scraps of phrases. At this stage most teams make one of two mistakes: they either fix the first thing they come across, or call a big meeting to "discuss the results" and drown in subjective opinions. Neither works.

Sort observations into layers

Before you edit anything, sort everything you have collected into three layers. The same failure can live on any of them, and each is treated differently.

  • Interface: the person did not see the button, did not understand the label, missed the touch target. Fix the layout.
  • Scenario: the person went where we didn't expect, found a shortcut we hadn't thought of, or got stuck where we assumed "it's obvious anyway". This is flow logic.
  • Product model: the person doesn't understand what this is, who needs it, or why to pay. No button fixes this; it is a conversation with product and marketing.

If you mix the layers, you end up polishing icons in a product whose positioning is broken.

Count the frequency, not the volume

One user who spent ten minutes emotionally swearing at a color is one voice. Five users silently missing the same button is a pattern. The easiest way to triage findings is a scale like this.

How many people stumbled    What to do about it
1 of 5-7                    Note it, don't touch it
2-3 of 5-7                  Test with a hypothesis, fix in the next sprint
4+ of 5-7                   Fix now, everything else can wait

It's crude, but it imposes discipline. Without such a scale, the team always fixes whatever the last participant shouted about loudest.
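The frequency scale above can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the article: the function names, the issue labels, and the sample observations are all made up, and the thresholds assume a round of 5-7 participants. The key detail is that it counts distinct participants per issue, so one person repeating the same complaint still counts as one voice.

```python
from collections import defaultdict

def count_participants(observations):
    """Map each issue to the number of DISTINCT participants who hit it."""
    seen = defaultdict(set)
    for participant, issue in observations:
        seen[issue].add(participant)
    return {issue: len(people) for issue, people in seen.items()}

def triage(stumbled):
    """Apply the scale above; thresholds assume 5-7 participants per round."""
    if stumbled >= 4:
        return "fix now"
    if stumbled >= 2:
        return "test with a hypothesis, fix in the next sprint"
    if stumbled == 1:
        return "note it, do not touch"
    return "nothing observed"

# Hypothetical notes from a five-person round: (participant, issue).
observations = [
    ("P1", "missed the Next button"),
    ("P2", "missed the Next button"),
    ("P3", "missed the Next button"),
    ("P4", "missed the Next button"),
    ("P5", "disliked the color"),
    ("P5", "disliked the color"),   # same person, said it again
    ("P5", "disliked the color"),   # and again: still one voice
    ("P2", "confused by section title"),
    ("P4", "confused by section title"),
]

counts = count_participants(observations)
for issue, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{n} of 5: {issue} -> {triage(n)}")
```

Counting by participant rather than by mention is the whole point: the loud complaint about color appears three times in the notes but lands in the "note it" bucket.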

Typical diagnostic errors

Mistaking the symptom for the cause

The user said, "I need a back button right here." A weak team draws the button. A strong team asks why he wants to go back. Often it turns out that the previous screen didn't give him confidence, so he went back to check. The button is a patch; the screen before it is what needs treatment.

Fixing on one person's word

The most expensive mistake is to change a key flow based on the opinion of one vivid respondent. Especially if he was an expert, especially if he was loud, especially if his taste matched that of someone on the team. The rule is simple: if something was seen in only one person, it is a hypothesis, not a decision.

Treating where it's convenient, not where it hurts

Often the real problem is in the architecture or the product promise, but touching it is scary and slow. The team agrees: "Let's just rewrite the hint for now." A month later the same problem comes back in a new wrapper. If everyone sighs with relief at the proposed solution, that is a reason to be wary, not happy.

How to carry conclusions into the layout

Start with screens where people were silent

The most dangerous places in a test are not where the user swears, but where they stop talking and start moving the cursor in circles. Silence means the person is building a model of what is happening instead of acting. In the layout, these places need not decoration but an answer to the question "what is on this screen and what can I do next".

Checklist of post-test edits

  • The main action on the screen is visually dominant, and this has been checked not only on desktop
  • Section titles are rewritten in users' words from the transcripts, not the team's words
  • Empty states answer the question "what should I do right now", not "there is nothing here yet"
  • Error messages explain what happened and how to recover, without "Error 4002"
  • Icons that were guessed less than half the time got labels
  • Each edit in the file references a specific observation, not "decided after the test"

The last point matters more than it seems. In two weeks no one will remember why the button moved. If the note "two out of five looked for it at the bottom" is not next to it, the next review will move it back.

Questions for a design review before the next test

  • What hypothesis are we testing with this version of the layout?
  • What are we ready to hear, and what answer would upset us?
  • Where in the layout are the places we consider "already clear" and therefore did not check?
  • What changes did we make on the strength of a single voice, and why?
  • What are we deliberately not fixing until the next iteration, and why is that safe?

In short, the test provides raw material, not decisions. Value emerges when you separate interface from scenario, frequency from volume, and cause from symptom. Without that step, even the most honest test turns into a collection of amusing stories for lunch.

When to bring in AI and where it breaks

The temptation is understandable: a five-person test leaves you with a hundred and fifty minutes of recordings, transcripts, FigJam notes, and a pile of screenshots. You want to upload everything to a model and get the top five problems back. Sometimes it works, but it fails more often than it seems.

What AI does well in analyzing the test

  • Groups transcripts into thematic clusters when you have already lost the overall picture
  • Highlights the places where the user rephrases the task (often a sign that their mental model has changed)
  • Rewrites interface texts in words from the transcripts if you give it raw quotes
  • Prepares a draft report for the team following your structure of findings

What AI does badly

  • Treats any long quote as important, especially an emotional one
  • Confidently invents a pattern where there was a single observation
  • Smooths over contradictions between users, although the contradictions are the interesting part
  • Loses the context of the scenario: it sees the phrase "did not understand" but not that the person has been poking at an inactive element for three minutes

Working rule: AI helps in the sorting and write-up phase, but not in the "what to fix" phase. Delegate clustering, but not prioritization.

A safe process for working with the model

  • Upload transcripts without names, emails, or any data you are not ready to see in someone else's log
  • Ask the model to cite specific lines from the transcript rather than "summarize the impression"
  • If the model says "users believe that...", treat it as a signal to reread the source: the test usually contained no such statement
  • Rewrite the final list of problems by hand: what has not passed through your head will not survive the product discussion
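Two of the steps above are mechanical enough to sketch in Python: stripping names and emails before upload, and flagging model output that generalizes instead of quoting. This is a rough first pass under stated assumptions, not a privacy guarantee: the helper names are made up, the email pattern is deliberately simple, names are only caught if you list them yourself, and phone numbers or addresses are not handled at all.

```python
import re

# Simple email pattern; an assumption for illustration, not a full RFC matcher.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Phrases that suggest the model is generalizing rather than citing a line.
GENERALIZATION_RE = re.compile(r"\busers (?:believe|think|feel|want)\b", re.IGNORECASE)

def scrub(transcript: str, names: dict[str, str]) -> str:
    """Replace emails and known participant names with neutral labels."""
    cleaned = EMAIL_RE.sub("[email]", transcript)
    for name, label in names.items():
        cleaned = re.sub(rf"\b{re.escape(name)}\b", label, cleaned)
    return cleaned

def needs_source_check(summary_line: str) -> bool:
    """True if a line of model output should be checked against the transcript."""
    return bool(GENERALIZATION_RE.search(summary_line))

# Hypothetical transcript line and participant mapping.
raw = "Anna: I'd write to anna.k@example.com if stuck."
print(scrub(raw, {"Anna": "P1"}))
# P1: I'd write to [email] if stuck.

print(needs_source_check("Users believe that onboarding is too long"))   # True
print(needs_source_check('P3: "where is the Next button?" (task 2)'))    # False
```

A flagged line is not necessarily wrong; it is only a prompt to find the actual quote before the claim goes into the report.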

How to explain decisions to the team

The most common breakdown after the test happens not in the layout but in the conversation. The designer arrives with ten findings, the developer hears "everything has to be redone", the product manager hears "the deadlines are slipping", and bargaining begins instead of work.

Discussion structure that does not turn into a dispute

  • Scenario: what task the person was trying to accomplish
  • What you saw: a brief observation, without interpretation
  • How many times it repeated: a number, even a small one
  • Hypothesis about the cause: why it happened
  • What we propose: one change, not three
  • What we are deliberately not touching, and why it is safe to postpone

The last item is usually forgotten, yet it relieves half of the product manager's anxiety. When the team sees that you have separated "fix now" from "wait for the next cycle", trust in everything else grows.
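If findings live in a file or tracker, the six-part structure above maps naturally onto a small record. The sketch below is one possible shape in Python; the class name, field names, and sample values are all assumptions made for illustration. The helper only checks that no part was skipped, which is the failure mode the text warns about.

```python
from dataclasses import dataclass, fields

@dataclass
class Finding:
    scenario: str          # what task the person was trying to accomplish
    observation: str       # what was seen, without interpretation
    times_seen: int        # how many participants it repeated for
    cause_hypothesis: str  # why it probably happened
    proposed_change: str   # one change, not three
    deferred: str          # what is deliberately left alone, and why that is safe

def missing_fields(finding: Finding) -> list[str]:
    """Return names of fields still empty (blank text or a zero count)."""
    empty = []
    for f in fields(finding):
        value = getattr(finding, f.name)
        if value == 0 or (isinstance(value, str) and not value.strip()):
            empty.append(f.name)
    return empty

# Hypothetical finding with the most commonly forgotten part left blank.
finding = Finding(
    scenario="Book a doctor's appointment",
    observation="Looked for the Next button at the bottom of the screen",
    times_seen=2,
    cause_hypothesis="The primary action sits far from the form it submits",
    proposed_change="Move the Next button below the form",
    deferred="",
)
print(missing_fields(finding))  # ['deferred']
```

A finding that comes back with an empty list is ready for the review; one that reports `deferred` is exactly the case where the team will hear "everything has to be redone".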

Anti-patterns in the review

  • Showing the whole video. Nobody will rewatch the recording; cut thirty-second clips for each specific observation.
  • Starting with the solution. "I propose redoing the header" without a scenario reads as a matter of taste, even if there were five failures in a row.
  • Discussing taste in real time. If "I don't like the color" comes up in the review, move it to a separate thread and don't mix it with the test findings.
  • Voting on edits. A test is not a democracy. Decisions are made by the person responsible for the flow's result, based on frequency and scenario.

How to check that you did the test well

Analysis quality checklist

  • Every edit in the layout references a specific observation, not the test in general
  • Interface findings and product findings are separated and go into different queues
  • There is at least one finding that contradicts your original hypothesis; if not, you probably heard only what was comfortable
  • The "not fixing now" list is non-empty and justified
  • After the review, the team knows what goes into the nearest sprint and what does not
  • In two weeks you can reconstruct the logic of each decision without rewatching the recordings

Questions before the next round

  • Where in this test did I hear what I wanted to hear?
  • What parts of the layout did I not let users break, and why?
  • What edits have I made on the strength of a single voice, and am I ready to defend them with numbers in a month?
  • What did I hand to AI and what did I keep for myself, and does that split match the risk?

Segment summary: an advanced test is neither more respondents nor a smarter tool. It is discipline in the analysis: separating observation from conclusion, conclusion from decision, and decision from how it is packaged for the team. AI and analytics speed up the first steps, but the last one is still the responsibility of a living person with a name in Jira.

Checklist for each round of tests

Go through this list before you sit down with a participant. Not for show, but so that you don't end up making excuses to yourself and the team.

Before the test

  • Scenarios are written as user tasks, not as "show screen X"
  • The prototype contains only the parts you are genuinely ready to see broken
  • The hypothesis the test can refute is recorded in writing, before the first respondent
  • You have agreed with yourself on what counts as a scenario "failure", as opposed to "well, he just didn't figure it out"
  • Recording and transcription are organized so that in a month you can find the right fragment in a minute

During the test

  • You stay silent longer than you want to; a pause often draws out the real reaction
  • You don't prompt with words from the interface ("Have you seen the Continue button?")
  • You don't explain "how it was designed" until the person finishes the scenario
  • You write down not only the words, but also where the hand or cursor hovered

After the test

  • Each finding is tied to a specific quote or moment in the recording
  • Interface problems and product problems are separated into different queues
  • There is a "not fixing now" list with justification
  • Solutions are formulated as one change per finding, not a package

Anti-patterns that break the test retroactively

The test itself may go well and the value may still leak away at the interpretation stage. Below is what happens most often.

"We tested it, so it's justified"

The test becomes an indulgence. Any controversial decision is defended with "that's what the user test showed", although there were two people in the test and one of them was in a hurry. If you refer to a test, name the number of repetitions and the scenario. If there was one repetition, it is a signal for the next round, not an argument.

Repackaging taste as findings

The quietest breakdown. The designer didn't like a block, and an observation appears in the analysis with two convenient quotes picked to support it. The defense is simple: show your initial hypothesis separately, and separately what refutes it. If there are zero refutations, chances are you heard only what was comfortable.

One respondent, one big decision

One person's vivid phrase pushes the team into a big redesign. Sometimes it is a true insight, but more often it is an emotion. A single voice justifies at most a small change or a hypothesis for the next round. Structural changes require repeatability.

Fix everything

After the test there is a temptation to take a list of fifteen items and work through all of them. A sprint later the layout has become a mosaic of compromises, and the flow is no better. Better two edits you can explain than fifteen you can't defend.

"AI has already sorted it out"

Clusters from the model look neat, and you want to take them as the result. But neatness is not accuracy. The model will smooth over contradictions and invent a pattern where there was a single observation. Sorting is the model's job, prioritization is yours.

Questions for the findings review

Ask these questions not to the respondent, but to yourself and the team when you analyze the results. They cut half the empty discussions.

  • In what scenario did this happen and how many times did it happen?
  • Is it an interface problem or a product problem, and which queue does it go to?
  • What are we knowingly not fixing for this finding, and why is that safe?
  • What is one change we are making, and how will we know in two weeks that it has worked?
  • Is there a finding on the list that breaks our original hypothesis? If not, the test only confirmed itself.
  • Which changes rest on a single voice, and are we ready to defend them in a month?

Short practical outcome

Real people break a prototype not because it is bad, but because they don't have your context. That is the value of a test: not confirmation, but the loss of context made visible from the outside.

A good round rests on three simple disciplines: let people break only what you are really willing to change; separate observation from conclusion and conclusion from decision; package the findings so that the team sees not a list of problems but a clear plan: what we fix, what we postpone, how we verify.

Everything else (tools, models, analysis templates) only works when these three things are in place. If they are not, no smart service will save you from a test that beautifully confirmed what you wanted to hear.
