Five Automated Accessibility Checkers, and Where Each One Gets It Wrong

◷ 15 min read 6/22/2026


Open any website, run Lighthouse, and see Accessibility: 98. Close the tab with a sense of accomplishment. Now try to get through the same site with the keyboard alone, without touching the mouse. Or turn on VoiceOver and listen to how the main form is read out. What you find there is a mess that Lighthouse never noticed.

Automated accessibility checkers are useful tools. But they check what a machine can check: contrast, the presence of alt, the correctness of ARIA attributes, the validity of markup. Everything that has to do with meaning, context, and the actual use case is left out. According to various estimates, automation catches about a third of WCAG violations. The other two-thirds is what accessibility is really about.

What follows is a breakdown of five popular tools: what they do well, where they mislead, and in which situations their green icon cannot be relied on at all.

Why 100/100 is a trap, not a goal

The main problem is not the tools. The problem is how teams use them.

Typical scenario: product sets a goal to "bring the accessibility score to 95+". The designer and the frontend developer fix whatever the checker highlights. The score grows. Everyone is happy. Meanwhile a real screen reader user still cannot place an order, because focus jumps around at random in the modal and the "Confirm" button is labeled "button-primary-2".

The green icon creates the illusion of a solved problem. That is worse than not checking at all, because it switches off further thinking.

Automation fundamentally cannot test:

Whether the alt text is meaningful (alt="image" is formally valid)
Whether the reading and focus order is logical
Whether a blind user can follow what the visual order conveys
Whether a custom component works from the keyboard
Whether the interface breaks at 200% zoom or more
Whether dynamic updates (live regions) are announced correctly
Whether text contrast on a gradient or photo is sufficient

If the team is stuck on "we're fine, Lighthouse is green", it is worth doing a live keyboard walkthrough at a retro, just once. The effect is sobering.

axe DevTools: the best of the machines - but still a machine

Deque's axe-core is the engine behind half of the industry's tooling (including Lighthouse's built-in audit). The axe DevTools extension for Chrome is the de facto standard for a quick hands-on check.

What it does well: it catches almost all formal violations of WCAG A and AA, gives clear references to the rules, points to the specific DOM node, and suggests a fix. False positives are minimal: if axe says "violation", it is most likely a violation.

Where axe is wrong

ARIA. axe will verify that role="dialog" has aria-labelledby. But it doesn't check that the reference points to a meaningful title rather than an empty <span>. Technically valid. In practice, the screen reader user hears "dialog" and then silence.
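The check this paragraph asks for can be sketched as follows. This is a minimal illustration over a simplified, hypothetical element model (`SketchElement` and `findSilentDialogs` are made-up names), not the real DOM and not a description of how any particular tool computes accessible names.

```typescript
// Hypothetical, simplified element model: just enough to illustrate the idea.
interface SketchElement {
  id?: string;
  role?: string;
  ariaLabelledby?: string; // space-separated list of ids, as in the ARIA attribute
  text?: string; // text content of the element
}

// Returns the text that an element's aria-labelledby actually resolves to.
function resolveLabelledbyText(el: SketchElement, all: SketchElement[]): string {
  const ids = (el.ariaLabelledby ?? "").split(/\s+/).filter((id) => id.length > 0);
  return ids
    .map((id) => all.find((candidate) => candidate.id === id)?.text ?? "")
    .join(" ")
    .replace(/\s+/g, " ")
    .trim();
}

// Flags dialogs whose label reference exists but resolves to nothing audible.
function findSilentDialogs(all: SketchElement[]): SketchElement[] {
  return all.filter(
    (el) =>
      el.role === "dialog" &&
      el.ariaLabelledby !== undefined &&
      resolveLabelledbyText(el, all) === ""
  );
}

// Made-up sample data: one dialog points at an empty title, one at a real one.
const samplePage: SketchElement[] = [
  { id: "empty-title", text: "" },
  { id: "real-title", text: "Confirm your order" },
  { id: "dlg-1", role: "dialog", ariaLabelledby: "empty-title" },
  { id: "dlg-2", role: "dialog", ariaLabelledby: "real-title" },
];

const silentDialogIds = findSilentDialogs(samplePage).map((el) => el.id);
console.log(silentDialogIds);
```

A non-empty result only proves that the name is not blank. Whether "Confirm your order" is the right title for that dialog is still a question for a human reviewer.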

Custom components. A hand-rolled dropdown without a single ARIA attribute is simply ignored by axe: to it, this is a pile of divs with nothing to complain about. Paradoxically, the worse the component is built, the fewer reasons axe has to complain.

Order and focus. axe does not Tab through the interface. If focus stays on the button underneath after a modal opens, axe will not see it.

How to use without self-deception

Run axe on every PR, but only as a lower bar, not as the final test
For each component in the design system, write a manual keyboard test script
Once a sprint, walk a critical flow with a screen reader (VoiceOver on macOS, NVDA on Windows)
When reviewing a PR with new ARIA attributes, check not "the attribute is there" but "what it announces"
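One way to express "axe as a lower bar" in a PR check is a small gate over the violations that an axe run reports. The sketch below is an assumption-heavy illustration: `AxeViolationLike` is a local mirror of a few fields from axe-core's results (`id`, `impact`, `nodes`), `evaluateAxeGate` is a hypothetical helper, and the choice to block only on "serious" and "critical" is a team policy picked for the example, not a recommendation from the article.

```typescript
type AxeImpact = "minor" | "moderate" | "serious" | "critical";

// Local mirror of a subset of axe-core's per-violation fields, not the library's own types.
interface AxeViolationLike {
  id: string;
  impact?: AxeImpact | null;
  nodes: { html: string }[];
}

interface GateResult {
  passed: boolean;
  blocking: string[]; // rule ids that fail the build
  reported: string[]; // rule ids that are logged but do not fail the build
}

// Splits violations into build-blocking and report-only, based on impact.
function evaluateAxeGate(
  violations: AxeViolationLike[],
  blockingImpacts: AxeImpact[] = ["serious", "critical"] // assumed policy
): GateResult {
  const blocking: string[] = [];
  const reported: string[] = [];
  for (const v of violations) {
    const isBlocking = v.impact != null && blockingImpacts.indexOf(v.impact) !== -1;
    (isBlocking ? blocking : reported).push(v.id);
  }
  return { passed: blocking.length === 0, blocking, reported };
}

// Made-up sample input shaped like two axe violations.
const sampleViolations: AxeViolationLike[] = [
  { id: "color-contrast", impact: "serious", nodes: [{ html: '<p class="hint">Optional</p>' }] },
  { id: "region", impact: "moderate", nodes: [{ html: "<div>Footer links</div>" }] },
];

const gate = evaluateAxeGate(sampleViolations);
console.log(gate.passed ? "gate passed" : "gate failed: " + gate.blocking.join(", "));
```

A passing gate here means only that the formal floor is met. The keyboard script and the screen reader pass from the list above are still separate steps.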

Questions for code review that axe will not ask:

Where does focus go after this popup closes?
What will the user hear when they press this button?
Is the icon's meaning clear without a caption and without alt text?
If you turn off CSS, does the reading order survive?

WAVE: A visual audit that lulls alertness

WebAIM's WAVE is an extension that overlays icons for errors, warnings, and structural elements directly on the page. It is handy for a quick overview: you can see where the headings are, where the landmarks are, and where the trouble spots are.

The main advantage is visibility. You can grasp the structure of a page by eye in a minute without digging into DevTools. It works well for a client demo or for onboarding new team members: "Look, this is what a page without proper semantics looks like."

Where WAVE fails

The tool leans heavily on "alerts": yellow warnings that are not errors but need human attention. In practice they get ignored: "Yellow is not red, let's move on." Yet the yellow ones hold the most interesting finds: suspicious alt text, empty links, duplicate headings.

Another weakness is SPAs. WAVE checks what is rendered at the moment it runs. Dynamically loaded content, modals, and post-interaction states are not in the report. On a static landing page it works perfectly; on a complex application it shows half the picture.

A separate pain point is contrast. WAVE compares text color and background color as two values. Text on a photo, on a gradient, or on a translucent overlay is a blind spot.

Short summary of the first two tools: axe and WAVE solve different problems. axe is for CI and formal rule checks; WAVE is for quick visual inspection of structure. Neither replaces a manual pass with the keyboard and a screen reader. And neither answers the main question: is the interface clear to a person who perceives it differently than we do?

Lighthouse: convenient, fast, superficial

Lighthouse is built into Chrome DevTools and fits into any reasonable CI. You run it and get a nice score out of 100 with a green badge. And that is the problem: the team looks at the number, not the report.

Under the hood, Lighthouse runs a trimmed-down set of axe-core rules plus a few checks of its own. Trimmed is the key word: some axe rules are turned off in Lighthouse, some are simplified. So "Lighthouse: 100, axe DevTools: 12 violations" is a common occurrence, not an anomaly.

Where Lighthouse is wrong

Score-driven development. The green circle becomes the target. The team fixes exactly what keeps it from reaching 100 and does not touch what Lighthouse cannot check in principle. This is worse than not checking at all because it creates false confidence.

Only the initial state of the page. Lighthouse audits the DOM it sees on load. Sections hidden behind tabs, content under display: none, and post-click states fly under the radar.

Contrast by the letter of the rule. It takes the computed style and calculates the ratio. Text over an image, shadows, a blurred background, and the :hover and :focus states are blind spots, just as in WAVE.
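For reference, the ratio itself is simple arithmetic. The sketch below implements the WCAG 2.x relative luminance and contrast ratio formulas, and adds a hypothetical `worstCaseContrast` helper to show why a single computed background color says little about text on a gradient or photo. The sampled colors are made up, and how a real tool would sample pixels under the text is outside this sketch.

```typescript
type Rgb = [number, number, number]; // 8-bit sRGB channels, 0..255

// WCAG 2.x relative luminance of an sRGB color.
function relativeLuminance([r, g, b]: Rgb): number {
  const linearize = (channel: number): number => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1 to 21.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Hypothetical helper: the lowest ratio across several background samples,
// for example pixels taken along a gradient under the text.
function worstCaseContrast(text: Rgb, backgroundSamples: Rgb[]): number {
  if (backgroundSamples.length === 0) {
    throw new Error("at least one background sample is required");
  }
  return backgroundSamples.reduce(
    (worst, bg) => Math.min(worst, contrastRatio(text, bg)),
    Infinity
  );
}

const white: Rgb = [255, 255, 255];
const darkEnd: Rgb = [34, 34, 34];
// Made-up samples along a dark-to-light gradient.
const gradientSamples: Rgb[] = [darkEnd, [136, 136, 136], [238, 238, 238]];

console.log(contrastRatio(white, darkEnd).toFixed(2));
console.log(worstCaseContrast(white, gradientSamples).toFixed(2));
```

White text passes comfortably against the dark end of this gradient and fails badly against the light end. A tool that reads one computed background value sees only one of those two numbers.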

How to build it in without harm

Lighthouse in CI as a lower threshold: "at least 90 in accessibility" instead of "must be 100"
Do not show the score on management dashboards as an accessibility KPI; that kills the point
For every PR with UI changes, a separate axe run, not just Lighthouse
Once a quarter, compare the list of rules Lighthouse checks against the current axe-core to see what is not covered
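The first and last items can be sketched as two small functions over a Lighthouse JSON report. `LighthouseReportLike` mirrors only the fields used here (the accessibility category score, reported on a 0 to 1 scale, and the `audits` map). The helper names are hypothetical, the comparison assumes that accessibility audit ids line up with axe rule ids, and the sample data is invented and does not reflect what any Lighthouse version actually runs.

```typescript
// Local mirror of the few Lighthouse report fields used in this sketch.
interface LighthouseReportLike {
  categories: { accessibility?: { score: number | null } };
  audits: Record<string, unknown>;
}

// Lower threshold: "at least N in accessibility", not "must be 100".
function meetsAccessibilityFloor(report: LighthouseReportLike, minPercent: number): boolean {
  const score = report.categories.accessibility?.score;
  if (score == null) return false; // a missing score is treated as a failure, not a pass
  return Math.round(score * 100) >= minPercent;
}

// Quarterly review: which axe rule ids have no audit with the same id in the report?
function rulesNotCoveredByLighthouse(axeRuleIds: string[], report: LighthouseReportLike): string[] {
  return axeRuleIds
    .filter((id) => !Object.prototype.hasOwnProperty.call(report.audits, id))
    .sort();
}

// Invented sample data for illustration only.
const sampleReport: LighthouseReportLike = {
  categories: { accessibility: { score: 0.93 } },
  audits: { "color-contrast": {}, "image-alt": {} },
};
const sampleAxeRuleIds = ["color-contrast", "image-alt", "target-size"];

console.log(meetsAccessibilityFloor(sampleReport, 90));
console.log(rulesNotCoveredByLighthouse(sampleAxeRuleIds, sampleReport));
```

The second function produces a reading list, not a verdict: every id it returns is a rule the team should confirm is being checked somewhere else.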

Accessibility Insights: Closest to Manual Verification

A tool from Microsoft, built on the same axe-core, with one important difference: FastPass mode and guided, assisted manual checks. In addition to automation, it tells you what to poke by hand and where to look.

Tab Stops visualizes the focus order right on the page. Arrows and numbers show where the focus jumps in the wrong direction. This is no longer "the machine said OK"; it is a tool that helps the designer and developer see the problem with their own eyes.

Where Accessibility Insights is Wrong

Learning curve. To get value from the tool, you have to learn it. Teams that "just click audit" get exactly what they would get from axe. Tab Stops, Assessment, Needs Review: half of users do not know these modes exist.

Assessment turns into a formality. A full WCAG assessment takes hours, so in real sprints it is run not for every feature but once per release. By the time a problem is found, the feature is already in production.

Tab Stops does not answer "why." It shows that the focus jumps strangely. It does not show how to fix it; that is a question for the developer and the component architecture.

Script for the design team

When designing a new component, draw the focus diagram right away: where Tab goes, where Shift+Tab goes, what Esc does
When reviewing the prototype in the browser, turn on Tab Stops, walk the flow with your eyes, and screenshot the problem areas
File the problems found not in the general backlog but on the component's card in the design system, so the fix lives at the system level rather than on individual screens
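A focus diagram can also live next to the component as data, so that the expected behavior is something a reviewer can query rather than remember. The sketch below is one possible shape, with hypothetical names (`FocusSpec`, `nextFocus`) and made-up element ids. It models only Tab, Shift+Tab, and Esc, as in the first item above.

```typescript
// Hypothetical focus specification for one component.
interface FocusSpec {
  order: string[]; // ids of focusable elements in Tab order
  trapped: boolean; // true for modals: Tab wraps around instead of leaving
  escapeReturnsTo?: string; // id that receives focus when Esc is pressed
}

type FocusKey = "Tab" | "Shift+Tab" | "Escape";

// Returns the id that should receive focus next, or null if focus leaves
// the component (or if the spec does not say).
function nextFocus(spec: FocusSpec, currentId: string, key: FocusKey): string | null {
  if (key === "Escape") return spec.escapeReturnsTo ?? null;
  const index = spec.order.indexOf(currentId);
  if (index === -1) return null; // unknown element: the spec is incomplete
  const next = index + (key === "Tab" ? 1 : -1);
  if (next >= 0 && next < spec.order.length) return spec.order[next] ?? null;
  if (!spec.trapped) return null; // focus moves on to the rest of the page
  const wrapped = (next + spec.order.length) % spec.order.length;
  return spec.order[wrapped] ?? null;
}

// Made-up spec for a checkout modal.
const checkoutModal: FocusSpec = {
  order: ["close", "email", "submit"],
  trapped: true,
  escapeReturnsTo: "open-checkout-button",
};

console.log(nextFocus(checkoutModal, "submit", "Tab"));
console.log(nextFocus(checkoutModal, "close", "Shift+Tab"));
console.log(nextFocus(checkoutModal, "email", "Escape"));
```

During the Tab Stops review from the second item, the observed order can be compared against this spec instead of against someone's memory of the design.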

Stark in Figma: Checking before it's too late to change anything

Stark is a plugin for Figma (and also Sketch and the browser) that catches accessibility issues at the layout stage. Contrast, color blindness simulation, touch-target sizing, alt-text generation. The main advantage is that it works where the designer actually lives, rather than being a separate step at the end.

Where Stark is wrong

A layout is not an interface. Stark checks the contrast of a text layer against a background layer. In the live product, that text will scroll over a picture, appear in a dark theme, or sit in a disabled state. Stark does not see these contexts because they are not in the layout.

Simulation is not experience. Running a layout through a deuteranopia simulator is helpful, but it is still a sighted person's view through a filter. A real user with a color vision deficiency has lived with it all their life and compensates differently than we assume.

AI-generated alt text. A handy feature that generates a description from an image. And exactly the same trap as with any AI: the designer pastes the generated text without looking, and in production the cart icon ships with the alt text "image of a black icon".

Designer's checklist before handing the layout to development

Contrast is checked for all states: default, hover, focus, disabled, error
The focus state is drawn explicitly, not "leave it to the browser"
Touch targets are at least 44×44 for all interactive elements on mobile
Icons without a caption have a text description in the component spec
The layout has been run through a color blindness simulator: is the meaning preserved without color?
If the layout has a dark theme, contrast is checked there too
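Two items of this checklist (contrast for every state, and the 44×44 touch target) are mechanical enough to be written down as a handoff check. The sketch below is hypothetical throughout: `HandoffSpec` and `handoffProblems` are invented names, the ratios are assumed to have been measured by the designer beforehand, and the 4.5:1 default is the WCAG AA threshold for normal-size text, applied here to every listed state only because the checklist asks for every state to be checked.

```typescript
const REQUIRED_STATES = ["default", "hover", "focus", "disabled", "error"] as const;
type StateName = (typeof REQUIRED_STATES)[number];

// Hypothetical handoff record for one component.
interface HandoffSpec {
  component: string;
  contrastByState: Partial<Record<StateName, number>>; // measured ratios, 4.6 means 4.6:1
  touchTarget: { width: number; height: number };
}

// Lists what is missing or below threshold before the layout goes to development.
function handoffProblems(spec: HandoffSpec, minRatio = 4.5, minSide = 44): string[] {
  const problems: string[] = [];
  for (const state of REQUIRED_STATES) {
    const ratio = spec.contrastByState[state];
    if (ratio === undefined) {
      problems.push(`${state}: contrast not measured`);
    } else if (ratio < minRatio) {
      problems.push(`${state}: contrast ${ratio}:1 is below ${minRatio}:1`);
    }
  }
  const { width, height } = spec.touchTarget;
  if (width < minSide || height < minSide) {
    problems.push(`touch target ${width}x${height} is below ${minSide}x${minSide}`);
  }
  return problems;
}

// Made-up measurements for illustration.
const primaryButton: HandoffSpec = {
  component: "PrimaryButton",
  contrastByState: { default: 7.1, hover: 5.2, focus: 3.9, error: 4.8 },
  touchTarget: { width: 40, height: 44 },
};

const buttonProblems = handoffProblems(primaryButton);
console.log(buttonProblems.join("\n"));
```

The remaining items (an explicit focus state, icon descriptions, meaning without color) stay as questions for a person, since nothing in a list of numbers can answer them.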

Questions for design review

What will a user who cannot tell red from green see on this validation form?
If you remove the color, where does the user's attention go?
This gray placeholder: does it really pass contrast, or are we fooling ourselves?
Does the error state differ from the normal one only by border color?

How to put it into the workflow

Five tools are not five checkboxes in a DoD. They are different control points at different stages.

In Figma: Stark before the layout goes to development
In PR: axe in CI as a mandatory gate, Lighthouse as a soft threshold
In the manual feature check: Accessibility Insights with Tab Stops, WAVE for a quick structure review
Once per release: a walkthrough of the critical flow by a live person with a keyboard and a screen reader

The biggest mistake teams make is choosing one tool and thinking they have closed the issue. Automation catches formal violations, nothing more. Everything that has to do with meaning (where the focus goes, what the screen reader announces, whether the icon is clear) remains manual work. And this work needs to be built into the process as systematically as we build in a linter.

What to do with AI component generation

A separate story is when part of the UI is assembled not by hand but through an AI assistant: Figma Make, v0, Cursor with MCP to Figma, component generation from a description. Accessibility in such pipelines breaks quietly and in the same ways every time.

Typical failures of AI generation

A button rendered as a <div> with a click handler: visually indistinguishable, but for a screen reader it does not exist
A modal without role="dialog" and without a focus trap, which closes only by clicking the X with the mouse
A form without <label>, with a placeholder instead of a label, because "it's shorter that way" in the layout
An icon button without aria-label: the model generated the SVG and forgot the text name
Contrast "as in the design", but the design only had a light theme and the dark one was generated automatically

What to build into the process if AI is involved in the code

The component generation prompt states accessibility requirements explicitly: semantic tags, focus states, ARIA roles where needed
Every AI-generated component passes axe and a manual Tab run-through before entering the design system
"AI writes alt and aria-label" without review is banned: the same trap as with Stark, only in code
In the component library, accessibility is enforced as part of the API: if IconButton has no label in its props, it does not build
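The last item maps directly onto a type signature. The sketch below is framework-agnostic and hypothetical: `IconButtonProps` and `iconButtonAttributes` stand in for a real component and simply return the attributes it would render. The required `label` prop makes a missing label a compile error, and the runtime check covers the case the type system cannot see, an empty or whitespace-only string.

```typescript
// Hypothetical props for an icon-only button in a component library.
interface IconButtonProps {
  icon: string; // icon identifier, made up for the example
  label: string; // required: becomes the accessible name
  onPress?: () => void;
}

// Stand-in for a component: returns the attributes it would render.
function iconButtonAttributes(props: IconButtonProps): Record<string, string> {
  const label = props.label.trim();
  if (label.length === 0) {
    // The compiler catches a missing prop; this catches label="" at runtime.
    throw new Error("IconButton requires a non-empty label");
  }
  return { type: "button", "aria-label": label, "data-icon": props.icon };
}

// The line below does not compile, because 'label' is missing:
// iconButtonAttributes({ icon: "cart" });

const cartButton = iconButtonAttributes({ icon: "cart", label: "Open cart" });
console.log(cartButton["aria-label"]);
```

The type only guarantees that some label exists. Whether the label says something useful is still a review question, which is why the ban on unreviewed AI-written labels sits in the same list.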

The scenarios that get caught the least

Automated tools and plugins capture a static DOM in a single state. Anything dynamic or contextual requires a deliberate test.

Checklist for hidden failures

Toasts and snackbars: are they announced by the screen reader, and do they stay long enough to be heard
Modals: does focus return to the trigger button after closing
Infinite scroll: is there a keyboard alternative, and is the position preserved on return
Custom dropdowns and combo boxes: do arrows, Enter, Esc, and Home/End work as expected
Animations: is prefers-reduced-motion respected, and does anything flash more than three times a second
Dark theme: it is not "light inverted"; contrast and focus styles are tested separately
Loading states: what does the screen reader hear while the spinner spins
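The "three times a second" part of the animation item is the one piece of this list that reduces to counting. The sketch below assumes that someone has already recorded the moments at which flashes occur (for example, by stepping through a screen recording) and only checks whether any one-second window contains more than three of them. `exceedsFlashLimit` is a hypothetical helper, and deciding what counts as a flash is outside its scope.

```typescript
// Returns true if any 1000 ms window contains more than maxPerSecond flashes.
// flashTimesMs: timestamps of individual flashes, in milliseconds, in any order.
function exceedsFlashLimit(flashTimesMs: number[], maxPerSecond = 3): boolean {
  const times = flashTimesMs.slice().sort((a, b) => a - b);
  let windowStart = 0; // index of the earliest flash still inside the window
  for (let i = 0; i < times.length; i++) {
    // Slide the window forward until it spans less than one second.
    while (times[i] - times[windowStart] >= 1000) {
      windowStart++;
    }
    if (i - windowStart + 1 > maxPerSecond) return true;
  }
  return false;
}

// Made-up timestamps: four flashes within 900 ms versus one flash every 400 ms.
console.log(exceedsFlashLimit([0, 300, 600, 900]));
console.log(exceedsFlashLimit([0, 400, 800, 1200]));
```

Everything else on the list (announcements, focus return, keyboard behavior, loading states) has to be exercised in a running interface with a keyboard and a screen reader.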

Anti-Patterns That Pass Every Automated Test

A button with aria-label="button": formally valid, semantically garbage
alt="" on an illustration that carries meaning: axe stays silent, the user loses context
Contrast of 4.5:1 for text laid over video: passes on a single frame, fails in motion
The focus indicator is visible, but it is the default blue outline on the brand's blue background: technically present, practically absent
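The first anti-pattern, along with the "button-primary-2" label from earlier in the article, can at least be flagged for human review with a crude heuristic. The sketch below is exactly that: the word list and the identifier pattern are assumptions chosen for the example, the function names are hypothetical, and a clean result says nothing about whether a name is actually meaningful. Empty strings are deliberately not flagged, since alt="" is legitimate for decorative images and only a person can tell the difference.

```typescript
// Assumed list of names that describe the element type rather than its purpose.
const GENERIC_NAMES = ["button", "link", "image", "icon", "picture", "photo", "click here"];

// True if the name is just a generic word such as "button" or "image".
function isGenericName(accessibleName: string): boolean {
  const normalized = accessibleName.trim().toLowerCase();
  return GENERIC_NAMES.indexOf(normalized) !== -1;
}

// True if the name looks like a leaked class or id, such as "button-primary-2":
// three or more alphanumeric chunks joined by hyphens or underscores.
function looksLikeIdentifier(accessibleName: string): boolean {
  return /^[a-z0-9]+([-_][a-z0-9]+){2,}$/i.test(accessibleName.trim());
}

// Heuristic only: a true result means "show this to a human reviewer".
function isSuspiciousName(accessibleName: string): boolean {
  return isGenericName(accessibleName) || looksLikeIdentifier(accessibleName);
}

const sampleNames = ["button", "button-primary-2", "Add to cart", "Sign-in"];
for (const sample of sampleNames) {
  console.log(sample + " -> " + (isSuspiciousName(sample) ? "review" : "ok"));
}
```

The other three anti-patterns in the list depend on meaning, motion, or surrounding color, so they stay on the manual side of the process.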

How to explain this to your team and business

The main problem is not the tools but that accessibility is perceived as "extra work at the end." Below are a few moves that work in real teams.

Translate it into product language

Not "WCAG 2.2 AA" but "35 percent of our users navigate by keyboard at least once, and right now we lose them at step X."
Not "contrast fails" but "this text is unreadable on a phone in sunlight; we checked outdoors."
Not "we need semantics" but "the voice assistant and in-page search won't find this button."

Build it into the DoD, not into a separate accessibility sprint

A component in the design system counts as done when it has a focus state, its keyboard interactions are documented, and contrast is tested in all themes
The PR template has a line about accessibility: not an "I checked" checkbox but something specific, such as "walked through with Tab, checked with a screen reader, axe is green"
Accessibility bugs are filed with the same priority as functional ones; otherwise they will always be "later"

Questions for feature review

If the user has no mouse, can they complete the scenario?
If you turn off sound and color, what information is lost?
What will the screen reader announce on this screen in the first five seconds?
Where is the weak spot here, and why did we decide not to fix it now?

Section summary

Tools do not make a product accessible. They show exactly where it is broken. The rest is the team's decision: which contexts we test, whom we explain it to, where we catch problems. Automation is the floor. Real accessibility begins with the question "how do you use this without what we assumed by default?"

Next comes what turns "the tools showed red/green" into a real team habit: short checklists, catch points in the process, and questions worth asking before a feature ships.

Checklist before merge

Not a "full accessibility audit" but a minimum that can realistically be done in 10-15 minutes on any feature.

Before PR

Walked the whole flow with the keyboard only, from start to finish: Tab, Shift+
