E2E testing best practice use of data attributes

We're using Cypress for E2E testing and are about to embark on the task of migrating away from tag and classname selectors to data attributes in order to make the selectors less fragile.

My question is around use of data- attributes. Cypress recommends using data-cy or data-test or data-testid.

Some of the most difficult selectors involve picking a row and column from a table. Example:

<!-- example of hard-to-test markup -->
<table class='users-table'>
<thead>
  <tr>
    <th>Name</th>
    <th>Email</th>
    <th>Phone</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Bob Fish</td>
    <td>[email protected]</td>
    <td>123-123-1234</td>
  </tr>
  <tr>
    <td>Shaggy Rogers</td>
    <td>[email protected]</td>
    <td>509-123-1235</td>
  </tr>
</tbody>
</table>

Now if we use data-test as recommended, I would do something like this:

<table data-test='users-table'>
  ...
<tbody>
  <tr data-test='user-id-1'>
    <td data-test='name-col'>...
    <td data-test='email-col'>...
    <td data-test='phone-col'>...

Now I can find some td with a certain value like

  cy.contains('[data-test="users-table"] [data-test="name-col"]', user.name).should('be.visible')

or better:

  cy.get(`[data-test="users-table"] [data-test="user-id-${user.id}"]`).within(() => {
    cy.get('[data-test="name-col"]').should('have.text', user.name)
    cy.get('[data-test="email-col"]').should('have.text', user.email)
    cy.get('[data-test="phone-col"]').should('have.text', user.phone)
  })

But in the spirit of "semantic markup", I feel like I want to do something like this:

<table data-entity='users'>
  ...
<tbody>
  <tr data-entity-id='1'>
    <td data-col='name'>...
    <td data-col='email'>...
    <td data-col='phone'>...
  </tr>
  <tr data-entity-id='2'> ...

This would allow me to not munge together data-test="[noun]-[value]" attribute values like user-id-1, at the expense of having to come up with my own consistent set of data- attributes (data-entity, data-col, etc.)

So what is the correct, objective and not at all opinion based way to use data-attributes? Because we know software development never has tradeoffs and there is only one correct answer.

As an aside, I've also started reading up on Cypress Testing Library which seems like it could somewhat help by retrieving some elements in a semantically meaningful way (like role or label) but there would still be tons of markup that would not be covered, unless maybe I started throwing role= on everything which seems like a dirty hack and probably against ARIA or some other w3c standard.

Solution

Note this is from my experience managing huge test suites in enterprise and going down the wrong path on this exact issue a few times. There are many trade-offs here and no singular "right way".

Data attrs

Firstly, it is correct to decouple selectors from internal DOM attributes that are implementation details like the class attribute etc.

Data attributes can form part of the solution to fix this, because they let you decouple your test code from application-layer implementation details such as class. However, you should be aware of the drawbacks before committing to them across the board.

One convention I have seen and used is a data-cy-component attribute that says what type of "component" the DOM element represents, combined with "component specific" attributes that belong to that component. For example:

<tbody>
  <tr data-cy-component="row" data-cy-row-id="1">
    <td data-cy-component="cell" data-cy-cell-column="name">...
    <td data-cy-component="cell" data-cy-cell-column="email">...
    <td data-cy-component="cell" data-cy-cell-column="phone">...

This is:

Cleaner than compound values from a purist point of view.
More extensible: you can add more test attributes when you need to select something in a new way, without breaking all the existing selectors.
Removes ambiguity (what if the column name has a - in it?).
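
As a rough sketch of how selectors could be built for this convention, a small helper can assemble the attribute selector from a component type and its component-specific attributes. The name componentSelector and the data-cy-[component]-[attribute] naming scheme are assumptions taken from the markup above, not part of Cypress itself.

```javascript
// Hypothetical helper for the data-cy-component convention shown above.
// Builds an attribute selector such as:
//   [data-cy-component="cell"][data-cy-cell-column="name"]
function componentSelector(component, attrs = {}) {
  // Escape backslashes and double quotes so values are safe inside "..."
  const escape = (value) =>
    String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"');

  const parts = [`[data-cy-component="${escape(component)}"]`];
  for (const [name, value] of Object.entries(attrs)) {
    parts.push(`[data-cy-${component}-${name}="${escape(value)}"]`);
  }
  return parts.join('');
}

const rowSelector = componentSelector('row', { id: 1 });
const nameCellSelector = componentSelector('cell', { column: 'name' });

// In a spec this might be used as:
//   cy.get(rowSelector).find(nameCellSelector).should('have.text', user.name)
```

Because each value lives in its own attribute, a column called first-name needs no special parsing: componentSelector('cell', { column: 'first-name' }) simply quotes it.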

However, on the negative side:

Fundamentally you are putting an increased burden on the application devs to get the right combinations. What if they forget to add one of the attributes? It's more complex, and there are more ways to go wrong, since the rules for what relates to what are yours to define and everyone has to stick to them. It's not as easy as it sounds.
It invites people to not only select on these attributes but also to read from them and assert on their values. This is generally bad. The user is not reading that value, and you are trying to prove what the user can really see.
You are still fundamentally adding a new "layer" of implementation details, albeit a separately maintained and more stable one. The user doesn't know or care about data attributes, so the attributes can drift out of sync with what is rendered. For example, you may successfully grab the right column, but was the column header rendered on screen with the right name?

Accessible selectors

cypress-testing-library is an example of a lib that encourages selectors based on DOM data that exists for accessibility reasons (and is therefore considered "public") or that is visible. Its recommendations on which queries to prefer are quite telling about the conceptual thinking behind it.

This means using visible text, or aria roles and their attributes to solve the problem. This comes with huge advantages:

You are generally testing against what is considered a public interface, and often against actual visible text, instead of accidentally testing data attributes that may or may not reflect reality. This avoids a false sense of security.
If you struggle to select something with the available methods, that usually means your page isn't accessible. So you make it so, which is a good thing for everyone.
You don't get into a mess managing your data attribute conventions, since the aria attributes are well-defined and part of public standards.
If the selectors break often it means you have an accessibility problem to fix which is good to know!
If you are unable to disambiguate something via the accessible selectors, let's say if two users have the same name, then how does the user distinguish them anyway? It can lead to thinking properly about how a user thinks. Probably email is unique, so that should be the thing to select from. It's also, crucially, visible.
Many libraries and design systems, Chakra for example, already implement the aria attributes. Tools like Zag can help you build your own components in an accessible way.

Note that cypress-testing-library doesn't ban test attributes, but it does limit you to a single data-testid attribute, which precludes the compound attribute solution above (at least without adding your own escape hatches). You usually find you don't need that anyway once you combine it with the existing accessible selectors.

In your example, you'd first mark up the table with proper aria attributes, like so:

<table aria-label="Users table">
<thead>
   <tr>
     <th id="name-column">Name</th>
     <th id="email-column">Email</th>
     <th id="phone-column">Phone</th>
   </tr>
</thead>
<tbody>
  <tr>
    <td aria-describedby="name-column">...
    <td aria-describedby="email-column">...
    <td aria-describedby="phone-column">...
  </tr>
  <tr> ...

Then to complete various tasks:

cy.findByRole('table', {name: "Users table"}).within(() => {

   // Getting a row by its unique id that is visible (email)
   cy.findByRole('cell', {description: 'Email', name: '[email protected]'})
      .should('exist')
      .parent() // the row that contains the matching email cell
      .within(() => {
         cy.findByRole('cell', {description: 'Name'}) // Getting the name cell for that same row
           .should('have.text', 'Shaggy Rogers')
       })
})

It's also worth noting that tables are a particularly complex case. Finding and clicking a button is trivial with cypress-testing-library. With tables (and this is just as true of the data attrs approach), you'll probably want to register some common commands.
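
A minimal sketch of what such a common command could look like is below. The name findCellInRow and its options are hypothetical, and it assumes @testing-library/cypress is loaded in a version where findByRole accepts the description option and can be chained off a previous subject.

```javascript
// Hypothetical traversal: find the cell under `column` in the row whose
// `keyColumn` cell shows `keyValue`. `cy` is passed in as a parameter so the
// function does not depend on the Cypress global at definition time.
function findCellInRow(cy, { keyColumn, keyValue, column }) {
  return cy
    .findByRole('cell', { description: keyColumn, name: keyValue })
    .parent() // the <tr> that contains the matching cell
    .findByRole('cell', { description: column });
}

// Register it as a custom command when running under Cypress
// (for example from cypress/support/commands.js).
if (typeof Cypress !== 'undefined') {
  Cypress.Commands.add('findCellInRow', (options) => findCellInRow(cy, options));
}

// Possible usage in a spec:
//   cy.findByRole('table', { name: 'Users table' }).within(() => {
//     cy.findCellInRow({ keyColumn: 'Email', keyValue: user.email, column: 'Name' })
//       .should('have.text', user.name)
//   })
```

Keeping the traversal in one place means that if the table markup changes, only this command has to be updated rather than every spec.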

Now to address your concern:

but there would still be tons of markup that would not be covered, unless maybe I started throwing role= on everything which seems like a dirty hack and probably against ARIA or some other w3c standard.

Don't throw role on something unless it is the thing you are saying it is. If it is, then you should do it. Note you also do not need to set a role on things that already have a correct role by virtue of the fact HTML elements have different default roles already.
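
To illustrate the point about default roles, here is a deliberately partial and simplified lookup of implicit roles for a few common elements. The helper isRedundantRole is hypothetical, and the real mapping defined by the ARIA in HTML specification depends on context (for example, a td only maps to cell inside a plain table).

```javascript
// Partial, simplified map of HTML element -> implicit ARIA role.
// The real rules are context dependent; this is only an illustration.
const IMPLICIT_ROLES = {
  table: 'table',
  tr: 'row',
  td: 'cell', // when inside a plain <table>
  button: 'button',
  nav: 'navigation',
  ul: 'list',
  li: 'listitem', // when inside a list
};

// Hypothetical check: true when an explicit role="..." would only repeat
// what the element already exposes by default.
function isRedundantRole(tagName, role) {
  return IMPLICIT_ROLES[tagName.toLowerCase()] === role;
}
```

So cy.findByRole('cell') can already find a plain td, and adding role="cell" to it would be redundant, while role="button" on a div is not redundant and is only appropriate if the div really behaves like a button.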

You can go incredibly far with the aria attributes. If you feel like you need more, it's often because you are trying to do something that is bad practice -- i.e. select on something that is not visible. But if you absolutely must, you can still use basic data attributes with cypress-testing-library for those rare occasions you need to escape.

However, before reaching for that, I usually add a meaningful aria-label or aria-describedby and select on that. After all, the aria-label of something should properly summarise what the thing is.

This is new markup, but it's standards-compliant markup, that will stand the test of time and is needed anyway to be accessible.

And remember, the selectors are only one level of a robust abstraction. It is normal and even expected to build wrappers with common commands that achieve certain traversals or assertions.

In my experience, this is the way to go. You increase test confidence, and you are forced to think about the right things.