node.js, accessibility, pa11y, node-html-parser

How to get the target line number and columns range in `pa11y`?


According to the TypeScript typings, pa11y returns issues of this shape:

interface ResultIssue {
    code: string;
    context: string;
    message: string;
    selector: string;
    type: string;
    typeCode: number;
}

Example value (it looks like the TypeScript typings are slightly incomplete, but runner and runnerExtras do not seem to matter for now):

{
  code: 'WCAG2AAA.Principle4.Guideline4_1.4_1_2.H91.A.EmptyNoId',
  type: 'error',
  typeCode: 1,
  message: 'Anchor element found with no link content and no name and/or ID attribute.',
  context: '<a><span></span></a>',
  selector: 'html > body > main > a:nth-child(3)',
  runner: 'htmlcs',
  runnerExtras: {}
}

I want to produce output similar to the one below (that one is for HTML validation; I now want the same kind of output for accessibility checking):

[image: example output for HTML validation]

However, pa11y does not provide an API for retrieving the line number and column range. So what can we do instead? I have no problem obtaining the source HTML; all that is left is to locate the offending fragment within that HTML.

The first clue is the context. Maybe we can use a regular expression or just String.prototype.indexOf(), but what if there are multiple occurrences? I am not happy with always getting the first one.
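To make the ambiguity concrete, here is a minimal sketch that collects every offset at which the context string occurs in the source. The helper name `findAllOccurrences` and the sample markup are made up for illustration; it only shows that `indexOf` alone cannot tell identical fragments apart.

```javascript
// Collect the start offset of every occurrence of `fragment` in `source`.
function findAllOccurrences(source, fragment) {
  const offsets = [];
  if (fragment.length === 0) return offsets; // avoid an endless loop on empty input
  let index = source.indexOf(fragment);
  while (index !== -1) {
    offsets.push(index);
    // continue searching just past the start of the previous match
    index = source.indexOf(fragment, index + 1);
  }
  return offsets;
}

// Hypothetical sample: the same empty anchor appears twice.
const sample = '<main><a><span></span></a><p>x</p><a><span></span></a></main>';
console.log(findAllOccurrences(sample, '<a><span></span></a>'));
```

If this returns more than one offset, the context alone is not enough and something else, such as the selector, is needed to pick the right occurrence.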

The next clue is the selector, such as html > body > main > a:nth-child(3). It looks like it is unique. Using node-html-parser, we can access the element by selector, but how can we get the line number and the column range?


Solution

  • You can do this by parsing the source HTML and then using querySelector to access the element.

    This assumes that the source HTML contains proper line breaks with newline characters (in this case \n). You also need to make sure that whitespace and formatting from the source are preserved when you parse, using the pre option.

    You can then use a combination of indexOf and substring to find the line number of the element in the source HTML.

    For example (typed directly here and untested, so it may contain errors):

    const { parse } = require('node-html-parser');
    
    // htmlContent is the source HTML string
    const root = parse(htmlContent, {
      // option as given in this answer; its name and effect may vary by library version
      pre: true
    });
    
    // resultIssue.selector is a string such as 'html > body > main > a:nth-child(3)'
    const element = root.querySelector(resultIssue.selector);
    
    if (!element) {
      console.log('Element not found.');
    } else {
      // serialize the element back to an HTML string
      const sourceHtml = element.toString();
    
      // find the serialized element in the source HTML; indexOf returns -1
      // if the serialized form differs from the original markup
      const startIndex = htmlContent.indexOf(sourceHtml);
    
      if (startIndex === -1) {
        console.log('Element markup not found verbatim in the source.');
      } else {
        // count the line breaks before the match (also works for `\r\n` endings)
        const lineNumber = htmlContent.substring(0, startIndex).split('\n').length;
        console.log('Line number:', lineNumber);
      }
    }
    

    You can get the approximate column position in a similar way: find the index of the element string within the text of the line it starts on.
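The line and column calculation can be sketched without any library, given the start offset and length of the fragment. The helper names `offsetToPosition` and `fragmentRange` are hypothetical, and the sample HTML is made up. The start offset could come from `indexOf` as above; some versions of node-html-parser may also expose source offsets on the element (for example a `range` property), but treat that as an assumption to verify against the documentation for your version.

```javascript
// Convert a 0-based character offset into a 1-based line and column.
// Works for "\n" and "\r\n" endings, because the "\r" stays on the preceding line.
function offsetToPosition(source, offset) {
  const before = source.slice(0, offset);
  const line = before.split('\n').length;
  // column = characters since the last newline (or since the start of the text), 1-based
  const column = offset - (before.lastIndexOf('\n') + 1) + 1;
  return { line, column };
}

// Hypothetical helper: report where a fragment starts and where it ends.
// The end position points just past the last character of the fragment.
function fragmentRange(source, startIndex, length) {
  if (startIndex < 0 || startIndex + length > source.length) {
    return null; // the fragment was not found in the source
  }
  return {
    start: offsetToPosition(source, startIndex),
    end: offsetToPosition(source, startIndex + length),
  };
}

// Made-up sample input for illustration.
const html = '<main>\n  <p>Hi</p>\n  <a><span></span></a>\n</main>';
const fragment = '<a><span></span></a>';
const range = fragmentRange(html, html.indexOf(fragment), fragment.length);
console.log(range);
```

Because the end position is computed from its own offset, a fragment that spans several lines gets a correct end line as well, which a single per-line `indexOf` would not give you.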