emeditor

How to find the total number of columns (characters) in a particular line?


I have found that the GetLines method of the Document object retrieves the number of lines in the document. But I have not found a way to obtain the number of columns (i.e. user-perceived characters or grapheme clusters) in a particular line.

The GetColumns method of the Document object is not suitable because it retrieves the number of columns in CSV mode. If the document is not in CSV mode, this method returns 0.

A column is the logical coordinate on the horizontal axis:

a column is the number of characters from the previous newline character or from the start of the document if it's the first line of the document.

Given an integer i, I want to find the logical coordinate (the X axis position) of the rightmost column in the i-th line of the document. How can I do this? For example, a text file contains the following text:

123
test 1
abcdefghij
test 2

Then the number of columns in the third line is 11: the 1-based index of j is 10, plus one step of the cursor to the right.

There is the JavaScript way: Intl.Segmenter. But since it is not one of the program's own (built-in) methods, I don't know whether it is 100% equivalent to the program's internal representation.
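For reference, the Intl.Segmenter approach mentioned above could look like the sketch below. The function names countGraphemes and caretColumnAtEnd are made up for this illustration, and it assumes that the JavaScript engine running the macro provides Intl.Segmenter. Whether the result always agrees with the column the program itself reports is exactly the open question, so treat it as an approximation.

```javascript
// Count user-perceived characters (grapheme clusters) in one line of text.
// Assumption: Intl.Segmenter is available in the engine that runs the macro.
function countGraphemes(line) {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    let count = 0;
    for (const segment of segmenter.segment(line)) {
        count++;
    }
    return count;
}

// The rightmost column is one step to the right of the last grapheme cluster.
function caretColumnAtEnd(line) {
    return countGraphemes(line) + 1;
}

console.log(caretColumnAtEnd("abcdefghij")); // 11, as in the example above
```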

EDIT

I need to clarify a possible confusion.

The number of code points in a line is not the same as the number of columns in a line. The latter corresponds to the number of grapheme clusters in a line, which is equal to the logical coordinate of the last grapheme cluster in the line. For example, consider the line that contains the following four code points:

  1. U+0041: Latin Capital Letter A;
  2. U+0301: Combining Acute Accent;
  3. U+0065: Latin Small Letter E;
  4. U+0301: Combining Acute Accent.

This is displayed as

Áé

The problem is that the length property in JavaScript (which is the language that I use for macros) is useless for my purpose because it counts UTF-16 code units (here the same as the number of code points), not grapheme clusters (in some cases, performing Unicode normalization is not sufficient):

"A\u0301e\u0301".length === 4
// true

But the logical coordinate of the last grapheme cluster in this line is only 2. That is, it is the second column. It is possible to place the cursor to the right of this column, and the program will show that the current column is 3. But the number of code points in this line is 4:

Code points are commonly used in character encoding, where a code point is a numerical value that maps to a specific character. In character encoding code points usually represent a single grapheme—usually a letter, digit, punctuation mark, or whitespace—but sometimes represent symbols, control characters, or formatting.
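To make the difference between the units concrete, here is a small sketch that counts the same string three ways. The helper name unitCounts is invented for this illustration, and Intl.Segmenter is again assumed to be available. Note that the length property strictly counts UTF-16 code units, which matches the number of code points only for characters inside the Basic Multilingual Plane.

```javascript
// Return the size of a string measured in three different units.
function unitCounts(text) {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return {
        codeUnits: text.length,                  // UTF-16 code units
        codePoints: Array.from(text).length,     // string iteration yields code points
        graphemes: Array.from(segmenter.segment(text)).length
    };
}

// The four code points from the example: A, U+0301, e, U+0301.
console.log(unitCounts("A\u0301e\u0301")); // { codeUnits: 4, codePoints: 4, graphemes: 2 }

// One code point outside the BMP (U+1F600) takes two code units.
console.log(unitCounts("\u{1F600}"));      // { codeUnits: 2, codePoints: 1, graphemes: 1 }
```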

My question is about finding the logical coordinate of the last grapheme cluster in a line, which is one less than the number of columns in a line (because each line has at least one “empty” column to the right of the last grapheme cluster):

It is important to recognize that what the user thinks of as a “character”—a basic unit of a writing system for a language—may not be just a single Unicode code point. Instead, that basic unit may be made up of multiple Unicode code points. To avoid ambiguity with the computer use of the term character, this is called a user-perceived character. For example, “G” + grave-accent is a user-perceived character: users think of it as a single character, yet is actually represented by two Unicode code points. These user-perceived characters are approximated by what is called a grapheme cluster, which can be determined programmatically.

The program shows this number at the bottom of the window, but it seems that there is no native method to access it as a property of a line.


Solution

  • Here's another way to get the length of the line. Unlike document.GetLine(xxx).length, this doesn't copy memory, but it's a few lines long.

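    var i = 3; // 1-based line number
    // Place the caret at the first column of line i.
    document.selection.SetActivePoint(eePosLogical, 1, i);
    // Move to the end of the logical line without extending the selection.
    document.selection.EndOfLine(false, eeLineLogical);
    // The caret's logical X position is now the rightmost column of the line.
    alert(document.selection.GetActivePointX(eePosLogical));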

    It moves your cursor to line i, goes to the end of the line, then outputs the current caret X position.
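    If the macro should not leave the caret on line i, the same steps can be wrapped in a function that puts the caret back afterwards. This is only a sketch: the function name rightmostColumn and its doc parameter are made up for this illustration, doc is expected to be the macro's document object, and any existing selection is collapsed rather than restored.

```javascript
// Return the logical X position of the caret at the end of the given line,
// then move the caret back to where it was.
// Assumptions: `doc` is the macro's document object, and eePosLogical and
// eeLineLogical are the constants provided by the macro environment.
function rightmostColumn(doc, lineNumber) {
    var sel = doc.selection;

    // Remember the current caret position.
    var oldX = sel.GetActivePointX(eePosLogical);
    var oldY = sel.GetActivePointY(eePosLogical);

    // Same steps as above: go to the line, jump to its end, read X.
    sel.SetActivePoint(eePosLogical, 1, lineNumber);
    sel.EndOfLine(false, eeLineLogical);
    var column = sel.GetActivePointX(eePosLogical);

    // Put the caret back (the selection itself is not restored).
    sel.SetActivePoint(eePosLogical, oldX, oldY);
    return column;
}

// Inside a macro: alert(rightmostColumn(document, 3));
```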