I'm using Pango to typeset Devanagari. Consider the string उम्कन्छौ, consisting of DEVANAGARI LETTER U, DEVANAGARI LETTER MA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER KA, DEVANAGARI LETTER NA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER CHA, DEVANAGARI VOWEL SIGN AU. When typesetting this string, I want to know the horizontal starting point of छ (CHA) so that I can place a visual mark there.
For ordinary strings I would measure the width of the preceding part, उम्कन्, but that doesn't work here: as you can see, न् (half न) combines with छ, so the result is slightly off.
Is there a way to obtain the correct starting point of a letter when combinations are involved?
I've tried querying the Pango layout using index_to_pos(), but it seems to take a byte index rather than a character index.
This small Perl program shows the problem. The vertical line is off to the right.
use strict;
use warnings;
use utf8;
use Cairo;
use Pango;
my $surface = Cairo::PdfSurface->create ("out.pdf", 595, 842);
my $cr = Cairo::Context->create ($surface);
my $layout = Pango::Cairo::create_layout($cr);
my $font = Pango::FontDescription->from_string('Lohit Devanagari');
$layout->set_font_description($font);
# Two parts of the phrase. Phrase1 ends in न् (half न).
my $phrase1 = 'उम्कन्';
my $phrase2 = 'छौ';
# Set the first part of the phrase, and get its width.
$layout->set_markup($phrase1);
my $w = ($layout->get_size)[0]/1024; # get_size returns Pango units; PANGO_SCALE is 1024
# Set the complete phrase.
$layout->set_markup($phrase1.$phrase2);
my ($x, $y ) = ( 100, 100 );
# Show phrase.
$cr->move_to( $x, $y );
$cr->set_source_rgba( 0, 0, 0, 1 );
Pango::Cairo::show_layout($cr, $layout);
# Show marker at width.
$cr->set_line_width(0.25);
$cr->move_to( $x + $w, $y-10 );
$cr->line_to( $x + $w, $y+50 );
$cr->stroke;
$cr->show_page;
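You cannot measure a partial rendering, because the prefix is shaped differently on its own than it is as part of the whole string. Instead, lay out the whole string, query the layout for positions, and iterate over the string grapheme by grapheme to find the one you want. Also see: https://gankra.github.io/blah/text-hates-you/#style-can-change-mid-ligature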
use strict;
use warnings;
use utf8;
use Cairo;
use Pango;
use List::Util qw(uniq);
use Encode qw(encode);
my $surface = Cairo::PdfSurface->create('out.pdf', 595, 842);
my $cr = Cairo::Context->create ($surface);
my $layout = Pango::Cairo::create_layout($cr);
my $font = Pango::FontDescription->from_string('Lohit Devanagari');
$layout->set_font_description($font);
my $phrase = 'उम्कन्छौ';
my @octets = split '', encode 'UTF-8', $phrase; # index_to_pos operates on octets
$layout->set_markup($phrase);
my ($x, $y) = (100, 100);
$cr->move_to($x, $y);
$cr->set_source_rgba(0, 0, 0, 1);
Pango::Cairo::show_layout($cr, $layout);
$cr->set_line_width(0.25);
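# x offset at every octet index, with duplicates removed; 1024 is PANGO_SCALE.
# The offsets are matched to graphemes by array position further down.
my @offsets = uniq map { $layout->index_to_pos($_)->{x}/1024 } 0..$#octets;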
# (0, 9.859375, 16.09375, 27.796875, 33.953125, 49.1875)
for my $offset (@offsets) {
    $cr->move_to($x+$offset, $y-5);
    $cr->line_to($x+$offset, $y+25);
    $cr->stroke;
}
my @graphemes = $phrase =~ /\X/g; # qw(उ म् क न् छौ)
while (my ($idx, $g) = each @graphemes) {
    if ($g =~ /^छ/) {
        $cr->move_to($x+$offsets[$idx], $y-10);
        $cr->line_to($x+$offsets[$idx], $y+50);
        $cr->stroke;
        last;
    }
}
$cr->show_page;
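
Since index_to_pos takes a byte index, another option is to compute the byte offset of छ directly and query only that one position. The following is a minimal sketch, not a tested replacement for the program above: `utf8_byte_index` is a hypothetical helper name, and whether the x value returned for that index falls exactly on the visual start of छ depends on how the font shapes the conjunct.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper: byte offset of the first occurrence of $needle
# in the UTF-8 encoding of $phrase, or -1 if $needle does not occur.
sub utf8_byte_index {
    my ($phrase, $needle) = @_;
    my $char_idx = index($phrase, $needle);    # offset in characters
    return -1 if $char_idx < 0;
    my $prefix = substr($phrase, 0, $char_idx);
    return length(encode('UTF-8', $prefix));   # offset in octets
}

my $phrase   = 'उम्कन्छौ';
my $byte_idx = utf8_byte_index($phrase, 'छ');
print "$byte_idx\n";    # prints 18: six preceding characters, three octets each

# With the $layout from the program above, the position would then be:
#   my $x_offset = $layout->index_to_pos($byte_idx)->{x} / 1024;
```

This avoids matching offsets to graphemes by array position, at the cost of assuming that the layout text is the same string that was encoded (no markup tags or entities in between).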