[EN] How to render Thai strings correctly

The earlier article on using u8g2 rendered Thai strings with the library's drawUTF8() function, but the output was not correct, as shown in Figure 1. The library code therefore needs a small adjustment to render the text correctly, as in Figure 2.

Figure 1: drawUTF8() output before the adjustment
Figure 2: drawUTF8() output after the adjustment

Problems and solutions

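Figure 1 shows that the tone marks and some vowels are drawn in the wrong place. These marks belong above or below the preceding consonant, at the same horizontal position, but the drawing loop advances x after every glyph, so each mark ends up to the right of its consonant. We work around the issue by checking each character before it is drawn and moving its drawing position back onto the consonant, which gives the result shown in Figure 2.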

Related functions

The example program displays text with u8g2.drawUTF8(). This method is defined in the file U8g2lib.h as follows:

u8g2_uint_t drawUTF8(u8g2_uint_t x, u8g2_uint_t y, const char *s) 
{ 
    return u8g2_DrawUTF8(&u8g2, x, y, s); 
}

In other words, drawUTF8() simply calls u8g2_DrawUTF8(), which is defined in the file u8g2_font.c in the csrc folder:

u8g2_uint_t u8g2_DrawUTF8(u8g2_t *u8g2, u8g2_uint_t x, u8g2_uint_t y, const char *str)
{
  u8g2->u8x8.next_cb = u8x8_utf8_next;
  return u8g2_draw_string(u8g2, x, y, str);
}

u8g2_DrawUTF8() contains only two statements: it installs u8x8_utf8_next as the callback that decodes the string, then calls u8g2_draw_string(). The latter is the function we actually need to edit.
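To make the role of the decoder callback more concrete, here is a minimal sketch of a byte-at-a-time UTF-8 decoder that behaves the way the drawing loop expects: it returns 0xffff at the end of the string, 0xfffe while a multi-byte sequence is still incomplete, and the code point once the last byte arrives. This is not the library's own code. The names utf8_state_t and utf8_feed are hypothetical, and the sketch only handles sequences of one to three bytes, which is enough for Thai.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the decoder state that u8g2
   keeps internally. Not taken from the library. */
typedef struct {
  uint8_t  pending;  /* continuation bytes still expected */
  uint16_t value;    /* code point assembled so far */
} utf8_state_t;

/* Feed one byte. Returns 0xffff at end of string, 0xfffe while the
   sequence is incomplete (or the byte is skipped), else the code point. */
uint16_t utf8_feed(utf8_state_t *st, uint8_t b)
{
  if (b == 0)
    return 0xffff;                       /* end of string */

  if (st->pending == 0) {
    if (b < 0x80)
      return b;                          /* plain ASCII */
    if (b >= 0xe0 && b < 0xf0) {         /* lead byte, 3-byte sequence */
      st->pending = 2;
      st->value = b & 0x0f;
    } else if (b >= 0xc0 && b < 0xe0) {  /* lead byte, 2-byte sequence */
      st->pending = 1;
      st->value = b & 0x1f;
    }
    /* Stray continuation bytes and 4-byte lead bytes are skipped. */
    return 0xfffe;
  }

  /* Continuation byte: append its low 6 bits. */
  st->value = (uint16_t)((st->value << 6) | (b & 0x3f));
  st->pending--;
  return st->pending ? 0xfffe : st->value;
}
```

For example, the consonant ท (U+0E17) is stored as the three bytes E0 B8 97. Feeding them one by one yields 0xfffe, 0xfffe and finally 0x0e17, so the glyph is drawn only when the third byte has been read.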

u8g2_draw_string()

The original code of u8g2_draw_string() is as follows:

static u8g2_uint_t u8g2_draw_string(u8g2_t *u8g2, u8g2_uint_t x, u8g2_uint_t y, const char *str) U8G2_NOINLINE;
static u8g2_uint_t u8g2_draw_string(u8g2_t *u8g2, u8g2_uint_t x, u8g2_uint_t y, const char *str)
{
  uint16_t e;
  u8g2_uint_t delta, sum;
  u8x8_utf8_init(u8g2_GetU8x8(u8g2));
  sum = 0;
  for(;;)
  {
    e = u8g2->u8x8.next_cb(u8g2_GetU8x8(u8g2), (uint8_t)*str);
    if ( e == 0x0ffff )
      break;
    str++;
    if ( e != 0x0fffe )
    {
      delta = u8g2_DrawGlyph(u8g2, x, y, e);
    
#ifdef U8G2_WITH_FONT_ROTATION
      switch(u8g2->font_decode.dir)
      {
	case 0:
	  x += delta;
	  break;
	case 1:
	  y += delta;
	  break;
	case 2:
	  x -= delta;
	  break;
	case 3:
	  y -= delta;
	  break;
      }
      
      /*
      // requires 10 bytes more on avr
      x = u8g2_add_vector_x(x, delta, 0, u8g2->font_decode.dir);
      y = u8g2_add_vector_y(y, delta, 0, u8g2->font_decode.dir);
      */

#else
      x += delta;
#endif

      sum += delta;    
    }
  }
  return sum;
}

What we need to do is check whether the character just read from str is one of the vowels or tone marks in question and, if so, adjust the x value passed to u8g2_DrawGlyph(), which sets the horizontal drawing position. The variable delta holds the width returned for the glyph drawn last. Because the new code reads delta before the first glyph has been drawn, initialize it to 0 in its declaration:

u8g2_uint_t delta=0, sum;

Each byte of the string is then passed to the decoder callback, which returns the Unicode code point once a character is complete:

e = u8g2->u8x8.next_cb(u8g2_GetU8x8(u8g2), (uint8_t)*str);

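If e is 0x0ffff, the end of the string has been reached and the loop exits. If e is 0x0fffe, the UTF-8 sequence is not complete yet and nothing is drawn. Any other value is a complete character, which is drawn by the following statement: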

delta = u8g2_DrawGlyph(u8g2, x, y, e);

This means the check has to happen before that statement runs. The byte just passed to the decoder (for a complete character, the last byte of its UTF-8 sequence) is saved in s, because str has already been incremented by the time we get here. It is then compared against a list of vowels and tone marks stored in the array t. If it matches, x is reduced by delta, the width of the previous glyph, so the mark is drawn at the same position as its consonant:

...
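  uint8_t s;  /* byte most recently passed to the decoder */
  /* Vowels and tone marks drawn above or below the preceding consonant.
     Each literal is multi-byte; the cast keeps only one byte of it
     (with GCC, the last byte of the UTF-8 sequence). */
  uint8_t t[] = {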
    (uint8_t)'่',
    (uint8_t)'้',
    (uint8_t)'๊',
    (uint8_t)'๋',
    (uint8_t)'ุ',
    (uint8_t)'ู',
    (uint8_t)'ิ',
    (uint8_t)'ี',
    (uint8_t)'ึ',
    (uint8_t)'ื',
    (uint8_t)'ั'
  };
...
  for(;;)
  {
    s = (uint8_t)*str;
    e = u8g2->u8x8.next_cb(u8g2_GetU8x8(u8g2), s);
...
      for (int i=0; i<11; i++) {
        if (s == t[i]) {
          x -= delta;  /* step back by the width of the previous glyph */
          break;
        }
      }
      delta = u8g2_DrawGlyph(u8g2, x, y, e);
...
  }
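One caveat, as far as we can tell: the table above is compared against s, which is only the last UTF-8 byte of the character, and other characters share those final bytes. For instance, จ (U+0E08) is encoded as E0 B8 88 and ่ (U+0E48) as E0 B9 88, so both end in 0x88. A possible alternative, sketched below under that assumption, is to test the decoded code point e instead. The helper name is_thai_stacked_mark is hypothetical and not part of u8g2. It covers the same eleven marks as the array t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of u8g2): true when the code point is
   one of the eleven marks listed in t[], i.e. a vowel or tone mark
   drawn above or below the preceding consonant. */
bool is_thai_stacked_mark(uint16_t e)
{
  return e == 0x0e31                     /* mai han-akat */
      || (e >= 0x0e34 && e <= 0x0e39)    /* sara i .. sara uu */
      || (e >= 0x0e48 && e <= 0x0e4b);   /* mai ek .. mai chattawa */
}
```

In the drawing loop, the inner for loop over t could then be replaced by a single test such as if (is_thai_stacked_mark(e)) x -= delta; placed just before the call to u8g2_DrawGlyph(). We have not run this variant on hardware, so treat it as a starting point.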

Modified function

Applying the changes described above, the modified u8g2_draw_string() looks like this:

static u8g2_uint_t u8g2_draw_string(u8g2_t *u8g2, u8g2_uint_t x, u8g2_uint_t y, const char *str) U8G2_NOINLINE;
static u8g2_uint_t u8g2_draw_string(u8g2_t *u8g2, u8g2_uint_t x, u8g2_uint_t y, const char *str)
{
  uint16_t e;
  u8g2_uint_t delta=0, sum;
  u8x8_utf8_init(u8g2_GetU8x8(u8g2));
  sum = 0;
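  uint8_t s;  /* byte most recently passed to the decoder */
  /* Vowels and tone marks drawn above or below the preceding consonant. */
  uint8_t t[] = {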
    (uint8_t)'่',
    (uint8_t)'้',
    (uint8_t)'๊',
    (uint8_t)'๋',
    (uint8_t)'ุ',
    (uint8_t)'ู',
    (uint8_t)'ิ',
    (uint8_t)'ี',
    (uint8_t)'ึ',
    (uint8_t)'ื',
    (uint8_t)'ั'
  };
  for(;;)
  {
    s = (uint8_t)*str;
    e = u8g2->u8x8.next_cb(u8g2_GetU8x8(u8g2), s);

    if ( e == 0x0ffff )
      break;
    str++;
    if ( e != 0x0fffe )
    {
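      /* If this character is one of the marks in t[], step back by the
         width of the previous glyph so it is drawn on its consonant. */
      for (int i=0; i<11; i++) {
        if (s == t[i]) {
          x -= delta;
          break;
        }
      }
      delta = u8g2_DrawGlyph(u8g2, x, y, e);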
    
#ifdef U8G2_WITH_FONT_ROTATION
      switch(u8g2->font_decode.dir)
      {
	case 0:
	  x += delta;
	  break;
	case 1:
	  y += delta;
	  break;
	case 2:
	  x -= delta;
	  break;
	case 3:
	  y -= delta;
	  break;
      }
      
      /*
      // requires 10 bytes more on avr
      x = u8g2_add_vector_x(x, delta, 0, u8g2->font_decode.dir);
      y = u8g2_add_vector_y(y, delta, 0, u8g2->font_decode.dir);
      */

#else
      x += delta;
#endif

      sum += delta;    
    }
  }
  return sum;
}
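To see what the modified loop does to the drawing positions, the following sketch traces the x coordinate handed to u8g2_DrawGlyph() for each character. It is only a model: it assumes the default left-to-right direction, that every glyph reports the same width glyph_w, and it works on already-decoded code points. The names is_mark and trace_x are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same eleven marks as the array t[] in the article, as code points. */
static int is_mark(uint16_t e)
{
  return e == 0x0e31
      || (e >= 0x0e34 && e <= 0x0e39)
      || (e >= 0x0e48 && e <= 0x0e4b);
}

/* Model of the modified loop: stores in out_x[i] the x position at
   which character i would be drawn. Assumes a fixed glyph width. */
void trace_x(const uint16_t *cp, size_t n, int x, int glyph_w, int *out_x)
{
  int delta = 0;                 /* no glyph drawn yet */
  for (size_t i = 0; i < n; i++) {
    if (is_mark(cp[i]))
      x -= delta;                /* step back onto the previous glyph */
    out_x[i] = x;                /* position passed to the draw call */
    delta = glyph_w;             /* stand-in for the returned width */
    x += delta;
  }
}
```

For the sequence ท ี ่ น (U+0E17, U+0E35, U+0E48, U+0E19) starting at x = 10 with a width of 7, the positions are 10, 10, 10 and 17: both marks land on the consonant, and the next consonant starts one cell further right. Note that the next position depends on the width of the mark rather than of the consonant, so a font whose marks are narrower than its consonants may need extra care.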

Example Code

The sample program code for this article is as follows.

#define U8G2_WITH_UNICODE

#include <Arduino.h>
#include <U8g2lib.h>
U8G2_SSD1306_128X64_NONAME_F_HW_I2C u8g2(U8G2_R0, /* reset=*/ U8X8_PIN_NONE);
//U8G2_SH1106_128X64_NONAME_F_HW_I2C u8g2(U8G2_R0, /* reset=*/ U8X8_PIN_NONE);

void setup() {
  u8g2.begin();

  u8g2.clearBuffer();
  u8g2.setFont(u8g2_font_etl14thai_t);
  u8g2.drawUTF8(10,10,"สวัสดี");
  u8g2.drawUTF8(10,30,"ที่นี่ที่ไหน");
  u8g2.drawUTF8(10,50,"มีใครอยู่ไหม?");
  u8g2.sendBuffer();          
}

void loop() {

}

Conclusion

In this article we corrected the display of Thai text drawn with the u8g2 library's drawUTF8() function. With the change in place the library still works normally, and it runs on the ESP32, ESP8266 and STM32F103 microcontrollers, which are the only ones we have available to test. Readers using other chips will need to test the change themselves and fix it where necessary. Finally, understanding both the patterns of the language and how the code works makes it much easier to improve the code. Have fun programming.

(C) 2020-2022, By Jarut Busarathid and Danai Jedsadathitikul
Updated 2022-01-21