Article Preview
FEATURE
Byte Order Marker
How to Implement a Byte Order Marker (BOM) with Xojo
Issue: 16.1 (January/February 2018)
Author: Eugene Dakin
Author Bio: Eugene works as a Senior Oilfield Technical Specialist. He has university degrees in the disciplines of Engineering, Chemistry, Biology, Business, and a Ph.D. in Chemical Engineering. He is the author of dozens of books on Xojo available on the xdevlibrary.com website.
Article Description: No description available.
Article Length (in bytes): 11,481
Starting Page Number: 12
Article Number: 16102
Resource File(s):
project16102.zip Updated: 2018-01-01 22:32:50
Related Link(s): None
Excerpt of article text...
Data in UTF form can be confusing, and adding endianness can make it overwhelming. I have great news: the Byte Order Marker can help remove this confusion when opening or receiving a file.
A byte order mark (BOM) is the hexadecimal value FE FF (the Unicode character U+FEFF), which is placed at the beginning of a file or data stream and is used to automatically determine the encoding of the data. Programs are commonly written for many human languages, and characters outside the English ASCII set are represented by using different encodings. The Byte Order Mark should be invisible to the user, and programs should automatically read it and decode the text appropriately.
There is an issue with just writing the text as UTF16LE, which means Unicode Transformation Format in 16-bit blocks in Little Endian format. UTF is the way that characters are converted to numbers and back to characters again by the computer.
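As an illustration of a program reading the mark and decoding the text automatically, here is a minimal sketch in Python (used here only for illustration; it is not the article's Xojo project). The sample text "Hi" and the variable names are assumptions made up for the example.

```python
import codecs

# Hypothetical sample: the text "Hi" saved as UTF-16 Little Endian,
# with the byte order mark written first.
data = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")

# The generic "utf-16" codec reads the BOM, picks the byte order from it,
# and removes the BOM from the decoded text.
text = data.decode("utf-16")

# Decoding with an explicit byte order does not interpret the BOM,
# so it stays in the result as the character U+FEFF.
raw = data.decode("utf-16-le")

print(data.hex(" "))   # first two bytes are the BOM
print(repr(text))
print(repr(raw))
```

The difference between `text` and `raw` is why the mark should stay invisible to the user: a reader that does not handle the BOM passes it through as an extra character at the start of the text.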
In the early days of computers, most text was written in English, which required about 128 characters to include capitals, small letters, and some accent characters. As other languages came onto the internet, more characters were quickly needed than just those for English. The available characters were expanded with UTF-16. When even more unique characters were needed (for example, the many characters of the Mandarin [Chinese] language), UTF-32 was created.
Another issue was that not all computers stored information the same way. Intel processors wrote data in Little Endian (LE) format, while old Mac computers wrote data in Big Endian (BE) format, and these suffixes were added onto the end of the UTF type, as in UTF-16LE and UTF-16BE.
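The difference between the two byte orders can be seen by storing the same 16-bit number both ways. This is a small Python sketch for illustration only; the variable names are made up for the example.

```python
import struct

value = 0xFEFF  # one 16-bit number

# "<H" packs an unsigned 16-bit integer with the low byte first (Little Endian).
little = struct.pack("<H", value)

# ">H" packs the same number with the high byte first (Big Endian).
big = struct.pack(">H", value)

# The same ordering applies to text: the letter "A" is the number 0x0041.
a_le = "A".encode("utf-16-le")
a_be = "A".encode("utf-16-be")

print(little.hex(" "), big.hex(" "))
print(a_le.hex(" "), a_be.hex(" "))
```

The number is the same in both cases; only the order of the bytes on disk or on the wire changes, which is why a reader has to know the byte order before decoding.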
With all of these different format types, there needed to be a way to detect the format of a text document or HTML page that was sent over the internet. This was why the Byte Order Mark (BOM) was created. When the hexadecimal value &hFEFF is written with the encoding, the resulting bytes change depending on the UTF and Endian type. The following table shows the values of &hFEFF when the first 32 bits are read by the computer.
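Detection from the first 32 bits can be sketched as follows. This is a Python illustration, not the article's Xojo code; the byte signatures come from the Unicode standard (through Python's `codecs` constants) rather than from the article's table, and `detect_bom` and `BOMS` are hypothetical names.

```python
import codecs

# Longer signatures are checked first, because the UTF-32 LE mark
# (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
]

def detect_bom(first_bytes):
    """Return the encoding name implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if first_bytes.startswith(bom):
            return name
    return None

# Writing U+FEFF in each encoding produces exactly these signatures.
assert "\ufeff".encode("utf-16-le") == codecs.BOM_UTF16_LE
assert "\ufeff".encode("utf-16-be") == codecs.BOM_UTF16_BE

print(detect_bom(b"\xff\xfeH\x00"))
print(detect_bom(b"\xfe\xff\x00H"))
print(detect_bom(b"Hi"))
```

One caveat of this approach: a UTF-16 Little Endian file whose first character after the mark is U+0000 starts with the same four bytes as a UTF-32 Little Endian mark, so this sketch would report UTF-32LE for it.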
...End of Excerpt. Please purchase the magazine to read the full article.