To interpret bytes in memory as a string, we have to know the underlying encoding; without it we cannot decode the bytes. A string is only meaningful text once we know which encoding it uses.
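For example, the same two bytes decode to different text depending on which encoding we assume. A minimal Python sketch (Python is used here purely for illustration):

```python
# The same bytes in memory, decoded under two different assumed encodings.
data = bytes([0xC3, 0xA9])

print(data.decode("utf-8"))    # 'é'  -> one character made of two bytes
print(data.decode("latin-1"))  # 'Ã©' -> two characters, one byte each
```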
(For a broader introduction, see the article "Unicode, in friendly terms: ASCII, UTF-8, code points, character encodings, and more".)
Take the string "Hello". The steps from characters to stored bytes are as follows:
Graphemes --> Code Points --> Encoding --> Binary
"Hello" interpreted as Unicode code points looks like this:
U+0048 U+0065 U+006C U+006C U+006F
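A short Python sketch (illustrative only) confirms these code points using the built-in ord():

```python
# Print the Unicode code point of each character in "Hello".
for ch in "Hello":
    print(f"{ch} -> U+{ord(ch):04X}")
# H -> U+0048
# e -> U+0065
# l -> U+006C
# l -> U+006C
# o -> U+006F
```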
Now a character encoding standard can be applied to represent each Unicode character in memory. The most common ones are UTF-8 and UTF-16.
Remark: a character set such as Unicode is not the same thing as a character encoding standard such as UTF-8. The character set assigns code points to characters, while the encoding defines how those code points are stored as bytes.
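To make the distinction concrete, here is a small Python sketch: Unicode assigns 'é' the single code point U+00E9, but each encoding turns that same code point into different bytes (the codec names are standard Python codec names):

```python
# One code point, three different byte representations.
c = "\u00e9"  # 'é', code point U+00E9 in the Unicode character set

print(c.encode("utf-8").hex())      # 'c3a9' -> two bytes in UTF-8
print(c.encode("utf-16-le").hex())  # 'e900' -> two bytes in UTF-16 (little-endian)
print(c.encode("latin-1").hex())    # 'e9'   -> one byte in Latin-1
```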
UTF-8 takes up one to four bytes per character, depending on the code point. In this example every character of "Hello" falls in the ASCII range, so each one can be encoded using a single byte.
Hex
\x48\x65\x6c\x6c\x6f
Binary
01001000 01100101 01101100 01101100 01101111
The concatenated binary representation is the storage-ready version.
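As a quick check, a Python sketch reproduces the same bytes with the standard str.encode():

```python
# Encode "Hello" as UTF-8 and show the bytes in hex and binary.
encoded = "Hello".encode("utf-8")

print(encoded.hex())  # 48656c6c6f
print(" ".join(f"{byte:08b}" for byte in encoded))
# 01001000 01100101 01101100 01101100 01101111
```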
UTF-16 takes up two or four bytes per character. Every character in "Hello" fits in a single two-byte code unit, shown here in little-endian byte order.
Hex
\u0048\u0065\u006c\u006c\u006f
Binary
01001000 00000000 01100101 00000000 01101100 00000000 01101100 00000000 01101111 00000000
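The same can be sketched in Python; the 'utf-16-le' codec is chosen here so the byte order matches the little-endian layout above and no byte order mark is prepended (an assumption about the byte layout, not a requirement of UTF-16 itself):

```python
# Encode "Hello" as UTF-16 (little-endian, no byte order mark).
encoded = "Hello".encode("utf-16-le")

print(encoded.hex())  # 480065006c006c006f00
print(" ".join(f"{byte:08b}" for byte in encoded))
# 01001000 00000000 01100101 00000000 01101100 00000000
# 01101100 00000000 01101111 00000000
```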