String Encoding Debates

This cluster covers debates about how programming languages represent strings: UTF-8 vs. UTF-16 encodings, bytes vs. text, the cost of random access, and the string implementations in Rust, Python, Java, and other languages.

📉 Falling 0.5x Programming Languages
Comments: 3,963
Years Active: 20
Top Authors: 5
Topic ID: #9704

Activity Over Time

2007: 2
2008: 10
2009: 46
2010: 49
2011: 57
2012: 104
2013: 217
2014: 171
2015: 187
2016: 245
2017: 260
2018: 252
2019: 255
2020: 280
2021: 454
2022: 416
2023: 309
2024: 315
2025: 313
2026: 21

Keywords

ByteStrings JIT virginia.edu OsString LATIN1 StringLiteral MB www.cs FFI XML string strings utf bytes byte encoding rust utf8 arrays unicode

Sample Comments

kragen Nov 29, 2025 View on HN

I forget that people think strings are different from sequences of bytes.

asdasf Sep 18, 2013 View on HN

ByteStrings are not the alternative to Strings! Text is.

bjoli Dec 23, 2019 View on HN

Encoding strings internally as UTF-8 is a bad idea, since you can't do efficient constant time access (utf-8 isn't fixed width).
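
The access cost raised in this comment can be made concrete with a minimal Python sketch. The helper name `code_point_offset` is hypothetical and written only for illustration. It finds where the n-th code point starts in UTF-8 data by walking the bytes from the beginning, which is why indexing by code point is linear time unless an extra index is kept.

```python
def code_point_offset(data: bytes, n: int) -> int:
    """Return the byte offset where the n-th code point (0-based) starts.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes have the bit pattern 10xxxxxx.
        # Any other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            if seen == n:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


text = "na\u00efve \u65e5\u672c\u8a9e"   # 9 code points
encoded = text.encode("utf-8")           # 16 bytes

# Code point 6 is U+65E5. It starts at byte 7 because U+00EF took two bytes.
start = code_point_offset(encoded, 6)
print(start, encoded[start:start + 3].decode("utf-8"))
```

Byte offsets, by contrast, can be used for constant-time slicing, which is the trade-off UTF-8 based string types usually make.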

kbd Dec 7, 2011 View on HN

What are the downsides of Ruby's alternate approach of having strings be bytes that carry an encoding object around?

Dr_Emann Sep 8, 2022 View on HN

Honestly feels like a very, very different use case from utf8-ish strings, at a quick read?

danbruc May 19, 2015 View on HN

UTF-8 is not well suited for a general purpose string implementation because it is a variable length encoding and therefore addressing a character becomes a linear time operation. UTF-16 would probably be a better choice in most cases.
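
One point worth checking against this claim: UTF-16 is also a variable-length encoding, since code points above U+FFFF are stored as a surrogate pair of two 16-bit units. The sketch below is only an illustration, with hypothetical helper names, that counts the units each encoding needs per code point.

```python
def utf8_bytes(ch: str) -> int:
    """Number of bytes the string occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units the string occupies in UTF-16."""
    # "utf-16-le" is used so that no byte-order mark is prepended.
    return len(ch.encode("utf-16-le")) // 2


samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_bytes(ch)} UTF-8 bytes, "
          f"{utf16_units(ch)} UTF-16 units")
```

The last sample, U+1F600, needs two UTF-16 units, so indexing by code point in UTF-16 is only constant time for text restricted to the Basic Multilingual Plane.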

jiggunjer Jul 23, 2020 View on HN

Don't most other languages use utf16 for strings?

steveklabnik May 15, 2016 View on HN

All rust String and &strs are UTF-8 encoded, there are also other string types.

fnord123 Aug 13, 2017 View on HN

Not to disagree, but you can barely perform random access on a utf8 string. You need to explode it out to utf16 or utf32 which isn't what most languages have built in. Rust and go largely work with utf8 while c and c++ love them byte arrays (not sure I've even seen std::wstring in the wild)

lawn Jun 2, 2023 View on HN

Nah, that's just dumb. Rust's way of all strings being utf-8 and providing the different lengths depending on your needs is far superior. If you want something else than utf-8 you can use another data type, like a vector of bytes.
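
As a rough illustration of the two ideas in this comment, the Python sketch below reports separate lengths for the same text and keeps non-UTF-8 data in a plain byte container. It is written in Python rather than Rust, and the helper names `lengths` and `is_valid_utf8` are hypothetical. In Rust the corresponding calls would be `s.len()` for bytes and `s.chars().count()` for Unicode scalar values.

```python
def lengths(s: str) -> dict:
    """Report the length of `s` in UTF-8 bytes and in code points."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        "code_points": len(s),
    }


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


word = "caf\u00e9"
print(lengths(word))

# The same word in Latin-1 is not valid UTF-8, so it stays as raw bytes
# instead of being treated as text.
latin1 = word.encode("latin-1")
print(is_valid_utf8(latin1), is_valid_utf8(word.encode("utf-8")))
```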