String Encoding Debates

This cluster covers debates about string representations in programming languages: UTF-8 vs. UTF-16 encodings, bytes vs. text, the efficiency of random access, and the implementations in Rust, Python, Java, and other languages.

📉 Falling 0.5x Programming Languages

Comments: 3,963

Topic ID: #9704

Activity Over Time

Years shown: 2007 to 2026. Labeled yearly comment counts, in the order they appear: 104, 217, 171, 187, 245, 260, 252, 255, 280, 454, 416, 309, 315, 313.

Top Contributors

masklinn (79) steveklabnik (63) tialaramex (63) burntsushi (41) int_19h (24)

Keywords

ByteStrings JIT virginia.edu OsString LATIN1 StringLiteral MB www.cs FFI XML string strings utf bytes byte encoding rust utf8 arrays unicode

Sample Comments

kragen • Nov 29, 2025

I forget that people think strings are different from sequences of bytes.

asdasf • Sep 18, 2013

ByteStrings are not the alternative to Strings! Text is.

bjoli • Dec 23, 2019

Encoding strings internally as UTF-8 is a bad idea, since you can't do efficient constant time access (utf-8 isn't fixed width).
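To make the access-cost claim concrete, here is a minimal Rust sketch. The helper names `nth_char` and `byte_offset_of_char` are hypothetical, chosen only for illustration. It suggests that finding the n-th character of a UTF-8 string means walking from the start, while slicing at an already known byte offset does not.

```rust
/// Returns the `n`-th Unicode scalar value of `s`, if any.
/// Hypothetical helper: it walks the string from the start, so it is O(n).
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Returns the byte offset at which the `n`-th scalar value starts.
/// Hypothetical helper, also a linear scan.
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    // "naïve café", written with escapes so the code points are unambiguous.
    let s = "na\u{ef}ve caf\u{e9}";

    // The two accented letters take two bytes each in UTF-8.
    println!("bytes: {}, chars: {}", s.len(), s.chars().count());

    // Character lookup by index scans from the beginning.
    println!("char 2: {:?}", nth_char(s, 2));

    // Once a byte offset is known, slicing there needs no scan.
    if let Some(offset) = byte_offset_of_char(s, 6) {
        println!("tail: {}", &s[offset..]);
    }
}
```

In practice, code that parses or searches text tends to carry byte offsets around rather than character indices, which is one reason the linear cost matters less often than the comment implies.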

kbd • Dec 7, 2011

What are the downsides of Ruby's alternate approach of having strings be bytes that carry an encoding object around?

Dr_Emann • Sep 8, 2022

Honestly feels like a very, very different use case from utf8-ish strings, at a quick read?

danbruc • May 19, 2015

UTF-8 is not well suited for a general purpose string implementation because it is a variable length encoding and therefore addressing a character becomes a linear time operation. UTF-16 would probably be a better choice in most cases.
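One point worth checking against this claim is that UTF-16 is itself a variable-length encoding: code points outside the Basic Multilingual Plane take two 16-bit code units (a surrogate pair). The following Rust sketch, with a hypothetical helper `utf16_units`, counts code units to illustrate this.

```rust
/// Number of UTF-16 code units needed to encode `s`.
/// Hypothetical helper for illustration.
fn utf16_units(s: &str) -> usize {
    s.encode_utf16().count()
}

fn main() {
    let bmp = "\u{e9}"; // é, inside the Basic Multilingual Plane
    let astral = "\u{1F600}"; // 😀, outside the Basic Multilingual Plane

    // One code point each, but a different number of UTF-16 code units.
    println!("é:  {} code unit(s)", utf16_units(bmp));
    println!("😀: {} code unit(s)", utf16_units(astral));

    // Indexing UTF-16 by code unit can therefore land inside a surrogate pair.
    let units: Vec<u16> = astral.encode_utf16().collect();
    println!("first unit of 😀: {:#06x}", units[0]);
}
```

So indexing a UTF-16 string by code unit is constant time, but indexing it by character is not, which is the same trade-off as UTF-8 at a different granularity.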

jiggunjer • Jul 23, 2020

Don't most other languages use utf16 for strings?

steveklabnik • May 15, 2016

All rust String and &strs are UTF-8 encoded, there are also other string types.
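As a rough illustration of the "other string types" mentioned here, the sketch below touches two from the Rust standard library, `OsString` and `CString`. The helper `to_c_string` is hypothetical, and the example is only a sketch of how these types differ from `String`, not a survey of all of them.

```rust
use std::ffi::{CString, OsString};

/// Converts `s` to a NUL-terminated C string.
/// Hypothetical helper: returns None if `s` contains an interior NUL byte.
fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

fn main() {
    // OsString holds whatever the platform uses for paths and arguments,
    // which is not guaranteed to be valid UTF-8.
    let os = OsString::from("report.txt");
    match os.into_string() {
        Ok(s) => println!("valid UTF-8: {s}"),
        Err(raw) => println!("not valid UTF-8: {raw:?}"),
    }

    // CString is for FFI: NUL-terminated, with no interior NUL bytes allowed.
    println!("{:?}", to_c_string("hello"));
    println!("{:?}", to_c_string("he\0llo"));
}
```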

fnord123 • Aug 13, 2017

Not to disagree, but you can barely perform random access on a utf8 string. You need to explode it out to utf16 or utf32 which isn't what most languages have built in. Rust and go largely work with utf8 while c and c++ love them byte arrays (not sure I've even seen std::wstring in the wild)
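The "explode it out to utf32" approach can be sketched in Rust by collecting a string into a `Vec<char>`. The helper name `to_scalar_values` is hypothetical. This gives constant-time indexing by code point, at the cost of four bytes per element.

```rust
/// Expands a UTF-8 string into one `char` (a 32-bit scalar value) per element.
/// Hypothetical helper for illustration.
fn to_scalar_values(s: &str) -> Vec<char> {
    s.chars().collect()
}

fn main() {
    // "a", "é", "😀", "z" written with escapes.
    let s = "a\u{e9}\u{1F600}z";
    let wide = to_scalar_values(s);

    // Indexing the vector is constant time.
    println!("element 2: {}", wide[2]);

    // The fixed-width form uses more memory than the UTF-8 original.
    println!(
        "{} bytes as UTF-8, {} bytes as Vec<char>",
        s.len(),
        wide.len() * std::mem::size_of::<char>()
    );
}
```

Note that this indexes code points, not user-perceived characters: a letter followed by a combining mark still occupies two elements.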

lawn • Jun 2, 2023

Nah, that's just dumb. Rust's way of all strings being utf-8 and providing the different lengths depending on your needs is far superior. If you want something else than utf-8 you can use another data type, like a vector of bytes.
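The "different lengths" mentioned here can be sketched as follows, along with the fallback to a vector of bytes for data that is not UTF-8. The helpers `lengths` and `decode_latin1` are hypothetical names introduced for this example; the Latin-1 decoding relies on the fact that each Latin-1 byte equals its Unicode code point.

```rust
/// Returns (UTF-8 bytes, Unicode scalar values, UTF-16 code units) for `s`.
/// Hypothetical helper for illustration.
fn lengths(s: &str) -> (usize, usize, usize) {
    (s.len(), s.chars().count(), s.encode_utf16().count())
}

/// Decodes Latin-1 bytes into a String; each byte maps to the code point
/// with the same numeric value. Hypothetical helper.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // "héllo 😀" written with escapes.
    let s = "h\u{e9}llo \u{1F600}";
    let (bytes, scalars, utf16) = lengths(s);
    println!("bytes: {bytes}, scalar values: {scalars}, UTF-16 units: {utf16}");

    // "café" in Latin-1: 0xE9 on its own is not valid UTF-8,
    // so the data stays in a Vec<u8> until it is decoded explicitly.
    let latin1: Vec<u8> = vec![0x63, 0x61, 0x66, 0xE9];
    println!("valid UTF-8? {}", std::str::from_utf8(&latin1).is_ok());
    println!("decoded: {}", decode_latin1(&latin1));
}
```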