Unicode Encoding Issues

Comments discuss character encoding problems, particularly Unicode and UTF-8 handling in web interfaces, browsers, code editors, and text processing, and debate whether a given behavior is a bug, a feature, or an edge case.
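
As a rough illustration of the kind of problem these comments describe, the sketch below decodes UTF-8 bytes with the wrong codec and then reverses the mistake. The sample string and the choice of Windows-1252 as the wrong codec are assumptions made for the example; they are not taken from any of the comments.

```python
# Hypothetical sample text; any string with non-ASCII characters would do.
original = "café €5"

# Encode correctly as UTF-8, then decode with the wrong codec (Windows-1252).
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("cp1252")

# Reversing the mistake recovers the text, as long as no bytes were lost
# or replaced along the way.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)               # the mis-decoded ("mojibake") form
print(repaired == original)  # True when the round trip is lossless
```

The repair only works when the garbled text still carries every original byte; once a tool substitutes "?" or U+FFFD for bytes it could not map, the information is gone.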

📉 Falling 0.5x Web Development
Comments: 4,051
Years Active: 20
Top Authors: 5
Topic ID: #5456

Activity Over Time

Year: Comments
2007: 20
2008: 16
2009: 55
2010: 131
2011: 182
2012: 173
2013: 202
2014: 202
2015: 198
2016: 229
2017: 247
2018: 240
2019: 238
2020: 238
2021: 369
2022: 351
2023: 278
2024: 383
2025: 290
2026: 9

Keywords

e.g WITH EURO CIRCUMFLEX UTF8 URI LATIN ASCII E5 AFAIU encoding unicode character utf text characters bytes byte code utf8

Sample Comments

infectoid Feb 1, 2022

Looks like the code got pasted into an editor that wasn't too interested in keeping that character encoding.

gspr Mar 29, 2021

What is something you feel that the text encoding system shouldn't be able to encode?

elbear Apr 19, 2020

I'm assuming they're talking about character encoding, as in utf-8.

VMG May 2, 2012

Title is needlessly inflammatory. Their web interface seems to have a bug with the encoding detection.

screenbeard Sep 6, 2019

Or just an inability to deal with character encoding.

tragic May 14, 2014

Getting loads of text encoding oddities everywhere - is that just me?

abecedarius Dec 6, 2022

Likely the byte-pair encoding at fault. It doesn't see the letters.

erAck Jan 9, 2018

Nice tool. Just the title is misleading, it's not Unicode that is broken, it's the encoders/decoders.

jbaudanza Dec 11, 2013

Yeah, encoding issues :/ working on it

lmkg Nov 11, 2021

I disagree with calling it "corrupted." We're not tricking the browser into trying to render garbage bytes that are actually the middle of a jpeg or something. It's actually valid Unicode. It's an edge-case which is not seen in regular usage, but it's technically following all of the rules.
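
The distinction drawn in the last comment, between unusual but valid Unicode and bytes that are not text at all, can be sketched as follows. The comment does not say which edge case it means, so the stacked combining marks here are an assumed stand-in, and the four bytes are only an illustrative JPEG-style header.

```python
import unicodedata

# Assumed example of "valid but unusual": one base letter followed by
# several combining marks (acute, circumflex, tilde).
stacked = "e" + "\u0301\u0302\u0303"

# Every code point is assigned and named, and the string survives a
# UTF-8 round trip unchanged.
names = [unicodedata.name(ch) for ch in stacked]
round_trip = stacked.encode("utf-8").decode("utf-8")

# By contrast, bytes like those at the start of a JPEG file are not
# well-formed UTF-8, so a strict decoder rejects them.
jpeg_like = bytes([0xFF, 0xD8, 0xFF, 0xE0])
try:
    jpeg_like.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(names)
print(round_trip == stacked, is_valid_utf8)
```

A renderer may still display the first string oddly, but that is a presentation matter; only the second input is malformed at the encoding level.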