Unicode Encoding Issues
Comments discuss problems with character encoding, particularly Unicode and UTF-8 handling in web interfaces, browsers, code editors, and text processing, debating whether they are bugs, features, or edge cases.
Sample Comments
Looks like the code got pasted into an editor that wasn't too interested in keeping that character encoding.
What is something you feel that the text encoding system shouldn't be able to encode?
I'm assuming they're talking about character encoding, as in utf-8.
Title is needlessly inflammatory. Their web interface seems to have a bug with the encoding detection.
Or just an inability to deal with character encoding.
Getting loads of text encoding oddities everywhere - is that just me?
Likely the byte-pair encoding at fault. It doesn't see the letters.
Nice tool. Just the title is misleading: it's not Unicode that is broken, it's the encoders/decoders.
Yeah, encoding issues :/ working on it
I disagree with calling it "corrupted." We're not tricking the browser into trying to render garbage bytes that are actually the middle of a jpeg or something. It's actually valid Unicode. It's an edge-case which is not seen in regular usage, but it's technically following all of the rules.
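The distinctions drawn in these comments (garbage bytes versus valid but unusual Unicode, and text decoded with the wrong encoding) can be illustrated with a short Python sketch. The sample strings, the JPEG-style header bytes, and the helper name `is_valid_utf8` are assumptions chosen for illustration; they are not taken from any of the discussions quoted above.

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Case 1: bytes that are not text at all (the first bytes of a typical JPEG file).
# The byte 0xFF can never appear in well-formed UTF-8, so decoding fails.
jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# Case 2: an edge case that still follows the rules: one base letter
# followed by five combining acute accents (U+0301).
stacked = "e" + "\u0301" * 5
stacked_bytes = stacked.encode("utf-8")

# Case 3: mojibake. The bytes are valid UTF-8, but the reader guesses
# the wrong encoding (Latin-1 here) when decoding them.
original = "café"
mojibake = original.encode("utf-8").decode("latin-1")

# The damage is reversible as long as no bytes were lost along the way.
repaired = mojibake.encode("latin-1").decode("utf-8")

if __name__ == "__main__":
    print(is_valid_utf8(jpeg_header))      # garbage bytes are rejected
    print(is_valid_utf8(stacked_bytes))    # unusual but valid text is accepted
    print(unicodedata.combining("\u0301")) # non-zero means a combining mark
    print(repr(mojibake), repr(repaired))
```

In the first case the decoder rejects the input outright. In the second the input is well-formed and any odd rendering is a display matter. In the third neither the bytes nor Unicode is at fault, only the choice of decoder.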