
The crazy thing about CSV files: commas are likely to show up in the stuff you'd want to put in a CSV file.

"But Fluff," you say, "people already deal with that. Use colons or bar characters or tabs."

Well yes. People do that. But did you know ASCII (which now forms the first tiny chunk of Unicode) has built-in separator characters? ASCII 28 through 31: file, group, record, and unit separators. CSV was a mistake.
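A minimal sketch of what that could look like in Python, assuming the data itself never contains the separator characters. The helper names `dump_records` and `load_records` are made up for illustration and are not part of any standard library.

```python
# ASCII separator characters, code points 28 through 31.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"


def dump_records(rows):
    """Join fields with US and records with RS (hypothetical helper).

    No quoting or escaping is done, so this only round-trips if the
    fields never contain US or RS themselves.
    """
    return RS.join(US.join(fields) for fields in rows)


def load_records(text):
    """Split on RS for records, then on US for fields (hypothetical helper)."""
    return [record.split(US) for record in text.split(RS)]


# Commas, quotes, bars, colons, and tabs all pass through untouched.
rows = [["Fluff, Lil", 'said "hi"'], ["a|b:c", "tab\there"]]
encoded = dump_records(rows)
decoded = load_records(encoded)
```

Because the separators are characters nobody types into ordinary text, none of the fields above need any escaping.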

Thus ends another episode of Fluffpinions.

@LilFluff Back in the day, SQL studio used to use character x05 as the field separator for the columns on the gridded query results.

We wrote a custom UDP communications protocol that used character x05 for a line terminator. Trying to look at the saved packets was difficult as hell.

@LilFluff it's funny that they escape them with speech marks, which in turn means you need to escape speech marks with more speech marks.

For example:
This, is, a, """crazy"",test"
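For the curious, here is a small sketch of that quote-doubling using Python's standard `csv` module. The spaces after the commas are left out here, since some parsers treat a leading space as part of the field; that choice is an assumption of this example, not something the format requires.

```python
import csv
import io

# The last field contains both speech marks and a comma.
fields = ["This", "is", "a", '"crazy",test']

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(fields)
line = buffer.getvalue()
# line is now: This,is,a,"""crazy"",test"   (plus a trailing newline)

# Reading it back recovers the original field, speech marks and all.
parsed = next(csv.reader(io.StringIO(line)))
```

The writer wraps the awkward field in speech marks and doubles every speech mark inside it, which is where the run of three comes from.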

@LilFluff Pretty sure the reason CSV (and tab-delimited to a lesser extent) was successful boils down to 'the comma/tab key exists on my keyboard'. :P

@LilFluff At my first job I was responsible for exporting account data and making reports out of it and I couldn't figure out why it kept sometimes erasing hundreds of dollars.

It was because the software I was using was putting in unsanitized commas and Excel was removing leading zeroes.

CSV was a mistake.

@LilFluff So I haven't confirmed this, but I have a suspicion that CSV happened as a result of Microsoft BASIC's record handling behavior.

WHY DID THEY NOT USE THE RIGHT CHARACTERS FOR THIS

*looks at the boost and star numbers, whoa*

Some good points have been raised:
* as @kellerfuchs notes, these separators should work even with UTF-8: the first 128 code points (0-127) in Unicode mirror ASCII (so 30 or 0x1E in both is the Record Separator).

* @bobstechsite points out that CSV generally allows escaping the comma, colon, etc. through the use of other potential text characters, which means you then have to escape the escape sequence to be able to type those: \, to type a comma, \\, if you actually want a backslash followed by a comma, etc.

* @vertigo reminds us that ASCII had other useful control characters for marking out things like headers or the start and end of text that could have been put to good use but weren't.

* @SimonTesla points out a shortcoming. Tab and comma have dedicated keys on nearly all keyboards. FS, GS, RS, US don't. It would have been nice if they'd received key combos (ctrl or shift + something) early on to become de facto standards.

* or as @yakkoj suggests give them keys (anyone still regularly use SysRq?)

* @bhtooefr ponders whether this was all inherited from MS Basic. Hmm. Were csv files used on pre-hobby computer systems?
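On the UTF-8 point from the list above, a quick sketch in Python. The sample strings are arbitrary and only there to show that non-ASCII text on either side of the separator survives.

```python
RS = "\x1e"  # Record Separator, code point 30 in both ASCII and Unicode

# In UTF-8 the separator is still the single byte 0x1E.
rs_bytes = RS.encode("utf-8")

# Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so 0x1E
# can never turn up inside an encoded non-ASCII character. Splitting
# the raw bytes on it is therefore safe.
text = "naïve" + RS + "日本語"
parts = text.encode("utf-8").split(b"\x1e")
decoded = [part.decode("utf-8") for part in parts]
```

So a file using these separators can be split at the byte level before any decoding happens.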

@SimonTesla @bhtooefr @LilFluff FS, GS, RS, and US are Ctrl-\, Ctrl-], Ctrl-^, and Ctrl-_. Quite (un)memorable, those...
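Those combos follow the traditional terminal convention that Ctrl keeps only the low five bits of the key's ASCII code. A sketch of that rule in Python; the function name `ctrl` is made up for illustration, and whether a given modern terminal or OS actually passes these keystrokes through is a separate question.

```python
def ctrl(key):
    """Return the control character for Ctrl+key under the classic
    convention: keep the low five bits of the key's ASCII code."""
    return chr(ord(key) & 0x1F)


# The four separator combos, mapped to their code points.
separators = {name: ord(ctrl(key)) for name, key in
              [("FS", "\\"), ("GS", "]"), ("RS", "^"), ("US", "_")]}
```

The same rule is why Ctrl-I gives a tab (code 9) and Ctrl-M gives a carriage return (code 13).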

Another advantage to a dedicated RS key for us who work on Cisco kit: that's their escape sequence! No more Ctrl-^.

@yakkoj @LilFluff @SimonTesla And on an ASR-33 (the keyboard that everyone cloned for their 1970s ASCII computers), that becomes Ctrl+Shift+L, Ctrl+Shift+M, Ctrl+Shift+N, and Ctrl+Shift+O, even more unmemorable. (No lowercase on the ASR-33.)

Another problem along those lines is that these are non-printable characters. CSV is readily human-readable (although TSV would be even better in this respect); ASCII-delimited values are... not, unless you translate the delimiters into tabs and newlines as appropriate.
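That translation step is short. A sketch in Python, with `to_readable` as a made-up helper name; it is only meant for viewing, since the result becomes ambiguous if the data already contains tabs or newlines.

```python
US, RS = "\x1f", "\x1e"

# Map unit separators to tabs and record separators to newlines.
_VIEW_TABLE = str.maketrans({US: "\t", RS: "\n"})


def to_readable(text):
    """Return a tab/newline version of ASCII-delimited text for display
    (hypothetical helper, one-way only)."""
    return text.translate(_VIEW_TABLE)


sample = "name" + US + "amount" + RS + "Fluff" + US + "0042"
readable = to_readable(sample)
```

Printing `readable` gives two lines of tab-separated fields, which is about as friendly as TSV.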

@LilFluff It looks more likely to be inherited from things like IBM Fortran in the early '70s, as well as the wide variety of home/office computer standards of the '75-'85 period.

Until the PC, everything IBM used EBCDIC instead of ASCII for legacy reasons, & EBCDIC doesn't have fs/gs/rs/us characters; & until they made 16-bit computers, the other big players Commodore, Apple, Atari, & Tandy used incompatible enough ASCII-like encodings.

The printed comma was the lowest common denominator.

*blink* in two hours this has almost half the boosts and stars my toot about 538's data set on English names and mining them for gender neutral names got. (github.com/fivethirtyeight/dat)

@LilFluff I would like to subscribe to your newsletter.

@LilFluff That's really interesting, although I'll defend TSV in that it's a lot easier for the separator characters to be ones you can also easily type on your keyboard ;) (And pragmatically, nearly anything that can open CSV also opens TSV, so it's a good practical alternative in the imperfect timeline we find ourselves in.)

@LilFluff I spent the afternoon being really upset about CSV files (I’m making flashcards for Anki) and this really resonates with me
