Migration bug(s): some disappearances of punctuation – en dashes, ellipses …


grahamperrin

macrumors 601
Original poster
Jun 8, 2007
4,946
627
At http://forums.macrumors.com/posts/20138440 the strange white space in a non-emphasised first line suggests that the losses of punctuation may be much more widespread.

(Not limited to migrated subject lines of replies.)

Apologies for not finding bugs of this type within the given test period.

Postscript

Elsewhere, all en dashes are missing from http://forums.macrumors.com/posts/21378086 so I assume that migration-related loss of that character was site-wide.

@arn I don't expect a speedy fix, but this loss of punctuation – I use the character, properly, very frequently – will be a significant annoyance. If it's not too soon to ask: can you imagine a post-migration fix for bugs such as this?

Critically: I have not begun paying attention to other characters such as em dashes …
 

arn

macrumors god
Staff member
Apr 9, 2001
14,097
1,440
It seems unlikely we can fix this automatically. Do the dashes work correctly going forward?

arn
 

grahamperrin

macrumors 601
Original poster
Jun 8, 2007
4,946
627
The strange appearance of space should have been a hint to me that maybe it was not complete loss.

Truly, it's not complete loss. Where an en dash is no longer visible, following migration, in at least one instance there's something non-ASCII between two spaces.

It seems unlikely we can fix this automatically. …
Before I slept I suspected that this migration bug affected only the first line, first sentence or first paragraph of posts.

Now, after a few hours' sleep, I have a clearer head.

An ideal example may be http://forums.macrumors.com/threads/stickiness-of-yosemite-is-beautiful-and-yosemite-looks-terrible.1804151/#post-20138162. Compare:
  • the quote in that post #2
  • the first line of the second paragraph of the opening post of that topic.

Plan of action

Do not rush thoughts of automatic replacement. This non-ASCII character, whatever it is for the en dash case, may be present where other types of character have disappeared. (I suspect that … some ellipses … have also become invisible characters; and so on.)

Identify the non-ASCII character that is invisible in the example above. I can probably do that today. Early indications are that it's simply non-ASCII (neither a control character, nor a null).

Notes to self: last year, what method did I use to identify the vertical tab character that caused problems with a Microsoft Access database back end to software on Windows? TextWrangler would probably have been within that method, and I probably asked a related question (without mentioning TextWrangler) in Stack Exchange. I'll certainly find paperwork filed beneath my desk at work.

At a convenient time, ask a database administrator (not necessarily @arn) to find/seek all instances of that character.

Expect some matches to be within private messages. So, I do not expect the results of that search to be shared.

Get a ballpark figure for the number of public instances of the disappearance. (If, for example, two en dashes have disappeared from a single line, I should count that as two instances.)
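
The identify-and-count steps in this plan could be sketched in Python roughly as follows. This is only an illustration, not a tool anyone in this thread has used: the function names count_non_ascii and describe are hypothetical, and the sample string uses U+0096 purely as a stand-in for whatever the invisible character turns out to be. Each occurrence is counted separately, so two disappearances on one line count as two instances.

```python
import unicodedata
from collections import Counter


def count_non_ascii(text):
    """Count every occurrence of each non-ASCII character in text."""
    return Counter(ch for ch in text if ord(ch) > 0x7F)


def describe(ch):
    """Return code point, general category and name (if Unicode assigns one)."""
    name = unicodedata.name(ch, "<unnamed>")  # control characters have no name
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}"


# Hypothetical sample: one visible en dash, two invisible stand-in characters.
sample = "this loss \u0096 I use it often \u0096 is annoying \u2013 really"

for ch, n in sorted(count_non_ascii(sample).items()):
    print(describe(ch), "x", n)
```

Characters whose category is Cc (control) or Cf (format) are the likely invisible ones; a visible en dash shows up with category Pd, which makes it easy to tell the two cases apart in the output.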


In the meantime

Without that ballpark figure, I have a hunch that – if given a list of affected topics – it'll take no longer than a few hours for me to manually edit my affected posts to work around the disappearances.
 

grahamperrin

macrumors 601
Original poster
Jun 8, 2007
4,946
627
Migration to XenForo: some en dashes converted to control characters, U+0096 (start of guarded area)

non-ASCII character, whatever it is for the en dash case
Beginning with one of the topics where characters now invisible were probably en dashes before migration: I copied from the forum, pasted to Character Viewer in Mavericks. The character is recognised as 'START OF GUARDED AREA':

START OF GUARDED AREA.png


That's something like a dash in MingLiU:

2015-06-27 09-15-40 screenshot.png 2015-06-27 09-15-43 screenshot.png 2015-06-27 09-15-49 screenshot.png

https://en.wikipedia.org/wiki/Ming_(typefaces) for reference, but that similarity to a Chinese character may be coincidental/misleading.

----

More relevant, in the order that I found them:

http://www.georgehernandez.com/h/xComputers/CharacterSets/zArc_ANSI.htm (2005-10-28) with the following phrase in the same row as 'en dash':

OPT+-
SPA = start of guarded area
http://stackoverflow.com/a/20059572/38108 (2013-11-18) with:

… U+0096 is "start of guarded area" …
Some UTF-8 characters do not show up on browser (2009-09-09) – @arn, the accepted answer there may be most relevant to the problem here with XenForo.
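
One plausible explanation, which nothing in this thread confirms, is that the old posts were stored as Windows-1252 bytes and read during migration as ISO-8859-1 (Latin-1). In Windows-1252 the byte 0x96 is the en dash; in Latin-1 the same byte maps to the control character U+0096. A minimal sketch of that assumption, with a hypothetical helper name decode_both:

```python
def decode_both(raw):
    """Decode the same bytes as Windows-1252 and as Latin-1, for comparison."""
    return raw.decode("cp1252"), raw.decode("latin-1")


# Assumed stored form of "punctuation – I use" in a Windows-1252 database.
raw = b"punctuation \x96 I use"

as_cp1252, as_latin1 = decode_both(raw)
print(repr(as_cp1252))  # the en dash survives
print(repr(as_latin1))  # the en dash becomes an invisible control character
```

If that is what happened, it would match the ANSI table row linked above, where 'en dash' and 'SPA = start of guarded area' share the same byte value.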
 

grahamperrin

macrumors 601
Original poster
Jun 8, 2007
4,946
627
Migration to XenForo: some ellipses … converted to control characters, U+0085 (NEXT LINE (NEL))

At http://forums.macrumors.com/posts/21287006 for example, seven characters that are now invisible were previously ellipses. The offending control character is recognised as 'NEXT LINE (NEL)':

NEXT LINE (NEL).png


In Stack Exchange I found a few items that might help to pinpoint the root cause(s) of inappropriate conversions of this character. Three of those items:


Safari can find the offending character on a page –

2015-06-28 15-57-11 screenshot.png

– but the search that's integral to MacRumors cannot. A Google attempt to find the offending character –

https://www.google.co.uk/search?q="…" site:forums.macrumors.com

– finds pages that do not include the character.

@arn, please:
  • can you arrange safe replacement of offending control characters (such as the two exemplified above) with the proper visible characters?
No rush. Thanks.
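
For what it's worth, a replacement along these lines might be sketched as below. It assumes, without confirmation from XenForo or from anyone in this thread, that each offending character is a Windows-1252 byte that was read as Latin-1; repair_c1 is a hypothetical name, and real post bodies would need a dry run and review first, since U+0085 could be a genuine NEL somewhere.

```python
def repair_c1(text):
    """Map C1 controls (U+0080 to U+009F) to their Windows-1252 characters.

    Code points with no Windows-1252 equivalent are left unchanged.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # byte is undefined in Windows-1252; keep it as is
        out.append(ch)
    return "".join(out)


# Assumed damaged text: U+0096 where an en dash was, U+0085 where an ellipsis was.
damaged = "en dashes \u0096 and ellipses \u0085"
print(repair_c1(damaged))
```

Text that is already correct passes through untouched, because only the 32 code points in the C1 range are ever rewritten.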