Migration bug(s): some disappearances of punctuation – en dashes, ellipses …

Discussion in 'Site and Forum Feedback' started by grahamperrin, May 31, 2015.

  1. grahamperrin, May 31, 2015
    Last edited: May 31, 2015

    grahamperrin macrumors 601

    grahamperrin

    Joined:
    Jun 8, 2007
    #1
  2. grahamperrin, May 31, 2015
    Last edited: Jun 28, 2015

    grahamperrin thread starter macrumors 601

    grahamperrin

    Joined:
    Jun 8, 2007
    #2
    At http://forums.macrumors.com/posts/20138440 the strange white space in a non-emphasised first line suggests that the losses of punctuation may be much more widespread.

    (Not limited to migrated subject lines of replies.)

    Apologies for not finding bugs of this type within the given test period.

    Postscript

    Elsewhere, all en dashes are missing from http://forums.macrumors.com/posts/21378086 so I assume that migration-related loss of that character was site-wide.

    @arn I don't expect a speedy fix, but this loss of punctuation – I use the character, properly, very frequently – will be a significant annoyance. If it's not too soon to ask: can you imagine a post-migration fix for bugs such as this?

    Critically: I have not begun paying attention to other characters such as em dashes …
     
  3. arn macrumors god

    arn

    Staff Member

    Joined:
    Apr 9, 2001
    #3
    It seems unlikely we can fix this automatically. Do the dashes work correctly going forward?

    arn
     
  4. grahamperrin thread starter macrumors 601

    grahamperrin

    Joined:
    Jun 8, 2007
    #4
    The strange appearance of space should have been a hint to me that maybe it was not a complete loss.

    Truly, it's not complete loss. Where an en dash is no longer visible, following migration, in at least one instance there's something non-ASCII between two spaces.

    Before I slept I suspected that this migration bug affected only the first line, first sentence or first paragraph of posts.

    Now, after a few hours' sleep, I have a clearer head.

    An ideal example may be http://forums.macrumors.com/threads...osemite-looks-terrible.1804151/#post-20138162. Compare:
    • the quote in that post #2
    • the first line of the second paragraph of the opening post of that topic.

    Plan of action

    Do not rush thoughts of automatic replacement. This non-ASCII character, whatever it is for the en dash case, may be present where other types of character have disappeared. (I suspect that … some ellipses … have also become invisible characters; and so on.)

    Identify the non-ASCII character that is invisible in the example above. I can probably do that today. Early indications are that it's simply non-ASCII (neither a control character, nor a null).

    Notes to self: last year, what method did I use to identify the vertical tab character that caused problems with a Microsoft Access database back end to software on Windows? TextWrangler would probably have been within that method, and I probably asked a related question (without mentioning TextWrangler) in Stack Exchange. I'll certainly find paperwork filed beneath my desk at work.

    At a convenient time, ask a database administrator (not necessarily @arn) to find/seek all instances of that character.

    Expect some matches to be within private messages. So, I do not expect the results of that search to be shared.

    Get a ballpark figure for the number of public instances of the disappearance. (If, for example, two en dashes have disappeared from a single line, I should count that as two instances.)
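
    A minimal sketch of the identification and counting steps in the plan above, assuming Python 3 and that the invisible characters fall in the C1 control range (U+0080 to U+009F). The function names and the sample string are hypothetical, for illustration only; this is not a query against the forum database.

```python
import unicodedata
from collections import Counter


def count_c1_controls(text):
    """Count each C1 control character (U+0080 to U+009F) found in text."""
    return Counter(ch for ch in text if 0x80 <= ord(ch) <= 0x9F)


def describe(ch):
    """Return the code point and general category, e.g. 'U+0096 (Cc)'."""
    return f"U+{ord(ch):04X} ({unicodedata.category(ch)})"


# Hypothetical sample line: two invisible characters where en dashes
# might have been, one where an ellipsis might have been.
sample = "this loss \u0096 I use it often \u0096 is an annoyance \u0085"

for ch, n in sorted(count_c1_controls(sample).items()):
    print(describe(ch), n)
```

    Each occurrence is counted separately, so two disappearances on a single line count as two instances.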


    In the meantime

    Without that ballpark figure, I have a hunch that – if given a list of affected topics – it'll take no longer than a few hours for me to manually edit my affected posts to work around the disappearances.
     
  5. grahamperrin, Jun 27, 2015
    Last edited: Jun 28, 2015

    grahamperrin thread starter macrumors 601

    grahamperrin

    Joined:
    Jun 8, 2007
    #5
    Migration to XenForo: some en dashes converted to control characters, U+0096 (start of guarded area)

    Beginning with one of the topics where characters now invisible were probably en dashes before migration: I copied from the forum, pasted to Character Viewer in Mavericks. The character is recognised as 'START OF GUARDED AREA':

    START OF GUARDED AREA.png

    That's something like a dash in MingLiu:

    2015-06-27 09-15-40 screenshot.png 2015-06-27 09-15-43 screenshot.png 2015-06-27 09-15-49 screenshot.png

    https://en.wikipedia.org/wiki/Ming_(typefaces) for reference, but that similarity to a Chinese character may be coincidental/misleading.

    ----

    More relevant, in the order that I found them:

    http://www.georgehernandez.com/h/xComputers/CharacterSets/zArc_ANSI.htm (2005-10-28) with the following phrase in the same row as 'en dash':

    http://stackoverflow.com/a/20059572/38108 (2013-11-18) with:

    Some UTF-8 characters do not show up on browser (2009-09-09) – @arn, the accepted answer there may be most relevant to the problem here with XenForo.
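
    One explanation that would fit the pages above, offered as an assumption and not a confirmed diagnosis of the XenForo migration: text stored as Windows-1252 was read as ISO-8859-1 (Latin-1). In Windows-1252 the byte 0x96 is an en dash, whereas ISO-8859-1 maps the same byte to the control character U+0096. A short Python 3 sketch of that mismatch (the function name is hypothetical):

```python
def misdecode(ch):
    """Encode as Windows-1252, then (wrongly) decode as ISO-8859-1."""
    return ch.encode("cp1252").decode("latin-1")


# En dash, em dash, horizontal ellipsis.
for ch in ("\u2013", "\u2014", "\u2026"):
    wrong = misdecode(ch)
    print(f"U+{ord(ch):04X} -> U+{ord(wrong):04X}")
```

    For the en dash this prints U+2013 -> U+0096, which matches what Character Viewer reported.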
     
  6. grahamperrin thread starter macrumors 601

    grahamperrin

    Joined:
    Jun 8, 2007
    #6
    Migration to XenForo: some ellipses … converted to control characters, U+0085 (NEXT LINE (NEL))

    At http://forums.macrumors.com/posts/21287006 for example, seven characters that are now invisible were previously ellipses. The offending control character is recognised as 'NEXT LINE (NEL)':

    NEXT LINE (NEL).png

    In Stack Exchange I found a few items that might help to pinpoint the root cause(s) of inappropriate conversions of this character. Three of those items:


    Safari can find the offending character on a page –

    2015-06-28 15-57-11 screenshot.png

    – but the search that's integral to MacRumors cannot. A Google attempt to find the offending character –

    https://www.google.co.uk/search?q="…" site:forums.macrumors.com

    – finds pages that do not include the character.

    @arn, please:
    • can you arrange safe replacement of offending control characters (such as the two exemplified above) with the proper visible characters?
    No rush. Thanks.
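
    For illustration only, a sketch of what such a replacement might look like in Python 3, assuming the offending characters are C1 controls that began life as Windows-1252 punctuation. The function name is hypothetical, and the approach is untested against the real database; a true repair would need a review of matches first, because any control character that was genuinely intended would be converted too.

```python
def repair_c1(text):
    """Map C1 controls (U+0080 to U+009F) to their Windows-1252 characters."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                # A few bytes (e.g. 0x81) are undefined in Windows-1252;
                # leave those characters unchanged.
                pass
        out.append(ch)
    return "".join(out)


print(repair_c1("punctuation \u0096 en dashes, ellipses \u0085"))
```

    With that sample, U+0096 becomes an en dash and U+0085 becomes an ellipsis; all other characters pass through untouched.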
     
  7. grahamperrin, Jun 28, 2015
    Last edited: Jul 5, 2015
