force reboot of unresponsive remote server?

Discussion in 'Mac OS X Server, Xserve, and Networking' started by G0meZ, Nov 11, 2014.

  1. G0meZ macrumors regular

    Aug 9, 2011
    every now and then my 10.6.8 fileserver stops working afp & ssh wise. no remote desktop connection possible, either.

    so, not being able to login by any means, i cannot force a reboot on that server without actually travelling to the server room and switching the mac pro off and on again...

    isn't there any way to just cause the server to reboot somehow, without remote desktop or ssh? like a ping of death or something :)
  2. sevoneone macrumors 6502

    May 16, 2010
    Not really, that would be a major vulnerability to have something like that available that could just restart/shutdown servers with a packet. This is really what SSH is for.

    If I were you I would find out why the server is going down periodically, that isn't normal for SSH and AFP to lockup like that. What is the state of the rest of the machine when this happens? Is the entire OS non-responsive, kernel panic?

    My guess is that you may have a dying HDD or bad RAM
  3. G0meZ thread starter macrumors regular

    Aug 9, 2011
    hey that makes sense :p
    still though, there must be ways to create some emergency backdoor functionality...this is
    yeah. well, it ain't the hardware, hw diagnostics showed zilch and i've cloned the server onto another macpro and it blacks out the same way, preferably on the weekend....:mad:
    cannot reproduce the error so i got to wait for it to reoccur.
    when it happens, the system still runs, ping responds, locally all seems well.
    log files are useless, no clues as to why afp is down, nothing. system seems to descend into a semi-coma without warning...

    but here's the thing: i got a lacie4tb ext disk permanently attached, partitioned into several drives for timemachine and carboncopycloner.

    twice now i've had that disk dismounted over the weekend and the server stayed alive over that period.... so... no idea why but that drive seems to be the troublemaker... its filesystems are fine and both tm & ccc are functional, though.
  4. 556fmjoe macrumors 65816


    Apr 19, 2014
    If you lost SSH access, then you'll have to go hit the power button manually.

    I'd look into fixing the cause of its unresponsiveness rather than trying to create a backdoor (and security hole). What software are you running on it?
  5. G0meZ thread starter macrumors regular

    Aug 9, 2011
    agreed. yet i need a solution of any kind.
    regarding security holes...haha, you joking, right?
    all our systems are full of those and there's just nothing that could stop that.
    at latest since wikileaks we should all be fully aware that resisting unauthorized access to our systems is plain futile.

    like i said, it looks very much as if that ext. drive is the cause of it all...
  6. blacka4 macrumors 6502

    Sep 28, 2009
    so unplug the external drives and run the system for a few days and see what happens. if it doesn't lock up then you know what the problem is.
  7. Jamesbot, Nov 12, 2014
    Last edited: Nov 12, 2014

    Jamesbot macrumors member

    Jun 19, 2009
    If you don't have anyone you can call to reboot it for you, I'd take a look at an IP Managed Remote Power Switch/Strip.

    Basically a power strip that you can log into remotely to power-cycle specific outlets.


    Totally disagree. You should be able to secure your servers.

    Here are some links to get you started:

    When you say this, do you mean when you're physically at the machine everything seems fine?

    The account you're logging into over SSH, is its home directory mounted on the local filesystem, or the remotely attached one?
  8. G0meZ thread starter macrumors regular

    Aug 9, 2011

    sweet, didn't know you could buy such a thing


    thanx for the links. totally agree, we should be able to secure them, but that's not up to me, it goes way deeper. heard abt the usb firmware hack? it's gone open source now! the adminuser-terminal that gives u root? openssl? the list goes on forever.
    so what gives? anybody with either the money and contacts or the expertise walks in and out of almost any networked computersystem unnoticed. no? the entire system has never worked for what we use it today. we're not far from rendering the internet useless i fear

    yes, standing in front of it, swearing & kicking, that's the only way to operate when it happens

    account is local, on sys disk. but i'm not even getting a login prompt from outside

    ... but like i said before, i strongly suspect the bloody ext drive...srv tested ok 2 weeks without it... maybe 10.6 has issues with too many active partitions or something?
  9. Jamesbot macrumors member

    Jun 19, 2009
    Are your backups running during these lockouts?

    It's kind of curious cause it sounds like an i/o problem, but if it was, your mac would feel sluggish even when you were logged in locally.

    That's why I asked where your home directory was... if your remote ssh is blocked waiting for a device or something on log in (something i've seen happen with user home directories mounted over NFS) it'll just hang there even though the system itself is ok.

    What do your system load scores look like?

    what does SSH's state look like when you do a ps aux | grep ssh?

    do you have anything like fail2ban running? maybe it's a firewall thing...

    what does ssh say in the system logs?

    what does your ssh client say when you run it verbosely?
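
    The questions above boil down to "how far does a connection get before it stalls?". A minimal sketch of that check from another machine, in Python, assuming the server is reachable by name; `probe_service` is a made-up helper for illustration, not part of any tool. An SSH server normally sends its version string first, so a port that accepts the connection but never sends a banner points to a wedged sshd rather than a firewall. AFP on port 548 waits for the client to speak first, so "connected, no banner" is the normal result there.

```python
import socket


def probe_service(host, port, timeout=5.0):
    """Report how far a TCP connection to host:port gets before it stalls."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing is listening: the daemon is gone or a firewall rejects us.
        return "refused"
    except socket.timeout:
        # No reply to the connection attempt at all (dropped packets or host down).
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc

    with sock:
        sock.settimeout(timeout)
        try:
            # sshd sends an identification line (e.g. "SSH-2.0-...") unprompted.
            data = sock.recv(256)
        except socket.timeout:
            # The kernel accepted the connection but the daemon never spoke.
            return "connected, no banner"
        except OSError as exc:
            return "error: %s" % exc

    if not data:
        return "connected, closed without banner"
    return "banner: " + data.decode("ascii", "replace").strip()
```

    Calling `probe_service("fileserver.example.com", 22)` (hostname is a placeholder) while the server is healthy and again while it is "dark" gives two results to compare, and the same call with port 548 covers AFP.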
  10. G0meZ thread starter macrumors regular

    Aug 9, 2011
    yes, i reckon, too. forgot to mention: once the error occurred, the server cannot shutdown or reboot no more, even locally.
    generally, the filesys control seems unstable; some 10 days ago an internal data drive went offline, no log entries again apart from the system starting to whine that it lost a drive...
    remount failed, diskutil repair-volume function came up with a thoroughly corrupted and unrepairable filesystem... had to reformat.. no idea if that's related, this happened on its own outside the usual afp/ssh problem hours.

    i got daily ccc-clones running on that 4tb drive plus tmachine, running currently in its default configuration, ie. hourly. so it's well possible that tmachine had been running when the srv went deaf and dumb...

    sys load is usually low, and stays low after it goes dark.
    no fail2ban or anything, trying to keep it simple here.

    no ssh log entries after it goes dark, like i said, i'm not even getting a password prompt...

    never tried -v when trying to login remotely...
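
    A server that stays pingable with low load but cannot even shut down locally fits the pattern of processes stuck waiting on a device. A rough sketch for spotting them from the local console, assuming the output of `ps -axo pid,stat,command` has been captured as text; `stuck_processes` is a hypothetical helper, and the state letters (D for disk wait, U for uninterruptible wait) are taken from the BSD-style ps man page, so treat them as an assumption for 10.6.

```python
def stuck_processes(ps_output):
    """Return (pid, stat, command) for processes in disk or uninterruptible wait.

    Expects text in the layout of `ps -axo pid,stat,command`, header included.
    """
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, state flags, rest is the command
        if len(parts) < 3:
            continue
        pid, stat, command = parts
        # The first letter of STAT is the run state; the rest are modifiers.
        if stat[0] in ("D", "U"):
            stuck.append((int(pid), stat, command))
    return stuck
```

    A process showing D for a moment is normal. One that is still listed several samples later, especially the AFP server, sshd or a backup job touching the external drive, would be the one to look at.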
