Server randomly losing new user accounts

Discussion in 'Mac OS X Server, Xserve, and Networking' started by dmylrea, Mar 2, 2012.

  1. dmylrea macrumors 68000

    Joined:
    Sep 27, 2005
    #1
    We have 2 SL servers (10.6.8). One is an OD master and the other holds a replica.

    Recently, I created a new user account for a new employee. We started having problems with the employee being unable to log in or connect to shares. If we reboot the server, the user can log in (for a while).

    When the user can't log in, Workgroup Manager (WM) on the OD master shows the account, but on the OD replica it's gone. If I "refresh" WM, the account shows up, but the user still can't log in. If I reboot the replica server, the user can log in and the account shows up in WM without having to refresh. After some time (less than 24 hours), the same problem returns...

    At first I thought there was some sort of communication problem between the servers, but I don't think there is. When it's not working and the account is missing in WM on the replica server, if I make a change to the user account on the OD master (like changing the user's picture or other info in the account), then go to the replica server, start WM, and refresh so the account shows up, the change is there! If I make more changes at this point, they appear immediately on both servers. So, they are talking.

    Any ideas? This is driving me crazy: why is it happening, why does restarting the replica server fix it, and why does it stop working again after a day or so?
     
  2. StevenMeyer macrumors member

    Joined:
    Dec 17, 2011
    Location:
    New York... Where Else?
  3. dmylrea thread starter macrumors 68000

    Joined:
    Sep 27, 2005
    #3
    Do you have some useful help? Try posting it. :rolleyes:
     
  4. StevenMeyer macrumors member

    Joined:
    Dec 17, 2011
    Location:
    New York... Where Else?
    #4
    Sure. I think this speaks to the degree to which Apple does not give a #$%@ about its server product. I have had similar issues and have gone back to Windows. Try it; it works really well.
     
  5. dmylrea thread starter macrumors 68000

    Joined:
    Sep 27, 2005
    #5
    I wish I could. All my clients have Windows Server and Exchange Server except for this one, which I came into with a mixture of Mac and Windows clients, and all Mac servers. After working a year with this, trust me...I'd LOVE to replace the servers with SBS 2011, but at this time, it isn't possible.

    In the meantime, I'd love to find a fix. I believe it's an LDAP problem. Looking at the console logs on the replica server, the LDAP log shows constant entries stating "LDAP Server not found".

    Why it works for a bit, then stops...I'm not sure.
     
  6. StevenMeyer, Mar 6, 2012
    Last edited: Mar 6, 2012

    StevenMeyer macrumors member

    Joined:
    Dec 17, 2011
    Location:
    New York... Where Else?
    #6
    Any way you can post logs? I'll try to hash through them. LDAP is always a tricky protocol. You can also run something called an LDAP trace; I know the commands for Unix and Linux but not OS X (I'll link another page). Wireshark should also help you look for bad packets.
    http://forums.novell.com/netiq/neti...s-when-one-replica-becomes-unavailable-2.html
     
  7. StevenMeyer macrumors member

    Joined:
    Dec 17, 2011
    Location:
    New York... Where Else?
    #7
    After re-reading this, it really looks like blacklisting. Have you tried having the user log in from a machine you know is problem-free? Every once in a while, when a Mac's network card goes bad, it starts shooting out tons of crap that gets it locked out of things (it would be rare, but SL could think it's the user trying to attack the server).
     
  8. dmylrea thread starter macrumors 68000

    Joined:
    Sep 27, 2005
    #8
    It's not the workstation (which is a MacBook Pro). Even when I try to use the user's account to connect to a share FROM ONE SERVER TO THE OTHER SERVER, it will not accept that user's account (but will accept other users' accounts). Also, to let this new employee access the shares, she is successfully using the previous employee's credentials when connecting.

    What I can't understand is why it's only this one user, and why it works fine for a while and then stops working.

    Here's a sample of the ldap log on the replica server JUST AFTER A RESTART:


    Mar 3 12:56:19 art slapd[76]: @(#) $OpenLDAP: slapd 2.4.11 (Aug 12 2010 17:17:10) $
    Mar 3 12:56:19 art slapd[76]: daemon: SLAP_SOCK_INIT: dtblsize=8192
    Mar 3 12:56:23 art slapd[76]: bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
    Mar 3 12:56:23 art slapd[76]: slapd starting
    Mar 3 13:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
    Mar 3 13:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
    Mar 3 14:09:46 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
    Mar 3 14:09:46 art slapd[76]: do_syncrepl: rid=183 retrying
    Mar 3 15:09:45 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
    Mar 3 15:09:45 art slapd[76]: do_syncrepl: rid=183 retrying
    Mar 3 16:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
    Mar 3 16:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
    Mar 3 17:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
    Mar 3 17:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
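
    One way to check the timing is to pull the do_syncrep2 failures out of the log and measure the gaps between them. A minimal Python sketch over the lines posted above; syslog omits the year, so 2012 (the thread's date) is assumed, and the regex is fitted only to these sample lines:

```python
import re
from datetime import datetime

# First log sample from the replica ("art"), as posted above.
LOG = """\
Mar 3 12:56:23 art slapd[76]: slapd starting
Mar 3 13:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 13:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 14:09:46 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 14:09:46 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 15:09:45 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 15:09:45 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 16:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 16:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 17:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 17:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
"""

ASSUMED_YEAR = 2012  # syslog lines carry no year; assumed from the thread date

# "<Mon> <day> <hh:mm:ss> <host> slapd[<pid>]: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d\d:\d\d:\d\d) \S+ slapd\[\d+\]: (.*)$")


def failure_times(log_text, year=ASSUMED_YEAR):
    """Timestamps of every do_syncrep2 "Can't contact LDAP server" line."""
    times = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        message = m.group(2)
        if message.startswith("do_syncrep2:") and "Can't contact LDAP server" in message:
            stamp = " ".join(m.group(1).split())  # normalise "Mar  3" to "Mar 3"
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Minutes elapsed between consecutive failures, rounded to 2 places."""
    return [round((b - a).total_seconds() / 60, 2) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_in_minutes(failure_times(LOG)))
    # prints [59.98, 59.98, 60.03, 60.0]
```

    A steady spacing of about 60 minutes suggests something periodic (a retry interval or an idle timeout) rather than random network trouble.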


    Then, after a few days I start getting this:

    Mar 6 02:11:34 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
    Mar 6 02:11:34 art slapd[76]: do_syncrepl: rid=183 retrying
    Mar 6 02:22:46 art slapd[76]: SASL [conn=744] Failure: Have neither type of secret
    Mar 6 03:09:58 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
    Mar 6 03:09:58 art slapd[76]: do_syncrepl: rid=183 retrying
    Mar 6 03:11:33 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
    Mar 6 03:11:33 art slapd[76]: do_syncrepl: rid=183 retrying
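
    The new SASL failure can be isolated the same way, so its time and connection number can be compared against when the user's logins start failing. A minimal Python sketch over the lines above; matching on this exact message layout is an assumption based only on these samples:

```python
import re

# Second log sample from the replica, as posted above.
LOG2 = """\
Mar 6 02:11:34 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 6 02:11:34 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 6 02:22:46 art slapd[76]: SASL [conn=744] Failure: Have neither type of secret
Mar 6 03:09:58 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 6 03:09:58 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 6 03:11:33 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 6 03:11:33 art slapd[76]: do_syncrepl: rid=183 retrying
"""

# "<timestamp> <host> slapd[<pid>]: SASL [conn=<n>] Failure: <reason>"
SASL_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d\d:\d\d:\d\d) \S+ slapd\[\d+\]: "
    r"SASL \[conn=(\d+)\] Failure: (.*)$"
)


def sasl_failures(log_text):
    """Return (timestamp, connection id, reason) for each SASL failure line."""
    found = []
    for line in log_text.splitlines():
        m = SASL_RE.match(line.strip())
        if m:
            found.append((m.group(1), int(m.group(2)), m.group(3)))
    return found


if __name__ == "__main__":
    for stamp, conn, reason in sasl_failures(LOG2):
        print(f"{stamp}  conn={conn}  {reason}")
```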


    Maybe you can make some sense of this? Thanks!
     
  9. matspekkie macrumors member

    Joined:
    Oct 19, 2010
    #9
    Have you tried making a new user, moving all the files from the old account to the new one, and then deleting the old account?
     
  10. cg0def macrumors regular

    Joined:
    Feb 9, 2009
    #10
    You should check for a hardware failure. I'm not sure what hardware you're running on, but if it's a Mac mini or anything else without ECC RAM, a memory failure can manifest in very, very strange ways. I'm speaking from personal experience. The OS does not always crash, and memory faults often manifest themselves as software bugs.

    Anyway, since you are on SL, you can use AppleJack, but generally you should use Apple Service Diagnostics or the Apple Hardware Test (AHT) tool. Apple Service Diagnostics is rather hard to get if you don't already have access to it; AHT comes as part of your SL installation (hold the D key before the gray startup screen appears).
     
  11. dmylrea thread starter macrumors 68000

    Joined:
    Sep 27, 2005
    #11
    They are running on 2009 Mac Pros. I hear what you're saying, but it sure seems unlikely that it's a hardware problem!
     
  12. StevenMeyer macrumors member

    Joined:
    Dec 17, 2011
    Location:
    New York... Where Else?
    #12

    This is 90% likely a cert issue. When LDAP servers replicate, they use SSL or TLS authentication. Remove your certs and try again.
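
    To see what certificate a server is actually presenting, something like the following Python sketch could help. It assumes the OD master answers LDAPS on port 636, which may not be enabled on your setup. With default verification, a self-signed, expired, or mismatched certificate makes the handshake raise ssl.SSLCertVerificationError, which is itself a useful clue.

```python
import socket
import ssl
import time


def fetch_ldaps_cert(host, port=636, timeout=5.0):
    """Return the verified peer certificate presented on an LDAPS port.

    Raises ssl.SSLCertVerificationError if the certificate is untrusted,
    expired, or does not match the hostname.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def days_until_expiry(cert, now=None):
    """Days before cert["notAfter"] (negative if already expired)."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    now = time.time() if now is None else now
    return (expires - now) / 86400
```

    For example, call fetch_ldaps_cert("od-master.example.com") (a hypothetical hostname) on each server and pass the result to days_until_expiry.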
     
  13. cg0def macrumors regular

    Joined:
    Feb 9, 2009
    #13
    You are absolutely correct that it is not a hardware-related issue; I jumped the gun a bit there. I just took a better look at the log file you posted.
    Is there any reason why your slapd times out every hour? It looks like you have a session timeout problem, which is usually caused by the configuration of the TCP stack, more specifically the TCP keepalive parameter.

    I have no idea why your firewall goes crazy after a couple of days, but it's probably counting failed attempts before deciding that it's an attack. Anyway, a quick and dirty solution would probably be to adjust the TCP keepalive parameter on the client machine.

    Rather than describe how to do it myself, here's a very good description:

    http://www.gnugk.org/keepalive.html

    Take a look at the FreeBSD and MacOS section. You will need to make sure that the TCP keepalive time (which on OS X is net.inet.tcp.keepidle + (net.inet.tcp.keepintvl x 8)) is not larger than the connection time allowed on the server. I think your server is set to a maximum of 60 minutes, because you get errors about every 60 minutes. If you don't want to keep changing this setting on every new computer you get, you might want to relax the server settings a bit.
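
    The keepalive rule above can be worked through numerically. A minimal Python sketch, assuming the OS X sysctl values are in milliseconds (check yours with sysctl net.inet.tcp.keepidle net.inet.tcp.keepintvl) and taking the server's limit as the 60 minutes suggested above; 7200000 and 75000 are example values, not readings from these servers:

```python
PROBES = 8  # probe count in the formula above


def keepalive_ms(keepidle_ms, keepintvl_ms, probes=PROBES):
    """Total keepalive time: keepidle + keepintvl x probes, in milliseconds."""
    return keepidle_ms + keepintvl_ms * probes


def fits_server_limit(keepidle_ms, keepintvl_ms, limit_minutes):
    """Apply the rule above: keepalive time must not exceed the server's limit."""
    return keepalive_ms(keepidle_ms, keepintvl_ms) <= limit_minutes * 60_000


if __name__ == "__main__":
    # Example values (assumed): 2 h idle, 75 s between probes.
    total = keepalive_ms(7_200_000, 75_000)
    verdict = "fits" if fits_server_limit(7_200_000, 75_000, 60) else "too long"
    print(total / 60_000, "minutes;", verdict)
    # prints: 130.0 minutes; too long
```

    Under this rule, lowering keepidle to 3000000 (50 minutes) with the same interval brings the total to exactly 60 minutes.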
     
  14. StevenMeyer macrumors member

    Joined:
    Dec 17, 2011
    Location:
    New York... Where Else?
    #14
    +1 +cert issue
     
  15. matspekkie macrumors member

    Joined:
    Oct 19, 2010
    #15
    You probably checked this too, but anyway: the clocks on the servers should be in sync. Anything greater than a 5-minute drift will stop Open Directory from working.
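
    A quick way to check is to read the time on both servers as seconds since the epoch (for example with date -u +%s on each, run at about the same moment) and compare. A minimal Python sketch; the 300-second limit is the 5 minutes quoted here, which is also MIT Kerberos's default clockskew, and the readings in the example are hypothetical:

```python
MAX_SKEW_SECONDS = 5 * 60  # the 5-minute limit mentioned above


def clock_skew(epoch_a, epoch_b):
    """Absolute difference, in seconds, between two servers' clocks."""
    return abs(epoch_a - epoch_b)


def clocks_in_sync(epoch_a, epoch_b, limit=MAX_SKEW_SECONDS):
    """True if the drift is within the allowed limit."""
    return clock_skew(epoch_a, epoch_b) <= limit


if __name__ == "__main__":
    master, replica = 1331000000, 1331000004  # hypothetical readings
    print(clock_skew(master, replica), clocks_in_sync(master, replica))
    # prints: 4 True
```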
     
  16. dmylrea thread starter macrumors 68000

    Joined:
    Sep 27, 2005
    #16
    Just checked, and the time and date on both servers agree to within a few seconds. Thanks, though!

    ----------

    I wasn't sure whether slapd was timing out every hour or just checking every hour in order to replicate. It's interesting, though, that slapd is having a problem, because a month or two ago we had an ongoing problem with slapd ON BOTH SERVERS constantly using 25-50% CPU. Rebooting did not fix it. Then, for some reason, after we rebooted the servers one weekend, slapd was no longer stuck on high CPU.

    The way I see it now is, the servers DO see each other, as I can make account changes to this one account and the other server picks it up immediately. Now, this account does show up in WM on both servers, but is still not usable. I still cannot use that account to mount a share from either server unless I reboot the replica server.

    I don't see this as a workstation issue, as I can replicate the problem completely using only the two servers (connect to share on one server from the other server using this trouble account). No client station is involved.

    I can look at the certs being an issue. Whereabouts would I go to look at that, and if I remove them and reinstall them, what are the ramifications of doing so?

    Thanks all for your help! :)
     
  17. StevenMeyer macrumors member

    Joined:
    Dec 17, 2011
    Location:
    New York... Where Else?
    #17
    http://support.apple.com/kb/HT4183
    If you disable all certs and then restart, it may fix it.
    Either way, post the log again so we can see whether that changed anything (unless it did fix it, in which case I'm a wizard!).
     
  18. dmylrea thread starter macrumors 68000

    Joined:
    Sep 27, 2005
    #18
    I will look at that KB article. One thing I did do a couple of days ago (and so far, so good) is change the user's password type from Crypt Password to Open Directory. Her account still works! But I do have other user accounts that were created with Crypt Password and have never had a problem. Most user accounts look like they were created with the Open Directory password type.

    Could the certs still be a problem given these outcomes? When the account was first created it was OD. In troubleshooting this mess, we did change to Crypt, but obviously that didn't fix it. Changing it back to OD seems to have made something work again (and I didn't have to reboot the server to get the account to work).

    It's only been a couple of days, but I'm keeping my fingers crossed.
     
  19. StevenMeyer macrumors member

    Joined:
    Dec 17, 2011
    Location:
    New York... Where Else?
    #19
    How deep into this rabbit hole do you want to go? I think the certs are on a per-user basis. I can research it if you really want to KNOW.
     
  20. dmylrea thread starter macrumors 68000

    Joined:
    Sep 27, 2005
    #20
    Well, keep in mind that this affects not only the client PC but also server-to-server connections that use the account's credentials. The article you referenced seemed to be about server-to-client SSL OD binding, which may not be the root issue here.

    So, not sure if we're looking down the right rabbit hole or not! :)
     
  21. StevenMeyer macrumors member

    Joined:
    Dec 17, 2011
    Location:
    New York... Where Else?
    #21
    OOPS wrong KB...
    http://support.apple.com/kb/HT3745
     
  22. dmylrea thread starter macrumors 68000

    Joined:
    Sep 27, 2005
