AFP Connection Failures and General Flakiness




DJLC
Oct 3, 2012, 09:29 AM
I'm working for a charter middle school with a 1-to-1 MacBook initiative. We've got one lone Xserve supporting about 200 MacBook Airs for students, as well as 15 MacBook Pros and 4 iMacs for staff. It's my first experience administering OS X Server -- my background is in Windows Server with mixed-OS clients. I have to say that I am not at all impressed.

Anyway, getting to the issue: the server randomly denies AFP connections addressed to its FQDN (xserve.arsnc.private). As a result, students get errors at login when the Mac tries to mount their network home / sync their mobile account, and they have great difficulty accessing other sharepoints. Oddly enough, AFP connections *always* work when using the IP address (192.168.2.200).

Aside from that, Open Directory has been randomly removing users from their groups, randomly changing service access settings for users, and occasionally dropping user passwords. I've had to reboot the server on several occasions because it refuses any and all authentication requests -- meaning nobody can log in at all.

I've spoken to a few people about the AFP issue. Most point to DNS -- "you shouldn't use .private," "you probably have multiple A records," etc. Unfortunately, the Apple consultant who upgraded the server to 10.7 chose the .private domain, and there's no way I can reimage or reconfigure every single MacBook until next summer. I've wiped out all the DNS settings and started over from scratch. Both dig and nslookup return correct results. There seems to be no rhyme or reason to which users have problems. In the logs, the two most common messages are "Misconfiguration in the hash 'Kerberos'" and "Client response doesn't match what we generated."
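One way to test the "multiple A records" theory from the client side, rather than from the server with dig/nslookup, is to ask an affected MacBook's own resolver what it returns for the name. A minimal sketch, assuming a Python 3 interpreter is available on the client (a stock Lion install ships only Python 2); `distinct_addresses` and `lookup` are hypothetical helper names, not part of any Apple tool:

```python
import socket


def distinct_addresses(infos):
    """Collapse getaddrinfo() results into a sorted list of distinct IP addresses.

    getaddrinfo() can return the same address several times (once per
    family/socket type/protocol combination); sockaddr[0] is the address itself.
    """
    return sorted({sockaddr[0] for _fam, _type, _proto, _canon, sockaddr in infos})


def lookup(fqdn):
    """Ask this machine's resolver for every IPv4 and IPv6 address of fqdn."""
    # AF_UNSPEC requests both A (IPv4) and AAAA (IPv6) answers;
    # raises socket.gaierror if the name does not resolve at all.
    infos = socket.getaddrinfo(fqdn, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return distinct_addresses(infos)
```

Running `print(lookup("xserve.arsnc.private"))` on an affected client should show 192.168.2.200 (possibly alongside an IPv6 address); more than one IPv4 address would support the multiple-A-record theory.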

Can anyone point me toward a solution? I've spent days Googling the issues. As far as I've been able to tell, it's just a common issue in Lion Server and there is no resolution.



DJLC
Oct 18, 2012, 09:25 AM
Update: Open Directory is doing better. Apparently I left the Users & Groups prefpane open for student access, so they were causing some issues on their own. Our Apple consultant has also said that Open Directory is flaky by nature, so whatever...

But what about the failing AFP connections? I really don't know where else to look. I know DNS is fine because every other service works over the FQDN (HTTP, Jabber, OD, LDAP). AFP itself refuses 50% of the connections coming to the FQDN; however, AFP connections to the IP address *always* work. Any thoughts at all?
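To put a number on "refuses 50% of the connections," a small probe can open plain TCP connections to the AFP port (548) by name and by IP and compare the success rates. This is only a sketch: it exercises name resolution and TCP reachability, not the AFP login or the Kerberos exchange, so if the failures happen at the authentication step both rates may come back as 1.0. `tcp_success_rate` is a hypothetical helper name.

```python
import socket

AFP_PORT = 548  # AFP over TCP


def tcp_success_rate(host, port=AFP_PORT, attempts=20, timeout=3.0):
    """Fraction of attempts in which a TCP connection to host:port succeeds.

    create_connection() resolves `host` on every call, so a flaky name lookup
    shows up here as a failure too. This does NOT test AFP authentication.
    """
    ok = 0
    for _ in range(attempts):
        try:
            # Connect and immediately close; socket.gaierror and timeouts are OSErrors.
            with socket.create_connection((host, port), timeout=timeout):
                ok += 1
        except OSError:
            pass
    return ok / attempts
```

For example, compare `tcp_success_rate("xserve.arsnc.private")` with `tcp_success_rate("192.168.2.200")` from the same client.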

jared_kipe
Oct 18, 2012, 10:37 AM
Any particular reason you haven't upgraded to 10.8 Server (Mountain Lion)? I've actually found it to be more stable and better in general.

Otherwise, download and run Wireshark to look at the packets; perhaps you'll find that the issue occurs only over IPv6.

DJLC
Oct 18, 2012, 11:09 AM
We upgraded to 10.7 over the summer. We can't bring the clients up to 10.8 because of software incompatibility: instructional software gets updated at a snail's pace. I suppose there's really nothing stopping us from moving up to 10.8 on the server, but I can't do anything about it until next summer (or over winter break) since it's mission-critical. I guess the question would be whether or not Workgroup Manager still functions in 10.8; we were told by Apple that the new Profile Manager was buggy and unusable in 10.7, and as a result I have no idea how to configure or use it.

I'll give Wireshark a try. We haven't enabled any IPv6 stuff, so theoretically there shouldn't be a problem there. Maybe I missed something, though.

jared_kipe
Oct 18, 2012, 11:22 AM
I've seen AFP be negotiated over IPv6 without even trying. Just does it...
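A quick way to see whether a client would even consider IPv6 for the server is to look at the order in which the resolver hands back addresses for the name; many clients try them in that order. The sketch below only classifies the getaddrinfo() results by family and link-local scope; whether the Lion AFP client actually follows this order is an assumption, not something confirmed in this thread. `describe_candidates` is a hypothetical name.

```python
import ipaddress
import socket


def describe_candidates(infos):
    """Return (address, family, is_link_local) per distinct address, in resolver order."""
    seen = set()
    result = []
    for family, _type, _proto, _canon, sockaddr in infos:
        # Strip any "%en0"-style scope suffix in case the address string carries one.
        addr = sockaddr[0].split("%", 1)[0]
        if addr in seen:
            continue
        seen.add(addr)
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        result.append((addr, label, ipaddress.ip_address(addr).is_link_local))
    return result
```

Usage on a client: `describe_candidates(socket.getaddrinfo("xserve.arsnc.private", 548, socket.AF_UNSPEC, socket.SOCK_STREAM))`. An IPv6 entry at the top of the list would be consistent with AFP being negotiated over IPv6 without anyone asking for it.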

Upgrading the server to 10.8 can happen even on a 'mission critical system'. Just plan it out and do it overnight.

Take server offline. Duplicate HDD to external (preferably RAID).
Upgrade and test, take online.

Worst case scenario, services are buggier, at which point you just reimage the internal drive from that external.

You wouldn't need to change passwords, or reconfigure any client machines.

DJLC
Oct 22, 2012, 07:50 AM
I suppose if there's no other alternative, but I'd really rather not waste a night at work just to fix a small annoyance. Students and staff are already navigating around the problem by connecting directly to the IPv4 address when they see the connection error, and I don't get paid overtime.

That said, IPv6 is functioning as link-local only, and connections to the server's IPv6 address *do* work.

jared_kipe
Oct 23, 2012, 06:47 PM
Do you have a split-horizon DNS (meaning afpserver.com => 192.168.1.5 on the internal network and afpserver.com => 66.100.100.5 on the outside network)?

If so, perhaps there is a DNS cache issue with either the computers or some part of the AFP application. It could also be a misconfiguration in the DNS settings coming from your DHCP server.

DJLC
Oct 23, 2012, 08:09 PM
Nope -- we use xserve.arsnc.private as the FQDN (I know, bad plan; blame Apple's consultants), so there shouldn't be anything outside that to confuse it.

I'll double check the DHCP settings on the Sonicwall tomorrow and take another look at the DNS settings on the Xserve.
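When checking DHCP, it can help to look at what a client actually received rather than what the SonicWall UI says. On a Mac, `ipconfig getpacket en0` prints the last DHCP packet for that interface (the interface name may differ). The sketch below parses that output and flags any DNS server other than the Xserve, on the theory (an assumption, not a confirmed cause) that a secondary resolver that doesn't know the arsnc.private zone would make lookups fail intermittently. The `name (type): value` line format is assumed from typical output, and the function names are hypothetical.

```python
import re

# Assumed line shape, e.g. "domain_name_server (ip_mult): {192.168.2.200}"
OPTION_LINE = re.compile(r"^(\w+) \([^)]*\): (.*)$")


def parse_dhcp_options(text):
    """Map option names to values; brace-wrapped values become lists."""
    options = {}
    for line in text.splitlines():
        m = OPTION_LINE.match(line.strip())
        if not m:
            continue  # skip header lines such as "op = BOOTREPLY"
        name, value = m.group(1), m.group(2).strip()
        if value.startswith("{") and value.endswith("}"):
            options[name] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            options[name] = value
    return options


def check_lease(text, dns_server, domain):
    """Return a list of human-readable problems (an empty list means the lease looks right)."""
    opts = parse_dhcp_options(text)
    servers = opts.get("domain_name_server", [])
    if isinstance(servers, str):  # a single value without braces
        servers = [servers]
    problems = []
    if dns_server not in servers:
        problems.append(f"{dns_server} is not in the DNS server list {servers}")
    others = [s for s in servers if s != dns_server]
    if others:
        problems.append(f"other DNS servers handed out: {others}")
    if opts.get("domain_name") != domain:
        problems.append(f"domain_name is {opts.get('domain_name')!r}, expected {domain!r}")
    return problems
```

Usage: capture the command's output on an affected MacBook and call `check_lease(output, "192.168.2.200", "arsnc.private")`.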

DJLC
Oct 26, 2012, 07:09 AM
So after battling the stupid SonicWall yesterday, I have to wonder: would I be better off running DHCP on the Xserve? That's the only real change in the whole configuration this year, aside from the 10.7 upgrade.

maalox
Oct 26, 2012, 08:28 PM
I am interested to find out how you resolved this issue.
Our students are experiencing the same thing.

I have Mac labs with around 20 stations each. Kids can log in without any problems, but their network shares only show up intermittently. It's as random as can be: it's never the same user or workstation, and it changes every day.

I first thought it was a DNS issue but I did not see any problems on that end.
I am also running Windows shares on AD 2008 R2.

It's really starting to become a nuisance. Plus, I am a total Xserve newb, to say the least.

DJLC
Oct 30, 2012, 01:36 PM
I still have not gotten the problem solved. As far as the original OD flakiness goes, I just cut off access to the Users & Groups prefpane via Workgroup Manager. That helped a bit. Apple tells me it's a little flaky by nature, though, so YMMV.

As for file sharing, that problem persists. Students still get an "Unable to connect" error with no technical details. My DNS is fine, my DHCP is fine, my AFP / SMB settings are fine. I don't know where else to look. I've been having the students connect to file shares via the server's IP, which seems to successfully avoid the issue.

jared_kipe
Oct 30, 2012, 02:52 PM
If it works 100% of the time with straight IP and <100% of the time with FQDN then there is something wrong with your network, your DNS servers, DHCP or anything else related to reliably resolving the FQDN to the IP address.
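A simple way to test "reliably resolving" from a client is to resolve the name many times in a row and count the outcomes; a healthy setup returns one answer every time. A Python 3 sketch with hypothetical helper names; note that the client's resolver cache may mask intermittent server-side failures, so a clean result here doesn't fully rule DNS out.

```python
import socket
from collections import Counter


def resolve_ipv4(fqdn):
    """Return the sorted IPv4 addresses for fqdn as a tuple, or raise socket.gaierror."""
    infos = socket.getaddrinfo(fqdn, None, socket.AF_INET, socket.SOCK_STREAM)
    return tuple(sorted({info[4][0] for info in infos}))


def resolution_tally(fqdn, attempts=100, resolver=resolve_ipv4):
    """Resolve fqdn repeatedly and count each distinct outcome (answer or error type)."""
    tally = Counter()
    for _ in range(attempts):
        try:
            tally[resolver(fqdn)] += 1
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            tally[f"error: {type(exc).__name__}"] += 1
    return tally
```

On a healthy client, `resolution_tally("xserve.arsnc.private")` should contain a single key, `("192.168.2.200",)`, with a count equal to `attempts`.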

You could try editing the /etc/hosts file on a handful of computers to manually point the server's FQDN to the correct IP address and see if the rate of problems on those computers drops to zero. (This would confirm a DNS issue of some kind.)
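The /etc/hosts entry would be a line such as `192.168.2.200  xserve.arsnc.private` (editing the file needs admin rights, e.g. via sudo). To confirm the pin is in place on each test machine, a small checker can parse the file. The function names are hypothetical, and the parser only handles the simple `address name [name ...]` layout with `#` comments.

```python
def hosts_entries(text):
    """Map each hostname in /etc/hosts-style text to the list of addresses given for it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no names is not a usable entry
        addr, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name.lower(), []).append(addr)
    return mapping


def pinned_correctly(text, fqdn, expected_ip):
    """True if fqdn appears exactly once in the hosts text, mapped to expected_ip."""
    return hosts_entries(text).get(fqdn.lower(), []) == [expected_ip]
```

Usage: `pinned_correctly(open("/etc/hosts").read(), "xserve.arsnc.private", "192.168.2.200")`.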

You could try turning on the AFP log and hoping to find something useful there. http://support.apple.com/kb/HT5541?viewlocale=en_US&locale=en_US
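Once logging is on, a rough count of the two messages quoted in the first post, grouped by hour, can show whether they cluster around the times students report connection errors. A sketch under assumptions: the matched substrings are taken from the messages as quoted in this thread, log lines are assumed to start with a syslog-style `Mon DD HH:MM:SS` timestamp, and no particular log file path is implied. The names are hypothetical.

```python
import re
from collections import Counter

# Substrings of the two messages quoted at the start of the thread.
PATTERNS = {
    "kerberos_hash": "Misconfiguration in the hash",
    "client_response": "Client response doesn't match",
}

# Assumed syslog-style prefix, e.g. "Oct 26 08:28:01"; captures "Oct 26 08".
SYSLOG_HOUR = re.compile(r"^([A-Z][a-z]{2} [ \d]\d \d{2}):")


def count_known_errors(lines):
    """Count (message_key, hour) pairs; hour is "unknown" if there is no timestamp."""
    counts = Counter()
    for line in lines:
        for key, needle in PATTERNS.items():
            if needle in line:
                m = SYSLOG_HOUR.match(line)
                hour = m.group(1) if m else "unknown"
                counts[(key, hour)] += 1
    return counts
```

Usage: `count_known_errors(open(path, errors="replace"))` for whichever log file turns out to contain the messages.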

You could also try configuring your routers and switches to block multicast and broadcast packets, other than DHCP of course.

EDIT: reordered in the order I would probably try things.

Les Kern
Oct 31, 2012, 08:26 AM
I HIGHLY suggest you call Apple and set up a time when an engineer can come by. I used OD on about 1200 machines across 30-some servers, and at one point all hell broke loose. It took the engineer a little more than a day to figure it all out and get it running perfectly. They are worth every penny of the seemingly huge rate.
Now, while all that is well and good, I decided 2 years ago to dispense with mounting home directories. Made all the machines clones, added handy shortcuts for servers to the dock in a folder (student1, dropserver, etc.) so they can still get to their files and drop things to the teachers, locked them down with parental controls and Deep Freeze, never looked back. Still use OD for LDAP and setting up groups for accessing servers like for the yearbook, etc. The load on the network is almost zero compared to before, and all I have to worry about now are things that HELP them learn instead of getting in the way OF them learning. It was a eureka moment for sure. Consider it.

yojo056
Oct 31, 2012, 02:44 PM
OH NO, you have SonicWall!!!! YIKES! Piece of crap! We had their combo VPN/DHCP/firewall/wireless controller/Swiss Army knife unit... our wireless SUCKED. It crashed all the time. We had bought so many, and it got so bad that one of their level 3 engineers, who was one of their head developers, came out and looked at it. He made a few changes; it didn't help. Finally we ripped it all down, installed Cisco, and never looked back.

My suggestion: run DHCP from the Xserve.

Another idea: why not just get rid of IPv6? You don't have enough computers to justify it anyway, and there's no real advantage to it inside your network.

DJLC
Nov 5, 2012, 09:27 AM
I'm going to move DHCP back to the Xserve during our next half-day for students (in a few weeks). We'll see if that helps at all. We did have an Apple consultant / engineer come by to handle the Lion upgrade on the Xserve over the summer and he's actually the one who moved DHCP to the Sonicwall. I do agree with him that something as basic as DHCP is better handled by a router (so if the server is down users can still get online), but it becomes a moot point if our Sonicwall can't keep up. Besides, our iPrism filter authenticates via LDAP, so without the server you're not getting anywhere anyway.

The good news is the only Sonicwall device we have is the firewall. Our wireless is all Xirrus, and our switches are a mix of HP, Netgear, and Cisco. I haven't found any major problems with any of that equipment (yet).

As for IPv6, a lot of people have pointed me in that direction as being a source of problems. But since it's not present in any of the GUIs and I only get a link-local address, I can only assume that it isn't active.

Doing away with the mobile home directories might be a good plan for the future, but for now it's kind of a cornerstone of our whole setup. It doesn't sync anything except Desktop, Documents, and Library, and even then it only syncs text-based documents. The sync settings are all managed so we don't have 180 users syncing every 5 minutes. It's mostly a safety net: if a student needs a loaner computer, all their notes and homework move over automatically. Beyond that, they're responsible for making their own backups. It has worked pretty reliably so far; in fact, the only file sharing issues we have come from connecting to non-home shares (i.e., Student Files / Dropboxes).

This all will probably be complicated further in the coming months. Our iPrism filters are going to expire, so we'll be switching to a free filtering service from NCDPI. But to do that, we've got to consolidate our two independent internet connections and establish a WAN between our elementary and middle school sites. Once that happens, we might have a shiny new box that can handle DHCP properly.