Hi All,
We have updated our main file server from OSX 10.8.5 to 10.11.1. This was done mainly for stability and to help our Windows 7 users, who have recently had renewed trouble opening files and keeping them open on the older 10.8.5 server (the same problems with white pages in open PDF documents and random drop-outs of the server connection that were discussed on this forum about 2 years ago; see the discussions on LANMAN regedits). A recent Windows 7 update caused the issue to reappear. Early testing on OSX 10.11.1 showed it was finally fixed, and SMB3 connections to PCs seem to have resolved most problems. However, we still have some remaining issues. Here is a run-down of our systems and processes, both to help others who may hit the same problems and hopefully to get some help fixing the last of ours:
Our setup:
- MacPro (cheese-grater) running 10.11.1. It hosts the main file shares using the OSX Server app and also runs Kerio Email Server.
- VMware on the MacPro runs an AD server (Windows Server 2008) and some legacy apps.
- Client machines are all iMacs, some running Windows 7 under Boot Camp and others running Mac OSX 10.8.5 with Windows 7 in Parallels for legacy or specialty apps.
What now works with the OSX 10.11.1 file server (things that didn't work, or only marginally worked, with 10.8.5 and bombed out completely with 10.9.x):
- PDFs on the server now open properly on Windows 7 clients without "white screens" after short periods of timeout.
- Files now correctly show up as "in use" if opened on multiple machines at once.
- Windows applications can now be installed directly off the server (without having to copy them to the user's desktop first).
- SMB share stability seems to have improved and is faster.
- General server stability seems to have improved.
- The new VMware for the AD server also seems to have improved stability (touch wood!).
What we needed to do to get there:
The upgrade was pretty smooth (a plain "update", no need to "reinstall"). We thought we had it all under control, with no issues reported for half a day. This changed after lunch, when multiple people using Windows 7 reported that access to drives was dropping out constantly. We traced this via the PAM error logs to the SMB service crashing once a certain number of files had been opened. Thanks to some helpful advice adapted from here:
This disaster has been averted. Basically, OSX 10.11.1 seems to have a built-in default limit of 256 open files! You can check your limit by typing this into the Terminal:
launchctl limit maxfiles
We raised that to 65,000 by following the advice to install limit.maxfiles.plist and limit.maxproc.plist, and this has made the SMB service more reliable (see the document linked above for the plist requirements).
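For anyone who cannot get at the linked document, here is a rough sketch of the kind of launchd job that is commonly used for this. It is not necessarily identical to the plist from that document. The helper name make_limit_plist is made up for illustration, using 65000 for both the soft and hard limit is an assumption, and the job simply runs "launchctl limit <resource> <soft> <hard>" when it is loaded at boot.

```shell
#!/bin/sh
# make_limit_plist is a hypothetical helper, not part of OS X.
# It prints a launchd plist that runs
#   launchctl limit <resource> <soft> <hard>
# when the job is loaded.
make_limit_plist() {
  resource="$1"   # e.g. maxfiles or maxproc
  soft="$2"       # soft limit (assumed value, adjust to taste)
  hard="$3"       # hard limit (assumed value, adjust to taste)
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>limit.${resource}</string>
  <key>ProgramArguments</key>
  <array>
    <string>launchctl</string>
    <string>limit</string>
    <string>${resource}</string>
    <string>${soft}</string>
    <string>${hard}</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
EOF
}
```

Running "make_limit_plist maxfiles 65000 65000" prints the file contents for review. Such a file is conventionally saved as /Library/LaunchDaemons/limit.maxfiles.plist, owned by root:wheel, and picked up at the next reboot, after which "launchctl limit maxfiles" should report the new values.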
What we still have issues with (any help appreciated):
While the SMB service now keeps running, there are times when, for some unknown reason, it "drops out": Windows Explorer reports that the share point is unavailable while browsing files. Waiting a few seconds usually lets the PC users continue browsing. The problem is intermittent and occurs more often during heavy file browsing. When it does occur, multiple people are affected at once. The only OSX log entries we see are:
smbd[47114]: query_directory error<0xc000000f>
and
digest-service[47082]: digest-request: init return domain: DOMAINNAME server: SERVERNAME indomain was: <NULL>
These entries appear all the time in small doses, but when file browsing on the Windows machines bombs out, they seem to flood the logs. Sometimes we also get this message in the midst of the flood:
com.apple.xpc.launchd[1]: (com.apple.smbd[20695]) Service exited due to signal: Segmentation fault: 11
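One way to check whether the floods really line up with the drop-outs is to count these entries per minute. The following is only a sketch: smb_errors_per_minute is a made-up helper name, and it assumes a syslog-style file such as /var/log/system.log in which every line starts with "Mon DD HH:MM:SS hostname".

```shell
#!/bin/sh
# smb_errors_per_minute is a hypothetical helper, not an OS X tool.
# It counts smbd query_directory errors and smbd crash messages per minute.
# Assumes each log line starts with "Mon DD HH:MM:SS hostname ...".
smb_errors_per_minute() {
  logfile="$1"
  # Keep only the two smbd-related message types quoted in this post.
  grep -E 'smbd\[[0-9]+\]: query_directory error|com\.apple\.smbd\[[0-9]+\]\) Service exited' "$logfile" |
    # Reduce each line to "Mon DD HH:MM" so entries from the same minute match.
    awk '{ print $1, $2, substr($3, 1, 5) }' |
    # The log is already chronological, so adjacent duplicates can be counted directly.
    uniq -c
}
```

For example, "smb_errors_per_minute /var/log/system.log" prints one line per minute with a count in front, so minutes with unusually high counts can be compared against the times users reported the share point being unavailable.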
This problem has not yet been fixed. If anyone has any ideas for a fix, we're happy to try them.