Microsoft has identified an NTFS corruption issue that can cause a Windows Server 2008 R2 server to freeze or hang. This article explains what the corruption is, how we have been able to reproduce it in our test lab, and which workarounds can keep the corruption from freezing the server. If you do not need the details, you can download a hotfix from Microsoft that solves the problem on Windows Server 2008 R2, or upgrade to Windows Server 2012 R2 or later: http://support.microsoft.com/kb/2866695
The specific corruption discussed here is a multi-level circular NTFS reference that Windows self-healing cannot fix. A circular reference exists when following the parent IDs upward from an NTFS ID leads back to that same ID. If it is a single-level circular reference (NTFS ID xxxx has a parent ID that is also xxxx), Windows Server 2008 R2 self-healing is able to repair it automatically. If, on the other hand, it is more deeply nested, such as a->b->c->a, then without the Microsoft hotfix the kernel can go into an infinite loop and the server hangs.
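To make the single-level and multi-level cases concrete, here is a minimal Python sketch that walks parent links and reports a loop if it finds one. It is only an illustration: the dictionary of parent IDs, the function name find_parent_cycle and the IDs themselves are assumptions made for this example, not how NTFS or Windows self-healing is implemented.

```python
from typing import Dict, List, Optional


def find_parent_cycle(parents: Dict[str, Optional[str]], start: str) -> Optional[List[str]]:
    """Walk parent links upward from `start`; return the loop if one is reached, else None."""
    seen: Dict[str, int] = {}  # ID -> position at which it was first visited
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if current in seen:
            # We are back at an ID already visited: everything from its
            # first visit onward forms the circular reference.
            return path[seen[current]:]
        seen[current] = len(path)
        path.append(current)
        current = parents.get(current)
    return None  # reached the top of the hierarchy, so no loop


# Hypothetical IDs. In the healthy tree a/b/c, "c" has parent "b", "b" has parent "a".
healthy = {"c": "b", "b": "a", "a": "root", "root": None}
# After "a" has been moved into "c", the parent of "a" is "c": a multi-level loop.
multi_level = {"c": "b", "b": "a", "a": "c"}
# Single-level case: an ID that is its own parent.
single_level = {"x": "x"}
```

A walker that does not remember which IDs it has already visited would never return for multi_level, which is the user-space analogue of the kernel loop described above.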
The primary way that Acronis has been able to reproduce the problem is with a Mac client that has two different windows open to the same folder hierarchy on the server. If a user drags a folder from one window into one of its own subfolders in the other window, the circular reference is created; for example, moving folder “a” from one window into folder “c” of a/b/c in the other window. Although less likely, the circular reference can also be caused by two different users moving folders within a very short period of time. An example of that would be Fred moving folder “a” to x/y/z at approximately the same time Sally moves folder “x” to a/b/c.
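The Fred and Sally case can be sketched in Python as two moves that each look safe against the same snapshot of the hierarchy but form a loop once both are applied. This is a simplified model under stated assumptions: the parent dictionary and the names is_under and simulate_race are hypothetical, and real moves are of course not validated this way by NTFS.

```python
from typing import Dict, Optional, Tuple


def is_under(parents: Dict[str, Optional[str]], node: str, ancestor: str) -> bool:
    """True if `ancestor` is `node` itself or appears somewhere above it."""
    current: Optional[str] = node
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)
    return False


def simulate_race() -> Tuple[bool, bool, bool]:
    # Initial hierarchy: root/a/b/c and root/x/y/z.
    tree: Dict[str, Optional[str]] = {
        "root": None,
        "a": "root", "b": "a", "c": "b",
        "x": "root", "y": "x", "z": "y",
    }
    # Both checks run against the same snapshot, before either move lands.
    fred_ok = not is_under(tree, "z", "a")   # is target z inside a? No.
    sally_ok = not is_under(tree, "c", "x")  # is target c inside x? No.
    if fred_ok:
        tree["a"] = "z"  # Fred moves a into x/y/z
    if sally_ok:
        tree["x"] = "c"  # Sally moves x into a/b/c
    # Bounded walk upward from "a": a healthy tree reaches the top (None)
    # within len(tree) steps, so still having a node afterwards means a loop.
    current: Optional[str] = "a"
    for _ in range(len(tree)):
        if current is None:
            break
        current = tree.get(current)
    return fred_ok, sally_ok, current is not None
```

Each move passes its own check, yet the combined result is the chain a->z->y->x->c->b->a, which is why the two moves have to land within a very short period of time for this to happen.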
Once there is an NTFS multi-level circular reference, Acronis Files Connect (formerly ExtremeZ-IP) hangs because the kernel goes into an infinite loop when Acronis Files Connect asks the OS about that folder. Acronis Files Connect does have built-in safeguards: if an operation takes a thread more than 5 minutes to complete, Acronis Files Connect will attempt to cancel it. With a circular reference, Acronis Files Connect does attempt to kill the stuck thread after 5 minutes, but by then it is too late. The other Acronis Files Connect threads that need to perform kernel tasks are all backed up behind that single thread, and the service cannot get enough work done even to shut down the stuck thread. More information about Acronis Files Connect stalled thread handling can be found in the following article: https://kb.acronis.com/content/39371.
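The general pattern, where a timeout fires but the stuck work cannot actually be stopped and other work queues up behind it, can be illustrated with a small Python analogy. This is not Acronis Files Connect code and does not model the kernel: the single-worker pool, the 0.05 second timeout and the function name demo_stuck_worker are all assumptions chosen to keep the example short.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError
from typing import Tuple


def demo_stuck_worker(timeout_s: float = 0.05) -> Tuple[bool, bool, bool, str]:
    # Stands in for "the blocked call finally returns"; it is only set at the
    # end so that the example can finish.
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=1) as pool:
        stuck = pool.submit(release.wait)      # blocks the only worker
        queued = pool.submit(lambda: "done")   # has to wait behind it
        try:
            stuck.result(timeout=timeout_s)
            timed_out = False
        except TimeoutError:
            timed_out = True                   # the watchdog notices the stall
        # cancel() returns False for a call that is already running:
        # noticing the stall does not make the work stoppable.
        cancelled = stuck.cancel()
        queued_started = queued.running() or queued.done()
        release.set()                          # let the stuck call return
        queued_result = queued.result()
    return timed_out, cancelled, queued_started, queued_result
```

In this analogy the timeout is detected, the cancel attempt fails, and the queued task has not even started until the blocked call returns on its own. In the situation described above, the kernel call never returns, so nothing behind it makes progress.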
The hotfix from Microsoft should prevent all known circular reference issues in the NTFS driver and is the recommended solution to the circular reference problem. See the Microsoft KB article http://support.microsoft.com/kb/2866695 for more information about the hotfix.
According to Microsoft, the kernel hang caused by the corruption was fixed in Windows Server 2012 R2, and we have not had any reports from the field that involved Windows Server 2012 R2. This issue is not addressed in Windows Server 2012 (the 'R1' version). We recommend that you upgrade all servers to the latest version of Acronis Files Connect.
Acronis Files Connect 8.0.4 and later attempts to detect and avoid this Microsoft disk corruption (the circular reference) regardless of whether the Microsoft hotfix is installed. Unfortunately, sometimes just checking the file system for the circular reference is enough to hang the kernel. In cases where we can identify a circular reference, Acronis Files Connect writes an event log message pointing to the location of the circular reference.
In Acronis Files Connect 8.0.5 and later, we not only attempt to detect existing corruption but also prevent users from moving parent folders into their own child folders on regular Windows drives.
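The kind of check involved in blocking such a move can be sketched in Python as a comparison of the source folder against the destination path. This is a hedged illustration only, not the Acronis Files Connect implementation: the function name would_create_cycle and the example paths are hypothetical, and the comparison is purely lexical, so it does not resolve mount points or other redirections.

```python
from pathlib import PureWindowsPath


def would_create_cycle(source: str, destination_parent: str) -> bool:
    """True if moving `source` into `destination_parent` would place a folder inside itself."""
    src = PureWindowsPath(source)
    dst = PureWindowsPath(destination_parent)
    # The move is unsafe when the destination is the source folder itself
    # or lies anywhere below it. PureWindowsPath compares case-insensitively,
    # matching how Windows paths are normally treated.
    return dst == src or src in dst.parents
```

For example, would_create_cycle(r"D:\share\a", r"D:\share\a\b\c") is True, while moving the same folder into D:\share\x\y is allowed. Because the check only looks at the path text, a destination reached through a mount point would need separate handling.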
Acronis Files Connect 8.0.6 and later extends the blocking behavior to mount points. With this release, all known ways of creating the circular reference from a Macintosh client are blocked by Acronis Files Connect.
If the file system already has a circular reference, running chkdsk with the /f switch is the only way to resolve the issue. Note that chkdsk in read-only mode does not detect this corruption on an NTFS volume (see http://support.microsoft.com/kb/283340), so you must run it with the /f flag. With /f, chkdsk takes exclusive access to the drive, and it takes about 2 to 3 minutes per million files on Windows Server 2008 R2 and much less time on Windows Server 2012 R2.
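To plan the maintenance window, the rough figure above can be turned into a quick estimate. The Python sketch below simply scales the 2 to 3 minutes per million files quoted for Windows Server 2008 R2; the function name and the file count are hypothetical, and actual run time will depend on the hardware and the volume.

```python
from typing import Tuple


def estimate_chkdsk_minutes(file_count: int,
                            low_per_million: float = 2.0,
                            high_per_million: float = 3.0) -> Tuple[float, float]:
    """Return a (low, high) estimate in minutes for chkdsk /f on Windows Server 2008 R2."""
    millions = file_count / 1_000_000
    return millions * low_per_million, millions * high_per_million


# A volume with a hypothetical 4 million files: roughly 8 to 12 minutes.
low, high = estimate_chkdsk_minutes(4_000_000)
```

The repair itself would then be run from an elevated command prompt as, for example, chkdsk D: /f, where D: is a placeholder for the affected volume.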
If chkdsk finds and fixes an error such as “Breaking links between parent file xxxxx and child file xxxxx” then your disk was suffering from the circular reference and you should no longer have the long delays after the repair.