YASU + nVRAID == array corruption ?

  • YASU + nVRAID == array corruption ?

    Not sure if this is actually YASU related but I need to check

    The system is running XP 64bit version, SP2. Latest version of DT and YASU build 7033. I just downloaded 7035. The chipset is 680i with the latest chipset drivers.

    I didn't have to use YASU at all up to now. I have been using it for about 6 days now.

    I have a software RAID array on the nvRAID south bridge controller on this board. Type is 0+1 with 4 WD Raptors 74G.
    Three days ago nvraidservice reported that the disk on channel 1.1 had failed, just after I had quit Heroes of Might and Magic 5, which I was running with YASU. I quit the game since I was noticing visual artifacts after a bit of alt-tabbing. Thinking the drive was toast, I shut down to check it.

    After a bit of troubleshooting with cables etc (I will save you the useless details here) I figured out the drive was fine. I just rebuilt the array and it worked fine. And I was just thinking...odd.

    30 minutes ago I quit the same game again and this time nvraidservice reported a disk failing again. My first thought was that the drive was dying slowly. But when I actually checked this time, it was a different drive. It was channel 2.0.

    I tried rebuilding from the crappy nVIDIA Windows tool. It said it couldn't find the drive, although the drive was visible in the tool. Reboot. BIOS RAID tool, and now I see 2 disks corrupted:
    channels 2.0 and 2.1. Also, the funny thing is that the BIOS tool sees those 2 drives as separate corrupted arrays (as it did with the previous occurrence).

    Right now the system is live and I'm rebuilding the array and synchronizing those same disks. Everything is working fine, other than me struggling to figure out what causes this.

    If I am not wrong, YASU hides the SCSI drives by blocking access to specific registry keys, no? (I've noticed sptd does that to its key.)

    Is it possible that it somehow blocks access to nvraid related keys? Thing is this started happening after I used YASU and with my not so modest knowledge I think it's the only possible culprit.

    I did check the filesystem for corruption and even did strain tests on the array with various monitoring tools. There was no corruption whatsoever. If it's not YASU related the only other culprit could be the server 2003 SP2.

    I will not use YASU until I hear from you, both as a test to see if this will happen again with YASU not running and as a precaution as well. I'd be in deep trouble if the corrupted channels were not the 2 mirror ones...
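To make that last worry concrete, here is a minimal sketch of which two-disk failures a 4-disk RAID 0+1 can survive. It assumes the array is a mirror of two 2-disk stripe sets and that channels 1.0/1.1 form one set and 2.0/2.1 the other; neither the layout nor the channel grouping is confirmed anywhere in this post, so treat both as hypothetical.

```python
from itertools import combinations

# ASSUMPTION: mirror of two stripe sets, grouped by channel prefix.
# The real nvRAID layout on this board is not stated in the thread.
STRIPE_SETS = {
    "A": {"1.0", "1.1"},
    "B": {"2.0", "2.1"},
}


def array_survives(failed):
    """A 0+1 array stays readable while at least one stripe set is fully intact."""
    failed = set(failed)
    return any(not (members & failed) for members in STRIPE_SETS.values())


def two_disk_outcomes():
    """Map every possible pair of failed channels to True (survives) or False (lost)."""
    disks = sorted(set().union(*STRIPE_SETS.values()))
    return {pair: array_survives(pair) for pair in combinations(disks, 2)}


if __name__ == "__main__":
    for pair, ok in two_disk_outcomes().items():
        print(pair, "survives" if ok else "LOST")
```

Under these assumptions, losing 2.0 and 2.1 together is survivable because the other stripe set is untouched, while losing one disk from each set is not.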

    Thanks in advance,
    Nodens
    Co-Creator of Akkadia MUD Game Engine Codebase (GPL 1999)
    In /dev/null no one can hear you scream!

  • #2
    Just wanted to report that no RAID array issues have been noted up to now since I haven't been running YASU...
    Co-Creator of Akkadia MUD Game Engine Codebase (GPL 1999)
    In /dev/null no one can hear you scream!


    • #3
      I could imagine that the nvraid background service regularly queries the SCSI devices for changes, and that a failure to do so might prompt it to issue an unfounded command to the RAID driver indicating a problem.
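The idea above can be illustrated with a toy model. This is only a sketch of the general "mark a member failed after too many missed polls" pattern; the class name, the threshold, and the logic are all hypothetical and are not taken from nvraidservice, whose internals nobody in this thread has seen.

```python
from dataclasses import dataclass, field


@dataclass
class MemberMonitor:
    """Toy health monitor: flags a channel after N consecutive missed polls."""

    max_missed: int = 3  # hypothetical threshold, not a real nvRAID setting
    missed: dict = field(default_factory=dict)
    failed: set = field(default_factory=set)

    def record_poll(self, channel: str, responded: bool) -> None:
        """Record one poll result; a response resets the miss counter."""
        if channel in self.failed:
            return  # already dropped from the array, nothing more to track
        if responded:
            self.missed[channel] = 0
            return
        self.missed[channel] = self.missed.get(channel, 0) + 1
        if self.missed[channel] >= self.max_missed:
            self.failed.add(channel)


if __name__ == "__main__":
    monitor = MemberMonitor()
    # Simulate queries to channel 1.1 being blocked while the disk itself is healthy.
    for _ in range(3):
        monitor.record_poll("1.1", responded=False)
    print(monitor.failed)
```

In such a model a perfectly healthy disk ends up flagged purely because the queries never got through, which is the scenario being suggested here.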


      • #4
        Aye, this is my assumption as well. It is quite possible that the array was never corrupted but the driver just thinks it is because the service provider can not access the channels and chooses to remove the degraded/problematic channel from the array.

        Which brings up the question: does YASU block access to all kinds of SCSI devices, or is this device affected unintentionally? And if the first scenario holds, is it possible to exclude devices like RAID miniports, or does the implementation not allow it?

        I hope sYk0 can shed some light on this
        Co-Creator of Akkadia MUD Game Engine Codebase (GPL 1999)
        In /dev/null no one can hear you scream!


        • #5
          I have an nForce4 2-drive array (RAID 0) for my disc images and files. My games and OS are on the same non-RAID drive. I haven't had any problems with my array while using YASU, any version. I'm on 32-bit XP Pro. I wonder if it's related to 64-bit XP?
          It is so choice. If you have the means, I highly recommend picking one up. -Ferris Bueller


          • #6
            Originally Posted by Chiefnuts View Post
            I have an nForce4 2-drive array (RAID 0) for my disc images and files. My games and OS are on the same non-RAID drive. I haven't had any problems with my array while using YASU, any version. I'm on 32-bit XP Pro. I wonder if it's related to 64-bit XP?
            It is quite possible that this issue has not manifested for you because your RAID array is not being paged by the nvraidservice provider while YASU is running. Your operating system files, pagefile, game files, and running applications run off your non-RAID drive. In theory your array gets paged only while a copy protection scheme checks for the original disk during the initialization of a game, since your images reside there. That period may not be long enough for the issue to manifest itself (there may be several timers involved).

            Another option to consider is that your chipset is nForce4 based. This means that the whole service provider and driver approach could be entirely different from the n680i chipset (though I'm fairly sure the software codec for the RAID controller is compatible between those chipsets-I've been able to mount arrays created with nf4 on 680i).

            The third option as you suggest is the AMD64/EM64T extension difference. On the 64bit platform a driver inherently works quite differently. Certain coding practices are banned from this platform (eg kernel level system call hooks) so it is possible that this issue is 64bit specific.

            Lastly, it could be related to the installation of SP2 and not YASU, and just a coincidence that this has happened twice while running YASU so far. For example, Server 2003 SP2 broke nhancer for me and others, and the developer cannot reproduce it. I'm currently waiting for the developer to send me a debug build on that one so I can provide him with debugger info.
            It's just that the very nature of what YASU does and the fact that twice now it happened while it was running, that makes it the number 1 suspect.
            Co-Creator of Akkadia MUD Game Engine Codebase (GPL 1999)
            In /dev/null no one can hear you scream!


            • #7
              I agree with you completely. I thought it could very well be the x64 piece, due to different driver access levels & calls, and because YASU is usually loaded all the time and cloaking my drives. But all OS/pagefile/app loading occurs on non-array drives. Hopefully you can get to the bottom of this, because I was going to migrate the array to a i650/Core2 setup I just bought, and was planning on switching to x64, so it's good to know.
              Last edited by Chiefnuts; 03.04.2007, 15:07. Reason: my spelling sucks.
              It is so choice. If you have the means, I highly recommend picking one up. -Ferris Bueller


              • #8
                I use x64 and nForce 570 chipset using raid-0 with dual 16mb cache 74gb raptors. I do not use the software tool in windows and I haven't had any issues. I also disable the nvidia raid service with start-run-services.msc. Try setting up and using your raid in BIOS only and you should be fine. Possibly try disabling the nvidia raid service in windows as I do as it is not necessary.


                • #9
                  Originally Posted by arfett View Post
                  I use x64 and nForce 570 chipset using raid-0 with dual 16mb cache 74gb raptors. I do not use the software tool in windows and I haven't had any issues. I also disable the nvidia raid service with start-run-services.msc. Try setting up and using your raid in BIOS only and you should be fine. Possibly try disabling the nvidia raid service in windows as I do as it is not necessary.
                  I never said anything about any software tool. I always set up the arrays via the BIOS tool. What I meant is that the whole nVRAID solution is a software codec, BIOS assisted. It's not a hardware RAID solution.

                  Also, if you disable the nvraidservice you won't be able to see degraded arrays (in your case it's not needed, since if your stripe set gets degraded your array is toast anyway). The nvraidservice is also THE ONLY way of rebuilding an array when your mirrored array or RAID 5 gets degraded. It doesn't matter whether you select to rebuild from the control panel or the BIOS tool: the nvraidservice polls the array, reads the flag and performs the operation. So the service is necessary unless you are on a simple RAID-0 like you are. Every other RAID mode supported by the codec (1, 0+1, 5) needs that service.

                  But let's get back on topic. I am beginning to believe YASU is entirely unrelated to the degradation issue. Yesterday it happened again without YASU running. My latest theory is that 3 out of the 4 Raptors were running on the same PSU line and that's a common no-no as they can't get enough amperage. The common rule is 2 HDs per line... I will stress test with YASU on now that I've backed up the array.
                  Co-Creator of Akkadia MUD Game Engine Codebase (GPL 1999)
                  In /dev/null no one can hear you scream!


                  • #10
                    I posted the following message three days ago on a more private forum:

                    I'm having another weird issue; I seem to attract them. It's somewhat complicated, and seems to happen most under certain conditions. Though I have money to replace what needs to be replaced - if replacing is needed, I cannot afford trial and error with NewEgg.

                    It first started a long time ago when I was playing Rainbow Six: Vegas. One day I updated it. The next day the OS was complaining about some registry corruption. Could do nothing but reinstall. All is well until recently when I got C&C 3. Since then I've had registry corruption resulting in reinstalls twice, and some HD corruption issues. Both of these games are SecuROM games, by the way.

                    I am also running Alcohol 120% with updated SPTD drivers, along with YASU. At first, I thought these were to blame. The system was crashing without YASU - I now have crashing issues while not using it after a clean install, so that's not the issue. Never had a problem with Alcohol.

                    I can play WoW for hours at a time and have no issue - play C&C for hours - no issue. Then I boot it up again, and it crashes within 5 minutes, system reboots, raid not detected - or registry corruption. I should mention the only time I have raid detection issues is when the system reboots itself after a crash, and that a power clear (unplug PSU, hit power, plug back in) always fixes it.

                    MemTest86/Prime95 stable for 6 hours straight. This should rule out memory/CPU/PSU. One would think that since the raid is not detecting after an automated reboot (which only a "power clear" will fix), this is an HD/controller issue. I find it odd however, that this has until today, only been happening while I was playing SecuROM games. WoW crashed a few times today - it never does that. I tried to delete a file in the WoW folder, some character settings, the system complained that the files were corrupted and could not be deleted. I ran Chkdsk then was able to delete them.

                    So I'm leaning more towards HD/controller. I'm running 2xSATA 70gb Raptors in RAID0 on a DFi LP nF4 board. I've run extended tests with WD Diagnostic tools/SpinRite lvl4, both turn out fine. I know this is not the end all be all solution to HD diagnostics, but it's something.
                    My issues first started while running R6:V in combination with YASU; it appears you are not alone here. My full system specs are as follows:

                    Case: Coolermaster Praetorian PAC-T01-EK Black
                    PSU: Antec TruePowerII 550W
                    Mobo: DFi nF4 Ultra-D w/BIOS 3-10-05
                    Proc: AMD64 4000+ (San Diego)
                    ---------ThermalTake AquariusIII External Liquid Cooling System (AS5)
                    Video: Sapphire Radeon X1800XT 512MB GDDR3 PCI-E x16
                    Memory: Corsair XMS 2Gb (2x1GB) 2-3-3-6@2.8v Twinx2048-3200c2
                    Storage: 2x Western Digital Raptor 74GB 10,000RPM SATA150
                    Lite-On 165x DVSD Burner
                    NEC 1.44MB 3.5" Floppy Drive
                    OS: Windows XP SP2


                    I took the liberty of relocating my drives from SATA channels 1/2 to 3/4. As much as this looks like a software issue... I cannot get over the fact that the raid will not detect upon automated reboot (no blue screen).


                    • #11
                      Hi mate, you seem to be making an invalid assumption. Running Memtest86 and Prime95 as stress tests does not rule out your PSU. It does rule out your memory and CPU, and certainly the memory controller (north bridge, or the CPU again, as AMD chips carry it on-die).

                      Thing is that hard disks draw power from the +5V rail for their electronics (the spindle motor runs off +12V). If they cannot draw enough amperage from your PSU, it may manifest in a number of ways: corruption of the file system, clicking sounds (hard disk heads parking/unparking) and a few other things. You should really check your event log for errors that mention "timed out" IO operations or retries.
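If the event log is long, the check can be done over an exported text copy of it. The following is a rough sketch only: it assumes the log has been exported to plain text with one event per line, the keyword list is a guess at what is worth flagging, and the sample lines are invented for illustration rather than copied from a real Windows log.

```python
import re

# ASSUMPTION: phrases worth flagging; extend this list to match your own log.
PATTERN = re.compile(
    r"timed?[ -]?out|retr(?:y|ied|ies)|did not respond",
    re.IGNORECASE,
)

# Made-up sample lines, NOT real event log output.
SAMPLE = [
    "Information  Service Control Manager  The nvraidservice service entered the running state.",
    "Warning  disk  An I/O request timed out and was retried.",
    "Error  nvraid  The device did not respond within the timeout period.",
    "Information  Tcpip  Address lease renewed.",
]


def suspicious_events(lines):
    """Return (line_number, text) for each line that mentions a timeout or retry."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


if __name__ == "__main__":
    for number, text in suspicious_events(SAMPLE):
        print(number, text)
```

To run it against a real export, pass an open file object instead of `SAMPLE`; the function only needs an iterable of strings.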

                      The way you describe your issue really sounds like a PSU issue to me.


                      Are the specs of your PSU the ones where it states 40A for the 5V rail, but the little asterisk notes: * +5V, +3.3V, +12V1, +12V2 maximum output 530 Watts max?

                      It could be that some other rail is overloading your PSU. The usual culprit in such cases is the +12V rails, since they feed the nowadays power-hungry GFX cards.
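The arithmetic behind that asterisk can be sketched as follows. The 40A figure for +5V and the 530W combined cap come from the label note quoted above; the per-rail currents in the second example are purely hypothetical load figures picked to show the effect, not measurements from this system.

```python
RAIL_VOLTS = {"+3.3V": 3.3, "+5V": 5.0, "+12V1": 12.0, "+12V2": 12.0}
COMBINED_LIMIT_W = 530.0  # combined cap from the label note quoted above


def rail_watts(amps_by_rail):
    """Convert a per-rail current draw (amps) into per-rail power (watts)."""
    return {rail: RAIL_VOLTS[rail] * amps for rail, amps in amps_by_rail.items()}


def budget(amps_by_rail, limit_w=COMBINED_LIMIT_W):
    """Return (total_watts, within_limit) for the given per-rail draw."""
    total = sum(rail_watts(amps_by_rail).values())
    return total, total <= limit_w


if __name__ == "__main__":
    # The +5V rail at its full rated 40A, on its own, is well inside the cap.
    print(budget({"+5V": 40}))
    # HYPOTHETICAL heavy load on every rail at once: the combined cap is exceeded
    # even though no single rail is above a plausible individual rating.
    print(budget({"+3.3V": 10, "+5V": 40, "+12V1": 17, "+12V2": 17}))
```

The point is that the per-rail ratings cannot all be drawn at the same time, so a heavy +12V load eats into what is really available on +5V.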

                      I also find quite interesting that your Corsair modules run at 2.8V. That is extremely high for memory modules...does your liquid cooling system have memory blocks? I'd suspect any memory running at 2.8V to eventually fry...I'm also wondering why you need that high voltage, your memory timings are not that aggressive...


                      Originally Posted by Aerowinder View Post
                      I posted the following message three days ago on a more private forum:
                      My issues first started while running R6:V in combination with YASU; it appears you are not alone here. My full system specs are as follows:
                      Case: Coolermaster Praetorian PAC-T01-EK Black
                      PSU: Antec TruePowerII 550W
                      Mobo: DFi nF4 Ultra-D w/BIOS 3-10-05
                      Proc: AMD64 4000+ (San Diego)
                      ---------ThermalTake AquariusIII External Liquid Cooling System (AS5)
                      Video: Sapphire Radeon X1800XT 512MB GDDR3 PCI-E x16
                      Memory: Corsair XMS 2Gb (2x1GB) 2-3-3-6@2.8v Twinx2048-3200c2
                      Storage: 2x Western Digital Raptor 74GB 10,000RPM SATA150
                      Lite-On 165x DVSD Burner
                      NEC 1.44MB 3.5" Floppy Drive
                      OS: Windows XP SP2
                      I took the liberty of relocating my drives from SATA channels 1/2 to 3/4. As much as this looks like a software issue... I cannot get over the fact that the raid will not detect upon automated reboot (no blue screen).
                      Co-Creator of Akkadia MUD Game Engine Codebase (GPL 1999)
                      In /dev/null no one can hear you scream!


                      • #12
                        I'd have to agree about the memory voltage. I'm running Corsair XMS Xtra-Low Latency sticks (2-2-2-5), and I only bump mine by .1v for stability. Corsair does need more power than generic stuff, but 2.8v sounds high, since the standard should be 2.5v. That means you have a .3v bump above stock settings. Might be worth dropping that down a point or two.


                        • #13
                          On the Corsair website, for this particular RAM, they are calling for 2.75v, confirmed on their forums. My board rounds to the nearest tenth and 2.7v wasn't stable, hence the 2.8v.

                          I hear no unusual sounds from my drives, the system is quiet as well as the hard drives. There are no, and never have been, any errors in the event log pertaining to time outs or I/O errors. The only time I get an error is typically when I get a crash, I get NTFS error 55.

                          Is there a benchmarking program I can look into that will stress everything in the system at once, to see if I can reliably reproduce the problem (3DMark?)? Or is there a better way to test voltage stability? I still find it odd that my problems are not consistent, is that typical with power supply issues? I've never run into a bad or underpowered one before.

                          Why, after I get auto-rebooted, will the raid not detect (system sees only one drive)? I thought the power got reset (or whatever) when the system reset?

                          But anyhow, thank you both for the info, certainly something to investigate.


                          • #14
                            You can download Burn-In-Test from www.passmark.com, but the trial version will only run for 15 minutes at a time. To run for longer periods, you have to purchase a license (though it's not too bad, only $24 for standard or $49 for professional). Which reminds me, I need to upgrade to v5. I'm still running v4.


                            • #15
                              Originally Posted by Aerowinder View Post
                              I still find it odd that my problems are not consistent, is that typical with power supply issues?
                              Yes! It is, and that's why there is no "easy" 100% remote "diagnosis" for us. But what you describe could be a typical PSU issue.


                              As your prob also occurs without YASU, I doubt that it's a software-related prob, and my best bet is your PSU.
