MemTest86 6.2.0 issues in parallel mode and certain CPU cores


  • MemTest86 6.2.0 issues in parallel mode and certain CPU cores

    First of all, here's what I'm working with:

    CPU: AMD Phenom II X6 1090T BE
    GPU: NVidia GTX 650
    MB: ASRock 970M Pro3 (Firmware/UEFI version 1.30)
    RAM: 2 x 8 GB Kingston HyperX Fury Black DDR3
    PSU: Cooler Master V650S
    HDD: Samsung 850 EVO SSD 250 GB
    Case: BitFenix Prodigy M
    CPU Cooler: Arctic Cooling Freezer A11
    Fans: 2 x BitFenix Spectre 120 mm
    OS: Windows 8.1 64-bit & Linux Mint 17.2 Cinnamon 64-bit

    I was running MemTest86 6.2.0 to check my memory, since it's not officially supported for this particular motherboard. I started with a quick test with the default values and only let the test finish one pass, which it did without errors. As I was trying to pin down a problem I was experiencing with the RealBench stress test, I thought I should probably run the test in parallel mode, since the stress test involves heavy multitasking. However, when I reached Test 2 the test just stopped progressing. It didn't freeze totally, mind you. The timer was still running and the icons representing CPU core activity were spinning (except for CPU1, which showed a "W"), so I thought I just needed to give the test some time. Six hours later only the time on the counter had changed. So I tried again. The second time around CPU1 started working as well, but almost immediately after that the computer just rebooted. There was no blue screen and no memory errors that I could see, just a reboot.

    I decided to run the default test on each of the separate cores to see if CPU1 was somehow anomalous. So I started from CPU0 and checked if the CPUs could get past Test 2. This is what I saw:

    CPU0: OK
    CPU1: MemTest86 freezes before anything is tested. The testing screen appears, but the timer is dead from the start and the system doesn't respond to any keyboard input.
    CPU2: See CPU1
    CPU3: OK
    CPU4: See CPU1
    CPU5: See CPU1

    My system was at stock settings at the time and with a stock system the RealBench stress test had so far been the only thing I had had any problems with, and I had run roughly ten different benchmark tests before that. Furthermore, even with a mild OC I had passed eight hours of the standard blend test of Prime95 with six workers without any errors. I had also tried a custom version of the blend test which used all 16 GB of RAM for one hour and had encountered no errors.

    After encountering this issue with MemTest86 I got a bit worried and also ran Windows Memory Diagnostic Tool. It took about 26 hours to run 4 passes of the extended test, but no errors were found. While this is no guarantee that MemTest86 wouldn't find an error, I'm beginning to wonder if this is a UEFI-related issue. I've had a bit of trouble booting to MemTest86 and for some reason it just doesn't want to start around 50% of the time. I just see a sliver of my normal UEFI background picture in the upper edge of my monitor and that's it. If this happens, a reboot and retry has so far been all that it has taken to get MemTest to run.

    Any ideas?

  • #2
    I tried running the parallel test again after changing my RAM settings. Previously I ran the RAM at "stock" speeds, meaning the values it defaulted to on the motherboard due to having a Phenom II CPU: 1333 MHz CL9. For some reason the motherboard was also defaulting to 1.585 V, although the official stock value is 1.5 V. This time I manually set the voltage to 1.5 V and "overclocked" the RAM to what it is sold as: 1600 MHz CL10 (I used the timing values listed in the SPD table). Still no luck, the computer reboots during Test 2. Here's what the logfile had to say about Test 2 of this latest attempt:


    2015-10-06 04:05:05 - MtSupportRunAllTests - Test execution time: 12.856 (Test 1 cumulative error count: 0)
    2015-10-06 04:05:05 - GetAMD10Temp - Temperature: 31
    2015-10-06 04:05:05 - Running test #2 (Test 2 [Address test, own address])
    2015-10-06 04:05:05 - MtSupportRunAllTests - Setting random seed to 0x50415353
    2015-10-06 04:05:05 - MtSupportRunAllTests - Start time: 15676 ms
    2015-10-06 04:05:05 - ReadMemoryRanges - Available Pages = 4134591
    2015-10-06 04:05:05 - MtSupportRunAllTests - Enabling memory cache for test
    2015-10-06 04:05:05 - MtSupportRunAllTests - Enabling memory cache complete
    2015-10-06 04:05:05 - Start memory range test (0x0 - 0x440000000)
    2015-10-06 04:05:05 - Pre-allocating memory ranges >=16MB first...
    2015-10-06 04:05:05 - All memory ranges successfully locked
    2015-10-06 04:05:05 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 0ms)
    2015-10-06 04:05:06 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 4ms)
    2015-10-06 04:05:06 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4ms)
    2015-10-06 04:05:07 - RunMemoryRangeTest - CPU #2 completed but did not signal (test time = 5ms, event wait time = 1012ms, result = Success) (BSP test time = 4ms)
    2015-10-06 04:05:07 - RunMemoryRangeTest - Could not start AP#2 0x0000000005100000 - 0x0000000009100000 (Not Ready). Resetting...
    2015-10-06 04:05:08 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 159ms)
    2015-10-06 04:05:08 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 172ms)
    2015-10-06 04:05:08 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 170ms)
    2015-10-06 04:05:09 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 168ms)
    2015-10-06 04:05:10 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 167ms)
    2015-10-06 04:05:10 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 168ms)
    2015-10-06 04:05:11 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 146ms)
    2015-10-06 04:05:11 - RunMemoryRangeTest - CPU #4 timed out, test time = 0ms (BSP test time = 146ms)
    2015-10-06 04:05:12 - RunMemoryRangeTest - CPU #3 completed but did not signal (test time = 142ms, event wait time = 1001ms, result = Success) (BSP test time = 146ms)
    2015-10-06 04:05:12 - RunMemoryRangeTest - Could not start AP#3 0x00000000AB035AA0 - 0x00000000ABFD07F0 (Not Ready). Resetting...
    2015-10-06 04:05:12 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 32ms)
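To see at a glance which cores are misbehaving in a log like this, the entries can be tallied per CPU. The following is only a rough sketch, not part of MemTest86: the function name `summarize` and the two regular expressions are made up for illustration and simply match the message wording seen in the excerpt above. The sample text is the first five `RunMemoryRangeTest` lines from that excerpt.

```python
import re
from collections import Counter

# Patterns based purely on the wording of the log lines quoted above
# (an assumption; other MemTest86 versions may phrase these differently).
TIMEOUT_RE = re.compile(r"CPU #(\d+) timed out")
NO_SIGNAL_RE = re.compile(r"CPU #(\d+) completed but did not signal")


def summarize(log_text):
    """Count 'timed out' and 'completed but did not signal' entries per CPU."""
    timeouts = Counter()
    no_signal = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[int(match.group(1))] += 1
            continue
        match = NO_SIGNAL_RE.search(line)
        if match:
            no_signal[int(match.group(1))] += 1
    return timeouts, no_signal


# First five RunMemoryRangeTest lines copied from the log excerpt above.
SAMPLE = """\
2015-10-06 04:05:05 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 0ms)
2015-10-06 04:05:06 - RunMemoryRangeTest - CPU #1 timed out, test time = 0ms (BSP test time = 4ms)
2015-10-06 04:05:06 - RunMemoryRangeTest - CPU #3 timed out, test time = 0ms (BSP test time = 4ms)
2015-10-06 04:05:07 - RunMemoryRangeTest - CPU #2 completed but did not signal (test time = 5ms, event wait time = 1012ms, result = Success) (BSP test time = 4ms)
2015-10-06 04:05:07 - RunMemoryRangeTest - Could not start AP#2 0x0000000005100000 - 0x0000000009100000 (Not Ready). Resetting...
"""

timeouts, no_signal = summarize(SAMPLE)
print("timeouts per CPU:", dict(timeouts))
print("completed without signalling:", dict(no_signal))
```

For a whole log, the contents of MemTest86.log could be read into a string and passed to `summarize` instead of `SAMPLE`.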

    • #3
      The issue you are describing seems likely to be a UEFI-related bug with the multiprocessing subsystem. So I would check to see if there is updated firmware.

      • #4
        I just updated the firmware a couple of days ago, so I'll probably need to contact ASRock and see what they have to say. Thank you for your help, I'll report back when I know more.

        • #5
          After a short e-mail discussion, the advice I got from ASRock tech support was basically just to use some other program to test my computer. Well, there's always the possibility of future firmware updates making things better. Until that happens, I'll just run the test on CPU0.

          • #6
            These RAM modules have lifetime warranties, but as far as my research goes, faults are extremely common.

            We test RAM with memtest (thanks, memtest).

            A lot of people just keep updating various drivers while the actual problem is likely a faulty RAM module.

            So my question is, have you tried other sticks?

            • #7
              I haven't, because I don't have extra memory modules lying around. However, as I've stated above, the RAM seems to check out just fine. Windows Memory Diagnostic Tool gave a clean bill of health (extended test, 26 hours), Prime95 doesn't run into problems with all of the RAM in use and even MemTest86 seems to find no errors. It's only when I try to use any core other than CPU0 that MemTest86 freezes, and that happens after the cores keep timing out.

              • #8
                ASRock technical support asked if MemTest86 runs the test in SMT mode. Does it? Apparently the mode causes problems with most models.

                • #9
                  MemTest86 doesn't change any CPU settings when running the tests; it uses whatever is set in the BIOS.

                  • #10
                    I'm having a similar problem on an HP laptop with a quad-core A10 processor. Booting from a USB stick into memtest, on core 1, when I run test 3, the machine locks up either at the beginning or the end of the test, depending on the test configuration, as far as I can tell. Generally it does this without reporting an error, but when running the test in parallel, I got the following message:

                    Test: 3 Addr: 0 Expected: FFFFFFFF Actual: FFFFFF00 CPU: 1

                    So, is this the mentioned UEFI issue, or do I actually have a memory problem? I've been having problems with the machine in the form of phantom control and alt keypresses. These occurred more often when doing memory intensive operations (lots of web pages open, using my IDE, booting from a live usb stick into a linux distro, etc) so I was thinking it was memory related. I just want to make sure I'm covering my bases so HP support can't tell me there isn't a problem with the machine.

                    It's an HP 15-ab153nr laptop, running BIOS version F.13 with an AMD A10-8700P Radeon R6 chip.
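For reading an error line like the one above, XOR-ing the expected and actual values shows exactly which bits came back wrong. This is just a small sketch using the two values from that message; the variable names are made up for illustration, and it only decodes the reported values without saying anything about whether the RAM, the CPU core or the firmware is at fault.

```python
# Values copied from the reported error line:
# "Expected: FFFFFFFF Actual: FFFFFF00"
expected = 0xFFFFFFFF
actual = 0xFFFFFF00

# A set bit in the XOR result marks a position where the two values differ.
diff = expected ^ actual

# Collect the differing bit positions of the 32-bit word (bit 0 = least significant).
bad_bits = [bit for bit in range(32) if (diff >> bit) & 1]

print(f"differing bits mask: 0x{diff:08X}")
print("differing bit positions:", bad_bits)
```

Here the mask comes out as 0x000000FF, meaning the lowest byte of the word read back as all zeros while the upper three bytes matched.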
                    Last edited by jbhelfrich; 11-19-2015, 01:30 AM.

                    • #11
                      jbhelfrich,

                      Can you post the log file? We can at least check if it is the same problem as mentioned above.

                      A log file (MemTest86.log) is automatically created and updated while MemTest86 v5 is running. This file is saved in the 'EFI/BOOT' directory in the USB drive's first partition.

                      • #12
                        The website says it's an invalid file when I try to upload it (not sure why, it's only 221k of text) so I put it on Pastebin. http://pastebin.com/6Ar7ajux

                        The run that actually reported a failure is the very last one in the log. There are a lot of other runs, most of which I aborted manually because I kept forgetting to disable the hammer test. (The hammer test did complete two passes overnight once, and didn't report any errors.) But there are a couple of others in there that crashed while working on test 3, in case those are useful.

                        Thanks very much!

                        • #13
                          Thanks for the logs.

                          Yes, it is strange that all the issues are related to CPU 1 on test 3. I would first try to eliminate RAM as the possible culprit by running the tests on a different set of known good RAM.

                          If you are still getting the same issues, you can try running MemTest86 v4 (BIOS version) to determine whether or not the issue is related to the UEFI firmware.

                          After that, I would focus on the chipset as the possible culprit.

                          • #14
                            It's a laptop, and I don't have another set of RAM for it. I did run a full test from CPU 0 against the RAM, including two full passes of the Hammer test, and it didn't report any errors, so if there is a problem, it's not likely in the RAM chips.

                            The machine is currently supposed to go back to HP because it behaves strangely, particularly when using Linux--Control_L, and sometimes Alt_L and Shift_R events just happen at random, even when the computer is sitting idle. Then, it just wakes up the screensaver. When I'm trying to type or scroll a webpage, it's much more annoying. It happens a lot faster when using a live USB stick, which is what made me think the problem might be RAM related and sent me to memtest in the first place. The other possible culprit is the touchpad or the touchpad driver--if I disable the touchpad using xinput, it seems to stop for a while. But something seems to reenable the touchpad automatically and it starts up again. I'm about ready to throw the thing out a window, and I'm afraid HP is going to say 'cannot reproduce' and stonewall me, so I'm frantically looking for anything reproducible.

                            I'll try the BIOS version in the morning and see if it improves. Thanks again.

                            • #15
                              A small update: I ran four passes of MemTest86 (~38 hours) on CPU0 and zero errors were found. Not that RAM has been the suspected cause of my multiprocessing issues, but it's nice to have further confirmation that the RAM is working as it should.
