Extremely slow Memtest ( test 13 Hammer ) on 768GB and 1.5TB configurations


  • Extremely slow Memtest ( test 13 Hammer ) on 768GB and 1.5TB configurations

    Hello everyone .

    I have a Mac Pro 7,1 ( 2019 ) workstation with an Intel Cascade Lake Xeon ( W3275M ) . This processor version has the large 2TB memory support . This Mac officially supports 1.5 TB of memory .

    I installed 12 x 128GB ( 1.5TB total ) 2933 MHz DDR4 ECC 8 Gb 3DS density modules in my Mac . The modules are validated by Intel for this processor generation .

    macOS Catalina and bootcamp Windows 10 Pro Workstation are installed .

    With macOS Catalina , the Mac boots up and is stable . Windows 10 Pro will not boot ( a red flag ) .

    I have been using memtest for many years with other platforms , so I am somewhat familiar with the program .

    When I attempted to run memtest ( all individual tests for one pass ) , the test eventually shut down my Mac at around 90 percent completion , probably just as it began to run test 13 ( Hammer ) .

    I attempted a second , identical test and it ran until it started test 13 ( Hammer ) again . It was still trying to complete this individual test after running for a total of 90 hours . I simply aborted the test as I figured something was wrong . It was obvious test 13 would continue until the end of time .

    Thinking one of my modules was failing , I removed six of the modules and tested these in my Mac in a 6 x 128GB ( 768 GB ) configuration .

    macOS Catalina and Windows 10 Pro both booted up fine ( no red flag ) .

    Then I attempted to run memtest again with this configuration .

    I am facing the same situation with a very slow running test . It seems it doesn't want to complete test 13 ( Hammer ) . It currently has an elapsed time of 66 hours and will not complete .

    Evidently , the other individual tests all passed as no errors were displayed .

    Suggestions , anyone ?

    [Attached image: IMG_0444.jpg]

  • #2
    MemTest86 uses a default hammer test step size of 16MB, which is OK for systems with up to 128GB of RAM but greatly increases the test time for systems with more RAM.

    In the Pro version, the step size can be adjusted using the configuration file parameter HAMMERSTEP. See https://www.memtest86.com/tech_configuring-memtest.html for details.

    For 768GB/1.5TB RAM configurations, setting HAMMERSTEP=0x10000000 (256MB) should reduce the test time to more manageable levels.
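
    As a rough illustration of why the step size matters, the sketch below counts how many step-sized positions span a given amount of RAM. It assumes that hammer test time scales with that count and that the sizes quoted here are binary units; neither is confirmed in this thread, and the function names are made up for the example.

```python
# Sketch only: assumes hammer-test work scales with the number of
# step-sized positions across installed RAM (an assumption, not a
# documented MemTest86 property).

GIB = 1024 ** 3
MIB = 1024 ** 2

DEFAULT_STEP = 16 * MIB        # default step size quoted in this post
SUGGESTED_STEP = 0x10000000    # HAMMERSTEP value suggested in this post


def step_count(ram_bytes: int, step_bytes: int) -> int:
    """Number of step-sized positions needed to span ram_bytes."""
    return -(-ram_bytes // step_bytes)  # ceiling division


def report(ram_gib: int) -> str:
    """Compare step counts for the default and suggested step sizes."""
    ram = ram_gib * GIB
    default_steps = step_count(ram, DEFAULT_STEP)
    suggested_steps = step_count(ram, SUGGESTED_STEP)
    return (f"{ram_gib} GiB: {default_steps} steps at 16 MiB, "
            f"{suggested_steps} steps at {SUGGESTED_STEP // MIB} MiB "
            f"({default_steps // suggested_steps}x fewer)")


if __name__ == "__main__":
    for size in (128, 768, 1536):
        print(report(size))
```

    Under these assumptions, 768GB at the suggested step size needs 3072 positions, fewer than the 8192 that a 128GB system needs at the default 16MB.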

    We will also look into setting a more appropriate step size depending on the amount of system RAM for the Free version. This will likely be available in the next public release.



    • #3
      Hello Keith ,

      I managed to alter the hammer test step size and have successfully passed the Hammer Test ( Test 13 ) with 768GB of installed memory .

      Now , all the tests run and pass except for Test 12 [Random number sequence, 128-bit] . Memtest simply freezes when this individual test is run .

      [Attached image: IMG_0458.jpg]


      • #4
        RAM sizes have ramped up dramatically over the last few months. 128GB was considered a lot of RAM just a year or so back.
        We've had a few recent customers with RAM in the TB range and this exposed a problem with the way MemTest86 allocates and deallocates RAM.

        Up until now MemTest86 allocated all the required RAM for testing at the start of each test, then deallocated it at the end of each test. Normally this is a very quick operation (milliseconds). But it seems that there are some UEFI BIOSs that attempt to zero RAM on deallocation. The UEFI spec is vague on the issue, so it is hard to say if this behavior is required (for security reasons), a bug, or just a quirk of some motherboards.

        Zeroing 1TB of RAM is slow. Probably the UEFI BIOS isn't using the fastest available method either and is doing it just with a single thread. So double slow.
        In MemTest86 this translates to long pauses at the end of each test. It looks like a system freeze, and it kind of is, if you have enough RAM installed. But eventually, after a few minutes, it will move on to the next test.

        In MemTest86 8.4, which is coming real soon, we have changed this behavior to allocate and de-allocate the RAM only once. So there is only one long pause.
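
        A back-of-the-envelope sketch of the pauses described above. The 4 GiB/s single-threaded zeroing bandwidth and the count of 13 deallocations per pass are purely illustrative assumptions, not measured or documented values, and the function names are invented for the example.

```python
# Sketch only: models the pause as RAM size divided by an assumed
# firmware zeroing bandwidth. Both constants below are hypothetical.

GIB = 1024 ** 3
TIB = 1024 ** 4

ASSUMED_BANDWIDTH = 4 * GIB   # bytes per second, illustrative guess
ASSUMED_TESTS_PER_PASS = 13   # illustrative count of per-test deallocations


def zeroing_seconds(ram_bytes: int, bandwidth: int) -> float:
    """Time to zero ram_bytes at a constant bandwidth (bytes/second)."""
    return ram_bytes / bandwidth


def total_pause_seconds(ram_bytes: int, bandwidth: int,
                        deallocations: int) -> float:
    """Total pause when RAM is zeroed once per deallocation."""
    return deallocations * zeroing_seconds(ram_bytes, bandwidth)


if __name__ == "__main__":
    per_test = total_pause_seconds(TIB, ASSUMED_BANDWIDTH,
                                   ASSUMED_TESTS_PER_PASS)
    once = total_pause_seconds(TIB, ASSUMED_BANDWIDTH, 1)
    print(f"deallocate after every test: {per_test / 60:.1f} min of pauses")
    print(f"deallocate once:             {once / 60:.1f} min of pauses")
```

        With these made-up numbers, zeroing 1 TiB takes 256 seconds, which is in line with the "few minutes" pause described above; deallocating once instead of after every test removes all but one of those pauses.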

        So this might be your problem.

        But we did also have a different problem with some systems having bugs that prevented the SIMD instructions from working correctly, and with getting the correct variable alignment required for 128-bit instructions (but this was fixed, in theory, back in 2017).

        Can you send us a debug log?
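
        To illustrate the alignment point: aligned 128-bit SIMD loads and stores on x86 generally expect their operands on a 16-byte boundary. The sketch below only shows the arithmetic of checking and rounding up an address; it is not MemTest86's code, and the helper names are hypothetical.

```python
# Sketch only: address arithmetic for 16-byte (128-bit) alignment.

SIMD_ALIGNMENT = 16  # bytes, i.e. 128 bits / 8


def is_aligned(address: int, alignment: int = SIMD_ALIGNMENT) -> bool:
    """True if address is a multiple of alignment."""
    return address % alignment == 0


def align_up(address: int, alignment: int = SIMD_ALIGNMENT) -> int:
    """Round address up to the next multiple of alignment.

    alignment must be a power of two for the bit mask to be valid.
    """
    return (address + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    addr = 0x1008
    print(hex(addr), is_aligned(addr))                      # 0x1008 False
    print(hex(align_up(addr)), is_aligned(align_up(addr)))  # 0x1010 True
```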
        https://www.memtest86.com/tech_debug-logs.html



        • #5
          Hello David ,

          I have attached the debug log file as a ZIP .

          Please let me know if I sent the right file .

          I just ran Memtest on my Mac Pro 7,1 , configured with 768GB of main system memory , with only Test 12 selected .

          There was no progress indicated and the drive did not indicate any read / write activity once Test 12 was supposed to run .

          Memtest also froze again and I had to shut down the System . It would not allow me to hit ESC and end the testing .

          [Attached image: IMG_0461.jpg]

          Last edited by Theo; 03-27-2020, 10:55 PM.



          • #6
            Thanks for the logs.

            It looks like your baseboard UEFI firmware has an issue with running 128-bit SIMD instructions in Parallel CPU mode. To work around this, you need to add your baseboard to blacklist.cfg so that test 12 runs in Single CPU mode only.

            You can add the following line to the blacklist.cfg file found under EFI\BOOT\ on the USB drive.

            Code:
            "Mac-27AD2F918AE68F61",ALL,EXACT,TEST12_SINGLECPU
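
            For anyone scripting this, a minimal sketch that splits such a line into its four comma-separated fields. The field names (baseboard, bios_version, match_type, flag) are guesses based on the line above, not taken from MemTest86 documentation, and the parser is hypothetical.

```python
# Sketch only: splits one blacklist.cfg-style line into four fields.
# Field names are assumptions inferred from the example line.
import csv
from typing import NamedTuple


class BlacklistEntry(NamedTuple):
    baseboard: str     # quoted identifier, e.g. the Mac board ID
    bios_version: str  # "ALL" in the example line
    match_type: str    # "EXACT" in the example line
    flag: str          # e.g. TEST12_SINGLECPU


def parse_blacklist_line(line: str) -> BlacklistEntry:
    """Parse one line; raises ValueError if it lacks four fields."""
    fields = next(csv.reader([line.strip()]))
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return BlacklistEntry(*(field.strip() for field in fields))


if __name__ == "__main__":
    entry = parse_blacklist_line(
        '"Mac-27AD2F918AE68F61",ALL,EXACT,TEST12_SINGLECPU')
    print(entry.baseboard, entry.flag)
```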
