
Thread: CPU-Maths - no operations reported in timeout period

  1. #11
    Join Date
    Mar 2010
    Posts
    4

    Default

    I have confirmation that no hardware was shipped to Passmark on this topic, though I am not sure why. Since this topic was created, have you had anyone else with this type of issue on a Geode-based processor?

    I am in need of a paragraph explaining what is happening during this failure and a statement that it is or is not an issue.

  2. #12
    Join Date
    Mar 2010
    Posts
    4

    Default

    Couple of questions. First I was perusing the forums for a similar failure mode that we are seeing and I came up with this: http://www.passmark.com/forum/showthread.php?t=474

    Any way I can see if the DEBUG version solves my issues as well?
    Second, do you know if we can get a copy of the latest version to test on the unit that exhibits this failure? It does not seem like this issue is going to be dropped.

    And third, you had offered for us to send you a board for your help to debug this issue. I am now in a position that requires this. Is there a debug tracking number that I can use with my shipping paperwork?

  3. #13
    Join Date
    Jan 2003
    Location
    Sydney Australia
    Posts
    4,183

    Default

    The post you linked to above does mention an issue on the Geode back in 2006. But this issue was corrected in Release 5.1 build 1012, 15/August/2006. The symptoms are also not identical. Their issue was a single error being flagged right at the end of a 12 hour run.

    There was also this issue from early 2007:
    http://www.passmark.com/forum/showthread.php?t=800
    We suggested it might be a hardware / device driver fault, maybe with the system timer, and this turned out to be the case. The customer later stated, "My colleagues who have been investigating have found a problem with time handling by the platform. We have a fix and will be rerunning tests this week."

    If you want to ship us something you can do so at this address,
    http://www.passmark.com/about/contact_us.htm
    Address it to Ian Robinson. And include a printout of this page & your contact details.

    Please don't ship us something that doesn't just work. We have had cases where we have been sent motherboards without a compatible CPU, RAM, device drivers, etc. We don't want to spend days trying to get it to even boot.

    "Second do you know if we can a copy of the latest version to test on the unit that exhibits this failure"
    I am not sure if I understand the question, but you can move the software between machines.

    A DEBUG build is one where we add in extra logging to investigate a particular problem. We don't have one for this issue. We could do one, but my fear is that it might not show anything more than what was in the event log you already posted 9 months ago, and so we might spend several weeks doing stuff by trial and error. But we could give this a go if you want.

  4. #14
    Join Date
    Mar 2010
    Posts
    4

    Default Thanks in advance.

    Thanks in advance for your help in this matter. I have verified our issue on multiple systems. In order to make troubleshooting a little easier I have put together this test sled for your use.

    The test sled has the SBC in question with I/O cabled out for ease of use. It is loaded with a registered and updated version of Windows XP with all drivers installed. The unit should boot upon arrival and has BurnInTest loaded. Failure will occur within the first hour to hour and a half of running. This has been verified using this same sled at our location.

    Please see below pictures.



  5. #15
    Join Date
    Jan 2003
    Location
    Sydney Australia
    Posts
    4,183

    Default

    The links to the images are broken (they might be on your intranet, which we can't access). But sounds OK.

  6. #16
    Join Date
    Mar 2005
    Posts
    917

    Default

    We seem to have found the cause of the problem. One of the Windows API functions we use for timing the length of certain tests (QueryPerformanceCounter, http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx) is returning unreliable values on your hardware. As a result, some tests become stuck executing without updating any of their results, and after a certain time this triggers the watchdog timer, which flags the error.
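
    To illustrate the watchdog behaviour described here, the following is a minimal sketch only. It is not BurnInTest's actual code: the class name, the timeout value and the sample readings are all made up for illustration. It simply flags a test when the reported operation count has not changed for a whole timeout period.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Hypothetical watchdog: flags a test whose operation count stops changing."""
    timeout_s: float
    last_ops: int = 0
    last_progress_at: float = 0.0

    def check(self, now_s: float, ops: int) -> bool:
        """Return True if no new operations were reported within timeout_s."""
        if ops != self.last_ops:
            # Progress was reported: remember the count and when we saw it.
            self.last_ops = ops
            self.last_progress_at = now_s
            return False
        return (now_s - self.last_progress_at) >= self.timeout_s


def run_demo() -> list:
    dog = Watchdog(timeout_s=3.0)
    # (seconds since start, operations reported so far); the test stalls after t=2.
    readings = [(1.0, 10), (2.0, 20), (3.0, 20), (4.0, 20), (5.0, 20)]
    return [dog.check(t, ops) for t, ops in readings]


if __name__ == "__main__":
    print(run_demo())
```

    The watchdog itself needs a time source that can be trusted, which is why a test loop timed by a misbehaving counter can stall while the watchdog still fires.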

    Once we had discovered what the problem seemed to be, we wrote a simple program to test and log the current value returned from the QueryPerformanceCounter call to highlight the inconsistency. We ran this on the Geode system and on another XP system using an Intel E8400.

    The values from left to right represent: Sample Number, Counter frequency (number of counts per second, should never change), Counter current value, Difference from last sample (should be very similar to counter frequency).
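
    A logger of this kind can be sketched as follows. This is not the program that produced the results in this post; it is a rough Python equivalent under stated assumptions. It uses time.perf_counter_ns (which, as far as we know, CPython implements on top of QueryPerformanceCounter on Windows) with a nominal frequency of 1,000,000,000 counts per second, and the function names and the one second sampling interval are illustrative choices.

```python
import time
from typing import Callable, Iterator, Tuple


def sample_counter(
    read_counter: Callable[[], int] = time.perf_counter_ns,
    frequency: int = 1_000_000_000,  # perf_counter_ns counts nanoseconds
    samples: int = 21,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (sample number, frequency, counter value, difference from last)."""
    previous = read_counter()
    for number in range(1, samples + 1):
        sleep(interval_s)
        current = read_counter()
        yield number, frequency, current, current - previous
        previous = current


def format_row(row: Tuple[int, int, int, int]) -> str:
    """Format one row with thousands separators, right-aligned in columns."""
    return "  ".join(f"{field:>18,}" for field in row)


if __name__ == "__main__":
    for row in sample_counter():
        print(format_row(row))
```

    The counter and sleep functions are passed in as parameters so that the logic can be exercised with recorded values instead of a live timer.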


    Geode results:
    Code:
       
    1      3,579,545     970,856,713     3,572,548
    2      3,579,545     957,664,136     -13,192,577
    3      3,579,545     961,248,841     3,584,705
    4      3,579,545     964,833,608     3,584,767
    5      3,579,545     968,418,269     3,584,661
    6      3,579,545     972,002,932     3,584,663
    7      3,579,545     992,364,858     20,361,926
    8      3,579,545     995,949,537     3,584,679
    9      3,579,545     999,534,244     3,584,707
    10     3,579,545     1,003,119,028    3,584,784
    11     3,579,545     989,926,436     -13,192,592
    12     3,579,545     993,511,135     3,584,699
    13     3,579,545     997,095,848     3,584,713
    14     3,579,545     1,000,680,522   3,584,674
    15     3,579,545     1,004,265,263   3,584,741
    16     3,579,545     1,024,627,167   20,361,904
    17     3,579,545     1,028,211,841   3,584,674
    18     3,579,545     1,031,796,543   3,584,702
    19     3,579,545     1,035,381,265   3,584,722
    20     3,579,545     1,038,965,967   3,584,702
    21     3,579,545     1,025,773,431   -13,192,536
    E8400 results:
    Code:
    1     3,000,060,000     61,949,764,826,091     2,990,583,234
    2     3,000,060,000     61,952,764,860,855     3,000,034,764
    3     3,000,060,000     61,955,764,868,889     3,000,008,034
    4     3,000,060,000     61,958,764,832,661     2,999,963,772
    5     3,000,060,000     61,961,764,860,666     3,000,028,005
    6     3,000,060,000     61,964,764,858,818     2,999,998,152
    7     3,000,060,000     61,967,764,858,032     2,999,999,214
    8     3,000,060,000     61,970,764,865,544     3,000,007,512
    9     3,000,060,000     61,973,764,894,080     3,000,028,536
    10    3,000,060,000     61,976,764,904,535     3,000,010,455
    11    3,000,060,000     61,979,764,908,762     3,000,004,227
    12    3,000,060,000     61,982,764,942,860     3,000,034,098
    13    3,000,060,000     61,985,764,907,838     2,999,964,978
    14    3,000,060,000     61,988,764,912,344     3,000,004,506
    15    3,000,060,000     61,991,764,921,764     3,000,009,420
    16    3,000,060,000     61,994,764,930,248     3,000,008,484
    17    3,000,060,000     61,997,764,938,408     3,000,008,160
    18    3,000,060,000     62,000,764,948,755     3,000,010,347
    19    3,000,060,000     62,003,765,004,309     3,000,055,554
    20    3,000,060,000     62,006,764,969,161     2,999,964,852
    21    3,000,060,000     62,009,764,990,443     3,000,021,282
    As you can see in the Geode results, the current counter value and the difference from the previous sample jump around significantly, even resulting in negative values. A negative difference should only ever occur if the counter has reached its maximum value and started again (something that should only occur after days of system uptime). In comparison the E8400 results are very consistent: the current counter value always increases, and the last sample minus the first sample, divided by the frequency, works out as about 20 seconds elapsed over the 21 samples shown. This doesn’t hold true for the Geode, and the count value even seems to wrap around close to 0 about 10 mins into the test.
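
    The two checks described above (does the counter ever go backwards, and does last minus first divided by the frequency give a sensible elapsed time) can be reproduced from the logged values. The sketch below copies the counter values from the two listings in this post; the function names are just illustrative.

```python
GEODE_FREQ = 3_579_545
GEODE = [
    970_856_713, 957_664_136, 961_248_841, 964_833_608, 968_418_269,
    972_002_932, 992_364_858, 995_949_537, 999_534_244, 1_003_119_028,
    989_926_436, 993_511_135, 997_095_848, 1_000_680_522, 1_004_265_263,
    1_024_627_167, 1_028_211_841, 1_031_796_543, 1_035_381_265,
    1_038_965_967, 1_025_773_431,
]

E8400_FREQ = 3_000_060_000
E8400 = [
    61_949_764_826_091, 61_952_764_860_855, 61_955_764_868_889,
    61_958_764_832_661, 61_961_764_860_666, 61_964_764_858_818,
    61_967_764_858_032, 61_970_764_865_544, 61_973_764_894_080,
    61_976_764_904_535, 61_979_764_908_762, 61_982_764_942_860,
    61_985_764_907_838, 61_988_764_912_344, 61_991_764_921_764,
    61_994_764_930_248, 61_997_764_938_408, 62_000_764_948_755,
    62_003_765_004_309, 62_006_764_969_161, 62_009_764_990_443,
]


def backward_samples(values):
    """Return the 1-based sample numbers whose value is below the previous one."""
    return [i + 1 for i in range(1, len(values)) if values[i] < values[i - 1]]


def elapsed_seconds(values, frequency):
    """Elapsed time implied by the counter: (last - first) / counts per second."""
    return (values[-1] - values[0]) / frequency


if __name__ == "__main__":
    for name, values, freq in (("Geode", GEODE, GEODE_FREQ),
                               ("E8400", E8400, E8400_FREQ)):
        print(name, "backward steps at samples:", backward_samples(values))
        print(name, "implied elapsed seconds:",
              round(elapsed_seconds(values, freq), 2))
```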

    While a workaround in BurnInTest would be possible, it seems clear that this is a hardware problem caused by a BIOS or chipset bug that is corrupting the Windows high resolution timers. It’s also likely that any software that uses these timers is going to be affected by odd behaviour.
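
    For anyone wondering what a software-side guard might look like in their own code, here is one hypothetical sketch. It is not what BurnInTest does and not a recommended fix; the class name is invented. It wraps a raw counter read, refuses to report a value lower than the last one it returned, and counts how often that happened. It only hides backward steps: it cannot correct the forward jumps, and it would also treat a genuine counter wrap as a fault.

```python
from typing import Callable, Optional


class MonotonicCounter:
    """Hypothetical wrapper that never reports a value lower than the last one."""

    def __init__(self, read_raw: Callable[[], int]) -> None:
        self._read_raw = read_raw
        self._last: Optional[int] = None
        self.backward_steps = 0  # how many raw readings went backwards

    def read(self) -> int:
        raw = self._read_raw()
        if self._last is not None and raw < self._last:
            # The raw counter went backwards: note it and hold the old value.
            self.backward_steps += 1
            return self._last
        self._last = raw
        return raw


if __name__ == "__main__":
    raw_values = iter([100, 90, 95, 120])
    counter = MonotonicCounter(lambda: next(raw_values))
    print([counter.read() for _ in range(4)], counter.backward_steps)
```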

  7. #17
    Join Date
    Jan 2003
    Location
    Sydney Australia
    Posts
    4,183

    Default

    So in fact the problem was pretty much exactly what we suggested it might be a few weeks ago. A problem with the system timers.

    You have a timer which should increase steadily over time, but which is counting backwards from time to time.
