Disk performance is very dependent on the test scenario. But drive vendors tend to advertise only the largest, most impressive-sounding figures, even if those figures do not always reflect common use cases. Some benchmark software from other developers has also been tuned to test scenarios that rarely occur in real life but return impressive-sounding numbers. So different benchmark software can be expected to return different results depending on the scenario tested, and in many cases the difference is dramatic. As seen in the example below, a drive advertised as a 7000MB/sec drive might run at only 80MB/sec if data is accessed randomly. That is nearly 100 times slower than the advertised speed (7000 / 80 ≈ 88), from what might appear to be a fairly small change in the way the drive is used. This has always been the case for hard drives, but with newer NVMe drives the difference is more dramatic and it is harder to reach their advertised performance.
There is also a common misconception that if Windows reports the drive as 100% busy, then it must be running at its maximum speed. This is often not the case: a drive can be 100% busy while its throughput is far below its maximum. (If the disk is not 100% busy, then there may be some other bottleneck in your system, e.g. your RAID controller, a CPU that is too slow, a PCIe bus bottleneck, slow RAM, etc.)
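A rough way to see why a drive can be 100% busy yet slow is Little's law: the number of I/Os completed per second equals the number of requests outstanding divided by the time each request takes. Busy-time counters roughly track the fraction of time at least one request is outstanding, so a single-threaded test with one request in flight can keep the drive "busy" all the time while leaving most of its internal parallelism idle. The sketch below assumes a hypothetical, fixed latency of 50 µs per 4 KiB read and decimal megabytes (1 MB = 1,000,000 bytes); real latency varies and tends to rise with queue depth, so treat the numbers as illustrative only. The function name is our own.

```python
def littles_law_mb_per_s(queue_depth: int, latency_us: float, block_bytes: int) -> float:
    """Throughput implied by Little's law, in decimal MB/s (1 MB = 1,000,000 bytes)."""
    iops = queue_depth / (latency_us / 1_000_000)  # outstanding I/Os / seconds per I/O
    return iops * block_bytes / 1_000_000

# Hypothetical latency: 50 µs per 4 KiB random read (illustrative, not measured).
qd1 = littles_law_mb_per_s(1, 50.0, 4096)    # one request in flight at a time
qd32 = littles_law_mb_per_s(32, 50.0, 4096)  # 32 requests in flight, same latency assumed
print(f"QD1: {qd1:.1f} MB/s, QD32: {qd32:.1f} MB/s")
```

This prints `QD1: 81.9 MB/s, QD32: 2621.4 MB/s`. The 50 µs figure was picked only so that the QD1 case lands near the 80MB/sec random-access example; it is not a measured value. In both cases the drive always has work queued, so it would look equally "busy".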
At the high speeds of a PCIe 4.0 NVMe drive, it is likely that a slow CPU (especially one with weak single-threaded performance) or a slow PCIe bus becomes the bottleneck. Note that any extra background activity from Windows can also have a negative effect on the benchmark if you are testing the system drive.
The standard disk tests in PerformanceTest are detailed here. To keep the results reasonably close to real-life results, we use a single test thread.
Using the setup below, we'll see what it takes to get the advertised speed from a Western Digital PCIe 4.0 NVMe SSD (WD claims 7000MB/sec reads for this drive):
Below is the "Drive Performance" disk benchmark test, found under the Advanced Test menu in PerformanceTest. It runs a set of disk test scenarios that are widely accepted in the industry as reasonable tests to run.
In none of the scenarios above does the drive match the marketing claims, even though it is at, or close to, 100% busy in every test. The results are still great; they just don't match the marketing.
IOPS = Input/Output Operations Per Second. This implies a test scenario with random access.
4KQD32 = 4K is the size of the input/output buffer and tells you how much data is read or written in a single operation. QD refers to the Queue Depth: QD32 means that up to 32 operations are outstanding at the same time.
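Throughput and IOPS are two views of the same quantity: MB/sec = IOPS × block size. The helpers below convert between them, assuming decimal megabytes (1 MB = 1,000,000 bytes, as drive vendors typically quote) and a 4K block of 4096 bytes; the function names are our own.

```python
def iops_to_mb_per_s(iops: float, block_bytes: int) -> float:
    """Convert I/O operations per second to decimal MB/s for a fixed block size."""
    return iops * block_bytes / 1_000_000

def mb_per_s_to_iops(mb_per_s: float, block_bytes: int) -> float:
    """Convert decimal MB/s to I/O operations per second for a fixed block size."""
    return mb_per_s * 1_000_000 / block_bytes

# 80 MB/sec of 4K random reads corresponds to about 19,531 operations per second.
print(mb_per_s_to_iops(80, 4096))
# Going the other way recovers the original throughput.
print(iops_to_mb_per_s(19531.25, 4096))
```

The first line prints `19531.25` and the second `80.0`. A small block size therefore needs a very high operation rate to produce an impressive MB/sec figure.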
The two most common usage scenarios in real life are reading small amounts of data randomly from a single thread (IOPS 4K) and reading/writing large amounts of data sequentially (Throughput Read). But the drive does not reach its advertised speeds in these common scenarios.
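The two common scenarios differ mainly in the order in which blocks are requested. The sketch below reads a file in block-sized pieces either in order or in a shuffled order. It is a simplified illustration, not how PerformanceTest works: it uses ordinary OS file access, so unless the file is much larger than RAM, a freshly written file (or a repeated run) will largely be served from the operating system's cache rather than from the drive. The function names and the 16 MiB test file are our own choices.

```python
import os
import random
import tempfile
import time

def block_offsets(file_size: int, block: int, pattern: str, seed: int = 0) -> list:
    """Offsets of every full block in the file, in sequential or shuffled order."""
    offsets = list(range(0, file_size - block + 1, block))
    if pattern == "random":
        random.Random(seed).shuffle(offsets)  # same blocks, random order
    elif pattern != "sequential":
        raise ValueError(f"unknown pattern: {pattern}")
    return offsets

def read_pattern(path: str, block: int, pattern: str) -> tuple:
    """Read every full block of the file in the given order; return (bytes read, seconds)."""
    size = os.path.getsize(path)
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # no Python-level buffer; the OS may still cache
        for offset in block_offsets(size, block, pattern):
            f.seek(offset)
            total += len(f.read(block))
    return total, time.perf_counter() - start

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))  # 16 MiB test file
        path = tmp.name
    try:
        for pattern in ("sequential", "random"):
            nbytes, secs = read_pattern(path, 4096, pattern)
            print(f"{pattern}: {nbytes / secs / 1e6:.0f} MB/s")
    finally:
        os.remove(path)
```

Both runs read exactly the same blocks; only the order changes. On a real drive with caching disabled, that change of order alone is what separates the sequential and random results.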
Next we will see whether we can devise some less likely scenarios that produce numbers close to WD's marketing figures. For this we can use the "Advanced Disk Test" benchmark under the Advanced Test menu in PerformanceTest.
Here we pick a massive block size (8MB), a relatively small test file (5GB) and non-random data (which is easy to compress). Result: 5781MB/sec.
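Why does "non-random" data matter? Highly repetitive data can be shrunk enormously by compression, while random data cannot be compressed at all. The sketch below uses Python's zlib purely to demonstrate that difference on an 8 MiB buffer. It does not model any particular SSD controller; whether a given drive, driver or file system takes advantage of compressible data depends on its design.

```python
import os
import zlib

BLOCK = 8 * 1024 * 1024  # 8 MiB, matching the block size used in this test

def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher = more compressible)."""
    return len(data) / len(zlib.compress(data))

repetitive = bytes(BLOCK)        # all zero bytes: an extreme case of non-random data
random_data = os.urandom(BLOCK)  # random bytes: effectively incompressible

print(f"repetitive: {compression_ratio(repetitive):.0f}:1")
print(f"random:     {compression_ratio(random_data):.3f}:1")
```

The all-zero buffer shrinks by a ratio on the order of a thousand to one, while the random buffer comes out very slightly larger than it went in (a ratio just under 1), because there is no redundancy to remove.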
This time we allow caching of the disk activity into RAM. As a result, we have matched and slightly exceeded WD's marketing claims (7432MB/sec read). But Task Manager reports 0% busy time and no disk activity. Conclusion: all the data is being read from RAM, not from the disk. Clearly this is a poor disk test if the disk isn't even being used. It also nicely illustrates the limits of RAM cache speed and single-threaded CPU performance: even a fast CPU with fast dual-channel DDR4 RAM can only just reach 7000MB/sec (given the I/O overhead of the disk API calls). This also hints that multiple CPU cores are probably required to fully load the drive.
This time we use a massive block size (8MB), asynchronous access (queue length of 64), no caching, non-random data and 3 simultaneous threads. This time we have done it for real: 100% load on the disk and 7GB/sec performance! Note that Task Manager reports all disk activity, not just activity from the benchmark, so background activity is also counted.
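The final scenario spreads large reads across several threads so that more requests are in flight at once. The sketch below shows only the multi-threaded part of that idea, in Python, reading a file in 8 MiB blocks from 3 worker threads (CPython releases the GIL during blocking file reads, so the threads really overlap). It is deliberately incomplete: each thread has only one read outstanding at a time, and reads go through the OS cache. On Windows, getting closer to the test above would additionally involve unbuffered (`FILE_FLAG_NO_BUFFERING`) overlapped (`FILE_FLAG_OVERLAPPED`) I/O with 64 requests queued per thread, which this sketch does not attempt. The function names and the 64 MiB test file are our own.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = 8 * 1024 * 1024  # 8 MiB blocks, as in the test above
THREADS = 3              # simultaneous worker threads, as in the test above

def read_blocks(path: str, offsets: list, block: int) -> int:
    """Read the given blocks through this thread's own file handle; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(block))  # the final block may be shorter than `block`
    return total

def parallel_read(path: str, block: int = BLOCK, threads: int = THREADS) -> int:
    """Split the file into blocks, deal them round-robin to threads, read them concurrently."""
    offsets = list(range(0, os.path.getsize(path), block))
    shares = [offsets[i::threads] for i in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(lambda share: read_blocks(path, share, block), shares))

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(64 * 1024 * 1024))  # 64 MiB test file
        path = tmp.name
    try:
        start = time.perf_counter()
        nbytes = parallel_read(path)
        secs = time.perf_counter() - start
        print(f"{nbytes / secs / 1e6:.0f} MB/s with {THREADS} threads")
    finally:
        os.remove(path)
```

Dealing blocks round-robin keeps each thread's share roughly equal; giving each thread one contiguous third of the file would be an equally valid choice.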
So the marketing claims aren't false, but we needed to jump through some hoops to reproduce their numbers and the scenario required to reproduce the numbers isn't that realistic.