Stride directed prefetching in scalar processors (paper)

1992. ACM SIGMICRO newsletter/SIGMICRO newsletter/SIGMICRO, TCMICRO newsletter. Citations: 312
Distributed systems and fault tolerance; Parallel Computing and Optimization Techniques; Advanced Data Storage Technologies

Abstract

the cache miss ratio for the scalar execution of the matrix multiply for matrix sizes of 100 x 100. For comparison purposes, the corresponding vector execution is also shown. The results were obtained using trace-driven simulation of a 4 Kbyte cache with block sizes of 8, 16, 32 and 64 bytes. The traces are from executions on an Alliant FX/80. Each trace is for single-processor execution, where the scalar and vector versions are generated using compiler optimizations. Two miss ratios are shown for each execution: ALL means that all memory data references are simulated, and MATRIX means that only references to matrix data (data size of 8 bytes) are simulated. There are 19 and 2.2 million references for scalar and vector executions respectively, but only 4 and 2 million of these references are to matrix data. Note that the vector miss ratios are computed relative to the number of vector accesses and not the number of vector-referencing instructions. For example, a vector instruction may load 32 elements, but this is counted as 32 vector accesses.
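The measurement described above can be illustrated with a small trace-driven simulator. This is only a sketch under stated assumptions, not the simulator used for the reported results: the cache is assumed to be direct-mapped (the organization is not given here), and the address trace is a synthetic row-major i-j-k matrix multiply rather than an Alliant FX/80 trace. The names `miss_ratio`, `matmul_trace` and `vector_accesses` are hypothetical.

```python
def miss_ratio(trace, cache_bytes=4096, block_bytes=16):
    """Miss ratio of a direct-mapped cache (an assumption) over byte addresses."""
    num_lines = cache_bytes // block_bytes
    resident = [None] * num_lines      # block number currently held by each line
    misses = 0
    total = 0
    for addr in trace:
        block = addr // block_bytes    # block containing this address
        index = block % num_lines      # line the block maps to
        if resident[index] != block:   # miss: fetch the block into the line
            resident[index] = block
            misses += 1
        total += 1
    return misses / total if total else 0.0


def matmul_trace(n, elem_bytes=8):
    """Synthetic matrix-data references for a naive i-j-k multiply C = A * B.

    Assumes row-major storage and three matrices laid out back to back;
    this stands in for the MATRIX subset of a real trace.
    """
    a_base = 0
    b_base = n * n * elem_bytes
    c_base = 2 * n * n * elem_bytes
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield a_base + (i * n + k) * elem_bytes  # load A[i][k]
                yield b_base + (k * n + j) * elem_bytes  # load B[k][j]
            yield c_base + (i * n + j) * elem_bytes      # store C[i][j]


def vector_accesses(base, stride_bytes, length=32):
    """Expand one vector instruction into one access per element."""
    return [base + e * stride_bytes for e in range(length)]


if __name__ == "__main__":
    # A small matrix keeps the run short; the text above uses 100 x 100.
    for block_bytes in (8, 16, 32, 64):
        ratio = miss_ratio(matmul_trace(20), block_bytes=block_bytes)
        print(f"block {block_bytes:2d} bytes: miss ratio {ratio:.4f}")
```

Expanding each vector instruction through `vector_accesses` before feeding it to `miss_ratio` mirrors the counting rule in the text: a 32-element vector load contributes 32 accesses to the denominator, not one.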