• kvp
    #29
    I'd add that current Nvidia GPUs also attach their cores in groups of 16 to one real GPU core, so every 16 cores share 1 branch unit. That is equivalent to a general-purpose CPU with a 16-wide SIMD vector unit. Nvidia currently makes 256-core GPUs, which roughly corresponds to 16 Larrabee cores at the same clock speed. The only problem could be the ring bus, which did not help the Cell either. But since Larrabee is built from x86 cores, the ring bus could be replaced with a crossbar at any time without modifying the software.
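    To make the "16 cores, 1 branch unit" point concrete, here is a minimal Python sketch. It is a toy model, not Nvidia's or Intel's actual hardware behavior. A hypothetical 16-lane unit runs both sides of a branch on every lane, and a per-lane mask picks each lane's result, which is how lanes sharing one branch unit cope with divergent conditions. The names `LANES`, `simd_select_branch` and `scalar_cores_to_vector_units` are made up for illustration, and the group size of 16 is the figure stated in this post.

```python
# Toy model: LANES "cores" share one branch unit, so together they act like one SIMD vector unit.
LANES = 16  # group size stated in the post, carried over as an assumption


def simd_select_branch(values):
    """Per lane: -x if x < 0 else 2 * x, computed the SIMD way.

    With a single shared branch unit the lanes cannot jump separately, so every
    lane evaluates both sides and a per-lane mask selects the result.
    """
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} lanes, got {len(values)}")
    mask = [x < 0 for x in values]          # the per-lane branch condition
    then_side = [-x for x in values]        # "if" side, executed by all lanes
    else_side = [2 * x for x in values]     # "else" side, executed by all lanes
    # Blend: each lane keeps the side its mask bit chose
    return [t if m else e for m, t, e in zip(mask, then_side, else_side)]


def scalar_cores_to_vector_units(scalar_cores, lanes=LANES):
    """How many LANES-wide vector units a given number of GPU 'cores' corresponds to."""
    return scalar_cores // lanes


print(simd_select_branch(list(range(-8, 8))))
# [8, 7, 6, 5, 4, 3, 2, 1, 0, 2, 4, 6, 8, 10, 12, 14]
print(scalar_cores_to_vector_units(256))  # 16
```

    The cost of divergence is visible here: a lane that only needed one side still spends time on both, which is the practical difference from 16 fully independent scalar cores.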

    An interesting comparison from the article linked earlier:
    "It’s very tempting to compare Larrabee and Cell. Both use a multitude of single cores (in-order), putting the accent on vector calculation, 256 KB of dedicated memory per core, a ring bus to connect it all, etc. The similarities are numerous at first glance. Yet, the differences are also substantial: The Cell is first and foremost a CPU. Although it’s oriented toward streaming-type applications, it is not intended for rendering calculation, and consequently, there are no texture units.

    Another major difference is in the way memory is managed. On the Cell, except for the PPE, which is the only part of the processor that has a global vision of the memory space, all the SPUs' memory accesses are limited to 256 KB of local store memory. So, access to main memory must be done explicitly via direct memory access (DMA) operations. Conversely, as we saw earlier, all of Larrabee’s cores have access to the entire memory space, via a cache memory whose management is transparent to the programmer, even if the programmer does have a certain form of control. Intel’s choice greatly simplifies programming and avoids having to include a more generalist core like the PPE. This heterogeneous system is one of the Cell’s handicaps, since it complicates things for the programmer. In addition to explicit management of memory, he or she must also build two executables using two different sets of instructions, which means using two different compilers.

    So Larrabee’s cores are much more complete than the Cell’s SPUs, since they support all the x86 instructions. However, their performance is also better in terms of vector calculation. That’s because they operate on 512-bit vectors instead of the SPUs’ 128 bits, and while the Cell should have the advantage in clock frequency (Larrabee is expected to clock at 2 to 2.5 GHz, but that’s still very hypothetical), that doesn’t compensate for such a big disadvantage.
    ...
    What’s more, despite the flexibility GPUs have gained, their functionalities remain heavily oriented towards raw calculation. For example, there’s no question of performing I/O operations from a GPU. Conversely, Larrabee is totally capable of that, meaning that Larrabee can directly perform printf or file-handling operations. It’s also possible to use recursive and virtual functions, which is impossible with a GPU."
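    The memory-model difference described in the quoted article can be sketched in Python as a toy model. This is not real Cell SDK code. A hypothetical SPU-style routine may only compute on data it has explicitly copied into a 256 KB local store, while a cache-based core such as a Larrabee core simply reads main memory. `dma_get` is a made-up stand-in for a DMA transfer (the real Cell SDK has its own DMA calls, which are not modeled here), and using half of the local store per chunk to leave room for code and double buffering is an assumption.

```python
# Toy model of the two memory models; not real Cell SDK or Larrabee code.
LOCAL_STORE_BYTES = 256 * 1024                         # SPU local store size from the article
FLOAT_BYTES = 4
LOCAL_STORE_FLOATS = LOCAL_STORE_BYTES // FLOAT_BYTES  # 65536 floats
CHUNK_FLOATS = LOCAL_STORE_FLOATS // 2                 # assumption: half kept free for code/double buffering


def dma_get(main_memory, start, count):
    """Hypothetical stand-in for a DMA transfer: copy a slice of main memory into local store."""
    return main_memory[start:start + count]


def spu_style_sum(main_memory, chunk=CHUNK_FLOATS):
    """SPU style: the programmer moves every chunk into local store before touching it."""
    total = 0.0
    transfers = 0
    for start in range(0, len(main_memory), chunk):
        local = dma_get(main_memory, start, chunk)  # explicit transfer into local store
        transfers += 1
        total += sum(local)                         # compute only on local data
    return total, transfers


def cached_core_sum(main_memory):
    """Cache-based core style: just read memory; the cache fetches data transparently."""
    return sum(main_memory)


data = [1.0] * 100_000
print(spu_style_sum(data))    # (100000.0, 4)
print(cached_core_sum(data))  # 100000.0
```

    Both routines compute the same result; the difference is that the SPU-style version forces the programmer to manage transfers and buffer sizes by hand, which is the programming burden the article attributes to the Cell.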
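    The quoted claim that a higher clock cannot make up for vectors that are 4 times narrower can be checked with back-of-the-envelope arithmetic. The sketch below counts peak 32-bit lane operations per second per core, assuming one vector instruction per cycle and ignoring fused multiply-add, dual issue and memory limits. The 3.2 GHz Cell clock is the PlayStation 3 figure, and Larrabee's 2 to 2.5 GHz is the article's own hypothetical range; the function names are illustrative only.

```python
def lanes(vector_bits, element_bits=32):
    """Number of 32-bit (single-precision) elements in one vector register."""
    return vector_bits // element_bits


def peak_lane_ops_per_s(vector_bits, clock_ghz, element_bits=32):
    """Peak element operations per second per core, assuming one vector op per cycle."""
    return lanes(vector_bits, element_bits) * clock_ghz * 1e9


spu = peak_lane_ops_per_s(128, 3.2)          # Cell SPU: 4 lanes at 3.2 GHz
for ghz in (2.0, 2.5):                       # the article's hypothetical Larrabee clock range
    lrb = peak_lane_ops_per_s(512, ghz)      # Larrabee core: 16 lanes
    print(f"Larrabee @ {ghz} GHz: {lrb / 1e9:.1f} G ops/s, {lrb / spu:.2f}x an SPU")
```

    Under these simplifying assumptions a single Larrabee core comes out at roughly 2.5 to 3.1 times the per-core peak of an SPU, so the Cell's clock advantage covers only a fraction of the 4x width gap.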