Xeon Phi

Intel Many Integrated Core Architecture (MIC)
Designer: Intel
Design: Many-core extended x86/x64 design
Registers:
  General purpose: Intel Architecture registers
  Floating point: 512-bit SIMD vector registers

Intel Many Integrated Core Architecture or Intel MIC (pronounced Mick or Mike[1]) is a coprocessor computer architecture developed by Intel incorporating earlier work on the Larrabee manycore architecture, the Teraflops Research Chip multicore chip research project, and the Intel Single-chip Cloud Computer multicore microprocessor.

Prototype products codenamed Knights Ferry were announced and released to developers in 2010. The Knights Corner product was announced in 2011 and uses a 22 nm process. A second generation product codenamed Knights Landing using a 14 nm process was announced in June 2013.

In September 2011, the Texas Advanced Computing Center (TACC) announced it would use Knights Corner cards in their 10 petaFLOPS "Stampede" supercomputer, providing 8 petaFLOPS of computing power.

At the International Supercomputing Conference (2012, Hamburg), Intel announced the branding of the processor product family as Intel Xeon Phi.[2]

In November 2012, Intel formally announced the first products, claiming CPU-like versatile programmability, high performance, and power efficiency.[3] The Green 500 list placed a system using these new products as the most power-efficient computer in the world.[4]

In June 2013, the Tianhe-2 supercomputer at the National Supercomputing Center in Guangzhou (NSCC-GZ) was announced[5] as the world's fastest supercomputer. It utilizes Intel Xeon Phi coprocessors and Ivy Bridge-EP Xeon processors to achieve 33.86 petaFLOPS.[6]

Competitors include Nvidia's Tesla-branded product lines.


History

Background

The Larrabee microarchitecture (in development since 2006[7]) introduced very wide (512-bit) SIMD units to an x86-based processor design, extended to a cache-coherent multiprocessor system connected via a ring bus to memory; each core was capable of four-way multithreading. Because the design was intended for GPU as well as general-purpose computing, the Larrabee chips also included specialised hardware for texture sampling.[8][9] The project to produce a retail GPU product directly from the Larrabee research project was terminated in May 2010.[10]

Another contemporary Intel research project implementing x86 architecture on a many-multicore processor was the 'Single-chip Cloud Computer' (prototype introduced 2009[11]), a design mimicking a cloud computing computer datacentre on a single chip with multiple independent cores: the prototype design included 48 cores per chip with hardware support for selective frequency and voltage control of cores to maximize energy efficiency, and incorporated a mesh network for interchip messaging. The design lacked cache-coherent cores and focused on principles that would allow the design to scale to many more cores.[12]

The Teraflops Research Chip (prototype unveiled 2007[13]) is an experimental 80-core chip with two floating point units per core, implementing a 96-bit VLIW architecture instead of the x86 architecture.[14] The project investigated intercore communication methods, per-chip power management, and achieved 1.01 TFLOPS at 3.16 GHz consuming 62 W of power.[15][16]

Knights Ferry

Intel's MIC prototype board, named Knights Ferry and incorporating a processor codenamed Aubrey Isle, was announced on May 31, 2010. The product was stated to be a derivative of the Larrabee project and other Intel research including the Single-chip Cloud Computer.[17][18]

The development product was offered as a PCIe card with 32 in-order cores running at up to 1.2 GHz with four threads per core, 2 GB of GDDR5 memory,[19] 8 MB of coherent L2 cache (256 KB per core, with 32 KB of L1 cache), and a power requirement of ~300 W,[19] built on a 45 nm process.[20] In the Aubrey Isle core, a 1,024-bit ring bus (512 bits in each direction) connects processors to main memory.[21] Single-board performance has exceeded 750 GFLOPS.[20] The prototype boards support only single-precision floating-point instructions.[22]
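The aggregate figures in this paragraph follow directly from the per-core numbers. A quick Python check, using only values quoted above (the variable names are illustrative):

```python
# Knights Ferry figures as quoted in the paragraph above.
CORES = 32
THREADS_PER_CORE = 4
L2_PER_CORE_KB = 256
RING_BITS_PER_DIRECTION = 512

# Hardware threads exposed by one card.
hardware_threads = CORES * THREADS_PER_CORE

# Total coherent L2 cache, converted from KB to MB.
total_l2_mb = CORES * L2_PER_CORE_KB / 1024

# The ring bus is bi-directional, so the total width is twice one direction.
ring_total_bits = 2 * RING_BITS_PER_DIRECTION

print(hardware_threads, total_l2_mb, ring_total_bits)
```

The results (128 hardware threads, 8 MB of L2, a 1,024-bit ring) match the totals stated in the text.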

Initial developers included CERN, Korea Institute of Science and Technology Information (KISTI) and Leibniz Supercomputing Centre. Hardware vendors for prototype boards included IBM, SGI, HP, Dell and others.[23]

Knights Corner

The Knights Corner product line is made on a 22 nm process using Intel's Tri-gate technology, with more than 50 cores per chip, and is Intel's first commercial many-core product.[17][20]

In June 2011, SGI announced a partnership with Intel to utilize the MIC architecture in its high performance computing products.[24] In September 2011, it was announced that the Texas Advanced Computing Center (TACC) will use Knights Corner cards in their 10 petaFLOPS "Stampede" supercomputer, providing 8 petaFLOPS of the compute power.[25] According to "Stampede: A Comprehensive Petascale Computing Environment" the "second generation Intel (Knights Landing) MICs will be added when they become available, increasing Stampede's aggregate peak performance to at least 15 PetaFLOPS."[26]

On November 15, 2011, Intel showed an early silicon version of a Knights Corner processor.[27][28]

On June 5, 2012, Intel released open source software and documentation regarding Knights Corner.[29]

On June 18, 2012, Intel announced at the 2012 Hamburg International Supercomputing Conference that Xeon Phi will be the brand name used for all products based on their Many Integrated Core architecture.[30][31][2][32][33][34][35] In June 2012, Cray announced it would be offering 22 nm 'Knight's Corner' chips (branded as 'Xeon Phi') as a co-processor in its 'Cascade' systems.[36][37]

In June 2012, ScaleMP announced it would provide virtualization software allowing 'Knight's Corner' chips (branded as 'Xeon Phi') to be used as a transparent extension of the main processor. The virtualization software will allow 'Knight's Corner' to run legacy MMX/SSE code and access an unlimited amount of (host) memory without the need for code changes.[38]

An important component of the Intel Xeon Phi coprocessor's core is its vector processing unit (VPU).[39] The VPU features a novel 512-bit SIMD instruction set, officially known as Intel Initial Many Core Instructions (Intel IMCI). With 512-bit vectors, the VPU can execute 16 single-precision (SP) or 8 double-precision (DP) operations per cycle. The VPU also supports fused multiply-add (FMA) instructions, each of which counts as two floating-point operations, and hence can execute 32 SP or 16 DP floating-point operations per cycle. It also provides support for integers. The VPU also features an Extended Math Unit (EMU) that can execute transcendental operations such as reciprocal, square root, and log, thereby allowing these operations to be executed in a vector fashion with high bandwidth. The EMU operates by calculating polynomial approximations of these functions.
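The per-cycle figures for the VPU can be derived from the vector width alone. The following is a minimal sketch of that arithmetic; the function names are illustrative, and the factor of two for FMA reflects the usual convention of counting a fused multiply-add as one multiply plus one add:

```python
VECTOR_WIDTH_BITS = 512  # width of one SIMD register, as stated above


def lanes(element_bits: int) -> int:
    """Number of elements that fit in one 512-bit vector."""
    return VECTOR_WIDTH_BITS // element_bits


def flops_per_cycle(element_bits: int, fma: bool = True) -> int:
    """Floating-point operations per cycle for one VPU.

    An FMA performs a multiply and an add per lane, so it counts as 2.
    """
    return lanes(element_bits) * (2 if fma else 1)


sp_lanes = lanes(32)            # single precision: 32-bit elements
dp_lanes = lanes(64)            # double precision: 64-bit elements
sp_flops = flops_per_cycle(32)
dp_flops = flops_per_cycle(64)

print(sp_lanes, dp_lanes, sp_flops, dp_flops)
```

This reproduces the 16 SP / 8 DP operations and the 32 SP / 16 DP floating-point operations per cycle given in the text.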

On November 12, 2012, Intel announced two Xeon Phi coprocessor families using the 22 nm process size: the Xeon Phi 3100 and the Xeon Phi 5110P.[40][41][42] The Xeon Phi 3100 is rated at more than 1 teraFLOPS of double-precision floating-point performance with 240 GB/s of memory bandwidth at 300 W.[40][41][42] The Xeon Phi 5110P is rated at 1.01 teraFLOPS of double-precision floating-point performance with 320 GB/s of memory bandwidth at 225 W.[40][41][42] The Xeon Phi 7120P is rated at 1.2 teraFLOPS of double-precision floating-point performance with 352 GB/s of memory bandwidth at 300 W.
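Two derived metrics make the quoted specifications easier to compare: power efficiency (GFLOPS per watt) and memory bandwidth per unit of compute (bytes per FLOP). The sketch below computes both from the 5110P and 7120P figures in the paragraph above; the 3100 is omitted because only a lower bound ("more than 1 teraFLOPS") is given for it. The class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coprocessor:
    name: str
    peak_dp_tflops: float  # peak double-precision teraFLOPS
    mem_bw_gb_s: float     # memory bandwidth in GB/s
    power_w: float         # power in watts

    def gflops_per_watt(self) -> float:
        return self.peak_dp_tflops * 1000 / self.power_w

    def bytes_per_flop(self) -> float:
        # GB/s divided by GFLOPS gives bytes available per operation.
        return self.mem_bw_gb_s / (self.peak_dp_tflops * 1000)


cards = [
    Coprocessor("Xeon Phi 5110P", 1.01, 320, 225),
    Coprocessor("Xeon Phi 7120P", 1.2, 352, 300),
]

for card in cards:
    print(f"{card.name}: {card.gflops_per_watt():.2f} GFLOPS/W, "
          f"{card.bytes_per_flop():.3f} bytes/FLOP")
```

On these figures, both cards provide roughly 0.3 bytes of memory bandwidth per peak double-precision operation, which suggests that kernels with little data reuse would be limited by memory bandwidth rather than by the vector units.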

On June 17, 2013, the Tianhe-2 supercomputer was announced[5] by TOP500 as the world's fastest. It used Intel Ivy Bridge Xeon and Xeon Phi processors to achieve 33.86 petaFLOPS. According to the TOP500 list, Tianhe-2 remained the world's fastest supercomputer from its introduction in June 2013 through the November 2015 list.[43]

Knights Landing

Knights Landing is the code name for the second-generation MIC architecture product from Intel.[26] Intel first officially revealed details of its second-generation Intel Xeon Phi products on June 17, 2013.[6] Intel said that the next generation of Intel MIC Architecture-based products will be available in two forms, as a coprocessor or a host processor (CPU), and be manufactured using Intel's 14 nm process technology. Knights Landing products will include integrated on-package memory for significantly higher memory bandwidth.

Knights Landing will be built using up to 72 Airmont (Atom) cores with four threads per core,[44][45] supporting up to 384 GB of "far" DDR4 RAM and 8–16 GB of stacked "near" 3D MCDRAM, which is similar to Micron's Hybrid Memory Cube. Each core will have two 512-bit vector units and will support AVX-512 SIMD instructions, specifically the Intel AVX-512 Foundational Instructions (AVX-512F) with Intel AVX-512 Conflict Detection Instructions (AVX-512CD), Intel AVX-512 Exponential and Reciprocal Instructions (AVX-512ER), and Intel AVX-512 Prefetch Instructions (AVX-512PF).[46]
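Software that targets these instruction subsets typically checks for them at run time. As a minimal sketch, assuming a Linux system where `/proc/cpuinfo` reports the subsets under the flag names `avx512f`, `avx512cd`, `avx512er`, and `avx512pf`, the four subsets listed above could be detected as follows (the function name and the sample text are illustrative, not taken from the source):

```python
from typing import Dict, Set

# Mapping from the subset names used in the text to Linux cpuinfo flag names
# (assumed naming; check the flags line on the target system).
KNL_AVX512_FLAGS = {
    "AVX-512F": "avx512f",
    "AVX-512CD": "avx512cd",
    "AVX-512ER": "avx512er",
    "AVX-512PF": "avx512pf",
}


def avx512_subsets(cpuinfo_text: str) -> Dict[str, bool]:
    """Report which of the four AVX-512 subsets appear in cpuinfo text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return {name: flag in flags for name, flag in KNL_AVX512_FLAGS.items()}


# Hand-written sample standing in for the contents of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx2 avx512f avx512cd\n"
detected = avx512_subsets(sample)
print(detected)
```

On a real Linux machine, the function would be called with the contents of `/proc/cpuinfo` in place of the sample string.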

The National Energy Research Scientific Computing Center announced that Phase 2 of its newest supercomputing system "Cori" would use Knights Landing Xeon Phi coprocessors.[47]

Knights Hill

Knights Hill is the codename for the third-generation MIC architecture, for which Intel announced the first details at SC14. It will be manufactured in a 10 nm process.[48]

In April 2015, the United States Department of Energy announced that a supercomputer named Aurora will be deployed at Argonne National Laboratory[49] based upon the "third-generation Intel Xeon Phi" processor.[50]

Design

The cores of Knights Corner are based on a modified version of the P54C design, used in the original Pentium.[51] The basis of the Intel MIC architecture is to leverage x86 legacy by creating an x86-compatible multiprocessor architecture that can utilize existing parallelization software tools.[20] Programming tools include OpenMP, OpenCL,[52] Cilk/Cilk Plus and specialised versions of Intel's Fortran, C++[53] and math libraries.[54]

Design elements inherited from the Larrabee project include x86 ISA, 4-way SMT per core, 512-bit SIMD units, 32 KB L1 instruction cache, 32 KB L1 data cache, coherent L2 cache (512 KB per core[55]), and ultra-wide ring bus connecting processors and memory.

The Knights Corner instruction set documentation is available from Intel.[56][57][58]

Programming

An empirical performance and programmability study[59] concluded that achieving high performance on Xeon Phi still requires effort from programmers, and that relying solely on compilers with traditional programming models is not yet sufficient. However, research in various domains, such as life sciences[60] and deep learning,[61] has demonstrated that exploiting both the thread- and SIMD-parallelism of Xeon Phi achieves significant speed-ups.
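Exploiting both levels of parallelism means splitting a loop first across threads and then, within each thread, across SIMD lanes. The following Python sketch only models that decomposition (it does not run anything in parallel); the function name, the thread count, and the problem size are hypothetical, while the 16-lane width corresponds to single-precision elements in a 512-bit vector:

```python
from typing import List, NamedTuple


class ThreadWork(NamedTuple):
    start: int          # first element index handled by this thread
    stop: int           # one past the last element index
    full_vectors: int   # iterations that fill every SIMD lane
    tail_elements: int  # leftover elements needing a masked or scalar step


def decompose(n: int, threads: int, lanes: int) -> List[ThreadWork]:
    """Split n loop iterations across threads, then into SIMD-width chunks."""
    base, extra = divmod(n, threads)
    plan = []
    start = 0
    for t in range(threads):
        # The first `extra` threads take one additional element each.
        size = base + (1 if t < extra else 0)
        full, tail = divmod(size, lanes)
        plan.append(ThreadWork(start, start + size, full, tail))
        start += size
    return plan


# Hypothetical example: 1000 elements, 4 threads, 16 single-precision lanes.
plan = decompose(n=1000, threads=4, lanes=16)
for work in plan:
    print(work)
```

In this example each thread receives 250 elements, processed as 15 full 16-lane vector iterations plus a 10-element tail. Tails and uneven splits of this kind are one reason hand tuning (alignment, padding, choice of chunk size) can matter beyond what a compiler does automatically.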

References

  39. https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-codename-knights-corner
  47. http://www.nersc.gov/users/computational-systems/cori
  55. Tesla vs. Xeon Phi vs. Radeon. A Compiler Writer’s Perspective // The Portland Group (PGI), CUG 2013 Proceedings
