<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AboutAI &#187; Processors</title>
	<atom:link href="http://www.aboutai.com/category/processors/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.aboutai.com</link>
	<description>The Artificial Intelligence Community</description>
	<lastBuildDate>Tue, 03 Nov 2009 12:30:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Moore&#8217;s Law at end</title>
		<link>http://www.aboutai.com/2009/04/moores-law-at-end/</link>
		<comments>http://www.aboutai.com/2009/04/moores-law-at-end/#comments</comments>
		<pubDate>Sun, 12 Apr 2009 09:50:52 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Processors]]></category>
		<category><![CDATA[growth]]></category>
		<category><![CDATA[moore]]></category>
		<category><![CDATA[newtech]]></category>

		<guid isPermaLink="false">http://aboutai.com/?p=371</guid>
		<description><![CDATA[Moore&#8217;s Law is maxing out. This is an oft-made prediction in the computer industry. The latest to chime in is an IBM fellow, according to a report. Intel co-founder Gordon Moore predicted in 1965 that the number of transistors on a microprocessor would double approximately every two years&#8211;a prediction that has proved to be remarkably [...]]]></description>
			<content:encoded><![CDATA[<p>Moore&#8217;s Law is maxing out. This is an oft-made prediction in the computer industry. The latest to chime in is an IBM fellow, according to a report. Intel co-founder Gordon Moore predicted in 1965 that the number of transistors on a microprocessor would double approximately every two years&#8211;a prediction that has proved to be remarkably resilient. But IBM Fellow Carl Anderson, who researches server computer design at IBM, claims the end of the era of Moore&#8217;s Law is nigh, according to a report in EE Times.</p>
<p><a href="http://aboutai.com/wp-content/uploads/moores_law_graph.png"><img src="http://aboutai.com/wp-content/uploads/moores_law_graph.png" alt="Moores Law at end moores_law_graph " title="moores_law_graph" width="400" height="229" class="aligncenter size-full wp-image-374" /></a></p>
<p>Exponential growth in every industry eventually has to come to an end, according to Anderson, who cited railroads and speed increases in the aircraft industry, the report said.</p>
<blockquote><p>&#8220;A generation or two of continued exponential growth will likely continue only for leading-edge chips such as multicore microprocessors, but more designers are finding that everyday applications do not require the latest physical designs,&#8221; Anderson said in the EE Times&#8217; report. Anderson also cited the staggering costs of research and fabs (factories) as a formidable barrier for continued advancement. Few companies can afford chip plants that typically cost billions of dollars to build and maintain.</p></blockquote>
<p>So, what does the future hold? Anderson cited three technologies: optical interconnects, 3D chips&#8211;which have circuits and components stacked on top of each other&#8211;and accelerator-based processing as seeing significant advancements, the report said. The last of these, accelerators, is hot right now.</p>
<p>In addition to IBM, companies such as Nvidia and Advanced Micro Devices&#8217; ATI unit supply graphics-processor-based computers to accelerate scientific, engineering, and animation applications. Intel is also expected to bring out its Larrabee chip later this year or early next year that can be used as an accelerator.</p>
<p>Brooke Crothers is a former editor at large at CNET News.com, and has been an editor for the Asian weekly version of the Wall Street Journal. He writes for the CNET Blog Network, and is not a current employee of CNET. Contact him at mbcrothers@gmail.com.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.aboutai.com/2009/04/moores-law-at-end/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microchip Mimics a Brain With 200,000 Neurons</title>
		<link>http://www.aboutai.com/2009/03/microchip-mimics-a-brain-with-200000-neurons/</link>
		<comments>http://www.aboutai.com/2009/03/microchip-mimics-a-brain-with-200000-neurons/#comments</comments>
		<pubDate>Wed, 25 Mar 2009 22:06:49 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[Processors]]></category>
		<category><![CDATA[brain]]></category>
		<category><![CDATA[emulation]]></category>
		<category><![CDATA[microchip]]></category>
		<category><![CDATA[neuroscience]]></category>
		<category><![CDATA[neurosilicon]]></category>

		<guid isPermaLink="false">http://aboutai.com/?p=360</guid>
		<description><![CDATA[An international team of scientists in Europe has created a silicon chip designed to function like a human brain. With 200,000 neurons linked up by 50 million synaptic connections, the chip is able to mimic the brain&#8217;s ability to learn more closely than any other machine. Although the chip has a fraction of the number [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://aboutai.com/wp-content/uploads/aichip_neurons_article.jpg"><img class="size-medium wp-image-363 alignleft" title="aichip_neurons_article" src="http://aboutai.com/wp-content/uploads/aichip_neurons_article.jpg" alt="Microchip Mimics a Brain With 200,000 Neurons aichip_neurons_article " width="220" height="190" /></a>An international team of scientists in Europe has created a silicon chip designed to function like a human brain. With 200,000 neurons linked up by 50 million synaptic connections, the chip is able to mimic the brain&#8217;s ability to learn more closely than any other machine.</p>
<p>Although the chip has a fraction of the number of neurons or connections found in a brain, its design allows it to be scaled up, says Karlheinz Meier, a physicist at Heidelberg University, in Germany, who has coordinated the Fast Analog Computing with Emergent Transient States project, or <a href="http://facets.kip.uni-heidelberg.de/">FACETS</a>.</p>
<p>The hope is that recreating the structure of the brain in computer form may help to further our understanding of how to develop massively parallel, powerful new computers, says Meier.</p>
<p>This is not the first time someone has tried to recreate the workings of the brain. One effort called the Blue Brain project, run by Henry Markram at the Ecole Polytechnique Fédérale de Lausanne, in Switzerland, has been using vast databases of biological data recorded by neurologists to create a hugely complex and realistic simulation of the brain on an IBM supercomputer.</p>
<blockquote><p>FACETS has been tapping into the same databases. &#8220;But rather than simulating neurons,&#8221; says Meier, &#8220;we are building them.&#8221; Using a standard eight-inch silicon wafer, the researchers recreate the neurons and synapses as circuits of transistors and capacitors, designed to produce the same sort of electrical activity as their biological counterparts.</p></blockquote>
<p>A neuron circuit typically consists of about 100 components, while a synapse requires only about 20. However, because there are so many more of them, the synapses take up most of the space on the wafer, says Meier.</p>
<p>The advantage of this hardwired approach, as opposed to a simulation, Meier continues, is that it allows researchers to recreate the brain-like structure in a way that is truly parallel. Getting simulations to run in real time requires huge amounts of computing power. Plus, physical models are able to run much faster and are more scalable. In fact, the current prototype can operate about 100,000 times faster than a real human brain. &#8220;We can simulate a day in a second,&#8221; says Meier.</p>
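<p>As a rough check on the "day in a second" claim, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only, and the 100,000x factor is simply the figure quoted above, not a measured value.</p>

```python
# Wall-clock time needed to emulate one day of biological time,
# assuming the roughly 100,000x speedup quoted for the prototype.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s of biological time
SPEEDUP = 100_000                # assumption: figure quoted in the article

wall_clock_seconds = SECONDS_PER_DAY / SPEEDUP
print(f"One biological day takes about {wall_clock_seconds:.3f} s of chip time")
```

<p>Under that assumption the result comes out a little under one second, which is consistent with the quote.</p>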
<blockquote><p>While it may sound implausible, neurons are actually very slow, at least compared to computers, says Thomas Serre, a computational neuroscience researcher at MIT. &#8220;The reason why computers seem much slower is that they are serial machines, while our brains run in parallel,&#8221; he says.</p></blockquote>
<p>FACETS is not the only group taking this approach. Researchers at Stanford University have also been creating neuronal circuits and the Defense Advanced Research Projects Agency recently started funding a similar project.</p>
<blockquote><p>&#8220;Where FACETS is ahead of anybody else is that they use these complex synapses,&#8221; says Markram. While the neurons are quite simple, he says, the synapses are designed to use a very powerful distributed algorithm&#8211;developed by Markram&#8211;called spike-timing dependent plasticity, that allows the device to learn and adapt to new situations.</p></blockquote>
<p>Building such complex circuits has required close collaboration with neurobiologists, says Markram. In fact, the project, whose current budget is €10.5 million (US$14.1 million), relies upon the contributions of 15 scientific groups from seven different countries. Among the challenges they face is recreating the three-dimensional structure of the brain in a 2-D piece of silicon, he says.</p>
<blockquote><p>Despite efforts to make the chips as biologically plausible as possible, Markram admits they are still crude compared to what can be achieved in simulation. &#8220;It&#8217;s not a brain. It&#8217;s more of a computer processor that has some of the accelerated parallel computing that the brain has,&#8221; he says.</p></blockquote>
<p>Because of this, Markram doubts that the hardware approach will offer much insight into how the brain works. For example, unlike Blue Brain, researchers won&#8217;t be able to perform &#8220;in silico&#8221; drug testing, simulating the effects of drugs on the brain. &#8220;It&#8217;s more a platform for artificial intelligence than understanding biology,&#8221; he says.</p>
<p>The <a href="http://facets.kip.uni-heidelberg.de/">FACETS</a> group now plans to further scale up their chips, connecting a number of wafers to create a superchip with a total of a billion neurons and 10<sup>13</sup> synapses.</p>
<p>Source: <a href="http://www.technologyreview.com/computing/22339/">http://www.technologyreview.com/computing/22339/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.aboutai.com/2009/03/microchip-mimics-a-brain-with-200000-neurons/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel: Use our CPU (not their GPU)</title>
		<link>http://www.aboutai.com/2009/02/intel-use-our-cpu-not-their-gpu/</link>
		<comments>http://www.aboutai.com/2009/02/intel-use-our-cpu-not-their-gpu/#comments</comments>
		<pubDate>Fri, 06 Feb 2009 12:25:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Gaming]]></category>
		<category><![CDATA[Processors]]></category>
		<category><![CDATA[adrenaline]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[intel]]></category>
		<category><![CDATA[larabee]]></category>

		<guid isPermaLink="false">http://www.aisolver.com/?p=315</guid>
		<description><![CDATA[Intel is back, pitching its processors for gaming graphics. The chipmaker will attempt to promote its silicon for sophisticated game effects at the upcoming Game Developers Conference in March, as it strives to make a case for quad-core processors in lieu of graphics chips from Nvidia and Advanced Micro Devices. The pitch goes like this: [...]]]></description>
			<content:encoded><![CDATA[<p>Intel is back, pitching its processors for gaming graphics. The chipmaker will attempt to promote its silicon for sophisticated game effects at the upcoming Game Developers Conference in March, as it strives to make a case for quad-core processors in lieu of graphics chips from Nvidia and Advanced Micro Devices.</p>
<p>The pitch goes like this: &#8220;Learn how to easily add real-time 3D smoke, fog and other fluid simulations to your game without using up the GPU.&#8221; That&#8217;s according to an Intel Web page entitled <a href="http://software.intel.com/en-us/articles/intel-at-gdc/">Intel at Game Developers Conference</a>. (The CPU is the central processing unit, or main brains of a computer; GPU stands for graphics processing unit.)</p>
<p style="text-align: center;"><a href="http://www.aisolver.com/wp-content/uploads/intel_visualadrenaline.jpg"><img class="size-full wp-image-321  aligncenter" title="intel_visualadrenaline" src="http://www.aisolver.com/wp-content/uploads/intel_visualadrenaline.jpg" alt="Intel: Use our CPU (not their GPU) intel_visualadrenaline " width="436" height="120" /></a></p>
<p>The session abstract goes on to say that the &#8220;source code to a fluid simulator optimized for multi-core CPUs&#8230;can easily be integrated by game developers into their engines to produce unique 3D effects.&#8221;</p>
<p>Intel&#8217;s argument raises the question, how should the CPU and GPU divvy up their tasks? In games, the CPU can handle things like physics and AI (artificial intelligence), and certain older games actually run some graphics on the CPU. Generally, however, the GPU is much more efficient (that is, faster) at handling most of the high-end effects that the gamer sees on the screen.</p>
<p>But there are exceptions. &#8220;Not all algorithms and processes map well to a GPU,&#8221; said Jon Peddie, president of Jon Peddie Research. &#8220;You have to have a problem that is naturally parallel, and except for the rendering of, say, a water surface and subsurface and reflections, the wave motion equations will run just fine on a CPU,&#8221; Peddie said.</p>
<p>Intel may also be seeking ways to make better use of its quad-core processors, according to Tom R. Halfhill, an analyst at the Microprocessor Report. But, he added: &#8220;I need to be convinced that a CPU can do those 3D effects better than a GPU can.&#8221;</p>
<blockquote><p>Then, there&#8217;s also the Larrabee factor. Larrabee is an upcoming high-end graphics processor due late this year. &#8220;I&#8217;m sure some of it may also relate to Larrabee, which will include x86 cores, if or when it comes to market,&#8221; said Jim McGregor, an analyst at In-Stat.</p></blockquote>
<p>(This <a href="http://www.youtube.com/watch?v=nqdLrACBrOI">Mythbusters demonstration at an Nvidia conference</a> is oversimplified and self-serving but it crystallizes the difference between CPUs and GPUs.)</p>
<p>In another GDC session, Intel is also pushing the CPU for physics and AI: &#8220;How can your game have more accurate physics, smarter AI, more particles, and/or a faster frame-rate? By threading your game&#8217;s engine to take advantage of multi-core processors. Intel has built a threaded game engine and demo called &#8216;Smoke&#8217; that shows one way of achieving this goal,&#8221; the abstract states.</p>
<blockquote><p>It continues: &#8220;This presentation examines the Smoke architecture and how it is designed to take advantage of all CPU cores available within a system. It does this by executing different functional and data blocks in parallel to utilize all available cores.&#8221;</p></blockquote>
<p>Intel won&#8217;t stop there. It will also focus on the bane of many PC game developers: gaming on Intel integrated graphics silicon&#8211;a relatively low-performance platform that prohibits game titles from being displayed in all their glory at higher resolutions. The session will focus on &#8220;programming for scalable graphics applications&#8221; and cover &#8220;performance considerations when programming for integrated graphics in general with specific tips for Intel Integrated graphics.&#8221;</p>
<p>source:<br />
Brooke Crothers is a former editor at large at CNET News.com, and has been an editor for the Asian weekly version of the Wall Street Journal. He writes for the CNET Blog Network, and is not a current employee of CNET. Contact him at mbcrothers@gmail.com.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.aboutai.com/2009/02/intel-use-our-cpu-not-their-gpu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MIT Artificial Vision Researchers Assemble 16-GPU Machine</title>
		<link>http://www.aboutai.com/2008/08/mit-artificial-vision-researchers-assemble-16-gpu-machine/</link>
		<comments>http://www.aboutai.com/2008/08/mit-artificial-vision-researchers-assemble-16-gpu-machine/#comments</comments>
		<pubDate>Tue, 26 Aug 2008 16:19:37 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Processors]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[supercomputer]]></category>
		<category><![CDATA[vision]]></category>

		<guid isPermaLink="false">http://dev.aisolver.com/?p=41</guid>
		<description><![CDATA[As part of their research efforts aimed at building real-time human-level artificial vision systems inspired by the brain, MIT graduate student Nicolas Pinto and principal investigators David Cox (Rowland Institute at Harvard) and James DiCarlo (McGovern Institute for Brain Research at MIT) recently assembled an impressive 16-GPU &#8216;monster&#8217; composed of 8x9800gx2s donated by NVIDIA. The [...]]]></description>
			<content:encoded><![CDATA[<p>As part of their research efforts aimed at building real-time human-level artificial vision systems inspired by the brain, MIT graduate student Nicolas Pinto and principal investigators David Cox (Rowland Institute at Harvard) and James DiCarlo (McGovern Institute for Brain Research at MIT) recently assembled an impressive 16-GPU &#8216;monster&#8217; composed of eight dual-GPU GeForce 9800 GX2 cards donated by NVIDIA.</p>
<p>The high-throughput method they promote can also use other ubiquitous technologies like IBM&#8217;s Cell Broadband Engine processor (included in Sony&#8217;s PlayStation 3) or Amazon&#8217;s Elastic Compute Cloud (EC2) service.</p>
<p>Interestingly, the team is also involved in the PetaVision project on the Roadrunner, the world&#8217;s fastest supercomputer.</p>
<p>http://hardware.slashdot.org/article.pl?no_d2=1&#038;sid=08/07/27/0721222</p>
]]></content:encoded>
			<wfw:commentRss>http://www.aboutai.com/2008/08/mit-artificial-vision-researchers-assemble-16-gpu-machine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IBM&#8217;s eight-core Power7 chip to clock in at 4.0GHz</title>
		<link>http://www.aboutai.com/2008/08/ibms-eight-core-power7-chip-to-clock-in-at-40ghz/</link>
		<comments>http://www.aboutai.com/2008/08/ibms-eight-core-power7-chip-to-clock-in-at-40ghz/#comments</comments>
		<pubDate>Thu, 07 Aug 2008 16:23:59 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Processors]]></category>
		<category><![CDATA[ibm]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[powerpc]]></category>

		<guid isPermaLink="false">http://dev.aisolver.com/?p=47</guid>
		<description><![CDATA[IBM looks set to join the seriously multi-core set with the Power7 chip. Internal documents seen by The Register show Power7 with eight cores per processor and also some very, very large IBM boxes based on the chip. The IBM documents have the eight-core Power7 being arranged in dual-chip modules. So, that&#8217;s 16-cores per module. [...]]]></description>
			<content:encoded><![CDATA[<p>IBM looks set to join the seriously multi-core set with the Power7 chip. Internal documents seen by The Register show Power7 with eight cores per processor and also some very, very large IBM boxes based on the chip.</p>
<p>The IBM documents have the eight-core Power7 being arranged in dual-chip modules. So, that&#8217;s 16 cores per module. As IBM tells it, each core will show 32 gigaflops of performance, bringing each chip to 256 gigaflops. Just on the gigaflop basis, that makes Power7 twice as fast per core as today&#8217;s dual-core Power6 chips, although the actual clock rate on the Power7 chips should be well below the 5.0GHz Power6 speed demon.</p>
<p>In fact, according to our documents, IBM will ship Power7 at 4.0GHz in 2010 on a 45nm process. We&#8217;re also seeing four threads per core on the chip.</p>
<p>For some customers, IBM looks set to create 2U systems with four of the dual-chip modules, giving the server 64 cores of fun. These 2U systems will support up to 128GB of memory and hit 2 teraflops.</p>
<p>IBM has an architecture that will let supercomputing types combine these 2U boxes to form a massive unit with 1,024 cores, hitting 32 teraflops of performance with 2TB of memory.</p>
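<p>The figures quoted above hang together arithmetically, as the following back-of-the-envelope Python sketch suggests. It assumes peak performance scales linearly with core count, and the variable names are made up for illustration; none of this comes from the IBM documents themselves.</p>

```python
# Peak-performance arithmetic for the Power7 configurations described above.
# Assumption: peak gigaflops scale linearly with the number of cores.
GFLOPS_PER_CORE = 32
CORES_PER_CHIP = 8
CHIPS_PER_MODULE = 2
MODULES_PER_2U = 4
MEMORY_PER_2U_GB = 128
CLUSTER_CORES = 1024

chip_gflops = GFLOPS_PER_CORE * CORES_PER_CHIP                      # per chip
cores_per_2u = CORES_PER_CHIP * CHIPS_PER_MODULE * MODULES_PER_2U   # per 2U box
server_tflops = cores_per_2u * GFLOPS_PER_CORE / 1000               # per 2U box

boxes = CLUSTER_CORES // cores_per_2u               # 2U boxes in the big unit
cluster_tflops = CLUSTER_CORES * GFLOPS_PER_CORE / 1000
cluster_memory_gb = boxes * MEMORY_PER_2U_GB

print(f"chip: {chip_gflops} gigaflops")
print(f"2U server: {cores_per_2u} cores, {server_tflops:.2f} teraflops")
print(f"cluster: {boxes} boxes, {cluster_tflops:.2f} teraflops, {cluster_memory_gb} GB")
```

<p>On these assumptions, 16 of the 2U boxes make up the 1,024-core unit, and their combined memory matches the 2TB figure.</p>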
<p>And, er, if you are a seriously demanding type, boy, does IBM have the system for you.</p>
<p>The Giant<br />
The Register has uncovered the first detailed specifications of the &#8220;Blue Waters&#8221; system IBM is building for the National Center for Supercomputing Applications (NCSA).</p>
<p>If our documents are to be believed &#8211; and they&#8217;re penned by an IBM executive &#8211; this system, funded by a $208m grant and meant to go up at the University of Illinois in 2011, will be the most massive machine ever created.</p>
<p>We&#8217;ve got documents showing IBM going after a 10 petaflop system (peak) comprised of 38,900 eight-core Power7 chips with each chip running at 4.0GHz. This monster will have an astonishing 620TB of memory and 5PB/s of memory bandwidth.</p>
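<p>The 10 petaflop peak figure can be sanity-checked against the per-core number quoted earlier in the article. The short Python sketch below assumes 32 gigaflops per core and linear scaling across chips; the variable names are illustrative.</p>

```python
# Peak performance of the machine described above, assuming
# 32 gigaflops per core (quoted earlier) and linear scaling.
CHIPS = 38_900
CORES_PER_CHIP = 8
GFLOPS_PER_CORE = 32
MEMORY_TB = 620

peak_pflops = CHIPS * CORES_PER_CHIP * GFLOPS_PER_CORE / 1_000_000
memory_per_chip_gb = MEMORY_TB * 1000 / CHIPS

print(f"peak: {peak_pflops:.2f} petaflops")
print(f"memory per chip: {memory_per_chip_gb:.1f} GB")
```

<p>The result lands just under 10 petaflops, in line with the peak figure in the documents.</p>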
<p>According to the documents, IBM will rely on a 1.30PB/s interconnect to link the systems and will feed them with 26PB of storage. As if that&#8217;s not enough, IBM will offer an exabyte of archival storage. Why not?</p>
<p>This insane machine will be built out of more than 100 racks filled with servers and storage systems, taking up close to 4,400 sq. feet.</p>
<p>Er, if this stuff isn&#8217;t sending shivers down the spines of Sun and Intel, then I don&#8217;t know what will.</p>
<p>IBM has clearly decided to get a bit radical with Power7. This isn&#8217;t the single-thread focused Power6. It&#8217;s a true multi-core chip, which should stack up very, very well against Sun&#8217;s 16-core Rock and what will likely be an eight-core version of Itanium due around 2010.</p>
<p>And then IBM still has the Quasar project lurking in the background, where it&#8217;s combining Power and Cell chips. Stand back, friends. Stand back.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.aboutai.com/2008/08/ibms-eight-core-power7-chip-to-clock-in-at-40ghz/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Second-gen Tesla packs more memory and power</title>
		<link>http://www.aboutai.com/2008/06/second-gen-tesla-packs-more-memory-and-power/</link>
		<comments>http://www.aboutai.com/2008/06/second-gen-tesla-packs-more-memory-and-power/#comments</comments>
		<pubDate>Mon, 16 Jun 2008 18:22:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Processors]]></category>
		<category><![CDATA[cuda]]></category>
		<category><![CDATA[nvidia]]></category>
		<category><![CDATA[powerpc]]></category>
		<category><![CDATA[Supercomputing]]></category>

		<guid isPermaLink="false">http://dev.aisolver.com/?p=72</guid>
		<description><![CDATA[Nvidia today announced its second generation of Tesla floating point accelerators based on the GT200 series of graphics processors. It is the first big upgrade for the company’s supercomputing product portfolio – streamlining the offering and introducing double precision support as well as much more performance than the original 8-series, which was introduced one year [...]]]></description>
			<content:encoded><![CDATA[<p>Nvidia today announced its second generation of Tesla floating point accelerators based on the GT200 series of graphics processors. It is the first big upgrade for the company’s supercomputing product portfolio – streamlining the offering and introducing double precision support as well as much more performance than the original 8-series, which was introduced one year ago.</p>
<p>High-performance computing (HPC) applications are likely to see several new technologies this week. In the hardware arena, AMD already announced its 1+ TFlop GPU earlier today and Nvidia is following with a GT200-based GPGPU, also claiming to be capable of hitting 1 TFlop per processing unit in single-precision applications. Compared to the first generation, the floating point performance is up from 518 GFlops.</p>
<p>The new T10P processing unit represents a massive die, integrating 1.4 billion transistors and 240 processing cores, which is up from 128 cores in the 8-series of GPUs.</p>
<p>Nvidia has cut the deskside supercomputer (D870), responding to customers who have been purchasing workstation graphics cards rather than an expensive external add-on, and is now limiting the product portfolio to a 4-GPU 1U blade and a Tesla add-in card. The S1070 blade integrates GPUs clocked at 1.5 GHz, a total of 960 processing cores, 4 GB of GDDR3 800 memory per GPU for a 16 GB total, 408 GB/s memory bandwidth and a total processing capability of 4 TFlops. Power consumption is up from 550 watts in the first generation to 700 watts.</p>
<p>The blade will be offered with either 2 PCIe interfaces ($7995) or one PCIe connect ($8295), both of which are slightly more expensive than the S870 blade, which sold for $7500 at introduction.</p>
<p>The entry-level Tesla product remains an add-in card, in this case the C1060, which essentially represents a Quadro graphics card on steroids. The card includes one T10P processor, 102 GB/s memory bandwidth and a power consumption rating of 160 watts, down from 170 watts of the previous generation. Nvidia said that thermal restrictions forced the company to clock the C1060 GPUs at 1.33 GHz instead of the 1.5 GHz in the blade. As a result, the C1060 will not hit 1 TFlops and is estimated to check in at about 900 GFlops.</p>
<p>The C1060 will be offered for $1699 MSRP, up from the $1500 price tag of the original C870.</p>
<p>Besides performance improvements, the T10P also delivers 64-bit or double-precision capability, which is required for most fluid dynamics and financial stream processing applications. Double precision is substantially more intensive than single precision calculations and will decrease the performance of the card dramatically. Nvidia told us that double-precision calculations will result in a 90% speed penalty and deliver only 100 GFlops per T10P processor.</p>
<p>There is also news surrounding the CUDA application platform, which Nvidia says can be used for any multi-core processing environment out there: This summer, the company will release a beta version of CUDA that developers can apply to multi-core CPUs. Nvidia claims that CUDA has been downloaded 60,000 times so far, but it is safe to say that there aren’t 60,000 developers working on HPC applications – and even Nvidia admits that most of those 60,000 developers are “playing” with CUDA trying to create “consumer applications.” The expansion into the CPU area could help the company reach a far greater developer base than it is able to attract with a GPU-only software foundation.</p>
<p>Technically, there is nothing that prevents CUDA from also being used with ATI’s GPU products, Nvidia told us. However, not surprisingly, Nvidia said that it won’t be offering CUDA for ATI products and stated that “someone else can do that.” ATI offers its own high-level development tools called Brook+.</p>
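<p>The numbers Nvidia quotes can be cross-checked with some simple arithmetic, sketched here in Python. The sketch assumes roughly 1,000 GFlops of single-precision throughput per T10P at 1.5 GHz, throughput that scales linearly with clock speed, and the same per-GPU memory bandwidth on the blade as on the add-in card; these are simplifications for illustration, not Nvidia specifications.</p>

```python
# Cross-checking the second-generation Tesla figures quoted above.
# Assumptions: about 1,000 GFlops single precision per GPU at 1.5 GHz,
# linear scaling with clock speed, identical per-GPU bandwidth on blade and card.
SP_GFLOPS_AT_1_5_GHZ = 1000
CORES_PER_GPU = 240
MEMORY_PER_GPU_GB = 4
BANDWIDTH_PER_GPU_GBS = 102
GPUS_PER_BLADE = 4
DP_PENALTY_PERCENT = 90

# S1070 blade totals
s1070_cores = GPUS_PER_BLADE * CORES_PER_GPU
s1070_memory_gb = GPUS_PER_BLADE * MEMORY_PER_GPU_GB
s1070_bandwidth_gbs = GPUS_PER_BLADE * BANDWIDTH_PER_GPU_GBS

# C1060 add-in card, clocked at 1.33 GHz instead of 1.5 GHz
c1060_sp_gflops = SP_GFLOPS_AT_1_5_GHZ * 1.33 / 1.5

# Double precision after the quoted 90% penalty
dp_gflops = SP_GFLOPS_AT_1_5_GHZ * (100 - DP_PENALTY_PERCENT) / 100

print(f"S1070: {s1070_cores} cores, {s1070_memory_gb} GB, {s1070_bandwidth_gbs} GB/s")
print(f"C1060 single precision: about {c1060_sp_gflops:.0f} GFlops")
print(f"T10P double precision: about {dp_gflops:.0f} GFlops")
```

<p>Under these assumptions the blade totals match the article, the C1060 estimate lands close to the "about 900 GFlops" figure, and the double-precision number matches the 100 GFlops Nvidia cites.</p>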
<p>Source:</p>
<p>http://www.tgdaily.com/html_tmp/content-view-37955-135.html</p>
]]></content:encoded>
			<wfw:commentRss>http://www.aboutai.com/2008/06/second-gen-tesla-packs-more-memory-and-power/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cell could offer dramatic boost for scientific computing</title>
		<link>http://www.aboutai.com/2008/06/cell-could-offer-dramatic-boost-for-scientific-computing/</link>
		<comments>http://www.aboutai.com/2008/06/cell-could-offer-dramatic-boost-for-scientific-computing/#comments</comments>
		<pubDate>Sun, 15 Jun 2008 18:23:59 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Features]]></category>
		<category><![CDATA[Processors]]></category>
		<category><![CDATA[cell]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[speed]]></category>

		<guid isPermaLink="false">http://dev.aisolver.com/?p=74</guid>
		<description><![CDATA[A new paper from a group at Lawrence Berkeley National Laboratory, &#8220;The Potential of the Cell Processor for Scientific Computing,&#8221; explores the performance of IBM&#8217;s Cell processor on some specific types of code commonly found in high-performance computing (HPC) applications. The programs used in the study are essentially smallish code blocks called kernels (see this older [...]]]></description>
			<content:encoded><![CDATA[<p>A new paper from a group at Lawrence Berkeley National Laboratory, &#8220;The Potential of the Cell Processor for Scientific Computing,&#8221; explores the performance of IBM&#8217;s Cell processor on some specific types of code commonly found in high-performance computing (HPC) applications. The programs used in the study are essentially smallish code blocks called kernels (see this older article for more on kernels and benchmarking) that implement typical algorithms like FFTs, stencil computations, and matrix multiplication. The paper compares Cell&#8217;s performance on these kernels to the performance of the Cray X1E, AMD Opteron, and Intel&#8217;s Itanium2.</p>
<p>The idea here is that Cell will be a commodity processor (at least that&#8217;s what the authors and IBM hope), so it&#8217;ll be a viable HPC alternative for the cost-sensitive academic research market. This paper represents the first formal academic attempt to decide if Cell hardware is something that researchers will want to invest in.</p>
<p>So how does Cell stack up in comparison to these three competitors? In a word, it screams.</p>
<p>First, the good news<br />
Take a look at the following results for single-precision dense matrix multiplication, or GEMM (all numbers are Gflop/s):</p>
<p>Cellpm: 204.7<br />
Cray X1E: 29.5<br />
AMD64: 7.8<br />
Itanium2: 3.0</p>
<p>The &#8220;pm&#8221; above means &#8220;performance model.&#8221; Because Cell hardware isn&#8217;t generally available for tests like this, the paper&#8217;s authors used a combination of performance projections and benchmarks on a cycle-accurate simulation of Cell that IBM has released. Real-world results should be very comparable to those in the paper, if not even better.</p>
<p>Note that the above results aren&#8217;t exactly typical. In some of the rest of the tests, Cell is a mere ten times faster than the competition. I should mention that the paper also looks into power consumption, and Cell still manages to trounce the other guys at performance/watt.</p>
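<p>To put the GEMM numbers in perspective, the speedup factors can be computed directly from the Gflop/s figures listed earlier. The Python sketch below uses only those four numbers; the dictionary and variable names are illustrative.</p>

```python
# Speedup of the Cell performance model over each competitor on
# single-precision GEMM, using the Gflop/s figures listed above.
CELL_PM_GFLOPS = 204.7
competitors = {
    "Cray X1E": 29.5,
    "AMD64": 7.8,
    "Itanium2": 3.0,
}

speedups = {name: CELL_PM_GFLOPS / gflops for name, gflops in competitors.items()}

for name, factor in speedups.items():
    print(f"Cell vs. {name}: about {factor:.1f}x")
```

<p>The gap ranges from roughly 7x against the Cray to nearly 70x against Itanium2, which is why these GEMM results are described as not exactly typical.</p>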
<p>Needless to say, these results are extremely promising, and the authors of the paper clearly believe that Cell could change the HPC game if it is available in quantity and at commodity prices. I personally think that Cell&#8217;s &#8220;commodity&#8221; status outside of the PS3 is a bigger &#8220;if&#8221; than the paper presumes, but we&#8217;ll see soon enough.</p>
<p>Now for the caveats<br />
So now that we&#8217;ve seen that Cell blows away the competition for these HPC kernels, that means that it&#8217;s going to completely dominate the next-gen console market and kill Itanium, right? Not exactly.</p>
<p>First, single-precision (SP) is the place where Cell really blows the doors off the barn, because SP is what game developers need. IBM made some compromises on double-precision (DP) performance, with the result that such performance is a fraction of what it is for SP. On DP code, Cell merely leads the pack for most of the tests.</p>
<p>The paper&#8217;s authors propose a microarchitectural improvement to Cell&#8217;s DP capabilities that they call Cell+, and they&#8217;re clearly hoping IBM will adopt their suggestion. Cell+ significantly enhances DP throughput with minimal changes, so we&#8217;ll see if IBM bites.</p>
<p>Another thing that should be pointed out is that the Cell used in the paper has full access to all eight SPEs, and not the six SPEs of the PS3. (Remember, one SPE is disabled for yield reasons, and the other is reserved for the system.) So keep this in mind when fantasizing about how these results are going to extrapolate to the PS3 hardware.</p>
<p>More important than the eight vs. six SPE issue is the fact that, due to the nature of the kernels used and the way that they were implemented for these tests, taking these results and trying to think about how a future iteration of Gran Turismo will look on the PS3 is a bit like comparing apples to cucumbers. Here&#8217;s why.</p>
<p>Programming models and the big picture<br />
To get the kinds of mind-blowing results found in the paper, the Berkeley team took each kernel and custom-fit it to the bare Cell hardware using labor-intensive intrinsics and extensive hand optimization. They didn&#8217;t rely on IBM&#8217;s higher-level development tools, and they didn&#8217;t even code the kernels in C. In other words, they were operating at &#8220;Tier I&#8221; of the Cell programming complexity hierarchy. By taking into account things like the deterministic load latencies at the various levels of the memory hierarchy, this code was tuned and timed, cycle by cycle and word by word, to fit the Cell hardware.</p>
<p>Our first Cell implementation, SpMV, required about a month of learning the programming model, the architecture, the compiler, the tools, and deciding on a final algorithmic strategy. The final implementation required about 600 lines of code. The next code development examined two flavors of double precision stencil-based algorithms. These implementations required one week of work and are each about 250 lines, with an additional 200 lines of common code. The programming overhead of these kernels on Cell required significantly more effort than the scalar version&#8217;s 15 lines, due mainly to loop unrolling and intrinsics use. Although the stencils are a simpler kernel, the SpMV learning experience accelerated the coding process.</p>
<p>Having become experienced Cell programmers, the single precision time skewed stencil — although virtually a complete rewrite from the double precision single step version — required only a single day to code, debug, benchmark, and attain spectacular results of over 65 Gflop/s. This implementation consists of about 450 lines, due once again to unrolling and the heavy use of intrinsics.</p>
<p>The authors were able to do this kind of custom fit because they picked a programming model based on data parallelism. What this means is that they had the eight SPEs doing identical work on different parts of a highly parallel dataset. When you&#8217;ve got all eight SPEs marching in lock-step through a large, parallel dataset, then you can really put all of the hardware on that chip to work in a dramatic way, as the paper indeed shows.</p>
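<p>For readers who want to see the shape of a data-parallel program, here is a minimal sketch in Python. It is not the paper&#8217;s code, which was hand-tuned with intrinsics for the SPEs. The eight threads merely stand in for the eight SPEs, the kernel is a made-up scale-and-add, and the names kernel, split, and data_parallel_map are hypothetical. Python threads also won&#8217;t deliver real parallel speedup on CPU-bound work, so treat this as an illustration of the structure only.</p>

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # stands in for the eight SPEs; purely illustrative


def kernel(chunk):
    """The identical work every worker performs: a made-up scale-and-add."""
    return [2.0 * x + 1.0 for x in chunk]


def split(data, parts):
    """Cut data into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(data), parts)
    slices, start = [], 0
    for i in range(parts):
        # The first `extra` slices each take one leftover element.
        end = start + size + (1 if extra > i else 0)
        slices.append(data[start:end])
        start = end
    return slices


def data_parallel_map(data, workers=NUM_WORKERS):
    """Run the same kernel on every slice, then stitch results back in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(kernel, split(data, workers)))
    return [y for part in results for y in part]


if __name__ == "__main__":
    print(data_parallel_map([float(i) for i in range(20)]))
```

<p>Every worker runs the same kernel on its own slice, and nothing is shared between slices. That independence is what makes this style comparatively easy to tune.</p>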
<p>IBM, however, is pushing a task-based approach to parallel programming the Cell, where there are many individual tasks running concurrently on the different SPEs. This is way harder to code for and optimize than the data parallelism-based approach used in the paper, but it&#8217;s also where the money&#8217;s at in the consumer and game markets.</p>
<p>In the end, what the paper demonstrates is that, for the HPC kernels that are amenable to a data parallelism programming model, Cell&#8217;s particular combination of a software-controlled memory hierarchy (with deterministic load latencies) and an obscene amount of parallel execution hardware is clearly the way to go. This approach is dramatically superior to a general-purpose computing architecture with a hardware-controlled memory hierarchy from both performance and performance/watt perspectives.</p>
<p>If Cell doesn&#8217;t really catch on as a commodity part outside the PS3, I expect we&#8217;ll eventually be posting a news item about a lab somewhere (Iran?) that placed an order for 200 PS3 consoles, with plans to cluster them.</p>
<p>Speaking of the PS3, that&#8217;s going to feature mostly task-based programming, which as I just said is a different beast than what was done in the Berkeley paper. Also, the programming will be done at higher levels of abstraction from the hardware. So please, don&#8217;t read this and then assume that Cell will administer a similar drubbing to general-purpose architectures like Opteron, Itanium, and Conroe on all game, physics, and AI code. </p>
<p>Source:</p>
<p>http://arstechnica.com/news.ars/post/20060615-7071.html</p>
]]></content:encoded>
			<wfw:commentRss>http://www.aboutai.com/2008/06/cell-could-offer-dramatic-boost-for-scientific-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel&#8217;s 80 Core Terascale Chip Explored</title>
		<link>http://www.aboutai.com/2007/02/intels-80-core-terascale-chip-explored/</link>
		<comments>http://www.aboutai.com/2007/02/intels-80-core-terascale-chip-explored/#comments</comments>
		<pubDate>Sun, 11 Feb 2007 18:29:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Processors]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[processor]]></category>
		<category><![CDATA[terascale]]></category>

		<guid isPermaLink="false">http://dev.aisolver.com/?p=80</guid>
		<description><![CDATA[During the Fall Intel Developer Forum in San Francisco this past September, Intel started to unveil information on its terascale processing projects. Terascale is basically defined as processing on terabytes of data on single machine requiring teraflops of power. Intel initially told us that the research being done by the terascale team was not intended [...]]]></description>
			<content:encoded><![CDATA[<p>During the Fall Intel Developer Forum in San Francisco this past September, Intel started to unveil information on its terascale processing projects. Terascale is basically defined as processing on terabytes of data on a single machine requiring teraflops of power. Intel initially told us that the research being done by the terascale team was not intended for a particular product, or that it would even result in a sellable product, but as we have been getting more and more information, the likelihood of seeing this technology soon is increasing.</p>
<p>If you haven&#8217;t already, I would HIGHLY recommend you look over my original terascale computing article, as it covers the basics of how such an architecture functions and the benefits and drawbacks it offers for computing. In fact, the product that is being showcased with Intel&#8217;s announcement today is basically the same thing we saw at IDF last year; only now we are getting data on frequencies and computing horsepower that were left out before.</p>
<p>I will be including the terascale backup data (the general information that leads up to today&#8217;s announcement) after the new information is shown here; so look for the remainder of the article should you need more information.</p>
<p>An 80-tile 1.28 TFLOPS CPU</p>
<p>Yes, we are indeed looking at what is essentially an 80-core processor; one of the world&#8217;s first and probably most exciting. The basic architecture of the 80-tile design is based on the ideas of a NoC architecture, or Network-on-Chip, that contains hundreds of processing elements with integrated on-die communications. The tiles are arranged in a 10&#215;8 2D mesh and can operate at speeds up to 4 GHz.</p>
<p>Each of these 80 tiles consists of a processing engine connected to a 5-port router for passing data amongst the tiles with a bandwidth up to 256GB/s. On each tile&#8217;s processing engine (PE) there are two single-precision floating point units. For data storage, the PE includes a 3KB instruction memory and a 2KB data memory.</p>
<p>Each of the FPMACs (floating point units) has a 9-stage pipeline that can reach a sustained multiply-add result (2FLOPS) every cycle. With dual FPMACs in each PE, the tile can provide 16GFLOPS of aggregate performance at the peak 4 GHz clock speed.</p>
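<p>The peak figures follow directly from those numbers, and a few lines of Python make the arithmetic explicit. The constants are the ones quoted in this article; the function and variable names are only illustrative.</p>

```python
# Figures quoted in the article; names are illustrative.
FLOPS_PER_FPMAC_PER_CYCLE = 2  # one multiply-add counts as two operations
FPMACS_PER_TILE = 2
TILES = 80


def peak_gflops(clock_ghz, tiles=TILES):
    """Theoretical peak in GFLOPS for `tiles` tiles running at `clock_ghz`."""
    per_tile = FLOPS_PER_FPMAC_PER_CYCLE * FPMACS_PER_TILE * clock_ghz
    return per_tile * tiles


print(peak_gflops(4.0, tiles=1))  # one tile at 4 GHz: 16.0
print(peak_gflops(4.0))           # all 80 tiles at 4 GHz: 1280.0
```

<p>Eighty tiles at 16 GFLOPS apiece give 1280 GFLOPS, which is where the 1.28 TFLOPS headline number comes from.</p>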
<p>The chip&#8217;s clocking scheme uses mesochronous timing, which allows communication between the tiles independent of the clock timings. The PLL (phase-locked loop) responsible for the clocks runs on both the horizontal and vertical axes (called spines) and distributes the clock information in the timing pattern shown on the right-hand side.</p>
<p>One of the most interesting parts of this chip&#8217;s design is the amount of power control that has gone into it. Fine-grained clock gating, sleep transistor cycles and enhanced circuits all combine to reduce the power the chip uses, all in the hardware itself. In fact, each of the 80 tiles has 21 smaller sleep-sections that can be activated separately and the tiles use a 6-cycle pipeline wakeup sequence.</p>
<p>This sleep cycle method serves two purposes: 1) it mitigates the current spikes that might arise from 80 cores waking up simultaneously and 2) it allows the FPMAC execution (data processing) to start only a single cycle into that wakeup sequence. Essentially, each tile can begin processing data before the rest of it wakes up. In all, about 90% of the FPMAC transistors and 74% of each PE&#8217;s total transistors are sleep-enabled.</p>
<p>Even more impressive, this chip is able to achieve incredibly high clock speeds on modest power usage. Running on a 1.0 V supply at 110 degrees C, the maximum tile frequency is 3.13 GHz, while at 1.2 V the tiles can run at 4.0 GHz. That brings the peak processing performance with all 80 tiles functioning on block matrix operations to 1.0 TFLOPS at 1.0 V and 1.28 TFLOPS at 1.2 V. Power consumption at these levels is estimated at 98W and 181W respectively.</p>
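<p>Those two operating points also let us estimate performance per watt. The sketch below uses only the voltage, frequency, and power figures quoted above, and assumes the theoretical peak of four FLOPs per tile per cycle (two FPMACs, two FLOPs each). The names are illustrative, and real sustained efficiency would depend on the workload.</p>

```python
# (supply voltage in volts, clock in GHz, estimated power in watts),
# as quoted in the article.
OPERATING_POINTS = [
    (1.0, 3.13, 98.0),
    (1.2, 4.00, 181.0),
]

# Assumption: theoretical peak, two FPMACs each finishing a
# multiply-add (2 FLOPs) every cycle.
FLOPS_PER_TILE_PER_CYCLE = 4
TILES = 80


def efficiency(clock_ghz, power_w):
    """Return (peak TFLOPS, peak GFLOPS per watt) for one operating point."""
    gflops = FLOPS_PER_TILE_PER_CYCLE * TILES * clock_ghz
    return gflops / 1000.0, gflops / power_w


for volts, clock, power in OPERATING_POINTS:
    tflops, gflops_per_watt = efficiency(clock, power)
    print(f"{volts:.1f} V: {tflops:.2f} TFLOPS, {gflops_per_watt:.1f} GFLOPS/W")
```

<p>By this estimate the 1.0 V point comes out at roughly 10 GFLOPS per watt against roughly 7 GFLOPS per watt at 1.2 V, so the last 28% of peak performance costs nearly twice the power.</p>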
<p>Finally, we have a layout of the chip itself, which measures only 275mm^2 in area; that is 3mm^2 for each tile with some additional I/O area added in. Built on a 65nm process technology and using standard copper interconnects, this chip comes in a unique 1248-pin LGA package and uses 100 million transistors.</p>
<p>No-where-near-closing Thoughts</p>
<p>This information that reached my inbox tonight is revolutionary beyond what I expected to see after being introduced to the technology late last year.</p>
<p>Here is a direct quote from the Intel PR:</p>
<p>&#8220;Intel has no plans to bring this exact chip designed with floating point cores to market. However, the company&#8217;s terascale research is instrumental in investigating new innovations in individual or specialized processor or core functions, the types of chip-to-chip and chip-to-computer interconnects required to best move data and, most importantly, how software will need to be designed to best leverage multiple processor cores. This Teraflops research chip offered specific insights in new silicon design methodologies, high-bandwidth interconnects and energy management approaches.&#8221;</p>
<p>Again, Intel is adamant about this product NOT being designed with any specific purpose in mind, but I think they would be crazy to not further develop this technology into areas that could use the kind of processing power it provides. We&#8217;ve already heard talks about Intel going into the discrete GPU business, and our original look at the terascale computing projects looked at how this chip could handle real-time ray tracing. Such applications, as well as all kinds of super-computing algorithms, could benefit from TFLOP performance on a single chip.</p>
<p>Here is another quote to get excited about:</p>
<p>&#8220;Further Tera-scale research will focus on the addition of 3-D stacked memory to the chip as well as developing more sophisticated research prototypes with many general-purpose Intel® Architecture-based cores. Today, the Intel® Tera-scale Computing Research Program has more than 100 projects underway that explore other architectural, software and system design challenges.&#8221;</p>
<p>Adding 3D memory to the terascale processor is a requirement for feeding the huge amount of processing power this chip provides with data to actually work on. Also interesting to see is how Intel might be able to apply a more generic x86-like architecture to such a tiled design to bring this kind of power to even more users who demand it.</p>
<p>In all, this new announcement only adds to the allure of such 80-core processors, even with the very specific uses that they might be helpful for in today&#8217;s world. As data and storage continue to increase though, the ability to process terabytes of information with teraflops of CPU power is going to move from mere theory to reality.</p>
<p>Source</p>
<p>http://www.pcper.com/article.php?aid=363</p>
]]></content:encoded>
			<wfw:commentRss>http://www.aboutai.com/2007/02/intels-80-core-terascale-chip-explored/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intel presents prototype CPU with 80 cores</title>
		<link>http://www.aboutai.com/2006/09/intel-presents-prototype-cpu-with-80-cores/</link>
		<comments>http://www.aboutai.com/2006/09/intel-presents-prototype-cpu-with-80-cores/#comments</comments>
		<pubDate>Tue, 26 Sep 2006 18:32:13 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Processors]]></category>
		<category><![CDATA[multicore]]></category>
		<category><![CDATA[processor]]></category>

		<guid isPermaLink="false">http://dev.aisolver.com/?p=82</guid>
		<description><![CDATA[During the Intel Developer Forum taking place in San Francisco, Intel presented a prototype CPU with 80 cores operating at 3.1GHz promising 1 TeraFLOP of performance in our near future. In addition to the large number of cores, Intel will be using some recently announced RAM technologies to allow massive transfers of data among the [...]]]></description>
			<content:encoded><![CDATA[<p>During the Intel Developer Forum taking place in San Francisco, Intel presented a <a href="http://www.intel.com/pressroom/archive/releases/20060926corp_b.htm">prototype CPU with 80 cores</a> operating at 3.1GHz promising 1 TeraFLOP of performance in our near future. In addition to the large number of cores, Intel will be using some recently announced RAM technologies to allow massive transfers of data among the cores. Intel claims that their architecture can deliver more than a terabyte-per-second of bandwidth between the cores and a memory chip attached on the CPU.</p>
<p>This new multi-core CPU will bring to the desktop the kind of super computer performance that was available 10 years ago with the exception that the cost will now be dramatically smaller. Obviously, these CPUs will first find their way into server rooms of large data centers but it won&#8217;t be long before the cost is lowered and the same technology becomes available to the desktop market. Intel plans to mass produce the new chips in 5 years while its current focus is on bringing to the market their Quad Core CPUs. The latter will become available in November, 2006.</p>
<p>I think that such amazing progress in computing power spells good news for artificial intelligence and robotics. Beyond the fact that researchers will now be able to attack larger problems at a much lower cost, many artificial intelligence, computer vision and machine learning algorithms are easily parallelizable, so researchers could readily take advantage of the multiple cores. In addition, Intel is focusing much effort on lowering the power consumption of their new CPUs, which can only be good news for robotics. Lower power consumption, along with hopefully an increase in battery capacity, can lead to robots operating for longer periods of time, in excess of the current 1-2 hours. Longer operating times will allow robots to achieve a larger variety of tasks with higher complexity.</p>
<p>I should point out that there are a number of AI researchers who believe that maybe we already have all the computational power that we need in order to achieve human level intelligence, but what we lack is the proper methods. They might be correct. However, it is my opinion that having faster computers can help us find these methods, since it will now be possible to run more experiments with significantly larger amounts of data.</p>
<p>Source:<br />
<a href="http://smart-machines.blogspot.com/2006/09/intel-presents-prototype-cpu-with-80.html">http://smart-machines.blogspot.com/2006/09/intel-presents-prototype-cpu-with-80.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.aboutai.com/2006/09/intel-presents-prototype-cpu-with-80-cores/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

