Showing posts with label Cray. Show all posts

Tuesday, March 31, 2020

Climbing El Capitan? Details Detailed!

 Supercomputer Leaps to New Peaks

At the top of the show we discuss whether Indiana (where Jessi is located at Purdue) is in the heartland or not. We all agree that it is and, yes, Jessi sees Larry Bird all the time.

Getting into the heart of the episode, Dan talks about the briefing he received on the new Lawrence Livermore El Capitan system to be built by HPE/Cray. This new $600 million system will be fueled by the AMD Genoa processor coupled with AMD’s Instinct GPUs. Performance should come in at TWO 64-bit exaflops peak, which is very, very sporty. The new box (more like a room) will be 10x faster than today’s fastest supercomputer and faster than the top 200 supercomputers in the world – combined.

As the show continues, we talk about the specifics of the system and components. Henry makes the unfortunate mistake of bringing up IEEE floating point, sending Dan into a mini-rant. Back to the show: the system should require somewhere close to 30MW of electricity, which is much lower than the nearly 60MW predicted just a year or so ago. Not surprisingly, the system will be liquid cooled, but not, as we speculate, cooled by Slushy machines. We have a tremendous tech talk around the varying aspects of the machine and AMD’s great progress in clawing their way back into the market. Well worth a listen.
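
For a rough sense of scale, the figures quoted above can be turned into an efficiency estimate. This is only a back-of-the-envelope sketch using the numbers from these show notes (two exaflops peak, roughly 30MW, and the earlier 60MW prediction). The function name is ours, and sustained efficiency on real workloads will differ from these peak figures.

```python
# Back-of-the-envelope efficiency from the figures quoted in the show notes.
# The peak and power numbers come from the text; the helper name is illustrative.

def gflops_per_watt(peak_flops: float, power_watts: float) -> float:
    """Peak floating-point operations per second per watt, in gigaflops."""
    return peak_flops / power_watts / 1e9

PEAK_FLOPS = 2e18          # two 64-bit exaflops, peak
EXPECTED_POWER_W = 30e6    # "somewhere close to 30MW"
EARLIER_POWER_W = 60e6     # "nearly 60MW" predicted a year or so earlier

expected = gflops_per_watt(PEAK_FLOPS, EXPECTED_POWER_W)
earlier = gflops_per_watt(PEAK_FLOPS, EARLIER_POWER_W)

print(f"At ~30MW: {expected:.1f} GFLOPS/W peak")
print(f"At ~60MW: {earlier:.1f} GFLOPS/W peak")
```

Halving the power budget at the same peak performance doubles the peak efficiency, which is why the drop from 60MW to 30MW is such a big deal.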

Why No One Should Ever Be Online. Ever.

In this edition, Henry talks about how an ultrasonic hack can make your phone vulnerable to pownership. Just sending the exact right frequency of sound to a phone sitting on a solid object might be enough to unlock it and let a miscreant get at all of your goodies. Yikes!

Catch of the Week

Jessi:  Astronaut applications have opened up again! If you ever wanted to go to space, this might be your chance. You’ll want to have a strong science and computing background – plus hero or heroine good looks wouldn’t hurt either.

Shahin:  Discusses AMD vs. NVIDIA GPU comparative shipment figures from 4Q2019.

Henry:  Net is empty, ouch.

Dan:  Bees can count to six, which is hugely disquieting. If bees can do math, we might be doomed. Maybe this is why beehives are hexagonal?

Listen in to hear the full conversation

* Download the MP3 
* Sign up for the insideHPC Newsletter
* Follow us on Twitter
Subscribe on Spotify 
Subscribe on Google Play 
Subscribe on iTunes 
RSS Feed
* eMail us



Monday, March 9, 2020

Let's Learn Deeply about Extreme Weather

This week we have Dan, Jessi, and Shahin on the call. Henry is off in Las Cruces overseeing construction of what can only be called a bunker. Why? Its main feature is 21-inch rammed earth walls, guaranteed to withstand withering heat waves, cold snaps, and probably any high caliber round. We speculate on the exact configuration of the home, wondering if Henry is running wild with the rammed earth and concrete theme, with concrete chairs and tables, plus rammed earth interior walls.

Applying Deep Learning to Extreme Weather

Dan deftly moves us on to the main topic of this show: how researchers are using supercomputers to apply deep learning to extreme weather. A research team from Rice University used three supercomputers (TACC’s Stampede2 and Wrangler, and the Pittsburgh Supercomputing Center’s Bridges system) to see whether heat waves and cold spells could be predicted from an analysis of atmospheric circulation and prior surface temperature. The results of these tests indicated that this deep learning approach is more accurate at predicting extreme weather.

In the call, we discuss the computational difficulty of weather forecasting and the use case that the Rice researchers are testing. This promising research could pay great dividends by giving early warning of hazardous weather, saving crops and perhaps saving lives in the process. As promised in the podcast, here’s a link to the paper. We also have a short discussion of what motivates Dan to read a particular paper and what turns him off. Jessi’s main standard for a paper is that it must remain legible and understandable when printed in black and white. So if you want to attract Jessi’s attention for your paper, make sure your charts don’t rely on color.

Things You Think You Know, But Maybe Don't.

The question this week is why Cray computers were horseshoe shaped. One of the reasons was wire length: this shape puts the components close together to reduce the length of the wires needed to connect them. It also left enough room for a person to get their hands inside to weave the wires. So the key was minimal, uniform, and accessible wire length. There are also a couple of other explanations. One is that it gave room for the liquid cooling pipes necessary to cool the box; another is that the system forms a capital “C” shape, which stands for, of course, Cray.

Catch of the Week



Henry: is away this week. (We know some of you don't read this all and come straight here!)

Jessi:  Tells us that the US might want to take a close look at Estonia as a model for tackling cybersecurity threats. The country has put together a civilian cybersecurity force and instituted mandatory cyber classes in schools. This is a response to massive cyberattacks launched against Estonia in 2007 that took down much of their digital infrastructure for weeks.

Shahin:  Discusses how Justine Haupt came up with a way to keep her cell phone from distracting her – she built a rotary dial interface for it. Along with helping save her from using the most time-wasting features on her phone, it will also confound an entire generation of folks who have never seen a rotary phone dialer. Justine also is working in robotics and has a page of her inventions and thoughts.

Dan:  Brings up a story about a man convicted of murder mainly on the basis of DNA evidence, although that evidence was shaky, mainly saying that they couldn’t exclude him. His case was reopened by the Innocence Project, who reached out to a company called Cybergenetics for further analysis. Cybergenetics ran samples through their 170,000 line AI algorithm and found that there was zero chance that the convicted man’s DNA was present in the sample. So the man will be released, which is great. The problem is that the Cybergenetics code is a black box and the company, citing competitive advantage, will not release the code. How should we deal with situations like this in the future?

Listen in to hear the full conversation


Wednesday, March 4, 2020

Slingshotting to Exascale, It's Hot!

At the top of this episode, Henry notes that the temperature in his city will be touching -15F, which is plenty cold. However, it’s very good overclocking weather as Dan and Shahin point out. Not quite quantum weather, unfortunately.

Cray Slingshot Interconnect

We quickly get to the main topic of the day, an examination of HPE/Cray’s Slingshot interconnect. It’s Ethernet on HPC steroids and will be the interconnect of choice for their upcoming slate of Exascale systems. Slingshot includes a bunch of HPC enhancements while maintaining compatibility with existing Ethernet devices and protocols. Cray has designed a new Ethernet superset of features that includes smaller headers, support for smaller message sizes, plus other features aimed at cutting Ethernet latency and improving performance on HPC-oriented interconnect tasks. At the heart of this new interconnect is their innovative 64-port switch, which provides a maximum of 200 Gb/s per port and can support Cray’s enhanced Ethernet along with standard Ethernet message passing. It also has advanced congestion control and quality of service modes that ensure that each job gets the right amount of bandwidth. The architecture can scale to an astounding 279,040 endpoints, which is, as we note, “a lot of endpoints.” We also kick around the possibility that HPE/Cray might sell the interconnect as a standalone for use with competitive gear.
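
The 279,040 figure is consistent with a fully built-out dragonfly topology on 64-port switches. The sketch below is our own arithmetic, not something stated on the show. It assumes each switch dedicates 16 ports to endpoints, 31 to the other switches in its group, and 17 to links between groups. Only the 64-port count, the 200 Gb/s per port, and the endpoint total come from the text above; the function and constant names are hypothetical.

```python
# Dragonfly sizing sketch. The 16/31/17 port split is an ASSUMPTION, chosen
# because it adds up to the 64 ports mentioned in the text; it is not from the show.

def dragonfly_endpoints(endpoint_ports: int, local_ports: int, global_ports: int) -> int:
    """Maximum endpoints when every pair of groups shares exactly one global link."""
    switches_per_group = local_ports + 1                  # all-to-all inside a group
    global_links_per_group = switches_per_group * global_ports
    groups = global_links_per_group + 1                   # one link to every other group
    return groups * switches_per_group * endpoint_ports

ENDPOINT_PORTS, LOCAL_PORTS, GLOBAL_PORTS = 16, 31, 17
assert ENDPOINT_PORTS + LOCAL_PORTS + GLOBAL_PORTS == 64  # matches the 64-port switch

total = dragonfly_endpoints(ENDPOINT_PORTS, LOCAL_PORTS, GLOBAL_PORTS)
print(f"{total:,} endpoints")

# Aggregate bandwidth of one switch at the quoted 200 Gb/s per port, per direction.
switch_tbps = 64 * 200 / 1000
print(f"{switch_tbps:.1f} Tb/s per switch")
```

Under these assumptions the result is 545 groups of 32 switches with 16 endpoints each, which multiplies out to the quoted 279,040.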

As mentioned on the call, the chips on this switch run so hot that they need liquid cooling – a first for interconnect processors. We also discuss the rising heat load coming from new CPUs and particularly ASICs and how network design can greatly impact costs. Listen to the show to learn more; it’s a good, meaty discussion.

Why Nobody Should Ever be Online. Ever.

Henry’s latest reason why we need to abandon the internet cracks us all up. What’s so funny? It’s that the Philips smart lightbulbs need a firmware upgrade in order to prevent miscreants from pwning your entire network. No kidding, it’s true. And hilarious. Here’s the link. This has Henry thinking about how to protect his new home from war-flying drones. He’s looking into drone killing home-based air defense systems or perhaps a whole-home Faraday cage.

Catch of the Week



Henry:  Another security related story, this time about low level exploits in the Cisco Discovery Protocol (CDP) that can expose tens of millions of devices to internet troublemakers. This is highly disturbing since there is so much Cisco gear out there and the fix relies on users updating their firmware to plug the holes. Ouch.

Jessi:  Brings athletics into the podcast, which is the cause of some banter about how totally un-athletic the rest of us are (with the exception of Jessi, of course). Nike is using big time computation to 3D print their new uppers to give athletes the ultimate advantage in shoe performance.

Shahin:  Alerts us to a comprehensive review of AMD’s Ryzen Threadripper 3990X, the first desktop CPU in the world to sport 64 cores. This CPU is currently the top of AMD’s desktop line and is just another signpost signaling AMD’s resurgence. Welcome back, AMD.

Dan:  As we covered in a prior episode, Microsoft had the fantastic idea of forcing their corporate Office 365 users to have Microsoft’s Bing installed as their default search engine, using an update to accomplish this task. Well, the users have spoken and their voice was heard loud and clear in Redmond. The company is retreating from their forced ‘upgrade’ to Bing and backpedaling with all due speed. Hee. Hee.

Listen in to hear the full conversation


Saturday, August 24, 2019

Coral is Cray for All

Cray Pulls an Exascale Hat Trick

Guess who's having a great year? Think Aurora, Frontier, and El Capitan. Cray has put some nice numbers on the accounts receivable ledger, and these are not ordinary numbers. The Exascale era is being defined substantially by the DOE Coral program and the commercial markets are watching as their computing needs start looking like those of the national labs. In that context, Cray's clean sweep makes its leadership in this area very important.
All of this is happening as Cray gears up to become what we hope to be an important part of HPE. The last time Cray sold anything like this to anyone was Cray BSD going to Sun, and that ended up being a multibillion dollar juggernaut. Exascale is a bigger deal, especially as supercomputing goes mainstream because of AI and data science. Exciting times. And kudos to HPE for snapping up Cray at the right time.

The impact of AI on Science

Speaking of AI, a series of town halls is being held around the nation by Argonne National Laboratory "aimed at collecting community input on the opportunities and challenges facing the scientific community in the era of convergence of High Performance Computing (HPC) and artificial intelligence (AI) technologies and the expected integration of large-scale simulation, advanced data analysis, data driven predictive modeling, theory, and high-throughput experiments. The term we are using to represent the next generation of methods and scientific opportunity is 'AI for Science'."
Co-chairing the town halls are Rick Stevens of Argonne, Kathy Yelick from Berkeley Labs, and Oak Ridge Labs' Jeff Nichols. Dan references a very good interview with Rick Stevens.

Henry Newman's Feel-Good Security Corner

Henry delights us all once again by describing how your camera can be an "Enter Here" sign for malware:

Canon DSLR Camera Infected with Ransomware Over the Air

Vulnerabilities in the image transfer protocol used in digital cameras enabled a security researcher to infect with ransomware a Canon EOS 80D DSLR over a rogue WiFi connection.
A host of six flaws discovered in the implementation of the Picture Transfer Protocol (PTP) in Canon cameras, some of them offering exploit options for a variety of attacks.

Catch of the Week

It was a pretty full episode, so we skip the Catch of the Week segment this week.

Listen in to hear the full conversation.

Download the MP3 * Subscribe on iTunes * RSS Feed

Sign up for our insideHPC Newsletter

Wednesday, July 3, 2019

Why did HPE buy Cray?

The RFHPC team tackles the HPE-Cray acquisition as it reviews the companies' recent moves and strengths and market conditions in the context of:
  • the 5-tier data center application architecture: Embedded, Mobile, Desktop, On-premises, Off-premises
  • the emergence of AI as a must-do enterprise app, and
  • increasing commonality between supercomputers and enterprise servers.

Catch of the Week


Henry:

Another week another breach!

Massive Quest Diagnostics data breach impacts 12 million patients

A massive data breach has struck Quest Diagnostics and the information of up to 11.9 million patients has potentially been compromised. On Monday, the US clinical laboratory said that American Medical Collection Agency (AMCA), a billing collections provider that works with Quest, informed the company that an unauthorized user had managed to obtain access to AMCA systems.

Dan:

Dan points out that the new Apple Mac Pro can be configured to cost tens of thousands of dollars. Given that Dan and others on the team are PC people, the counterargument goes, the nuances of the Apple value are obviously lost on them.

Apple’s top spec Mac Pro will likely cost at least $35,000

That’s before you count the GPUs or a Pro Display XDR screen.
Apple announced today that its new Mac Pro starts at an already pricey $6,000, but the company neglected to mention how much the top-of-the-line model will cost. So we shopped around for equivalent parts to the top-end spec that Apple’s promising. As it turns out: $33,720.88 is likely the bare minimum — and that’s before factoring in the four GPUs, which could easily jack that price up to around $45,000.

Listen in to hear the full conversation.


Monday, June 17, 2019

TOP500 Jun2019, Facebook Coin

The new TOP500 list of most powerful supercomputers is out and we do our usual quick analysis. Not much changed in the TOP10 but a lot is changing further down the list. Here is a quick take:
  • There are 65 new entries in 2019.
  • US science is receiving support via DOE sites and academic sites like TACC.
  • 26 countries are represented. China continues to widen its lead, now with 219 entries, followed by the US with 116, Japan with 29, France with 19, the UK with 18, Germany with 14, Ireland and the Netherlands with 13 each, and Singapore with 10.
  • Vendors substantially reflect the country standings. Lenovo has 175 entries, Inspur 71, and Sugon 63, all in China. Cray has 42 and HPE has 40 (which will combine when their deal closes), followed by Bull with 21, Dell with 17, and IBM with 16.
  • There are a lot of "accidental supercomputers" on the list. These are systems that probably are not doing much science or AI work, but they could, the vendors counted them, and it seems to be within the rules to list them. It's controversial but not a new practice.
  • There are several systems listed as "Internet" companies. Hard to tell what that means but it points to the existence of very large clusters in the cloud for whatever purpose. Last year, there was one system listed as Amazon EC2, which remains on the list. This time, there is also one at Facebook. Usually the big social/cloud players don't care to participate, though they obviously could summon the resources to run the benchmarks.
  • Just over half of systems use Ethernet as a fabric. A quarter use InfiniBand, nearly 50 use Intel's Omni-Path, and the rest, 55, use custom interconnects like the ones Cray provides. The team talks about Cray+HPE entering the interconnect business for real; if they do, they will be formidable.
  • The majority of entries, 367, do not have any accelerators. 125 use Nvidia GPUs.
  • The overwhelming majority of the systems, 478 of them, are based on Intel CPUs. 13 are IBM, and there is 1 system based on Arm provided by Cavium, now part of Marvell.
  • So when it comes to chips, it's an Intel game with a respectable showing by Nvidia when GPUs are used. Alternatives are bound to appear as the dozens of AI chips in the works become available and Arm, AMD, and IBM build on their positions. The recently announced system at Oak Ridge will be all AMD, and that will point to an alternative as well.
  • Notably, Intel is listed as the vendor for 2 entries and Nvidia is listed for 4. While Intel has stayed largely away from looking like a system vendor, Nvidia is going for it with its usual alacrity. That, and the pending acquisition of Mellanox by Nvidia should serve as a warning to all system vendors who might feel stuck between treating Nvidia as an important supplier and an up and coming competitor.
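
To put the counts above in proportion, here is a small sketch that converts them into shares of the 500-system list. It uses only the numbers quoted in these notes. The helper name and table layout are ours, and anything the notes did not itemize is reported as a remainder rather than guessed at.

```python
# Shares computed from the counts quoted in the list above (out of 500 systems).
TOTAL = 500

def share(count: int, total: int = TOTAL) -> float:
    """Percentage of the list represented by `count` systems."""
    return 100.0 * count / total

countries = {
    "China": 219, "US": 116, "Japan": 29, "France": 19, "UK": 18,
    "Germany": 14, "Ireland": 13, "Netherlands": 13, "Singapore": 10,
}
cpus = {"Intel": 478, "IBM": 13, "Arm": 1}
accelerators = {"none": 367, "Nvidia GPUs": 125}

for label, table in [("Country", countries), ("CPU", cpus), ("Accelerator", accelerators)]:
    for name, count in table.items():
        print(f"{label:<12}{name:<13}{count:>4}  {share(count):5.1f}%")
    # Systems the notes did not itemize show up here as a remainder.
    print(f"{label:<12}{'(unlisted)':<13}{TOTAL - sum(table.values()):>4}")
```

The remainders are a useful sanity check: the CPU and accelerator counts each leave 8 systems unaccounted for, which is where the non-Intel, non-IBM, non-Arm processors and the non-Nvidia accelerators would sit.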

CryptoSuper500

Shahin mentions the 2nd edition of the CryptoSuper500 list (really 50 for now), a list developed by his colleague Dr. Stephen Perrenod, which was launched last November and is being released at the same time as the TOP500. The TOP500 has spawned variations that look at different workloads and attributes, for example, the Green500, Graph500, and IO500 lists. CryptoSuper500 was inspired by those lists. The material for the inaugural edition of the CryptoSuper500 list is here.
Cryptocurrency mining operations are often pooled and are very much supercomputing class, typically using accelerator technologies such as custom ASICs, FPGAs, or GPUs. Bitcoin is the most notable of such currencies. Scroll down for the top-10 list and see the slides for the full list and the methodology.

Catch of the Week


Henry:

Henry talks about check-out lanes at Target all being down for unknown reasons, though he hesitates to call that a cybersecurity breach. It turns out he was right: the company blamed an "internal technology issue".

Target down (then back up) as cash registers fail and leave long lines

Target's payment systems appeared to be missing the mark the day before Father's Day, as terminals went AWOL for a couple of hours in a number of the company's US retail outlets. The outage caused long lines but prompted an encouraging show of sympathy for Target employees from people on Twitter. And there were some jokes too, of course.

Shahin:

Facebook is expected to release a new cryptocurrency that is already impacting the crypto market.

Here’s what we know so far about the secretive Facebook coin

Facebook is likely to release information about its secretive cryptocurrency project, codenamed Libra, as soon as June 18, TechCrunch reports.
As is traditional with new cryptocurrencies, the social networking giant is expected to release a so-called “white paper” outlining how the currency works and the company’s plans for it.

Dan:

Dan reminds us all of the inimitable Erich Anton Paul von Däniken and his ancient astronauts hypotheses!

Listen in to hear the full conversation.
