To better cool its servers, Microsoft plunges them into a boiling liquid

Microsoft is experimenting with a new liquid cooling technique at its data center in Quincy, on the eastern bank of the Columbia River in Washington state. In a blog post published on April 6, the company explains that it has developed and tested a two-phase immersion cooling technique in a production environment. It claims to be the first cloud provider to have done so.

Cooling by boiling
Concretely, this is a closed-loop liquid cooling system. The servers are immersed in a liquid that boils at 50 °C, half the 100 °C boiling point of water, so it evaporates before the servers overheat. The fluid was designed by 3M. Unlike water, it is harmless to the equipment because its dielectric properties make it an effective electrical insulator. The servers can therefore operate normally while immersed in it.

The servers are placed in a “sofa-shaped” tank. When the liquid boils on contact with the hot components, vapor rises and meets a condenser located in the lid of the tank. The vapor condenses back into liquid, which falls like rain onto the submerged servers and cools them. This forms a first closed circuit. The condenser itself is connected to a second closed circuit, which also uses liquid to transfer the heat to a dry cooler located outside the tank.

The low boiling point allows the servers to run continuously at full power without the risk of failure due to overheating. Microsoft, the second largest cloud operator in the world behind AWS, became interested in the technique after seeing how some cryptocurrency miners use it. For its part, the company wants to apply it to other high-performance computing (HPC) workloads, particularly those related to artificial intelligence.

5% to 15% savings in energy consumption
Cooling data centers is a major problem at the “hyperscale” of a cloud operator. All the players in the sector are trying to cut the energy consumed by cooling, which accounts for a significant share of a data center's total consumption, as far as possible, while still steadily increasing the available computing power and limiting hardware failures.

As air cooling techniques are no longer sufficient and the demand for computing power is accelerating, “liquid cooling allows us to densify our infrastructures and therefore to continue the trend of Moore’s law at the scale of a data center”, explains Christian Belady, vice president of Microsoft’s advanced data center development group. During these tests, the researchers observed a reduction of 5% to 15% in server power consumption.

The results are all the more promising given that Project Natick, in which Microsoft placed servers on the seabed, recorded a failure rate eight times lower than that of conventional land-based data centers. That performance is attributed to the enclosure being filled with pure, dry nitrogen instead of ambient air: the absence of humidity and of the corrosive effects of oxygen is thought to be responsible. The team expects similar reliability from this immersion cooling method.

Servers that can be overclocked without risk
For Marcus Fontoura, Chief Architect of Azure Compute at Microsoft, one of the benefits of this two-phase cooling technique is greater flexibility in handling compute peaks. A data center equipped with such tanks can route requests to them during a sudden surge in demand, because the servers inside can be overclocked without the risk of overheating.

He gives the example of videoconferencing during the pandemic, when many employees work from home and video connections surge. “We know that with Teams, when you get to 1 p.m. or 2 p.m., there is a huge spike because people join meetings at the same time,” explains Marcus Fontoura.
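
The idea of absorbing a peak with overclockable tanks can be illustrated with a minimal sketch. This is not Microsoft's scheduler: the function name, the capacity units and the overclocking factor are all hypothetical, chosen only to show how overflow demand could be directed to immersion-cooled servers once normal capacity is exhausted.

```python
def allocate_load(demand, air_capacity, tank_capacity, overclock_factor=1.2):
    """Split a compute demand between air-cooled racks and immersion tanks.

    Hypothetical model: capacities are in arbitrary compute units, and
    overclock_factor is an assumed multiplier on tank capacity, not a
    figure published by Microsoft.
    """
    if demand < 0:
        raise ValueError("demand must be non-negative")

    # Use normal capacity first: air-cooled racks, then tanks at stock clocks.
    to_air = min(demand, air_capacity)
    remaining = demand - to_air
    to_tanks = min(remaining, tank_capacity)
    remaining -= to_tanks

    # Only immersion-cooled servers can take the overflow by overclocking.
    headroom = tank_capacity * (overclock_factor - 1.0)
    overclocked = min(remaining, headroom)
    unserved = remaining - overclocked

    return {
        "air": to_air,
        "tanks": to_tanks,
        "overclocked": overclocked,
        "unserved": unserved,
    }


if __name__ == "__main__":
    # An early-afternoon spike that exceeds normal capacity (100 + 20 units).
    print(allocate_load(130, air_capacity=100, tank_capacity=20, overclock_factor=1.5))
```

In this toy model, demand is overclocked only when it exceeds the combined normal capacity, which mirrors the article's point that the tanks serve as a reserve for sudden peaks.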

Deployment in hard-to-reach areas
If further studies confirm that failures are rare, Microsoft wants to move to a model in which components are not replaced immediately when they fail, in order to limit vapor losses. The next step will be to deploy tanks of this type in remote, hard-to-service locations. “This first step is to get people comfortable with the concept and to show that we can run production workloads,” says Christian Belady. Among the examples given, a tank could be deployed under a 5G cellular tower in the middle of a city for applications such as self-driving cars.

