Did you know? For an ordinary chip, processing a complex mathematical task like reconstructing the surface of the cerebral cortex takes at least tens of minutes. The new chip created by Professor Yang Yuchao's team at Peking University and Researcher Song Zhitang's team at the Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, compresses this time to a fraction of a second.
This is a 40nm chip that requires only 2.12 milliseconds per iteration for running a neural dynamical system, making it tens of times faster than the previously known fastest similar chip.
At its core is a device called phase-change memory. The researchers repurposed this memory, using it both to store data and to adjust the computation step size. In the study, they used the chip for 3D reconstruction of the cerebral cortex, recovering the gray matter and white matter surfaces with sub-millimeter average error and far faster than a GPU running the same algorithm.
For neural dynamical systems, "neural" means it embeds neural networks that can learn patterns from data. "Dynamical system" refers to a set of mathematical equations that describe how something changes over time. Combining these two yields a computational model that can both learn and evolve.
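In symbols, one common way to write such a model is as a neural ordinary differential equation. The notation below is an assumption for illustration and is not taken from the paper: $\mathbf{x}(t)$ stands for the evolving state (for example, the positions of mesh vertices) and $f_\theta$ for a neural network with learned parameters $\theta$.

```latex
\frac{d\mathbf{x}(t)}{dt} = f_\theta\big(\mathbf{x}(t),\, t\big), \qquad
\mathbf{x}(T) = \mathbf{x}(0) + \int_0^T f_\theta\big(\mathbf{x}(t),\, t\big)\, dt
```

The "learning" lives in $f_\theta$, and the "evolving" lives in the integral, which has to be evaluated numerically step by step.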
Such a model is well-suited for reconstructing the surface of 3D objects, as shapes vary greatly. Traditional methods that compute frame by frame are not only slow but also error-prone.
![]()
(Source: Science)
A neural dynamical system treats shape change as a continuous flow. It starts from a rough template and gradually deforms it until it fits the real object's surface. The method is mathematically rigorous: it ensures that the surface does not self-intersect during deformation, yielding smooth, complete, hole-free 3D meshes. This property is crucial for medical imaging, computer graphics, and AR/VR.
However, neural dynamical systems have a major drawback: they are slow to compute. They require repeated numerical integration, and every step needs its step size adjusted. If the step is too large, the result becomes unstable; if it is too small, the computation takes far too long. The controller therefore has to proceed by constant trial and error.
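The trial-and-error loop described above can be sketched in software. This is a minimal illustration using Euler steps with step doubling for error control; the function name `adaptive_euler`, the tolerance, and the grow/shrink factors are assumptions chosen for the example, not the controller used on the chip.

```python
import math


def adaptive_euler(f, x0, t0, t1, h0=0.1, tol=1e-4):
    """Integrate dx/dt = f(t, x) from t0 to t1 with step-doubling error control.

    Returns the final state plus the number of accepted and rejected steps.
    """
    t, x, h = t0, x0, h0
    accepted, rejected = 0, 0
    while t < t1:
        h = min(h, t1 - t)  # never step past the end point
        # Compare one full Euler step with two half steps.
        full = x + h * f(t, x)
        half = x + 0.5 * h * f(t, x)
        two_half = half + 0.5 * h * f(t + 0.5 * h, half)
        err = abs(two_half - full)
        if err <= tol:
            # Accept the more accurate estimate and advance.
            t, x = t + h, two_half
            accepted += 1
            if err < tol / 4:
                h *= 2.0  # error is comfortably small: try a larger step
        else:
            h *= 0.5  # error too large: retry with a smaller step
            rejected += 1
    return x, accepted, rejected


# Example: dx/dt = -x with x(0) = 1, whose exact solution at t = 1 is exp(-1).
x_end, n_ok, n_retry = adaptive_euler(lambda t, x: -x, 1.0, 0.0, 1.0)
print(x_end, math.exp(-1.0), n_ok, n_retry)
```

Every rejected step costs extra function evaluations, reads, and writes, which is the overhead the article says the chip avoids.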
This trial-and-error process involves a large number of reads, writes, multiplications, and additions. Traditional chips move data back and forth during these operations, wasting time and power. The researchers took a different approach: using phase-change memory to adjust the step size.
Phase-change memory is built on a material whose resistance changes with heat. When it is heated electrically, its crystalline state changes, and its resistance changes with it. The researchers noted that the resistance does not stay fixed afterward; it exhibits a phenomenon called drift, in which the resistance slowly changes over time. Drift was previously considered a defect, but the researchers saw it as an opportunity.
In the study, they controlled the resistance drift to follow a predetermined direction and speed. When the step size needed to increase, they adjusted the resistance to a certain value; when it needed to decrease, they adjusted to another value. This process required no additional computation circuits or data movement. As the resistance drifted, the step size was automatically adjusted.
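The article does not give the device model, so the following is only a rough sketch. It uses the empirical power-law description of resistance drift that is common in the phase-change memory literature, R(t) = R0 * (t / t0) ** nu. The drift exponent `nu`, the reference values, and above all the mapping from resistance to step size in `step_size_from_resistance` are hypothetical, chosen purely to show how a drifting resistance could act as a changing step size without extra arithmetic.

```python
def drifted_resistance(r0, t, t0=1.0, nu=0.05):
    """Empirical power-law drift: R(t) = R0 * (t / t0) ** nu, for t >= t0."""
    if t < t0:
        raise ValueError("the drift model is only defined for t >= t0")
    return r0 * (t / t0) ** nu


def step_size_from_resistance(r, r_ref, h_ref):
    """Hypothetical mapping: step size scales inversely with resistance."""
    return h_ref * r_ref / r


# As the (assumed) resistance drifts upward, the derived step size shrinks.
r0 = 1e5  # ohms, illustrative value
for t in (1.0, 10.0, 100.0, 1000.0):
    r = drifted_resistance(r0, t)
    h = step_size_from_resistance(r, r_ref=r0, h_ref=0.1)
    print(f"t={t:7.1f}s  R={r:10.1f} ohm  h={h:.4f}")
```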
This design compressed the hardware area for step-size search to one-tenth of traditional solutions. On the chip, a phase-change memory array runs the neural network. Multiplications and additions for each step are completed directly in the memory without moving data out. Once the current flows, the result comes out.
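"Once the current flows, the result comes out" refers to the standard principle of analog in-memory computing: each cell's current follows Ohm's law, and currents on a shared column add up by Kirchhoff's current law. The sketch below models that idealized behavior in software. The function name `crossbar_mac` and the array layout are assumptions for illustration, and real arrays also have noise, wire resistance, and converter limits that are ignored here.

```python
def crossbar_mac(conductances, voltages):
    """Idealized crossbar: column current I_j = sum_i G[i][j] * V[i].

    conductances: one list per row, one conductance per column.
    voltages: one input voltage per row.
    """
    n_rows = len(conductances)
    if n_rows == 0 or n_rows != len(voltages):
        raise ValueError("need one voltage per non-empty conductance row")
    n_cols = len(conductances[0])
    return [
        sum(conductances[i][j] * voltages[i] for i in range(n_rows))
        for j in range(n_cols)
    ]


# Two rows and two columns: the weights are stored as conductances,
# the inputs arrive as voltages, and the outputs are read as currents.
print(crossbar_mac([[1, 2], [3, 4]], [1, 1]))  # [4, 6]
```

The whole matrix-vector product happens in one read operation, which is why no weights need to be moved out of the array.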
The same material thus serves two roles: the phase-change memory array runs the neural network, and the drift handles step-size adjustment. The two functions do not interfere with each other, so phase-change memory becomes part of the computation rather than just a place to store data.
In traditional chip designs, step-size adjustment relies on a bunch of digital circuits: counters, comparators, multipliers, and adders. These occupy significant space and require reading, computing, and writing data multiple times per adjustment, resulting in several rounds of back-and-forth.
The drift effect of phase-change memory bypasses all these steps. The resistance changes on its own, with speed and direction controlled, effectively performing step-size adjustment at the physical level.
![]()
Figure: From left to right: Yang Yuchao, Song Zhitang (Source: Provided)
To achieve the required control precision, the researchers made many optimizations in materials and processes. They doped carbon into the phase-change material to refine grains, maintaining stable electrical performance after repeated phase transitions. Tests showed that the chip can withstand 10^10 write-erase cycles, equivalent to continuous operation for several years. When temperature varies from 0°C to 70°C, the resistance distribution shifts uniformly, and different resistance levels maintain clear separation without overlapping. This is crucial because chips encounter various temperatures in real devices, such as hot phones, cold winters, and poor server cooling, so stability is required in all conditions.
Another advantage of phase-change memory is its ability to store multiple resistance levels. An ordinary memory cell stores only two states, 0 and 1. This chip can stably store 16 different resistance values, arranged in a differential structure, with each cell expressing up to ±8 levels. This packs more weight information into the same area, allowing the neural network to run faster.
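A differential structure typically represents a signed weight as the difference between two devices. The sketch below shows one way such an encoding could work with the ±8 levels mentioned above. The quantization rule and the choice to leave one device of each pair at level 0 are assumptions for illustration, not the chip's documented scheme.

```python
MAX_LEVEL = 8  # the article states each differential cell expresses up to +/-8 levels


def encode_weight(w, w_max=1.0):
    """Quantize w in [-w_max, w_max] to a signed level, then split it into a
    (positive device, negative device) pair whose difference is that level."""
    clipped = max(-w_max, min(w_max, w))
    level = round(clipped / w_max * MAX_LEVEL)
    return (level, 0) if level >= 0 else (0, -level)


def decode_weight(pair, w_max=1.0):
    """Recover the quantized weight from a differential pair."""
    pos, neg = pair
    return (pos - neg) / MAX_LEVEL * w_max


print(encode_weight(0.5))     # (4, 0)
print(encode_weight(-0.25))   # (0, 2)
print(decode_weight((0, 2)))  # -0.25
```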
This density and stability rest on the carbon doping described above: across 0°C to 70°C, the resistance distribution shifts only linearly, without significant overlap between levels or out-of-control behavior.
![]()
(Source: Science)
The researchers performed several tests to verify the chip's performance. The most rigorous was 3D reconstruction of the cerebral cortex, a highly time-consuming task in medical imaging. The traditional tool FreeSurfer takes two to three hours per run; even a 16-core server needs two and a half hours. With a standard GPU using the same neural dynamical system algorithm, the fastest time was nearly two seconds.
The new chip took only 426 milliseconds, reported as about 50 times faster than the GPU. The reconstructed cerebral cortex surface had very low error: the average distance errors for gray matter and white matter were only 0.245 mm and 0.376 mm, respectively, with no holes or self-intersections. The result can be used directly for 3D printing to create navigation models for brain surgery.
They also used the chip for more complex 3D manifold mesh generation. Each iteration took only 2.12 milliseconds, 36 times faster than the previously known fastest similar chip, with only 1/24 the power consumption. For a complete surface reconstruction, the chip consumed about one-thousandth of the energy needed to charge a phone. Tasks that once required a server to grind away for half an hour can now be completed in the blink of an eye with negligible power.
The researchers also designed a time-interleaving mechanism, allowing step-size drift to work on different memory rows in rotation. This evenly distributes the workload across rows, greatly extending the array's lifespan. While a single phase-change memory cell has a write-erase endurance of 10^10 cycles, this rotation scheme allows the entire chip to far exceed the limit of a single device. This engineering consideration shows that the researchers were thinking from the start about moving the technology from the lab to real-world environments, not just publishing a paper.
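The rotation idea resembles round-robin wear leveling. The sketch below is a software analogy only: the class name `RowRotator` and the strict one-row-at-a-time order are assumptions, since the article does not describe the actual scheduling logic.

```python
class RowRotator:
    """Round-robin assignment of step-size drift operations to memory rows."""

    def __init__(self, n_rows):
        if n_rows <= 0:
            raise ValueError("n_rows must be positive")
        self.writes = [0] * n_rows  # how many operations each row has absorbed
        self._next = 0

    def next_row(self):
        """Return the row that handles the next operation and record the wear."""
        row = self._next
        self.writes[row] += 1
        self._next = (row + 1) % len(self.writes)
        return row


rotator = RowRotator(4)
print([rotator.next_row() for _ in range(10)])  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
print(rotator.writes)                           # [3, 3, 2, 2]
```

With N rows sharing the work, each row sees roughly 1/N of the operations, so the array as a whole can outlast the endurance of any single cell.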
Compared to traditional neural dynamical system accelerators, this chip also has a significant area advantage. For the same task, the traditional approach requires 0.7 mm² for multiply-accumulate circuits and 0.26 mm² for weight cache, totaling about 1 mm². The researchers' phase-change memory solution stores weights directly in the memory array, completes multiply-accumulate operations in the array, and uses the drift effect for step-size adjustment, resulting in a total computation-related area of only 0.28 mm². With smaller area comes lower power consumption, reduced heat generation, simpler cooling solutions, and better packaging and system integration.
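As a quick check of these figures, the arithmetic works out as follows. The subscripts are labels introduced here for readability; the numbers are the ones quoted above.

```latex
A_{\text{traditional}} = 0.7 + 0.26 = 0.96\ \text{mm}^2 \approx 1\ \text{mm}^2, \qquad
\frac{A_{\text{traditional}}}{A_{\text{PCM}}} = \frac{0.96}{0.28} \approx 3.4
```

In other words, the computation-related area shrinks to a little under one-third of the traditional design.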
The chip is still a prototype, but its significance goes beyond the impressive numbers. In traditional computers, storage and computation are separate; data shuttles back and forth between them, creating a major bottleneck. The researchers' phase-change memory performs computation right where the data is stored, saving both data movement and power.
Previously, neural dynamical systems demanded so much computing power that many tasks were confined to large servers. Now a dedicated chip can handle them, with speeds hundreds of times faster. In the past, many researchers dismissed phase-change memory as unstable because of its drift noise. The researchers turned these defects into features: drift is no longer noise but a step-size search engine; resistance is no longer a fixed value but a tunable computational parameter. This provides a concrete demonstration of in-memory computing.
![]()
Figure: First author Cai Lei (Source: Cai Lei)
The cerebral cortex reconstruction is just the beginning. In the physical world, there are many scenarios requiring high-fidelity surface modeling: 3D reconstruction of coronary vessels, real-time environmental modeling for autonomous driving, digital preservation of cultural relics, and dynamic character deformation in games. Each scenario demands smooth, complete, error-free 3D surfaces computed in extremely short times.
This chip demonstrates just how fast such computation can be and how far in-memory computing can go. The results are published in Science. One key point is that phase-change memory has been tried in digital computing for years but never took off in the consumer market. Using it for analog in-memory computing turns its drift and multi-level resistance into useful tools.
Corresponding authors of the paper also include Researcher Zhu Yixin and Associate Researcher Tao Yaoyu from Peking University. First authors are Postdoctoral Fellow Cai Lei from Peking University (now a Lecturer at Beijing University of Chemical Technology), Researcher Xie Chenchen from Shanghai Institute of Microsystem and Information Technology, and Postdoctoral Fellow Yan Longhao from Peking University.
Further improvements are expected for this chip, such as expanding the array size, optimizing the peripheral circuits, and integrating more tightly with large models. As these arrive, tasks that were previously impractical may become feasible one after another.
Reference: Related paper: https://www.science.org/eprint/WEY75M4YUHGJVGTEX5YC/full?activationRedirect=/doi/full/10.1126/science.aee6277
http://shmmc.kjtj.cas.cn/zj/201505/t20150519_493975.html
https://www.ece.pku.edu.cn/info/1045/2542.htm
https://www.ai.pku.edu.cn/info/1137/2306.htm
https://www.ai.pku.edu.cn/info/1136/1864.htm
https://www.linkedin.com/in/%E6%99%A8%E6%99%A8-%E8%A7%A3-4221a5b4/
https://ic.pku.edu.cn/szdw/bsh/index.htm
Layout: Hu Weiwei
Note: The cover/first image is generated with AI assistance.



