ARMv9: Create a more secure architecture and stronger AI computing power
In November 2011, ARM released technical details of the ARMv8 processor architecture, its first architecture to support a 64-bit instruction set. Because licensed ARM processor cores are widely used in electronic products such as mobile phones, many companies adopted ARMv8 as the core technology of their processors. Ten years later, to meet global demand for stronger security, artificial intelligence (AI) and ubiquitous specialized processing, ARM announced the ARMv9 architecture. ARMv9 builds on the success of ARMv8 and is ARM’s first new architecture in a decade.
Simon Segars, CEO of ARM, said: “The launch of the ARMv9 architecture marks our company’s entry into a new era. It is a global, universal platform for secure, AI-driven computing that will carry our ecosystem of more than 1,000 partners into the 2030s. ARMv9 will give rise to a market of 300 billion chips based on the ARM architecture. The ARMv9 roadmap contains many new elements to meet the needs of specialized computing, from the smallest sensor to the largest supercomputer.”
ARMv9 architecture highlights: focus on security and AI
1. Confidential computing architecture to build a solid security fortress
As the global network of devices proliferates, security is becoming ever more important. Symantec detected nearly 19 million attacks on Internet of Things (IoT) devices in Q1 2020, equivalent to more than 100 attacks per minute and 13% higher than at the end of 2019, and losses from cybercrime are expected to reach six trillion U.S. dollars. In the Internet of Things, one of the biggest challenges is clearly the security of the many endpoint devices, of data collection, and of interaction with the physical world.
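The attack rate implied by the Symantec figure can be checked with a little arithmetic. The sketch below assumes that "nearly 19 million" can be rounded to 19 million and that Q1 2020 means January through March 2020; under those assumptions the figure works out to well over 100 attacks per minute.

```python
import calendar

ATTACKS_Q1_2020 = 19_000_000  # "nearly 19 million", rounded up (assumption)

# Days in January, February and March 2020 (2020 is a leap year).
days = sum(calendar.monthrange(2020, month)[1] for month in (1, 2, 3))
minutes = days * 24 * 60

attacks_per_minute = ATTACKS_Q1_2020 / minutes
print(f"{days} days, {minutes} minutes, "
      f"about {round(attacks_per_minute)} attacks per minute")
```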
To protect data worldwide, the ARMv9 architecture roadmap introduces the ARM Confidential Compute Architecture (CCA). Confidential computing performs computation in a hardware-based secure execution environment, shielding portions of code and data from access or modification, even by privileged software.
ARM CCA introduces the concept of dynamically created realms. Realms are available to all applications and run in an environment separate from both the secure and non-secure worlds, keeping data protected. In commercial applications, for example, a realm can protect commercially sensitive data and code in the system whether they are in use, at rest, or in transit. In a recent survey of corporate executives, more than 90% of respondents believed that confidential computing could help reduce the cost of companies’ security investments, freeing them to invest heavily in engineering innovation. A realm is a dynamic secure region for storing data and executing code, isolated from the privileged modes of the operating system or hypervisor.
CCA builds on ARM’s TrustZone technology and acts as a hardware version of a software container, allowing applications to run easily on different systems. However, it requires changes to operating systems (such as the Linaro version of Linux) and hypervisors, so the focus is on the ecosystem. For example, personal banking information can be completely separated from social media applications on a smartphone. With the new security features of ARM CCA, even if the social media application is infected with malware, the infection will not spread to other parts of the device.
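The isolation idea described above can be illustrated with a toy ownership model. This is only a sketch of the concept, not ARM's implementation: the class `RealmManager`, its methods, the page numbers, and the accessor names are all hypothetical, and real hardware enforces the check on every memory access rather than through a method call.

```python
class RealmManager:
    """Toy model: tracks which realm, if any, owns each memory page."""

    def __init__(self):
        self._owner = {}  # page number -> name of the owning realm

    def create_realm(self, name, pages):
        """Dynamically assign currently unowned pages to a new realm."""
        taken = [p for p in pages if p in self._owner]
        if taken:
            raise ValueError(f"pages already owned: {taken}")
        for p in pages:
            self._owner[p] = name

    def destroy_realm(self, name):
        """Release a realm's pages back to ordinary memory."""
        self._owner = {p: o for p, o in self._owner.items() if o != name}

    def can_access(self, accessor, page):
        """Owned pages are visible only to their owner, whatever the
        accessor's privilege level; unowned pages are open to all."""
        owner = self._owner.get(page)
        return owner is None or owner == accessor


manager = RealmManager()
manager.create_realm("banking", pages=[10, 11])
manager.create_realm("social", pages=[20])

print(manager.can_access("banking", 10))     # the realm reads its own page
print(manager.can_access("social", 10))      # another realm is refused
print(manager.can_access("hypervisor", 10))  # privileged software is refused
```

In this model the hypervisor is treated like any other accessor, which is the point of the banking example: a compromised social media application, or even the privileged software underneath it, has no path to the banking realm's pages.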
Segars: “From cloud computing and data centers to supercomputers, realms can provide security in the highest-performance systems. Realms are a new technology, and we expect them to be widely used. We are working very hard on the software side to help people build software on top of the hardware.”
In addition, ARM has jointly developed the Platform Security Architecture (PSA) with its partners as a set of standard threat models and security measures for use by device designers. Combined with tiered certification, it gives purchasers of such devices greater assurance that their assets are safe.
Memory tagging: solving long-standing problems in software
Memory safety problems have also plagued the industry for more than ten years. Finding them before the resulting vulnerabilities are exploited is the most important step toward improving software security worldwide. To this end, ARM worked with Google to develop the Memory Tagging Extension (MTE), which can find spatial and temporal memory safety issues in software. With the extension, software associates a tag with each pointer to memory, and the tag is checked whenever the pointer is used. If an access falls outside the valid range, the tag check fails. In this way, the root cause of a memory safety problem can be located.
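The tag check described above can be modelled in a few lines. The following is a simplified software simulation, not real MTE code: the names `TaggedHeap` and `TagCheckFault` are hypothetical, tags are handed out by a simple counter rather than by hardware, and the allocator never reuses memory. It assumes 4-bit tags applied to 16-byte granules.

```python
GRANULE = 16   # bytes covered by one tag (assumed granule size)
TAG_BITS = 4   # assumed tag width


class TagCheckFault(Exception):
    """Raised when a pointer's tag does not match the memory's tag."""


class TaggedHeap:
    def __init__(self, size):
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)  # one tag per granule
        self.sizes = {}                      # base address -> granule count
        self.next_addr = 0
        self.next_tag = 1

    def _fresh_tag(self):
        # Cycle through 1..15 so neighbouring allocations get different tags.
        tag = self.next_tag
        self.next_tag = tag % (2 ** TAG_BITS - 1) + 1
        return tag

    def _retag(self, addr, granules):
        tag = self._fresh_tag()
        first = addr // GRANULE
        for g in range(first, first + granules):
            self.tags[g] = tag
        return tag

    def malloc(self, nbytes):
        granules = -(-nbytes // GRANULE)  # round up to whole granules
        addr = self.next_addr
        self.next_addr += granules * GRANULE
        self.sizes[addr] = granules
        return (addr, self._retag(addr, granules))  # pointer = (address, tag)

    def free(self, ptr):
        addr, _ = ptr
        # Retagging the memory makes every stale pointer to it mismatch.
        self._retag(addr, self.sizes.pop(addr))

    def load(self, ptr, offset=0):
        addr, tag = ptr
        if self.tags[(addr + offset) // GRANULE] != tag:
            raise TagCheckFault(f"tag mismatch at address {addr + offset}")
        return self.data[addr + offset]


heap = TaggedHeap(256)
a = heap.malloc(16)
b = heap.malloc(16)
heap.load(a, 0)                       # in bounds: tags match

for label, action in [("out of bounds", lambda: heap.load(a, 16)),
                      ("use after free", lambda: (heap.free(b), heap.load(b))[1])]:
    try:
        action()
    except TagCheckFault as fault:
        print(f"{label}: {fault}")
```

The out-of-bounds read is a spatial error (the pointer strays into a neighbouring allocation with a different tag), and the use after free is a temporal error (the memory was retagged when it was released). Two limits of the model are worth noting: an overflow that stays inside the same 16-byte granule is not caught, and with only a handful of tag values a stale pointer can occasionally match by chance.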
2. Scalable Vector Extension (SVE2) greatly improves AI computing power
Remember Fugaku, the Japanese supercomputer that took first place on the TOP500 list in 2020? The Scalable Vector Extension (SVE), introduced in 2016, was first deployed there. ARM has now launched the new version, SVE2, a technology for accelerating machine learning and digital signal processing that helps with workloads such as 5G systems, VR/AR, and machine learning.
Another advantage of SVE and SVE2 is their variable vector length, which ranges from 128 bits to 2048 bits in 128-bit increments, independent of the hardware the code actually runs on. From a pure vector processing and programming perspective, this means software developers only need to compile their code once: if a future CPU has a native 512-bit SIMD execution pipeline, the same code will use the full width of that unit. Likewise, the same code will run on more conservative designs with narrower execution hardware, which is critical for an architecture whose designs span from Internet of Things devices to data center CPUs.
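The "compile once, run at any width" idea can be sketched as a loop that never hard-codes the vector width. The Python model below is only an illustration and does not use real SVE instructions or intrinsics: the function `vla_add` and its parameters are hypothetical, and the per-lane predicate stands in for the hardware mechanism that masks off lanes beyond the end of the data.

```python
def vla_add(a, b, vector_bits, elem_bits=32):
    """Add two equal-length lists using a simulated vector unit.

    The loop body is identical for every vector length; only the number
    of lanes handled per step changes with the hardware width.
    """
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector length must be a multiple of 128 bits "
                         "between 128 and 2048")
    lanes = vector_bits // elem_bits
    out = [0] * len(a)
    steps = 0
    i = 0
    while i < len(a):
        # Predicate: a lane is active only if it still maps to real data.
        predicate = [i + lane < len(a) for lane in range(lanes)]
        for lane, active in enumerate(predicate):
            if active:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
        steps += 1
    return out, steps


a = list(range(10))
b = [1] * 10
for width in (128, 256, 512, 2048):
    result, steps = vla_add(a, b, width)
    print(f"{width:>4}-bit vectors: {steps} step(s), result {result}")
```

Every width produces the same result; wider hardware simply finishes in fewer steps, and the predicate handles the leftover elements without a separate scalar tail loop.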
Reportedly, the challenge for ARM is to bring the SVE used in the Fugaku supercomputer to the A-series chips used for mobile AI, as well as to the R-series real-time controllers and M-series microcontrollers.
Simon Segars said: “We have been working on v9 for a long time, and now we can finally make it public. We pay special attention to the execution of ML and DSP workloads, but we will also focus on the ecosystem and open coding. Standardizing it so that it works on any horizontal platform is important.”
The ARMv9 SVE2 extension adds the ability to compress and decompress code and data inside the CPU core, reducing off-chip data movement and therefore energy consumption. Segars said: “SVE2 is a very important step. It expands the size of the data types we can manipulate, supports more parallelization, and will significantly improve the performance of many applications. In addition, SVE2 enhances a number of DSP and ML processing capabilities, such as scatter-gather DMA (direct memory access). Putting it in the CPU architecture can achieve more cycles and greater DSP processing capability, and thus support more parallelization.”
Segars emphasized the importance of data reuse: “Data reuse is a problem we have been paying attention to for many years. Moving data on and off the chip consumes a lot of power, so over the years we have done a lot to make use of data that is already on the chip. This is why we have increased the data size in SVE2. The more data on the chip, the higher the energy efficiency.” However, SVE2 needs to be adapted for the R-series and M-series implementations, where it will compete more directly with the extensions in the RISC-V architecture.
In terms of computing performance, ARM expects that, as software and hardware are optimized, the next two generations of CPUs based on the ARMv9 architecture will deliver a performance increase of more than 30%, and that this gain will come entirely from the architecture itself rather than from improvements in the manufacturing process.
How to weigh the degree of standardization?
ARM discussed the degree of standardization, emphasizing the balance between standardization and customization. With too much standardization, partners cannot develop suitable specialized solutions. With too little, the ecosystem risks low-value, near-identical solutions, which raise the cost of the software ecosystem without any benefit. To this end, ARM launched the Server Base System Architecture (SBSA) and related certification programs to promote an appropriate degree of standardization.
Over the past ten years, ARM has added many new technologies and strengthened many security functions. ARMv9 is the major update that will shape the ARM architecture over the next ten years, and it will be applied to all of ARM’s CPUs. Today, data collection often starts with ultra-low-power IoT devices such as those based on the ARM M series. Looking ahead, the need to manage large amounts of data means that data processing across the entire network will continue to increase. However, many networks currently serve only as relatively simple channels for data transmission. It is therefore necessary to enhance computing power and standardized systems at the edge of the network, which requires new components and versions.
ARM plans to release an updated version of the ARMv9 architecture every year. Segars said: “In ARMv8, we created additional profiles for the R series and M series. They implement some features in slightly different ways, and I expect the same to be true for v9. The timetable for the R series and M series is not yet public, but over time these will be implemented on our CPUs.”
ARM also hopes to provide a standardized process called SystemReady so that the code can easily run on any v9 processor. This must take into account a large number of different CPU, GPU and NPU cores and peripherals.
Data center standardization, security, and code portability are major concerns for Amazon, a key ARM customer through its Graviton chips.
Segars said: “Neoverse cores based on v9 are expected to be available in the near future. We have seen a lot of activity around the ARM architecture in the data center, and we expect other data center vendors to deploy ARM technology as well. Over time, it will shift from v8 to v9.”
“As we look to computing over the next ten years, no one model fits all. Designs range from wider execution units that process these vectors down to small, energy-efficient microcontrollers, which do not have huge processing power but still have to process data in the most energy-efficient and secure way. So we hope to mix and match CPU, GPU, NPU and any other processing capability in one framework and reuse as much as possible.”
Future ARM CPU roadmap
ARM talked about CPU planning, which is closely related to the technical roadmap of the upcoming v9 design. ARM also talked about their views on the expected performance of the v9 design in the next two years.
ARM continues to regard the CPU as the most versatile computing block going forward. Dedicated accelerators and GPUs will have their place, but they struggle with some important issues, such as programmability, protection, pervasiveness (essentially the ability to run on any device), and a proven ability to work correctly. At present, the computing ecosystem is extremely fragmented: not only do device types differ, but so do device vendors and operating systems.
SVE2 and Matrix Multiply can greatly simplify the software ecosystem and let computing workloads move forward in a more unified way, so that they can run on any device in the future.
Since 1991, 180 billion ARM-based chips have been shipped. ARM expects that, driven by the ARMv9 architecture, shipments will exceed 300 billion units over the next 10 years.