HN

Arm's Cortex X925: Reaching Desktop Performance (chipsandcheese.com)

15h ago by ingve 252 points 151 comments

Incipient 13h ago

Without being a cpu geek, a lot of the branch prediction details go over my head, however generally a good review. I liked the detail of performance on more complex workloads where IPC can get muddy when you need more instructions.

I feel these days however, for any comparison of performance, power envelope needs to be included (I realise this is dependent on the final chip)

adrian_b 12h ago

ARM Cortex-X925 achieves indeed a very good IPC, but it has competitive performance only in general-purpose applications that cannot benefit from using array operations (i.e. the vector instructions and registers). The results shown in the parent article for the integer tests of SPEC CPU2017 are probably representative for Cortex-X925 when running this kind of applications.

While the parent article shows AMD Zen 5 having significantly better results in floating-point SPEC CPU2017, these benchmark results are still misleading, because in properly optimized for AVX-512 applications the difference between Zen 5 and Cortex-X925 would be much greater. I have no idea how SPEC has been compiled by the author of the article, but the floating-point results are not consistent with programs optimized for Zen 5.

One disadvantage of Cortex-X925 is having narrower vector instructions and registers, which requires more instructions for the same task and it is only partially compensated by the fact that Cortex-X925 can execute up to 6 128-bit instructions per clock cycle (vs. up to 4 vector instructions per clock cycle for Intel/AMD, but which are wider, 256-bit for Intel and up to 512-bit for Zen 5). This has been shown in the parent article.

The second disadvantage of Cortex-X925 is that it has an unbalanced microarchitecture for vector operations. For decades most CPUs with good vector performance had an equal throughput for fused multiply-add operations and for loads from the L1 cache memory. This is required to ensure that the execution units are fed all the time with operands in many applications.

However, Cortex-X925 can do at most 4 loads, while it can do 6 FMAs. Because of this lower load throughput Cortex-X925 can reach the maximum FMA throughput only much less frequently than the AMD or Intel CPUs. This is compounded by the fact that achieving better FMA to load ratios requires more storage space in the architectural vector registers, and Cortex-X925 is also disadvantaged for this, by having 4-time smaller vector registers than Zen 5.

my123 11h ago

> While the parent article shows AMD Zen 5 having significantly better results in floating-point SPEC CPU2017, these benchmark results are still misleading, because in properly optimized for AVX-512 applications the difference between Zen 5 and Cortex-X925 would be much greater. I have no idea how SPEC has been compiled by the author of the article, but the floating-point results are not consistent with programs optimized for Zen 5.

The arithmetic intensity of most SPECfp subtests is quite low. You see this wall because it ends up reaching bandwidth limitations long before running out of compute on cores with beefy SIMD.

hajile 3h ago

SIMD workloads on CPU tend to be bursty. If your workload is all SIMD with few other instructions or branches, it's almost certainly going to be faster on a GPU or SME co-processor.

If there's space between the SIMD instructions, then double-pumping or even quad-pumping isn't very expensive (and with 6 SIMD ports, it might even be basically free).

DeathArrow 12h ago

Still, what percentage of software uses AVX512 for its core functionality, so vector performance matters in practice?

galangalalgol 11h ago

Auto vectorizing optimizers have gotten quite good. If you are using integers it often just happens whether you think about it or not. With floats unless you specify fast math you will need to use wide types to let it know you don't care about floating point addition order.

CyberDildonics 8h ago

I don't know where the focus on vector instructions comes from. 6 128-bit instructions per clock is not bad at all. 512 bit wide vector instruction being used are exotic.

What most people want is interactivity and fast web pages which doesn't have much to do with wide vector instructions (except possibly for optimized video decoding).

barrkel 9h ago

In my view, power consumption isn't relevant to a desktop or workstation (and increasingly, desktop machines are workstations since almost everyone uses laptops instead). When I'm plugged into a wall socket, I will take performance over efficiency at every decision point. Power consumption matters to the degree that the resulting heat needs to be dissipated, and if you can't get rid of the heat fast enough, you lose performance.

rbanffy 1h ago

There is a whole universe of good-enough desktop computers that doesn't care that much about performance, but where power consumption is important, because it makes the computer bulky, noisy, and expensive.

I'd love to have a Xeon 6, a big EPYC, or an AmpereOne (or a loaded IBM LinuxOne Express) as my daily driver, but that's just not something I can justify. It'd not be easy to come up with something for all this compute capacity to do. A reasonable GPU is a much better match for most of my workloads, which aren't even about pushing pixels anymore - iGPUs are enough these days - but multiplying matrices with embarrassingly low precision, so it can pretend to understand programming tasks.

dinglo 13h ago

If ARM starts dominating in desktop and laptop spaces with a quite different set of applications, might we start seeing more software bugs around race conditions? Caused by developers writing software with X86 in mind, with its differing constraints on memory ordering.

vardump 12h ago

That's a possibility. Some code still assumes (without realizing!) x86 style ordered loads and stores. This is called a strong memory model, specifically TSO, Total Store Order. If you tell x86 to execute "a=1; b=2;", it will always store value to 'a' first. Of course compilers might reorder stores and loads, but that's another matter.

ARM is free to reorder stores and loads. This is called a weak memory model. So unless it's explicitly told to the compiler, like C++ memory_order::acquire and memory_order::release, you might get invalid behavior. Heisenbugs in the worst case.

IshKebab 1h ago

I think that's less likely than you'd expect because the memory ordering model used by C++ and others essentially requires you to write code that works even without x86's total storage order. If you don't then you can get bugs even on x86, because the compiler will violate the ordering you thought you had in your program, even if the CPU doesn't.

Also most software runs on ARM now and I don't think that has actually happened in practice.

rbanffy 1h ago

> Also most software runs on ARM now and I don't think that has actually happened in practice.

At least in my house, ARM cores outnumber x86 cores by at least four to one. And I'm not even counting the 32-bit ARM cores in embedded devices.

There is a lot of space for memory ordering bugs to manifest in all those devices.

dd_xplore 13h ago

The major issue is these days most software is electron based or a webapp. I miss the days of 98/XP, where you'd find tons of desktop software. A PC actually felt something that had a purpose. Even if you spin up a XP/98(especially 98/2000 VM) now, you'd see the entire OS feels something that you can spend some time on. Nowadays most PCs feel like a random terminal where I open the browser and do some basic work(except for gaming ofcourse). I really hate the UX of win 11 , even 10 isn't much better compared to XP. I really hope we go back to that old era.

rbanffy 58m ago

> Nowadays most PCs feel like a random terminal

It's a fun perception. For the longest time, all the "serious" computers were used through networks and terminals and didn't even come with any ability to connect a monitor or a keyboard (although a serial terminal would work as the system console). I used to joke (usually looking at Unisys Windows-based big servers), if the computer had VGA and PS/2 ports, it wasn't a computer, but a toy. Those Unisys servers weren't toys, but you could run Pinball and Minesweeper directly on them, which kind of said otherwise.

I think we got used to such levels of platform bloat that we don't care if the UI toolkit these days is bigger than the entire operating system that runs 95% of the world's payment transactions.

cmrdporcupine 10h ago

This is actually one reason I feel like developing my systems level stuff on ARM64 instead of x86 (I have a DGX Spark box) is not a bad idea. Building lower level concurrent data structures, etc. it just seems wiser to have to deal with this more immanently.

That said, I've never actually run into one of these issues.

Zardoz84 11h ago

If it is programmed in assembly. This kind of nasty detail should be handled by the compilers.

askl 11h ago

If it's programmed in assembly, it just wont compile for a different architecture.

runeks 13h ago

Wouldn't the compiler take care of producing the correct machine code?

octachron 12h ago

The issue is that the C memory model allows more behaviours than the memory model of x86-64 processors. You can thus write code which is incorrect according to the C language specification but will happen to work on x86-64 processors. Moving to arm64 (with its weaker memory model than x86-64) will then reveal the latent bug in your program.

rbanffy 55m ago

This architecture trick was often used for precisely this - finding bugs in the program that would work in one architecture and fail in another. A very common class of issues like these was about endianness, and PowerPC was very handy because it could boot as both high and low-endian modes (I think I remember different versions of Linux for each mode, but I'm no longer sure).

jasomill 0m ago

Starting with POWER8, the Linux kernel and some of the BSDs support 64-bit PowerPC in both big- and little-endian modes. Older PowerPC chips had more limited support for little-endian, and AFAIK all commercial desktop/server PowerPC OSes (Mac OS, OS/400 / IBM i, AIX, BeOS) are big-endian only.

As you'd expect, Linux distribution support for big- and little-endian varies.

Someone 11h ago

And “happen to work on x86-64 processors” also will depend on the compiler. If you write

  *a = 1;
  *b = 'p';

both the compiler and the CPU can freely pick the order in which those two happen (or even execute them in parallel, or do half of one first, then the other, then the other half of the first, but I think those are hypothetical cases)

x86-64 will never do such a swap, but x86-64 compilers might.

If you write

  *a = 1;
  *b = 2;

, things might be different for the C compiler because a and b can alias. The hardware still is free to change that order, though.

mrweasel 11h ago

OpenBSD famously keeps a lot of esoteric platforms around, because running the same code on multiple architectures reveal a lot of bugs. At least that was one of the arguments previously.

lproven 6h ago

Which is why Windows NT was multiplatform in 1993.

Developed on Intel i860, then MIPS, and only then on x86, alongside Alpha.

spijdar 1h ago

Big endian MIPS, no less! At least initially.

rbanffy 54m ago

I don't think the i860 port lasted very long. IIRC, the performance in context switches was atrocious.

mhh__ 13h ago

The compiler relies on the language and programmer to enforce and follow a memory consistency model

ivolimmen 13h ago

If you go around your OS yes that could be the case but you can already have issues using the application from machine to machine with the same OS having different amounts of RAM and different CPU's. But I am not an expert in these matters.

jordiburgos 13h ago

Only for the hand-written assemply parts of the source code. The rest will be handled by the compilers.

bpye 12h ago

You don't need to be writing assembly. Anything sharing memory between multiple threads could have bugs with ARM's memory model, even if written in C, C++, etc.

silon42 12h ago

Not even close. Except maybe in Rust /s

galangalalgol 11h ago

For rustaceans missing that /s, if you just use Relaxed ordering everywhere and you aren't sure why, but hey tests pass on x86, then yeah on arm it may have a problem. On x86 it effectively is SeqCst even if you specify Relaxed.

pdpi 14h ago

Kind of weird to see an article about high-performance ARM cores without a single reference to Apple or how this hardware compares to M4 or M5 cores.

ezst 13h ago

That would only matter (to me, at least) if those Apple chips were propping up an open platform that suits my needs. As things stand today, procuring an M chip represents a commitment to the Apple software ecosystem, which Apple made abundantly clear doesn't optimize for user needs. Those marginally faster CPU cycles happen on a time scale that anyway can't offset the wasted time fighting MacOS and re-building decades-long muscle memory, so thanks but no thanks.

pdpi 13h ago

Sure. Insofar as Apple Silicon beats these things, "I'll take less powerful hardware if it means I'm not stuck with the Apple ecosystem" is a perfectly reasonable tradeoff to make. Two things, though.

First, I don't like making blind tradeoffs. If what I need (for whatever reason) is a really beefy ARM CPU, I'd like to know what the "Apple-less tax" costs me (if anything!)

Second, the status quo is that Apple Silicon is the undisputed king of ARM CPU performance, so it's the obvious benchmark to compare this thing against. Providing that context is just basic journalistic practice, even if just to say "but it's irrelevant because we can't use the hardware without the software".

rbanffy 43m ago

> Apple Silicon is the undisputed king of ARM CPU performance

The cores, yes, but you can get an AmpereOne with 192 ARM cores (or rent out beefier machines from AWS and Azure). If you need to run macOS, then you are tied to Apple, but if all you want is ARM (for, say, emulated embedded hardware development), you have other options in the ARM ecosystem. I'm actually surprised Ampere maxes out at 192 cores when Intel Xeon 6+ has parts with 288 cores on a single socket (and that can go up to 4 sockets).

I wonder how many cores you'd need to make htop crash.

bluGill 11h ago

Why do you need ARM? There is nothing magic, most CPUs are an internal instruction set with a decoder on top. bad as x86 is, decoding is not the issue. they can make lower power use x86 if they want. They can also make mips or riskv chips that are good.

pdpi 11h ago

There's nothing special about ARM, sure. Hence "for whatever reason". Still, ARM is a known quantity, and the leading alternative to x86 for desktop CPUs. The article is titled "reaching desktop performance".

We know how Apple's hardware performs on native workloads. We know how it performs emulating x86 workloads (and why). Surely "... and this is how this hardware measures up against the other guys trying to achieve the exact same thing" is a relevant comparison? I can't be the only person who reads "reaching desktop performance" and wonders "you mean comparable to the M1, or to the M3 Ultra?"

wamatt 8h ago

>I can't be the only person who reads "reaching desktop performance" and wonders "you mean comparable to the M1, or to the M3 Ultra?"

You're not. IMHO it's a fairly obvious, narrow and uncontroversial observation (and hence why its the top comment). That said, I personally still enjoyed the back and forth as many others one could imagine. There can be value in the counterarguments from multiple other usernames, as this facilitates sharpening reasoning for the conclusion from readers. (even when the original premise stays in tact)

The lack of others agreeing could be the result of many reasons. IMHO, a not insignificant one could be the incentive structure skews heavily towards lurking as HN rightfully disincentives "me too" type replies and not everyone always has something interesting to add

2c not an epistemologist ymmv

GoblinSlayer 8h ago

If you know how your favorite CPUs (and you can have many, even ppc) work in desktop performance units, then you have the numbers to compare. Are you sure you can migrate from Apple?

amluto 6h ago

Sometimes the ISA matters. For example, modern ARM has flexible and lightweight atomics, whereas x86 is almost entirely missing non-totally-ordered RMW operations.

peterfirefly 2h ago

Memory models matter.

jayd16 8h ago

The problem is you can't really compare things apples to apples anyway. You're always comparing different builds and different OSes to get a sense of CPU performance.

__alexs 6h ago

> the status quo is that Apple Silicon is the undisputed king of ARM CPU performance

If your metric is single thread performance yes but on just about anything else Graviton 4 wins.

guerrilla 5h ago

Are M* chips even beating AMD anyway?

jakogut 4h ago

On average according to Geekbench, the M5 compared to the 9950X is ~17% faster in single thread performance and ~30% slower in multithread performance.

Individual benchmarks tell the bigger picture. These two are optimized for different use cases, with Apple heavily leaning towards low latency single thread throughput with low sustained power usage.

https://browser.geekbench.com/v6/cpu/compare/16833358?baseli...

EDIT: The M4 Max compares much more closely https://browser.geekbench.com/v6/cpu/compare/16834801?baseli...

pdpi 2h ago

That M4 Max is in a laptop. The Mac Studio version is a couple percent faster still:

https://browser.geekbench.com/v6/cpu/compare/16839304?baseli...

The M3 Ultra sacrifices a bunch of single-thread performance for not that much of a multithreaded gain:

https://browser.geekbench.com/v6/cpu/compare/16839654?baseli...

guerrilla 4h ago

Alright, thanks. Seems like a tradeoff issue.

amelius 11h ago

Let's say my company makes systems for in-flight entertainment, with content from my company.

I am looking for a CPU.

I don't want to confront my users with "Please enter your Apple ID" or any other unexpected messages that I have no control over.

Is Apple M series an option for me?

buran77 10h ago

This CPU will end up in products that are competing against Apple's in the market. People will look at and choose between two products with X925 or M4/5. It's a very obvious parallel and a big oversight for the article.

For better or worse if you make a (high end) consumer CPU it will be judged against the M-series, just like if you make a high end phone it will be judged against the iPhone.

ghosty141 11h ago

Why should it be?

All he is saying: We currently have products in a similar product category (arm based desktop computers) that are widely used and have known benchmark scores (and general reviews) and it would make sense if I publish a new cpu for the same product category ("Reaching Desktop Performance" implies that) that I'd compare it to the known alternatives.

In the end you can just run Asahi on your macbook, the OS is not that relevant here. A comparison to macbooks running Asahi Linux would be fine.

amelius 10h ago

But why would an article address _their_ specific usecase?

buran77 10h ago

> But why would an article address _their_ specific usecase?

amelius, if anyone had specific requirements, it was you with your "systems for in-flight entertainment".

OP asked a very reasonable question for a very generic comparison to the 800-pound gorilla in the consumer CPU world in general, and ARM CPU world in particular.

If the article can reference AMD's Zen 5 cores and Intel's Lion/Sunny Cove, they could have made at least a brief reference to M-series CPUs. As a reader and potential buyer of any of them, I find it would have been a very useful comparison.

amelius 10h ago

In industry, people want to take computing parts and build products with them.

This is not possible with Apple parts.

That's what my example was about. It was only specific because I wanted to have a concrete example.

buran77 10h ago

> In industry

Talk about specifics, eh? Didn't you just argue against an article addressing "_their_" specific usecase?

In a store people will ask "is this better than an Apple?".

And I'll tell you one more thing, when I was in the industry and taking computing parts to build products with them I did not form an opinion by reading internet reviews. I haven't met anyone who did.

GoblinSlayer 8h ago

Does Apple allow benchmarks on Asahi Linux?

vlovich123 7h ago

Believe it or not Apple has no say about this

rick_dalton 11h ago

The X925 core is used in chips like the gb10 for the nvidia dgx spark. So it is relevant to compare to apple silicon performance imo. The mac studio is pretty much a competitor to it.

flembat 13h ago

When purchasing any ARM based computer a key question for me, is how many of those can I purchase for the cost of a Mac mini, and how many Mac mini can I purchase for the cost of that, and does that have working drivers...

ezst 11h ago

And the answer there may absolutely be "none", which equates to doing away with ARM, which is totally fine. I don't have a horse in the x86 vs ARM race, especially since it's pretty clear that performance per watt stands within a narrow margin across arches on recent nodes.

synergy20 7h ago

totally true. for me it's unless until those apple hardware can run linux first-class, till then it's irrelevant. sad to say this but macos sucks.

truelinux1 6h ago

This echoed my thoughts exactly - Linux only.

tucnak 12h ago

FWIW, Apple Virtualization framework is fantastic, and Rosetta 2 is unmatched on other Arm desktops where QEMU is required. For example, you can get Vivado working on Debian guest, macOS host trivially like that.

ezst 11h ago

https://www.macrumors.com/2025/06/10/apple-to-phase-out-rose...

Orygin 11h ago

They are not phasing it out for virtualization.

p_ing 9h ago

Only reference I can find is:

"Starting with computers using macOS 28, Rosetta functionality will be available only for certain older, unmaintained games that rely on Intel-based frameworks."

https://support.apple.com/en-us/102527

And

"Beyond this timeframe, we will keep a subset of Rosetta functionality aimed at supporting older unmaintained gaming titles, that rely on Intel-based frameworks."

https://developer.apple.com/documentation/apple-silicon/abou...

drzaiusx11 9h ago

Been using Colima to run mixed architecture container stacks in docker compose on my M3 Mac and the machine barely blinks. I get a full day running a dozen containers on a single battery charge.

Colima is backed by qemu, not Rosetta, so if Rosetta disappeared tomorrow I don't think I'd notice. I'm sure it's "better" but when the competition is "good enough" it doesn't really matter.

upcoming-sesame 11h ago

still matters as a benchmark imo

renewiltord 12h ago

Last time I tried, getting Linux working on Apple Silicon actually worked better than on Qualcomm ARM machine (which only support strange Windows).

drzaiusx11 9h ago

Asahi Linux is fantastic these days, but as with most linuxes on laptops the power management / battery life is the worst part. If treating a laptop like a portable desktop is ok for your use case you'd be plenty happy. If you're far away from an outlet for too long however, you'll find it lacking. At least that's my experience. It's possible they eventually figure that out too...

spiderfarmer 13h ago

> represents a commitment to the Apple software ecosystem

I don't see how that's holding you back from using these tools for your work anymore than using a Makita power tool with LXT battery pack.

ezst 11h ago

Pretty simply because I don't want to use MacOS, its terrible window management, quirks and idiosyncrasies. In your comparison, my gripe wouldn't be about the hassle of finding 3rd-party compatible batteries, but about the daily handling of the Makita while knowing the DeWalt to be more ergonomic and better suited to my needs.

BlobberSnobber 10h ago

As someone who uses Linux, macOS and Windows interchangeably, I'm curious to know what you're using.

I learned to live with macOS, but I also like and use Gnome, which many Linux-only people hate. I tried most WMs on Linux, like Hyprland, Sway, i3, but none ever felt worth the config hassle when compared to the sane defaults of Gnome.

lproven 6h ago

> the sane defaults of Gnome.

I have to admit that when I read this, my eyebrows went up so far that my hat moved.

spiderfarmer 8h ago

That’s not what a commitment is though. If I use a Makita because the battery life and resale value is twice that of a DeWalt, I wouldn’t say Makita is asking for a commitment to their ergonomics.

ezst 3h ago

We clearly have different values and priorities, and just to be clear, that's perfectly fine. I haven't considered "battery life" to be a bottleneck for about a decade, which is when all my devices started to be able to last me a whole day of work. Similarly, I only change device when I must, which most often equates to "when they die", so "resale value" doesn't matter to me (and in that regard, Apple takes themselves out of the selection pool due to poor repairability and no upgradeability). My devices are tools, I care that they help me do the task at hand while stepping as little as possible in the way.

atwrk 12h ago

Those are of almost zero use for people wishing to run Linux etc.

Yes, Asahi exists, and props to the developers, but I don't think I'm alone in being unwilling to buy hardware from a manufacturer who obviously is not interested in supporting open operating systems

promiseofbeans 11h ago

I mean… Apple went out of their way to build a GUI OS picker that supports custom names and icons into their boot loader.

So they don’t actively help (or event make it easy by providing clear docs), but they do still do enough to enable really motivated people

amelius 13h ago

Apple does not produce general purpose computing parts.

This is an industry blog, not a consumer oriented blog.

hajile 3h ago

Chips and Cheese covers Apple products in a LOT of their posts.

The real reason is probably because they are supported by patrons and can only get new equipment to review when people donate (either money or sometimes the hardware itself).

If you like what they do (as pretty much the last in-depth hardware reviewers), consider supporting them.

charcircuit 10h ago

M4 and M5 are literally general purpose computing parts. Apple literally owns the most profitable general purpose computing platform with the iPhone.

senko 9h ago

Perhaps this was worded poorly, but the parent is referring to inability to source these processors from Apple and use them in other (non-Apple devices).

As in, they don't sell you the parts, they only sell you the entire product. If you don't want the entire package, the processors alone are irrelevant.

llm_nerd 3h ago

None of us can buy or make an X925 in isolation. We can't get one and stick it in our motherboard. It has literally zero relevance to the desktop space. You can buy a DGX Spark and use it, just like you can buy a Mac Mini.

The tested machine is an nvidia GB10 which nvidia makes and sells as a whole unit and various vendors stick it in different devices to try to differentiate (although in the end they're all basically identical).

And yes, it is extremely weird for it to never mention the Apple chip, which has a little something to do with who they thank for lending them the device. The arbitrary claims for why they ignored the enormous, class-leading ARM processor in the space is not convincing.

I mean, the other claim that this is an "industry blog" and not a "consumer blog" was equally silly. It's basically for curious hobbyists. Zero industry insiders follow this to see about the core in the GB10. It's basically Anandtech.

layer8 8h ago

The iPhone is anything but a general purpose computing platform. Apple actively prevents many purposes.

charcircuit 7h ago

A general purpose platform does not mean that any possible purpose is possible. It means that it is not architected for a specific purpose, but instead is open to multiple.

GoblinSlayer 7h ago

iPhone is an Apple controlled computing platform.

charcircuit 7h ago

That does not stop it from being intended for general purpose use cases as opposed to targeted ones like gaming.

SG- 14h ago

Same, I wish Chips and Cheese would compare some of these cores to Apple Silicon, especially in this case where they're talking about another ARM core.

A few years ago they were writing articles about Apple Silicon.

GeekyBear 5h ago

It does make me miss the deep dives for new core designs from Anandtech.

Running the SPEC benchmark interger and floating piitnt suites takes all day, but it's hard to game a benchmark with that much depth.

It's a shame that nobody has been willing to offer that level of detail.

geerlingguy 7h ago

Chips and Cheese focuses on architecture and chip design, and I think a lot of the tooling is less refined on macOS, so the comparison graphs can't quite get the same depth on Apple's chips. That's just a guess.

But I did some comparisons when I tested the same Dell GB10 hardware late last year: https://www.jeffgeerling.com/blog/2025/dells-version-dgx-spa...

hank808 5h ago

They are talking specifically about ARM cores designed by and licensable from ARM Holdings (the company), not other designs that don't use ARM's designs (like the Apple silicon).

close04 4h ago

They repeatedly compare to Intel and AMD cores though, which are x86. If they’re worth a mention, then so are some of the other ARM consumer desktop chips on the market regardless of who designed them. Apple was one of the closest ARM chips they could have compared to.

Your “specifically ARM cores designed by and licensable from ARM Holdings” argument doesn’t hold any water.

DeathArrow 12h ago

>Kind of weird to see an article about high-performance ARM cores without a single reference to Apple

And Qualcomm.

cubefox 10h ago

Kind of weird that you pick Apple CPU cores when Qualcomm cores would be a far more appropriate comparison.

llm_nerd 11h ago

The core they're talking about was released about two years ago. nvidia stuck it on their grace blackwell (e.g. DGX Spark) as basically a coordinator on the system.

Anyway, here it is in GB10 form-

https://browser.geekbench.com/v6/cpu/14078585

And here is a comparable M5 in a laptop-

https://browser.geekbench.com/macs/macbook-pro-14-inch-2025

M5 has about a 32% per core advantage, though the DGX obviously has a much richer power budget so they tossed in 10 high performance cores and 10 efficiency cores (versus the 4 performance and 6 efficiency in the latter). Given the 10/10 vs 4/6 core layouts I would expect the former to massively trounce the latter on multicore, while it only marginally does.

Samsung used the same X925 core in their Exynos 2500 that they use on a flip phone. Mediatek put it in a couple of their chips as well.

"Reaching desktop" is always such a weird criteria though. It's kind of a meaningless bar.

drzaiusx11 9h ago

Afaict the "desktop" target is meaningless these days. Desktops aren't really a thing anymore in the general sense are they? Only folks I know still hanging on to desktop hardware are gamers and even those I see going by the wayside with external video cards becoming more reliable.

"Daily driver" is probably a better term, but everyone's daily usage patterns will vary. I could do my day job with a VT100 emulator on a phone for example.

ThrowawayR2 5h ago

There's a zillion office workers that have low cost mini PCs from the big OEMs on their desk. After all, all those off-lease mini PCs on eBay that are so beloved by home lab enthusiasts have to come from somewhere.

drzaiusx11 4h ago

For whatever I don't really register those little hockey pucks (mac minis, NUCs, etc) the same way as the desktop tower PCs of old. A me problem for sure, but those mini device things vary _wildly_ in capabilities manufacturer to manufacturer, from full blown intel i9s to little more than headless phones running ChromeOS on an underpowered ARM. Desktops _used_ to be fairly standardized to one CPU arch, same order of magnitude RAM, ran the Windows du jour, etc. Today the landscape isn't so monotonous (and thats a good thing!)

wmf 5h ago

The "desktop" market includes laptops but excludes servers, phones, tablets, etc.

KingOfCoders 11h ago

Perhaps you're not the target audience of the article.

Numerlor 10h ago

Apple doesn't expose the kind of introspection necessary to compare with the data the article is about. Any mention would just be about Apple's chips existing and being better

hrmtst93837 11h ago

You make a valid point; Apple has indeed set a high standard for ARM cores in performance. A comparison with their M4 and M5 cores would provide valuable context for these new developments.

dgacmu 11h ago

Most of your comment history reads like LLM generated trite comments. Are you human?

hrmtst93837 10h ago

Yes, and my optinions are my own.

xarope 13h ago

I can't seem to find any power draw or efficiency figures (e.g. <perf>/watts).

Only found this which talks about performance-per-area (PPA) and performance-per-clock ()I assume cycle) (PPC): https://www.reddit.com/r/hardware/comments/1gvo28c/latest_ar...

wmf 5h ago

We should have N1X vs. X2 vs. M5 laptop battery life reviews in a few months.

phkahler 9h ago

Nor do they say what process it's fabricated with.

voidmain0001 10h ago

Already usurped by Arm C1 Ultra.

https://www.androidauthority.com/arm-c1-cpu-mali-g1-gpu-deep...

adgjlsfhk1 8h ago

The C1 Ultra looks really powerful. 128 kb L1D cache on it's own is a ~10% IPC improvement that should let it pull firmly ahead of the x86 competition which is very stuck at 32kb due to the legacy 4k page size.

joha4270 7h ago

I'm sorry, I'm clearly missing something but why would page size impact L1 cache size?

aseipp 7h ago

When you do a cache lookup, there is a "tag" which you use as an index during lookup. But once you do the lookup, you may need to walk a few entries in the corresponding "bucket" (identified by that tag) to find the matching cache line. The number of entries you walk is the associativity of the cache e.g. 8-way or 12-way associativity means there are 8 or 12 entries in that bucket. The larger the associativity, the larger the cache, but also it worsens latency, as you have to walk through the bucket. These are the two points you can trade off: do you want more total buckets, or do you want each bucket to have more entries?

To do this lookup in the first place, you pull a number of bits from the virtual/physical address you're looking up, which tells you what bucket to start at. The minimum page size determines how many bits you can use from these addresses to refer to unique buckets. If you don't have a lot of bits, then you can't count very high (6 bits = 2^6 = 64 buckets) -- so to increase the size of the cache, you need to instead increase the associativity, which makes latency worse. For L1 cache, you basically never want to make latency worse, so you are practically capped here.

Platforms like Apple Silicon instead set the minimum page size to 16k, so you get more bits to count buckets (8 bits = 256 buckets). Thus you can increase the size of the cache while keeping associativity low; L1 cache on Apple Silicon is something crazy like 192kb, and L2 (for the same reasons) is +16MB. x86 machines and software, for legacy reasons, are very much tied to 4k page size, which puts something of a practical limit on the size of their downstream caches.

Look up "Virtually Indexed, Physically Tagged" (VIPT) caches for more info if you want it.

adgjlsfhk1 7h ago

This the the most cursed part of modern cpu design, but the TLDR is that programs use virtual addresses while CPUs use physical addresses which means that CPU caches need to include the translation from virtual to physical adress. The problem is that for L1 cache, the latency requirement of 3-4 cycles is too strict to first do a TLB lookup and then an L1 cache lookup, so the L1 can only be keyed on the bits of ram which are identical between physical and virtual addresses. With a 4k page size, you only have 6 bits between the size of your cache line (64 bytes) and the size of your page, which means that at an 8 way associative L1D, you only get 64 buckets*64 bytes/bucket=32 kbits of L1 cache. If you want to increase that while keeping the 4k page size, you need to up the associativity, but that has massive power draw and area costs, which is why on x86, L1D on x86 hasn't increased since core 2 duo in 2006.

joha4270 5h ago

Can you not take some of those virtual bits and get more buckets that way? I am sure it will make things more complicated if nothing else by them possibly being mapped to the same physical page, but it doesn't sound like an impossible barrier. Maybe something terrible where a cache line keeps bouncing between different buckets in the rare case that does happen, but as long as you can keep the common case as fast...

Otoh L1 sizes hasn't increased since my first processor, those CPU designers probably know more than I do.

dmitrygr 4h ago

that will break if any page is mapped at two VAs, you'll end up with conflicting cache lines for the same page...

joha4270 1h ago

The L2 already keeps track of what lines are somewhere in L1's for managing coherency.

Divide the cache into "meta-caches" indexed by the virtual bits and treat them as separate from the L2's point of view. Duplicate the data and if somebody writes back invalidate all the other copies. The hardware already exists for doing this on any multicore system. Sure, you will end up duplicating data sometimes and it will actually be slower if you're actually writing to aliased locations. But is this happening often enough to be a problem compared to generally having a bigger cache?

It sounds to me like an engineering tradeoff that might or might not make sense, not a hard limit which at least was what I think was being asserted. But as I also said, L1 sizes hasn't increased in a while and smart people are working on it, so there is probably something I don't know.

dmitrygr 59m ago

this "divide" thing will add latency which you really do not want to add to L1 hits

rslashuser 7h ago

Nice HN explanation! One hopes we will not be living with 4kb pages forever, and perhaps L1 performance will be one more reason.

tliltocatl 2h ago

I'd really hope we do live with 4kb pages forever. Variable page size would make many remapping optimizations (i. e. continuous ring buffers) much harder to do, so we would need more abstraction layers, and more abstraction layers will eat away all the performance gains while also making everything more fragile and harder to understand. Hardware people really love those "performance hacks" that make live a more painful for the upper layers in exchange for a few 0.1%s of speed. You could also probably gain some speed by dropping byte access and saying the minimal addressable unit is now 32 bits. Please don't. If you need larger L1 cache - just increase associativity.

adgjlsfhk1 21m ago

The extra L1 cache from a 64k page is on it's own a ~5-10% perf improvement (and it decreases power use by reducing the number of times you go out to L2.

spijdar 2h ago

Funny, most of what you described sums up the Alpha architecture. 8KB pages + huge pages and, initially, only word-addressable memory, no byte access.

(Of course, it only took a few years for this to be rectified with the byte-word extension, which became required by ~all "real software" that supported Alpha)

It's also one of the only architectures Windows NT supported that didn't have 4KB pages, along with Itanium. I've wondered how (or if?) it handled programs that expect 4KB pages, especially in the x86 translation subsystem.

throwaway85825 8h ago

Why would I care about desktop performance without the PC desktop ecosystem where everything 'just works'? Universal ARM linux distros aren't supported by anything.

guerrilla 5h ago

Why would you not be able to build a PC around it? That's what you do with PowerPC.

Supersaiyan_IV 11h ago

Another good read is about ARM's SVE2 extensions: https://gist.github.com/zingaburga/805669eb891c820bd220418ee...

It has some interesting conclusions, such as that it covers certain AVX512 gaps:

"AVX512 plugs many of the holes that SSE had, whilst SVE2 adds more complex operations (such as histogramming and bit permutation), and even introduces new ‘gaps’ (such as 32/64-bit element only COMPACT, no general vector byte left-shift, non-universal predication etc)."

And also that rusty x86 developers might face skill issues:

"Depending on your application, writing code for SVE2 can bring about new challenges. In particular, tailoring fixed-width problems and swizzling data around vectors may become much more difficult when the length is unknown."

rayiner 8h ago

ARM designs are effectively paper launches. You get these press releases saying the new ARM matches Apple and AMD, but its years before you can buy a product with it. Google Pixels that came out in the fall are still on the X4, which was introduced in 2023. At this rate, Pixel 11 will launch with X925, which is an Apple A17/M3 tier core, when Apple is on the A20: https://wccftech.com/apple-a20-and-a20-pro-all-technological.... Outsourcing the core design creates a major lag in product availability.

ac29 7h ago

> ARM designs are effectively paper launches. You get these press releases saying the new ARM matches Apple and AMD, but its years before you can buy a product with it.

This is an article testing shipping hardware you can buy today.

Symmetry 6h ago

Yeah, the paper launch OP is talking about happened way back in May 2024.

rbanffy 21m ago

> ARM designs are effectively paper launches.

Won't ARM have validation silicon available to their licensees?

aseipp 6h ago

I feel like that was much more true in the past but the X925 was only spec'd 18 months ago(?) and you can buy it today (I'm using one since October). Intel and AMD also give lots of advance notice on new designs well ahead of anything you can buy. ARM is also moving towards providing completely integrated solutions, so customers like Samsung don't have to take only CPU core and fill in the blanks themselves. They'll probably only get better at shipping complete solutions faster.

Honestly, Apple is the strange one because they never discuss CPUs until they are available to buy in a product; they don't need to bother.

my123 48m ago

Google outright has worst in class SoCs on both CPU and GPU unfortunately.

If you want something more perf competitive, pick Dimensity, Exynos, or Snapdragon.

hajile 3h ago

This core was released in the MediaTek 9400 in October 2024 some 16 months ago.

The successor of x925 is C1 Ultra and even that was released 6 months ago in September 2025 with the MediaTek 9500 and GeekerWan even has a phone review they did with that chip last year.

exabrial 4h ago

Hoping someday we can get ARM System76 laptops that meet Apple M* chip performance.

jadbox 4h ago

For most, it doesn't need to 'meet' Apple's performance. It just needs to be competitive to general hardware of around the -the same price point- category. This is the same problematic statement I hear that a ~$1500 PC laptop just isn't as good as a ~$3000 macbook.

megous 4h ago

BTW, does anyone have some pointers to where one can find an oldish in-order Cortex-A core (like A53) in verilog RTL form? I know ARM must give this out to companies that implement ARM based SoCs for eg. purpose of validation on FPGA.

So far I've only found various M cores online. It would be fun to have something to experiment with on a cheapish FPGA like Kintex XC7-K480T, that may have enough resources for some in-order A core, and can be had for $50 or so.

wmf 55m ago

Arm lawyers have the RTL locked down tight. If you find it, it means you are already dead.

adgjlsfhk1 4h ago

You're going to have a much better time finding RiscV cores.

megous 3h ago

Yeah, I don't need help with that one. :)

sylware 12h ago

But with hardware IP locks like x86_64.

Better favor as much as possible RISC-V implementations.

But, I don't know if there are already good modern-desktop-grade RISC-V implementations (in the US, Sifive is moving fast as far as I know)... and the hard part: accessing the latest and greatest silicon process of TMSC, aka ~5GHz.

Those markets are completely saturated, namely at best, it will be very slow unless something big does happen: for instance AMD adapts its best micro-architecture to RISC-V (ISA decoding mostly), etc.

And if valve start to distribute a client with a strong RISC-V game compilation framework...

dmitrygr 4h ago

> Sifive is moving fast as far as I know)

worked with their cores in $pastJob. I'd say their main products are flowery promises and long errata sheets.

sylware 1h ago

Which models? Which nasty issues did you encounter?

DeathArrow 12h ago

This is kind of a solution in search for a problem. RISC-V will grow only if people find some value in it. If it solves their actual problems in ways that other architectures can't.

hylaride 10h ago

Yeah, the primary reason RISC-V exists is political (the desire to have an "open source" CPU architecture). As noble as that may be, it's not enough to get people or companies to use (or even manufacture!) it. It'll either be economical (costs) and/or performance (including efficiency) that drives people.

It took ARM decades to get to where it is, and that involved a long stint in low-margin niche applications like embedded or appliances where x86 was poorly suited due to head and power consumption.

Symmetry 6h ago

That might be true for the desktop, but RISC-V is wonderful from a pedagogical and research standpoint for academic uses and in the embedded world its license and "only pay for what you need" is also quite nice.

cmrdporcupine 10h ago

I don't think that's the primary reason there's momentum there. The reason is to avoid ARM licensing fees and IP usage restrictions.

I think you'll see ever more accelerating RISC-V adoption in China if the United States continues on its "cold war" style mentality about relations with them.

That said we're a long long way from Actually Existing RISC-V being at performance parity with ARM64, let alone x86.

sylware 9h ago

Yep, licensing fee and IP usage restrictions is a massive decision point on some silicon markets.

The other massive point: RISC-V integrates a lot of CPU "we know now" in a very elegant "sweet spot".

And it is not china only, the best implementations are US, and RISC-V is a US/berkley initiative re-centered in switzerland for "neutrality" reasons.

If good large RISC-V implementations do reach TMSC silicon process (5GHz), some markets won't even look at arm or x86 anymore.

And there is the ultimate "standard ISA" point: assembly written code then become very appropriate, hence strong de-coupling from all those, very few, backdoor injecting compilers.

On many of my personal projects, I don't bother anymore: I write RISC-V assembly which I run with a small x86_64 interpreter, that with a very simple pre-processor and assembler, aka SDK toolchain complexity close to 0 compared to the other abominations.

And I think the main drawback is: big mistakes will be made, and you must account for them.

GoblinSlayer 7h ago

Standard ISA being rv64gc? Isn't MIPS 2 easier to emulate? It has less funky encoding.

sylware 1h ago

There are tons of RISC-V SOCs and mini-boards. Ez and inexpensive native port...

ddtaylor 14h ago

Can't zoom any of the content on mobile so most of the charts are unreadable.

sfdlkj3jk342a 12h ago

Zoom works fine with Firefox on Android.

GaggiX 13h ago

Browsers usually have an accessibility option to force the ability to zoom on all websites.

ddtaylor 1h ago

This website has those features disabled in Chrome or Brave. Apparently the Zoom option will only appear for "sites that support this feature". This is because they set this header in the meta tags:

    user-scalable=0