How does it make you feel that entire AI industry is dominated by a single company and their proprietary API? Should we be worried?
About the same as knowing that the GPU industry in general is dominated by a single company: awful. We really need more competition in the space.
>CUDA
A challenger approaches.
https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html#gs.gtbm57
https://github.com/oneapi-src
doa
can it run SD
you can write SD in any API you want, it's not like CUDA is doing something magical that others could never reproduce. but it does have a massive headstart with the amount of existing library code.
I thought Arc was gonna sell like shit and go for crazy discounts, meaning I could get one.
What world are we living in
Most people here hypocritically don't care because they're addicted to porn. The indifference towards this is proof of how crippling the addiction really is.
Just look at this board. People here welcome the monopoly with an open mouth.
not that worried, the industry knows better than to get drawn in by nvidia marketing
AMD dropped the ball so fuckin hard with ROCm, their AI researchers use Nvidia cards.
Fastest AI supercomputer in the world doesn't use this shit LOL
what does it use?
ROCm
>Nvidia announces Eos, "world's fastest AI supercomputer". Includes 4,608 H100 GPUs
>March 2022
And if you go back a few months back, Zucc's AI Research SuperCluster uses Nvidia too.
Why do AMDrones feel the need to lie like this when it comes to GPUs?
Once Aurora and El Capitan are up and running next year, the top 5 fastest supercomputers will not be using any Nvidia hardware.
cope
ROCm is cross-platform and open source, and it will be the biggest growth sector in AI research for at least the next 5 years, so start now to get ahead anon. The AMD 7000 series will accelerate ROCm and HIP adoption. Plenty of resources available.
ROCm: AMD's platform for GPU computing
State of ROCm 5.3 in 2022: 6x Mi210, 1petaflop, in the 2u Supermi...
Hands-On GPU Computing with Python | 7. Working with ROCm and PyO...
GPU Programming Concepts (Part 1)
Introduction to AMD GPU Hardware
But Mummy I don't want to use CUDA - Open source GPU compute...
AMD Instinct(tm) Accelerators and the ROCm(tm) Platform...
Introduction to AMD GPU programming with HIP Webinar - June 7, 20...
https://www.youtube.com/watch?v=3ZXbRJVvgJs
ROCm has been so fucking stagnant for years now even though it's open source
That's one thing I'm extremely baffled about
the way Nvidia is going with high-power cards is going to make more researchers jump ship. Imagine being responsible for burning your AI lab down because you wanted to leave the thing running overnight. ROCm is increasingly being adopted; they release new versions every couple of months. It could be way better and have more devs, but times have been changing steadily.
unless Nvidia adopts ROCm with performance on par with CUDA, ROCm is going nowhere
sure thing bud, the AI market is very young, of course ROCm/HIP is going to keep growing. If you can't see that you clearly aren't any good at AI. Did you notice ROCm is cross-platform? I'll guess you didn't.
tell me how well is ROCm working on your Nvidia card
you're a shill
AI researchers don't run RTX 4090 gaming cards.
The AMD Instinct MI250X has a TBP of 560W.
Nvidia's H100 PCIe has a TBP of under 350W; the SXM version can draw up to 700W if required but can be configured far lower.
Besides, your equipment is supposed to be cooled when you work in a lab.
ROCm is also turbo shit and only works with awfully specific esoteric combinations of both hardware and software
Two of the top 5 fastest supercomputers (1st and 3rd place) use AMD Instinct cards
Nvidia CUDA monopoly is shriveling up pretty quickly
THIS. Academia is rapidly transitioning to ROCm/HIP. The advantages of open source and typically less expensive hardware are numerous in an academic setting.
ayy how fast can they generate coomer images in stable diffusion tho?
A 6800 XT can do 8.5-10 it/s; Windows installs might run a bit slower, maybe 6-7 it/s.
You want a 6000 series or greater because they put more cores into those.
It's worth sorting all this out before the 7000 series goes on sale; compared to Nvidia's 4000 series, AMD will run more efficiently and isn't prone to catching fire like Nvidia's melting-wire hazards.
None of the SD code has been optimized for AMD either, so it could easily improve.
>8.5it/s-10it/s
Doubt. My 6600 does ~2.4 it/s on Linux, and the TFLOPS gain of the 6800 XT over it should be around 130%, so even 5.5 it/s would already be pushing it.
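The sanity check above can be sketched as a back-of-the-envelope calculation. The TFLOPS figures below are approximate peak FP32 numbers for these cards, and the linear-scaling assumption is a rough one (memory bandwidth and software optimization matter too):

```python
# Rough projection of 6800 XT it/s from measured 6600 it/s,
# assuming throughput scales linearly with peak FP32 TFLOPS.
tflops_6600 = 8.93      # RX 6600 peak FP32 TFLOPS (approx.)
tflops_6800xt = 20.74   # RX 6800 XT peak FP32 TFLOPS (approx.)

measured_6600 = 2.4     # it/s observed on Linux

gain = tflops_6800xt / tflops_6600 - 1          # ~1.3, i.e. ~130%
projected_6800xt = measured_6600 * (1 + gain)   # ~5.6 it/s

print(f"gain: {gain:.0%}, projected: {projected_6800xt:.1f} it/s")
```

Which is why claimed 8.5-10 it/s figures look optimistic unless the faster card's install is also much better optimized.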
what distro are you using, i've seen a few with greater than 6it/s no problems.
Ubuntu 18.04.5 kernel 5.15 with ROCm 5.3 dkms and the Docker container for Pytorch/ROCm:latest with torch for rocm5.2 package installed and running Auto's webui. As to why not Arch, I just needed to check if it ran at all so I didn't have the time, getting it to here was already enough headache.
use debian testing
it's up to date like arch but minus the pacman -SyulgbtQwerty autism
What's the state of ROCm/HIP on Radeon 6000 series?
5.3 is the latest ROCm release, from earlier this October
Currently working on Linux just fine, stable diffusion runs pretty well on the 6000 series.
The AMD shills are out in full force.
Nice bullshit chart. See where Google, OpenAI, or Facebook train their LLMs and whatnot. Protip: not on that list, guess why?
It would be nice if there was competition, but AMD hasn't recognized AI as important, and it shows in their almost nonexistent investment in it.
Who cares about ROCm and stuff when their architecture didn't innovate in the AI space the way tensor cores (or even Google's TPUs) did.
Just like with RTX, AMD is not in the GPU innovation space, they are in the cheap knockoff space.
How do you go from
>Fastest AI supercomputer in the world doesn't use this shit LOL
To
>Two of the top 5 fastest supercomputers (1st and 3rd place) use AMD Instinct cards
The former being wrong and the latter not being about AI research.
These aren't AI supercomputers, and if you don't understand the difference I wonder why you're even on BOT arguing on that subject.
Stfu.
https://www.amd.com/en/products/server-accelerators/instinct-mi250
ROCm/HIP syntax was designed to be very similar to CUDA and you can port CUDA code into HIP.
there really is no downside to learning it as the skills are transferable to CUDA programming.
Porting CUDA to HIP
it just comes down to simple string replacement, like sed-swapping equivalent commands; tools are available to automate the process.
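The automated tools AMD ships (hipify-perl / hipify-clang) do essentially this renaming at scale. A toy sketch of the idea in Python; the mapping table here is a small illustrative subset of the real API correspondence, not the full list:

```python
# Toy sketch of CUDA->HIP porting as string replacement.
# Real porting uses AMD's hipify tools; this just illustrates the
# 1:1 renaming of equivalent API calls described above.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

def hipify(source: str) -> str:
    # Replace longest names first so cudaMemcpyHostToDevice isn't
    # clobbered by the shorter cudaMemcpy replacement.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_src = "#include <cuda_runtime.h>\ncudaMalloc(&p, n); cudaMemcpy(p, h, n, cudaMemcpyHostToDevice);"
print(hipify(cuda_src))
```

The real tools are smarter (they parse the source rather than blindly replacing strings), but the end result for most runtime API calls really is this mechanical.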
If this is true, when is Blender performance on AMD going to be on par with Nvidia? I really want to get into 3D. Why is the performance gap so wide?
have you read this
https://www.phoronix.com/review/blender-33-nvidia-amd
puts a 6800 around 3070 Ti performance for Blender
Finally some good fucking food. Hope they keep it up with HIP/ROCm. I wonder how RDNA3 will be this coming Thursday
Don't care, still buying AMD.
>Should we be worried?
A lot, yes. I had to jump through too many hoops to get SD working on an RDNA2 GPU.
>ROCm not available on Windows or WSL
>Linux support is there but abysmal
>the recommended way is to run it through a Docker container
>doesn't support consumer cards
>ended support for cards like the Mi25 after only 4 years
>if you get a datacenter card you can't use the features like MxGPU because they keep the drivers and documentation behind a paywall
So much for openness. Nvidia doesn't have to do anything; AMD is their own greatest enemy, and maybe yours too if you don't have deep pockets and the time to deal with the bullshit.
yes, we need lots of different brands of incompatible competing tech.
We should all buy the complete set of different brand GPU's so we can have the best experience in every game, rather than being gimped if we choose the wrong one.
see
also AMD allows for CUDA code
https://www.phoronix.com/news/LUMI-Preparing-For-AMD-HPC
>...NVIDIA CUDA GPU code to run on Radeon GPUs as well as whether writing new GPU-focused code with OpenMP device offload is worthwhile
https://stackoverflow.com/questions/55262250/use-cuda-without-cuda-enabled-gpu-rocm-or-opencl
https://www.lumi-supercomputer.eu/preparing-codes-for-lumi-converting-cuda-applications-to-hip/
Wait, OpenMP supports ROCm?
Fuck, now I want to start writing C++ again just to try this out
Any resources you'd recommend besides OMP documentation?
>OpenMP documentation
maybe this?
https://docs.amd.com/bundle/OpenMP-Support-Guide-v5.3/page/Features.html
Link 404s
Yes, LLVM has most of the support for OpenMP target offloading; all the vendor compilers are based off of it. If you have ROCm installed you should be able to use it with LLVM. There are various talks I'm aware of, mostly for the science labs that use this stuff. A random Google search returns https://www.openmp.org/wp-content/uploads/2021-10-20-Webinar-OpenMP-Offload-Programming-Introduction.pdf if you care. If you manage to get a working clang install, I'd test it with some OpenMP code to see if it actually executes on your GPU.
#include <assert.h>
#include <omp.h>

// omp_is_initial_device() returns nonzero on the host and 0 on a target
// device, so the assert fires if the region fell back to running on the CPU.
int main() {
  int isDevice;
#pragma omp target map(from : isDevice)
  { isDevice = omp_is_initial_device(); }
  assert(isDevice == 0 && "Did not offload to the device");
}
Tried it, but I can't seem to get the --march param since rocminfo just segfaults, so I can't get it to work
If rocminfo segfaults there's probably something wrong with the configuration or installation, which isn't surprising since getting ROCm set up is a real ordeal. I eventually got mine to work on arch using `paru -S rocm-hip-sdk rocm-opencl-sdk` and just waiting forever for it to compile since I have a beefy machine (please set your makeflags environment variable to use parallel or else this will take like 5 years). After that I just configured my own build of LLVM.
Does your ROCm install work for other applications? You could try a basic HIP example. Also, if ROCm isn't working you don't actually need the full installation to use OpenMP offloading. There was a build script I stumbled across that built the minimal dependencies itself but can't seem to find again.
>Does your ROCm install work for other applications?
I haven't tried using ROCm before, last time I did GPGPU I was writing CUDA stuff at work
Would the SD threads have good info on this? I know they've obsessed a lot over getting it to work on AMD
Yeah it's similar, FWIW I have a 6950XT that I got working for SD / HIP / OpenMP. The only platform I've seen ROCm get used on reliably is Arch using the AUR like I mentioned or Ubuntu using vendor binaries.
I also managed to dig up that script so you could try building it all from source https://pastebin.com/SFfCh5UC, it'll probably take about two hours depending on your hardware. This will only work for OpenMP offloading compared to a real installation of ROCm. If the script works you should be able to use the built clang like `clang -fopenmp --offload-arch=gfx<whatever>` if you know the arch or `clang -fopenmp -fopenmp-targets=amdgcn` to let it try to figure it out.
>The only platform I've seen ROCm get used on reliably is Arch using the AUR like I mentioned or Ubuntu using vendor binaries.
I'm on Ubuntu 22.04, wish I had the luck of everyone else
>I also managed to dig up that script so you could try building it all from source
Will try that, thanks for the link
It should work on 22.04; many who had problems on 22.04 resolved them by using 20.04. Lots of info in previous AMD ROCm threads, search the archives. Not sure if the hsa command might be necessary for you or your purposes. There needs to be a step-by-step guide we build up. If you document your efforts, I will keep track of whatever is relevant and do a future thread. We need to solve this.
Retrying the script with threads capped at 8
Uncapped brought my 5700x with 64 GB RAM to its knees lmao
It's been awhile since I've seen an OOM shootout
Yeah, building LLVM can take a lot of memory. A lot of times it's the linking step. You may need to pass `LLVM_PARALLEL_LINK_JOBS=${N}` as well if that's an issue even after limiting the build threads. Also looking at the script I'd set `-DBUILD_SHARED_LIBS=OFF` personally. Hopefully it works after that's resolved.
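For reference, a minimal sketch of what those knobs look like in an LLVM CMake configure step. The source/build paths, project list, and job counts here are illustrative placeholders, not taken from the script in the thread:

```shell
# Illustrative LLVM configure step; tune job counts to your RAM.
# LLVM_PARALLEL_LINK_JOBS caps the memory-hungry link steps separately
# from compile parallelism; static libs avoid shared-lib clashes.
cmake -S llvm -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS="clang;lld" \
  -DLLVM_ENABLE_RUNTIMES="openmp" \
  -DLLVM_PARALLEL_LINK_JOBS=2 \
  -DBUILD_SHARED_LIBS=OFF
ninja -C build -j8
```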
It compiled without needing those args
Anyway, I tried --offload-arch=gfx1031 (for my 6750 XT) and the test program from above compiled, but the assert fails
Maybe I should try installing arch to see if that works any better
It's possible it's failing because you have an existing (non-functional) ROCm installation, since it won't know which libraries to find. Try `amdgpu-arch` or `llvm-omp-device-info` in the install directory; they should indicate whether it can even detect your GPU with those libraries.
For my ROCm build on Arch it was mostly painless using `paru -S rocm-hip-sdk rocm-opencl-sdk` after setting parallel make flags globally. The installation of pytorch for getting SD working was a huge pain however. First needed to set a random environment variable to build for your GPU, then I had to manually patch a few problems in the source until it finally built. There's the arch4edu repository that has some of these prebuilt so you don't need to take the time, but I couldn't get that to work so I just built from source.
$ amdgpu-arch
$ llvm-omp-device-info --help
Device (0):
This is a generic-elf-64bit device
Device (1):
This is a generic-elf-64bit device
Device (2):
This is a generic-elf-64bit device
Device (3):
This is a generic-elf-64bit device
I did try to install the ROCm packages via apt, I'll try removing those
>--help
same result without --help, not sure why I typed that flag
Yeah if you're not getting anything there then something's definitely wrong. I'd try removing the old ROCm install and making sure the new install is in your library search path. Also probably won't change anything, but one time I thought my ROCm installation broke because I couldn't get any information from the GPU. Turns out I had like 5000 zombie processes on it so the HSA queue always returned an error.
Checked for zombies, none present
Removed the ROCm packages and made sure no rocm modules were still loaded
Also checked the library search path
I'll probably try installing Arch on another machine and tossing my 5600XT in there to see if it works
Thanks for the help
Can't believe AMD still hasn't figured this out, CUDA was a breeze compared to this
It was a pain for me, but not this much of a pain. Shame you couldn't get it working easily. If you do manage to get ROCm working it should have everything you need for HIP. For OpenMP you'll need to build LLVM again, but without the redundant libraries that the script pulls in otherwise they'd clash. For Stable Diffusion you need to install the rocm build of pytorch which gave me a lot of pain. Good luck getting it to work anon, more people working on GPGPU is good.
Also last thing I can think of is if you have two GPUs installed with different architectures it'll die, you'd need to manually tell it to ignore one or the other.
Happy because I fucking hate AMD poorgays. They are like leftists, acting as if they have some kind of moral high ground and annoying everyone with their childish good and evil shit
AYYMD hardware was historically, and still is today, not as well tested and verified as Intel or NVIDIA hardware. That's why their chipsets have so many bugs, such as USB issues, and require constant BIOS updates to work properly.
Never had a BIOS update on my B450 mobo and it works perfectly.
I just love these anecdotes of how amd doesn't work from nvidia/intel users.
>anecdotes
https://www.ghacks.net/2022/10/31/amd-is-investigating-ryzen-7000-performance-issues/
AMD is fucking shit.
I hope ROCm dies in a fire. I don't particularly like Nvidia, but the last thing we need is another competing standard; reproducibility in ML research is fucked enough as it is.
ROCm doesn't really try to compete with CUDA, it's more like an open source copy for AMD hardware. It brings nothing new.
Field is immature.
"AI" is a meme so I don't care.
that's not how neural nets operate
Where does it say "neural network" on the pic?
AI isn't limited to neural networks. The pic describes expert systems, which count as AI software.
t. doesn't have a clue how computation is implemented
It's such a shame OpenCL isn't more popular. It works everywhere, in contrast to CUDA or ROCm.
There were problems with it, so support was removed from most programs. Not sure if the problems were on the AMD/Nvidia side (implementation) or in OpenCL itself.
Maybe AMD should pick up the fucking slack then
They can't. It's proprietary garbage.
The dojo will be king soon
PyTorch and other libraries have multiple backends now; no need to program the GPU directly for AI.
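A minimal sketch of what that looks like from the user's side. Note that PyTorch's ROCm builds expose AMD GPUs through the same `torch.cuda` interface, so the selection code is identical on both vendors; the import guard is just so the snippet degrades to CPU when torch isn't installed:

```python
# Backend-agnostic device selection sketch. On ROCm builds of PyTorch,
# AMD GPUs show up through the torch.cuda API, so the same code runs
# unchanged on Nvidia and AMD hardware.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    torch = None
    device = "cpu"

print(f"running on: {device}")
```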
Humanity loves a monopoly.
AMD ROCm is absolute garbage because it uses LLVM. LLVM is made for CPU code, and it is decently good at generating code for CPUs.
It's another story for GPU code, which LLVM just wasn't made for. The reason Nvidia dominates so much in AI is mainly that their compiler sucks less than AMD's.
Explain why Nvidia made a CUDA LLVM compiler and their own IR based on LLVM IR, then.
Honestly I know almost nothing about nvidia except that their compiler is better.
I do know that the reason AMD sucks so much at compute is because of LLVM and their LLVM-based compiler that generates suboptimal code.
That's pretty much exactly it. LLVM is nice because using it makes your thing compatible with every programming language that it supports. I guess they decided this was much more important than performance.
There's a lot of things present in normal languages that absolutely kill performance on GPUs. Simple ones being function calls and branches.
Those aren't really limitations of LLVM IR, except in the sense that vector semantics need to be reconstructed from the source for some operations. Some of this is why a lot of LLVM-based GPU projects are leveraging MLIR, since it allows the code to maintain more high-level semantics closer to the domain language. MLIR at the end of the day is just a successive transformation language with peephole optimizations, though. Inlining is hardly an issue; the AMDGPU backend is already heavily weighted to inline functions. But inlining everything isn't really helpful all the time if it leads to excessive register spills.
>There's a lot of things present in normal languages that absolutely kill performance on GPUs. Simple ones being function calls and branches
The gist of optimizing branches is the same on CPU and GPU. In relative terms, unpredictable/random branches kill perf on both. So you either make sure they're predictable or get rid of them.
Function calls are similarly relatively expensive on CPU. They're mostly bloat- if you benefit from function calls, you're likely in the local minimum of a code/algorithmic complexity issue. I don't think any compiler is particularly good at deciding when to inline, even after you jump through hoops to make it possible.
>But inlining everything isn't really helpful all the time if it leads to excessive register spills
True. Just don't write kernels that are too big. Once again, same on CPU! I get your point that you can't just shoehorn high-level CPU code onto GPU, but typical CPU software (in both language and technique) is ultra suboptimal in targeting modern CPUs too.
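A tiny illustration of "getting rid of" a branch, shown here in Python for readability: clamping a value with explicit `if`s versus a branchless min/max select. On a GPU, divergent branches serialize lanes within a warp/wavefront, while the select form lets every lane execute the same instruction stream; most compilers lower min/max to single select-style instructions:

```python
# Branchy vs. branchless formulations of the same clamp operation.

def clamp_branchy(x, lo, hi):
    # Two data-dependent branches; divergent across GPU lanes.
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

def clamp_branchless(x, lo, hi):
    # Same result via min/max selects; uniform control flow.
    return max(lo, min(x, hi))

for v in (-5, 3, 42):
    assert clamp_branchy(v, 0, 10) == clamp_branchless(v, 0, 10)
print("clamp formulations agree")
```

The same transformation applies to CPU SIMD code, which is the poster's point: the optimization playbook overlaps more than people assume.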
Nvidia's compiler is based on LLVM as well, it's just much more divergent; their entire device library is written in Nvidia's version of LLVM IR. Nvidia's other compiler is based off of the old PGI compiler, which is a massive piece of shit. The problem with using LLVM IR as a target is that GPU code is more like a vector machine: LLVM IR more or less expects code and threads to be independent, which isn't true for most GPUs. I think it was only recently that Nvidia released an architecture that guarantees threads can make forward progress. This is part of why writing GPU code is so hard. Good luck implementing a mutex if you can't guarantee a thread will ever give it back.
Smoothbrains not wanting property abolished because of personal property is sort of understandable, but intellectual property is an entirely different level of fucked
wrong board, commie scum
Ownership is a natural instinct that is observable in most animals that exist. Lions for example are territorial and maintain ownership over large swaths of land, and they hide kills for themselves and their prides, lest they get stolen by other animals.
Communist dreams only work with deadly force backing them. You come for me and I will fight back.
Can someone tell me how to go about optimizing the Stable Diffusion code for AMD ROCm? Just outline some of the steps. Any help understanding the process would be amazing.
Not sure what you mean by optimizing. HIP is a carbon copy of CUDA, so the only differences that matter will probably be at the library or compiler level. Not sure what more you could add besides tweaking a few compiler flags.