-
This isn't a tech support question, as my system works fine at the moment; it's more of a "why?" question.
So I have the MSI Air Boost Vega 64, and from the get-go it crashed without fail on any gaming drivers released more than a few months after its release.
Sometimes it would crash within a few minutes, sometimes it would take hours, but it would always crash when running a 3D application of some kind. After weeks of trial and error I finally determined that AMD's newer game-ready drivers for Windows just didn't work with my GPU, and I have no idea why. Linux drivers work fine, old drivers work fine, enterprise drivers work fine, hell, even the basic Windows install graphics drivers work fine, and the most recent pro drivers released a few days ago also work fine.
My question is why? What is it about the gaming drivers over the past few years that just doesn't work with my card? I can overclock it and tune it without issue using the pro drivers, MSI Afterburner, etc. I even tried setting the clocks, voltages, and settings to be an exact 1:1 match between the most recent pro drivers and the gaming drivers, and it still crashes on the gaming drivers.
What is the difference between the gaming drivers and the pro drivers that makes this happen?
-
Had a similar issue with 2 different Vega 64 cards. Apparently the way it downclocks the HBM is broken in later drivers. The way I fixed it was to lock the HBM at max frequency (State 3) in Radeon Settings (left click the graph and select "set as min"). This prevents the HBM from downclocking, and that made it stop crashing for me.
I had this same problem. I stream and exclusively use the AMD encoder in OBS. Noticed ZERO crashes there. Then I get a black screen on the desktop when doing nothing intensive.
Setting MinState1 fixes this for me. Unfortunately after every driver upgrade I have to do this. Great post.
Does locking HBM at max frequency have any adverse effects, e.g. on longevity? Heat?
It idles at around 1-2 degrees centigrade more than before, with power usage staying around the same. I'm not sure about longevity but I'm pretty sure it won't have a significant impact, as the voltage stays the same, the temperature is similar, and HBM is also very power efficient.
You can also disable ULPS, which fixed it for my Vega 56.
Other things that help stop the crashing: make sure you're running the CPU/RAM at completely safe clock speeds, because "slightly unstable" doesn't work with Vega. The same goes for overclocking the card itself.
I found that not only did I have to dial back my 3466 RAM to 3400, I also had to slightly reduce my Vega 56 overclock and raise my previously stable undervolt a small amount. Combined with disabling ULPS, everything is stable again.
As for ULPS, I believe AMD screwed up the low power states and doesn't have any failsafe for when low power modes don't work right. So your card works when it is brand new, but after some electromigration, ULPS completely breaks and you get problems. As for the low power state of the HBM, that likely has to do with the lower voltages at idle, so disabling the lowest power state can also help.
Is your Vega 64 daisy-chained to the PSU off one rail, or do you have two separate rails plugged into your GPU?
Nope. I did hear that was a problem back when this first happened, but that's not my issue. Like I said, it's something specific to the gaming drivers that causes the crashes. I am trying to understand what that thing is.
What PSU is it?
I've had my Vega 64 Nitro from Sapphire for a year now, I've used all the drivers released up until today, and I've never had a single crash in any game.
I never used any of the drivers from the launch window or any pro version at all.
These kinds of driver issues always sound like an underlying hardware problem to me, but idk.
Also, since another person pointed out that there are different kinds of issues described as a black screen… I had issues with the screen going black for a couple of seconds at random times after an upgrade to a 240Hz monitor. Completely solved those with a higher quality HDMI cable.
I too had a few black screen issues that were completely fixed by replacing the DisplayPort cable - it felt like it was slowly getting worse, with the screen flickering or staying black more often. It also only seemed to happen when I was playing games, or doing something that used the GPU heavily.
I ended up tracking it down by noticing that I could reproduce the same desync and flicker when I turned on my headphone amp (which has a pretty meaty 'clunk' switch) - I guess that spat out enough electrical interference to disrupt the signal in the cable too? It was right beneath my monitor, so the cables ran near it which I doubt helped.
Replaced the cable with a new one and haven't seen issues since (though I replaced my Vega 64 a year or so ago, so I haven't been using it).
I went mental trying to fix my MSI Vega 56 black screen issues. Like so many others, I changed the PSU, tried undervolting, more voltage, power states, so many drivers, etc. The black screens got more and more frequent over time, and one day it just failed to POST after one. In the end, I think there are deep problems with these cards. Sell it and upgrade if you can. Not worth the frustration.
It isn't, I get that 100%, but I am trying to understand why. Like, why does the gaming driver crash when the pro driver doesn't? What is actually making the difference?
Which black screen are we talking about? A black screen where the monitor loses signal and then comes back, or a black screen where the system crashes, becomes unresponsive, and the fans ramp up after a while? If it's the first, it's a DP/HDMI cable issue; if it's the latter, it's a PSU or RAM issue. My old V56 had those crashes for a while, but after a bit I started getting artifacting in games on top of that, so I just RMA'd it. The going theory was that the Micron HBM was biting the dust; luckily for me, I got a full refund. As for why it would work on older drivers and enterprise drivers: AMD drivers are the best at checking whether your RAM is stable, and any kind of instability will probably show. So I would run a Heaven + memtest loop (both at the same time) to check whether the heat from that Vega is destabilizing your memory, to rule the RAM out. If that's fine, either the PSU or the card is dying. You can probably test for that by underclocking the HBM and seeing if it continues to crash on you.
It's your system RAM, it's unstable. Downclock it slightly. Vega and RDNA 1 were very sensitive to RAM OC. I could get my V64 to black screen by using too aggressive timings and clocks on my RAM. Fixed that and it never crashed again.
I have the exact same GPU, and the most stable driver I found for playing current games was 20.4.2. The older drivers may have higher fps in older games, and the later driver versions bug out a lot with this card. Believe me, I have tried many driver versions. If 20.4.2 doesn't work properly anymore in Windows 10, it's probably the PSU. That was my case recently: I changed the PSU and it's back to near perfect. Remember that our MSI Air Boost Vega 64 can pull around 320W; mine does.
I am actually running the new pro 21.Q3.1 released on 11/19/21 and that is stable for me. All pro/enterprise drivers work for me though.
It's just the gaming line of drivers that causes the crashing. I am well aware of our card's power draw and have an 850 watt PSU for it. I did consider that the gaming drivers pull too much power under load, but I haven't been able to find any evidence of that. The gaming drivers still crash when I disable the higher power states and voltages while giving the card as much power as it wants. When I am running full blast on the pro drivers, pulling as much power as I can, that never seems to crash the card.
My Vega 56 is only stable with the pro drivers too, and I've pretty much done everything I could to try and stabilise it.
Vega 56 here, MSI Air Boost version. I replaced the original cooler because the fan started to fail.
Mounting the new cooler caused shutdowns due to heat.
The second mount worked better, and I've had no issues since.
Any time I want a black screen, I can push the card's power levels and it will black screen. If I don't, it just works.
If a crash happens, something is usually unstable somewhere.
It may not be the card but a conflict with software or hardware, sometimes heat, a faulty cable, unstable RAM, and so on.
I could get a crash because the AMD-set fan curve wasn't good, and the heat buildup caused a crash and black screen. In some cases, leftovers from old drivers that haven't been cleaned out cause issues. When the card's power levels shift, usually when a P-state changes based on load/idle, the voltage may be either too low or too high, causing a heat issue, for example.
Troubleshooting starts with a default, stable computer without any OC.
Good luck.
I had the same problem with a Sapphire Nitro Vega 56 crashing to a black screen. I tried undervolting, underclocking, and changing drivers, and nothing helped. I gave the GPU to my friend to test, and for him it was working great. After that I changed my RAM frequency from 3000MHz to 2933MHz, and since then I've had zero crashes for months and still going.
Can you get your memory temperatures before the crash? I had trouble with 3D applications on every Micron-memory Vega I've used. Once the memory temperature goes above 75C they will eventually crash, even after exiting the 3D programs. Every. Single. One. The only solution seems to be a hard reset of the system.
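If you log sensors to a CSV file while gaming, something like this rough Python sketch could pull out the last few memory temperature readings before the log stops. The column naming (a header containing "memory temperature" or "hbm"), the timestamp being the first column, and the sample data are all assumptions for illustration; adjust them to whatever your logging tool actually writes. The 75C limit is just the figure mentioned above.

```python
import csv
import io

def memory_temps_before_crash(log_text, last_n=5, limit_c=75.0):
    """Return the last `last_n` (timestamp, temp, over_limit) memory readings."""
    reader = csv.reader(io.StringIO(log_text))
    header = [h.strip().lower() for h in next(reader)]
    # Assumption: the HBM temperature column can be recognised by its name.
    col = next((i for i, name in enumerate(header)
                if "memory temperature" in name or "hbm" in name), None)
    if col is None:
        raise ValueError("no memory temperature column found in header")
    readings = []
    for row in reader:
        if col >= len(row):
            continue  # short or blank line
        try:
            temp = float(row[col])
        except ValueError:
            continue  # non-numeric cell, e.g. "-"
        # Assumption: the timestamp is the first column.
        readings.append((row[0].strip(), temp, temp > limit_c))
    return readings[-last_n:]

# Made-up sample data, not a real log.
sample = (
    "Date , GPU Temperature [C] , Memory Temperature [C]\n"
    "10:00:01 , 70.0 , 72.0\n"
    "10:00:02 , 71.0 , 74.0\n"
    "10:00:03 , 72.0 , 76.0\n"
    "10:00:04 , 73.0 , 78.0\n"
)
for stamp, temp, over in memory_temps_before_crash(sample, last_n=3):
    print(stamp, temp, "OVER LIMIT" if over else "ok")
```

If the last readings before the crash are all flagged, that would at least be consistent with the memory heat theory.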
Sounds like a heat issue to me
There is some sort of bug (or maybe an issue) with Vega GPUs. Sometimes the GPU incorrectly reports its junction temp as 511C, the card goes into panic mode (100% fan speed), and it crashes. Sometimes it can happen within 10 mins and sometimes after 6-8 hours.
Leave GPU-Z running in the background and enable the logging feature. Run your favorite 3D app in a loop and wait until it crashes. After the crash, restart your system and check the log file. If you see the 511C junction temp right before the crash, then it means you hit the jackpot.
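Scrolling through hours of log by hand is tedious, so here is a minimal Python sketch that scans the log text for that bogus reading. It assumes the log is comma-separated with a header row, that the timestamp is the first column, and that the junction column has "hot spot" or "junction" in its name; those names and the sample rows are my assumptions, so check them against your actual log file.

```python
import csv
import io

def find_bogus_junction_readings(log_text, bogus_value=511.0):
    """Return (line_number, timestamp, column, value) for readings >= bogus_value."""
    reader = csv.reader(io.StringIO(log_text))
    header = [h.strip() for h in next(reader)]
    # Assumption: junction/hot-spot columns can be recognised by name.
    temp_cols = [i for i, name in enumerate(header)
                 if "hot spot" in name.lower() or "junction" in name.lower()]
    hits = []
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for i in temp_cols:
            if i >= len(row):
                continue  # short or blank line
            try:
                value = float(row[i])
            except ValueError:
                continue  # non-numeric cell
            if value >= bogus_value:
                hits.append((line_number, row[0].strip(), header[i], value))
    return hits

# Made-up sample data, not a real log.
sample = (
    "Date , GPU Clock [MHz] , GPU Temperature [C] , GPU Temperature (Hot Spot) [C] ,\n"
    "2021-11-20 10:00:01 , 1536.0 , 70.0 , 85.0 ,\n"
    "2021-11-20 10:00:02 , 1536.0 , 71.0 , 511.0 ,\n"
)
hits = find_bogus_junction_readings(sample)
for hit in hits:
    print(hit)
```

To use it on a real file, read the file into a string and pass it in. An empty result means no reading reached the threshold in that log.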
I had a problem that sounds similar enough for me to post about it. I have a PowerColor Vega 56, and it would sometimes reboot the PC (grey screen, nothing meaningful in the event logs) at random during gaming (only in specific games, strangely enough). The only solution that worked for me was to manually set all memory states (P0 - P3) to the same MHz and mV. I think I set all 4 to 500 | 800. Might be worth trying this for 24h - 48h.
The MSI Air Boosts have two BIOSes on them. On the board is a tiny little black switch that you can flip to move to the other BIOS.
Flip that switch and see what happens…
Maybe try seeing if there is a BIOS update for your card.
Use GPU-Z to see what version your card has, and if there is a newer one in the link above, try flashing it. I wouldn't recommend trying if you don't have a BIOS switch on the card.
Man, I also have a Vega 64 blower model, and here everything is in order. I recommend you stop using MSI Afterburner; this software is not reliable. If you want to OC or undervolt, do it through the Adrenalin Control Center. It does the job perfectly, it isn't buggy, and it doesn't conflict with the AMD drivers.
Source: https://www.reddit.com/r/Amd/comments/r05abr/help_determining_source_of_vega_64_black_screen/
Upvoting this as it sounds like the exact same thing I went through.