于是我就在公司的 research 邮件列表发了封邮件讲述了我的设备连接方式和需求,结果 Takashi Iwai(内核音频子系统的维护者之一)回复我说确实没有什么办法,音频驱动只是按照显卡给的顺序分配编号,所以大概率是随机的。特别是我还发现这玩意好像也不一定按照显示器输出的顺序来排号,于是 Plan A 是彻底行不通了。那我还有 Plan B 和 Plan C。
当然有硬件的解法就有软件的解法,PipeWire 和 JACK 一样可以进行基于图的连接,那我只要搞一个虚拟的输出设备然后把两个 HDMI 设备跟它连一起不就行了?Arch Wiki 上恰好有一段 同时向一块声卡上的不同端口输出音频 的文档,我本来以为照做即可,但发现还是不对,并没有出现我想象中的一个新音频设备。不过后来我仔细研究,搞懂了里面各种术语,才知道是怎么回事。
于是在购买这款声卡七八年之后我终于在 Linux 下面把它按我想要的用法划分了通道,同时发现 PipeWire 对复杂音频设备的处理确实比 PulseAudio 更加灵活,而和同样基于图和连接的 JACK 相比,又能同时控制不同的声卡,对于我这种设备复杂需求却不复杂的用户而言显然更加方便。
English Version
I might be the only one who owns a complex audio setup among my friends. TL;DR: To share the only pair of speakers between PS4, Switch and computer I connect it to my monitor instead of internal sound card of my computer, so all devices can output audio to speakers via HDMI/DP. It's fine until I bought another monitor, it's hard to find which monitor is the one with speakers, maybe it's HDMI 1 this week and become HDMI 2 next week, so I always need to test before playing audio. I'm too angry to accept this recently, so I try to fix it by hand.
At first I guess PipeWire just randomly sorts audio ports, so it's easy to fix it, what I need to do is finding ID for each port and disabling the HDMI port without speakers. But soon I see PipeWire just sorts ports via ALSA's device number, so if I disable a port, the port might be the other monitor on next boot. Is there no way to let ALSA do a fixed mapping for HDMI audio devices? We all know monitors report EDID to system, and I read ALSA's document, it contains how to handle sequence of different sound cards via udev, but no way to handle ports on the same sound card like HDMI ports. I even find document of GPU audio from NVIDIA, it says each port has a ELD file, which contains monitor info, but if you try to read it with cat /proc/asound/cardX/eld*, you'll find it only contains model, not serial number, and I have two monitors of the same model in order to save time on dual-monitor scale, so they looks the same. But if your monitors/TVs are of different models, it is easier, ALSA will read model in ELD and you can access it via node.nick property of a PipeWire device, you can just read it, or write some WirePlumber rules to rename properties that your desktop environment uses, so you get a fixed name. But I need more help.
Then I send a Email to our company's research mailing list of my setup and demand, and Takashi Iwai (maintainer of kernel's audio subsystem) tell me there is no better way, audio driver just assign number when GPU driver notifies a new port, so it's dynamic. And I also find GPU driver may not emit ports as display probing sequence, so Plan A fails, but I have Plan B and Plan C.
Another colleague suggests me to buy a mixer hardware so I can connect two monitors into one pair of speakers, and he even draws a circuit diagram and says you could make one by yourself like this. I also considered this, it allows PC and game consoles play audio at the same time, but I have to manually sync volume of two audio devices on my PC. I don't need to play audio at the same time but I am lazy, so I decide to try this last.
If there is a hardware solution, there should be a software solution. PipeWire supports graph-based connection like JACK, then I could just create a virtual output device, and wire two HDMI devices to it. There is a section called Simultaneous output to multiple sinks on the same sound card on Arch Wiki, I thought I just need to follow it, but I was wrong, there is no new audio device. The I read more documents to understand the term and totally understand it.
First I find that section is only about how to create a "profile that shows both two mappings", but what is mapping and what is profile? Mapping is like one kine of combination of input/output on a sound card, and profile controls which kind of combination you could use. For example, if you have a sound device which has 2 input channels and 4 output channels, it could be a stereo output, or surround 4.0 output, or stereo input + surround 4.0 output, those are different profiles. And why you need to manually create a profile to simultaneously output to two sinks? Because by default ALSA does auto-profile which creates a profile for each mapping, and by default one mapping is for one HDMI port, so if you launch pavucontrol or Helvum, you'll find you can only see 1 of 2 HDMI devices if you don't switch profile, so you cannot wire them both. But you may also ask why GNOME Shell shows both of 2 HDMI sinks? Because libgnome-volume-control iterates sound cards first, and then ports on a sound card, not directly iterate ports (which could be effected by profile), and it will switch profile when you choose ports.
So the first step to do is create a new profile sets, I use /usr/share/alsa-card-profile/mixer/profile-sets/hdmi-multiple.conf:
I just copy mapping from default.conf, and the profile just contains those two mappings, and then write a WirePlumber rule to use this profile for GPU sound card. I write the rule into /etc/wireplumber/main.lua.d/51-hdmi-multiple.lua:
And then run systemctl --user restart wireplumber, you should see both 2 HDMI sinks in Helvum now.
Then let's do steps which Arch Wiki does not contain, how to output audio to 2 sinks? The easiest way is manually wire output program to both 2 sinks, but that's not persistent, and you cannot control volume in desktop environment. After reading PipeWire's document, I find I could solve this via virtual devices, there is a module called combine-stream which could create such a combination device, so I just follow combine-stream's document, write following content into /etc/pipewire/pipewire.conf.d/10-hdmi-combined-sink.conf:
It's fairly easy to understand, just create a combination device, all audio streams point to this device will be send to all HDMI sinks on a GPU sound card, and run systemctl --user restart pipewire wireplumber you should be able to choose it as output sink and control its volume. No matter speakers are connected to which monitor, it should work.
Then I find I could solve the channel problem of my USB sound card. I still uses Scarlett 2i4 bought when I was in high school, it has 2 input channels and 4 output channels, and auto-profile will set it to a surround 4.0 output and stereo input, but those 4 output channels is made of a stereo headphone output and a stereo speaker output, the 2 input channels typically are used as mono microphone and mono instructment. I used to set mono input in different software to fix my microphone. But now I find there is a example of another sound card in PipeWire's document (Split speakers/headphones of UMC404HD 和 Split speakers/headphones of UMC404HD), but mostly they are the same, so I also tweak my sound card.
The same thing is to replace default profile in order to split each channels, but this time manually creating profile is not needed, PipeWire provides a pro-audio profile for all audio devices, it will expose all channels without assuming their usage (obviously, your desktop environment supports this badly), and then we could do what we need, so just create a rule to use pro-audio by default in /etc/wireplumber/main.lua.d/51-scarlett-2i4.lua:
Then run systemctl --user restart wireplumber, and launch Helvum, the sound card now should has AUX0~3 instead of LR RR, and then create virtual devices that map to different channels.
For output channels, I create two devices for headphones and speakers, I typically only uses headphones. Which differs from the example is Scarlett 2i4 uses AUX0/1 for headphones instead of AUX2/3, and that's the reason why is works in surround 4.0 output profile. Anyway, just write those configuration into /etc/pipewire/pipewire.conf.d/10-scarlett-2i4-sinks.conf:
Then run systemctl --user restart pipewire wireplumber there should be 2 sinks called Scarlett 2i4 Headphones and Scarlett 2i4 Speakers. For input channels, I also map them into 2 mono virtual input devices, write configuration into /etc/pipewire/pipewire.conf.d/10-scarlett-2i4-sources.conf:
Every thing should be done here, but just in case someone really uses this sound card for surround 4.0 output or stereo input, combine-stream also could be used to combine those virtual devices, it could be done via writing those contents into /etc/pipewire/pipewire.conf.d/10-scarlett-2i4-combined.conf:
Theoretically creating new virtual devices wired to physical channels should also work, but my graph messed up after I tried it. Note that I swapped channels in surround 4.0 output, I use speakers sink for front and headphones sink for rear, which might leads into a better result, but it's opposite to what auto-profile generates.
So after owning this sound card for 7~8 years I finally tweaked it's channels as my will, and I find PipeWire is more flexible than PulseAudio on handling complex sound devices, and when compared with JACK which also uses graph and wire, PipeWire can control different sound cards, which is more convenient for users like me who have complex setup but simple demand.