This post explains how to get media keys (play, pause, …) on keyboards and Bluetooth headphones to work with a bare X window manager (as opposed to a full desktop environment), and how to make them control multiple media players, including the web browser (YouTube, bandcamp, myNoise, etc.). This is something that even mainstream operating systems and desktop environments don’t quite get right out of the box.
My use cases:
- listening to music/myNoise while working (near computer) or reading (away from computer)
- listening to podcasts while cooking/cleaning (away from computer, screen locked, wet hands)
- discovering new music on YouTube, bandcamp, soundcloud, …
- listening off-line (that’s why I use mpd and buy music on bandcamp)
I’d love to use the same play/pause key/button in all of these scenarios, and this key/button should control the appropriate application (pause the one that’s playing, play the last paused one, …). It’s annoying, bad UX otherwise.
I don’t want to think about how to pause music when someone needs me in meatspace. Walking back from the kitchen to pause a podcast is just plain silly. Playing YouTube, bandcamp, soundcloud etc. using mpd is possible and it’s what I used to do back when my play/pause buttons were hardwired to mpd, but it’s a hack and likely against the ToS.
There is a standard D-Bus interface for controlling media players on a modern Linux desktop: MPRIS. It seems to be supported by both Chromium and Firefox these days, and there’s a command line tool playerctl as well, so we just need to write a few scripts to wire it all together and everything should just work.
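As a rough illustration of what "wiring it together" means, here is a minimal Python sketch that builds and runs a playerctl command line. The helper names `playerctl_argv` and `send` are hypothetical and chosen for this example; only the `playerctl` binary, its `--player=NAME` option and commands such as `play-pause` and `next` are assumed to exist on the system.

```python
import subprocess
from typing import List, Optional


def playerctl_argv(command: str, player: Optional[str] = None) -> List[str]:
    """Build the argument list for one playerctl command.

    `command` is a playerctl command such as "play-pause", "next" or "status".
    `player` optionally restricts the command to one MPRIS player name.
    """
    argv = ["playerctl"]
    if player is not None:
        argv.append(f"--player={player}")
    argv.append(command)
    return argv


def send(command: str, player: Optional[str] = None) -> int:
    """Run playerctl and return its exit code (requires playerctl to be installed)."""
    return subprocess.run(playerctl_argv(command, player), check=False).returncode
```

A key binding could then call something like `send("play-pause", "mpv")`; choosing which player to pass is the interesting part and is what the rest of the setup is about.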
After some hacking, my setup looks like this (all the icons and some of the arrows are clickable):
The window manager (xmonad) and screen locker (xsecurelock1) bind all the keys (and thus also headphone buttons via uinput and bluetoothd) and call the liskin-media script:
A background service uses playerctl to keep track of the last active player:
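A minimal Python sketch of such a tracker, under stated assumptions: it assumes playerctl is started as `playerctl --all-players --follow --format '{{playerName}} {{status}}' status`, that it prints one `<player> <status>` line per change, and that player names contain no spaces. The names `FOLLOW_CMD`, `update_recent` and `follow_status_lines` are hypothetical; this is an illustration of the idea, not the actual service.

```python
import subprocess
from typing import Iterator, List

# Assumed invocation: print "<player> <status>" whenever any player's status changes.
FOLLOW_CMD = [
    "playerctl", "--all-players", "--follow",
    "--format", "{{playerName}} {{status}}", "status",
]


def update_recent(recent: List[str], line: str) -> List[str]:
    """Return a new most-recent-first list of players after one status line."""
    parts = line.split()
    if len(parts) != 2:
        return recent  # ignore blank or malformed lines
    player, status = parts
    if status not in ("Playing", "Paused"):
        return recent  # only play/pause transitions count as activity
    # Move the player to the front, dropping any earlier occurrence.
    return [player] + [p for p in recent if p != player]


def follow_status_lines() -> Iterator[str]:
    """Yield status lines from a long-running playerctl process."""
    with subprocess.Popen(FOLLOW_CMD, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            yield line.rstrip("\n")
```

A service loop would fold `update_recent` over `follow_status_lines()` and store the result somewhere the key-handling script can read it, for example a small state file.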
The liskin-media script then selects the appropriate media player (the first one that’s playing; the one that’s paused, if there’s only one; or the last one to play/pause) and uses playerctl to send commands to it:
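The selection rule described above can be sketched in a few lines of Python. This is an illustration of the stated logic rather than the script itself: `select_player` is a hypothetical name, `statuses` is assumed to map player names to MPRIS playback statuses ("Playing", "Paused", "Stopped"), and `recent` is assumed to be a most-recent-first list maintained by the background service.

```python
from typing import Dict, List, Optional


def select_player(statuses: Dict[str, str], recent: List[str]) -> Optional[str]:
    """Pick the player that a media key press should control."""
    # 1. The first player that is currently playing.
    playing = [p for p, s in statuses.items() if s == "Playing"]
    if playing:
        return playing[0]
    # 2. The paused player, if there is exactly one.
    paused = [p for p, s in statuses.items() if s == "Paused"]
    if len(paused) == 1:
        return paused[0]
    # 3. Otherwise the most recently active player that still exists.
    for p in recent:
        if p in statuses:
            return p
    return None
```

For example, with `statuses = {"mpv": "Paused", "firefox": "Paused"}` and `recent = ["firefox", "mpv"]`, neither of the first two rules applies, so the key press goes to `firefox`.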
Note that similar logic is also implemented by mpris2controller, which I unfortunately didn’t find until I started writing this post.
The final component is the media players themselves.
Chrome/Chromium 81 seem to work out of the box, including metadata (artist, album, track) in websites that use the Media Session API. Somewhat surprisingly, play/pause works for any HTML audio/video element, so websites that don’t use Media Session (bandcamp, …) can be controlled too. It seems this wasn’t always the case as there are several webextensions that seem to solve this now non-existent problem: Media Session Master, Web Media
Firefox 76 works after enabling a preference in about:config. This only enables play/pause/stop, however. To be able to skip to next/prev and to get metadata (artist, album, track), Media Session API needs to be enabled separately via dom.media.mediasession.enabled (note that I couldn’t get this to work in Firefox 75, so this is hot new experimental stuff and may be unstable). Also, not all websites can be controlled (play/pause): YouTube and bandcamp work, soundcloud and a plain HTML5 example don’t. Firefox’s emerging support for media controls is documented upstream; there are some interesting details about ignoring silence, short clips, and giving up control if paused for more than a minute (a feature that I find undesirable and unfortunately present in Chrome on non-Linux platforms, as noted further).
myNoise can’t be controlled by media keys in either browser as it uses plain Web Audio API, so I’ve made a userscript as a workaround (chromium-only) and will try to help get it fixed upstream.
mpd 0.21.22 itself does not support MPRIS, but the frontend I use — Cantata — does.
Likewise, mpv 0.32.0 needs a plugin; mpv-mpris works well.
vlc 3.0 supports MPRIS out of the box. Reportedly, so does Spotify. (I don’t use either.)
State of the art
Out of curiosity, I wanted to know if this is a solved problem if one uses a less weird operating system or desktop environment. Turns out, not really… :-)
GNOME (popular Linux desktop environment)
Media keys (and presumably also headphone buttons, not tested) appear to work out of the box, including when the desktop is locked. Unfortunately it only works reliably (predictably) when there’s just one media player application. When there’s more than one (e.g. YouTube in Chromium and music in Rhythmbox), only the one that started first (application launch, not necessarily start of playing) is controlled, regardless of whether this one player is stopped/playing/paused, or whether it has any playable media at all.
In practice, this means that a browser that had once visited YouTube blocks other apps from being controlled by media keys. This is further complicated by the fact that gnome-settings-daemon has two different APIs for media keys, MPRIS and the GSD media keys API, and prefers MPRIS players. Therefore, Chromium and Rhythmbox are preferred to Totem (GNOME’s movie player) even when launched later, which means that a user needs to understand all these complicated bits to have any hope of knowing what player will act upon a play/pause button press. Oh, and Totem does in fact support MPRIS: it’s a plugin that may be manually enabled in its preferences.
It is a bit of a mess.
(I tested this on a Fedora 32 live DVD with GNOME 3.36. Recordings of some of those experiments: https://youtu.be/1fN6NMDBFNI, https://youtu.be/FCStseDBwC4)
KDE Plasma 5
KDE is the only environment out of those tested that works really well. (Almost.)
Media keys work out of the box in all media players I tried (Dragon, vlc, mpv + mpv-mpris, Totem + MPRIS plugin), including on the lock screen. When there are multiple players, the last one is controlled, and when it’s closed, it automatically switches to another one. Additionally, there’s an applet in the bottom panel that lets users override this automatic behaviour and force a selected player to be controlled.
When Firefox is first launched, KDE prompts the user to install the Plasma Browser Integration extension which adds MPRIS and even Media Session API support to Firefox, presumably because this extension predates this support in Firefox itself. The implementation is different to the one in recent Firefox versions, so it’s not entirely surprising that soundcloud works as well (as opposed to vanilla Firefox). And it also follows that it doesn’t work when a media file is opened directly; the extension only works in HTML pages.
Unfortunately, Chromium doesn’t work so well. It’s visible in the list of media players in the panel applet, it shows what’s currently playing, but the control buttons are grey and media keys don’t do anything either. This is also the case if there are more players active: whenever Chromium is the active player, media keys do nothing. This is a bug in Chromium’s MPRIS implementation that should be easy to fix. In the meantime, one can install the Plasma Browser Integration extension as a workaround (for HTML audio/video).
(I tested this on a Fedora 32 KDE live DVD with KDE Plasma 5.18.3. Recordings of some of those experiments: https://youtu.be/-vpHDXg5jW8, https://youtu.be/IybSl2WiNYE)
Windows 10
Similarly to GNOME, media keys appear to work fine at first glance, but when multiple/specific apps are involved, minor problems appear.
Windows 10 has its own equivalent of MPRIS called System Media Transport Controls (SMTC), which is supported by Chromium, and therefore by both Chrome and the new Chromium-based Edge. It’s not supported by the (deprecated) Windows Media Player, but that’s probably fine as the modern replacement Movies/Films & TV supports it very well.
As opposed to GNOME, an application not supporting the SMTC API does not mean it doesn’t react to media keys. Windows Media Player does, quite well actually (even on lock screen), but it doesn’t grab the keys so when there’s another app, media keys control both of them. On the other hand, vlc only handles the keys when focused. Finally and not surprisingly, old Edge and Internet Explorer don’t handle them at all.
Handling of multiple apps that all support SMTC is good, but there’s a bug that would make this completely unusable for my podcast use case: it’s not possible to continue playing from the lock screen if it’d been paused for more than a few seconds. This bug does not affect Movies/Films & TV, though.
Were it not for this issue, I’d say it’s perfectly usable, as deprecated players/browsers can easily be avoided and I wouldn’t mind not being able to use vlc for background playback.
(I tested this on a clean Windows 10 Pro version 1909 with no vendor-specific bloatware. Recordings of the experiments are linked from the preceding paragraphs, and for completeness also listed here: https://youtu.be/9DN2tcZGsHU, https://youtu.be/1-m0kECqt38, https://youtu.be/aPSkMTZcy8w, https://youtu.be/FQAFurnLUVU, https://youtu.be/uKRqZ3p76Gw.)
macOS Catalina
I expected this to work almost flawlessly as Apple is known for their focus on UX, but it seems worse than Windows 10, unfortunately. Worse than Windows 10 with the optional upgrade to the new Chromium-based Edge, that is.
My experience as a user of macOS is very limited, and as a developer non-existent, but it seems that the macOS equivalent of MPRIS is MPRemoteCommandCenter in the Media Player framework.
Media keys work in every app I tried (but I haven’t tried any that don’t come pre-installed, like vlc), and they work on lock screen as well, regardless of how long it’s been paused/locked. Unfortunately, they only start controlling the app after I’ve interacted with the play/pause button at least once, so when I open a video and press the play/pause key on the keyboard, instead of pausing, the Music app opens.
When multiple players are open, the last one that I interacted with is controlled, as it should be. When one of them is closed, however, the control isn’t transferred to the other one, unless the application is terminated entirely or I manually interact with the other one. Strangely, it works well when a music-playing tab in Safari is closed.
(I tested this on a clean macOS Catalina 10.15.4 with no additional software installed. Recordings of some of those experiments: https://youtu.be/VN7-eZsIpOE, https://youtu.be/oIo21HRPfhM)
Android 10 (Samsung One UI 2.1)
Had I not been a longtime Android user, I would expect this to work flawlessly as smartphones are the primary means of media consumption for many (most?) people. Turns out there are issues, too. There always are.
Android’s API for media control is MediaSession: “A MediaSession should be created when an app wants to publish media playback information or handle media keys.”
My Android device does not have a dedicated play/pause button, but my Bluetooth headphones do, so that’s what I tested (wired headphones will likely behave the same). Obviously, most apps (including vlc) react to play/pause just fine. Additionally, pressing play in one app pauses any other that is currently playing, which is something that desktop systems don’t do and that isn’t implemented (yet) in my setup either. Also, an incoming/outgoing call pauses any playing media. So far so good.
Interaction between multiple players is a bit weird, though. Like in macOS, after closing one of them, control is not transferred to the other one. Unlike in macOS, quitting the application (force close) doesn’t help either. Like in macOS, closing a browser tab does transfer control to a music player.
What’s worse, when media playing in the browser (Chrome) is paused and the device is locked, it disappears after a while and can’t be continued, similarly to Windows 10. As noted in the notes about Firefox media controls, this might be intentional, but I don’t like this: it forces me to install an app for anything that I might need to pause for longer than a few seconds, and there isn’t always (a good) one.
(I tested this on a not at all clean, but fully updated Samsung Galaxy S10e. This is not vanilla Android 10, but Samsung’s One UI 2.1, so it’s possible other devices will behave better (or worse). Recordings of the experiments are linked from the preceding paragraphs, and for completeness also listed here: https://youtu.be/2vQAbaMpXfM, https://youtu.be/UOXvDx6Dvas.)
(Completely unrelated, but perhaps worth noting: to ensure high-quality playback, Android sends audio at 100% volume to Bluetooth headphones3 and lets them adjust volume themselves. Without this, 16-bit audio at 25% volume effectively becomes 14-bit. pulseaudio doesn’t do this, but liskin-media does.)
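The 16-bit to 14-bit figure follows from simple arithmetic: every halving of a digitally applied volume costs one bit of resolution. A small Python sketch, assuming volume is a linear amplitude factor applied to the samples before they are sent (the function name `effective_bits` is made up for this example):

```python
import math


def effective_bits(bit_depth: int, volume: float) -> float:
    """Approximate remaining resolution after scaling samples by `volume`.

    `volume` is a linear amplitude factor in (0, 1]; each halving loses one bit.
    """
    if not 0.0 < volume <= 1.0:
        raise ValueError("volume must be in (0, 1]")
    return bit_depth + math.log2(volume)


print(effective_bits(16, 0.25))  # 25% volume: 16 + log2(0.25) = 14.0
```

Sending at 100% and attenuating in the headphones keeps the full 16 bits on the wire, which is the motivation given above.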
Conclusion
None of the mainstream environments except KDE supports media keys/buttons well enough to cover my use cases. It seems, therefore, that niche X window managers aren’t at a very big disadvantage — their target demographic is used to tweaking things to their liking, after all.
And there are more, presumably from back when there was no MPRIS support in the browser whatsoever: