Has anyone managed to do this on a personal account (not an org)? That is, to let a NixOS machine (ideally using containers) act as a self-hosted runner for many personal repositories.
For a single repository, I added a runner with config like this:
{ pkgs, lib, config, ... }:
{
  # TODO: Run inside container
  services.github-runners = {
    emanote-runner = {
      enable = true;
      name = "emanote-runner";
      # TODO: use sops-nix
      tokenFile = "/home/srid/runner.token";
      url = "https://github.com/srid/emanote";
      extraPackages = [ pkgs.cachix pkgs.nixci ];
      extraLabels = [ "nixos" ];
    };
  };
  # Trust each runner's user so it can talk to the Nix daemon.
  nix.settings.trusted-users =
    lib.mapAttrsToList
      (_: runner: runner.user)
      config.services.github-runners;
}
However, it appears you can't share a runner with other personal repos.
Apparently it is possible to share a runner across an organization's repositories. I'm not sure if this applies to user repos, though.
Looks like you can only share across an org, but not across personal repos. Sad.
So I decided to write a small module to help with that,
https://github.com/srid/nixos-config/pull/40
You still have to manually create the runner in Settings for each repository, and then copy-paste the token into the config for immediate deployment (if deployment isn't immediate, the token gets invalidated).
Runners in containers: https://github.com/srid/nixos-config/pull/41
Yeah, so on my nrdxp/nrdos repo I set up 16 runners to avoid the one-job-at-a-time bottleneck. Each one is in a nixos-container for a layer of isolation, but has the Nix store and daemon socket mounted so that the host daemon can manage and coordinate all the jobs, respecting its configured limits.
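A minimal sketch of one such containerised runner is below; the token path, repo URL, and exact bind mounts here are assumptions, not the actual config:

{ ... }:
{
  containers.runner-1 = {
    autoStart = true;
    # Share the host's store and daemon socket so the host daemon
    # coordinates all builds within its configured job/core limits.
    bindMounts = {
      "/nix/store" = { hostPath = "/nix/store"; isReadOnly = true; };
      "/nix/var/nix/daemon-socket" = { hostPath = "/nix/var/nix/daemon-socket"; isReadOnly = false; };
      # Registration token generated from the repo's Settings page (path assumed).
      "/run/runner.token" = { hostPath = "/var/lib/secrets/runner.token"; isReadOnly = true; };
    };
    config = { pkgs, ... }: {
      # The container does not run its own Nix daemon; builds go to the host.
      nix.enable = false;
      services.github-runners.runner-1 = {
        enable = true;
        url = "https://github.com/nrdxp/nrdos";
        tokenFile = "/run/runner.token";
      };
      system.stateVersion = "24.05";
    };
  };
}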
This is coupled with divnix/std-action, which I designed to do one eval up front and distribute the derivations to multiple build matrices after, which worked great since they are all on the same disk. One of the biggest bottlenecks in the past was the network cost of transferring the evaled derivations around.
You can see a sample run here:
https://github.com/nrdxp/nrdos/actions/runs/7719678719
Discovery is the only bottleneck, but once that's done all the builds can kick off in parallel each in its own runner.
Oh the discovery->matrix part is interesting. Is std-action really required here? I wonder how I can do the same from scratch.
It was originally designed to consume Standard's API, but the core idea is simple in practice and should be straightforward to port to the regular flake API if desired: nix eval once, producing a single JSON file containing a list of all the Nix derivations you plan to build. I came to this because Nix's eval cache is subpar at best, and because I noticed that multiple runners were evaling the same things over and over.
The real trouble I came upon is that the cost of evaluation in that context (of multiple runners building from a shared flake) is unknowable, especially since Nix severely lacks any real profiling tooling (for the language itself, at least). Better, then, to consolidate all evaluation up front, where the single-threaded Nix evaluator can share any state between derivations in a single pass, and where the cost at least becomes linear and predictable.
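For illustration, here is a rough sketch of the "single eval up front" idea using the plain flake API; the file name and the selection of outputs are assumptions, and this is not how std-action is actually implemented:

# ci-matrix.nix: evaluated once by a discovery job (e.g. via nix eval);
# emits a JSON-friendly list of { name, drvPath } entries that CI can
# fan out to a build matrix, one runner per derivation.
{ flake ? builtins.getFlake (toString ./.)
, system ? builtins.currentSystem
}:
let
  outputs = (flake.checks.${system} or { }) // (flake.packages.${system} or { });
in
builtins.attrValues
  (builtins.mapAttrs (name: drv: { inherit name; drvPath = drv.drvPath; }) outputs)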
I've observed that running nix build separately for each output (whether or not the outputs are specified in the same CLI invocation) slows down evaluation by O(n):
https://github.com/srid/devour-flake?tab=readme-ov-file#why
Which is how I came up with https://github.com/srid/nixci, and I use that in CI. But it doesn't have the nice per-output build status.
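The underlying trick (roughly what devour-flake does) is to avoid that O(n) cost by making one derivation depend on every flake output, so a single build performs a single evaluation. A hedged sketch, not the actual implementation:

# all-outputs.nix: a single derivation that depends on every package and
# check of the flake, so one `nix-build all-outputs.nix` evaluates once
# and realises all outputs.
{ flake ? builtins.getFlake (toString ./.)
, system ? builtins.currentSystem
, pkgs ? import <nixpkgs> { inherit system; }
}:
let
  drvs =
    builtins.attrValues (flake.packages.${system} or { })
    ++ builtins.attrValues (flake.checks.${system} or { });
in
pkgs.runCommand "all-outputs" { } ''
  # Interpolating the derivations makes them build-time dependencies.
  printf '%s\n' ${toString drvs} > $out
''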
Tim DeHerrera said:
Yeah, so on my nrdxp/nrdos repo I setup 16 runners to avoid the one job at a time bottleneck. [..]
Was this fully automated? Or, like me, did you have to manually create the token for each of those runners?
Oh wait, you are using the same token for all runners?
Srid said:
Oh wait, you are using the same token for all runners?
Assuming this token was created from the nrdxp/nrdos repository, your 16 runners can run only jobs from that repo, and not from elsewhere, such as nrdxp/foo.
For now, yes, they are only set up for that repo. I can set up more for other repos if/when needed. They have almost no overhead unless/until they start building anyway.
Actually, now that I think about it, I wonder if I even need to mount the /nix/store at all inside these containers. Since the host daemon will be the one to do the actual building it might not be necessary. I already have nix.enable = false
in their configs. I was wondering if I should actually make these VMs instead of containers for added security isolation.
Only the token needs to be bind mounted. I didn't have to mount /nix/store
Do you pass through the socket? In my case, for the sake of avoiding repeated work, I want the builds to all work from the same store, since they share dependencies.
But I think bind mounting the host daemon socket is enough to accomplish that.
Nope, I bind mount only the token.
It seems to automatically use the host nix store,
[root@github-runner-emanote:~]# mount| grep store
/dev/md/nixos on /nix/store type ext4 (ro,relatime,stripe=32)
I still want them all to speak to the same daemon, so that the set limits on jobs and cores are globally respected between them
I have two runners, and I see this:
❯ ps -ef | grep nix-daemon
root 1459 1 0 Jan29 ? 00:00:00 nix-daemon --daemon
root 132628 1459 0 Jan30 ? 00:00:00 nix-daemon 132605
root 132971 1459 0 Jan30 ? 00:00:00 nix-daemon 132948
Are these child daemons spawned for each container? And if so, do they inherit jobs/cores limits automatically?
The daemon does spawn child procs for each build, but if the host's socket is not available then they could be independent instances.
If /nix/store is auto-mounted though, perhaps the socket is as well. Not sure (not at my desk to check ATM).
Oh yea,
[root@github-runner-emanote:~]# mount| grep socket
/dev/md/nixos on /nix/var/nix/daemon-socket type ext4 (ro,relatime,stripe=32)
It mounts quite a few things,
[root@github-runner-emanote:~]# mount| grep /dev/md/nixos
/dev/md/nixos on / type ext4 (rw,relatime,stripe=32)
/dev/md/nixos on /run/host/os-release type ext4 (ro,nosuid,nodev,noexec,relatime,stripe=32)
/dev/md/nixos on /nix/store type ext4 (ro,relatime,stripe=32)
/dev/md/nixos on /nix/var/nix/daemon-socket type ext4 (ro,relatime,stripe=32)
/dev/md/nixos on /nix/var/nix/db type ext4 (ro,relatime,stripe=32)
/dev/md/nixos on /nix/var/nix/gcroots type ext4 (rw,relatime,stripe=32)
/dev/md/nixos on /nix/var/nix/profiles type ext4 (rw,relatime,stripe=32)
/dev/md/nixos on /etc/localtime type ext4 (ro,nosuid,nodev,relatime,stripe=32)
error (ignored): error: reached end of FramedSource
[..]
error: opening file '/nix/store/i8jjpg7im5jgr6dvr9ikylpa1szx1kpi-treefmt-check.lock': Permission denied
https://github.com/srid/haskell-flake/actions/runs/7885297185/job/21516934365
And this is happening only in GitHub runners inside containers. I wonder if it has to do with accessing the mounted Nix store from the container.
Have you seen this? @Tim DeHerrera
I had to run the following to fix this:
nix-store --verify --repair --check-contents
But the problem reappears when doing the CI build again.
hmm, haven't seen that one yet, have you tried making the daemon socket writable (mine is)?
Nope, but I'll try that if this happens again.
TIL that you can just use a single personal access token (with 'repo' scope) for all runners across one's personal repositories.
Now supports organizations:
services.easy-github-runners = {
  "srid/emanote" = { };
  "srid/haskell-flake" = { };
  "srid/nixos-config" = { };
  "srid/ema" = { };
  "EmaApps/orgself".owner = "srid";
};
https://github.com/srid/nixos-config/blob/master/nixos/easy-github-runners.nix
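Roughly, the idea is to expand each "owner/repo" attribute into a services.github-runners entry, all sharing one PAT. A simplified sketch of that idea follows; the token path is illustrative and the handling of the owner override is elided, see the linked file for the real module:

{ lib, config, ... }:
{
  options.services.easy-github-runners = lib.mkOption {
    default = { };
    type = lib.types.attrsOf (lib.types.submodule {
      options.owner = lib.mkOption {
        # GitHub user whose token registers runners for org-owned repos.
        type = lib.types.nullOr lib.types.str;
        default = null;
      };
    });
  };

  config.services.github-runners = lib.mapAttrs'
    (repo: _:
      # "srid/emanote" becomes the systemd-friendly runner name "srid-emanote".
      lib.nameValuePair (builtins.replaceStrings [ "/" ] [ "-" ] repo) {
        enable = true;
        url = "https://github.com/${repo}";
        # A single personal access token (repo scope) shared by all runners.
        tokenFile = "/home/srid/github-pat.token";
      })
    config.services.easy-github-runners;
}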
Just a heads up, I tried updating my nixos-unstable pin, and somebody has modified the GitHub runners module in a way that totally breaks my configuration. I cannot get the nix client to talk to the daemon and I am unsure why. I'll keep you posted if I figure it out.
Alright so two things:
Firstly, they appear to have removed the old services.github-runner option and now have only the newer services.github-runners option, which led to a reorganization of the code, which in turn broke my snippet that imports those options and uses them in containers. So that's the first thing.
Secondly, even taking them out of the container and running them directly on the host, I was still failing to communicate with the daemon. This appears to be because the dynamic users created by systemd were not trusted users. I thought I had already addressed that, but apparently not, or I reverted it or something.
In any case, I was able to add the dynamic users to the trusted users list like so:
nix.settings.trusted-users = lib.genList (i: "github-runner-nrdos-${toString i}") 16;
To match the names of the dynamically allocated users. I may swing back around and configure each runner in a minimal microvm for added isolation with just the host daemon socket mounted. But for now this is fine since I only have them configured for a private repository at this point.
seems kind of presumptuous to assume all runners should have Nix. Hopefully we can get this reviewed and merged quickly, cause I need it :sweat_smile:
https://github.com/NixOS/nixpkgs/pull/289607
I believe my module has always been using the new github-runners
service.
By the way, nix-darwin PR just got merged: https://github.com/LnL7/nix-darwin/pull/859
I'll play with macOS support.
The service is not new; what's new is that the code has been restructured, and my previous code broke.
The runner failed with a new error,
The local machine's clock may be out of sync with the server time by more than five minutes. Please sync your clock with your domain or internet time and try again.
The local time on the container and host is incorrect, though. Wonder how that happened ... I'm running this in Parallels.
Yup, it was Parallels. Rebooting the VM fixed it.
Let's see if disabling automatic time sync fixes it long-term,
I made the mistake of not setting a good disk size: 60GB (the default) quickly ran out in CI. Neither boot.growPartition nor systemd-repart worked to repartition root on boot after I resized the disk image in Parallels.
I upgraded nix-darwin
which has a revamped github-runners
module that explicitly writes the log files.
Which allowed me to actually observe crash logs like this:
-bash-3.2$ pwd
/var/lib/github-runners
-bash-3.2$ tail -5 nixci-1/_diag/Runner_20240325-120051-utc.log
[2024-03-25 12:00:53Z ERR Runner] GitHub.DistributedTask.WebApi.TaskAgentExistsException: A runner exists with the same name nixci-1.
at GitHub.Runner.Listener.Configuration.ConfigurationManager.ConfigureAsync(CommandSettings command) in /private/tmp/nix-build-github-runner-2.314.1.drv-0/src/src/Runner.Listener/Configuration/ConfigurationManager.cs:line 306
at GitHub.Runner.Listener.Runner.ExecuteCommand(CommandSettings command) in /private/tmp/nix-build-github-runner-2.314.1.drv-0/src/src/Runner.Listener/Runner.cs:line 126
[2024-03-25 12:00:53Z ERR Terminal] WRITE ERROR: A runner exists with the same name nixci-1.
[2024-03-25 12:00:53Z INFO Listener] Runner execution has finished with return code 1
-bash-3.2$
https://github.com/LnL7/nix-darwin/pull/893
As for this particular problem, there is replace = true;
which solves the issue.
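i.e., something along these lines for the runner from the log above (nixci-1):

# Re-register the runner even if one with the same name already exists.
services.github-runners.nixci-1.replace = true;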
3 messages were moved here from #nix > Darwin: launchctl logs for a service by Srid.
I decided to do this the other way: set up runners on the NixOS VM, and do remote builds on macOS as necessary.
Rough notes: https://github.com/srid/nixos-config/tree/master/clusters/github-runner
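For the "remote builds on macOS" part, a hedged sketch of what that could look like on the NixOS VM; the host alias, SSH user, and systems list are assumptions:

{
  # Offload Darwin builds from the NixOS VM to the macOS host over SSH.
  nix.distributedBuilds = true;
  nix.buildMachines = [{
    hostName = "macos-host";   # assumed SSH alias for the Mac
    sshUser = "builder";       # assumed build user on the Mac
    systems = [ "aarch64-darwin" ];
    maxJobs = 4;
    protocol = "ssh-ng";
  }];
}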
Alright, based on how I'm setting up CI for Juspay's GitHub, the new approach is not to use distributed builds, but to just dedicate runners to each host.
Module & tutorial incoming, but here's a sneak preview:
https://x.com/sridca/status/1804286220304028106
Awesome, I will look at this once I am out of the drama mindset brought on by the ongoing situation in the legacy NixOS community.
Here it is:
https://github.com/juspay/github-nix-ci
I'll announce it in a bit, after some final polish, and feedback (if any).
https://x.com/nixos_asia/status/1805609192843014276
https://discourse.nixos.org/t/github-nix-ci-for-self-hosting-github-runners-on-macos-linux/47642