One thing I have come to realize as I study Nix's architecture more, and after years of experience just dealing with the pain points, is that Nix may have missed a critical opportunity by being Turing-complete.
In order to address some of the inefficiency concerns (changing hashes over added whitespace, reformatting, etc.), the most formal solution would be to use a total programming language, which can be reliably reduced to normal form (since it is not Turing-complete). You can then compute the hash over this normal form and use it to verify the "semantic integrity" of the code (as Dhall puts it).
This hash could equally function as a cache key, for more granular and correct expression caching. It also keeps the complexity of expressions capped, which puts a cap on the overall complexity of evaluation (which keeps getting worse and worse in nixpkgs, for example).
Nix has already missed the boat on this, since any changes of this magnitude would break all backward compatibility, but when starting a new project completely from scratch you have the opportunity to design a language with these things in mind up front, before you get locked in.
For that reason, I just wanted to suggest, if you haven't thought of it already, specifying the language as a "total" programming language, which would give you a compelling advantage over Nix. Yes, there is already Dhall, but that language has problems of its own (mostly that its typing is too rigid).
When `cab eval` is run, it hashes the file. If that hash already exists in the hash table mapping file hashes to structure hashes, it skips to step 3; if not, it goes to step 2. This way, changing whitespace won't do anything, as it does not affect the AST. You don't need the language to not be Turing-complete to achieve this, either.
This could be performed for every single thunk evaluation, but then initial evaluation would be quite slow, so we need to find a middle ground.
Top-level caching, which is the way Nix does it, is also quite bad. I'm thinking of doing it per island, which is a git repository, a zip file, etc. Like flake inputs, one might say.
Makes sense, but the benefit of reducing to normal form and then hashing would be that refactors would still maintain the same hash: any two expressions that are semantically equivalent would have the same hash. Also, I think a Turing-complete language is a bit overkill for what should have been a DSL, no?
"reducing to normal form" is evaluation, though. And the result of that would still be the same derivation files, so you wouldn't end up building anything
I don't think we are going to have any performance issues, as I'm planning to implement the standard library in Rust instead of Nix; evaluating that is what takes the most time in Nix's module system. Nix itself is just slow and shouldn't be used to create huge things, only to glue them together.
Reducing to normal form would be the first step in eval, sure. It isn't even technically always possible in Turing-complete languages, though, due to arbitrary complexity; the point is that it is always possible, and should be computationally fairly inexpensive, in a total language.
The point about two expressions building the same derivation anyway is also true, but if you wanted to cache that evaluation, you'd have no way of knowing they are equivalent beforehand unless you first reduced them to normal form.
Anyway, just wanted to share my thoughts with you since you are doing this, as it's something I've been thinking about lately in my effort to understand how one might improve Nix. Ultimately I am looking for something that is backwards compatible, but for someone in your position it is more feasible.
If my ideas are fuzzy or useless, that is also fine; I haven't fully fleshed them out.
It should also be possible to do AST hashing in Nix. Not sure why it doesn't, though.
Maybe because in C++, you can't just #[derive(Hash)]?
Last updated: Dec 03 2024 at 16:21 UTC