Nevermark 1 day ago

I mistook the title for "A Tiny Boltzmann Brain"! [0]

My own natural mind immediately solved the conundrum. Surely this was a case where a very small model was given randomly generated weights and then tested to see if it actually did something useful!

After all, the smaller the model, the more likely simple random generation can produce something interesting, relative to its size.

I stand corrected, but not discouraged!

I propose a new class of model, the "Unbiased-Architecture Instant Boltzmann Model" (UA-IBM).

One day we will have quantum computers large enough to simply set up the whole dataset as a classical constraint on a model defined by N serialized values representing all the parameters and architecture settings. Then let a quantum system with N qubits take one inference step over all the classical samples, with all possible parameters and architectures in superposition, and reduce the result to return the best (or near-best) model's parameters and architecture in classical form.
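To make the idea concrete, here is a purely classical toy of the search such a machine would be doing in superposition: every candidate bitstring of model parameters is scored against the whole dataset and the best one wins. Everything in it (the tiny +/-1 "model", the data, the loss) is invented for illustration; the quantum version would replace the enumeration loop with a single superposed pass.

    import numpy as np
    from itertools import product

    # Toy stand-in for "the whole dataset as a classical constraint":
    # 8 samples with 4 binary features, labels from a hidden +/-1 weight vector.
    rng = np.random.default_rng(0)
    X = rng.choice([-1, 1], size=(8, 4))
    hidden_w = np.array([1, -1, 1, 1])
    y = np.where(X @ hidden_w >= 0, 1, -1)

    # "Model" = 4 serialized bits, each mapped to a weight in {-1, +1}.
    # Classically we enumerate all 2^4 candidates; the quantum version would
    # hold them all in superposition and evaluate them in one inference step.
    best_loss, best_w = np.inf, None
    for bits in product([0, 1], repeat=4):
        w = 2 * np.array(bits) - 1
        loss = np.mean(np.where(X @ w >= 0, 1, -1) != y)
        if loss < best_loss:
            best_loss, best_w = loss, w

    print("best weights:", best_w, "loss:", best_loss)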

Anyone have a few qubits lying around that want to give this a shot? (The irony: everything is quantum, and yet it is so slippery we can hardly put any of it to work yet.)

(Sci-fi story premise: the entirely possible case of an alien species that evolved a one-off quantum sensor, which grew into a whole quantum sensory system, then a nervous system, and subsequently full quantum intelligence out of the gate. What kind of society and technological trajectory would they have? Hopefully they are in close orbit around a black hole, so the impact of their explosive progress has not threatened us yet. And then one day, they escape their gravity well, and ...)

[0] https://en.wikipedia.org/wiki/Boltzmann_brain

ithkuil 1 day ago

Poor quantum beings. They don't have access to a computation model that exceeds the speed of their own thoughts, so they are forever doomed to wait a long time for their computations to finish.

immibis 1 day ago

That isn't how quantum computers work.

Nevermark 1 day ago

My understanding, which is far from certain, is that problems like this (put a large set of candidate solutions in superposition, have the branches cooperate to identify the best one, then arrange the quantum output so that when it is classically sampled the best answer is the most likely result) are solved in four stages:

1. 2^N potential solutions encoded in superposition across N qubits.

2. Each branch of the superposition goes through the quantum version of a normal inference pass, using that branch's weights to iteratively process all the classical data, and then computes a performance measure. (This is all done without any classical sampling.)

3. Cross-superposition communication that results in agreement on the best-performing version. This is the weakest part of my knowledge: I know that such an operation type exists, but I don't know how circuits implement it. The number of bits being operated on is small, though, just a performance measure. (This is also done without any classical sampling.)

4. Then the output is sampled to get classical values. This requires roughly N * log2(N) circuit complexity, where N is the number of bits defining the chosen solution, i.e. the parameters and architecture settings. That can be a lot of hardware, obviously, perhaps more than all the rest of it, given that N will be very large. (A toy simulation of this amplify-then-sample pattern is sketched right after this list.)
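To make steps 1-4 concrete, here is a small numpy simulation of the standard amplitude-amplification (Grover) pattern, which is my assumed mechanism for steps 3 and 4: put all 2^n parameter bitstrings in uniform superposition, phase-flip the ones with minimal loss on a toy dataset, invert about the mean a few times, then sample. The model, data, and oracle are invented for illustration, and a real circuit would have to compute the loss reversibly inside the oracle rather than look it up classically.

    import numpy as np

    # Same toy setup as before: 4 parameter bits, each mapped to a weight in {-1, +1}.
    rng = np.random.default_rng(0)
    X = rng.choice([-1, 1], size=(8, 4))
    hidden_w = np.array([1, -1, 1, 1])
    y = np.where(X @ hidden_w >= 0, 1, -1)

    n = 4
    N = 2 ** n  # step 1: 2^n candidate solutions over n qubits

    def loss(index):
        bits = np.array([(index >> k) & 1 for k in range(n)])
        w = 2 * bits - 1
        return np.mean(np.where(X @ w >= 0, 1, -1) != y)

    losses = np.array([loss(i) for i in range(N)])
    marked = losses == losses.min()          # step 2 stand-in: which candidates score best

    # Steps 3-4 as Grover iterations on the full state vector.
    state = np.full(N, 1.0 / np.sqrt(N))     # uniform superposition over all candidates
    iters = int(np.floor(np.pi / 4 * np.sqrt(N / marked.sum())))
    for _ in range(iters):
        state[marked] *= -1.0                # oracle: phase-flip the best candidates
        state = 2.0 * state.mean() - state   # diffusion: inversion about the mean

    probs = state ** 2                       # step 4: measurement probabilities
    best = int(np.argmax(probs))
    print(f"most likely sample: {best:04b}  loss: {losses[best]:.2f}  prob: {probs[best]:.2f}")

Note that even on this toy scale the amplification takes about pi/4 * sqrt(2^n) iterations rather than a single pass, which is the usual quadratic (not exponential) speedup over classical enumeration.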

Don't take anything I say here for granted. I have designed parts of such a circuit on paper, using idealized quantum gates, but not all of it. I am not an expert, but I believe that every step here is well understood by others.

The downside relative to other approaches: it performs a complete search of the entire solution space directly, so the result comes from reducing 2^N superposed candidates across N qubits down to N classical bits, and for interesting models N can be very large (billions of parameters times each parameter's bit width). No efficiencies are obtained from gradient information or any other heuristic. This is the brute-force method, so it gives an upper bound on hardware requirements.
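For a sense of the scale involved (my own illustrative numbers, nothing from the article): serializing a billion-parameter model at 8 bits per parameter already needs N = 8 x 10^9 qubits, and the brute-force space holds 2^N candidates, which even a Grover-style quadratic speedup only cuts to roughly 2^(N/2) oracle calls.

    # Back-of-envelope scale, with assumed (illustrative) numbers.
    params = 1_000_000_000          # one billion parameters
    bits_per_param = 8              # assumed bit width
    N = params * bits_per_param     # qubits needed to serialize the model
    log2_candidates = N             # the search space holds 2**N candidates
    log2_grover_calls = N / 2       # quadratic speedup: ~2**(N/2) oracle calls
    print(f"N = {N:,} qubits; search space = 2^{log2_candidates}; "
          f"Grover-style calls ~ 2^{log2_grover_calls:.0f}")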

Another disadvantage: since the solution was not obtained by following a gradient, the best solution might not be robust. It might be a kind of vertex between robust solutions that happens to do best on the given data, while small changes in the inputs relative to that data might not generalize well. This is less likely to be a problem if smoothing/regularization information is taken into account when scoring each superposed candidate's performance.