This is getting away from the original point, which is that deep neural networks are, by default, not explanatory in the way Einstein's theory of relativity is.
But even so,
> In other words, I would claim that as a model more closely matches those "surface statistics" it necessarily more closely resembles the underlying mechanisms that gave rise to them.
I don't know what it means, for example, for a deep neural network to "more closely resemble" the underlying process of the weather. It's also obviously false in general: If you have a mechanical clock and a quartz-crystal analog clock, you are not going to be able to derive the internal workings of either or distinguish between them from the hand positions. The same is true for two different pseudo-random number generator circuits that produce the same output.
> I have yet to see an example where a more accurate model was conceptually simpler than the simplest known model at some lower level of accuracy.
I don't understand what you mean. Simple models often yield a high level of understanding without being better predictors: an idealized ball rolling down an inclined plane, Galileo's thought experiment about falling masses, Kepler's laws, and so on. Many of these models deliberately ignore less important details in order to focus on the fundamental ones.
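To make the first of those concrete (a toy sketch, numbers chosen purely for illustration): the idealized frictionless point-mass model of a ball on an incline predicts a = g·sin(θ), while a more faithful model of a solid ball rolling without slipping picks up a 5/7 correction from rotational inertia. The simpler model is the one that actually teaches you the mechanics, even though the corrected one is more accurate.

```python
# Toy numbers for the incline example: idealized frictionless point mass
# vs. a solid ball rolling without slipping (the 5/7 factor comes from
# including rotational inertia, I = 2/5 * m * r^2).
import math

g = 9.81                   # m/s^2
theta = math.radians(30)   # incline angle

a_idealized = g * math.sin(theta)           # simple model, easy to reason about
a_rolling = (5 / 7) * g * math.sin(theta)   # more accurate for a real solid ball

print(f"idealized: {a_idealized:.2f} m/s^2, rolling ball: {a_rolling:.2f} m/s^2")
```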
> From an information theoretic angle I think it's similar to compression (something that ML also happens to be almost unbelievably good at). Related to this, I've seen it argued somewhere (I don't immediately recall where though) that learning (in both the ML and human sense) amounts to constructing a world model via compression and that rings true to me.
In practice you get nowhere trying to recreate the internals of a cryptographic pseudo-random number generator from the output it produces (perhaps in theory you could, given infinite data and no bounds on computational complexity), even though the generator itself could be highly compressed.
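A quick toy sketch of what I mean (just n-gram counting over OS CSPRNG output; a real cryptanalytic attempt would of course look nothing like this): any statistical "model" you fit this way predicts held-out bits at essentially chance level.

```python
# Try to predict the next bit of a CSPRNG keystream from the preceding k bits
# using n-gram statistics. Held-out accuracy should hover around 0.5, i.e.
# no better than chance.
import secrets
from collections import Counter

def bits(n_bytes: int) -> list[int]:
    """Draw n_bytes from the OS CSPRNG and unpack into a list of bits."""
    data = secrets.token_bytes(n_bytes)
    return [(byte >> i) & 1 for byte in data for i in range(8)]

def fit_ngram(train: list[int], k: int) -> dict[tuple, int]:
    """For each k-bit context, remember the majority next bit seen in training."""
    counts: dict[tuple, Counter] = {}
    for i in range(len(train) - k):
        ctx = tuple(train[i:i + k])
        counts.setdefault(ctx, Counter())[train[i + k]] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def accuracy(model: dict[tuple, int], test: list[int], k: int) -> float:
    correct = total = 0
    for i in range(len(test) - k):
        pred = model.get(tuple(test[i:i + k]), 0)  # default guess for unseen contexts
        correct += (pred == test[i + k])
        total += 1
    return correct / total

stream = bits(100_000)
half = len(stream) // 2
model = fit_ngram(stream[:half], k=12)
print(f"held-out accuracy: {accuracy(model, stream[half:], k=12):.4f}")  # ~0.50
```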
> Sure, but what leads to those theories? They are invariably the result of attempting to more accurately model the things which we can observe.
Yes, but if the model does not lead to understanding, you cannot come up with new ideas.
Admittedly my original question (how "not explanatory" leads to "is not a") begins to look like a nit now that I understand the point you were trying to make (or at least I think I do). Nonetheless the discussion seems interesting.
That said, I'm inclined to object to this "explanatory" characteristic you're putting forward. We as humans certainly put a lot of work into optimizing the formulation of our models with the express goal of easing human understanding, but I'm not sure that's anything more than an artifact of the system that produces them. At the end of the day, they are tools for accomplishing some purpose.
Perhaps the idea you are attempting to express is analogous to something like principal component analysis applied to the representation of the final model?
> If you have a mechanical clock and a quartz-crystal analog clock, you are not going to be able to derive the internal workings of either or distinguish between them from the hand positions.
Arguably, modern physics does something closely analogous, although the amount of resources required to do so is astronomical.
Anyhow, my claim was not about the ability (or lack thereof) to derive information from the outputs of a system. It was that as you demand increased accuracy from a model of the hand positions (your example), you will necessarily be forced to model the internal workings of the original physical system to increasingly higher fidelity. I claim that there is no way around this: fundamentally, your only option for increasing the accuracy of a model's output is for the model to more closely resemble the inner workings of the thing being modeled. Taken to the (admittedly impossible) extreme, this might take the form of a quantum-mechanics-based simulation of the entire system.
Extrapolating this to the weather, I'm claiming that any reasonably accurate ML model will necessarily encompass some underlying truth about the physical system it is modeling, and that as it becomes more accurate it will encode more such truth. Notably, I make no claim about the ability of an unaided human to interpret such truths from a binary blob of weights.
> I don't understand what you mean. Simple models often yield a high level of understanding without being better predictors.
I said nothing about the efficiency of educating humans (i.e. information gathering by, or transfer between, agents), but rather about model accuracy versus model complexity. I am claiming that more accurate models will invariably be more complex, and that said complexity will invariably encode more information about the original system being modeled. I have yet to encounter a counterexample.
> [CSPRNG recreation]
It is by design impossible to "model" the output of such a function in a bitwise-accurate manner without reproducing the internals with perfect fidelity. If someone figures out how to model the output even imprecisely without access to the key, that is generally construed as the algorithm having been broken. In other words, that example aligns perfectly with my point, in the sense that the output cannot be approximated any better than random chance by a "simpler" (i.e. less computationally complex than the original) mechanism. It takes the continuum of accuracy I was originally describing and replaces it with a step function.
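To tie this back to the compression framing from earlier (a rough sketch, not a rigorous argument): a predictive model's cross-entropy in bits per symbol is, up to rounding, the code length an ideal arithmetic coder driven by that model would achieve, so a model that captures more of the data's structure is quite literally a better compressor.

```python
# Illustration of the prediction/compression link: bits-per-character under a
# crude unigram character model vs. zlib, which exploits the repetition that
# the unigram model ignores.
import math
import zlib
from collections import Counter

text = ("the quick brown fox jumps over the lazy dog " * 200).encode()

# Unigram "model": character frequencies fitted on the text itself.
counts = Counter(text)
total = sum(counts.values())
bits_per_char = -sum((c / total) * math.log2(c / total) for c in counts.values())

print(f"unigram model:  {bits_per_char:.3f} bits/char")
print(f"zlib:           {8 * len(zlib.compress(text)) / len(text):.3f} bits/char")
```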
> Yes but if the model does not lead to understanding you cannot come up with the new ideas.
I suppose human understanding is a prerequisite to new human-constructed models, but my (counter-)point remains. Physics theories are "nothing more" than humans fitting "surface statistics" to increasing degrees of accuracy. I think this is a fairly fundamental truth with regard to the philosophy of science.