Is it realistic to expect noticeable “aha-like” jumps from pure pretraining at 124M? Do you think that, after resolving the sudden loss spikes and with 3.82B tokens seen, we can reproduce a model matching the ...