Above is the graph of 8th graders’ math scores on the 2015 TIMSS international test. There are also scores for 4th graders in math and for both grades in science.
Here’s a 2015 paper by Heiner Rindermann comparing TIMSS and PISA.
TIMSS tasks were seen as more curriculum-related and requiring more school knowledge than PISA tasks. For solving PISA tasks, thinking/reasoning ability and general intelligence were rated as being more important (d = 0.36).
In general, it’s not that hard to design a testing process whose scores are fairly accurate, but you run into diminishing marginal returns in reliability the subtler the judgments you want to make. For example, averaging the two tests across both grades and all subjects tends to give a more reliable rank ordering of countries than trying to tease out more specific questions, like why Finland’s 8th grade science score went up or down.
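To make the reliability point concrete, here is a rough simulation sketch in Python. Everything in it is made up for illustration: the 50 hypothetical countries, the 7 test/grade/subject measures, and the spread of "true" ability (50 points), of true year-to-year change (10 points), and of measurement noise (25 points) are assumptions, not estimates from TIMSS or PISA. It compares how well three things track the simulated truth: a single measure, the average of all measures, and the observed change in one measure between two rounds.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_once(rng, n_countries=50, n_measures=7,
                  true_sd=50.0, change_sd=10.0, noise_sd=25.0):
    """One synthetic round; all parameter values are illustrative assumptions."""
    # Hypothetical "true" national ability levels (synthetic, not real scores).
    true = [rng.gauss(500, true_sd) for _ in range(n_countries)]
    # Each measure (a test/grade/subject combination) is truth plus independent noise.
    measures = [[t + rng.gauss(0, noise_sd) for t in true]
                for _ in range(n_measures)]
    # Per-country average across all measures.
    composite = [statistics.fmean(scores) for scores in zip(*measures)]

    # A small true change between two rounds of one measure, each seen with noise.
    true_change = [rng.gauss(0, change_sd) for _ in range(n_countries)]
    before = [t + rng.gauss(0, noise_sd) for t in true]
    after = [t + c + rng.gauss(0, noise_sd) for t, c in zip(true, true_change)]
    observed_change = [a - b for a, b in zip(after, before)]

    return (pearson(true, measures[0]),
            pearson(true, composite),
            pearson(true_change, observed_change))


def simulate(n_trials=200, seed=0):
    """Average the three correlations over many synthetic rounds."""
    rng = random.Random(seed)
    results = [simulate_once(rng) for _ in range(n_trials)]
    return tuple(statistics.fmean(col) for col in zip(*results))


r_single, r_composite, r_change = simulate()
print(f"single measure vs. truth:       {r_single:.2f}")
print(f"average of measures vs. truth:  {r_composite:.2f}")
print(f"observed change vs. true change: {r_change:.2f}")
```

Under these assumptions the averaged score should track the truth more closely than any single measure, while the observed change in one score tracks the true change worst of all, because the difference of two noisy scores carries twice the noise variance while the real change is small. The exact numbers depend entirely on the made-up parameters.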