I guess that's a different type of study, though - it's looking at behaviour in a very specific case, and answering the question "can x happen?" I think the type of study we're talking about here is one that seeks "statistical significance".

Yes. And to determine how significant your result is, you need to be able to model the probability distribution that underlies it. Mick made a simulation that assumes all male rats have basically the same chance to get cancer. What if that is not true? Is our average chance still correct then? If you have looked up my first quote in the book you linked, you'll have seen the graphs right below it, where a skewed distribution is shown, and the averages look skewed as well. In that case, any idea we might have of how accurate the observed average is goes out the window.
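To see what "the averages look skewed as well" means in practice, here's a minimal sketch: if the underlying quantity is skewed, the averages of small samples stay visibly skewed, while averages of larger samples look much more normal. The exponential distribution here is just a stand-in for "some skewed distribution", not anything from the book.

```python
import random

random.seed(0)

def skewness(xs):
    """Standardized third moment of a list of numbers (0 for symmetric data)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def sample_means(sample_size, trials=20_000):
    """Average of `sample_size` draws from a skewed (exponential)
    distribution, repeated `trials` times."""
    return [sum(random.expovariate(1.0) for _ in range(sample_size)) / sample_size
            for _ in range(trials)]

for n in (5, 50):
    print(f"n={n}: skewness of the averages is about {skewness(sample_means(n)):.2f}")
```

With n=5 the averages are still clearly lopsided; with n=50 they're much closer to symmetric, which is the whole reason the "30 or more" rule of thumb exists.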

But if you have 30 rats or more, you can usually assume that your average is going to behave much the same as if every rat had the same chance to get cancer, and the discussion we've been having becomes possible. Only then can we say: well, if the true chance is 0.57%, then there's an 11.49% chance that we see 3 or more sick male rats in that group, because that's a statement about how samples behave in normal distributions. You can then decide if that chance is small enough for you to attach significance to the result, and it also gives you a rough sense of how likely it is that the result is just a fluke, based on that set of data.
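That 11.49% is a tail probability: the chance of seeing 3 or more sick rats if each one independently gets cancer at the assumed rate. A sketch of how such a number gets computed, as an exact binomial tail; note that the group size of 200 below is my assumption for illustration, since the actual group size isn't stated here.

```python
from math import comb

def p_at_least(k, n, p):
    """Exact binomial tail: P(at least k successes in n trials of probability p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical group of 200 male rats, each with a 0.57% chance of cancer.
print(p_at_least(3, 200, 0.0057))
```

Plug in the real group size and you get the kind of figure quoted above.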

Now, by convention, in most sciences we'll say "significant" means a 5% chance or less, and that makes results easy to compare and combats sensationalism. But what we really want to know is not a binary yes/no, but the actual probability. If the chance of being wrong is 4.9%, that's worse odds than if it were 0.3%, right? Significance is a label that we attach to a specific level of confidence we want to have in the result, and we can only do that if we can compute how probable it is that we got that result by random chance. And for that, we usually want the sampling distribution to be close to a normal distribution, so 30+ samples or rats or whatever.
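In code terms, the convention collapses a continuous number into a yes/no verdict; a tiny illustration (the p-values here are made up):

```python
alpha = 0.05  # the conventional significance threshold

# Both results clear the bar, but they carry very different strength of evidence.
for p_value in (0.049, 0.003):
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"p = {p_value}: {verdict}")
```

Both print "significant", which is exactly the information loss described above.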

If you have that, your study data (usually) tells you how accurate your result is.
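For an average, "how accurate your result is" usually means a standard error or a confidence interval computed from the same data. A minimal sketch under the normal approximation (so, roughly 30+ samples); the measurements below are invented:

```python
from math import sqrt

def mean_with_ci(xs, z=1.96):
    """Sample mean plus an approximate 95% confidence interval,
    using the normal approximation to the sampling distribution."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)  # sample variance
    se = sqrt(var / n)                             # standard error of the mean
    return m, (m - z * se, m + z * se)

# Made-up measurements, just to show the shape of the output.
data = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.2, 4.7]
mean, (lo, hi) = mean_with_ci(data)
print(f"mean = {mean:.2f}, 95% CI is roughly ({lo:.2f}, {hi:.2f})")
```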

And then the other thing I said earlier comes into play: if you need a lot of accuracy because you want to show a very small effect, you have to use a large sample size; but if you're ok with getting an approximate idea, a small sample is enough. The study I cited computed the average incubation length and the average serial interval, and because there were only 16 samples, the error was probably quite big; but since all they probably wanted was to check whether that lines up with the data we had from China, they did it anyway.
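The error of an average shrinks like 1/sqrt(n), which is why 16 samples only buy you a rough idea. A quick sketch with an invented standard deviation of 3 days for individual incubation periods (not a number from the study):

```python
from math import sqrt

sd = 3.0  # assumed spread of individual incubation periods, in days

# Standard error of the average incubation length at various sample sizes.
for n in (16, 64, 256):
    print(f"n = {n:3d}: standard error of the average is about {sd / sqrt(n):.2f} days")
```

Quadrupling the sample size only halves the error, so squeezing out precision gets expensive fast.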

Long story short: to be able to assess the error inherent in the data, we need a minimum sample size. If we can do that, if we can talk about the error, then we have scientific data that can be significant if we judge it so; but if we can't, we only have anecdotes.