Just to clarify:

Best practice is to do all the paralleling first, making "groups" of cells at the 1S level,

so that the pack is just one "string" of those groups connected in series.

In effect each group functions as a single larger cell, with the individual units "invisible" electrically unless they are isolated from each other.

Paralleling is only needed if the cell model / line you have chosen is not available in the Ah capacity you need for the pack as a whole.

If you need a 180Ah pack, it is better to find 180+Ah cells to build it with, but of course other factors may prevent that.
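
To make the sizing concrete, here is a rough sketch of how the group size falls out of the capacity you need. The function name `pack_layout` is made up for illustration, and the 12.8V pack built from 3.2V nominal cells is just an assumed example, not something specific to your build.

```python
import math

def pack_layout(target_ah, cell_ah, pack_nominal_v, cell_nominal_v):
    """Return (series_count, parallel_count) for one string of parallel groups."""
    # Cells per 1S group: enough in parallel to reach the target capacity.
    parallel = math.ceil(target_ah / cell_ah)
    # Groups in series: enough to reach the nominal pack voltage.
    series = round(pack_nominal_v / cell_nominal_v)
    return series, parallel

# Assumed example: 180Ah at 12.8V nominal, using 3.2V nominal cells.
print(pack_layout(180, 100, 12.8, 3.2))  # 100Ah cells -> (4, 2), i.e. 4S2P
print(pack_layout(180, 200, 12.8, 3.2))  # 200Ah cells -> (4, 1), no paralleling
```

With cells at or above the target capacity the parallel count drops to 1, which is the preferred case described above.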

Building modular sub packs and then connecting those in series is fine (if you have a strong reason to do so) but many BMS can't handle that design.

But building multiple packs to then be paralleled is really sub-optimal, certainly once you get past two or three strings.

It can be done if you have a truly compelling reason, and a single BMS can control the whole pack so long as the individual groups at 1S are connected between the sub packs,

but current flow in use and while charging will inevitably be unbalanced, so whole-pack lifespan is shortened.
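
As a rough illustration of why paralleled strings end up unbalanced, here is a simplified current-divider sketch. It assumes every string sits at the same open-circuit voltage and behaves as a plain resistance, which real packs only approximate, and the resistance values and the name `string_currents` are invented for the example.

```python
def string_currents(total_current_a, string_resistances_ohm):
    """Split a total current across parallel strings in proportion to conductance."""
    conductances = [1.0 / r for r in string_resistances_ohm]
    total_conductance = sum(conductances)
    return [total_current_a * g / total_conductance for g in conductances]

# Assumed example: 90A load, three strings, one with 20% higher resistance
# (wiring, connections, or cell aging).
currents = string_currents(90.0, [0.010, 0.010, 0.012])
print([round(i, 1) for i in currents])  # [31.8, 31.8, 26.5]
```

Even a small resistance mismatch means the lower-resistance strings carry more of the load, so they work harder on every cycle.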
