Your MobileNet-based custom backbone has no ImageNet weights right now:
torchvision.models.mobilenet_v3_small(pretrained=False)
Switch to (the weights= argument needs torchvision 0.13 or newer):
torchvision.models.mobilenet_v3_small(weights="DEFAULT")
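or, even better:
💪 the ResNet-50 + FPN backbone (what torchvision's stock Mask R-CNN, maskrcnn_resnet50_fpn, uses by default)
→ larger, pretrained feature extractor → usually a clear accuracy gain over a backbone trained from scratch, at the cost of more compute and memory.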