Alright - I think just about everyone who's interested is watching this thread.
We still can't tell for sure which piece of equipment got confused, because during the testing we rebooted and power-cycled the couple of boxes that may have known something. What we do know is that switching the IP addresses for our uplinks on the NYI side and flushing the ARP tables fixed it. Most likely, flushing the ARP tables would have been enough by itself.
As to what poisoned the ARP tables in the first place - we still don't know.
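For anyone wondering how you'd catch this kind of thing in the act: here's a minimal sketch, assuming a Linux box where the ARP cache can be read as text in the /proc/net/arp layout. That's an assumption for illustration, not a description of our actual equipment or tooling, and the function names and sample addresses are made up. It compares two snapshots of the table and reports any IP whose MAC address changed between them.

```python
def parse_arp(text):
    """Parse text in the Linux /proc/net/arp layout into {ip: mac}."""
    table = {}
    # The first line is the column header, so skip it.
    for line in text.strip().splitlines()[1:]:
        fields = line.split()
        # Columns: IP address, HW type, Flags, HW address, Mask, Device
        if len(fields) >= 6:
            table[fields[0]] = fields[3].lower()
    return table


def changed_entries(before, after):
    """Return {ip: (old_mac, new_mac)} for IPs whose MAC differs between snapshots."""
    old, new = parse_arp(before), parse_arp(after)
    return {
        ip: (old[ip], new[ip])
        for ip in old
        if ip in new and old[ip] != new[ip]
    }


# Hypothetical snapshots (documentation-range IPs, made-up MACs).
HEADER = "IP address       HW type     Flags       HW address            Mask     Device\n"
snapshot_before = HEADER + (
    "192.0.2.1        0x1         0x2         aa:aa:aa:aa:aa:01     *        eth0\n"
    "192.0.2.2        0x1         0x2         aa:aa:aa:aa:aa:02     *        eth0\n"
)
snapshot_after = HEADER + (
    "192.0.2.1        0x1         0x2         bb:bb:bb:bb:bb:01     *        eth0\n"
    "192.0.2.2        0x1         0x2         aa:aa:aa:aa:aa:02     *        eth0\n"
)

for ip, (old_mac, new_mac) in changed_entries(snapshot_before, snapshot_after).items():
    print(f"{ip}: {old_mac} -> {new_mac}")
```

A changed MAC isn't proof of a problem on its own (failover or a hardware swap does the same thing), but an unexpected change for a gateway address is a good place to start looking.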
Here are the steps we're taking to avoid these problems in future:
1) get an "out of band" management link direct to one of our servers, so that our connections don't all come through the same IP range. If something is wrong with that range, we can then troubleshoot from the inside as well as the outside.
2) buy a managed switch rather than an unmanaged switch to sit between NYI's uplinks and our equipment - this will allow us to see more quickly where the issue is if this happens again.
3) NYI have placed any networking issues with us on an immediate escalation rather than "try the standard troubleshooting first" - because the problem only affected us, it did "smell" like a problem with our equipment at first.
Step 1 is happening immediately. Step 3 has already been done. Step 2 requires an equipment purchase and a planned swapover when NYI have their main network techs on hand and we have techs on hand (there's not a huge window when it's a working day for both of us!), so it will probably be a week or two away.