I assume that most implementations have a base set of known data that gets spun up fresh for each test run. I think there are a few basic schools of thought from here...
I think it's obvious that #3 is the least maintainable approach, but I'm still curious whether anyone has been successful with it. Perhaps you could have databases for various scenarios and drop/add them from test code.
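I haven't tried that approach in anger, but a minimal sketch of what I imagine it looking like (using pytest and an in-memory sqlite3 database purely for illustration; the scenario names and SQL are made up):

```python
# Hypothetical sketch of "databases per scenario, dropped/recreated from test code".
# pytest + sqlite3 are just illustrative choices; the scenarios are invented.
import sqlite3

import pytest

SCENARIOS = {
    # One setup script per scenario; a real project might keep these as .sql files.
    "empty_orders": """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    """,
}


def build_scenario_db(scenario: str) -> sqlite3.Connection:
    """Create a throwaway database loaded from the named scenario script."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCENARIOS[scenario])
    return conn


@pytest.fixture
def scenario_db(request):
    # Each test declares which scenario it wants via a marker,
    # e.g. @pytest.mark.scenario("empty_orders").
    marker = request.node.get_closest_marker("scenario")
    conn = build_scenario_db(marker.args[0])
    yield conn
    conn.close()  # "drop" the scenario database again after the test


@pytest.mark.scenario("empty_orders")
def test_new_order_gets_an_id(scenario_db):
    scenario_db.execute("INSERT INTO orders (customer) VALUES ('alice')")
    (order_id,) = scenario_db.execute(
        "SELECT id FROM orders WHERE customer = 'alice'"
    ).fetchone()
    assert order_id is not None
```

The maintenance cost is obvious even in the sketch: every schema change has to be rippled through every scenario script, which is exactly why I suspect it's the least maintainable option.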
It depends on the type of data and your domain. I had one unsuccessful attempt when the schema wasn't stable yet. We kept running into problems adding data for new and changed columns, which broke the tests all the time.
Now we successfully use starting-state data where the dataset is largely fixed, the schema is stable, and every test needs it in the same state (e.g. a postcode database).
For most other stuff, the tests are responsible for setting up their own data. That works for us!
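Roughly what that split looks like, sketched with pytest and an in-memory sqlite3 database for illustration (the postcode and orders tables here are invented examples, not our real schema):

```python
# Sketch of the split described above: fixed reference data loaded once,
# everything else set up by the test that needs it.
import sqlite3

import pytest


@pytest.fixture(scope="session")
def reference_db():
    """Fixed starting-state data: loaded once per run, treated as read-only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE postcodes (code TEXT PRIMARY KEY, city TEXT NOT NULL);
        INSERT INTO postcodes VALUES ('SW1A 1AA', 'London'), ('M1 1AE', 'Manchester');
        """
    )
    yield conn
    conn.close()


@pytest.fixture
def db(reference_db):
    """Everything else: each test starts from an empty table and inserts what it needs."""
    reference_db.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, postcode TEXT)"
    )
    yield reference_db
    reference_db.execute("DELETE FROM orders")  # leave the shared DB clean for the next test


def test_order_resolves_to_city(db):
    # The test sets up exactly the data it cares about...
    db.execute("INSERT INTO orders (postcode) VALUES ('SW1A 1AA')")
    # ...and only leans on the fixed reference data for the lookup.
    (city,) = db.execute(
        "SELECT p.city FROM orders o JOIN postcodes p ON p.code = o.postcode"
    ).fetchone()
    assert city == "London"
```

In reality the reference data lives in its own database rather than the same connection, but the point is the same: shared, stable data is loaded once, and anything volatile is owned by the individual test.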