This is something I have thought a lot recently since I recently saw a project that absolute didn’t care in the slightest about this and used many vendor specific features of MS SQL all over the place which had many advantages in terms of performance optimizations.

Basically everyone always advises you to write your backend so generically with technologies like ODBC, JDBC, Hibernate, … and never use anything vendor specific like stored procedures, vendor specific datatypes or meta queries with the argument being that you can later switch your DBMS without much hassle.

I really wonder if this actually happens in the real world with production Software or if this is just some advice that makes sense on surface level but in reality never pans out. I personally haven’t seen any large piece of Software switch to a different DBMS, even if there would be long term advantages of doing so, because the risk and work to retest everything would be far too great.

The only examples I know of (like SAP) were really part of a much larger rewrite or update rather than “just” switching DBMS.

  • Oliver Lowe
    link
    fedilink
    410 months ago

    Great question. Short answer: yes!

    Long answer: I did this on a production system about 2 years ago.

    The system was using MySQL, which was served from 3 virtual machines. Nobody took responsibility for that MySQL cluster, so outages and crazy long maintenance windows were normal especially as there was no DB admin expertise. The system had been hobbling along for 3 years regardless.

    One day the company contracting me asked for help migrating some applications to a new disaster recovery (DR) datacentre. One-by-one I patched codebases to make them more portable; even needing to remove hard-coded IP addresses and paths provided by NFS mounts! Finally I got to the system which used the MySQL cluster. After some digging I discovered:

    1. The system was only ever configured to connect to one DB host
    2. There were no other apps connecting to the DB cluster
    3. It all ran on “classic” Docker Swarm (not even the last released version)

    My ex-colleague who I got along really well with wrote 90% of the system. They used a SQL query builder and never used any DB engine-specific features. Thank you ex-colleague! I realised I could scrap this insane not-actually-highly-available architecture and use SQLite instead, all in a single virtual machine with 512MB memory and 1vCPU. SQLite was perfect for the job. The system consisted of a single reader and writer. The DB was only used for record-keeping of other long-running jobs.

    Swapping it over took about 3 days, mostly testing. No more outages, no more working around shitty network administration, no more “how does the backup work again?” when DB dumps failed, no more complex config management to bring up, down DB clusters. The ability to migrate DB engines led to a significant simplification of the overall system.

    But is this general advice? Not really. Programming with portability in mind is super important. But overly-generic programs can also be a pain to work with. Hibernate, JDBC et al. don’t come for free; convenience comes at the cost of complexity. Honestly I’m a relational database noob (I’m more a SRE), so my approach is to try to understand the specific system/project and go from there. For example:

    I recently saw a project that absolute didn’t care in the slightest about this and used many vendor specific features of MS SQL all over the place which had many advantages in terms of performance optimizations.

    Things I would want to learn more about:

    • were those performance optimisations essential?
    • if so, is the database the best place to optimise? e.g. smarter queries versus fronting with a dumb cache
    • are there database experts who can help out later? do we want to manage a cache?

    Basically everyone always advises you to write your backend so generically with technologies like ODBC, JDBC, Hibernate, … and never use anything vendor specific like stored procedures, vendor specific datatypes or meta queries with the argument being that you can later switch your DBMS without much hassle.

    Things I would want to learn more about:

    • how many stored procedures could we be managing? 1, 100, 1000? may not be worth taking on a dependency to avoid writing like 3 stored procedures
    • is that tooling depended on by other projects already?
    • how much would the vendor-specific datatype be used? one column in one table? everywhere?
    • does using vendor-specific features make the code easier to understand? or just easier to write? big difference!

    My shitty conclusion: it depends.