Archiving Data from Massive Production Tables

Just a quick tip.

Data is valuable, but as it grows in your production system, it can slow everything down.

If you have data that has grown to millions of rows, and decide to archive it off your servers, and you have replication/sharding going, selecting into archive tables and deleting millions of rows can cause your replication to back up.

Another strategy, though requiring downtime, is quicker:

  1. Make sure nothing is updating the table you are archiving.
  2. Select the amount of data you want to KEEP into an equivalent schema table with equivalent indexing.
  3. rename the original table to indicate it’s oldness.
  4. rename the new table to the original table name
  5. Turn your updates back on.

Then, going forward you can use the siphoning off approach for smaller chunks of data that fall off the end of your window (30, 90 days, whatever.)