QBoard » Big Data » Big Data - Others » Best way to delete millions of rows by ID

Best way to delete millions of rows by ID

  • I need to delete about 2 million rows from my PG database. I have a list of IDs that I need to delete. However, any way I try to do this is taking days.
    I tried putting them in a table and doing it in batches of 100. 4 days later, this is still running with only 297268 rows deleted. (I had to select 100 id's from an ID table, delete where IN that list, delete from ids table the 100 I selected).
    I tried:
    DELETE FROM tbl WHERE id IN (select * from ids)
    That's taking forever, too. Hard to gauge how long, since I can't see it's progress till done, but the query was still running after 2 days.
    Just kind of looking for the most effective way to delete from a table when I know the specific ID's to delete, and there are millions of IDs.
      December 21, 2020 6:05 PM IST
    2
  • First make sure you have an index on the ID fields, both in the table you want to delete from and the table you are using for deletion IDs.

    100 at a time seems too small. Try 1000 or 10000.

    There's no need to delete anything from the deletion ID table. Add a new column for a Batch number and fill it with 1000 for batch 1, 1000 for batch 2, etc. and make sure the deletion query includes the batch number.

      December 21, 2020 6:13 PM IST
    2
  • Two possible answers:
    1. Your table may have lots of constraint or triggers attached to it when you try to delete a record. It will incur much processor cycles and checking from other tables.
    2. You may need to put this statement inside a transaction.
      December 21, 2020 6:16 PM IST
    2
  • We know the update/delete performance of PostgreSQL is not as powerful as Oracle. When we need to delete millions or 10's of millions of rows, it's really difficult and takes a long time.
    However, we can still do this in production dbs. The following is my idea:
    First, we should create a log table with 2 columns - id & flag    (id refers to the id you want to delete; flag can be Y or null, with Y signifying the record is successfully deleted).
    This post was edited by Jasmine Chacko at December 21, 2020 6:25 PM IST
      December 21, 2020 6:22 PM IST
    1