Will Sqoop export create duplicates when the number of mappers is higher than the number of blocks in the source HDFS location?
My source HDFS directory has 24 million records, and when I do a Sqoop export to a Postgres table, it somehow creates duplicate records. I have set the number of mappers to 24. There are 12 blocks in the source location.
Any idea why Sqoop is creating duplicates?
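According to the Sqoop user guide, an export is broken down into multiple transactions, so a failed export can leave partially committed data behind and, in some cases, duplicated data (the guide's remedy is the `--staging-table` option). One plausible way this happens, assumed here rather than confirmed by the question, is a map task that commits some batches, fails, and is retried from the start of its split. The toy model below is not Sqoop code: the round-robin split, batch size, and failure point are made-up parameters, and it says nothing about mappers versus blocks. It only shows that a retried, non-atomic writer duplicates exactly the rows it committed before failing.

```python
from collections import Counter


def simulate_export(records, num_tasks, batch_size, failing_task, fail_after_batches):
    """Toy model of a non-atomic parallel export (not Sqoop's actual code).

    Each task writes its split in batches, and each batch is committed at once.
    The task numbered `failing_task` commits `fail_after_batches` batches,
    "crashes", and is then retried from the start of its split.
    """
    table = []  # stands in for the target Postgres table
    splits = [records[i::num_tasks] for i in range(num_tasks)]  # arbitrary split
    for task_id, split in enumerate(splits):
        batches = [split[j:j + batch_size] for j in range(0, len(split), batch_size)]
        if task_id == failing_task:
            # First attempt: these commits survive the crash.
            for batch in batches[:fail_after_batches]:
                table.extend(batch)
        # Successful attempt (the retry, for the failing task).
        for batch in batches:
            table.extend(batch)
    return table


def count_duplicates(table):
    """Number of extra copies beyond the first, summed over all records."""
    return sum(n - 1 for n in Counter(table).values() if n > 1)


if __name__ == "__main__":
    records = list(range(1000))  # 1,000 unique record IDs
    table = simulate_export(records, num_tasks=4, batch_size=50,
                            failing_task=2, fail_after_batches=3)
    print(len(table), count_duplicates(table))
```

With 4 tasks of 250 records each and a failure after 3 batches of 50, the retry re-inserts 150 rows, so the table ends up with 1,150 rows. Passing `failing_task=None` gives no duplicates.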
I need to delete about 2 million rows from my PG database. I have a list of the IDs I need to delete. However, every approach I try takes days.
I tried putting them in a table and deleting in batches of 100. Four days later, this is still running, with only 297,268 rows deleted. (For each batch, I selected 100 IDs from the ID table, deleted the rows WHERE id IN that list, then deleted those 100 IDs from the ids table.)
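The batch procedure above can be restructured so that each batch is a key range of `ids` rather than a select, delete, and requeue cycle. This is a minimal sketch under stated assumptions: integer IDs, a primary key on `id` in both `tbl` and `ids`, and an arbitrary batch size. It walks `ids` in key order (keyset pagination), commits after each batch so progress is visible, and never modifies `ids`. Python's built-in `sqlite3` stands in for a Postgres driver so the example runs on its own; the SQL is plain enough to run on Postgres too (with psycopg2 the placeholders would be `%s` instead of `?`). `delete_in_batches` is a hypothetical helper name.

```python
import sqlite3


def delete_in_batches(conn, batch_size=10_000):
    """Delete rows of tbl whose id appears in ids, one key range of ids at a time.

    Hypothetical helper. Assumes integer ids and a PRIMARY KEY on id in both
    tables, so each step is an index range scan or an index lookup.
    Returns the number of rows deleted from tbl.
    """
    # Start just below the smallest id; None (SQL NULL) means ids is empty.
    (last_id,) = conn.execute("SELECT MIN(id) - 1 FROM ids").fetchone()
    deleted = 0
    while last_id is not None:
        # Largest id among the next batch_size ids after last_id.
        (hi,) = conn.execute(
            "SELECT MAX(id) FROM "
            "(SELECT id FROM ids WHERE id > ? ORDER BY id LIMIT ?) AS batch",
            (last_id, batch_size),
        ).fetchone()
        if hi is None:  # every id has been processed
            break
        cur = conn.execute(
            "DELETE FROM tbl WHERE id IN "
            "(SELECT id FROM ids WHERE id > ? AND id <= ?)",
            (last_id, hi),
        )
        conn.commit()  # one short transaction per batch
        deleted += cur.rowcount
        print(f"deleted {deleted} rows so far (ids up to {hi})")
        last_id = hi
    return deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE ids (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                     ((i, "row") for i in range(1, 10_001)))
    # Mark every even id for deletion.
    conn.executemany("INSERT INTO ids VALUES (?)",
                     ((i,) for i in range(2, 10_001, 2)))
    conn.commit()
    print(delete_in_batches(conn, batch_size=1_000))
```

Whether this beats a single statement depends on why each delete is slow in the first place: if every deleted row is expensive (for example because of triggers or foreign-key checks), batching only spreads that cost out, though it does make progress measurable.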
I also tried a single statement:
DELETE FROM tbl WHERE id IN (SELECT * FROM ids);
That's taking forever, too. It's hard to gauge how long, since I can't see its progress until it finishes, but the query was still running after 2 days.
I'm just looking for the most effective way to delete from a table when I know the specific IDs to delete and there are millions of them.
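If even the single `DELETE` runs for days, it may be worth checking what each deleted row costs rather than how the IDs are supplied. One common culprit, assumed here because the question does not say whether other tables reference `tbl`: PostgreSQL does not automatically index the referencing column of a foreign key, so each deleted row can trigger a scan of every referencing table that lacks such an index. In Postgres, the referencing constraints can be listed with `SELECT conrelid::regclass, conname FROM pg_constraint WHERE contype = 'f' AND confrelid = 'tbl'::regclass;`. The runnable sketch below performs the equivalent check against SQLite's catalog pragmas instead; `unindexed_fk_columns` is a hypothetical helper, and it only handles single-column foreign keys.

```python
import sqlite3


def unindexed_fk_columns(conn, parent):
    """Hypothetical helper: list (child_table, column) pairs whose single-column
    foreign key references `parent` but is not the leading column of any index.

    Limitation: an INTEGER PRIMARY KEY column is indexed implicitly (it is the
    rowid) and would be misreported here.
    """
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()]
    result = []
    for child in tables:
        # Collect the leading (seqno 0) column of every index on the child.
        leading = set()
        for idx in conn.execute(f'PRAGMA index_list("{child}")').fetchall():
            for seqno, _cid, name in conn.execute(
                    f'PRAGMA index_info("{idx[1]}")').fetchall():
                if seqno == 0:
                    leading.add(name)
        # PRAGMA foreign_key_list rows: id, seq, table, from, to, ...
        for fk in conn.execute(f'PRAGMA foreign_key_list("{child}")').fetchall():
            if fk[2] == parent and fk[3] not in leading:
                result.append((child, fk[3]))
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY)")
    # orders.tbl_id has no index: with FK enforcement on, each delete
    # from tbl would have to scan orders.
    conn.execute("CREATE TABLE orders (oid INTEGER PRIMARY KEY, "
                 "tbl_id INTEGER REFERENCES tbl(id))")
    # notes.tbl_id is indexed, so its check is an index lookup.
    conn.execute("CREATE TABLE notes (nid INTEGER PRIMARY KEY, "
                 "tbl_id INTEGER REFERENCES tbl(id))")
    conn.execute("CREATE INDEX notes_tbl_id ON notes (tbl_id)")
    print(unindexed_fk_columns(conn, "tbl"))  # [('orders', 'tbl_id')]
```

On Postgres, the corresponding fix for any referencing column this turns up would be an ordinary `CREATE INDEX` on that column before running the delete.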