Importing from Huge CSV Files to MySQL Database and more…
Filed Under (General) by Wenbert on 18-06-2008
Tagged Under : csv, handling large files, importing large files, MySQL, PHP
The scenario: I am given a huge CSV File dumped thru FTP from some big JDE-ish system everyday. The CSV File is about 15MB or so. The file has around 60,000 lines in it.
What I needed to do is to update a “main” transaction table. Which means I have to lookup each line in the CSV File and then search each row in the transaction table, and then update the row if a match is found. But I figured that it would be crazy to directly handle each line in the file and then go to MySQL for each line.
The pseudo code would look something like this:
1) Open CSV File
2) Loop each line of the file
3) Use $row[0] + $row[1] + $row[2] in a WHERE statement to search the MySQL Database
4) If row is found, update the row in MySQL. If not found, then insert the row.
In case you didn’t notice, steps 2-4 would loop 60,000 times! And note that the mysql table I had already had 300,000 records in it. Can you imagine how much memory and resources this script would eat up if I implemented the code above?
First, opening the big csv file would already consume a lot of resources. On top of that, we have to loop thru each line of the file and do database updates. This would do just fine if you were handling like 100 lines, but 60,000 would hurt a lot.
So what I did was I just let MySQL do most of the hard work. I created a temporary table in MySQL. I made a script that imports the CSV file into the temporary mysql table. After that, I used MySQL queries to compare the temporary table and the main transaction table. I used queries such as these:
INSERT `transactions`
(`fieldA`,
`fieldB`,
`… this means more fields …`,
`… this means more fields …`,
`fieldX`)
SELECT
daily.fieldA,
daily.fieldB,
… this means more fields …,
… this means more fields …,
daily.FieldX
FROM ".$table_name." daily
WHERE
NOT EXISTS
(SELECT
t.fieldA,
t.fieldB,
t.fieldC,
t.fieldD
FROM transactions t WHERE
t.fieldA = daily.fieldA AND
t.fieldB = daily.fieldB AND
t.fieldC = daily.fieldC AND
t.fieldD = daily.fieldD)
";
And for the updates I used something like this:
UPDATE `transactions` t , `".$table_name."` daily
SET
t.fieldA = daily.fieldA,
t.fieldB = daily.fieldB,
t.fieldC = daily.fieldC,
t.fieldD = daily.fieldD,
/* more fields */
WHERE
t.fieldA = daily.fieldA AND
t.fieldB = daily.fieldB AND
t.fieldC = daily.fieldC AND
t.fieldD = daily.fieldD
"
So there you have it. Once you have the CSV file imported into a MySQL table, you can basically do anything with it and let MySQL do all the hard work for you.