
Importing from Huge CSV Files to MySQL Database and more…

The scenario: every day I am given a huge CSV file dumped through FTP from some big JDE-ish system. The file is about 15MB and has around 60,000 lines in it.

What I needed to do was update a “main” transaction table. That means looking up each line in the CSV file, searching for the matching row in the transaction table, and updating the row if a match is found. But I figured it would be crazy to handle each line of the file directly and hit MySQL for every one of them.

The pseudo code would look something like this:

1) Open the CSV file
2) Loop through each line of the file
3) Use $row[0], $row[1], and $row[2] in a WHERE clause to search the MySQL table
4) If a matching row is found, update it. If not, insert the line as a new row.
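
Spelled out in PHP, that naive approach looks roughly like the sketch below (mysqli, with the connection details, file name, and fieldA-fieldD column names all made up for illustration; this is not the actual production code):

<?php
// Naive approach: two or three queries for every single CSV line.
// Connection details, file name, and column names are placeholders.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$fh = fopen('daily_dump.csv', 'r');

while (($row = fgetcsv($fh)) !== false) {
    // 3) look for a matching row in the transaction table
    $find = $db->prepare('SELECT id FROM transactions
                          WHERE fieldA = ? AND fieldB = ? AND fieldC = ?');
    $find->bind_param('sss', $row[0], $row[1], $row[2]);
    $find->execute();

    if ($find->get_result()->num_rows > 0) {
        // 4a) match found: update the existing row
        $stmt = $db->prepare('UPDATE transactions SET fieldD = ?
                              WHERE fieldA = ? AND fieldB = ? AND fieldC = ?');
        $stmt->bind_param('ssss', $row[3], $row[0], $row[1], $row[2]);
    } else {
        // 4b) no match: insert the line as a new row
        $stmt = $db->prepare('INSERT INTO transactions (fieldA, fieldB, fieldC, fieldD)
                              VALUES (?, ?, ?, ?)');
        $stmt->bind_param('ssss', $row[0], $row[1], $row[2], $row[3]);
    }
    $stmt->execute();
}
fclose($fh);

Each pass through the loop makes at least two round trips to the database, which is exactly where the pain comes from.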

In case you didn’t notice, steps 2-4 would loop 60,000 times! And note that the MySQL table already had 300,000 records in it. Can you imagine how much memory and resources this script would eat up if I implemented the code above?

First, opening the big CSV file would already consume a lot of resources. On top of that, we have to loop through each line of the file and hit the database for every one. That would be fine if you were handling around 100 lines, but 60,000 hurts a lot.

So what I did was let MySQL do most of the hard work. I created a temporary table in MySQL and wrote a script that imports the CSV file into it. After that, I used MySQL queries to compare the temporary table against the main transaction table.
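
The import script itself can be tiny if you let MySQL read the file. Here is a minimal sketch of that step, assuming LOAD DATA LOCAL INFILE is allowed on the server and using a hypothetical staging table name in $table_name (the file path, column names, and delimiters are placeholders, not the real ones):

<?php
// Sketch of the import step. LOCAL INFILE must be enabled on both the
// server and the client, and every name/path here is a placeholder.
$table_name = 'daily_import';
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

// Staging table shaped like the CSV (add columns to match the real file).
$db->query("
    CREATE TEMPORARY TABLE `".$table_name."` (
        fieldA VARCHAR(64),
        fieldB VARCHAR(64),
        fieldC VARCHAR(64),
        fieldD VARCHAR(64)
        /* more columns to match the CSV layout */
    )
") or die($db->error);

// One statement pulls in all 60,000 lines instead of 60,000 INSERTs.
$db->query("
    LOAD DATA LOCAL INFILE '/path/to/daily_dump.csv'
    INTO TABLE `".$table_name."`
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
") or die($db->error);

Once the CSV is sitting in the staging table, the comparison boils down to two queries. For inserting the rows that do not exist in the transaction table yet, I used something like this: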

$sql = "
        INSERT INTO `transactions`
            (`fieldA`,
             `fieldB`,
             `... this means more fields ...`,
             `... this means more fields ...`,
             `fieldX`)
        SELECT
            daily.fieldA,
            daily.fieldB,
            ... this means more fields ...,
            ... this means more fields ...,
            daily.fieldX
        FROM `".$table_name."` daily
        WHERE NOT EXISTS
            (SELECT
                 t.fieldA,
                 t.fieldB,
                 t.fieldC,
                 t.fieldD
             FROM `transactions` t
             WHERE
                 t.fieldA = daily.fieldA AND
                 t.fieldB = daily.fieldB AND
                 t.fieldC = daily.fieldC AND
                 t.fieldD = daily.fieldD)
        ";

And for the updates I used something like this:

$sql = "
        UPDATE `transactions` t, `".$table_name."` daily
        SET
            t.fieldA = daily.fieldA,
            t.fieldB = daily.fieldB,
            t.fieldC = daily.fieldC,
            t.fieldD = daily.fieldD
            /* more fields go here, each preceded by a comma */
        WHERE
            t.fieldA = daily.fieldA AND
            t.fieldB = daily.fieldB AND
            t.fieldC = daily.fieldC AND
            t.fieldD = daily.fieldD
        ";

So there you have it. Once you have the CSV file imported into a MySQL table, you can basically do anything with it and let MySQL do all the hard work for you.
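
For completeness, running the whole thing from PHP is then just a couple of calls. A minimal sketch, again assuming mysqli and the hypothetical $table_name staging table, with the two statements above held in $insert_sql and $update_sql:

// Insert the new rows, update the existing ones, then clean up the staging table.
$db->query($insert_sql) or die($db->error);
$db->query($update_sql) or die($db->error);
$db->query("DROP TEMPORARY TABLE IF EXISTS `".$table_name."`");

If the staging table is a true TEMPORARY table it disappears when the connection closes anyway, so the final DROP is just housekeeping.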


6 Responses to Importing from Huge CSV Files to MySQL Database and more…

  1. lance says:

    Have you looked into REPLACE INTO? Sounds like it would do basically everything you’re doing there in one statement…

  2. Wenbert says:

    Hi lance,

    Awesome. I will look up REPLACE INTO. There are other updates I needed to do, but I think the two statements above could do better. I will improve my working code when time permits.

    Thanks lance!

  3. Wenbert says:

    For more info regarding Lance’s notes, please go to: http://dev.mysql.com/doc/refman/5.0/en/replace.html

  4. Paul Sprangers says:

    Just being curious: is this csv file freely available? I’m looking for huge csv files in order to push my own database system to its limits.

    Kind regards,
    Paul Sprangers

  5. Wenbert says:

    Hi Paul,

    You can create a PHP script that will create a huge csv file.

    thanks,
    Wenbert

  6. Paul Sprangers says:

    Dear Wenbert,

    Thank you for replying.
    Unfortunately, I can’t create PHP scripts (I don’t even know what they are), but fortunately, I found a lot of huge and downloadable databases and CSV files on the internet.

    Kind regards,
    Paul Sprangers
