Importing from Huge CSV Files to MySQL Database and more…

Filed Under (General) by Wenbert on 18-06-2008

Tagged Under : , , , ,

The scenario: I am given a huge CSV File dumped thru FTP from some big JDE-ish system everyday. The CSV File is about 15MB or so. The file has around 60,000 lines in it.

What I needed to do is to update a “main” transaction table. Which means I have to lookup each line in the CSV File and then search each row in the transaction table, and then update the row if a match is found. But I figured that it would be crazy to directly handle each line in the file and then go to MySQL for each line.

The pseudo code would look something like this:

1) Open CSV File
2) Loop each line of the file
3) Use $row[0] + $row[1] + $row[2] in a WHERE statement to search the MySQL Database
4) If row is found, update the row in MySQL. If not found, then insert the row.

In case you didn’t notice, steps 2-4 would loop 60,000 times! And note that the mysql table I had already had 300,000 records in it. Can you imagine how much memory and resources this script would eat up if I implemented the code above?

First, opening the big csv file would already consume a lot of resources. On top of that, we have to loop thru each line of the file and do database updates. This would do just fine if you were handling like 100 lines, but 60,000 would hurt a lot.

So what I did was I just let MySQL do most of the hard work. I created a temporary table in MySQL. I made a script that imports the CSV file into the temporary mysql table. After that, I used MySQL queries to compare the temporary table and the main transaction table. I used queries such as these:

$sql = "
INSERT `transactions`
(`fieldA`,
`fieldB`,
`… this means more fields …`,
`… this means more fields …`,
`fieldX`)
SELECT
daily.fieldA,
daily.fieldB,
… this means more fields …,
… this means more fields …,
daily.FieldX
FROM "
.$table_name." daily
WHERE
NOT EXISTS
(SELECT
t.fieldA,
t.fieldB,
t.fieldC,
t.fieldD
FROM transactions t WHERE
t.fieldA = daily.fieldA AND
t.fieldB = daily.fieldB AND
t.fieldC = daily.fieldC AND
t.fieldD = daily.fieldD)
"
;

And for the updates I used something like this:

$sql = "
        UPDATE `transactions` t , `"
.$table_name."` daily
        SET
            t.fieldA  = daily.fieldA,
            t.fieldB  = daily.fieldB,
            t.fieldC  = daily.fieldC,
            t.fieldD  = daily.fieldD,
            /* more fields */

        WHERE
            t.fieldA  = daily.fieldA AND
            t.fieldB  = daily.fieldB AND
            t.fieldC  = daily.fieldC AND
            t.fieldD  = daily.fieldD
        ";
 

So there you have it. Once you have the CSV file imported into a MySQL table, you can basically do anything with it and let MySQL do all the hard work for you.

Uploading huge files (i mean huge ones, like 5Gig and upwards). No PHP :P

Filed Under (General) by Wenbert on 27-05-2008

Tagged Under : , ,

Okay, here is an interesting find.

Although I have never had a client who asked me to upload files larger than 10MB :P , i have always wondered if there was a better way to do it without using PHP. The biggest file I was able to handle using PHP was zipping a directory with lots of files. The zip file created is about 2Gig. PHP ate up a lot of memory! It gobbled up the 4gig ram of the server :P But I found a way around it by using system() calls. I got the list of directories and ran tar inside the system() call. Worked like a charm.

Now back to the large file uploads, the post recommends using Tramline - but I have doubts because it is still in its earl stages. Does know of other solutions to uploading huge files aside from using Tramline?

Morph Exchange to offer PHP

Filed Under (General) by Wenbert on 22-05-2008

Tagged Under : , , , ,

Morph Exchange brought by Mor.ph is “Software as a Service” company that provides applications spaces/platforms for Ruby on Rails and just recently Java. A few weeks back, I emailed them if they have support for PHP. And guess what, they have plans. But they did not give me a specific time when the release date will be — no month, no date, no year — they just said that they will.

But my sources tell me that Morph will release PHP support early next year. Next year is such a long wait! They better come up with PHP support faster or else (*ahem* maybe)  Heroku will beat them to it. PHP has such a wide user-base that whoever comes up with something for it like RoR appspaces in Morph will have lots of users jumping into their band-wagon. Most advanced PHP developers are not satisfied with shared-hosting shit. We want appspaces and SVN. Manual FTP upload is dead.

More info on Mor.ph

PHP: Ternary Operator

Filed Under (General) by Wenbert on 07-05-2008

Tagged Under : ,

This is one my favorites. Sometimes I use this when I come around “short” conditions in variable assignments.

Instead of writing something like this:

if (isset($myvar)) {
    $temp = $myvar;
} else {
    $temp = ""; //i need to set this so that when I echo this, it won’t show a warning
}
 

I just use the ternary operator to make things easier to read by doing something like this:

$temp = isset($myvar)?$myvar:"";
 

Now how does it work? Basically, the syntax of the ternary operator is:
(condition)?“execute when true”:“execute when false”
On first glance it is confusing to use, but trust me, once you get the hang of it, you won’t be able to live without it.

Twitter moving away from Ruby on Rails

Filed Under (General) by Wenbert on 03-05-2008

Tagged Under : , , ,

This may be a rumor but Techcrunch is reporting that Twitter is abandoning Ruby on Rails due to scalability issues. I have read a few articles a while back about people having problems with Ruby on Rails. That is why I was hesitant to use Ruby on Rails on some of my web applications (at work). I was hesitant so therefore never got to learn it either :(.

Personally, I think the best bet for Twitter would be PHP. I would trust my life with PHP especially with scalability issues. Techcrunch also mentioned Java, but I doubt Java would work for them. Remember Friendster a few years back? It had a .jsp extension in the URL and it was almost unusable. It was dead slow.

The developers of Twitter can write a clone within a day and no one would notice it. They should use the Zend Framework :P Anyways, PHP is FTW!

Mixing PHP Variables with Javascript (Zend Framework and jQuery)

Filed Under (General) by Wenbert on 23-04-2008

Tagged Under : , , ,

Sometimes, I “mix” PHP variables with Javascript. For example, I have something like this:

function deletenote(notes_id)
{
    if(!confirm("Delete note? You will not be able to undo this action."))
        return false;
       
    $.post("< ?=$this->baseUrl?>/notes/deletenote",{
        skeedl_notes_id: notes_id
    }, function(data){
        $(‘#notes_id’+data).fadeOut();
    });
}
 

Note the $this->baseUrl, I need it like that. The thing is, I have that code in my view file (.phtml - in Zend Framework, this is parsed like a normal PHP file). If I remove all the Javascript in my view file and place it in a .js file, the $this->baseUrl will not be parsed. To go around this, I create a hidden input element in my view file with $this->baseUrl echoed as the value. Like this:

// … in my .phtml file …
// rest of the php and html code goes here
< input type="hidden" id="base_url" value="<?=$this->baseUrl?>" />
// rest of the php and html code goes here
 

Then in my .js file (assuming that you have already included this file in your header, or somewhere else), I have something like this:

//This is using jQuery, but you can use document.getElementById(’base_url’).value if you like
function deletenote(notes_id)
{
    if(!confirm("Delete note? You will not be able to undo this action."))
        return false;
       
    $.post($(‘base_url’).val()+"/notes/deletenote",{
        skeedl_notes_id: notes_id
    }, function(data){
        $(‘#notes_id’+data).fadeOut();
    });
}
 

What it does is that jQuery gets the value of the hidden form element. The value of this hidden form element is from a PHP Variable.

So there you go, nothing special. Just something that works and I just wanted to share.

Serialize + base64_encode an array for a URL

Filed Under (General) by Wenbert on 08-04-2008

Tagged Under : , , ,

This is an alternative for passing a bunch of variables in the URL. The variables might be used for pagination scripts (limits, etc.), sorting results and others. Generally, I would do something like this for sorting results:

http://localhost/reports?sort=username&sort_type=ASC
 

That will work fine in most cases, but what if you had tons of variables to pass to the URL? Using POST would be out of the question since I would want that URL to be “bookmark-able”. So the answer would be serialize() and base64_encode().
I would do something like this:

//In this example, I am using Zend Framework. The code below is found in my Controller file.
$myOptions = array();
$myOptions[’sort_by’] = ‘username’;
$myOptions[‘display_images’] = ‘yes’;
$myOptions[‘edittable_fields’] = ‘no’;
$x = base64_encode(serialize($myOptions));
$this->_redirect(‘/report?q=’.$x);
 

On the “receiving” end of the applications — the View file, I would do something like this:

if(isset($_GET[‘q’])) {
    $q = unserialize(base64_decode($_GET[‘q’])); //this reverses the serialize() and base64_encode()
}
Zend_Debug::Dump($q); //This would output the same array (found in the Controller file) with the values included.
 

This is useful in many other scenarios. And this is just one way of doing it — pretty dirty in my own opinion but the serialize() + base64_encode() method has saved my neck more than once.

If you have a better solution, then please post a comment below. ;)

Reading and Writing Excel Spreadsheet Using PHP

Filed Under (General) by Wenbert on 04-04-2008

Tagged Under : ,

I have blogged about Microsoft Excel and PHP within the past few months. So a post from Zend Devzone caught my attention. It is about reading and writing into Excel using PHP. You can add a spreadsheet, do calculations and more. Click here for the post.

What Zend_Layout looks like. An overview. Code sample. No explanation :P

Filed Under (General) by Wenbert on 10-03-2008

Tagged Under : , ,

I wrote this for myself instead of for other people. The short code example is found on my wiki (http://wiki.ekini.net/main/Zend_Layout).

Basically it is a very quick overview of Zend_Layout — more like a cheatsheet (again). The bootstrap, the directory structure and then the controller and view part. It is based on Akrabat`s Zend_Layout Tutorial.

Get number of hours between 2 timezones

Filed Under (General) by Wenbert on 21-02-2008

Tagged Under : ,

Here is a way to get the number of hours between two different timezones. I am sure that there is a way of doing it shorter and faster, but I did it this way for readability purposes.

date_default_timezone_set("America/Kentucky/Louisville");
$lex = date("Y-m-d h:i:s A"); //Lexington, USA
echo "Lex: ".date("Y-m-d h:i:s A")."\n\n";

date_default_timezone_set("Asia/Hong_Kong");

$cebu = date("Y-m-d h:i:s A"); //Cebu, Philippines
echo "Cebu: ".date("Y-m-d h:i:s A")."\n\n";

echo (strtotime($cebu) - strtotime($lex))/3600 ." hours difference between the timezones";
//3,600 seconds is 1 hour

List of suported timezones in PHP can be found here:
http://us2.php.net/manual/en/timezones.php

Subscribe to Rss Feed : Rss