1 08 2011

by Decently Exposed

Today I’m going to play a sort of hopscotch on the computer in order to transform data available in an Excel spreadsheet into a MySQL table and then convert that table into XML.  Some of you may question the wisdom of such an undertaking. Amazingly, Windows and its related technologies still evoke a lot of antipathy among PHP Developers.  But, that said, imagine going to a job interview and discovering that the perspective employer uses Excel spreadsheets that must be regularly converted for use by MySQL   How would you help out the company? Perform the task manually? That might prove less than efficient.  Automation is the keyword here; so, in the spirit of  “the right tool for the right job”, let’s proceed.

In order to solve this multi-faceted problem speedily, I use a very mixed bag of technologies, including the bundle Wamp Server2 (Apache, MySQL, PHP),  COM,  Excel 2003 and its built-in version of VBA (Visual Basic for Applications), DOS, including the primitive editor edlin and XML.

Let’ start with an Excel spreadsheet where each row contains a person’s first and last names, gender and occupation. The first row of the spread sheet contains labels pertaining to these various types of data. To make the data usable for MySQL, it must be converted into a CSV format (comma-separated value) which signifies that each field is marked with a comma and each record ends with a newline character. You can manually do this with Excel 2003 by simply clicking SaveAs and then selecting the CSV format from a drop-down list. But,  surely there has to be a better way and there is! In fact, we can do this task and subsequently load the CSV data into a MySQL table in one fell swoop.

The following links I found helpful in devising the PHP script below:

// perform conversion with positional params
$excel = new COM('excel.application');
$excel->Visible = 0;

$wkb = $excel->Workbooks->Open($argv[1]);
// hide column labels
$worksheet = $wkb->Worksheets('sheet1');
$excel_cell = $worksheet->Range('A1.D1');
$excel_cell->value = '';

$xlCSV = 6;
// save in CSV format --
$wkb->SaveAs($argv[2], $xlCSV);

//close and free resources
$excel->ActiveWorkbook->Close(false); // kills excel process

// now convert .csv -> mysql table
$link = null;
$host = 'localhost';
$user = 'root';
$pword = '';
$dbname = 'csv_db';

$link = mysqli_connect($host,$user,$pword,$dbname) or die('cannot connect :(');

// using php5.3 nowdoc syntax:
// mysql seems to prefer backslashes
$sql = <<<'EOD'
LOAD DATA LOCAL INFILE 'C:/Documents and Settings/Sharon/Desktop/test4discard.csv'
INTO TABLE tbl_name
LINES TERMINATED BY '\n' (fname,lname,gender,occupation);
// update database by emptying the table and then replace with fresh data
mysqli_query($link,"TRUNCATE TABLE tbl_name");

As you may surmise, this is one script that is most suitable to run on a command-line rather than through a web browser. After a little experimenting, I found that the best CLI tool is php-win.exe .  You can find it in Wamp Server’s PHP bin directory for php5.3.5. It is fast!

Of course there is one more thing left to do and that is to convert the data to XML. MySQL has a convenient utility that you can find it its bin directory entitled mysqldump.exe in Wamp Server. This dandy utility when given the necessary parameters will convert a table’s data into XML with great speed. But, who wants to be troubled with recalling parameters? We can write them out once if we create a .BAT file in Windows/DOS.

Creating the .BAT file was incredibly easy using the primitive editor edlin that is still in the Windows/DOS of XP. edlin is to DOS sort of what vi is to linux, except vi is far more powerful. edlin is a way to edit on the fly and has a limited set of one letter commands, such as l (list), e (end) and q (quit). Here’s my .BAT file:

@echo off
C:\wamp\bin\php\php5.3.5\php-win com4x2.php "C:\Documents and Settings\Sharon\Desktop\%1.xls" "C:\Documents and Settings\Sharon\Desktop\%1.csv"
C:\wamp\bin\mysql\mysql5.5.8\bin\mysqldump -uroot --xml csv_db > C:\wamp\www\exp\csv_db.xml
echo on

Note: Using positional params in .BAT file, too!

Previously I had a ‘-p’ after ‘-uroot’ which created an unnecessary speed bump. As the .BAT file was running, it halted to ask me to supply a password. Since there is no password, I had to hit the ENTER key. By removing that ‘-p’ switch, the program runs smoothly now without stopping to ask for a password.

The .BAT file which I named com4x.bat needs to execute in a terminal window, namely a DOS Command Prompt so that we can provide the .BAT file with a positional parameter representing the name of the Excel spreadsheet, for example:

C:\wamp\www\exp> COM4X.BAT test4discard

Note – to re-test the code, you may wish to delete the CSV file . Of course, instead of manually doing that you could amend the above PHP script to do that for you! 🙂

This work is licensed under a Creative Commons License




Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: