Welcome to my online portfolio, the complement/substitute for my resume. The opinions included herein are my own and do not reflect those of any client or employer, past or present. Please check out the new site: http://danieljohnsonjr.com

Tuesday, January 30, 2007

Mining data from 3,000 separate Excel files

One of the consulting projects I worked on was for the finished product exports department at a global consumer products company. The company produces products that get shipped all around the world, but this particular group concentrated on North America.

Because of the North American Free Trade Agreement (NAFTA), the company was able to save millions of dollars each year by taking advantage of tariff exemptions for products shipped among the U.S., Canada, and Mexico. Prior to my joining the project, the company had discovered from their legal department that they needed to provide documentation that these products were indeed exempt.

The regulations describe raw materials that compose the finished products, so we needed to look at the constituent level for every product the company manufactures. I was informed that the company had close to 3,000 Excel spreadsheets for various products that listed the composition of each. Additional information would be forthcoming in the near future after I had started on the project.

I looked at the Excel files and discovered that the composition was not in a flat layout (headings and rows), and thus, I wouldn't readily be able to import the information into a database table. I would need to manipulate the data into such a layout and then import it into a database.

Instead of going brute force into all 3000 of these files, I opted to automate the process. I decided to build a conversion tool in Excel that would search for the relevant contents of each file and copy and paste them into a new spreadsheet in a table layout. I wrote VBA code in a standalone Excel workbook to do the job. In addition, I also wrote code to record errors and other anomalies in a log.

I also wrote some VBA code in Microsoft Access to launch the Data Conversion tool in Excel, along with some use of the Microsoft Office object interface. I put all the directories where the files were located into a Lookup table and had the program go through each directory, performing the conversion on each Excel file, importing each Excel spreadsheet's information to a master database table.

I wound up being able to click a button and watch as the program performed the conversion. I had set up a flag in the Lookup table to indicate whether files were imported successfully. Then I would check the log in Excel for anomalies, handle them appropriately and retry.

I'll share more about this project in future updates.

No comments: