crribs.com

the blog of brad shultz, systems design, RETS development, ETL, scripting, and windows task scheduler stuff

Archive for the ‘Text file processing’ Category

Microsoft Text Driver Taps out at 255 Columns

without comments

At work, a main component of my job involves daily maintenance of 200 or so scheduled vbscript’s that maintain servers, generate daily data feeds, process images, transfer files, and do the bulk of our ETL processing (retrieval, normalization, etc.).  Because many of the imports consist of relational files that we need to combine into one, file- an internal schema we use in an attempt to standardize loading into the db- we use the Microsoft text driver to query the text files (with the text driver, you can query delimited text files with a subset of SQL, so you can join multiple files together and aggregate relational data).

The text driver requires a file named “schema.ini” to be placed within the directory holding the files you want to query.  As you guessed, the schema.ini specifies the schema of the file.  You set values like character set, if the first row has headers or not, and the length/type of the headers.  But you are SOL if you have a .csv file with more than 255 columns.  I haven’t found any really good solutions yet (actually, I’ve used some pretty terrible ones).  I have actually resorted to firing off a DTS package in the middle of a script that trims off unneeded columns and outputs a new file, with less than 255 columns.  Short of looping through the whole .csv, splitting every line by the delimiter, and reassembling a new file based on the resulting array, I can’t even really think of any good solutions.  One promising lead is the free component for working with .csv files from Chilkat (http://www.example-code.com/vbscript/csv_read.asp – it can be downloaded for free on this page- the download link is the little peach-colored down arrow icon about two inches from the top of the page).

I’m thinking, though, that it may be time to ditch vbscript in favor of Powershell, which has an ‘import-csv’ commandlet.  It takes a .csv file in memory, and creates an object for every header item.  It works really well, in my limited testing.

Written by bshultz

February 3rd, 2010 at 4:03 pm

CsvEasy – A Killer App for .csv files…

without comments

Just found an outstanding program for opening/editing/??? .csv files.  CSVEasy - http://www.tizma.com/csveasy is an AWESOME program for opening csv files, from small to large.

CSVEasy Screenshot

CSVEasy Screenshot

I can’t get over how great this program is.  If CSVed chokes on the file, CSVEasy may be your answer!!  Although lacking some of the features that make CSVed great, CSVEasy pretty much fills in all the gaps.  Between CSVed, CSVEasy, and Excel 2007, you shouldn’t ever have trouble with .csv files again!!  Top notch app.

CSV Files. A blessing and a curse.

without comments

In my line of work, I deal with a lot of .csv files (we refer to them as .csv – comma seperated value – files, when in actuality they may be tab or pipe seperated, quote delimited, and/or some other random variation or mixture of tabs, quotes, commas, pipes, delimiters, etc.).  They can be a nightmare if they are given to you misformatted (in a pipe, i.e. ‘|’ , seperated file, if the text is not scrubbed for pipes before it is formatted, you end up with an invalid file (aka a file not able to be parsed by DTS, SSIS, ODBC Text drivers, Excel, etc.).  I have found a few tools that are very helpful when you are working with .csv files.

1.  CSVED – (download from) http://csved.sjfrancke.nl/

CSVED is the go-to-guy (next to Excel) in terms of CSV editors.  Although a low-key app, its free (!) and integrates really well into the Windows right-click context menu.  It gives you all sorts of options, from deleting columns from the file, to exporting the .csv in XML.  Absolutely top-notch freeware program.

2.  Excel – (download the trial from Microsoft.com)

Not freeware, but Excel does do a great job at opening .csv files (and fixed length, too!).  The trick is to paste all your data into the first column, then select “Text to Columns” and tell Excel about the file it is opening.  If you are using Excel 2003 or older, you are limited to around 56,000 rows.  Excel 2007 will let you open a .csv file with up to a million rows (very helpful).

Written by bshultz

December 1st, 2009 at 4:23 pm