The PODA Blog

News, views and articles from our membership

Archive for April, 2007

Text Data Sources in MS Access (part 1)

Posted by Tim Critchley on 24th April 2007

Introduction

In the first of a series of discussions, I’d like to share some of my recent experiences of interrogating flat text files. In an age where data management is a zen-like discipline, the prospect of having to work with file data-sources can create a distinct sense of unease within many professional developers.

However, coming from this mind-set myself, I found myself thrown into a project where I had no alternative but to evolve a set of practices which I now believe can enable conscientious Access developers to confidently engage what is a common reality for many businesses.

I was staggered to discover during the project, that many tier 1 investment banks trading billions of dollars a day, commonly use flat text files to internally expose their trade data for such trivialities as “front office”/”back office” daily trade reconciliations. In the particular project I was involved with, this information was then processed in an existing Access database, which I shall generously refer to as “rustic” in both design and performance.

That engagement provides the perfect backdrop to this discussion; it also presented me with a unique opportunity to discover the challenges of working with humble text file – and to learn some fascinating techniques along the way!

Old Dogs

The very first hurdle I had to both recognise as such and overcome, was the answer to:

why would anyone use text files to supply business-critical proprietary data?

Coming from a relatively “purist” development ethic, this took a while to sink past a few barriers!

The fact is, that many investment banks depend on long-established and monolithic trading systems – burgeoning DOS-based behemoths originally scripted sometime in the 60’s, that harbour a brooding contempt for modern concepts such as integration and modularity.

Allowing for an element or two of poetic license, you get the idea. They’re old, and in many cases, they’re supplied by 3rd party vendors with a monopoly in the market. So as anyone who’s built an IT system knows, you don’t want Tom, Dick or Harry poking around willy-nilly in its the inner sancta, and the first lesson you learn (having suffered your first corrupted database), is to make it utterly inaccessible to its users excepting they use the prescribed tools. That’s reason number 1.

Reason number 2. Even where you provide read-only access to your data, if enough people start hitting your database, its performance is going to start taking casualties. And you don’t need me to tell you that when a system is mission critical to facilitating the core of a business, the risks posed there far out-weigh any benefits proposed by just about anybody.

Lastly - and you may have spotted this at the start of this review - old systems are worse than old dogs at learning new tricks. Security patches? Forget it!

So essentially all we’re left with are flat text files. Ugly perhaps, but possessing undeniable beauty in their simplicity.

New Tricks

Fine, so I’ll use text files, and I’ll ditch them the very first chance I get!

Or that was the gist of my first thoughts anyway. Cue my next lesson.

Using a file data-source requires a different mind-set to the less eccentric and more popular RDBM systems. And if you’re not familiar with it already, booby traps aplenty lurk in the shadows of over-confidence. So let’s take a crash course through the department of the bleedin’ obvious!

If you’re building a data system with text files at its source, for any persistent purpose at all, new text files must be supplied to it at regular intervals. Obvious isn’t it; open and shut case – next? Look again.

I estimate this and related aspects of the kind of system we’re talking about here, required about one third of my entire development time for the project. Some might say I went too far with some of it. I’d put money on a wager that says “cut just one corner here, and your version will fail – catastrophic options available!“.

I plan to use subsequent instalments of this discussion to address specific technical solutions to the foundations of this component. But first I’d like to prime you with some clues as to the challenges that gave birth to them:

  1. Content dependability
  2. Accessibility strategy
  3. Updatability pit-falls
  4. Chronology
  5. Data source topography
  6. Performance
  7. Concurrency

Maybe I can’t promise any prizes for correctly predicting all these sub-plots in advance, but comments offering related experience are very welcome!

Posted in Access, Office (All) | 1 Comment »