Unit Record (and Sequential Processing) Data Flow
by Ed Thelen - updated April 17, 2017
the following should be checked by a real data processing expert ;-))
This is a developing web page intended to provide an overview of how data used to flow through a data processing department, from about 1900 through 1990, to handle a variety of problems, such as generating pay checks, bills, bank statements, and other common situations.
I will call these "traditional functions".
Several points are intended:
- There is no mystery to performing the "traditional functions", just plain, straightforward, relatively simple logic.
- Oddly, the data flows follow similar paths in the above (and many other) situations.
- The data flow of the "traditional functions" is also shown through a stored program computer (such as a 1401 system) using serial (sorted) records.
The actual data flow is substantially identical in a computer, even though several "unit record" machines are replaced by the computer.
Warning, for "simplicity", some interesting, useful details are left out.
For a first pass, the following is complex enough already!
This letter discusses long card sorts.
Operations, from Grant Saviers - Aug 20, 2022
There is the concept that business data "flows" through various stages of data processing, from data entry, sorting, through to the finished result - such as pay checks, bills, inventory, bank statements, ... There are two basic kinds of data:
- Short term data, such as current electric meter reading, hours worked this week, ...
We shall call this "Current Detail Data".
- Long term data such as customer number, customer name, customer postal address, pay per hour or cost per KWHour or ... Some data accumulations such as tax totals, total pay this year, ...
This data is needed each billing, payroll, ... cycle.
We shall call this "Master Data".
Customer or employee number
To get these finished results, you can use sequential processing (on sorted data records). By sorted I mean sorted on a key: say each customer has a unique customer number, and the records are kept in customer number order. A unique number sounds heartless, but it solves the question of how to uniquely handle three "John K. Smith" customers or employees.
I have long wished to find a "vehicle" to show that the data flow, from data entry and verification, sorting, through to final result
- be it payroll, billing, inventory, bank account processing, ...
in unit record and also magnetic tape based data processing (such as IBM 1401 system) is remarkably similar.
New data, say the electric meter reading, is processed against a master file (containing, say, the customer name and address and the previous meter reading and ...) to print electric bills for thousands of customers. Other new data, say the records of payment, can also be processed against the same master file to print customer account information.
The same applies to the maintenance of the sorted master files. A customer may have moved locally and need an address change, and new and dropped customers must also be handled. The methods used in maintenance are similar to the methods shown in Figures 1 and 2, as is the reconciliation of errors and mistakes.
If you know one system above, you know the basis for all :-)). The data names change, but the techniques remain the same.
This book, 17 megabytes, (offered by Stan Paddock) at least has a ( too detailed? ) data flow diagram for "Accounts Payable" in chapter 6 "Case Study", pages 82 and 83.
Here is a Data Flow of an Electric Company Billing Operation
Figure 1, Data Flow of "Traditional Functions" through Unit Record equipment.
Oddly, the same general data flow is used for much of sequential business data processing, including the above mentioned
- payroll, billing, inventory, bank account processing, ...
only the types of data are different.
Details in sequential order
- Source Documents - "Electric Meter Readings" - -
- IBM 026 Keypunch
- IBM 056 Verifier
- Detail Cards
- IBM 083 Sorter
- Sorted Detail Cards (or magnetic tape records)
- Sorted Master Cards (or magnetic tape records)
- IBM 077 Collator
- Merged Cards
- IBM 602 Calculator
- Calculated Merged Cards
- IBM 402 Accounting Machine
- Printed Output - "Electric Bills"
- Old Cards
- New Sorted Master Cards (or magnetic tape records)
- a Sort War Story - not with IBM equipment
After replacement of much unit record equipment
- IBM 083 Sorter(s) (maybe keep one for utility use)
- IBM 077 Collator(s)
- IBM 602 Calculator(s)
- IBM 402 Accounting Machine(s)
- racks and racks of plug board programs for the above 077, 602, 402 units
with an IBM 1401 tape system, the data flow is remarkably similar, but the 1401 system replaces much equipment and card handling, especially the handling during sorting.
Figure 2, Data Flow of "Traditional Functions" using a stored program computer with magnetic tapes.
Added Details in sequential order
Section Describing various functions and details
Source Documents - "Electric Meter Readings" - -
These are the "source documents" for this particular "run".
In this case they are the scrawlings (actually careful hand printing) made by the meter reader who visits (or used to visit) your electric meter each month. (Automation, with electric meters which broadcast your meter reading, is replacing the meter reader who used to excite your dog each month.)
The meter reader hand prints the current electric meter reading onto forms which already contain:
- The customer's account number and street/building address and hints where to find the electric meter.
- The customer number is for later use by the accounting department.
IBM 026 Keypunch
The keypunch operator was always a woman - except for when I keypunched for a troubled insurance company during a college Christmas Break. The company was that desperate to get a particular "job" done.
She is seated at a leased IBM 026 keypunch, most likely one of many. IBM leased (did not sell) equipment. (For $15 less a month, a company/government could lease an IBM 024 keypunch, but that did not have a printer to print the meaning of the holes.)
Each card has an account number field and a detail value field (in this case watt_hours).
Each keypunch had a format card to help the operator punch into the required columns (called fields).
These women were fast and accurate - at least 3 times faster than when I got paid to do it.
IBM 056 Verifier
Another "keypunch girl", sitting at an IBM 056 Verifier, reads the same document and "keypunches" the data into the same detail card as the "girl" at the IBM 026.
Actually, the 056 compared the holes punched into the card with the keys being pressed by the 056 operator. If the holes in the card matched the key strokes by the operator, everything matched and the 056 cut a little semicircular notch in the 84th column of the card, at about the row 3 position. Since the 84th column was about 1/2 not present, the notch was on the trailing edge of the card, easily visible in a deck of cards.
If there was a mis-match ???
Any card in a deck that did not have the "verification notch" was clearly visible.
This procedure helped assure that two sets of eyes and hands agreed about what was on the source document.
An error on the source document or the detail card usually causes customer ill will, company/government embarrassment, and expensive labor to correct. "Garbage in, garbage out" is to be avoided !!!
The newly punched "Detail Cards" are gathered together to be sorted. Each card contains at least two "fields" (groups of columns) of data
- A "sort key", most usually the customer/employee number. In later processing this helps gather the customer's detail information to be adjacent to the customer's other data, such as billing rate and address, some historical data such as past due bills, totals such as energy used year_to_date, ...
- the detail data, in this case the current electric meter reading
IBM 083 Sorter
An excellent article by Ken Shirriff, http://www.righto.com/2016/05/inside-card-sorters-1920s-data.html
The IBM 083 sorter (other potential sorters are 082 and 084) sorts the cards by reading a single column and placing the card in the pocket identified by that column: 0 through 9, two "zone" rows (11 & 12, used for alphabetics and special characters), and a reject pocket for blanks or rejects or errors.
There is a definite ritual to sorting in these machines.
A bad joke - How was T.J. Watson buried? Face down, 9 edge in.
And that is how you insert cards into most hoppers.
Let's assume the customer/employee id is all numeric. (Easier for everyone that way)
You start sorting with the least significant digit, and in further passes sort on more significant digits. If the cards are taken from the output pockets in the correct order and properly handled, you eventually get a deck of cards sorted from lowest customer number to highest.
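The ritual above can be sketched in a few lines of Python. This is only an illustration, not how anyone programmed a sorter: the card layout (a 4-digit account number in the first columns) and the names `sorter_pass` and `sort_deck` are made up for the example. The important detail is that each pass keeps the cards in each pocket in their arrival order, which is why sorting the least significant digit first works.

```python
def sorter_pass(deck, column):
    """One pass through the sorter: each card drops into pocket 0-9
    according to the digit punched in a single column."""
    pockets = [[] for _ in range(10)]
    for card in deck:
        pockets[int(card[column])].append(card)
    # Gather pockets 0 through 9 in order; cards within a pocket
    # stay in the order they arrived.
    return [card for pocket in pockets for card in pocket]


def sort_deck(deck, key_width):
    """Sort on the rightmost key column first, then work leftward."""
    for column in range(key_width - 1, -1, -1):
        deck = sorter_pass(deck, column)
    return deck


# Hypothetical detail cards: 4-digit account number, then a meter reading.
deck = ["0412 7731", "0031 1200", "9120 0088", "0409 5120", "0031 1250"]
sorted_deck = sort_deck(deck, key_width=4)
for card in sorted_deck:
    print(card)
```

A 4-digit key takes four passes of the whole deck, one per column, no matter how many cards there are.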
Sorted Detail Cards (or magnetic tape records)
Alright! You have completed gathering, keypunching, verifying, and sorting your input (detail) information. Let's hope no garbage got in, so that the confusion and labor of "reconciliation" is avoided!!
Sorted Master Cards (or magnetic tape records)
By "Master" data we mean data that is long term; hopefully your customers are relatively long term. This includes name, address, historical data such as previous meter reading, electric rate, special discounts, ... as well as the customer number to match up with the detail information.
IBM 077 Collator
This fascinating machine can compare the values of fields from two card decks. It can do electro-mechanical compares of two fields for greater_than, equal_to, and less_than, and make decisions, based on its plug_board wiring of what to do.
If the plug_board is properly wired for this, it can merge two sorted decks together. It can place the customer master card(s) first, then the customer detail card(s), into the output deck, so that all the customer information is together and can be processed by some machine further down the line, such as an IBM 402 accounting machine.
It can also be wired so that it can detect if there is a customer detail card, but no customer master card. This situation is to be avoided !!
The deck is sorted by customer number, with in this case, the master card(s) first followed by the detail card(s). The IBM 077 was programmed (by plug_board) to eliminate detail cards with no master cards.
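The collator's job can be sketched as a two-deck merge. This is a rough sketch of the logic only; the real machine did it with plug_board wiring, and the card layout `(customer_number, kind, data)` and the function name `collate` are assumptions for illustration. Both decks must already be sorted by customer number.

```python
def collate(masters, details):
    """Merge two decks sorted by customer number.
    Master cards come first, then that customer's detail cards.
    Detail cards with no master card are set aside."""
    merged, unmatched = [], []
    m = d = 0
    last_master_key = None
    while m < len(masters) or d < len(details):
        # Take a master card when the detail deck is empty, or when the
        # master's number is lower than or equal to the detail's number.
        take_master = d == len(details) or (
            m < len(masters) and masters[m][0] <= details[d][0]
        )
        if take_master:
            merged.append(masters[m])
            last_master_key = masters[m][0]
            m += 1
        elif details[d][0] == last_master_key:
            merged.append(details[d])      # detail follows its master
            d += 1
        else:
            unmatched.append(details[d])   # detail card with no master
            d += 1
    return merged, unmatched


masters = [(101, "M", "SMITH"), (102, "M", "JONES"), (104, "M", "LEE")]
details = [(101, "D", 5120), (103, "D", 880), (104, "D", 2210)]
merged, unmatched = collate(masters, details)
print(merged)
print(unmatched)
```

In this sketch a master card with no detail card (customer 102) simply stays in the merged deck; whether such cards were selected out would depend on how the plug_board was wired for the job.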
IBM 602 Calculator
None of the previous machines, nor following machines, can multiply or divide. In our situation we need to multiply the watt_hours by the billing rate.
A plug_board program can read the billing rate from a master card and multiply that by the watt_hours from a detail card, and punch the product into the same detail card.
Calculated Merged Cards
Here we are, everything ready to be processed by the accounting machine.
IBM 402 Accounting Machine
IBM 402 & Trip Report, July 2010
This machine could read cards, accumulate sums, print data from cards (such as names and addresses) at up to 100 lines per minute, and print from the internal accumulators.
Ken (above) uses the very interesting descriptive sentence:
" Another important operation is to compare two cards to see if they have the same id (and should be counted together) or if they have different ids (so a subtotal should be printed and the counters reset)."
Data processing people use the phrase "control break" for this function.
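A "control break" is easy to show in a few lines of Python. This is a minimal sketch, assuming the cards arrive sorted by customer id and each card carries one amount; the function name and the line format are made up for the example.

```python
def print_with_subtotals(cards):
    """cards: (customer_id, amount) pairs, sorted by customer_id.
    Returns the lines an accounting machine might print."""
    lines = []
    current_id, subtotal = None, 0
    for customer_id, amount in cards:
        if current_id is not None and customer_id != current_id:
            # Control break: the id changed, so print the subtotal
            # and reset the counter.
            lines.append(f"{current_id} TOTAL {subtotal}")
            subtotal = 0
        current_id = customer_id
        subtotal += amount
        lines.append(f"{customer_id} {amount}")
    if current_id is not None:
        lines.append(f"{current_id} TOTAL {subtotal}")  # last group
    return lines


report = print_with_subtotals([(101, 500), (101, 250), (104, 300)])
for line in report:
    print(line)
```

The comparison only ever looks at the current card and the previous one, which is why the deck has to be sorted first.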
The IBM 403 is similar in function and operation to the IBM 402, with a little handier optional multiline print control. See https://en.wikipedia.org/wiki/IBM_402
Printed Output - "Electric Bills"
Special preprinted forms could be used in the printer, such as for bills and pay checks.
Another machine was available called a "burster" to separate the continuous forms into individual documents. Yet another machine was available to remove carbon paper for "multi-part" documents such as an "original" and two "copies".
You could store these in long term storage (the mountain) say for seven years to satisfy government requirements.
New Sorted Master Cards (or magnetic tape records)
You may need to retain updated customer information such as electric usage year_to_date, pay year_to_date, FICA (Social Security contributions), ...
This can be punched (not shown) in a "Summary Punch" ( an IBM 514 will do ) under control of the IBM 402 accounting machine. These can be merged (not shown) back into the master cards.
In the case of magnetic tape records, no special equipment or passes are required. The program just writes a new master tape.
IBM 1401 - Sort (using magnetic tape records)
C24-3317-1_sort7spec.pdf - Sort7 Spec & Operating Procedures , in BitSavers
Sort 7 does not copy cards to tape. To do this, another program (not shown in Fig 2.) is used. This program is likely considered trivial, other than tape write error recovery. A standard I/O package, such as IOCS, can be used to handle the tape writing and error recovery.
(A quick (non-essential) word about placing more than one record (card) onto tape for operating efficiency. If you place only one card image per tape record or "block", you are wasting tape and running time. The key word is "record block" where maybe 8 or 10 card images are placed in one contiguous record on tape. See Block Layout in Wikipedia, page 12 of C24-3317-1_sort7spec.pdf - Sort7 Spec & Operating Procedures above, also see Magnetic Tape Capacity and the Effect of "Tape Blocking", and also from Stan Paddock, 10/26/2009.)
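A little arithmetic shows why blocking matters. The numbers below are assumptions for illustration, not taken from the Sort 7 manual: 556 characters per inch, a 0.75 inch gap between tape records, 80-character card images, and a 2400 foot reel. The function names are made up as well.

```python
def gap_fraction(cards_per_block, density_cpi=556, gap_inches=0.75, card_chars=80):
    """Fraction of the tape that is inter-record gap rather than data."""
    data_inches = cards_per_block * card_chars / density_cpi
    return gap_inches / (data_inches + gap_inches)


def cards_per_reel(cards_per_block, density_cpi=556, gap_inches=0.75,
                   card_chars=80, reel_feet=2400):
    """Approximate number of card images that fit on one reel."""
    block_inches = cards_per_block * card_chars / density_cpi + gap_inches
    blocks = int(reel_feet * 12 / block_inches)
    return blocks * cards_per_block


for n in (1, 10):
    print(n, "card(s) per block:",
          f"{gap_fraction(n):.0%} of tape is gap,",
          cards_per_reel(n), "cards per reel")
```

Under these assumptions, one card per block leaves most of the tape as gap, and ten cards per block holds several times as many cards on the same reel (and spends far less time starting and stopping over gaps).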
Sorting data records (cards) using a magnetic tape system is MUCH faster, and much less labor intensive compared to using the physical cards and a card sorter, such as an IBM 083.
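The general idea behind a tape sort can be sketched as "sort small runs, then merge the runs". This is a generic sort/merge sketch, not a description of how Sort 7 itself works; `memory_capacity` (how many records fit in memory at once) and the function name are assumptions for the example.

```python
import heapq


def tape_sort(records, memory_capacity):
    """Sort records by splitting them into sorted runs, then merging the runs."""
    # Phase 1: read as many records as fit in memory, sort them,
    # and write them out as one sorted run.
    runs = [
        sorted(records[start:start + memory_capacity])
        for start in range(0, len(records), memory_capacity)
    ]
    # Phase 2: merge the sorted runs by repeatedly taking the lowest
    # record at the front of any run.
    return list(heapq.merge(*runs))


# Hypothetical (customer number, meter reading) records in arrival order.
readings = [(412, 7731), (31, 1200), (912, 88), (409, 5120), (120, 640)]
sorted_readings = tape_sort(readings, memory_capacity=2)
print(sorted_readings)
```

Here `heapq.merge` merges all the runs in one go. A real installation with only a handful of tape drives would merge a few runs at a time and repeat the merge pass until one sorted tape remained.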
IBM 1401 - Run (using magnetic tape records)
You may notice that the 1401 tape system replaces a number of (much slower) unit record devices. Since the 1401 system is so much faster than the replaced devices, multiples of each device can be replaced in medium and large size installations.
- Replacing the IBM 077 Collator(s), the merge operation is almost trivially performed by the 1401 software as part of inputting Detail Data and Master Data. Exception situations (no corresponding master records) can be punched by the 1401 system's 1402 into a selected pocket.
- Replacing the IBM 602(s), any necessary arithmetic, adding, subtracting, multiplying, dividing, can be easily performed quickly in the 1401. If the optional multiply and divide features are not present, existing subroutines can be used.
- Replacing the IBM 402(s) accumulators, the 1401 easily handles any totalizing.
- Replacing the IBM 402(s) 100 line/minute printer, the IBM 1403 printer can print 600 lines/minute
- Replacing the IBM 514(s), the IBM 1401 writes a New Sorted Master, to be used the next time an Electric Billing operation (run) is performed
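Putting the replaced machines together, the whole "run" becomes one pass over two sorted files. The Python below is a rough sketch under stated assumptions, not the way a 1401 program was written: the record layouts, the field names (`previous_reading`, `rate_cents`), and the function name `billing_run` are all hypothetical, and money is kept in whole cents to avoid rounding surprises.

```python
def billing_run(old_master, details):
    """One sequential pass. Both inputs are sorted by customer number.
    old_master: dicts with number, name, previous_reading, rate_cents.
    details: (number, current_reading) pairs."""
    bills, new_master, exceptions = [], [], []
    d = 0
    for record in old_master:
        # Collator's job: details with no master record are exceptions.
        while d < len(details) and details[d][0] < record["number"]:
            exceptions.append(details[d])
            d += 1
        updated = dict(record)
        if d < len(details) and details[d][0] == record["number"]:
            reading = details[d][1]
            usage = reading - record["previous_reading"]
            amount_cents = usage * record["rate_cents"]    # calculator's job
            bills.append((record["number"], record["name"], usage, amount_cents))
            updated["previous_reading"] = reading          # for next month
            d += 1
        new_master.append(updated)                         # new master tape
    exceptions.extend(details[d:])
    return bills, new_master, exceptions


old_master = [
    {"number": 101, "name": "SMITH", "previous_reading": 5000, "rate_cents": 3},
    {"number": 102, "name": "JONES", "previous_reading": 800, "rate_cents": 3},
    {"number": 104, "name": "LEE", "previous_reading": 2000, "rate_cents": 4},
]
details = [(101, 5120), (103, 880), (104, 2210)]
bills, new_master, exceptions = billing_run(old_master, details)
print(bills)
print(exceptions)
```

Change the field names and the arithmetic, and the same loop does payroll or bank statements; only the types of data are different.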
Are you familiar with unit record data processing ??
If so, could I use you as a consultant/author ??
(I was more a sidewalk superintendent, never seriously wired a plug board,
and the only time I sorted cards was to straighten up a dropped deck.)
How it is done with the current relational data bases, I have no clue -
For all I know the traditional tasks are still done sequentially ?????????
Also I wish to expose folks to the joys of sorting
- on unit record equipment such as an IBM 083
- with a computer and magnetic tape storage
This letter discusses long card sorts.
While at the 2018 Vintage Computer Festival, in CHM, Henry Strickland asked about BIG card sorts.
IBM 1401 Docent Bill Worthington responds:
Punched card sorting was a long, tedious, and very important part of data processing (DP).
We had sorters at the bank that I worked for as a 1401 programmer for four years before joining IBM for forty years. It sounds as though your mother and I did very similar work for IBM. The bank's trust department work was a punched card (unit record) operation for the four years I worked for the bank. They had 3,000 card file trays that were wheeled around when it came time to do any of the batch accounting for the trust dept. As part of the processing, there was a lot of sorting. The sorters they used had an extra card holder that was mounted just behind the sorter itself. Here is a picture of an IBM 083 sorter with the extra card stackers:
(Picture: IBM sorter with extra card stackers)
We have a sorter set up like this in Revolution which you can see next time you visit the museum. There is another that we use during the 1401 demonstrations on Wednesday afternoons and Saturday mornings. I'll be leading the 1401 demo this Wednesday at 3:00. Can you come see the demo?
An adroit operator would be able to keep an eye on how full the sorter's stackers were and he would reach into the stacker and pull out the cards that had fallen into a pocket. He would then put the cards into the rack behind the sorter. (I'm sure there were times when the extended stacker filled up and the whole sort process would have been stopped until space was found for the cards. I never saw this happen.)
Now, to your question: sorter operators would leave a card on the top of the sorter threatening all sorts of unusual punishment if anyone disturbed the in-process sort. When it got close to the end of a shift, the operator would try very hard to find a stopping point where the cards could be taken from the sorter and put into the extra card frame to be continued. If the operator needed a bio-break, he would just stop the sorter and resume when he got back.
Depending on the workload, an operator would hand off to the next shift operator. It would be very straightforward since the new operator would be told which card column was being sorted and just be handed the baton for the next shift. The "run book" told what to do next. It was very systematic.
The key to remember is that the sorter sorts only one card column at a time -- a, then b, then c, then... a is the ones digit, b is the tens digit, c is the hundreds digit, ... It isn't difficult. The key being to not drop the cards and have to restart the sort. So, to sort by Zip-code would mean five passes of the card file. If we're doing payroll, we might have nine passes for the Social Security number. Hope this helps.
Aside: Note that I said "he" as the sorter operator. A case of cards (five boxes of 2,000 cards) weighed about twenty-five pounds. Ladies weren't supposed to deal with heavy things. So they were key-punch operators and did the data entry for the DP department.
a Sort War Story - not with IBM equipment - by Ed Thelen
Inspired by Michael Schuetz's presentation IBM 1401 howto: sort7 tapesort - R12.
"Long Ago", early 1960s, "and Far Away", eastern U.S.A.
I was maintaining General Electric Computer Division 225 systems.
Fortunately my direct customers were "scientific users" but I was frequently called in to help with bank customers "business users".
My theory then and now is that "anyone" can design/build a useful, reasonably reliable computer,
but few can design/build useful, reasonably reliable computer peripherals. The ex-farmers in up-state New York, employed by IBM, seemed to have that special talent !!!
What does the above have to do with sorting? Ah, magnetic tapes, being sequential, are useful in handling sequential data, such as payroll, billing, etc.
And sorting "random" input records, such as electric meter readings, into say customer or location sort order is quite efficient.
HOWEVER - the processing runs of thousands of customers can take a while, and magnetic tape drive reliability is a "very good thing".
Unfortunately, the OEM tape drives that GE used were from AMPEX, and designed (and satisfactory) for light duty - say to record a missile launch for a few minutes, occasionally. gory details.
AH, the point of the story was that J.C. Penney, a big retailer of household things, leased a GE 225 system with many mag tape drives. Soon we began hearing rumors that the big seven hour, nightly sorts were failing way too often.
Of course you can checkpoint the multi-tape sorts, but that is time consuming and a bother. It involves writing two merge tapes, and saving one off-line, just in case of a failure later.
In any case, J.C. Penney became another problem customer and, according to rumor, returned the equipment (and likely spread more word about unreliable GE Computer equipment).
Operations, from Grant Saviers - Aug 20, 2022
My summer job in 1966 was to run a 1401 in a DP environment for the Westinghouse Mansfield, Ohio division that made "white goods" - refrigerators, washers, dryers, etc. The configuration of this machine was identical to the CHM restored machines. 1401, max memory, four 729 tape drives, 1403 printer, 1402 card reader/punch. No wires except power to the rest of the world. I was on a swing shift (awful). Operation was 24x7 except for Sunday swing shift which was IBM CE time.
The 1401 applications I remember were payroll, order receipt, accounts payable, customer invoicing, and closing the books on a weekly, monthly, and quarterly basis.
The 1401 was moved from Mansfield to the Pittsburgh Westinghouse Telecomputer center (TCC) when I started to work there. The plan was to move its set of DP applications to the pair of Univac 494's that were connected to the nationwide Westinghouse private TTY network and to near real time DP. That network polled all W sales offices and facilities and captured data "24 hour real time", mostly from ASR Teletypes. I think pretty advanced for the day. The physically ginormous Univac Fastrand dual drums (3x) were the 494 online storage, 132M 6 bit bytes each.
The order flow is approximately correct in the examples, but reality was much more complicated. Take payroll, for instance, as Pennsylvania has some egregious tax complications. Each school district has unique "mil taxes" applied to wages. Allegheny County, about equal to greater Pittsburgh, has 46 unique mil tax rates and they change every year.
Thus wherever each employee claimed residence had to be factored into tax deductions. Plus state, federal, employee beneficiaries, and whatever legislated modifications to those were needed, more or less annually. Then hourly rates, dues, overtime, deductions, etc. All to be printed on checks in the right boxes.
Backing up in the process, employee time cards were read, punched and verified weekly (the "DP rules" were "all punch cards must be verified"). There was a sorter in the keypunch room (70+ women), but it was not used in the DP data flow. Weekly, punched cards were read to tape by the 1401, sorted sequentially into blocks, multi tape drive merged and then run against the employee master tax, wages, etc tape to produce a check file which was then run with check forms in the 1401, and some carbons in the forms. Of course live check handling had special security rules and every serial number had to be accounted for and printer jams retained for auditing. Concurrently, weekly data needed output on a tape, then merged with last week's cumulative data, so yearly W2 forms and IRS filings could be produced.
It's important to realize that only sequential reference information and sequential data (eg by employee number) could be processed against each other when random access storage devices didn't exist. Sequential data processing also drove tape drive hardware designs to have the minimum sequential access time to the next record, which for a period of time was less than the random access time of disk drives, thus sustaining tape sort/merge data architectures. Likewise, CKD (count key data) and on disk variable length record formats were carried forward by IBM to preserve these record structures for customers' legacy sequential access data processing architectures.
Orders from retailers and distributors were matched by women to item cards held in tub files, IIRC one card for every item (eg 10 cards for 10 almond color 17 cu ft refrigerators) and placed behind the customer id card. I think some process had the cards available in the tub similar to what the factory had in inventory or could make. Our 1401 was all classic accounting applications, nothing re shop floor or MRP. I think we ran these card batches daily into summary daily tapes. Then we did weekly summary tapes and finally a monthly final set of tapes. Since customers entered orders at random times for random items, sorting and merging had to occur for each time period. At some point we had to reconcile what was ordered with what was shipped, but I don't remember the process to decide what the customer owed.
Accounts receivable and invoicing worked similarly until the monthly closing. Then the accounting green eye shade folks got involved in final reconciliation of dollars received vs bills sent. I recall spending almost an entire shift re-running daily and weekly summaries as they tried to find a few cent error in what was 10's of millions of dollars. It was a matter of honor to have perfect reconciliation plus big boss sign off was needed if not. One output file was the aged invoices that were not current or partially fulfilled.
A number of "systems analysts" in accounting developed the specific process steps, program requirements, and information flows based on the various data files and hardware capabilities. I think they were the big user of the classic IBM flowchart templates. Production programming was complete for the needed Mansfield related 1401 applications and 494 programming for these was a separate department. One 1401 operator was RPG programming literate so rarely wrote special programs needed by accounting.
Fortunately, the 729 tape drives had marvelously low error rates. After working with Bob Feretich (1401 volunteer) making the 729 tape emulator, I learned how that magic was achieved by IBM in the 1401 TAU. However, bad tapes did happen even though we frequently threw them out or cut off the first 30 feet or so. It doesn't take much tape to hold 1000 cards, even one record per card. We never wrote 800bpi tapes, only 556. The intrinsic backup was retaining the daily, weekly, and monthly summaries should an error be found later in the process. I think we had about 500 reels in regular use. Some summary tapes were duplicated and sent offsite for secure storage (plenty of old coal mines were available around Pittsburgh).
IBM CE's kept the system in good shape although a drifting 729 transistor that caused very intermittent writing of bad tape records needed the factory team and a few days to find and fix. At the beginning of each shift every peripheral was completely cleaned by the incoming operators. Over the July 4th long weekend the 1402 read about 3 million cards to transfer the tub files to tape for translation to the 494. Between 5 operators and several others bringing us cards, we kept the 1402 running essentially continuously 24x7 and had only one read check.
The other 1401 operators didn't like my getting jobs done in a fraction of the rated run time, so mid summer I was reassigned as a 494 programmer. Another eye opening experience for me on a strange 30 bit machine in a DP programming shop. Everything was written in their shop built assembler with lots of rules. My task was to write a faster 1401 to 494 tape label (all DP tapes had IBM on tape labels) and EBCDIC to Univac character translator. (Univac Uniservo III tape drives could read IBM NRZI tapes). CW was "Nobody can beat the speed of the TCC CTO's translater", but I did by about 50%. Then they found a programming rule that I violated and threw out my program even though tape translation was the biggest consumer of 494 time. Oh well, the end of my DP programming career - yea!
Some other trivia - since the TCC was 24x7 real time DP, power reliability and EMI protection were critical. As a newly minted engineer, that was interesting to me. It had a "building within a building design" so that the 494 system power was 100% uptime. Two main HV feed lines were connected for grid redundancy and backup generators were installed. All power circuits in the computer room, including lights, convenience outlets, etc. were on a separated system to prevent lightning transients from connecting to the computer cabling. That "interior building" was powered by a motor generator set run off the grid or generators. The motor ran at some incredible rpm which provided inertial energy and was magnetic slip clutch connected to the generator as the generators spun up, which then spun up the motor back to full rpm. It mechanically isolated the entire computer room from lightning since there was no grid connection. A 1966 few hundred KW UPS system.
Of course Westinghouse is no more except for the nuclear power part which was sold to Toshiba and resold by them. The brand name is occasionally used.
PS John Walker's fourmilab.ch site is pretty interesting. As classmates, he and I worked on the Case Univac 1107 before he founded Autodesk.
Grant Saviers Aug 20, 2022
My e-mail address is Ed@ed-thelen.org :-))