Reduce fields read on Merged datasets.
When input files need to be
merged and they are already sorted by a common key, a great deal of
space can be saved by merging them using a data view. The downside of
merging data held in different files is that subsetting cannot be
performed against combinations of data held in different files until
those files are merged. Usually this would not cause an undue amount of
extra processing, but if a high number of additional fields need to be
read from each file, it would be advantageous to be able to merge
within the same step.
For example, the Type 63 -
VRUF2 IOLAIRE file consists of a single line per account, each having
up to 100 products held in a repeating portion of the account. If these
accounts needed to be subset to only include live accounts, then it
would need to be merged with another file to add on the live / final
indicator. These files could be merged efficiently (in terms of space)
by using a data view, but all of the repeating sections of the product
section would need to be read regardless of whether the account was
live or final.
A more desirable way of
merging the files would be to read the two files in a single step using
two infile statements, as in the following example ;
Data Accts(Drop=T63Acc Active Offset LivFinal);
Retain T63Acc;
Infile GenBase;
Input @28 LivFinal $1.@;
If LivFinal=’L’;
Input @1 DistAcc $10.@;
Infile Type63 End=EOF63;
Do Until(T63Acc>=DistAcc or EOF63);
Input / @3 T63Acc $10.@@;
End;
If T63Acc=DistAcc;
Input @71 Active pd2.@;
Do OffSet=77 To 77+(Active-1)*24 By 24;
Input @(Offset+10) ProdCode $6.@;
Output;
End;
In the above example, the
generic base file is searched for the first live account. Once found
the Type 63 file is searched within the Do Until ... End loop until the
same or next account is read. The rest of the record is only read if
the district and account codes match. The double ampersand on the input
statement for T63Acc is required to hold the input line pointer at the
current place in the Type 63 file ready for the next loop. If that
district / account does not match the account returned from the generic
base file, then it may match the next record, so the value needs to be
retained and the line pointer position held. Since the double ampersand
is used, the ‘/’ (read next line) operator also needs to be
used.
A more sophisticated example
is shown in the example below. In this example, accounts were
read from ten generations of the Call Guard detail files. For an
account to be included in the output file, it had to be live in all ten
generations, and either the account number needed to be defined in the
$KEEPAC. format or the outer part of the postcode needed to be included
in the $OUTERP format. Once matched, the total minutes from the type 10
segment (a repeating segment) needed to be calculated and the result
held in an individual array element per generation (i.e. per month).
The same principle applied
above can be used, although the code is a little more complicated since
ten input files are read in the same step. However, since the same
principle is applied to each input file each of which has the same
structure (almost !), a macro can be used to reduce the code listed.
Unfortunately some additional care needs to be taken since the last two
generations have a slightly different record layout. The code for this
step is as follows ;
%Macro
Grabit;
Data Storage.CallG(Drop=NumT10Sg Offset LivFinal ThisAc1-ThisAc9 Mins);
Array totmin(10)
8;
Retain
ThisAc1-ThisAc9 '
';
Infile
CallG10;
Input @36 LivFinal
$1.@;
If
LivFinal='L';
Input @1
DistAcc
$10.
@19 PostCode
$4.@;
If
Put(DistAcc,$KeepAc.) or
Put(PostCode,$OuterP.);
Input @37 NumT10Sg
pd3.@;
Do Offset=57 To
57+(NumT10Sg-1)*44 By
44;
Input @Offset Mins pd7.2
@;
totmin(10)=Sum(totmin(10),Mins);
End;
%Do MonRef=9 %To 1 %By
-1;
Infile
CallG&MonRef;
Do
Until(ThisAc&MonRef ge
DistAcc);
Input / @1 ThisAc&MonRef
$10.@@;
End;
If
ThisAc&MonRef=DistAcc;
%if
&MonRef<3 %Then Input @35 LivFinal
$1.@@;;
%if
&MonRef>2 %Then Input @36 LivFinal
$1.@@;;
If
LivFinal='L';
%if
&MonRef<3 %Then Input @36 NumT10Sg
pd3.@@;;
%if
&MonRef>2 %Then Input @37 NumT10Sg
pd3.@@;;
%if &MonRef<3 %Then Do Offset=56 To 56+(NumT10Sg-1)*44 By 44;;
%if &MonRef>2 %Then Do Offset=57 To 57+(NumT10Sg-1)*44 By 44;;
Input @Offset Mins pd7.2
@@;
totmin(&MonRef)=Sum(totmin(&MonRef),Mins);
End;
%End;
Run;
%Mend;
%GrabIt;
In this example a different
value of district and account needs to be maintained for each input
file, hence the variable names THISAC1 - THISAC9 defined by macro
statements. The district and account record is only output at the end
of the data step, so if it is not found or is not live, at any point in
the 10 GDG generations, then it will be deleted.