Specifying the length of variables can save a great deal of
space in an output dataset. For character variables (minimum length 1, maximum
200) the appropriate length is usually obvious. For numeric variables (minimum
length 2, maximum 8) it may not be so straightforward. The following table
shows the number of significant digits retained and the largest integer
represented exactly for each numeric length. Note that non-integer values are
more complicated, because a fractional value may lose precision when it is
stored in a shorter length.
Length in Bytes | Significant Digits Retained | Largest Integer Represented Exactly
2 | 2 | 256
3 | 4 | 65,536
4 | 7 | 16,777,216
5 | 9 | 4,294,967,296
6 | 12 | 1,099,511,627,776
7 | 14 | 281,474,976,710,656
8 | 16 | 72,057,594,037,927,936
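As a rough illustration of the table, the sketch below declares lengths with a
LENGTH statement before the variables are first used. The dataset and variable
names (ACCOUNTS, REGION, BRANCH, CUSTCOUNT) are hypothetical, and the lengths
assume the limits shown in the table above, which should be checked for the
host system in use.

```sas
/* Hypothetical names and values, for illustration only. */
data work.accounts;
   /* Declare lengths before the variables are first referenced. */
   length region    $ 2   /* two-character code                     */
          branch      3   /* integers up to 65,536 per the table    */
          custcount   4;  /* integers up to 16,777,216 per the table */
   input region $ branch custcount;
   datalines;
NE 101 15000
SW 202 820000
;
run;
```

Here each observation needs 9 bytes (2 + 3 + 4) rather than the 18 bytes
(2 + 8 + 8) that the default numeric length of 8 would give.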
Note, however, that the length of numeric variables returned from PROC
SUMMARY, MEANS, and FREQ is always 8 bytes, regardless of the length of the
variables on the input dataset.
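One way to recover the space, sketched here under stated assumptions, is to
shorten the summary variable in a following DATA step. The names ACCOUNTS,
BRANCH, CUSTCOUNT and TOTCUST are hypothetical, and the choice of 5 bytes
assumes the totals stay below the limit given in the table above.

```sas
/* Hypothetical input data, for illustration only. */
data work.accounts;
   length branch 3 custcount 4;
   input branch custcount;
   datalines;
101 15000
101 22000
202 820000
;
run;

/* The SUM statistic is written with length 8, whatever CUSTCOUNT's length was. */
proc summary data=work.accounts nway;
   class branch;
   var custcount;
   output out=work.branch_totals (drop=_type_ _freq_) sum=totcust;
run;

/* Assumption: totals stay below 4,294,967,296, so 5 bytes is enough per the table. */
data work.branch_totals_small;
   length totcust 5;        /* declared before SET so the shorter length is used */
   set work.branch_totals;
run;
```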
Because the minimum length of numeric variables is 2, it is more
efficient (with regard to space) to hold “boolean flags”
as one-byte character values. For example, rather than using
“If TotRev < 10 Then Small=1;”, use
“If TotRev < 10 Then Small='Y';”.
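A minimal sketch of the flag technique follows. TOTREV and SMALL come from the
example above; the dataset name, the input values and the 'N' branch are
assumptions added for illustration.

```sas
/* Hypothetical dataset name and values, for illustration only. */
data work.revenue_flags;
   length small $ 1;                  /* one byte instead of a numeric variable */
   input totrev;
   if totrev < 10 then small = 'Y';
   else small = 'N';                  /* assumed: flag every row explicitly */
   datalines;
5
25
;
run;
```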
When creating a character variable that holds the first few characters of
an existing variable, declare the new variable with the required length and
then assign the old variable to it - the function SUBSTR is not
required. This is more efficient for two reasons. Firstly, if the new
variable is created using SUBSTR (e.g. DISTRICT=SUBSTR(DISTACC,1,2))
without its length being declared first, it inherits the length of the old
one (in this case, if DISTACC was of length 10, DISTRICT would also be 10
characters rather than two, and the remaining 8 bytes would be wasted).
Secondly, a simple assignment uses less CPU than a function call.
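The approach can be sketched as follows, using the DISTACC and DISTRICT names
from the example above. The dataset name and the input values are hypothetical.

```sas
/* Hypothetical dataset name and values, for illustration only. */
data work.districts;
   length distacc $ 10
          district $ 2;     /* declare the short length first */
   input distacc $;
   district = distacc;      /* assignment truncates to the first 2 characters */
   datalines;
CL12345678
LN87654321
;
run;
```

Because DISTRICT is only 2 bytes long, the assignment keeps just the leading
two characters of DISTACC and no SUBSTR call is needed.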
Similarly, if only the first few characters of a string are to be compared
with a known value, use the ‘=:’, ‘>:’,
‘<:’ or ‘IN:’ comparison operators rather than
using SUBSTR. These operators compare only as many characters as there are
in the shorter string (e.g. IF 'CL12345678' >:
'LN' and IF 'LN' <: 'CL12345678'
are equivalent). The added ‘:’ symbol is known as
the “colon modifier”.
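As a short sketch of the colon modifier in use, the step below keeps only the
rows whose DISTACC value begins with LN. The dataset name and input values are
hypothetical.

```sas
/* Hypothetical dataset name and values, for illustration only. */
data work.ln_accounts;
   length distacc $ 10;
   input distacc $;
   /* Compares only the first 2 characters of DISTACC with 'LN'. */
   if distacc =: 'LN';
   datalines;
CL12345678
LN87654321
LN00000001
;
run;
```

The same test written with SUBSTR would be IF SUBSTR(DISTACC,1,2) = 'LN';
the colon form avoids the function call.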