SPECIFYING VARIABLE LENGTHS

Specifying the length of variables can save a great deal of space in an output dataset. For character variables (minimum length 1, maximum 200), the appropriate length is usually obvious. For numeric variables (minimum length 2, maximum 8), it may not be so straightforward. The following table shows the number of significant digits retained and the largest integer represented exactly for each numeric length. Note that non-integer (floating-point) values are more complicated, because shortening the length can lose precision.

Length in Bytes   Significant Digits Retained   Largest Integer Represented Exactly
2                 2                             256
3                 4                             65,536
4                 7                             16,777,216
5                 9                             4,294,967,296
6                 12                            1,099,511,627,776
7                 14                            281,474,976,710,656
8                 16                            72,057,594,037,927,936

So if the values you need to store are whole numbers that fit within the limit for a shorter length (a number of minutes, for example, needs far fewer than 8 bytes), consider using a smaller length.
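As a minimal sketch of the idea, the DATA step below declares lengths with a LENGTH statement before the variables are first used. The dataset name work.calls and the variables minutes and region are hypothetical, chosen only for illustration, and the limit quoted in the comment is taken from the table above (limits are host-dependent, so check the values for your own platform).

```sas
/* Hypothetical dataset and variable names, for illustration only */
data work.calls;
    /* Declare lengths before the variables are first referenced */
    length minutes 4      /* per the table above: whole numbers up to 16,777,216 */
           region  $ 2;   /* two-character code */
    minutes = 1440;       /* minutes in one day, well inside the limit */
    region  = 'LN';
    output;
run;
```

A shortened numeric length is only safe when every value is a whole number within the limit for that length.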

Note, however, that the length of numeric variables returned from PROC SUMMARY, MEANS, and FREQ is always 8 bytes, regardless of the length of the variables on the input dataset.
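One possible way to recover the space, sketched here under assumptions, is to re-declare the length in a following DATA step. The dataset work.calls and the variables region and minutes are hypothetical, and a length of 6 is an arbitrary choice that is only appropriate if the summed values are whole numbers within the limit for that length.

```sas
/* Hypothetical input: work.calls with variables region and minutes */
proc summary data=work.calls nway;
    class region;
    var minutes;
    output out=work.calls_sum sum=;   /* minutes comes back as 8 bytes */
run;

data work.calls_small;
    length minutes 6;                 /* shorten the summarised variable again */
    set work.calls_sum;
run;
```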

Because the minimum length of numeric variables is 2, it is more efficient (with regard to space) to hold "boolean flags" as a single-character value. For example, rather than using "If TotRev < 10 Then Small=1;", use "If TotRev < 10 Then Small='Y';".
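A short sketch of the character-flag approach follows. The input dataset work.revenue is an assumption; TotRev and Small are the names used in the text. The ELSE branch assigning 'N' is also an addition for illustration.

```sas
/* Hypothetical input dataset containing the numeric variable TotRev */
data work.flagged;
    set work.revenue;
    length Small $ 1;                  /* one byte instead of a numeric flag */
    /* Note: a missing TotRev also compares as less than 10 */
    if TotRev < 10 then Small = 'Y';
    else Small = 'N';
run;
```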

When creating a character variable that holds the first few characters of an existing variable, declare the new variable with the required length and assign the old variable to it; the SUBSTR function is not required. This is more efficient for two reasons. Firstly, if the new variable is created using SUBSTR (e.g. DISTRICT=SUBSTR(DISTACC,1,2)) without its length being declared, it inherits the length of the old variable (in this case, if DISTACC is of length 10, DISTRICT would also be 10 characters rather than 2, and the remaining 8 bytes would be wasted). Secondly, a plain assignment uses less CPU than a function call.
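The technique can be sketched as follows, assuming a hypothetical input dataset work.accounts in which DISTACC is a character variable of length 10.

```sas
/* Hypothetical input dataset containing DISTACC (character, length 10) */
data work.districts;
    set work.accounts;
    length DISTRICT $ 2;    /* declare the short length first */
    DISTRICT = DISTACC;     /* assignment truncates to the first 2 characters */
run;
```

The LENGTH statement must appear before the assignment; otherwise DISTRICT would take its length from DISTACC.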

Similarly, if you are only comparing the first few characters of a string with a known value, use the '=:', '>:', '<:' or 'IN:' comparison operators rather than SUBSTR. These operators truncate the comparison to the length of the shorter string (e.g. IF 'CL12345678' >: 'LN' and IF 'LN' <: 'CL12345678' are equivalent, because both compare 'CL' with 'LN'). The added ':' symbol is known as the "colon modifier".
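A brief sketch of the colon modifier in subsetting IF statements is given below. The dataset work.accounts and the output dataset names are hypothetical; DISTACC is the variable used in the earlier example.

```sas
/* Hypothetical input dataset containing the character variable DISTACC */
data work.london;
    set work.accounts;
    if DISTACC =: 'LN';              /* keep rows whose value starts with LN */
run;

data work.two_areas;
    set work.accounts;
    if DISTACC in: ('LN', 'CL');     /* keep rows starting with LN or CL */
run;
```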