Dynamic Arrays

25.06-98
F. Glaser

Dynamic arrays in Pascal

This seems to become a new FAQ especially with programmers who upgrade from C or VB to Delphi. It is a very basic article for beginners only, dealing with type, array and pointer subjects.

By Franz Glaser, Austria
http://geo.meg-glaser.at/tp.html
http://geo.meg-glaser.at
http://www.meg-glaser.biz
office@meg-glaser.biz

TP-links FAQ 1QA TP-memory information articles

Introduction

In Pascal, as opposed to most other high level programming languages, data structures are so important, that they get special attention to the compiler. Niklaus Wirth decided to make the same stringent checking of data as other languages make with the code (syntax) only.

Most upgraders from other languages feel that the Pascal type checking reduced their freedom in using data structures, but this is not true. Pascal only needs that the programmer writes the source code according to stringent rules, declaring the data structures thoroughly before using them.

The TYPE declaration of Arrays.

Data can be single numbers or strings, but often they are defined as structures (ASM86 nomenclature), as records and arrays. This is not the location where the particular structures are explained.

The type declaration does not reserve memory, it is only used to explain the compiler how the data are arranged in a record or array, or in a single variable, or in a set, or in a string.

Physical memory for data is reserved with the Var instruction (the global Var, not the Var in a procedure's formal parameter list). Data memory can be reserved with the Var instruction in the global data area (the DS: data segment) and in local variables within procedures, on the stack. To reserve data memory, the programmer must tell the compiler, which kind of data variable should be reserved. This is where the type declaration comes into effect.

Var Counter2 : Integer; {Integer is a pre-defined type}
Customers : Array[1..2048] of CustRec; {a user defined type}
Note that this is a combined declaration. While the compiler reserves space for the customers, it also gets aware of a new type "on the fly", namely an array of some records. This is not a good example of a Var declaration!

As opposed to "C" in Pascal any array MUST be declared as an ARRAY, before it is used. Arrays need not range from 0 .. nnn, but they can get any range, eg. -20 .. +100. If the programmer first declared an enumeration type, eg.
Type TDaysOfWeek = (Son,Mon,Tue,Wed,Thu,Fri,Sat);
this can be used for an array type declaration:
Type TWorkArr = Array[TDaysOfWeek] of TWorkRec; (whatever is TWorkRec);
Type TMonths = (Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec);
TDaysOfYearArr = Array[TMonths,1..31] of Twhatever;
Var DaysOfYear : TDaysOfYearArr;
...
Begin
DayWE := DaysOfYear[May,25];
.....
This should emphasize the practicability of Pascals way of type declaration. It is perfect for maintainability and self - commenting source code. And it is not a drawback!

What is an address

You should consider that in a running program the variables have no names. The names are known to the compiler at compile time only, the same applies to the type declarations. But this is nothing special, even the procedure and function names are not part of the executable code. For procedures and functions the compiler makes a code to JUMP to a particular address within the program, or more exactly, calls the piece of code, which returns upon completion to the address in the executable code stream where it was called from.

The same applies to data. The compiler simply makes a reference to a particular address in the data area, in most cases using a register. Example in Assembly language:

Var Price : Integer;

MOV BX,offset PRICE
MOV AX,[BX] ; The BX register points to the address of the price variable in the data area
With CPU's of the 8086 variety an address consists of a segment and an offset, but in many cases (global data area) the segment is the default DS: data segment. So it is sufficient to use the offset address alone, what I did here for simplicity.

The "offset" is the address of the variable, which holds the price, as fixed by the compiler. The source code names it "price", in the machine code it is "3456H" or wherever it may be placed.

The address of a variable is simply a number, the location of the variable in memory. It points to a particular byte. If the variable is bigger than one byte, the address points to the bottom byte.

Assume a record:
Type TPersRec = Record
                  Name : String;
                  Age : Integer;
                  ....
                End;
Var Person : TPersRec;

ASM
MOV   BX,offset Person
MOV   AX,Age[BX] ; get the Age-field, using indexed addressing with ASM
the same as:
MOV   AX,256[BX] ;assuming that the Name occupies the first 256 bytes of the record.
...
With indexed addressing the BX pointer remains pointing to the base address of the record.

Now with arrays:
Var Persons : Array[1..199] of TPersRec;
to get the person's age of # 73:
ASM
MOV   DI,73
DEC   DI ;adjust for [1..nnn]
MUL   DI,sizeof(TPersRec); not a true 8086 instruction, but used for demo here
MOV   BX,offset Persons
MOV   AX,Age[BX+DI]
....
This is not a very clever approach, but should display how an array is addressed in the running machine code.

Pointers

A pointer is a variable that holds the address of anything in memory.

Pointers in Pascal are bound to a particular data type, they "are of" the same type as the data types they point to. Pointers are variables similar to integers or words, having the same size (or double the size) but they do not contain a numerical value as usual, they hold an address in memory.

Pointers in Pascal even can hold the address of a procedure or function, then they "are of a procedure type". (Turbo Pascal special). But this is not dealt with in the following explanations, they deal with pointers to data.

You can consider a pointer as a means to hold a data address at runtime. It can point to different variables in turn, even to unnamed variables which are alive only for a particular time period.

Assume the PersRec defined above, now I make 3 distinct persons as variables:
Var Pers1, Pers2, Pers3 : TPersRec;
The program to fill data into the 3 records is not shown here. You can now access the data eg.
if Pers1.Age > Pers2.Age then ...

Var PPers : ^TPersRec;
PPers is a variable like a Byte or an Integer or a Longint. It is not initialized to point to any useful address, as Integers are not initialized to contain a particular value before it is filled explicitly by the program. Its content is undefined.

Pointers can get a meaningful content with a few special instructions, the most basic is Addr.
PPers := Addr(Pers2); shorthand: PPers := @Pers2;
Now the PPers pointer contains a meaningful value, the address of the variable Pers2 in memory. It can be used with
PPers^.Name := 'Hans Sachs';
This will set the name of Pers2.Name to Hans Sachs (Meistersinger).
PPers := @Pers3;
PPers^.Name := 'Lohengrin Parsivalson';
The PPers^.Name := code does not "know" that it writes to the Pers3 record in this case. And you must be aware that PPers^ has the TYPE of TPersRec! The compiler will alert you if you try to misuse it, eg.
if PPers^.Street = 'Main Road' then ...
will give you a compile time error, when Street is not a field in the TPersRec structure. This behaviour is very similar to the type checking in procedures, where the formal var parameters have a type as well, but the run time code in the procedure does not "know", which particular variable it is working with in the moment.

Obtaining memory from the heap

Pointers are used in the vast majority of applications to obtain memory space from the heap. The heap is the large "remaining" memory until the 640kB upper limit in standard DOS programs, and almost unlimited on protected mode (DPMI) and Windows operating environments.

The compiler never creates named variables on the heap, like global variables in the data area and procedure - local variables on the stack. But Pascal has a heap manager, which has the tools to obtain memory from the heap and return it when it is no longer needed.

The standard procedure to get memory from the heap is the New(ptr) instruction. It gains memory bytes and returns the base address in the Ptr - pointer. The Ptr pointer is then your handle to the memory, you have nothing else.

New(PPers);
will reserve 258 bytes (namestring + age) from the heap. The compiler "knows" the desired size from the TYPE of PPers, TPersRec.
Now you can write to the record with PPers^ and read it until you need it no longer, eg. after writing the contents to disk.
Dispose(PPers);
is the thankful procedure to release the obtained memory space when it is no longer needed. After Dispose the PPers pointer still exists, but it does not contain a valid address. The programmer MUST be aware of this. Usage of invalid pointers is one of the most typical reasons for program crashes!

Dynamic arrays

There is another procedure to obtain memory from the heap, the GetMem procedure. It has 2 parameters.
First define an array:
Type APersArr = Array[1..1999] of TPersRec;
PPersArr = ^APersRec;
Var APers : PPersArr;
But you will normally use less than 1999 persons records. This is where the GetMem comes into effect:

GetMem(APers,579*sizeof(TPersRec));
Now you can access each of the records until 580:
APers^[451].Name := 'Don Carlos';
and when finished:
FreeMem(APers,579*sizeof(TPersRec));

You can read all records from a file:
Var FS,haveread : Integer;
F : File of TPersRec;
... assign+reset
FS := FileSize(F);
GetMem(APers,FS*sizeof(TPersRec));
BlockRead(F,APers^,FS,haveread);
all in a single chunk. If you could obtain all the memory needed.
FreeMem(APers,FS*sizeof(TPersRec));

This is the most primitive way to get the dynamic array: using memory on the heap with a pointer.
BUT:
You must be aware that the type declaration as shown above is necessary - to be an array. As opposed to "C" style programming, where a pointer to a single variable can be used as an array pointer. Pascal programmers treat that as "dirty" programming style.

You can declare the type with a size which would never occur in reality. Because the type declaration AND the pointer declaration does not reserve physical memory (with the pointer only a few bytes), it does not matter how big you declare the array - except you violate internal limitations. Eg. the 16-bit compilers cannot reserve more than 65kB in one chunk, even in the type declaration.

And you must care yourself for the range checking, since the compiler's range checking is overridden explicitly with your program!
A suggestion: If you know the size at compile time, use constants in the program. Anyway check the bounds wherever a violation could occur!

But this is a very primitive approach. It has one benefit: you can read and write to the file in one single chunk, for speed. But it is tedious to handle this huge array of data in memory.

Array of Pointers

Beginners read all records from the file into memory in one single chunk. This is primitive and has some serious drawbacks. Usually there should be a means to add and remove records, but how to do that with one single memory block?

It is much better to declare an array of pointers to single records:

Type APers = Array[1..1999] of ^TPersRec;

This needs 8kB of memory for the pointers (assumed 4 byte for each pointer).

Whenever you create a new person - record, you simply
New(APers[UU]);
obtaining memory from the heap in small chunks of record size. It needs a little more programming, but it is worth the effort for flexibility.

To access it use:
APers[nn]^.Age := 34;
Note that this is different from the above with Don Carlos!

It is now much easyer to sort the records, since your bubble sort or quick sort procedure only needs to exchange tiny pointers in the array, the flesh - the records contents remain unmoved on their locations on the heap.

If you insert a new person, you need only move pointers upwards and insert a New(APers[nnn]) pointer in the array, the same applies to a delete procedure.
---
Type Elements = Array[0..1023] of SomeAtom;
PElements = ^Elements;
Var GlobArr = Array[0..256] of PElements; {needs 1 kB}
...
AnAtom := GlobArr[167]^[672];
Why not?
And even more "dynamic":

Type TBigArr = Array[0..1023] of PElements;
Var BigArr : ^BigArr;
   ... getmem(BigArr, 900*sizeof(Pointer));
   for I := 0 to 899 do
     getmem(BigArr^[I],16*sizeof(SomeAtom));
{actually uses only element arrays with 16 atoms}
   BigArr^[821]^[15] := AnAtom;
BigArr is only 1 pointer, in the datasegment. It
points to an array of pointers, which is on the heap.
These pointers point to one-dimensional arrays.
After use you must dispose the memory:
   for I := 0 to 899 do
     freemem(BigArr^[I],16*sizeof(SomeAtom));
   freemem(BigArr, 900*sizeof(Pointer));
in this order! of course.

Key record (indexing)

In most cases it is not even necessary to fetch the whole data records from the disk to memory. Create a key record and leave the flesh on the disk file.
Type TPkey = Record
               Key : String[29];
               RNo : Integer; {32 bytes total}
             End;
     PPkey = ^TPkey;
Var APkey = Array[0..1999] of PPkey;
At program start you read in the whole file, but create tiny key records in memory only. The Key is composed from .Name and .Address or whatever. And the RNo is the record number on disk, you can later Seek(F,RNo) in the file to get the proper record.

This has 2 important benefits:
1. you need less memory space
2. data remain save on the disk and survive system crashes.

A third point is important for multitasking / multiuser / networking applications: Several users can access the data on the file simultaneously, where each modification is written to the appropriate record immediately.

TCollection

is the ideal object (class) to handle large arrays of records. It does not contain the records, they are on the heap as described above. But it handles the pointers to the records. It has insert and delete methods and it automatically extends the size of the built in pointer array if necessary. Btw.: The pointer array is maintained on the heap too, it does not need valuable data-segment space.

The TSortedCollection
is an extended object of the TCollection, it needs a rather simple virtual function to compare two records, all the rest of the job is performed by the TSortedCollection.

Look in the Objects.TPU (DOS - version) and in your manuals how to use the TCollection. It is primarily used with Turbo Vision. A similar approach is the TList of Delphi.

More about Pascal memory and pointers, and links to more pointer tutors:
TP-memory considerations, Pointer Primer

Dr.John Stockton has an issue in the Borland Pascal Extensions Page

leileilei

Delphi considerations

For Delphi the same is valid, except that on the 32-bit environment you seem to have unlimited heap space. But consider: the heap is lost on a power fail - and your data, the work of hours or probably years, is lost too.

Note that the Delphi compiler creates pointers automatically, when you make an object/class. It hides the "pointer" property even to the programmer.

And Delphi has additional functionality especially for use in in procedure parameters, the SLICE function. See more in your Delphi manual and in the Math page of Earl F. Glynn below.

For more "scientific" subjects look in the excellent EFG's math FAQ

On Peter Haas' homepage you find ! dyna.zip (6kB) [mirror no maint]

   Subject:           Re: re-dimensioning arrays (Source)
      Date:           1 Jul 1998 10:08:29 GMT
      From:           Antivivisektion@t-online.de (Antivivisektion e.V.)
Organization:        http://Antivivisektion.base.org
Newsgroups:          comp.lang.pascal.borland

Toby Cubitt wrote:
>
> What can you do in turbo pascal if you want to re-dimension an array
> based on user input.

{$A+,B-,D+,E-,F-,I-,L+,N-,O-,R-,S-,V-,T-}
{$DEFINE UNIT}
{$IFDEF UNIT}
Unit Plateau;

Interface
{$ENDIF}

Type
IndexArray = Array[1..$3FFF] Of LongInt;
SizeArray = Array[0..$3FFE] Of LongInt;
PlatauObj = Object
         Base: Pointer;
   IndexTable: ^IndexArray;
    SizeTable: ^SizeArray;
          Dim: Word;

    Function Init (pDim,pSize: Word; Var pBound,pIndex): Boolean;
    Function GetAddr: Pointer;
    Procedure Help;
End;

{$IFDEF UNIT}
Implementation
{$ENDIF}

Function Ptr2Long(p: Pointer): LongInt; Assembler;
Asm
   mov ax,word ptr [p+2]
   mov dx,ax
   shr dx,12
   shl ax,4
   add ax,word ptr [p]
   adc ax,0
End;

Function Long2Ptr(l: LongInt): Pointer; Assembler;
Asm
   mov ax,word ptr [l]
   mov dx,word ptr [l+2]
   mov bx,ax
   and ax,0Fh
   shr bx,4
   shl dx,12
   add dx,bx
End;

Function SubPtr(p: Pointer; d: LongInt): Pointer;
Begin
SubPtr := Long2Ptr(Ptr2Long(p)-d);
End;

Function AddPtr(p: Pointer; d: LongInt): Pointer;
Begin
AddPtr := Long2Ptr(Ptr2Long(p)+d);
End;

Function PlatauObj.Init (pDim, pSize: Word; Var pBound, pIndex):
Boolean;
Var Loop: Word; Disp: LongInt; Bound: Array[1..$1FFF,1..2] Of LongInt
Absolute pBound;
Begin
{$IFOPT D+} FillChar (Self,SizeOf(Self),0); {$ENDIF}
IndexTable := @pIndex;
Dim := pDim;
GetMem (SizeTable,SizeOf(LongInt)*Succ(pDim));
SizeTable^ [pDim] := pSize;

Disp := 0;
For Loop := pDim DownTo 1 Do
Begin
Inc(Disp,SizeTable^[Loop]*Bound[Loop,1]);
SizeTable^[Pred(Loop)] :=
SizeTable^[Loop]*Succ(Bound[Loop,2]-Bound[Loop,1]);
End;

If (MaxAvail < SizeTable^[0])
     Then Init := False
     Else
       Begin
         Init := True;
         Base := SubPtr(HeapPtr,Disp);
         HeapPtr := AddPtr(HeapPtr,SizeTable^[0]);
       End;
End;

Function PlatauObj.GetAddr: Pointer;
Var Loop: Word; Disp: LongInt;
Begin
Disp := 0;
For Loop := 1 To Dim Do
Inc(Disp,SizeTable^[Loop] * IndexTable^[Loop]);
GetAddr := AddPtr (Base,Disp);
End;

Procedure PlatauObj.Help;
Begin
WriteLn;
WriteLn ('Unit Plateau V1.2 - (C) by Antivivisektion@t-online.de,
Update: 3. Feb 1992');
WriteLn ('Fields of dynamic dimensions and borders with size > 64 K');
WriteLn;
WriteLn ('1. Static version');
WriteLn ('   Var Field: Array[-3..30,-4..40,-5..50] Of Real;');
WriteLn ('   Begin');
WriteLn ('     Field[-2,-3,-4] := PI;');
WriteLn ('   End.');
WriteLn;
WriteLn ('2. Dynamic version');
WriteLn ('   Uses Plateau;');
WriteLn ('   Const DIM = 3;');
WriteLn ('   Const Bound: Array[1..DIM,1..2] Of LongInt =
((-3,30),(-4,40),(-5,50));');
WriteLn ('   Var Index: Array[1..DIM] Of LongInt; Field: PlatauObj;');
WriteLn ('   Begin');
WriteLn ('     If Not Field.Init (DIM,SizeOf(Real),Bound,Index) Then
RunError(203);');
WriteLn ('     Index[1] := -2; Index[2] := -3; Index[3] := -4;');
WriteLn ('     Real(Field.GetAddr^) := PI;');
WriteLn ('   End.');
WriteLn;
End;

{$IFNDEF UNIT}
Const DIM = 3;
Const Bound: Array[1..DIM,1..2] Of LongInt = ((-3,30),(-4,40),(-5,50));
Var Index: Array[1..DIM] Of LongInt; Field: PlatauObj; i: LongInt;
Begin
If Not Field.Init (DIM,SizeOf(Real),Bound,Index) Then RunError(203);
Index[1] := -2; Index[2] := -3; Index[3] := -4;
Real(Field.GetAddr^) := PI;
{$ENDIF}
End.

--
A.E.Neumann für die Antivivisektion e.V., PO-Box 201, D-53569 Unkel
mailto:Antivivisektion@t-online.de http://Antivivisektion.base.org/
No pascal, no english - sorry -----^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   Subject: Re: Array Problem (Source)
      Date: 3 Jul 1998 06:57:59 GMT
      From: Antivivisektion@t-online.de (Antivivisektion e.V.)
Organization:http://Antivivisektion.base.org
Newsgroups: comp.lang.pascal.borland, de.comp.lang.pascal.misc

Stephan Erlank wrote in <news:comp.lang.pascal.borland>:
>
> I want to write a program that reads in all the words from a specified
> text file and add them to a dictionary. It must also compare it with
> the rest of the dictionary and then write it to the dictionary if it
> doesn't already exist. I already have an idea to do this, but I have
> to use an array and it seems that I can't use more than 253 indexes.

The following program reads (depending on your memory) up to
2 MB (= 50.000-100.000 textlines) into EMS-memory. Only the
index is stored on heap in a binary-tree.

It's fast. It reads, compares and sorts 36.000 lines (1,5 MB) in
about 20 s. It's small. The EXE-size is only 5 KB, it uses only 4 KB
stack in recursion SHOWTREE.

{ LoadTxt 1.0 Freeware by Antivivisektion@t-online.de (C) 2. Jul 1998 }
{$A+,B-,D+,E-,F-,G+,I+,L+,N+,O-,P+,Q-,R-,S+,T+,V+,X+,Y+,M 4048,0,655360}

Program LoadTxt;
Uses Objects;

Type
pString = ^String;
pNode = ^tNode;
tNode = Record
KeyPos: LongInt;
Left, Right: pNode;
End;
Const
Base: pNode = NIL;
Var
Work: ^pNode;

Var EMS: tEMSstream; TmpStr: String;

Function StrCmpResult: Integer;
{ ======== Funky hack... ========= }
{ -1 = below; 0 = equal; 1 = above }
Inline(
$9F/           { LAHF        }
$B0/$00/       { MOV AL,00 }
$18/$C0/       { SBB AL,AL }
$C0/$D4/$02/   { RCL AH,02 }
$F5/           { CMC         }
$10/$C0/       { ADC AL,AL }
$98            { CBW         }
);

Function GetString(EMSpos: LongInt): pString;
Var X: LongInt;
Begin
X := EMS.GetPos;
EMS.Seek(EMSpos);
EMS.Read(TmpStr[0],1);
EMS.Read(TmpStr[1],Length(TmpStr));
EMS.Seek(X);
GetString := @TmpStr;
End;

Function HeapFunc (Size: Word): Integer; far;
Begin
HeapFunc := 1; { return NIL on Heap overflow }
End;

Procedure MyExit; far;
Begin
EMS.Reset;
EMS.Done;
End;

Procedure ShowTree(Node: pNode);
Begin
If Node = NIL Then Exit;
ShowTree(Node^.Left);
WriteLn(GetString(Node^.KeyPos)^);
ShowTree(Node^.Right);
End;

Var T: Text; Line: String;
Const LineCount: LongInt = 0;

Begin
HeapError := @HeapFunc;
ExitProc := @MyExit;

EMS.Init(1024*1024,10*1024*1024);
If (EMS.Status <> stOk)
Then Begin WriteLn('EMS error'); Halt; End;

Assign(T,'c:\users\antivi~1\mail\inbox.'#0'c:\FILENAME.TXT');
Reset(T);
While Not(EOF(T)) And (EMS.Status = stOK) Do
Begin
    ReadLn(T,Line);
    Inc(LineCount);
    Write(^M,'Line: ',LineCount:5,', MemAvail: ',MemAvail:5,
      ', EMS size: ',EMS.getSize:6);
    Work := @Base;
    While (Work^ <> NIL) Do
    Begin
      If (Line = GetString(Work^^.KeyPos)^) Then ;
      Case StrCmpResult Of
        -1: Work := @Work^^.Left;
         0: Break;
         1: Work := @Work^^.Right;
      End;
    End;
    If (Work^ = NIL)
       Then
         Begin
           New(Work^);
           If (Work^ = Nil) Then Break;
           Work^^.KeyPos := EMS.GetPos;
           Work^^.Left := NIL;
           Work^^.Right := NIL;
           EMS.Write(Line,Length(Line)+1);
         End;
End;
Close(T);

ShowTree(Base);
End.

--
A.E.Neumann für die Antivivisektion e.V., PO-Box 201,
D-53569 Unkel
mailto:Antivivisektion@t-online.de http://Antivivisektion.base.org/

Franz _Glaser: Delphi 4 is said to have provisions for dynamic arrays.

Variant arrays (and safe arrays) are costly in terms of memory and CPU
cycles, so you would not normally use them except in OLE Automation
code, or in special cases where they provide obvious benefits over
standard arrays. For instance, the database code makes some use of
variant arrays.
...
There is a certain amount of overhead in working with variant arrays. If
you want to process the arrays quickly, you can use two functions called
VarArrayLock and VarArrayUnlock. The first of these routines returns a
pointer to the data stored in an array. In particular, VarArrayLock
takes a variant array and returns a standard Object Pascal array. For
this to work, the array must be explicitly declared with one of the
standard types such as Integer, Bool, string, Byte or Float. The type
used in the variant array and the type used in the Object Pascal array
must be identical in all their members.
...
One of the most useful reasons for using a variant array is to transfer
binary data to and from a server. If you have a binary file, say a WAVor
AVI file, you can pass it back and forth between your program and an OLE
server using variant arrays. Such a situation would present an ideal
time for using VarArrayLock and VarArrayUnlock. You would, of course,
use VarByte as the second parameter to VarArrayCreate when you were
creating the array. That is, you would be working with an array of byte,
and accessing it directly by locking down the array before moving data
in and out of the structure. Such arrays are not subject to translation
while being marshaled across boundaries.

Remember that variant arrays are of use only in special circumstances.
They are very useful tools, especially when making calls to OLE
Automation objects. However, they are slower and bulkier than standard
Object Pascal arrays, and should be used only when necessary.

Page hosted by