EStrings: extra string (actually PChar) routines

This module is meant as an addition to the RTL unit Strings, not as a replacement. I haven't fully tested the examples; they are quick code sketches. Almost all examples need 'USES Strings', even though it is often omitted in the example itself.

This unit is mainly assembler; 3 or 4 procedures are Pascal-only (without assembler equivalents). The Pascal procedures are marked with PAS in the procedure/function overview. All assembler procedures have Pascal equivalents.

If you wonder what all these weird string routines are for: I used to maintain a utility which scanned several logfiles. These logfiles used certain characters (like +-!@#$%^&*()) to indicate a certain event or type of log line. Using these characters sped up the process (a CharPos is faster than a Pos) and also made it more version independent (because it didn't rely on fixed positions, but on positions relative to a certain signal character). Also, complex procedures ease complex string processing and make it more maintainable.



Additional remarks, bugs and principles.

EStrings is a PChar module, which means that all positions start at 0, not at 1. This is the main difference between the units EStrings and EPasStr: EStrings uses PChars and 0-based indices, while EPasStr uses Pascal strings and all indices (should) start at 1. Therefore this helpfile applies to both EPasStr and EStrings, at least if you keep those differences in mind. At the moment that I'm writing this, almost all procedures exist in both units, and I intend to keep it that way. The procedures that don't exist in EPasStr, like StrInsert, StrDelete and PCopy, do exist for Pascal strings, but are included in the RTL as Insert, Delete and Copy. Most EStrings procedures are prefixed with 'P'.

When I say STRING in this helpfile, it applies to PChars when you use EStrings, and to Pascal STRING[] when you use EPasStr.

Bugs or things to think about:

  1. Routines that enlarge the string can write past the end of the allocated string, for both PChars and Pascal strings. Without changes in the compiler, you can't make this safer. See also About PCHARs.
  2. The routines have been quickly tested for some standard problems (empty/full string situations), but haven't been put through real-life use. So if you suspect a routine, try reverting to the Pascal versions (change {$DEFINE USEASM} to {$UNDEF USEASM} in estrings.pp), and if the error disappears, mail me the code that shows the bug.
My 32-bit programming style isn't entirely optimal yet, but the routines are fast enough for even heavy usage.



Types

Right now, EStrings has one type, CHARSET, which I think should be included in the RTL (since it's the most used set type).

TYPE CHARSET=SET OF CHAR; { I'm quite fond of the SET OF CHAR
                             construction in string routines:
                             (relatively) slow, but very powerful
                             and safe, you filter out all unwanted characters}


Procedures

About PCHARs



About PChars

PChars are defined as a pointer to a character (CHAR), but a PChar can be indexed as if it were a pointer to an array of char.

A PChar is always terminated by a CHR(0). If you don't terminate the PChar, your program has a really fair chance of crashing the next time you use a Strings or EStrings procedure on it.

The room allocated for a PChar is the exact number of characters+1. This number isn't saved, so if you change the length of the string in the PChar without resizing the memory block it is in, critical information (the allocated length+1 of the old PChar) is lost. Enlarging the PChar goes wrong, as in e.g. the following code:

VAR P : PChar;

BEGIN
 P:=StrNew('01234');     {Allocates 5+1=6 bytes for PChar}
 P[5]:='5'; P[6]:=CHR(0); {Enlarge PChar with one character}
END;

This program writes the 7th character in unallocated space, or in the next variable (probably on the heap, or maybe even on the stack).
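One way around the overrun is to allocate a new, larger block before adding characters. The following is only a sketch under assumptions: GrowCopy is a hypothetical helper (it is not part of Strings or EStrings), and it relies on StrAlloc, StrCopy, StrNew and StrDispose from the Free Pascal Strings unit.

```pascal
PROGRAM GrowDemo;
USES Strings;

{ GrowCopy is a hypothetical helper, not part of Strings or EStrings. }
FUNCTION GrowCopy(Old : PChar; Extra : SizeInt) : PChar;
VAR Fresh : PChar;
BEGIN
 Fresh:=StrAlloc(StrLen(Old)+Extra+1);  { room for old text, Extra chars and CHR(0) }
 StrCopy(Fresh,Old);
 StrDispose(Old);                       { the old block still has its original length }
 GrowCopy:=Fresh;
END;

VAR P   : PChar;
    Len : SizeInt;

BEGIN
 P:=StrNew('01234');                    { 5+1=6 bytes }
 P:=GrowCopy(P,1);                      { now 6+1=7 bytes }
 Len:=StrLen(P);
 P[Len]:='5'; P[Len+1]:=CHR(0);         { stays inside the new block }
 Writeln(P);                            { writes '012345' }
 StrDispose(P);
END.
```

The price is an extra allocation and copy for every growth step, which is exactly what the fixed-size temporary buffers described further on try to avoid.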

Reducing the size of the PChar is less problematic, but if you don't keep track of the allocated space, you can't properly dispose of the PChar again:

VAR P : PChar;

BEGIN
 P:=StrNew('01234');     {Allocates 5+1=6 bytes for PChar}
 P[4]:=CHR(0);           {Changes length to 4+1 bytes, allocated space is still
                          6}
 StrDispose(P);          {Disposes only the 4+1 bytes, leaving one byte allocated,
                           but unreferenced on the heap, heap pollution}
END;

What are all these routines good for then, you'll ask? Well, PChars are mostly used to interface with the OS (PChars are similar to C strings, and most OSes are written in C), to put strings in records or arrays on the heap, since only used space is allocated (ARRAY[0..999] OF String; allocates 256000 bytes regardless of what's in it), or if you want to use strings longer than 255 characters.

So often you have a program which, for the above reasons, uses PChars. To improve speed and avoid slow conversions, you can do the string handling entirely in PChars, and the routines of the Strings and EStrings units can be used to accomplish that. However, this is difficult (you have to keep track of far more things), could introduce hard-to-trace bugs, and is often not even faster than a conversion to Pascal strings!

The trick is that you allocate a fixed size for all temporary strings, so you can use those freely within certain limits (you still can't write beyond the space allocated with GetMem).

UNTESTED, for the idea only!

VAR P,P2 : PChar;
    A,B  : LONGINT;

BEGIN
 GetMem(P,1000); GetMem(P2,1000);
 A:=0;
 REPEAT
  StrCopy(P,SomeArrayOfPChar[A]);   { Assuming that StrLen(SomeArrayOfPChar[A])<1000}
  PExpandTabs(P,P2,8);              { Expand all irritating tabs, result in P2
                                      (which must also stay below 1000 characters)}
  B:=PCharPos(P2,'1');              { Search first occurrence of character 1}
  IF B<>65535 THEN                  { 65535 = not found}
   StrDelete(P2,B,5); {Delete it, and the four characters after it if it exists}

  {A lot more operations on P2, lets assume that P2 is the end result:}

  StrDispose(SomeArrayOfPChar[A]);    { Deallocate original PChar}
  SomeArrayOfPChar[A]:=StrNew(P2);
  Inc(A);                             { Next PChar}
 UNTIL A=NrPChars;                    { Until all PChars are done}
 Freemem(P,1000); Freemem(P2,1000);   { Release temporary strings,
                                        note that the temporary strings are only
                                        (de)allocated once, not for every PChar
                                        in the processed array}
END;

Complicated? Yes. But fast. This is only for special applications, not for everyday string manipulation.

Maintaining two units with the same procedures, as I do (EPasStr and EStrings), is about 20% more trouble than maintaining the Pascal strings unit only. As an ex-Modula-2'er I'm used to 0-terminated strings anyway.

Delphi (and, almost, FPC too) also has AnsiStrings, a combination of PChars and Pascal strings. They are supposed to be handy in the cases where PChars would normally be used, but AnsiStrings are considerably slower and not backwards compatible with BP/TP. In usage they sit between Pascal strings and PChars. The AnsiString type resembles a PChar with length, reference count and allocation size prefixed. The PChar details, however, are hidden from you by the compiler and RTL, and to the programmer the type seems to have more in common with Pascal strings.



PLTrim

Declaration

FUNCTION PLTrim (P : PChar;Ch:CHAR):PChar;

Description

Strip all characters Ch from the left (beginning) of the string P.

Returns the PChar (to show that this function changed it).
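To make the behaviour concrete, here is a minimal Pascal-only sketch of an in-place left trim. SketchLTrim is a hypothetical name; this is not the unit's own (assembler) code, just one way to get the described result.

```pascal
PROGRAM LTrimDemo;
USES Strings;

{ Hypothetical stand-in for PLTrim, written for illustration only. }
FUNCTION SketchLTrim(P : PChar; Ch : CHAR) : PChar;
VAR Skip : SizeInt;
BEGIN
 Skip:=0;
 WHILE (P[Skip]<>CHR(0)) AND (P[Skip]=Ch) DO Inc(Skip);  { count leading Ch }
 IF Skip>0 THEN
  Move(P[Skip],P[0],StrLen(P)-Skip+1);   { shift rest, including CHR(0), to the front }
 SketchLTrim:=P;
END;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'       text');
 Writeln(SketchLTrim(P,' '));            { writes 'text' }
 FreeMem(P,100);
END.
```

Because the string only gets shorter, this can never write past the allocated block.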

See also

Uses None

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'       text');
 PLTrim(P,' ');
 Writeln(P);            {writes 'text'}
END;



PRTrim

Declaration

FUNCTION PRTrim (P : PChar;Ch:CHAR):PChar;

Description

Strip all characters Ch from the right (end) of the string P.

Returns the PChar (to show that this function changed it).

See also

Uses None

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text     ');
 PRTrim(P,' ');
 Writeln(P);            {writes 'text'}
END;



PKillChar

Declaration

PROCEDURE PKillChar(P : PChar;CONST Ch:CHARSET);

Description

Like PLTrim, but for an entire character set. Strips all characters in set Ch from the beginning of PChar P.

See also

Uses None

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'A B A B A Btext');
 PKillChar(P,['A','B',' ']);
 Writeln(P);            {writes 'text'}
END;



PKillBChar

Declaration

PROCEDURE PKillBChar(P : PChar;CONST Ch:CHARSET);

Description

Like PRTrim, but for an entire character set. Strips all characters in set Ch from the end of PChar P.

See also

Uses None

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'textA B A B A B');
 PKillBChar(P,['A','B',' ']);
 Writeln(P);            {writes 'text'}
END;


PAppendBackslash

Declaration

PROCEDURE PAppendBackslash(P:PChar);

Description

Appends a backslash ('\') to the end of a string if it's not already there.

Under Linux it appends a '/'. Used as a primitive for programs which create a lot of paths.

Using this procedure makes programs safer. The Dos RTL procedures (the LFN ones anyway) don't work right on paths with two backslashes in a row, probably because of the UNC (\\server\sharename) notation of network drives.
Using (P)AppendBackslash avoids such problems because it doesn't append a backslash if one is already there, like S:=S+'\'+name; would.

Uses None

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\');
 PAppendBackslash(P);
 Writeln(P);            {writes 'text\'}
 StrCopy(P,'text');
 PAppendBackslash(P);
 Writeln(P);            {writes 'text\'}

END;



PReplaceChar

Declaration

PROCEDURE PReplaceChar(S : PChar;ReplaceMe,RepWith:CHAR);

Description

Replace in PChar S the character "ReplaceMe" with "RepWith"

Uses None

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\ A A A A A ');
 PReplaceChar(P,'A','B');
 Writeln(P);            {writes 'text\ B B B B B '}
END;


PStripChar

Declaration

PROCEDURE PStripChar(S : PChar;C:CHAR);

Description

Remove all characters C from string S.

Uses None

See also PKillChrTot

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\ A A A A A ');
 PStripChar(P,'A');
 Writeln(P);            {writes 'text\      '}
END;


PKillChrTot

Declaration

PROCEDURE PKillChrTot(S : PChar;CONST C:CHARSET);

Description

Remove all characters in set C from string S.

Uses None

See also PStripChar

Example


Uses Strings;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\ A A A A A ');
 PKillChrTot(P,['A']);
 Writeln(P);            {writes 'text\      '}
END;


PCharPos

Declaration

FUNCTION PCharPos(P:PChar;C:Char):WORD;

Description

Pos for one char only. Faster than an ordinary Pos. Returns 65535 (-1 stored in a WORD) when not found.

PCharPos starts searching at the beginning of the array of char/string
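Since the result type is WORD, "not found" has to be tested against 65535 ($FFFF). The sketch below shows that convention with a plain Pascal scan. SketchCharPos and the NotFound constant are hypothetical names, not part of the unit, and the sketch assumes strings shorter than 65535 characters.

```pascal
PROGRAM CharPosDemo;
USES Strings;

CONST NotFound = $FFFF;     { hypothetical name for the "-1 as WORD" result }

{ Hypothetical stand-in for PCharPos, for illustration only. }
FUNCTION SketchCharPos(P : PChar; C : CHAR) : WORD;
VAR I : WORD;
BEGIN
 I:=0;
 WHILE (P[I]<>CHR(0)) AND (P[I]<>C) DO Inc(I);   { 0-based scan from the start }
 IF P[I]=CHR(0) THEN SketchCharPos:=NotFound
                ELSE SketchCharPos:=I;
END;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\ A A A A A ');
 Writeln(SketchCharPos(P,'A'));   { writes 6 }
 Writeln(SketchCharPos(P,'Z'));   { writes 65535 }
 FreeMem(P,100);
END.
```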

Uses None

See Also

Example

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\ A A A A A ');
 Writeln(PCharPos(P,'A'));  {writes 6 }
END;


PRCharPos

Declaration

FUNCTION PRCharPos(P:PChar;C:Char):WORD;

Description

Pos for one char only. Faster than an ordinary Pos. Returns 65535 (-1 stored in a WORD) when not found. This version starts searching at the back of the string. It returns a standard index in the string (EStrings: first char is 0, EPasStr: first char is 1).

Uses None

See Also

Example

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\ A A A A A ');
 Writeln(PRCharPos(P,'A'));  {writes 14 }
END;


PNextCharPos

Declaration

FUNCTION PNextCharPos(P:PChar;C:Char;Count:WORD):WORD;

Description

Pos for one char only. Faster than an ordinary Pos. Returns 65535 (-1 stored in a WORD) when not found. This version starts searching at character number Count, and searches towards the end of the string.

It returns a standard index in the string (EStrings:first char is 0,EPasStr:first char is 1).

Uses None

See Also

Example

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\ A A A A A ');
 Writeln(PNextCharPos(P,'A',6));  {writes 8}
END;


PNextRCharPos

Declaration

FUNCTION PNextRCharPos(P:PChar;C:Char;Count:WORD):WORD;

Description

Pos for one char only. Faster than an ordinary Pos. Returns 65535 (-1 stored in a WORD) when not found. This version starts searching at character number Count, and searches back to the beginning of the string.

It returns a standard index in the string (EStrings:first char is 0,EPasStr:first char is 1).

Uses None

See Also

Example

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\ A A A A A ');
 Writeln(PNextRCharPos(P,'A',13));  {writes 12 (the last but one A)}
END;


PCharPosSet

Declaration

FUNCTION PCharPosSet(P:PChar;CONST C:CHARSET):WORD;

Description

Returns the position of the first occurrence in PChar P of a character in charset C.

Uses None

See Also

Example

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'text\ A A A A A ');
 Writeln(PCharPosSet(P,['A','\']));  {writes 4 }
END;


PStripDoubleChar

Declaration

PROCEDURE PStripDoubleChar(P:PChar;C:Char);

Description

Cleans a string of double (or longer) sequences of char C, leaving a single C in their place.

Used to make mail from Fido and Newsgroup newbies readable (with a StripDoubleChar for '.', '!' and space) :-)
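A sketch of how such a squeeze can be done in place with a read index and a write index. SketchStripDouble is a hypothetical name and this is not the unit's own code; it only reproduces the behaviour described above.

```pascal
PROGRAM StripDoubleDemo;
USES Strings;

{ Hypothetical stand-in for PStripDoubleChar, for illustration only. }
PROCEDURE SketchStripDouble(P : PChar; C : CHAR);
VAR Src,Dst : SizeInt;
BEGIN
 Src:=0; Dst:=0;
 WHILE P[Src]<>CHR(0) DO
  BEGIN
   { skip C when the character kept just before it is also C }
   IF NOT ((P[Src]=C) AND (Dst>0) AND (P[Dst-1]=C)) THEN
    BEGIN
     P[Dst]:=P[Src];
     Inc(Dst);
    END;
   Inc(Src);
  END;
 P[Dst]:=CHR(0);
END;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,' 1 2  3   4    5');
 SketchStripDouble(P,' ');
 Writeln(P);                 { writes ' 1 2 3 4 5' }
 FreeMem(P,100);
END.
```

The write index never overtakes the read index, so no temporary buffer is needed.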

Uses None

See Also

Example

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,' 1 2  3   4    5');
 PStripDoubleChar(P,' ');
 Writeln(P);            { Writes ' 1 2 3 4 5'}
END;


PRGrow

Declaration

PROCEDURE PRGrow(P:PChar;C:CHAR;Count:WORD);

Description

Make PChar P Count characters big. Pad right (at end of string) with character C.

Uses None

See Also

Example

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'1');
 PRGrow(P,' ',10);
 Writeln(P);            { Writes '1         ' (the 1 followed by 9 spaces)}
END;


PLGrow

Declaration

PROCEDURE PLGrow(P:PChar;C:CHAR;Count:WORD);

Description

Make PChar P Count characters big. Pad left (beginning of string) with character C.

Uses None

See Also

Example

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'1');
 PLGrow(P,' ',10);
 Writeln(P);            { Writes '         1'}
END;


PStrStr

Declaration

PROCEDURE PStrStr(P:PChar;C:Char;Count:LONGINT);

Description

Fill PChar P with Count times character C. Erases existing contents.

The name comes from the old Basic procedure String$, which is often pronounced 'stringstring'.

Uses None

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'1');
 PStrStr(P,' ',10);
 Writeln(P);            { Writes '          '}
END;


Item range of procedures.

All three are PASCAL procedures, but they ARE optimized for speed, so they are usable.

Declaration

  1. PROCEDURE PItem(Dest,Source:PChar;T: CHAR; N: WORD);
  2. PROCEDURE PItem(Dest,Source:PChar;CONST T: CHARSET; N: WORD);
  3. PROCEDURE PItemS(Dest,Source,T:PChar; N: WORD);

Description

These routines isolate strings separated by one (procedure 1) or more (2 and 3) separators. The original string is in Source, the isolated string will be written to Dest. N=0 gets the first string, N=1 the next etc.

If you ask for a high N, and it can't be found, the string is emptied.

Uses

Example


VAR Source,Dest : PChar;
    A :WORD;

BEGIN
 GetMem(Source,100); GetMem(Dest,100);
 StrCopy(Source,' hello1 hello2 hello3 hello4 ');
 FOR A :=0 TO 4 DO
  BEGIN
   Write(A,' ');
   Pitem(Dest,Source,' ',A);
   IF Dest[0]=CHR(0) THEN
    Writeln('Empty')
   ELSE
    Writeln(Dest);
  END;
END;

Prints:
 0 hello1
 1 hello2
 2 hello3
 3 hello4
 4 Empty


PCopy

Declaration

PROCEDURE PCopy(Source,Dest:PChar;Start,Num: WORD);

Description

Kind of like Copy (the internal one for Pascal strings), but for PChars.
Copies Num chars from PChar Source, starting at position Start, to PChar Dest.

Uses None

Example


VAR Source,Dest : PChar;

BEGIN
 GetMem(Source,100); GetMem(Dest,100);
 StrCopy(Source,'0123456');
 PCopy(Source,Dest,1,3);
 Writeln(Dest);            { Writes '123'}
END;


PGetBetween

This is a PASCAL procedure, but optimized for speed, so usable at a decent speed.

Declaration

FUNCTION PGetBetween (Source,Dest:PCHAR;C1,C2:CHAR):BOOLEAN;

Description

Copy the chars between the first occurrence of C1 and the next C2 to Dest, return status (TRUE=success).

Using C1=C2 is allowed.

C2 characters before C1 are allowed and ignored. Only a C2 character AFTER C1 is detected. If C1 exists but there is no C2 character AFTER it, the function fails.
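The rules above can be sketched with StrScan and StrLCopy from the Strings unit. SketchGetBetween is a hypothetical name, not the unit's implementation. Emptying Dest on failure is an assumption of this sketch, and it assumes that neither C1 nor C2 is CHR(0).

```pascal
PROGRAM GetBetweenDemo;
USES Strings;

{ Hypothetical stand-in for PGetBetween, for illustration only. }
FUNCTION SketchGetBetween(Source,Dest : PChar; C1,C2 : CHAR) : BOOLEAN;
VAR First,Last : PChar;
BEGIN
 SketchGetBetween:=FALSE;
 Dest[0]:=CHR(0);                         { assumption: Dest is emptied on failure }
 First:=StrScan(Source,C1);               { first C1 }
 IF First=NIL THEN Exit;
 Last:=StrScan(First+1,C2);               { first C2 AFTER that C1 }
 IF Last=NIL THEN Exit;
 StrLCopy(Dest,First+1,Last-First-1);     { copy what lies in between }
 SketchGetBetween:=TRUE;
END;

VAR Source,Dest : PChar;

BEGIN
 GetMem(Source,100); GetMem(Dest,100);
 StrCopy(Source,'0123456');
 IF SketchGetBetween(Source,Dest,'1','4') THEN
  Writeln(Dest);                          { writes '23' }
 FreeMem(Source,100); FreeMem(Dest,100);
END.
```

Starting the second search at First+1 is what makes C1=C2 work: the C1 that was just found is never taken for the closing character.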

Uses

Example

VAR Source,Dest : PChar;

BEGIN
 GetMem(Source,100); GetMem(Dest,100);
 StrCopy(Source,'0123456');
 PGetBetween(Source,Dest,'1','4');
 Writeln(Dest);            { Writes '23'}
END;


StrDelete

Declaration

PROCEDURE StrDelete(P:PChar;Position,Count:WORD);

Description

Deletes Count characters in PChar P, starting at position Position.

Like the normal Delete for Pascal strings, except that it is zero based (the position of the first character is zero). I wrote it because I needed the routine for the Pascal implementation of some EStrings routines.

Uses None (internal StrLen)

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'A B A B A Btext');
 StrDelete(P,2,3);
 Writeln(P);            {writes 'A  B A Btext'}
END;


PUpperCase

Declaration

PROCEDURE PUpperCase(P:PChar);

Description

Uppercase all characters in PChar P. Only works for the normal (a..z) characters, not for international characters.

See also PLowerCase

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'abcde');
 PUpperCase(P);
 Writeln(P);            {writes 'ABCDE'}
END;


PLowerCase

Declaration

PROCEDURE PLowerCase(P:PChar);

Description

Lowercase all characters in PChar P. Only works for the normal (A..Z) characters, not for international characters.

See also PUpperCase

Example


VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'ABCDE');
 PLowerCase(P);
 Writeln(P);            {writes 'abcde'}
END;


StrInsert

Declaration

PROCEDURE StrInsert(Tobeins:PCHAR;Dest : PCHAR ;Position:WORD);

Description

Inserts the characters from Tobeins into Dest; the first character is put at position Position.

See also StrDelete

Example


VAR P,P2 : PChar;

BEGIN
 GetMem(P,100); GetMem(P2,100);
 StrCopy( P,'ABCDE');
 StrCopy(P2,'hello');
 StrInsert(P2,P,3);
 Writeln(P);            {writes 'ABChelloDE'}
END;


PCommaStr

Declaration

PROCEDURE PCommastr(S : PChar;sep:CHAR);

Description

Inserts the separator character Sep at every third position, counting from the end of the string, e.g. 2123456789 -> 2,123,456,789
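The grouping can be done in place by working from the end of the string towards the front. The following is a sketch only: SketchCommaStr is a hypothetical name, it assumes the buffer has room for the extra separators, and it assumes a string of digits only (no sign).

```pascal
PROGRAM CommaStrDemo;
USES Strings;

{ Hypothetical stand-in for PCommaStr, for illustration only. }
PROCEDURE SketchCommaStr(S : PChar; Sep : CHAR);
VAR Len,Seps,Src,Dst,Run : SizeInt;
BEGIN
 Len:=StrLen(S);
 IF Len<4 THEN Exit;            { nothing to separate }
 Seps:=(Len-1) DIV 3;           { number of separators needed }
 Src:=Len-1;                    { last digit of the old string }
 Dst:=Len+Seps;                 { terminator position of the new string }
 S[Dst]:=CHR(0);
 Dec(Dst);
 Run:=0;                        { digits copied since the last separator }
 WHILE Src>=0 DO
  BEGIN
   IF Run=3 THEN
    BEGIN
     S[Dst]:=Sep; Dec(Dst); Run:=0;
    END;
   S[Dst]:=S[Src];
   Dec(Dst); Dec(Src); Inc(Run);
  END;
END;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy(P,'12345678');
 SketchCommaStr(P,'.');
 Writeln(P);                    { writes '12.345.678' }
 FreeMem(P,100);
END.
```

Copying backwards means a digit is always read before the spot it occupies can be overwritten, so no second buffer is needed.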

See also StrDelete

Example


Uses Strings;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 StrCopy( P,'12345678');
 PCommaStr(P,'.');
 Writeln(P);            {writes '12.345.678'}
END;


PExpandTabs

Declaration

PROCEDURE PExpandTabs(P,P2:PChar;Tabsize:WORD);

Description

Expands tabs in P to spaces, puts the result in P2 (P untouched). Tabsize is the number of characters between two tab stops.

This procedure implements real tabbing, not simply replacing a hardtab with tabsize spaces. It doesn't implement smart tabbing (placing tab stops dependent on the text on the previous line).
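"Real tabbing" means each tab is padded up to the next multiple of Tabsize, so the number of spaces depends on the current column. A minimal sketch of that rule follows. SketchExpandTabs is a hypothetical name, not the unit's code, and treating a Tabsize of 0 as 1 is an assumption made here to avoid a division by zero.

```pascal
PROGRAM ExpandTabsDemo;
USES Strings;

{ Hypothetical stand-in for PExpandTabs, for illustration only. }
PROCEDURE SketchExpandTabs(Src,Dst : PChar; TabSize : WORD);
VAR I,Col : SizeInt;
BEGIN
 IF TabSize=0 THEN TabSize:=1;          { assumption: avoid MOD 0 }
 I:=0; Col:=0;
 WHILE Src[I]<>CHR(0) DO
  BEGIN
   IF Src[I]=CHR(9) THEN
    REPEAT                              { pad to the next tab stop }
     Dst[Col]:=' '; Inc(Col);
    UNTIL Col MOD TabSize=0
   ELSE
    BEGIN
     Dst[Col]:=Src[I]; Inc(Col);
    END;
   Inc(I);
  END;
 Dst[Col]:=CHR(0);
END;

VAR P,P2 : PChar;

BEGIN
 GetMem(P,100); GetMem(P2,100);
 StrCopy(P,'1'#9'2');                   { #9 is the tab character }
 SketchExpandTabs(P,P2,8);
 Writeln(P2);                           { writes '1       2' (the 2 lands in column 8) }
 FreeMem(P,100); FreeMem(P2,100);
END.
```

Note that the destination can become up to Tabsize times as long as the source, so size the second buffer accordingly.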

This procedure is used to deal with tabs when reading textfiles. One (P)ExpandTabs after each string read (ReadLn) from the file, and you can forget all problems with tabs. If the indentation doesn't matter, you can just use PReplaceChar (which would be faster) and replace each tab with a space.

See also PCompressTabs

Example


Uses Strings;

VAR P,P2 : PChar;

BEGIN
 GetMem(P,100); GetMem(P2,100);
 StrCopy( P,'1'#9'2');           {#9 is the tab character}
 PExpandTabs(P,P2,8);            {012345678}
 Writeln(P2);            {writes '1       2'}
END;


PCompressTabs

Declaration

PROCEDURE PCompressTabs(Source,Dest:PChar;Tabsize:LONGINT);

Description

Compresses spaces to tabs, with variable tabsize. This procedure doesn't simply compress tabsize spaces to a hardtab, but implements tabbing like an ordinary text editor such as q.exe (or joe).

Doesn't function well with hardtabs in the Pchar Source. In that case, run PExpandTabs first.

Source PChar is untouched.

Example


Uses Strings;

VAR P,P2 : PChar;

BEGIN
 GetMem(P,100); GetMem(P2,100);
 StrCopy( P,'1                      2                   3          4');
 PCompressTabs(P,P2,8);
 Writeln('original length = ',StrLen(P),'  New length = ',StrLen(P2));
END;


PBinaryToStr

Declaration

PROCEDURE PBinaryToStr(P : PChar ;Value,Bits : CARDINAL);

Description

Convert a number (Value) to a PChar (P) in binary representation, with a configurable number of bits (Bits). Bits is 0..32 and doesn't have to be a multiple of 8.

See also

Example

Uses Strings;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 PBinaryToStr(P,$AAAA,15);
 Writeln(P);            {writes '010101010101010'}
END;


PStrToBinary

Declaration

FUNCTION PStrToBinary(P : PCHAR ;Bits : CARDINAL):CARDINAL;

Description

Read first Bits digits from P (e.g. P:='0101010'), and return their binary value as a cardinal. Bits ranges from 0 to 32, though Bits=0 is useless (returns 0)
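The two binary conversions are each other's inverse, which a small round trip makes visible. Both routines below are hypothetical sketches (SketchBinaryToStr and SketchStrToBinary are not the unit's code). Stopping at the first character that is not '0' or '1' is an assumption of this sketch.

```pascal
PROGRAM BinaryDemo;

{ Hypothetical stand-in for PBinaryToStr: writes the lowest Bits bits, highest first. }
PROCEDURE SketchBinaryToStr(P : PChar; Value,Bits : CARDINAL);
VAR I : CARDINAL;
BEGIN
 FOR I:=1 TO Bits DO
  IF (Value SHR (Bits-I)) AND 1=1 THEN P[I-1]:='1'
                                  ELSE P[I-1]:='0';
 P[Bits]:=CHR(0);
END;

{ Hypothetical stand-in for PStrToBinary: reads at most Bits binary digits. }
FUNCTION SketchStrToBinary(P : PChar; Bits : CARDINAL) : CARDINAL;
VAR I,Value : CARDINAL;
BEGIN
 Value:=0; I:=0;
 WHILE (I<Bits) AND (P[I] IN ['0','1']) DO   { assumption: stop at a non-digit }
  BEGIN
   Value:=(Value SHL 1) OR CARDINAL(Ord(P[I])-Ord('0'));
   Inc(I);
  END;
 SketchStrToBinary:=Value;
END;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 SketchBinaryToStr(P,$AAAA,15);
 Writeln(P);                                 { writes '010101010101010' }
 Writeln(SketchStrToBinary(P,15));           { writes 10922, which is $2AAA }
 FreeMem(P,100);
END.
```

Writeln prints a CARDINAL in decimal, so the second line shows 10922 rather than a hexadecimal string.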

See also

Example

Uses Strings;

VAR P : PChar;

BEGIN
 P:='010101010101010';
 Writeln(PStrToBinary(P,StrLen(P))); {writes 10922, which is $2AAA}
END;


POctToStr

Declaration

PROCEDURE POctToStr(P : PChar ;Value,Digits : CARDINAL);

Description

Convert a number (Value) to a PChar (P) in octal representation, with a configurable number of digits (parameter Digits). The range of Digits is 0..12.

See also

Example

Uses Strings;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 POctToStr(P,$AAAA,6);
 Writeln(P);            {writes '125252'}
END;


PStrToOct

Declaration

FUNCTION PStrToOct(P : PCHAR ;Digits : CARDINAL):CARDINAL;

Description

Read first Digits octal digits from P (e.g. P:='776'), and return their binary value as a cardinal. Digits ranges from 0 to 11, though Digits=0 is useless (returns 0)

See also

Example

Uses Strings;

VAR P : PChar;

BEGIN
 P:='125252';
 Writeln(PStrToOct(P,StrLen(P))); {writes 43690, which is $AAAA}
END;


PHexToStr

Declaration

PROCEDURE PHexToStr(P : PChar ;Value,Digits : CARDINAL);

Description

Convert a number (Value) to a PChar (P) in hexadecimal representation, with a configurable number of digits (parameter Digits). The range of Digits is 0..12.

See also

Example

Uses Strings;

VAR P : PChar;

BEGIN
 GetMem(P,100);
 PHexToStr(P,$AAAA,6);
 Writeln(P);            {writes '00AAAA'}
END;


PStrToHex

Declaration

FUNCTION PStrToHex(P : PCHAR ;Digits : CARDINAL):CARDINAL;

Description

Read first Digits hexadecimal digits from P (e.g. P:='AAF'), and return their binary value as a cardinal. Digits ranges from 0 to 8, though Digits=0 is useless (returns 0)

See also

Example

Uses Strings;

VAR P : PChar;

BEGIN
 P:='ABCDEF';
 Writeln(PStrToHex(P,StrLen(P))); {writes 11259375, which is $ABCDEF}
END;