ODirTree Recursive directory scanning

ODirTree implements recursive scanning for files using a simple, class-oriented yet powerful interface. ODirTree is an OOP version of EDirTree, though ODirTree has some features EDirTree hasn't. However, this doesn't mean that EDirTree couldn't have those features; I simply stopped extending EDirTree when I started on ODirTree.



Additional remarks, bugs and principles

Objectives

When I was developing this unit, I kept some things in mind. The things I wanted to do with this unit were:

  1. I wanted to make a kind of automatic indexer for my home-made archive CDs. It reads 4DOS DESCRIPT.ION files and generates files.bbs files and a main index. I wanted to implement a 00-index.txt and 00index.html system on top of that, but long filename support became more important, and I never extended the program.
    However, I'm planning to do this soon, and probably before you see this text (before the next release), so check the O* files in the demo/ directory.
  2. The other reason was to have a custom scanning module. I use a modified version of FileFind to scan for ARJ and other archive types on the hard disk. This procedure, however, looks inside the files for detection; the extension doesn't matter. New: this demo has been ported to FPC, see findarch.pp.

I also had some things in mind when creating the OOP interface:

  1. Isolating the different systems (FileScan, BuildTree, and within BuildTree) more than in EDirTree. FileScan and BuildTree no longer share variables of any kind.
  2. More flexibility. Implemented by
  3. Keeping it fast. Actually, EDirTree's BuildTree routines could be faster than ODirTree's, because ODirTree.BuildTree already adds files to the tree while EDirTree.BuildTree doesn't.

Tricks, using ODirTree

The best way to find out how ODirTree works is to

  1. read the general parts of these docs,
  2. compile and run the related demos (O* in ../demo), and then
  3. try to understand those using the detailed chapters about each separate procedure/class in these docs.

See also Procedures for an overview of this unit's structure.

P.S. ODirTree now uses forward slashes if the conditional LINUX is defined, and FindClose if the conditional CloseFind is defined (it is defined by default; you might undefine it for non-LFN DOS systems, see syseq.inc).



Procedures

The unit basically consists of two standalone types of directory scanning:

  1. TFileScan is a class that implements recursive scanning of a certain directory and below. A procedure you supply is run for each file that matches a certain pattern (like *.pp). Depending on the boolean DirsToo, matching directories will also be reported to this procedure. All extension handling is left to the OS, so if you use patterns like *this*.*, be sure that your OS supports them.
    This class/procedure gets the job done 90% of the time.
  2. The second set of classes (TTreeBuild, TTreeScan and TTreeAddFiles) create and operate on a binary tree in memory.

Three methods exist in both main classes:



Types and constants



Procedure-types

 TYPE
      ReportProc    = PROCEDURE (Path:PathStr;Search:SearchRec);
      FileProc      = PROCEDURE(P:PathStr;Search:SearchRec);
      DirProc       = PROCEDURE(P:PathStr);
      DetectProc    = FUNCTION(Search:SearchRec;P:PathStr):BOOLEAN;

These procedure types accept one or two parameters.
  1. The PathStr parameter is always the path of the file the procedure has to process, WITHOUT the filename itself.
  2. The SearchRec parameter (the record used by FindFirst and FindNext, see the Dos unit for its declaration) describes the found file (Search.Name is its name, Search.Size its size in bytes, and so on).

The DirProc has only one parameter since it operates on a directory. The DetectProc type is a function, and the return value indicates if the file should be added to the tree (TRUE) or not (FALSE).
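
Because the path arrives without the filename, a handler usually has to join the two itself. The following is a minimal sketch for Free Pascal, assuming only the Dos unit; the FileProc type is copied from the declaration above, while FullName, ShowFile and the hand-filled SearchRec are hypothetical and only serve to make the example run without the unit. Whether the unit passes paths with a trailing separator is not stated here, so the helper adds one only when it is missing.

```pascal
PROGRAM ProcTypes;
USES Dos;   {supplies PathStr and SearchRec}

TYPE
  FileProc = PROCEDURE(P:PathStr;Search:SearchRec);  {copied from the docs}

{Hypothetical helper: join path and file name, adding a DOS-style
 separator only when the path does not already end in one}
FUNCTION FullName(P:PathStr;CONST Search:SearchRec):PathStr;
BEGIN
  IF (Length(P)>0) AND NOT (P[Length(P)] IN ['\','/',':']) THEN
    P:=P+'\';
  FullName:=P+Search.Name;
END;

{A handler with the FileProc signature}
PROCEDURE ShowFile(P:PathStr;Search:SearchRec);
BEGIN
  WriteLn(FullName(P,Search),' (',Search.Size,' bytes)');
END;

VAR
  Handler : FileProc;
  Fake    : SearchRec;   {filled by hand instead of by FindFirst}
BEGIN
  FillChar(Fake,SizeOf(Fake),0);
  Fake.Name:='README.TXT';
  Fake.Size:=1234;
  Handler:=@ShowFile;        {FPC syntax; BP/TP mode omits the @}
  Handler('C:\DOCS',Fake);
END.
```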



FilAttr

TYPE FilAttr = BYTE;

This type is used for the attributes passed to FindFirst and FindNext. It is a remnant of the unit's Modula-2 origin, where it is a SET type.



Tree building types

These structures are used to build a tree (see procedure BuildTree) in memory. DirTreePoint is the base type. A tree of DirTreePoints looks like this:

      C:\
       |
      DOS ------ Windows
                    |
                 System ------ INF

Directories drawn side by side are on the same level: horizontal lines indicate the NextDir pointer of DirTreeRecord, and vertical lines the SubDirs pointer. Directory DOS has no subdirectories, directory Windows has two (System and INF). Directory C:\ is the top level and has two directories. All unused pointers are nil.

Files aren't included in this picture. Imagine every directory having a linked list of files in the direction perpendicular to the screen.

      FilePtr       = ^FilesRec;
      FilesRec      = RECORD
                        Next : FilePtr;   {next file in this directory}
                        DirE : SearchRec; { Unit DOS, record for findfirst}
                       END;


      DirTreePoint  = ^DirTreeRecord;
      DirTreeRecord = RECORD
                       NextDir,                  {next directory on this level}
                       SubDirs  : DirTreePoint;  {subdirectories (Lower than this level)}
                       Name     : PathStr;       {Name of directory}
                       Files    : FilePtr;       {see above}
                       END;
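
To make the pointer layout concrete, here is a small sketch that builds the C:\ example by hand and walks it: SubDirs is followed by recursion, NextDir by iteration. The record types are copied from the declarations above; NewDir and PrintTree are hypothetical helpers, not routines of the unit, and storing only the bare directory name in Name is an assumption of this example.

```pascal
PROGRAM TreeWalk;
USES Dos;   {supplies PathStr and SearchRec}

TYPE
  FilePtr       = ^FilesRec;
  FilesRec      = RECORD
                    Next : FilePtr;
                    DirE : SearchRec;
                  END;
  DirTreePoint  = ^DirTreeRecord;
  DirTreeRecord = RECORD
                    NextDir,
                    SubDirs : DirTreePoint;
                    Name    : PathStr;
                    Files   : FilePtr;
                  END;

{Hypothetical helper: allocate a directory node with all pointers nil}
FUNCTION NewDir(CONST DirName:PathStr):DirTreePoint;
VAR P:DirTreePoint;
BEGIN
  New(P);
  P^.NextDir:=NIL;
  P^.SubDirs:=NIL;
  P^.Files:=NIL;
  P^.Name:=DirName;
  NewDir:=P;
END;

{Print every directory, indented two spaces per level}
PROCEDURE PrintTree(Node:DirTreePoint;Depth:INTEGER);
BEGIN
  WHILE Node<>NIL DO
  BEGIN
    WriteLn('':Depth*2,Node^.Name);
    PrintTree(Node^.SubDirs,Depth+1);   {one level down}
    Node:=Node^.NextDir;                {next directory on this level}
  END;
END;

VAR Root,Win:DirTreePoint;
BEGIN
  Root:=NewDir('C:\');
  Root^.SubDirs:=NewDir('DOS');
  Win:=NewDir('Windows');
  Root^.SubDirs^.NextDir:=Win;
  Win^.SubDirs:=NewDir('System');
  Win^.SubDirs^.NextDir:=NewDir('INF');
  PrintTree(Root,0);
  {the nodes are not freed here; for a tree built by the unit,
   KillFileTree is the documented way to release it}
END.
```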


Variables

VAR
    DirsToo      : BOOLEAN;   Are matching directories reported to the main
                               module? See ScanTree and FileScan.

The variables below are updated after each scan by FileScan, SearchForFiles (not FoundDirs) and BuildTree (only FoundDirs). ClearStat resets them to zero; ClearStat is also run on startup.

VAR
    FoundCount   : WORD;      Number of files found
    TotalBytes   : LONGINT;   Total bytes in files found. See also ClusterSize
    FoundDirs    : WORD;      Directories found ( . and .. are ignored)

ClusterSize

The LONGINT ClusterSize can be used to influence TotalBytes: TotalBytes can be rounded up to account for cluster size (or inode size, or whatever the filesystem calls it). If you specify ClusterSize=0, no rounding will be performed.

    ClusterSize  : LONGINT; Clustersize used for rounding, default
                                   0= no rounding
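
The rounding itself is plain integer arithmetic. The sketch below shows one way to round a size up to the next multiple of the cluster size; RoundToCluster is a hypothetical helper, not part of ODirTree, and whether the unit rounds each file or only the total is not stated here, so applying it per file is an assumption.

```pascal
PROGRAM ClusterRound;

{Hypothetical helper: round Size up to a multiple of Cluster.
 Cluster=0 means "no rounding", as described for ClusterSize.}
FUNCTION RoundToCluster(Size,Cluster:LONGINT):LONGINT;
BEGIN
  IF Cluster<=0 THEN
    RoundToCluster:=Size
  ELSE
    RoundToCluster:=((Size+Cluster-1) DIV Cluster)*Cluster;
END;

BEGIN
  WriteLn(RoundToCluster(1,4096));     {4096}
  WriteLn(RoundToCluster(4096,4096));  {4096}
  WriteLn(RoundToCluster(4097,4096));  {8192}
  WriteLn(RoundToCluster(4097,0));     {4097, rounding disabled}
END.
```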


SetFAttr

Declaration

PROCEDURE SetFAttr (Attr:FilAttr);

Description

Sets the attributes used for all FindFirst calls in ODirTree. The Directory attribute is added or cleared when necessary (depending on whether the program searches for directories or just for files).

See also FileScan, SearchForFiles and BuildTree which use the value set by the SetFAttr procedure.

Example:

 SetFAttr(archive+readonly);    {include archive and read-only files in the search}



ClearStat

Declaration

PROCEDURE ClearStat;

Description

This is the initcode of the unit. It resets all the Variables to zero or false (like DirsToo) and calls SetFAttr to let the unit include all files except volume-IDs.

The initcode has been moved to a procedure so main programs can reset the unit, and because TopSpeed Modula-2 doesn't allow overlaid units to have initcode.

Example:

 ClearStat;


FileScan

Declaration

PROCEDURE FileScan(RootDir,FileName : PChar;Report:ReportProc);

Description

Searches path RootDir and its subdirectories for files matching FileName (may be a wildcard; directories are regarded as files when DirsToo=TRUE). Files are reported to the procedure Report, with all information (path and Dos.SearchRec).

To quickly execute a procedure in every directory, enter "." as filename, and assign TRUE to DirsToo.

FileScan is quite powerful. However, if you want to do a very complex scan, or scan a certain drive or directory several (more than 2) times, look at the ScanTree, SearchForFiles and BuildTree combination.

Note: Notes about Tree building

Example:

USES Dos, ODirTree;                   {Dos supplies PathStr and SearchRec}

PROCEDURE WriteOutput(Path:PathStr;FileData:SearchRec); FAR; {procedure variables
                                           are always FAR for BP, FPK doesn't care}

BEGIN
 WriteLn(Path,FileData.name,' ',FileData.size); {one file per line}
END;

BEGIN
 DirsToo:=FALSE;                      {ODirTree variable: don't report
                                        directories, in case a directory with
                                        extension .pas exists}
 FileScan('c:\','*.pas',@WriteOutput); {searches for *.pas on entire 'c:\'}
END.
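
For readers who want to see the general technique behind such a scan, here is a minimal standalone sketch built only on the Dos unit of Free Pascal. It is not the unit's implementation: the procedure Scan, the two-pass layout, and the constant AllFiles are assumptions made for illustration, and DirectorySeparator is a Free Pascal system constant, so the sketch will not compile unchanged under BP.

```pascal
PROGRAM ScanSketch;
USES Dos;

CONST
  AllFiles = '*';   {assumption: use '*.*' on plain DOS}

{Print every file below Dir that matches Pattern.
 Dir must end in a directory separator.}
PROCEDURE Scan(CONST Dir,Pattern:STRING);
VAR S:SearchRec;
BEGIN
  {pass 1: files in this directory that match the pattern}
  FindFirst(Dir+Pattern,AnyFile AND NOT (VolumeID OR Directory),S);
  WHILE DosError=0 DO
  BEGIN
    WriteLn(Dir,S.Name);
    FindNext(S);
  END;
  FindClose(S);

  {pass 2: descend into every subdirectory, skipping . and ..}
  FindFirst(Dir+AllFiles,Directory,S);
  WHILE DosError=0 DO
  BEGIN
    IF ((S.Attr AND Directory)<>0) AND (S.Name<>'.') AND (S.Name<>'..') THEN
      Scan(Dir+S.Name+DirectorySeparator,Pattern);
    FindNext(S);   {also resets DosError after the recursive call}
  END;
  FindClose(S);
END;

BEGIN
  Scan('.'+DirectorySeparator,'*.pp');
END.
```

The sketch leaves all wildcard matching to FindFirst, in the same spirit as the unit leaves it to the OS. It does not guard against directory cycles (for example through symbolic links).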


BuildTree

Declaration

FUNCTION BuildTree(CONST RootDir: PChar):DirTreePoint;

Description

Searches path RootDir and adds all directories to a DirTreePoint type tree. A pointer to the created tree is returned.

Note: Notes about Tree building

Example:

(Too complex. See DirTest.pp in this package)



SearchForFiles

Declaration

PROCEDURE SearchForFiles(Root:DirTreePoint;CONST Pattern:PChar;Select:DetectProc);

Description

Use after a BuildTree (which creates the Root DirTreePoint). Searches all directories in memory for occurrences of Pattern, and adds the files found to the tree under the Files field of each DirTreeRecord. Pattern is something like "*.txt". Select is a function which you supply to do additional checks. If this function returns TRUE the file will be added to the tree; if it returns FALSE it won't. If you don't want to use this feature, pass a function which always returns TRUE.

Can be used several times, for more than one extension/pattern; however, overlapping patterns will result in duplicate files. Somewhere in the future I will write a procedure which fixes this.
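
As an illustration of the Select parameter, the sketch below declares two functions with the DetectProc signature: one that accepts everything, and one that performs an extra check. The type is copied from the declaration earlier in this document; AcceptAll, NonEmptyOnly and the hand-filled SearchRec are hypothetical, and the functions are called directly here instead of through SearchForFiles so that the example runs without the unit.

```pascal
PROGRAM DetectDemo;
USES Dos;   {supplies PathStr and SearchRec}

TYPE
  DetectProc = FUNCTION(Search:SearchRec;P:PathStr):BOOLEAN;

{Accept every file: use this when no extra check is wanted}
FUNCTION AcceptAll(Search:SearchRec;P:PathStr):BOOLEAN;
BEGIN
  AcceptAll:=TRUE;
END;

{Extra check: only files with at least one byte are added}
FUNCTION NonEmptyOnly(Search:SearchRec;P:PathStr):BOOLEAN;
BEGIN
  NonEmptyOnly:=Search.Size>0;
END;

VAR
  Select : DetectProc;
  Fake   : SearchRec;   {filled by hand instead of by FindFirst}
BEGIN
  FillChar(Fake,SizeOf(Fake),0);
  Fake.Name:='EMPTY.TXT';
  Fake.Size:=0;
  Select:=@AcceptAll;                 {FPC syntax; BP/TP mode omits the @}
  WriteLn(Select(Fake,'C:\'));        {TRUE}
  Select:=@NonEmptyOnly;
  WriteLn(Select(Fake,'C:\'));        {FALSE}
END.
```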

Note: Notes about Tree building

Example:

(Too complex. See DirTest.pp in this package)



ScanTree

Declaration

PROCEDURE ScanTree(Root : DirTreePoint;DoFile:FileProc;DoDir : DirProc);

Description

Use after a BuildTree and optionally one or more SearchForFiles.

This procedure scans the directory tree Root and runs DoFile for each found file in the tree. DoDir is also run for every directory when DirsToo=TRUE.

Note: Notes about Tree building

Example:

(Too complex. See DirTest.pp in this package)



KillFileTree

Declaration

PROCEDURE KillFileTree(VAR Root:DirTreePoint);

Description

Use after a BuildTree and optionally one or more SearchForFiles.

This procedure simply removes an entire files-and-directory tree referenced by Root from memory.

Can also be used to eliminate unwanted parts of the tree.

Example:

(Too complex. See DirTest.pp in this package)



Notes about Tree building

The procedures SearchForFiles and BuildTree build directory trees in memory and manipulate them. A speed gain is only achieved when you would otherwise have to do two or more FileScans.

This is because the tree-building routines scan at least twice: once for directories, and at least once for files. (To get only a directory structure you can also use filename='.' and DirsToo=TRUE with FileScan.)

Also remember that a SearchForFiles is much slower than a ScanTree. So one SearchForFiles with *.* and some extra code in the procedure which ScanTree reports to is often faster than doing multiple SearchForFiles or FileScans.
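
The "extra code" can be as simple as dispatching on the extension inside the procedure ScanTree reports to. The sketch below shows such a handler with the FileProc signature; Classify, Feed and the counters are hypothetical names, and the handler is fed hand-made SearchRecs here instead of being passed to ScanTree, so that the example runs without the unit.

```pascal
PROGRAM OnePass;
USES Dos;   {supplies PathStr, SearchRec and FSplit}

VAR
  PasCount,TxtCount : LONGINT;

{FileProc-shaped handler: sorts the files of one *.* pass by extension}
PROCEDURE Classify(P:PathStr;Search:SearchRec);
VAR
  D : DirStr;
  N : NameStr;
  E : ExtStr;
  I : INTEGER;
BEGIN
  FSplit(Search.Name,D,N,E);                   {E includes the leading dot}
  FOR I:=1 TO Length(E) DO E[I]:=UpCase(E[I]); {compare case-insensitively}
  IF E='.PAS' THEN Inc(PasCount)
  ELSE IF E='.TXT' THEN Inc(TxtCount);
END;

{Hypothetical stand-in for ScanTree reporting one file}
PROCEDURE Feed(CONST FileName:STRING);
VAR S:SearchRec;
BEGIN
  FillChar(S,SizeOf(S),0);
  S.Name:=FileName;
  Classify('C:\SRC\',S);
END;

BEGIN
  PasCount:=0;
  TxtCount:=0;
  Feed('main.pas');
  Feed('NOTES.TXT');
  Feed('tool.pas');
  Feed('a.out');
  WriteLn('pas: ',PasCount,'  txt: ',TxtCount);   {pas: 2  txt: 1}
END.
```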

Generally, one SearchForFiles per drive letter should be enough if you program smartly. (I know, I don't do that in the demo, but that's to show that SearchForFiles CAN be run twice.)

Also, heavy caching (Win95, DOS with HyperDisk, and probably also Linux) smooths out the difference between the two search mechanisms.