I'm writing an application that crawls a folder or disk drive for media files, sorts them, removes duplicates, and fixes metadata. Performance is becoming an issue: I'm wondering if the way I'm getting subdirectories and filenames is inefficient, because it seems to be my bottleneck. It's taking an hour to crawl through the 100GB of data I'm using for testing, without any processing at all.
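To confirm that enumeration (and not later processing) is where the time goes, the traversal can be timed on its own. A minimal sketch, assuming nothing about the project itself (the class name, method name, and test path below are placeholders, not code from the application):

```csharp
using System;
using System.Diagnostics;
using System.IO;

public class TimingSketch
{
    // Times a bare recursive file enumeration with no per-file work,
    // so the elapsed time reflects directory traversal alone.
    // Note: SearchOption.AllDirectories will throw on the first
    // access-denied folder, so point this at a readable subtree.
    public static TimeSpan TimeEnumeration(string root)
    {
        var sw = Stopwatch.StartNew();
        int count = 0;
        foreach (string _ in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            count++;
        }
        sw.Stop();
        Console.WriteLine($"Enumerated {count} files in {sw.Elapsed}");
        return sw.Elapsed;
    }
}
```

Comparing this number against the full hour-long run shows how much of the cost is the file system walk versus everything layered on top of it.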
Pseudo Code
Code:
Create string stack
Create filenames array
Push initial folder onto stack
while (stack has items)
    Pop folder off stack
    Get subdirectory array
    Push subdirectories onto stack
    Get current folder's file array
    Add media files to filenames array
Actual C# Code
Code:
//Get background worker ref (BackgroundWorker runs this handler on a background thread)
BackgroundWorker worker = sender as BackgroundWorker;
//Create set for filenames
ListSet<string> filenames = new ListSet<string>();
//Get initial folder
string first = e.Argument as string;
//Create folder stack
Stack<string> folders = new Stack<string>();
//Push initial folder onto stack
folders.Push(first);
//Start traversing through subdirectories
while (folders.Count > 0)
{
    //Get current path
    string current = folders.Pop();
    //Get folder info
    DirectoryInfo info = new DirectoryInfo(current);
    //Add subfolders to stack
    foreach (DirectoryInfo d in info.GetDirectories())
    {
        //Check if cancellation is in progress
        if (worker.CancellationPending)
        {
            //Update cancel status
            e.Cancel = true;
            //Break from loop
            break;
        }
        try
        {
            //Add directory
            folders.Push(d.FullName);
        }
        catch (SecurityException)
        {
            //Show error message
            MessageBox.Show("Cannot read " + d.FullName + ". Access Denied.", "Access Denied", MessageBoxButtons.OK, MessageBoxIcon.Error);
        }
    }
    //Add files to files list
    foreach (FileInfo f in info.GetFiles())
    {
        //Check if cancellation is in progress
        if (worker.CancellationPending)
        {
            //Update cancel status
            e.Cancel = true;
            //Break from loop
            break;
        }
        try
        {
            //Check if media file
            if (this.CheckFile(f.FullName))
            {
                //Add file
                filenames.Add(f.FullName);
            }
        }
        catch (SecurityException)
        {
            //Show error message
            MessageBox.Show("Cannot read " + f.FullName + ". Access Denied.", "Access Denied", MessageBoxButtons.OK, MessageBoxIcon.Error);
        }
    }
}
//Clear the stack
folders.Clear();
folders = null;
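For comparison, the same stack-based crawl can be written against `Directory.EnumerateDirectories` and `Directory.EnumerateFiles`, which yield paths lazily instead of allocating a `DirectoryInfo`/`FileInfo` array for every folder. This is a sketch under that assumption, not the application's actual code: `MediaExtensions` stands in for the real `CheckFile` check, and the cancellation and message-box handling are omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class CrawlSketch
{
    // Hypothetical extension filter standing in for CheckFile().
    static readonly HashSet<string> MediaExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".mp3", ".mp4", ".m4a" };

    public static List<string> FindMediaFiles(string root)
    {
        var filenames = new List<string>();
        var folders = new Stack<string>();
        folders.Push(root);

        while (folders.Count > 0)
        {
            string current = folders.Pop();
            try
            {
                // EnumerateDirectories/EnumerateFiles stream entries back as the
                // OS returns them, rather than building a full array per folder.
                foreach (string d in Directory.EnumerateDirectories(current))
                {
                    folders.Push(d);
                }
                foreach (string f in Directory.EnumerateFiles(current))
                {
                    if (MediaExtensions.Contains(Path.GetExtension(f)))
                    {
                        filenames.Add(f);
                    }
                }
            }
            catch (UnauthorizedAccessException)
            {
                // Skip unreadable folders instead of aborting the whole crawl.
            }
        }
        return filenames;
    }
}
```

Keeping the explicit stack (rather than passing `SearchOption.AllDirectories`) has the advantage that an access-denied folder only skips that one folder; a single recursive enumeration would throw and end the whole walk at the first unreadable directory.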
Sample Terminal Log Text
Code:
Starting crawl of /Volumes/ROBIPOD. 10:31AM
*...Printout of Scanned Files...*
Found 5892 Media Files
Finished at 11:20AM