Is there a way to search for a specific number format like ###-##-#### to find ONLY numbers when using OSForensics?
I assume you have created an index with the "Create index" function and now wish to search it.
The easiest way to do this is to go to the "Search index" function, select the "Browse index" tab, then use the built-in "Phone number - US" filter (bottom right of the window).
Thanks, that will help with phone numbers, but I'm also looking for SSNs (both with and without dashes) or pretty much any other case where I might need placeholders for numbers only.
From the help file,
"Filter Options
Perl compatible regular expressions (PCRE) are used when filtering the results displayed when browsing the search index. Several regular expressions have been pre defined for quick use but you can also type your own regular expressions in the edit below the list. Currently the search is case insensitive, so "TEST" will return the same results as "test".
For example to search for any entry containing the word "test" select the Custom option from the filter drop down list, type "test" and then click the search button. To find only entries that begin with the word "test" use "^test", the "^" character is used to indicate the pattern match must start at the beginning of the found word.
To search for one of the special characters (eg $ ^ .) you will need to escape the character with "\", eg "\.com". For more information on the format and special characters used see the Perl regular expressions help page.
There are several pre-configured regular expressions available from the drop down list, these are found in the "RegularExpressions.txt" file in the OSForensics program data directory (ProgramData\PassMark\OSForensics). These have been collected from various sources and are kept as simple as possible while still returning fairly accurate results, please note these will not be 100% accurate in all situations."
Note that the regular expressions in this filter work on the index of words and not directly on the full document text. Words in the index are already divided up based on white space and punctuation, so it doesn't make sense to use a regex containing spaces.
It works well for items like e-mail addresses, however, for which the regex is,
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
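To illustrate the point about filtering individual index words, here is a minimal sketch in Python. It uses Python's re module as a stand-in for the PCRE engine in OSForensics, and the word list is made up for the example (a real index is built by the "Create index" function). The filter_words helper is hypothetical and is not part of OSForensics.

```python
import re

# Email pattern as quoted from the help file.
EMAIL = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

def filter_words(pattern, words):
    """Return the words that the pattern matches, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if rx.search(w)]

# Hypothetical index entries, already split on white space and punctuation.
words = ["test", "Testing", "contest", "bob@example.com", "example.com", "dotcom"]

print(filter_words("test", words))    # any entry containing "test"
print(filter_words("^test", words))   # only entries beginning with "test"
print(filter_words(r"\.com", words))  # literal ".com"; the dot is escaped
print(filter_words(EMAIL, words))     # email-shaped entries
```

Since each entry is a single word, a pattern that contains a space has nothing to match against in this setup.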
Definitely what I was looking for.
I had tried putting the regular expression into the Custom filter with no success before, but I have since tried some other expressions successfully. It seems that the expressions that are simply built to look for SSN formats work as expected, but ones that are more complicated and try to filter out invalid SSNs do not return ANY results. For example:
Just find the format:
^(\d{3}-?\d{2}-?\d{4}|XXX-XX-XXXX)$
^((\d{3}-?\d{2}-?\d{4})|(X{3}-?X{2}-?X{4}))$
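As a quick check of what the first format-only expression accepts, here is a small sketch using Python's re module (an assumption: Python is only a stand-in here, OSForensics uses PCRE). The sample strings are made up. Note that each dash is optional on its own, so a mixed form like 123-456789 also passes.

```python
import re

# First format-only pattern from the post, compiled case-insensitively
# to mirror the case-insensitive search described in the help file.
FORMAT_ONLY = re.compile(r"^(\d{3}-?\d{2}-?\d{4}|XXX-XX-XXXX)$", re.IGNORECASE)

# Made-up sample words.
samples = ["123-45-6789", "123456789", "XXX-XX-XXXX",
           "123-456789", "1234-56-789", "12345678901"]

results = {s: bool(FORMAT_ONLY.search(s)) for s in samples}
for word, matched in results.items():
    print(word, matched)
```

The ^ and $ anchors are what keep the 11-digit string out: without them the pattern would match nine digits inside any longer run.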
Find valid SSN:
/^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/
(?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(?!00)(\d{2})\2(?!0000)(\d{4})
Now, we have used the expressions that filter for only valid SSNs in other tools without issue. Does it have something to do with the indexing?
Also, does the Professional version eliminate the 2GB file size limitation?
Thanks!
Are you using 32bit or 64bit?
There shouldn't be any file size limit in 64bit.
Which file are you worried about?
I hate debugging long regex expressions. Can take forever to find a missing slash. Was this a 2 line expression, or did the forum split it up when you posted it? Are you sure it worked at some point and it was a Perl expression?
I'll also check whether we have some max line length limit.
The first reason that no results were returned from the more complicated "valid only" expressions was that we were using a function to execute the expression that favored speed, so the more complicated ones were not being executed properly. We'll be making some changes to the next build so we call a slower, but more accurate, PCRE function to return the results.
Another issue may be that these two "valid only" expressions are still not 100% accurate. In some of our tests (through both OSForensics and http://www.regextester.com/),
(?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(?!00)(\d{2})\2(?!0000)(\d{4})
returned more junk results than the simpler ones. Because it is not anchored with ^ and $, it can match a run of digits inside a longer word. For example, it will match
"000000000000000000001234567890AB"
"001ISB601089506"
and
"0x7619693978f91d90539ae786".
The other one,
/^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/
doesn't seem to match even valid SSNs.
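Both observations can be reproduced outside OSForensics. The sketch below uses Python's re module, which is an assumption on my part: it is not the PCRE library, but it treats lookaheads, backreferences and anchors the same way for these patterns. The sample SSN is made up. The slash explanation is only one possible cause and is not confirmed in this thread: in a regex library (as opposed to Perl source code) the leading and trailing "/" are ordinary characters, so "/^" asks for a literal slash before the start of the word, which can never happen.

```python
import re

# Lookahead pattern from the thread, exactly as posted (no ^ or $ anchors).
VALID_SSN = (r"(?!000)(?!666)([0-6]\d{2}|7[0-6]\d|77[0-2])"
             r"([ \-]?)(?!00)(\d{2})\2(?!0000)(\d{4})")
# Slash-delimited pattern from the thread, exactly as posted.
SLASHED = r"/^([0-6]\d{2}|7[0-6]\d|77[0-2])([ \-]?)(\d{2})\2(\d{4})$/"

# Junk words reported above, plus a made-up well-formed value.
junk = ["000000000000000000001234567890AB",
        "001ISB601089506",
        "0x7619693978f91d90539ae786"]
ssn = "123-45-6789"

# Unanchored search finds nine suitable digits somewhere inside each junk word.
unanchored = [bool(re.search(VALID_SSN, w)) for w in junk]
# Requiring the whole word to match rejects all three.
anchored = [bool(re.fullmatch(VALID_SSN, w)) for w in junk]

print(unanchored)                                # [True, True, True]
print(anchored)                                  # [False, False, False]
print(bool(re.fullmatch(VALID_SSN, ssn)))        # True
print(re.search(SLASHED, ssn))                   # None
print(bool(re.search(SLASHED.strip("/"), ssn)))  # True
```

If this carries over to the index filter, wrapping the lookahead pattern in ^ and $ and dropping the slashes from the other one would be the first things to try.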
Yeah, the second one I had pulled down from a help board while I was just trying to find something that would work in OSForensics. The former was the one we've been using in Spider 2008 (which we are looking at replacing). I'll be looking forward to trying the next build.
OSForensics might be out of the running, however, as we're still having issues with the file size limitation. We've tried it on a Win7 64-bit machine on NTFS (BitLocker enabled) and a Win8 Preview 32-bit on NTFS (no BitLocker) with similar results concerning any files over 2GB. The log simply states to Check Configuration Settings, and when we try to change the setting, a pop-up comes up that says "Max file size must be less than 2GB."
thanks,
tim
If you need a quick patch release for the RegEx change, let me know. Otherwise we'll put it into the next public release in a week or so.
For the file size issue I think the key question is why are you trying to set the file size limit above 2GB? In other words, what single file on your hard disk is larger than 2GB and is worth indexing for its text content?
Note that the limit doesn't apply at all to compound files. For example, for Zip files, the limit applies to each of the files inside of the Zip file and not the Zip file itself.
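If you want to check whether any file inside an archive would actually hit the per-file limit, a rough sketch with Python's standard zipfile module could look like the following. The members_over_limit helper and the exact comparison are assumptions for illustration; this is not how OSForensics itself checks sizes.

```python
import io
import zipfile

LIMIT = 2 * 1024**3  # the 2GB per-file limit discussed in this thread

def members_over_limit(zip_source, limit=LIMIT):
    """Return names of archive members whose uncompressed size exceeds limit."""
    with zipfile.ZipFile(zip_source) as zf:
        return [info.filename for info in zf.infolist() if info.file_size > limit]

# Demo with a tiny in-memory archive and a tiny limit, so it runs quickly.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("small.txt", "a" * 10)
    zf.writestr("big.txt", "b" * 1000)

print(members_over_limit(buf, limit=100))
```

For a real archive, pass its path instead of the in-memory buffer and leave the limit at its default.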
If there is a good reason we can increase the per file limit beyond 2GB, at the expense of using more RAM, but we aren't aware of any reason why this might be required.