MailWalker is software designed to browse websites and extract the email
addresses they contain. This activity is broken down into three distinct tasks
performed in parallel:
Walking
When you add new websites to a session, you tell MailWalker which sites to
start collecting from. Unless you state otherwise when adding them, MailWalker
is not confined to analyzing those sites alone. While analyzing them,
MailWalker looks for links to other websites and adds those to the list of
sites to be analyzed.
This walking phase, which consists of finding and following a number of a
website's external links, is highly customizable:
- structurally, because you can specify how deep into the link hierarchy you
want to go;
- quantitatively, because you can set limits on the number of sites to visit;
- qualitatively, because you can specify under what conditions you want (or do
not want) a new site to be included in the browse list.
MailWalker can follow "classic" external links (hard links), but it can also
detect more subtle relationships, such as indirect links (including scripted
ones), some JavaScript-enabled links, and links contained in XML feeds. If you
ask it to, MailWalker can also follow scripted or HTTP redirects.
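The distinction between a site's own pages and links to other websites can be sketched as follows. This is only an illustrative Python sketch, not MailWalker's actual implementation: the names `LinkCollector` and `split_links` are hypothetical, and it handles only plain `<a href>` hard links, not the scripted, JavaScript, or XML feed links mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_links(page_url, html):
    """Return (internal, external) absolute http(s) links found in html."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if parts.netloc.lower() == base_host:
            internal.append(absolute)  # same host: candidate for exploring
        else:
            external.append(absolute)  # other host: candidate for walking
    return internal, external
```

In this sketch, external links would feed the browse list and internal links would feed the list of pages to explore. Comparing host names exactly is an assumption; a real tool might treat subdomains as part of the same site.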
Exploring
As MailWalker builds its browse list from the sites you have added and the
options you have chosen, it also explores those sites, this time to build a
list of internal pages for each website in the browse list.
Like the walking engine, the exploration engine is highly configurable:
- structurally, because you can specify how you want the exploration to be
done and on what type of media;
- quantitatively, because you can set limits on the number of pages to explore
and on the document sizes not to exceed;
- qualitatively, because you can specify under what conditions you want (or do
not want) a page of a site to be included in the analysis list.
MailWalker can explore a number of site types natively, with more or less
"open" options, but you can also ask it to try exploring sites built on media
that it does not know how to explore natively. The results are often very
interesting.
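The quantitative and qualitative limits described in this section can be illustrated with a bounded breadth-first exploration. This is a minimal sketch under stated assumptions, not MailWalker's engine: `explore`, `fetch`, `max_pages`, `max_bytes`, and `accept` are hypothetical names, and `fetch` is a caller-supplied function so that the example runs without any network access.

```python
from collections import deque


def explore(start_url, fetch, max_pages=100, max_bytes=1_000_000,
            accept=lambda url: True):
    """Breadth-first walk over internal pages, bounded by page count and size.

    fetch(url) must return (body, internal_links), or None when the page
    cannot be retrieved. Returns the list of pages kept for analysis.
    """
    queue = deque([start_url])
    seen = {start_url}
    kept = []
    while queue and len(kept) < max_pages:  # limit on the number of pages
        url = queue.popleft()
        result = fetch(url)
        if result is None:
            continue
        body, links = result
        if len(body) > max_bytes:
            continue  # oversized document: skipped, its links are not followed
        if accept(url):
            kept.append(url)  # only pages meeting the condition are kept
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return kept
```

For a quick trial, `fetch` can simply be the `get` method of a dictionary that maps each URL to a `(body, links)` pair.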
Analyzing
While the exploration engine finds the pages that are fit for use, the analyzer
is responsible for checking the content of these pages and extracting email
addresses according to the options you have asked it to respect.
Here too you can refine the behavior of the analyzer:
- structurally, by choosing the method or methods to be used for analysis;
- qualitatively, by specifying under what conditions you want (or do not want)
a found email address to be stored and how you want it to be made usable.
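One possible analysis method is a regular-expression scan with a caller-supplied storage condition. The sketch below is an assumption made for illustration, not the method MailWalker uses: `EMAIL_RE`, `extract_emails`, and `keep` are hypothetical names, and the pattern is deliberately simpler than the full address syntax.

```python
import re

# Deliberately simple pattern; real address syntax (RFC 5322) is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text, keep=lambda address: True):
    """Return unique, lower-cased addresses found in text, in order of appearance."""
    found = []
    seen = set()
    for match in EMAIL_RE.finditer(text):
        address = match.group(0).lower()  # normalize so duplicates collapse
        if address not in seen and keep(address):
            seen.add(address)
            found.append(address)
    return found
```

Here the `keep` predicate stands in for the qualitative condition on storing an address, and lower-casing stands in for making it usable. Both are design choices of this sketch.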
MailWalker was designed to be very malleable: the operating characteristics of
the walking and exploration engines, as well as of the analyzer, can easily be
adapted to handle scenarios that are entirely beyond its competitors.
It is therefore quite easy to add new features or heuristics to these engines,
which is why, to keep up with your business and your applications, MailWalker
updates are released regularly. So remember to activate the automatic update
check!