Web Data Extracting and Analyzing Framework

Sunday, 14 June 2009, Александр Краковецкий

Overview

I am glad to announce a new project called "WDEAF: Web Data Extracting and Analyzing Framework", which is closely related to data mining and can be used to extract and analyze data on the web. The framework lets you extract data from text or web sources, analyze text content, and write your own data mining and extraction applications and services. It can be useful for people who need to collect a lot of business data, retrieve images from a site, store data in a specific format, and so on.

In this blog I will describe how to use the framework in real-world projects.

If you have specific requirements, please contact us (msugvn[at]gmail[dot]com) and we will develop a custom data extraction application tailored to them.

Web Data Extracting and Analyzing Framework Features

With the framework you will be able to:

  • analyze DOM structure
  • get all data from the selected web page (meta info, keywords, title, HTML code, tables, images, links) and work with the DOM from managed code
  • extract emails, phone and fax numbers, IP addresses, credit card information, GUIDs, etc.
  • extract links using conditions and filters
  • extract images
  • capture screenshots and thumbnails of web pages (see the Website Screenshots & Thumbnails Extractor project for more info)
  • analyze content of web pages
  • get Google PageRank (PR), calculate TF-IDF metrics
  • split text into sentences and words, remove stop words, etc.
  • extend functionality by writing your own extraction logic on top of WDEAF
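
To give a feel for what several of these features involve, here is a minimal sketch in plain Python using only the standard library. It is not the WDEAF API (which targets managed code and is not shown in this post); every name in it (PageExtractor, EMAIL_RE, IPV4_RE, tokenize, tf_idf, the tiny stop-word set) is hypothetical and chosen for illustration only. The regular expressions are deliberately simple, the text collector does not skip script or style content, and TF-IDF is computed here as term frequency times ln(N / document frequency), which is one common variant among several.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin

# Simplified illustrative patterns; production use needs stricter validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
WORD_RE = re.compile(r"[a-z0-9']+")
# A tiny hypothetical stop-word list, just enough for the demo.
STOP_WORDS = {"a", "an", "and", "at", "is", "of", "or", "the", "to", "us"}


class PageExtractor(HTMLParser):
    """Collects the title, link targets, image sources and text of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self.images = []
        self.text_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.text_parts.append(data)


def tokenize(text):
    """Lowercase the text, split it into words and drop stop words."""
    return [w for w in WORD_RE.findall(text.lower()) if w not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document: tf * ln(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


sample_html = """<html><head><title>Contact page</title></head><body>
<p>Write to sales@example.com or visit <a href="/about">About</a>.</p>
<img src="logo.png"> Server: 192.0.2.10
</body></html>"""

page = PageExtractor("http://example.com/")
page.feed(sample_html)
page_text = " ".join(page.text_parts)

print("title: ", page.title)
print("links: ", page.links)
print("images:", page.images)
print("emails:", EMAIL_RE.findall(page_text))
print("IPs:   ", IPV4_RE.findall(page_text))
print("tf-idf:", tf_idf(["web data extraction", "web page analysis"]))
```

In the TF-IDF part, a word that occurs in every document (here "web") gets weight 0, while words unique to one document get a positive weight, which is what makes the metric useful for picking out the characteristic terms of a page.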

Examples of software already developed with WDEAF

P.S. The source code for the examples will be available on CodePlex very soon.

