extraction question
-
yiwong2001 2011-02-20 14:34:00.0
Greetings,
How can I extract data from a website structured as follows:
main category
    category c1
        news under c1
    category c2
        category c2.1
            news under c2.1
        category c2.2
        category c2.3
        ..............
Each category page contains a list of subcategory links and related news links.
I want to start from the root category, go through the whole site, and crawl all the articles in all the categories. The problems I have met so far are:
I do not know how to extract the news URL instead of the link text from a category page. Also, how can I define the Next Page?
Looking forward to your help.
Thanks!
Ian -
Elena Lyubina 2011-02-26 14:04:00.0
Greetings,
In your case you can create two data patterns. The first one extracts the data you need. The second one extracts the category list. Create a dataview and a datasource for the first pattern, then set up a two-level extraction in which the second pattern, the one that returns the category list, is used to move to the child state. You also need to set the 'recursive' flag, so that the crawler enters every level of subcategories until it has extracted them all.
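
The same idea can be sketched outside the tool in plain Python. This is only an illustration of the two-pattern, recursive approach and not the product's own API: the `PAGES` dictionary, the `cat` and `news` link classes, and the `LinkParser` and `crawl` names are all assumptions made up for the example. One "pattern" collects the news URLs (the `href` values rather than the link text), the other collects the category links that the crawl follows recursively.

```python
from html.parser import HTMLParser

# Hypothetical in-memory site (URL -> HTML). A real crawler would fetch pages over HTTP.
PAGES = {
    "/root": '<a class="cat" href="/c1">category c1</a>'
             '<a class="cat" href="/c2">category c2</a>',
    "/c1": '<a class="news" href="/news/1">news under c1</a>',
    "/c2": '<a class="cat" href="/c2.1">category c2.1</a>'
           '<a class="cat" href="/root">home</a>',
    "/c2.1": '<a class="news" href="/news/2">news under c2.1</a>',
}


class LinkParser(HTMLParser):
    """Collects href values (URLs, not link text), grouped by link class."""

    def __init__(self):
        super().__init__()
        self.links = {"cat": [], "news": []}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        cls, href = attrs.get("class"), attrs.get("href")
        if cls in self.links and href:
            self.links[cls].append(href)


def crawl(url, seen=None):
    """Return all news URLs reachable from url, descending into every subcategory."""
    seen = set() if seen is None else seen
    if url in seen or url not in PAGES:
        return []  # already visited, or page unknown
    seen.add(url)

    parser = LinkParser()
    parser.feed(PAGES[url])

    news = list(parser.links["news"])    # first pattern: the data itself
    for sub in parser.links["cat"]:      # second pattern: child categories
        news.extend(crawl(sub, seen))    # the "recursive" step
    return news


print(crawl("/root"))
```

The `seen` set keeps the crawl from looping when a category page links back to a page that was already visited, such as the root.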
---
Best regards,
Elena Lyubina
Sundewsoft Customer Service