Popular Internet Images

I’m scraping a few Yahoo pages: most e-mailed, most popular, ratings and AFP photos. These scrapes give you a nice RSS feed.

I was irritated with these scrapes because I did not scrape the description.
As of today, the popular internet images include the description as well.

If you have a site with ‘popular’ images, I can check whether I can scrape it for you.
You can see the examples at the link above.

By the way: how old is Emma, and where does she come from?
The power of RSS: keep track of changes!

Prophecy: RSS in Holland

RSS is going mainstream. More and more sites implement RSS for syndication.

I predict that in the next few months more and more news sites will implement feeds, so that there is no need for scraping anymore.

If not, Dutch news sites will no longer be interesting as news media (were they ever?).

I’m currently tracking a number of sites I scrape myself, and 90% of the news on them comes from ANP or NOVUM; they just copy it all.

It would be best if all the local newspapers running a Vignette server added their news too. Right now I’m scraping some local stuff and it’s working. So why not use it? I see the headline and an excerpt, I visit the site, and I see your advertisements. Kewl, no?

Planet Multimedia, Quotenet, all Wegener newspapers and of course Webwereld (yes, you have RSS, but make it public, not only the Trillian feed!):
To all of you: please add RSS to your sites.

How to scrape with MyHeadlines

Request from a reader: how do you scrape with MyHeadlines?
MyHeadlines has its own engine for scraping websites into an RSS/XML feed.
How does this work? MyHeadlines has a tutorial, and once you understand it you can create scraped RSS feeds.
To explain scraping a little, I’ll use as an example a feed that uses almost all the available options.
How to scrape Google World News
Find the URL (the less overhead the better, so I use the lite version of the page):
http://news.google.nl/news/en/us/mainlite.html

{dump}<title>{site_name}</title>
This is usually my ‘test’ line. If it works (the title shows up within MyHeadlines), I know I can probably scrape the page.

{dump}<a name=WORLD>
I’m searching for <a name=WORLD>; this line is unique on the page.
After this line the scraper should find the link, title and/or description.

{dump}<a class=y href="{link_1}">{title_1}</a>
Here I define where the scraper should pick up the link and the title.

{dump}</b><br>{desc_1}<br>
And here the description.

So a complete template for a correctly scraped Google News feed would be:

{dump}<title>{site_name}</title>
{dump}<a name=WORLD>
{dump}<a class=y href="{link_1}">{title_1}</a>
{dump}</b><br>{desc_1}<br>

{dump}<a class=y href="{link_2}">{title_2}</a>
{dump}</b><br>{desc_2}<br>

Please note that you NEED to add {dump} at the start of every new line.
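
For readers who want to try the same idea outside MyHeadlines, here is a rough Python analogue of the template above. It is only a sketch and not how the MyHeadlines engine works internally: it jumps to the unique marker, then captures link, title and description for the first two items. The sample HTML, the example.com links and the names SAMPLE_HTML, ITEM and scrape are made up for illustration.

```python
import re

# Hypothetical sample page, shaped like the template above (not real Google output).
SAMPLE_HTML = """<html><head><title>Google News</title></head><body>
<a name=TOP>
<b><a class=y href="http://example.com/top">Not a world story</a></b><br>Skipped.<br>
<a name=WORLD>
<b><a class=y href="http://example.com/one">First headline</a></b><br>First description.<br>
<b><a class=y href="http://example.com/two">Second headline</a></b><br>Second description.<br>
</body></html>"""

# One item: link and title come from the <a class=y> tag,
# the description sits between </b><br> and the next <br>.
ITEM = re.compile(
    r'<a class=y href="(?P<link>[^"]*)">(?P<title>.*?)</a>'
    r'.*?</b><br>(?P<desc>.*?)<br>',
    re.DOTALL,
)


def scrape(html, marker="<a name=WORLD>", max_items=2):
    """Return up to max_items dicts (link, title, desc) found after marker."""
    start = html.find(marker)
    if start == -1:
        return []  # marker missing: the page layout has probably changed
    section = html[start + len(marker):]
    items = []
    for match in ITEM.finditer(section):
        items.append(match.groupdict())
        if len(items) == max_items:
            break
    return items


if __name__ == "__main__":
    for item in scrape(SAMPLE_HTML):
        print(item["title"], "|", item["link"], "|", item["desc"])
```

Everything before the marker is ignored, which is why a unique line such as <a name=WORLD> matters: without it the first match could come from another section of the page.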

Output:
Sharon won't tolerate roadmap violations

Jerusalem – Israeli Prime Minister Ariel Sharon warned his government would not tolerate the slightest Palestinian violation of the roadmap for peace, the Israeli media reported on Friday.

Cambodia's Sihanouk to stay out of poll deadlock

Diplomats have said the king, who commands wide respect as the father of national reconciliation in the war-torn Southeast Asian state, could end the deadlock caused by Prime Minister Hun Sen's need for a coalition partner.
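
MyHeadlines turns these items into the RSS/XML feed for you. If you wanted to do that last step by hand, a minimal sketch with Python's standard library could look like this. The function name build_rss, the channel description and the example.com item links are assumptions for illustration; the output above does not show the real links.

```python
import xml.etree.ElementTree as ET

# The two headlines from the output above; the links are placeholders.
ITEMS = [
    {
        "title": "Sharon won't tolerate roadmap violations",
        "link": "http://example.com/sharon",
        "desc": "Jerusalem - Israeli Prime Minister Ariel Sharon warned his "
                "government would not tolerate the slightest Palestinian "
                "violation of the roadmap for peace, the Israeli media "
                "reported on Friday.",
    },
    {
        "title": "Cambodia's Sihanouk to stay out of poll deadlock",
        "link": "http://example.com/sihanouk",
        "desc": "Diplomats have said the king, who commands wide respect as "
                "the father of national reconciliation in the war-torn "
                "Southeast Asian state, could end the deadlock caused by "
                "Prime Minister Hun Sen's need for a coalition partner.",
    },
]


def build_rss(channel_title, channel_link, channel_desc, items):
    """Build an RSS 2.0 document (as a string) from scraped items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_desc
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "description").text = entry["desc"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss(
        "Google World News (scraped)",
        "http://news.google.nl/news/en/us/mainlite.html",
        "Scraped headlines",
        ITEMS,
    ))
```

ElementTree escapes characters such as & and < in the text for you, so a headline containing them does not break the feed.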

Is RSS becoming popular?

More and more sites here in Holland & Belgium are adding the RSS / XML button and feed. More and more weblogs & sites see the ‘light’ of syndication.

Perfect. After months of writing and talking about it on my website, I see that people actually use it and implement it.

The argument ‘Why should I?’ has changed into ‘I do not want to be the last one to implement it on my site.’ Everyone has the XML button.

And for those sites without RSS feeds: I’m scraping!