Popular Internet Images

I’m scraping a few Yahoo pages: most e-mailed, most popular, ratings, and AFP photos. Each of these scrapes gives you a nice RSS feed.

It irritated me that my scrapes did not include the description.
As of today, the popular internet images feed includes the description as well.

If you have a site with ‘popular’ images, I can check whether I can scrape it for you.
You can see the examples at the link above.

By the way, how old is Emma and where does she come from?
The power of RSS: keep track of changes!

Prophecy: RSS in Holland

RSS is going mainstream. More and more sites implement RSS for syndication.

I predict that in the next few months more and more news sites will offer feeds, so that scraping is no longer needed.

If not, Dutch news sites will no longer be interesting as news media (were they ever?).

I’m currently tracking a number of sites that I scrape myself, and 90% of the news here comes from ANP or NOVUM; the sites simply copy it all.

It would be great if all the local newspapers running on the Vignette server added their news as well. Right now I’m scraping some local stuff and it’s working. So why not use it? I see the headline and an excerpt, then I visit the site and see your advertisements. Kewl, isn’t it?

Planet Multimedia, Quotenet, all Wegener newspapers and of course Webwereld (yes, you have RSS, but make it public, not only the Trillian feed!):
To all of you: please add RSS to your sites.

How to scrape with MyHeadlines

Request from a reader: how do you scrape with MyHeadlines?
MyHeadlines has its own engine for scraping websites into an RSS/XML feed.
How does this work? MyHeadlines has a tutorial, and once you understand it you can create scraped RSS feeds.
To explain scraping a little, I will use as an example a feed that uses almost all the available options.
How to scrape GOOGLE WORLD News
Find the URL (the less overhead the better, so I use the lite version of the page):
http://news.google.nl/news/en/us/mainlite.html

{dump}<title>{site_name}</title>
This line is usually my ‘test’ line. If it works (the title shows up within MyHeadlines), I know I can probably scrape the page.

{dump}<a name=WORLD>
I’m searching for <a name=WORLD> because this line is unique on the page.
After this line the scraper should find the link, title and/or description.

{dump}<a class=y href="{link_1}">{title_1}</a>
Here I define where the scraper should pick up the link and the title.

{dump} {desc_1} 
And here the description.

So the example for a correctly scraped Google News feed would be:

{dump}<title>{site_name}</title>
{dump}<a name=WORLD>
{dump}<a class=y href="{link_1}">{title_1}</a>
{dump} {desc_1}

{dump}<a class=y href="{link_2}">{title_2}</a>
{dump} {desc_2}

Please note that you NEED to add {dump} at the start of every new line.
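
For readers who do not run MyHeadlines, here is a rough Python sketch of the same idea: skip ahead to the unique marker, then pull out link, title and description. This is not how the MyHeadlines engine works internally. The sample HTML, the example.com URLs, the function name scrape_section and the <br> tags around each description are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the page; the real Google News markup is not reproduced here.
SAMPLE_HTML = """
<title>Example News</title>
<a name=TOP>
<a class=y href="http://example.com/top">Top story</a><br>Not in the WORLD section.<br>
<a name=WORLD>
<a class=y href="http://example.com/a">First headline</a><br>First description.<br>
<a class=y href="http://example.com/b">Second headline</a><br>Second description.<br>
"""

def scrape_section(html, anchor="<a name=WORLD>", max_items=2):
    """Skip ahead to the anchor, then collect link/title/description triples."""
    start = html.find(anchor)
    if start == -1:
        return []  # the unique marker is missing, so nothing can be scraped
    section = html[start + len(anchor):]
    # Assumption: each description sits between two <br> tags right after the link.
    pattern = re.compile(
        r'<a class=y href="(?P<link>[^"]*)">(?P<title>.*?)</a>'
        r'\s*<br>(?P<desc>.*?)<br>',
        re.DOTALL,
    )
    items = []
    for match in pattern.finditer(section):
        items.append({key: value.strip() for key, value in match.groupdict().items()})
        if len(items) == max_items:
            break
    return items

items = scrape_section(SAMPLE_HTML)
for item in items:
    print(item["title"], "->", item["link"])
```

The max_items limit plays the role of the numbered {link_1}/{link_2} slots: you only get as many items as you ask for.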

Output:
Sharon won't tolerate roadmap violations

Jerusalem – Israeli Prime Minister Ariel Sharon warned his government would not tolerate the slightest Palestinian violation of the roadmap for peace, the Israeli media reported on Friday.

Cambodia's Sihanouk to stay out of poll deadlock

Diplomats have said the king, who commands wide respect as the father of national reconciliation in the war-torn Southeast Asian state, could end the deadlock caused by Prime Minister Hun Sen's need for a coalition partner.
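
If you wonder what happens with items like these afterwards: they end up as entries in an RSS feed. Below is a minimal sketch of that step using Python’s standard library, not the code MyHeadlines uses. The function name build_rss, the feed name, the example.com links and the placeholder descriptions are assumptions; only the two headlines come from the output above.

```python
import xml.etree.ElementTree as ET

def build_rss(site_name, site_link, items):
    """Serialize scraped items into a bare-bones RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_name
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = "Scraped headlines from " + site_name
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "description").text = item["desc"]
    return ET.tostring(rss, encoding="unicode")

# Headlines from the output above; links and descriptions are placeholders.
items = [
    {"title": "Sharon won't tolerate roadmap violations",
     "link": "http://example.com/sharon",
     "desc": "Placeholder description."},
    {"title": "Cambodia's Sihanouk to stay out of poll deadlock",
     "link": "http://example.com/sihanouk",
     "desc": "Placeholder description."},
]

feed_xml = build_rss(
    "Google News World",
    "http://news.google.nl/news/en/us/mainlite.html",
    items,
)
print(feed_xml)
```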

Requested feed: Turning Tables

Mark asked me to scrape Turning Tables.

You can find the scraped feed here.

From his comments: “it’s a great blog created by a soldier in Iraq. Definitely gives you another view to the war…”

Q: What tool are you using to scrape feeds, btw?
A: I use MyHeadlines, and I have customized it a little to tailor it to my AroundMyRoom. I even added/changed the option to send headlines by e-mail.

New Scraped Feed: David Letterman Top 10

Yeah!
David Letterman’s daily Top 10 is finally available in RSS/XML. Updated once a day!

I miss him on Dutch TV.

Fun with Scraping #2

Page3 dot com
updated once a day

Fun with scraping

I just scraped the ‘xs4all storingen’ (XS4ALL outages) page.
So if a service is failing at my ISP, I see it in RSS... AWESOME!

Yes, I could do this for you too!

:: update: the URL links were wrong; I changed the source URL and fixed it ::

Is RSS becoming popular?

More and more sites here in Holland & Belgium implement the RSS/XML button/feed. More and more weblogs and sites see the ‘light’ of syndication.

Perfect. After months of writing and talking about it on my website, I see that people actually use it and implement it.

The argument ‘Why should I?’ has changed into ‘I do not want to be the last one to implement it on my site.’ Everyone has the XML button.

And for those sites without RSS feeds: I’m scraping!

RSS feeds for news.google.com

I’m a news addict (you probably already knew this), and therefore today I scraped the news.google.com website to create 8 RSS/XML feeds…
After an initial problem with the scrapes, it seems to work now.
Ahh, my service is as-is, so :-) it works as long as it works ;-)

:: update: updating the feeds is still not possible / still has problems ::
I think I will remove the feeds tonight.

New Scraped Feed: ID&T Radio News

ID&T Radio also has news, but no RSS/XML feed.
As of today I scrape their news once an hour.

Have fun with this new feed.
Please note that I do not publish the RSS feed here; you can grab it from my newsfeed site.

AroundMyRoom PlayGround

I'm playing here, and you are invited.

RSS/XML Feeds