MacDevCenter.com -- Tapping RSS with Shell Scripts
Tapping RSS with Shell Scripts
by Dave Taylor, coauthor of Learning Unix for Mac OS X Panther
03/12/2004
If you're like me, you want to keep up with the latest news and information.
Shell scripts help me do just that. In this article I'll show you how
I wrote a shell script that watches the news at Slashdot.org
and automatically shows me the latest story headlines every time I launch
a Terminal application.
First Things First
Before any shell script work begins, the first step is to figure out
the URL of the RSS page on Slashdot.
TIP: RSS is Really Simple Syndication,
an XML-format data stream that's much more easily parsed
and tracked than HTML pages, at least programmatically.
The Slashdot home page doesn't make it particularly easy to find, but
the very bottom line, the very rightmost link, is "rss", and
the URL behind that link is http://slashdot.org/index.rss.
To look at it from within the Terminal, I'm going to utilize the powerful
curl application, piping the output to head to ensure that I'm not drowned
in output:
$ curl --silent 'http://slashdot.org/index.rss' | head
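The raw feed is XML, and most of it is markup I don't need. The only
lines I care about contain title or description tags, so grep can pick
those out (here $url holds the feed URL shown above):
$ curl --silent "$url" | grep -E '(title>|description>)' | head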
Slashdot
News for nerds, stuff that matters
Slashdot
Yahoo To Charge For Search Listings
ibi writes "Yahoo will start taking payments
to "tilt the playing field" for companies that want their
listings given more prominence by Yahoo's search engine. ...
Infinium Labs Threatens HardOCP Again
XBox4Evr writes "In a follow-up from two weeks ago,
Infinium Labs is again threatening the tech web site HardOCP
with legal action. This in itself, is no big ...
SCO Postpones Lawsuit, Now Threatening Two
zzxc writes "In a surprise turn of events, SCO says
that they need more time to prepare an announcement of who
they are going to sue. According to SCO, the ...
Gyroscopic Wireless Mouse
Not bad. In fact, that's really almost all we need. So let's turn this
into a shell script.
Headlines Only
Turning this command line into a shell script is a breeze: just open
your favorite Terminal command-line editor. (I use vi, but I've been
trapped in Unix since 1980, so it's already subverted my neural pathways.
You might prefer pico, or even BBEdit or similar.) Whichever you choose,
type in the following, a standard shell script preamble:
#!/bin/sh
This tells the operating system that when this particular file is executed,
it should be given to the shell (sh) to be run. Then let's create a
variable that contains the URL:
url="http://slashdot.org/index.rss"
Now we can reference $url and the entire script has become more portable
and easily modified. The next line is the entire command:
curl --silent "$url" | grep -E '(title>|description>)'
NOTE: If you get a "command not found" error with curl, you might need
to specify a full path. In Panther, the curl command can be found at
/usr/bin/curl in standard installations.
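A quick way to check whether curl is reachable before relying on it is the shell's own command -v builtin. The following is a minimal sketch; find_tool is a hypothetical helper name, not part of the article's script.

```shell
#!/bin/sh
# Hypothetical helper: print the full path of a command, or fail if it
# is not on the PATH. Relies only on the POSIX "command -v" builtin.
find_tool() {
  command -v "$1" 2>/dev/null
}

if curl_path=$(find_tool curl); then
  echo "curl found at $curl_path"
else
  echo "curl is not on your PATH; try the full path, e.g. /usr/bin/curl" >&2
fi
```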
This script produces the output already seen, so let's make two tweaks
to make it more useful. First off, the first three lines of output,
the Slashdot title and description, never change, so we might as well
strip them out. This can be done a variety of
ways, but I'm going to turn to the sed command, which has many hidden
powers. One of them is that if you specify the '-n' flag, it won't
output any of its input by default. The value of this? We can then
specify which lines to print: '4,$p' means "print line 4 through the
last line," which skips the three header lines.
Like this:
curl --silent "$url" | grep -E '(title>|description>)' | \
sed -n '4,$p'
Notice the trailing backslash here: rather than have our command pipe
stretch longer and longer, the backslash (which must be the very last
character on the line) lets me wrap the command across multiple lines and
make it generally more readable.
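To see the sed -n idiom on its own, here is a small sketch that feeds five made-up lines through the same '4,$p' expression. The skip_feed_header name is an assumption for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: drop the first three lines of whatever arrives on
# stdin by printing only line 4 through the last line.
skip_feed_header() {
  sed -n '4,$p'
}

# Made-up input: five lines, of which only the last two should survive.
printf '%s\n' one two three four five | skip_feed_header
```

Only "four" and "five" come out the other end, which is the behavior the pipeline needs for the three fixed header lines.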
We're getting close to trying the script. The only other tweak worth
making is to strip out the <title>, </title>, <description>,
and </description> tags themselves. This too can be done with
sed, in a typically Unix-y fashion:
curl --silent "$url" | grep -E '(title>|description>)' | \
sed -n '4,$p' | \
sed -e 's/<title>//' -e 's/<\/title>//' -e 's/<description>/  /' \
-e 's/<\/description>//'
The XML tags are effectively stripped out, except that the <description>
tag is replaced by two spaces, just for formatting. The result, assuming
you've saved this as slash-rss.sh, as I have:
$ sh slash-rss.sh | head -4
Yahoo To Charge For Search Listings
ibi writes "Yahoo will start taking payments to "tilt the
playing field" for companies that want their listings given more
prominence by Yahoo's search engine. ...
Infinium Labs Threatens HardOCP Again
XBox4Evr writes "In a follow up from two weeks ago, Infinium Labs
is again threatening the tech web site HardOCP with legal action. This in
itself, is no big ...
This shows the top two stories (4 lines = two titles + two descriptions).
Not bad. Not beautiful, but certainly functional for a first script.
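The tag-stripping step can be tried without touching the network. This sketch wraps the four sed substitutions in a function (strip_tags is a hypothetical name) and runs it on two made-up lines standing in for the grep output.

```shell
#!/bin/sh
# Hypothetical helper: remove the title/description tags from lines on stdin.
# The opening description tag becomes two spaces so descriptions are indented.
strip_tags() {
  sed -e 's/<title>//' -e 's/<\/title>//' \
      -e 's/<description>/  /' -e 's/<\/description>//'
}

# Made-up sample lines, not real feed content.
strip_tags <<'EOF'
<title>Sample headline</title>
<description>Sample description text.</description>
EOF
```

The headline prints flush left and the description prints with a two-space indent.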
I always spend way too much time fine-tuning scripts to get just the
output I want, so let's continue working on this to ensure that the
output is more readable, shall we? It's so easy, you'll be amazed:
curl --silent "$url" | grep -E '(title>|description>)' | \
sed -n '4,$p' | \
sed -e 's/<title>//' -e 's/<\/title>//' -e 's/<description>/  /' \
-e 's/<\/description>//' | \
fmt
The results, piped through head again:
$ sh slash-rss.sh | head
Yahoo To Charge For Search Listings
ibi writes "Yahoo will start taking payments to "tilt the playing
field" for companies that want their listings given more prominence
by Yahoo's search engine. ...
Infinium Labs Threatens HardOCP Again
XBox4Evr writes "In a follow up from two weeks ago, Infinium
Labs is again threatening the tech web site HardOCP with legal
action. This in itself, is no big ...
SCO Postpones Lawsuit, Now Threatening Two
zzxc writes "In a surprise turn of events, SCO says that they
The problem now is that the head needs to be between the sed invocations
and the fmt command, since we have no way of knowing how many lines
each description is going to produce when fed through fmt. The solution
is to build the next generation of this script!
Headlines, As Many As You Want
The obvious solution is to add a command flag that lets you specify how
many headlines you want: multiply it by two and you'll know what value
to feed head within the script. Here's how that looks as part of a shell
script ($# is the number of arguments and $1 is the first argument):
#!/bin/sh
url="http://slashdot.org/index.rss"
if [ $# -eq 1 ] ; then
  # $(( )) is arithmetic expansion: an argument of -1 becomes -2,
  # which head reads as "show the first two lines"
  headarg=$(( $1 * 2 ))
else
  headarg="-8" # default is four headlines
fi
curl --silent "$url" | grep -E '(title>|description>)' | \
sed -n '4,$p' | \
sed -e 's/<title>//' -e 's/<\/title>//' -e 's/<description>/  /' \
-e 's/<\/description>//' | \
head $headarg | fmt
Now I can specify that I only want the top headline, the newest entry
on the Slashdot site, by simply specifying '-1' when I invoke the script:
$ sh slash-rss.sh -1
Yahoo To Charge For Search Listings
ibi writes "Yahoo will start taking payments to "tilt the playing
field" for companies that want their listings given more prominence
by Yahoo's search engine. ...
That's pretty cool, I think. I could tweak it forever, but let's stop
here and see how to turn this into a Unix command just like ls and cd.
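One possible refinement, for anyone who does want to keep tweaking: normalize the argument so that both -3 and 3 work, and fall back to the default when the argument is missing or not a number. This is a sketch of that idea, not the script as published; lines_for is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: turn a headline count (with or without a leading
# dash) into the number of lines to keep, at two lines per headline.
lines_for() {
  count=${1#-}                # strip a leading dash, so -3 and 3 both work
  case $count in
    ''|*[!0-9]*) count=4 ;;   # missing or non-numeric: default to four headlines
  esac
  echo $(( count * 2 ))
}

lines_for -1   # prints 2
lines_for 3    # prints 6
lines_for      # prints 8
```

With this in place, the last stage of the pipeline could read head -n "$(lines_for "$1")" | fmt.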
TIP: You can download
this shell script in finished form.
Turning It Into a Command
There are two ways to turn a shell script into a command: create an alias
or make the script executable and ensure it's in your PATH. If you're
using Bash, you can create an alias like this:
alias slashdot="sh slash-rss.sh"
Then you can see the headlines by just typing slashdot on your command
line.
To make the shell script itself executable, first make sure you've saved
it in a directory that's in your PATH by typing:
$ echo $PATH
/bin:/sbin:/usr/bin:/usr/sbin:/sw/bin:/usr/X11R6/bin:
/usr/local/bin:/Users/dt/bin:/sw/bin
You can see that my PATH includes /Users/dt/bin - that's where I save
this script and similar. Once it's in the right place, you'll need to
make it executable by using the chmod command:
$ chmod +x slash-rss.sh
Optionally, you could rename the script to be a bit more friendly, of
course.
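Rather than reading the PATH output by eye, a script can check for a directory itself. The sketch below uses a case pattern match; in_path is a hypothetical helper, and $HOME/bin is only an example directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 appears in the colon-separated
# list $2 (defaults to the current PATH).
in_path() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if in_path "$HOME/bin"; then
  echo "$HOME/bin is in your PATH"
else
  echo "$HOME/bin is not in your PATH"
fi
```

Wrapping the list in colons on both sides means /usr will not falsely match /usr/bin.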
Finally, Having It Auto-Execute Upon Terminal Launch
If you're running the Bash shell, which you probably are if you're in
Panther, then it's a breeze: move to your home directory and append
an invocation of the script to your .bash_login file:
$ cd
$ echo "sh slash-rss.sh -2" >> .bash_login
Make extra sure that you use two >>, not one, on that last command!
Now the next time you start up a Terminal application window, you'll
see:
Last login: Tue Mar 2 23:09:36 on ttyp3
Welcome to Darwin!
Yahoo To Charge For Search Listings
ibi writes "Yahoo will start taking payments to "tilt the playing
field" for companies that want their listings given more prominence
by Yahoo's search engine. ...
Infinium Labs Threatens HardOCP Again
XBox4Evr writes "In a follow up from two weeks ago, Infinium
Labs is again threatening the tech web site HardOCP with legal
action. This in itself, is no big ...
$
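Running the echo command above a second time appends a second copy of the line, so the headlines would print twice at login. A minimal sketch of a guard against that follows; add_login_line is a hypothetical helper, and it only defines the function without touching any real login file.

```shell
#!/bin/sh
# Hypothetical helper: append line $1 to file $2 only if the file does not
# already contain that exact line.
add_login_line() {
  line=$1
  file=$2
  touch "$file"    # make sure the file exists before grep reads it
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (not run here):
#   add_login_line "sh slash-rss.sh -2" "$HOME/.bash_login"
```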
It's also worth noting that this use of shell scripts to parse and format
XML has more applications. For example, go to http://www.casino-bookstore.com/
and have a close look at the "Latest Gambling News" box: it's
using an almost identical script to keep track of the gambling news
XML feed from about.com. Another example? Go to http://www.healthy-bookstore.com/
and look at the medicinenet news feed. Again, it's using curl and sed to
turn the XML data into HTML data.
Dave Taylor
is a popular writer, teacher, and speaker on business and technology issues. The founder of The Internet Mall and iTrack.com, he's been involved with UNIX and the Internet since 1980. He's also been a Mac fan since the year it was released.
Your shell script tricks and tips are welcome here.
REBOL is a very good scripting option
2004-03-31 08:08:28
brittlestar
curl, grep, sed... these are fundamental *nix commands, but using them also means navigating the details of each one. In contrast, REBOL offers a simple approach:
url: read http://slashdot.org/index.rss
parse/all url [
    any [
        <title> copy title to </title>
        (print title)
        |
        <description> copy desc to </description>
        (print desc)
        |
        skip
    ]
    to end
]
REBOL is free and platform-independent (although not OSS). It's a very handy, very tiny, cross-platform scripting shell -- and it provides a capable built-in GUI (not yet on OSX).
For more info, see: http://www.rebol.com and http://www.rebol.net/cookbook/
Another little tweak
2004-03-16 05:45:39
jefflargent
sed -e 's/<title>/^[[0;31;40m/' -e 's/<\/title>/^[[0;37;40m/'
In a color xterm, this will change the headline to red.
a few things to beware of
2004-03-13 10:12:14
jnazario
a few things to beware of, for the slashdot RSS feed and for RSS parsing using regular expressions.
slashdot's got server load problems (they are quite popular, imagine a several year sustained slashdot effect), and one way they try and deal with it is by blocking people who snag their RSS feed more than once every 30 minutes. hence, if you use this login script and log in more than once every half hour (or if this is a system wide thing ...) you're toast. instead, use cron to fetch the RSS once an hour (IIRC they rebuild their RSS only hourly, like most sites) and use a local cache for this script. you'll ensure you get headlines.
secondly, parsing RSS using regular expressions is prone to errors if the feed changes. instead, look at a real XML parser. lightweight ones exist in perl and in python:
http://www-106.ibm.com/developerworks/web/library/w-rss.html
http://www-106.ibm.com/developerworks/webservices/library/ws-pyth11.html
these will be far more flexible and will work for any valid RSS/XML file.
hope this helps.
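The local-cache idea from the comment above could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the 60-minute threshold, and the use of find's -mmin test, which is not in POSIX but is available in both GNU and BSD find. Instead of cron, this version refreshes the cache lazily when it is stale.

```shell
#!/bin/sh
# Hypothetical helper: succeed if file $1 exists and was modified less
# than $2 minutes ago. Uses find -mmin (GNU and BSD find, not POSIX).
cache_is_fresh() {
  [ -f "$1" ] && [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# Hypothetical helper: print the feed at URL $1, fetching it only when the
# cache file $2 is missing or more than 60 minutes old.
headlines_from_cache() {
  if ! cache_is_fresh "$2" 60; then
    # Download to a temporary name first so a failed fetch
    # does not clobber a good cache.
    curl --silent "$1" > "$2.tmp" && mv "$2.tmp" "$2"
  fi
  [ -f "$2" ] && cat "$2"
}

# Example (not run here):
#   headlines_from_cache "http://slashdot.org/index.rss" "$HOME/.slashdot.rss"
```

The rest of the pipeline (grep, sed, head, fmt) would then read from this function's output rather than from curl directly.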