Planet Python

Feb. 18th, 2007 | 12:34 am

One of my friends has been rather active in the Python world, and I've been trying to decide whether I should export some of my blog to Planet Python. Since I'm going to PyCon, I decided now would be a good time to do it.

I hacked together a small Python script that reads the Atom feed at http://alienghic.livejournal.com/data/atom and filters out everything except posts tagged with "for:planetpython".

The frustrating part was that while Python's xml.etree.ElementTree was reasonably straightforward to use for parsing the Atom XML feed, when I tried to re-serialize the original feed, ElementTree helpfully added extra namespace prefixes everywhere, e.g. <feed ...> became <ns0:feed ...>.
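
Here's a tiny round trip showing the problem (the one-line feed is made up for illustration, but the ns0 prefixes are exactly what ElementTree produces):

import xml.etree.ElementTree as ET

# Parse a document that uses a default namespace, then serialize it again.
data = '<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'
print ET.tostring(ET.fromstring(data))
# -> <ns0:feed xmlns:ns0="http://www.w3.org/2005/Atom"><ns0:title>t</ns0:title></ns0:feed>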

I was tired, and tired of procrastinating on putting this script together, so I tweaked the feed back to the Atom specification with some careful re.sub("ns0", "", feed)-style code. (That solution feels rather dirty.)

#!/usr/bin/python2.5
#
# Author: Diane Trout 
# License: free for any use, I'm just not responsible for anything
#          you do while using this code.

import re
import sys
import urllib2
import xml.etree.ElementTree as ET

def filter_journal(journal, tags=()):
  """
  Remove any entry from an Atom feed that doesn't have a tag in tags
  """
  # ElementTree addresses namespaced tags as {uri}tag
  atomns = "{http://www.w3.org/2005/Atom}%s"
  entries = journal.findall(atomns % 'entry')
  tags_used = set()
  for e in entries:
    has_term = False
    # LiveJournal exposes an entry's tags as <category term="..."/> elements
    categories = e.findall(atomns % 'category')
    for c in categories:
      tags_used.add(c.get('term'))
      if c.get('term') in tags:
        has_term = True
        break
    if not has_term:
      journal.remove(e)
  return journal

def cgi_filter(url):
  """Fetch the Atom feed at url and print the filtered feed as a CGI response
  """
  # this works because LJ only returns 25 posts. If it was returning more
  # or was providing a streaming feed, we'd need to do something more
  # clever here
  stream = urllib2.urlopen(url)
  data = stream.read()
  feed = ET.fromstring(data)
  feed = filter_journal(feed, ['for:planetpython'])
  
  # print stream.info().getheader('content-type')
  print "content-type: text/xml"
  print
  print '<?xml version="1.0" encoding="utf-8"?>'
  print "<!-- If you are running a bot please visit this policy page outlining rules you must respect. http://www.livejournal.com/bots/ -->"

  # begin terrible hack: strip the ns0: prefixes that ElementTree added,
  # restoring the feed to its original default-namespace form
  filtered_feed = ET.tostring(feed)
  filtered_feed = re.sub("<ns0:", "<", filtered_feed)
  filtered_feed = re.sub("</ns0:", "</", filtered_feed)
  filtered_feed = re.sub(":ns0=", "=", filtered_feed)
  print filtered_feed
  
def main(args=None):
  cgi_filter("http://alienghic.livejournal.com/data/atom")
  return 0

if __name__ == "__main__":
  sys.exit(main(sys.argv))
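
For the record, later releases of ElementTree (1.3, the version bundled with Python 2.7 and up) grew ET.register_namespace, which would make the regex hack unnecessary: registering the Atom namespace under the empty prefix makes ElementTree serialize it as the default namespace again. A minimal sketch, assuming one of those newer versions (the one-line feed is again made up for illustration):

import xml.etree.ElementTree as ET

# Map the Atom namespace to the empty prefix so elements serialize in the
# default namespace: <feed> stays <feed> instead of becoming <ns0:feed>.
ET.register_namespace("", "http://www.w3.org/2005/Atom")

data = '<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'
print ET.tostring(ET.fromstring(data))
# -> <feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>

With that in place, the output of ET.tostring could be printed directly, with no re.sub cleanup.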


Comments (3)

Vicky the Compost Queen

PyCon? I'm jealous

from: vixter
date: Feb. 18th, 2007 05:39 pm (UTC)

Our Python club always has reports from PyCon, and often we get a repeat of Guido's speech (because he's in our club). It sounds like lots of fun.

Alas I haven't infected my company with it yet. And startups don't usually send people to conferences.


Diane Trout

Re: PyCon? I'm jealous

from: alienghic
date: Feb. 18th, 2007 08:10 pm (UTC)

So far I've just paid for my trip, though my boss did think that going was a good idea. So I can probably be reimbursed for some of it.

You've got Guido in your group? That's pretty neat.

I think Titus Brown and Grig Gheorghiu are probably our highest-profile people in the Python community. (So we frequently get to hear about testing.)

Also, it looks like your group's larger; our biggest meeting has been around 12 people.


Dilinger

So serious

from: dilinger
date: Feb. 21st, 2007 04:41 pm (UTC)

Now I feel bad about wanting to put in a stupid joke about you being a dirty girl, writing filthy dirty code.

Well, it is Python, and that sounds like the snake, and snakes crawl around in the dirt, so I think there's a thread here. Then again, I went to bed at 1am and got up just before 5am to come to work, so I have that lack-of-sleep buzz-rant thing going.
