A Little Off Code, Computers, Photography and Guns

4Dec/100

Automagic TV Show Calendar

A little while ago I was browsing the web and discovered a website called tvrage.com[1] which seems to be the definitive online TV guide. I didn't land on the main index first but on a page describing the XML API[2] they host for accessing their database of TV shows.

To me, this is like opening presents on Christmas Day. Just imagine the possibilities! I immediately began exploring the kind of data they provide. The very first idea I had was to use it to automatically create events on my Google Calendar for unaired episodes of my favorite TV shows.

I've previously written Python scripts that interface with gdata, but I find their Python implementation cumbersome to deal with, so I began researching their Protocol API[3]. At first I wasted a lot of time attempting to build the necessary XML structures to add events and the like. This got old very fast and I decided to just give JSON-C[4] a try. Turns out you can use Python's built-in json module to create the necessary structures.
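As a minimal sketch of what building such a structure with the json module looks like (written for Python 3, whereas the script further down targets Python 2): the `data`/`details`/`quickAdd` keys are the ones AirDate.py sends, while the helper name `build_quick_add_entry` and the sample event text are made up for illustration.

```python
import json


def build_quick_add_entry(text):
    """Return the JSON body for a quick-add style calendar entry."""
    # Same envelope AirDate.py posts: a "data" object holding the
    # free-form event text plus the quickAdd flag.
    return json.dumps({"data": {"details": text, "quickAdd": True}})


# Hypothetical event text in the "date show - code" shape the script uses
body = build_quick_add_entry("2010-12-06 Castle - S03E10")
print(body)
```

A plain dict goes in and a string ready to be used as a request body comes out, which is a lot less typing than assembling the equivalent XML by hand.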

For parsing the results I got from tvrage I ended up using Python's xml.etree.ElementTree, which was simple enough to set up to retrieve only the information I was interested in for each episode.[5]
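To give a rough idea of the parsing without hitting the network, here is a small Python 3 sketch that runs ElementTree over an inline sample. The element and attribute names (`Episodelist`, `Season`, `no`, `episode`, `seasonnum`, `airdate`, `title`) are the ones AirDate.py looks up; the surrounding `<Show>` root, the sample values and the `parse_episodes` helper are assumptions made up for the example.

```python
from datetime import date
from xml.etree import ElementTree

# Hand-written stand-in for an episode list response (not real data)
SAMPLE_XML = """
<Show>
  <Episodelist>
    <Season no="1">
      <episode>
        <seasonnum>01</seasonnum>
        <airdate>2010-09-20</airdate>
        <title>Pilot</title>
      </episode>
      <episode>
        <seasonnum>02</seasonnum>
        <airdate>2010-00-00</airdate>
        <title>Date Unknown</title>
      </episode>
    </Season>
  </Episodelist>
</Show>
"""


def parse_episodes(xml_text):
    """Return (airdate, episode_code, title) for every episode with a valid date."""
    root = ElementTree.fromstring(xml_text.strip())
    episodes = []
    for season in root.findall("Episodelist/Season"):
        season_num = int(season.get("no"))
        for episode in season.findall("episode"):
            episode_num = int(episode.find("seasonnum").text)
            title = episode.find("title").text
            year, month, day = map(int, episode.find("airdate").text.split("-"))
            try:
                aired = date(year, month, day)
            except ValueError:
                # Unknown parts of a date show up as 00, so skip those
                continue
            episodes.append((aired, "S%02dE%02d" % (season_num, episode_num), title))
    return episodes


for aired, code, title in parse_episodes(SAMPLE_XML):
    print(aired, code, title)
```

The second sample episode has 00 for its month and day, so it is dropped rather than crashing the loop.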

I had a bit of trouble initially with adding events to Google Calendar. This stemmed from the fact that Google will often return an HTTP redirect to a URL with an appended gsession parameter, and you're supposed to resubmit the exact data from the first request to that URL. Once I figured this out it was smooth sailing from there. I even managed to get the whole script multi-threaded to speed things up, since it's impossible to perform batch requests with JSON-C.
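The resubmission can be sketched roughly like this in Python 3. This is an illustration under assumptions rather than the exact logic of the script: `post_with_session_retry` is a made-up name, the `urlopen` parameter exists only so the function can be exercised without a network, and unlike the script (which always posts twice) this version skips the second request when no redirect happened.

```python
import json
import urllib.request


def post_with_session_retry(url, payload, headers, urlopen=urllib.request.urlopen):
    """POST payload as JSON and repeat the POST if we were redirected."""
    body = json.dumps(payload).encode("utf-8")

    # First attempt. urllib follows the redirect on its own, but it does
    # so without the request body, so nothing has been created yet.
    first = urlopen(urllib.request.Request(url, data=body, headers=headers))

    # geturl() reports where we ended up, session parameter included
    final_url = first.geturl()
    if final_url == url:
        # No redirect happened, so the first request already did the job
        return first

    # Resubmit the exact same data to the redirected URL
    return urlopen(urllib.request.Request(final_url, data=body, headers=headers))
```

Passing the opener in as an argument is purely a convenience for trying the function against a stub; in normal use the default `urllib.request.urlopen` is what gets called.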

I should note that the calendar value in the configuration file should be the "Calendar ID", which can be found on the settings page for the individual calendar, grouped with the XML and iCal feeds.

ShowList.txt:[6]

Castle  19267
House   3908
Bones   2870
Big Bang Theory, The    8511
Mentalist, The  18967
Rizzoli & Isles 24996
Venture Bros., The  6270
Top Gear    6753
Mythbusters 4605
Archer  23354
NCIS    4628
Community   22589

Config.cfg:

[Credentials]
username = someuser@gmail.com
password = somebase64encodedpassword
calendar = somecalendarid@group.calendar.google.com
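For reference, producing the encoded password and reading a file shaped like this one could look something like the following Python 3 sketch. The helper names `encode_password` and `load_credentials` and the sample password are made up; the section and key names match the file above. Keep in mind that base64 only keeps the password from being readable at a glance, it is not encryption.

```python
import base64
import configparser

# Same layout as Config.cfg, with a placeholder for the encoded password
SAMPLE_CFG = """
[Credentials]
username = someuser@gmail.com
password = {encoded}
calendar = somecalendarid@group.calendar.google.com
"""


def encode_password(plain):
    """Return the base64 text to paste into the password field."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def load_credentials(cfg_text):
    """Return (username, decoded password, calendar id) from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["Credentials"]
    password = base64.b64decode(section["password"]).decode("utf-8")
    return section["username"], password, section["calendar"]


# "hunter2" is a made-up password used only for this example
cfg_text = SAMPLE_CFG.format(encoded=encode_password("hunter2"))
username, password, calendar_id = load_credentials(cfg_text)
print(username, calendar_id)
```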

AirDate.py:

import urllib2, urllib, json, ConfigParser, base64
from datetime import date
from xml.etree import ElementTree
from threading import Thread

calendar = ""
header = {}

# Thread for retrieving a list of episodes for a given show_id
class airDate(Thread):
    # Initialize thread and set some local attributes
    def __init__(self, show_name, show_id):
        Thread.__init__(self)
        self.show_name = show_name
        self.show_id = show_id
   
    # Get episode list from tvrage.com based on the show_id
    def run(self):
        # Retrieve XML episode_list from tvrage.com
        xml_data = urllib2.urlopen("http://services.tvrage.com/feeds/episode_list.php?sid=%s" % self.show_id).read()
        # Parse XML into ElementTree.Element()
        xml_tree = ElementTree.fromstring(xml_data)
        self.result = []
       
        # For each season
        for season in xml_tree.findall("Episodelist/Season"):
            # Get the season number
            season_num = int(season.get("no"))
            # For each episode in the episode list
            for episode in season.findall("episode"):
                # Get episode number and title
                episode_num = int(episode.find("seasonnum").text)
                episode_title = episode.find("title").text
               
                # Build the episode code S##E##
                episode_code = "S%02dE%02d" % (season_num, episode_num)
               
                # Parse the airdate into year, month and day
                year, month, day = map(lambda x: int(x), episode.find("airdate").text.split("-"))
                try:
                    episode_airdate = date(year, month, day)
                    today = date.today()
                    # If episode hasn't aired yet
                    if episode_airdate >= today:
                        # Add episode to results list
                        self.result.append("%s %s - %s" % (str(episode_airdate), self.show_name, episode_code))
                except ValueError:
                    # If the airdate is invalid (tvrage.com sometimes
                    # includes 00's for unknown sections of the date)
                    pass

class addEvent(Thread):
    # Thread for adding events to google calendar
   
    # Initialize thread and set local episode variable
    def __init__(self, episode):
        Thread.__init__(self)
        self.episode = episode
   
    # Add new entry to google calendar
    def run(self):
        # Build entry structure
        entry = {"data": {"details": self.episode, "quickAdd": True}}
        # Convert to JSON
        entry = json.dumps(entry)
       
        # Build request including necessary headers and data
        calReq = urllib2.Request("http://www.google.com/calendar/feeds/%s/private/full?alt=jsonc" % (calendar), entry, header)
        # Execute the request
        calRes = urllib2.urlopen(calReq)
        # Get the redirect url (gsession appended)
        redirectReq = urllib2.Request(calRes.geturl(), entry, header)
        try:
            redirectRes = urllib2.urlopen(redirectReq)
        except urllib2.HTTPError:
            # If we get some sort of HTTP error code
            # skip entry, can always run again
            pass
   
# Get list of events already added to
# the calendar from previous executions
def getExistingEpisodes(header):
    # Get JSON-C representation of calendar
    calReq = urllib2.Request(url="https://www.google.com/calendar/feeds/%s/private/full?alt=jsonc" % (calendar), headers=header)
    calRes = urllib2.urlopen(calReq)
   
    # Parse JSON-C
    data = json.loads(calRes.read())
    # If the calendar has events on it
    if "items" in data["data"]:
        # Get the list of events
        events = data["data"]["items"]
        existing_episodes = []
        # For each event
        for event in events:
            # Append just the title of the event to the results
            existing_episodes.append(event["title"])
           
        return existing_episodes
    else:
        # We don't have any events on this calendar
        # so just return an empty list
        return []

if __name__ == '__main__':
    # Open the configuration file and get the necessary
    # credentials and settings
    config = ConfigParser.ConfigParser()
    config.readfp(open("Config.cfg"))
    username = config.get("Credentials", "username")
    password = config.get("Credentials", "password")
    # Password is stored as base64 encoded string just so
    # we don't have our password sitting out in plain sight
    password = base64.b64decode(password)
    calendar = config.get("Credentials", "calendar")
   
    # Build loginData structure, this is used to get
    # authentication data from google
    loginData = {
        "Email": username,
        "Passwd": password,
        "source": "BeMasher-ETR-2",
        "service": "cl"
    }

    # Encode the loginData for usage in a url
    loginData = urllib.urlencode(loginData)
    # Get authentication data
    gdataLogin = urllib2.urlopen("https://www.google.com/accounts/ClientLogin", data=loginData)
    SID, LSID, Auth = gdataLogin.read().splitlines()
   
    # Build header structure, this will be used for
    # all requests to google calendar from now on
    header = {
        "Authorization": "GoogleLogin %s" % (Auth),
        "GData-Version": 2,
        "Content-Type": "application/json"
    }
   
    # Open a list of the shows we're interested in
    # Stored as "show_name\tshow_id", one per line
    show_list = open("ShowList.txt")
    jobs = []
    for line in show_list:
        show = line.strip().split("\t")
        jobs.append(show)
   
    # Get a list of existing events from previous
    # executions so we don't wind up with duplicates
    existingEpisodes = getExistingEpisodes(header)
   
    threadQueue = []
    # For each show in the list
    for job in jobs:
        show_name, show_id = job
        # Create an instance of the airDate thread
        thread = airDate(show_name, show_id)
        # Start it
        thread.start()
        # Add it to the threadQueue
        threadQueue.append(thread)
       
    episodes = []
    # While we've still got running threads
    while len(threadQueue) > 0:
        # Get a thread from the queue
        thread = threadQueue.pop()
        # Block until it completes
        thread.join()
        # For each episode in the results
        for episode in thread.result:
            # If it hasn't already been added to google calendar
            if episode[11:] not in existingEpisodes:
                print episode
                # Add to list of episodes that need events created
                episodes.append(episode)
   
    # For each episode that doesn't have an
    # event on google calendar already
    for episode in episodes:
        # Create an addEvent thread, start it
        # and add it to the threadQueue
        thread = addEvent(episode)
        thread.start()
        threadQueue.append(thread)
   
    # While we still have threads running
    while len(threadQueue) > 0:
        # Get a thread from the queue
        thread = threadQueue.pop()
        # Block until it completes
        thread.join()

This was all done shortly before I discovered that tvrage.com also provides iCal feeds for your favorite shows, provided that you register and add some to your list. Unfortunately the iCal feed they generate creates events for the exact air time of each episode, which I'm not really all that concerned about. So I still use this script to add all-day events for each episode, which makes it easier to see when there's a new episode.

I did write another script using their XML API but that will have to wait for another post.

  1. http://tvrage.com/
  2. http://services.tvrage.com/
  3. Data API Developer's Guide: The Protocol
  4. Google's own flavor of JSON which is almost identical to plain old JSON.
  5. I only really needed the original air date, title, season number and episode number.
  6. You can find the show_id via the show search found on their XML API page.
28Jul/100

Matplotlib and Live Data: A Tale of Two Technologies

Being unemployed over the summer is never a good thing for me. I get bored very easily if I don't have something to occupy myself with. This last bout of boredom led me to unpack some of my electronics: I dusted off my multimeter, my Arduino and a digital thermometer I bought a little while ago, figuring I could use them to solve one of my current problems.

Living in Laramie usually subjects people to harsh winters, which leaves most housing developments without central air conditioning since, well, it's never really needed except maybe one or two days over the summer when it gets above 85 °F. This summer has apparently been hotter than previous summers and it's left my condo in an "uncomfortable state". Mind you, I'm used to living in hot weather, so this isn't such a terrible thing to me.

What I'm not used to is having no AC while it cools off enough at night that it's worthwhile to open a few windows and stick a fan in one of them. Which leaves me with this problem: when is the optimal time to open the windows and turn on the fan to get my condo cooled off earliest/fastest?

In comes my Arduino + digital thermometer[1]. Once I rigged up the proper power/data connections on a breadboard for my Arduino I set out to find code for the thermometer. I've set up the thermometer with a sketch on my Arduino before; I just didn't feel like wasting a few hours trying to do it from scratch again. Soon enough I found some code[2] that worked perfectly. So I trimmed out the code I didn't need for the project and set it up to just write the temperature as fast as possible[3] to the serial port it's connected to.

After that I wrote a logging program on my desktop in Python to record the temperatures sent over serial. The program is incredibly simple and uses the pySerial library[4] to read temperatures from the serial port of my desktop and append them to a temperature log. I used a simple Windows command to do the appending since it doesn't keep the file locked, so I can read data from it simultaneously. There are still occasional collisions where the processing program has the file locked and the logger can't write to it, but these are rare enough to be negligible in my situation.

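import serial, os

# Open the serial port the Arduino is connected to (the pySerial of the
# time accepted a zero-based port index here)
ser = serial.Serial(2)
while True:
    # Append each reading to out.txt using the shell's echo so the file
    # is only held open for the duration of each write
    os.system("echo %s>>out.txt" % (ser.readline().strip()))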
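For comparison, here is a sketch of the same loop that appends with plain Python file handling instead of shelling out to echo. This is a swapped-in technique, not what I actually ran, and I haven't checked how it behaves against the file-locking issue described above. It is written for Python 3; `log_readings` and its `limit` parameter are made up, and `source` can be anything with a `readline()` that returns bytes, such as a pySerial port or the `io.BytesIO` stand-in used here.

```python
import io


def log_readings(source, path, limit=None):
    """Append stripped lines from source to path; return how many were written."""
    written = 0
    while limit is None or written < limit:
        line = source.readline()
        if not line:
            # Nothing came back: end of the stand-in stream (a real port
            # opened without a timeout would block here instead)
            break
        text = line.decode("ascii", errors="replace").strip()
        if not text:
            # Skip blank lines
            continue
        # Open, append and close for every reading so the file is not
        # held open between writes
        with open(path, "a") as log:
            log.write(text + "\n")
        written += 1
    return written


# Stand-in for the serial port, holding two made-up readings
fake_port = io.BytesIO(b"21.50\r\n22.00\r\n")
print(log_readings(fake_port, "out_example.txt"))
```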

The next step in this project was visualizing the data. I've used matplotlib[5] before and this time I wanted to see if I could write the program to update the plot live as it receives data. My first foray into this goal was a miserable disaster. Most of the solutions I could find involved just setting up an infinite loop with a short time delay in it. That works great except that it sleeps the thread running the plot, which makes it impossible to resize the plot or do anything at all with the GUI for that matter. So obviously this wouldn't work at all.

After poking around for different solutions to this and crashing my computer once from spawning an infinite number of instances of the plot I gave up for a bit, only to discover that there was an example in the documentation which wasn't obviously named. I quickly discovered the best way to do this. I even added some pretty annotations and such.

import gobject
import matplotlib
matplotlib.use('GTKAgg')

import matplotlib.pyplot as plt

current_pos = 0
temps = []
pad = 5.0

f = plt.figure()

def update(vars):
    # Unpack variables that need to be persistent between
    # executions of this method.
    temps = vars[0]
    current_pos = vars[1]
    pad = vars[2]
   
    # Open the data file and get any new data points since
    # the last time we read from this file
    data = open("out.txt", "r")
    data.seek(current_pos)
    new_temps = map(lambda x:
        float(x) * (1 + 4.0/5.0) + 32.0,
        data.read().split("\n")[:-1])
    current_pos = data.tell()
    data.close()
   
    # If we got new data then append it to the list of
    # temperatures and trim to 750 points
    if len(new_temps) > 0:
        temps.extend(new_temps)
        temps = temps[-750:]
   
    f.clear()
    f.suptitle("Live Temperature")
    a = f.add_subplot(111)
    a.grid(True)
    l, = a.plot(temps)
    plt.xlabel("Time (Seconds)")
    plt.ylabel(r'Temperature $^{\circ}$F')
   
    # Get the minimum and maximum temperatures these are
    # used for annotations and scaling the plot of data
    min_t = min(temps)
    max_t = max(temps)
   
    # Add annotations for minimum and maximum temperatures
    a.annotate(r'Min: %0.2f$^{\circ}$F' % (min_t),
        xy=(temps.index(min_t), min_t),
        xycoords='data', xytext=(20, -20),
        textcoords='offset points',
        bbox=dict(boxstyle="round", fc="0.8"),
        arrowprops=dict(arrowstyle="->",
        shrinkA=0, shrinkB=1,
        connectionstyle="angle,angleA=0,angleB=90,rad=10"))

    a.annotate(r'Max: %0.2f$^{\circ}$F' % (max_t),
        xy=(temps.index(max_t), max_t),
        xycoords='data', xytext=(20, 20),
        textcoords='offset points',
        bbox=dict(boxstyle="round", fc="0.8"),
        arrowprops=dict(arrowstyle="->",
        shrinkA=0, shrinkB=1,
        connectionstyle="angle,angleA=0,angleB=90,rad=10"))
   
    # Set the axis limits to make the data more readable
    a.axis([0,len(temps), min_t - pad,max_t + pad])
   
    f.canvas.draw_idle()
   
    # Store the updated values back into the shared dict so they
    # persist between executions of this method (rebinding the local
    # name vars would discard them)
    vars[0], vars[1], vars[2] = temps, current_pos, pad
   
    return True

vars = {0: temps, 1: current_pos, 2: pad}

# Execute update method every 500ms
gobject.timeout_add(500, update, vars)

# Display the plot
plt.show()

This code generates a plot which updates every 500ms. This is based on an example in the matplotlib examples[6]. An example of the program's output can be seen below.

I imagine that I could have made this simpler by not using the GTK libraries, which are a pain to install since there are 3 or 4 modules you have to install to make all this work, including the GTK+ runtime. I may come back later and post a version written using Tk, since it can be used without installing extra modules.
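The part of update() that only picks up readings added since the last pass can be pulled out on its own. Below is a rough Python 3 sketch of that seek/tell idea; `read_new_temps` is a made-up name, and unlike the listing above it deliberately leaves a half-written last line for the next call instead of assuming the logger always finishes a line before the plot reads the file.

```python
def read_new_temps(path, position):
    """Return (new Fahrenheit temps, new position) for data after position."""
    with open(path, "rb") as data:
        data.seek(position)
        chunk = data.read()

    # Only consume up to the last newline so a partially written
    # reading is picked up on the next call instead
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("ascii").splitlines()

    # Convert Celsius to Fahrenheit, ignoring blank lines
    temps = [float(x) * 9.0 / 5.0 + 32.0 for x in lines if x.strip()]
    return temps, position + end
```

The caller keeps the returned position around and passes it back in on the next timer tick, the same way current_pos is carried between calls in the plotting code.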

  1. DS18S20 Digital Thermometer Datasheet
  2. Temperature Measurement using the Dallas DS18B20 by Peter H. Anderson
  3. Somewhere in the range of 750ms between readings since it is in parasite mode, may change this later to run in non-parasite mode.
  4. pySerial Python Library
  5. matplotlib Python Library
  6. Animation example code: simple_anim_gtk.py
2Dec/090

Comcast’s Data Usage Meter

It looks like Comcast is starting to roll out a data usage meter to customers in the Portland, OR area so they can gauge how far along they are in their 250GB per year limit. According to Gizmodo, Comcast says their median data usage is 2-4GB per month. I thought this was hilarious so I decided to do a little calculating of my own.

I've got a Linksys WRT-something-or-other router which I've installed DD-WRT on. Recent versions of the firmware have a section that keeps track of the overall WAN traffic your router handles. It also makes it pretty easy to do a little calculation of your own, since you can download the data in text format. It logs total data in and out per day of each month. November was my first full month of data, excluding the first of the month (something broke that day I guess), so I downloaded the log and looked at November's data.

On average we downloaded 1917MB per day and uploaded 562MB per day. This is the total traffic between 3 people. Grand total, we downloaded 54GB and uploaded 16GB. If we take the ratio between the two I can approximate what our actual upload bandwidth is. We're supposed to have a 20Mb/s down connection, and the ratio suggests that our up bandwidth is ~5.86Mb/s, which would mean a maximum upload rate of about 750KB/s, something we've never achieved. When I use BitTorrent to download Linux ISOs I assume that, in order not to choke our router with ACKs, I need to throttle the upload rate to about 70% of the maximum, and that throttled rate hovers around 120-130KB/s, which is only ~1Mb/s.
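The arithmetic behind those numbers, spelled out as a few lines of Python. The daily averages and the 20Mb/s figure come from the paragraph above; the 29 logged days (November minus the 1st) and the use of 1024 as the conversion factor are my assumptions about how the rounding works out.

```python
down_mb_per_day = 1917.0     # average download per day, in MB
up_mb_per_day = 562.0        # average upload per day, in MB
days_logged = 29             # assumption: November minus the 1st
advertised_down_mbit = 20.0  # advertised download speed, in Mb/s

# Monthly totals in GB (assuming 1 GB = 1024 MB)
total_down_gb = down_mb_per_day * days_logged / 1024
total_up_gb = up_mb_per_day * days_logged / 1024

# Scale the advertised download speed by the upload/download ratio
ratio = up_mb_per_day / down_mb_per_day
est_up_mbit = advertised_down_mbit * ratio

# Megabits per second to kilobytes per second (1 Mb = 1024 Kb, 8 bits per byte)
est_up_kbytes = est_up_mbit * 1024 / 8

print("down: %.1f GB, up: %.1f GB" % (total_down_gb, total_up_gb))
print("estimated up: %.2f Mb/s = %.0f KB/s" % (est_up_mbit, est_up_kbytes))
```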

Basically I wouldn't survive if I had Comcast and a 250GB limit per year.