A Little Off Code, Computers, Photography and Guns

11 Aug 2010

Choosing an SSD

Before I started my new job I had an inordinate amount of free time and, for most of that time, nothing to spend it on[1]. I was still thinking about my desktop wishlist[2] and about choosing a better SSD than the one I had previously selected[3].

A long time ago, when I was following the HDD market because I was looking to buy some bulk storage, I wrote a PHP script that took some search parameters, passed them to Newegg's productlist.xml[4] and loaded the resulting product list. The script would then parse the list and sort it by price per gigabyte, which is useful when you're in the market for capacity[5].
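The price-per-gigabyte sort at the heart of that old script boils down to a single sort key. Here's a minimal Python sketch of the idea (not the original PHP), using the prices and capacities of the three drives compared later in this post; the dictionary layout and field names are just assumptions for illustration.

```python
# Minimal sketch: sort drives by price per gigabyte, cheapest storage first.
# Prices and capacities come from the comparison later in this post; the
# dictionary layout itself is an assumption made for illustration.
drives = [
    {"name": "A-DATA S599", "capacity_gb": 128, "price": 295.99},
    {"name": "Patriot Inferno", "capacity_gb": 120, "price": 289.99},
    {"name": "G.Skill Phoenix Series", "capacity_gb": 120, "price": 299.00},
]

def price_per_gb(drive):
    """Dollars per gigabyte: lower is better when shopping for capacity."""
    return drive["price"] / drive["capacity_gb"]

for drive in sorted(drives, key=price_per_gb):
    print("%-24s $%.3f/GB" % (drive["name"], price_per_gb(drive)))
```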

I decided to do more or less the same thing with SSDs, except this time in Python, since I'm rusty on PHP and didn't want to mess with setting up a web server to test on. So I got started by doing a power search on Newegg for the specific flavor of SSD I was looking for.

The search parameters are as follows:

  • 2.5" Form Factor
  • SATA II/III
  • 120GB or Greater
  • Less than $300
  • Retail or OEM
  • Support TRIM Command
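Newegg applies those criteria on its end, but for illustration, here is roughly how the same six conditions combine when written as a Python predicate. The field names (`form_factor`, `supports_trim` and so on) and the example drive are made up; only the thresholds come from the list above.

```python
def matches_search(drive):
    """True if a drive meets all six search criteria (hypothetical field names)."""
    return (
        drive["form_factor"] == '2.5"'
        and drive["interface"] in ("SATA II", "SATA III")
        and drive["capacity_gb"] >= 120
        and drive["price"] < 300
        and drive["condition"] in ("Retail", "OEM")
        and drive["supports_trim"]
    )

# A made-up drive, used only to exercise the filter
example_drive = {
    "form_factor": '2.5"', "interface": "SATA II", "capacity_gb": 120,
    "price": 249.99, "condition": "Retail", "supports_trim": True,
}
print(matches_search(example_drive))                       # True
print(matches_search(dict(example_drive, capacity_gb=64)))  # False
```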

As of this writing, those search parameters narrow the results down to 17 SSDs. Now comes the code. Before I started coding, I needed some way to rank the drives according to what I thought was important. The metric is as follows:

$$\frac{\text{Read} \times \text{Write} \times \text{Capacity}}{|\text{Read} - \text{Write}| \times \text{Price}}$$

After looking more closely at the scores this produces, I noticed that it heavily penalizes drives with large differences between read and write speeds, which weeds out drives whose read/write speeds are still perfectly acceptable. So I removed that term from the metric, producing:

$$\frac{\text{Read} \times \text{Write} \times \text{Capacity}}{\text{Price}}$$

The basic idea behind this score is that sequential read and write speeds matter, as does capacity. Price and the difference between sequential read and write speeds are considered bad[6]. In the equation, Read and Write are the sequential read and write speeds (MB/s), Capacity is in GB and Price is in dollars. The resulting ratio scores each SSD's overall value in terms of capacity, read/write speed and price.
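To make the two formulas concrete, here's a small sketch of both. Besides penalizing lopsided drives, the original denominator has a second problem: a drive whose read and write speeds are exactly equal divides by zero. The first set of numbers is the A-DATA S599 from the comparison further down; the equal-speed drive is hypothetical.

```python
def original_score(read, write, capacity, price):
    # Read x Write x Capacity / (|Read - Write| x Price)
    return (read * write * capacity) / (abs(read - write) * price)

def revised_score(read, write, capacity, price):
    # Read x Write x Capacity / Price
    return (read * write * capacity) / price

# A-DATA S599: 280 MB/s read, 270 MB/s write, 128 GB, $295.99
print(original_score(280, 270, 128, 295.99))
print(revised_score(280, 270, 128, 295.99))

# Hypothetical drive with identical read and write speeds
try:
    original_score(250, 250, 120, 250.00)
except ZeroDivisionError:
    print("original metric is undefined when Read == Write")
print(revised_score(250, 250, 120, 250.00))
```

Since the A-DATA's speeds differ by 10 MB/s, its original score is exactly one tenth of its revised score; a drive with a 100 MB/s gap would be cut to one hundredth.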

The code is fairly simple in purpose: load the data, parse each product into a dictionary, then sort by the metric above.

import re
import urllib.request  # Python 3 replacement for urllib2 (used by the commented-out fetch)

# url = "
# http://www.newegg.com/Product/ProductList.aspx?Submit=Property&Subcatego
# ry=636&Description=&Type=&N=100008120&IsNodeId=1&srchInDesc=&MinPrice=&M
# axPrice=&OEMMark=1&OEMMark=0&PropertyCodeValue=4213:30854&PropertyCodeVa
# lue=4214:30848&PropertyCodeValue=4214:39416&PropertyCodeValue=4214:30849
# &PropertyCodeValue=4214:39415&PropertyCodeValue=4215:55552&PropertyCodeV
# alue=4215:41071&PropertyCodeValue=4215:46319"

# Download the search results once, then work from the cached copy:
# with open("temp.html", "wb") as data:
#     data.write(urllib.request.urlopen(url).read())
with open("temp.html", encoding="utf-8", errors="replace") as data:
    raw = data.read()

item_re = re.compile(r'<div class="itemCell".*?>(.*?)<br class="clear".*?</div>')
feature_re = re.compile(r"<li>&nbsp;(.*?)</li>")
feature_list_re = re.compile(r'<b>(.*?)\s?\#?\s?:\s?</b>\s?(.*?)</li>')
speed_re = re.compile(r"(up to )?(\d+).*?MB/s")
capacity_re = re.compile(r"(\d+)GB")
price_re = re.compile(r"</span>\$<strong>(\d+)</strong><sup>\.(\d+)</sup>")

item_list = []
# Fields kept for the final report
valid = ['Read', 'Item', 'Interface', 'Capacity', 'Model', 'Write', 'Size']

for item in item_re.findall(raw):
    current = {}
    # The first three unlabelled bullets are form factor, capacity and interface
    features = feature_re.findall(item)
    current["Size"] = features[0]
    current["Capacity"] = features[1]
    current["Interface"] = features[2]

    # Labelled specs ("<b>Label:</b> value"); keep only the text before any "\r"
    for label, value in feature_list_re.findall(item):
        current[label] = value.split("\r")[0]

    # Pull the numbers out of the spec strings (MB/s and GB)
    current["Read"] = int(speed_re.findall(current["Sequential Access - Read"])[0][1])
    current["Write"] = int(speed_re.findall(current["Sequential Access - Write"])[0][1])
    current["Capacity"] = int(capacity_re.findall(current["Capacity"])[0])

    # Drop everything else; building a new dict avoids deleting keys mid-iteration
    current = {key: value for key, value in current.items() if key in valid}
    current["Price"] = float('.'.join(price_re.findall(item)[0]))
    current["Item"] = "http://www.newegg.com/Product/Product.aspx?Item=%s" % current["Item"]
    item_list.append(current)

def score(item):
    """Read x Write x Capacity / Price (the revised metric)."""
    return (item["Read"] * item["Write"] * item["Capacity"]) / item["Price"]

# Sort the list itself: keying a dict on the score would silently
# drop any drives whose scores tie.
for item in sorted(item_list, key=score, reverse=True):
    # print('\t'.join(str(key) for key in item))  # header row
    print('\t'.join(str(value) for value in item.values()))
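The scraper leans on three small regular expressions. As a quick sanity check, here they are (copied from the script) applied to hand-written fragments shaped to match what the patterns expect; the fragments are assumptions, since Newegg's markup has changed a lot since this was written.

```python
import re

speed_re = re.compile(r"(up to )?(\d+).*?MB/s")
capacity_re = re.compile(r"(\d+)GB")
price_re = re.compile(r"</span>\$<strong>(\d+)</strong><sup>\.(\d+)</sup>")

# Hypothetical fragments in the shape the scraper expects
speed_text = "up to 285MB/s"
capacity_text = "120GB"
price_html = "<span>Now:</span>$<strong>289</strong><sup>.99</sup>"

read = int(speed_re.findall(speed_text)[0][1])  # group 2 holds the digits
capacity = int(capacity_re.findall(capacity_text)[0])
price = float(".".join(price_re.findall(price_html)[0]))  # dollars + cents
print(read, capacity, price)  # 285 120 289.99
```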

Since there is quite a lot of data to present and analyze all at once, I decided it would be easiest to just show a graph[7]:


If you look closely at the scores of all the drives in the query, you'll notice a clear gap between the top three and the rest. They are:

Manufacturer:  A-DATA               Patriot              G.Skill
Series:        S599                 Inferno              Phoenix Series
Capacity:      128GB                120GB                120GB
Read:          280MB/s              285MB/s              285MB/s
Write:         270MB/s              275MB/s              275MB/s
Item:          N82E16820211471[8]   N82E16820220510[9]   N82E16820231372[10]
Price:         $295.99              $289.99              $299.00


I noticed that if you ignore capacity in the metric, the Patriot Inferno is the clear winner. So, as it turns out, the Western Digital SiliconEdge I had selected when I first wrote the wishlist wasn't the best drive for my needs, though I've always had a soft spot for Western Digital. Now I'm convinced the Patriot Inferno is the SSD I'll be getting, unless there are better options by the time I get around to buying one[11].
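The capacity observation is easy to check with the numbers from the table above. This sketch ranks the three drives with and without the capacity term (raw scores, not normalized like the graph). With capacity included, the A-DATA edges ahead thanks to its extra 8GB; drop capacity and the Patriot leads.

```python
# (read MB/s, write MB/s, capacity GB, price $) from the table above
drives = {
    "A-DATA S599": (280, 270, 128, 295.99),
    "Patriot Inferno": (285, 275, 120, 289.99),
    "G.Skill Phoenix Series": (285, 275, 120, 299.00),
}

def score(read, write, capacity, price, use_capacity=True):
    """Revised metric; optionally leave the capacity term out."""
    value = read * write / price
    return value * capacity if use_capacity else value

for use_capacity in (True, False):
    ranking = sorted(drives,
                     key=lambda name: score(*drives[name], use_capacity=use_capacity),
                     reverse=True)
    print("with capacity:   " if use_capacity else "without capacity:", ranking)
```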

  1. Nothing worthwhile, anyway.
  2. See previous post: Wishlist.
  3. Western Digital SiliconEdge 128GB SSD
  4. Which no longer exists in its original form.
  5. Which I was.
  6. Although we're excluding the read/write speed difference.
  7. Scores have been normalized to 100%.
  8. A-Data S599
  9. Patriot Inferno
  10. G.Skill Phoenix Series
  11. Which there probably will be.
28 Jul 2010

Matplotlib and Live Data: A Tale of Two Technologies

Being unemployed over the summer is usually not a good thing for me: I get bored very easily if I don't have something to occupy myself with. This last bout of boredom led me to unpack some of my electronics. I dusted off my multimeter, my Arduino and a digital thermometer I bought a little while ago, and figured I could use them to solve one of my current problems.

Laramie has harsh winters, so most housing developments don't have central air conditioning installed; it's never really needed, except maybe for one or two days over the summer when it gets above 85 °F. This summer has apparently been hotter than previous ones, and it's left my condo in an "uncomfortable state". Mind you, I'm used to living in hot weather, so this isn't such a terrible thing for me.

What I'm not used to is having no AC while it still cools off enough at night that it's worthwhile to open a few windows and stick a fan in one of them. That leaves me with this problem: when is the optimal time to open the windows and turn on the fan to get my condo cooled off the earliest and fastest?

Enter my Arduino and digital thermometer[1]. Once I had rigged up the power and data connections on a breadboard for my Arduino, I set out to find code for the thermometer. I've set up the thermometer with an Arduino sketch before; I just didn't feel like spending a few hours doing it from scratch again. Soon enough I found some code[2] that worked perfectly, so I trimmed out the parts I didn't need for this project and set it up to write the temperature to the serial port it's connected to as fast as possible[3].

After that I wrote a logging program in Python on my desktop to record the temperatures sent over serial. The program is incredibly simple: it uses the pySerial library[4] to read temperatures from my desktop's serial port and appends them to a temperature log. I used a simple Windows shell command (echo with >> redirection) for the appending, since it doesn't keep the file locked and I can read data from it at the same time. There are still occasional collisions where the processing program has the file locked and the logger can't write to it, but these are rare enough to be negligible in my situation.

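import os
import serial  # pySerial

# The Arduino shows up as COM3 here; older pySerial releases also
# accepted the zero-based port number, i.e. serial.Serial(2)
ser = serial.Serial("COM3")

while True:
    # readline() returns bytes on Python 3, so decode before logging
    reading = ser.readline().decode("ascii", "ignore").strip()
    if reading:
        # Append with the shell's >> redirect so out.txt is never held open
        os.system("echo %s>>out.txt" % reading)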
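To try the plotting script without the Arduino plugged in, a stand-in for the logger can append made-up readings to the same file in the same one-Celsius-value-per-line format. This is a hypothetical test helper, not part of the original setup; the 20 to 30 °C range and the 0.75 s delay (mirroring the sensor's roughly 750ms between readings) are arbitrary choices.

```python
import random
import time

def fake_logger(path="out.txt", count=10, delay=0.75):
    """Append `count` made-up Celsius readings to `path`, one per line."""
    for _ in range(count):
        # Open and close the file for every reading, like the echo-based logger
        with open(path, "a") as log:
            # Hypothetical indoor temperatures between 20 and 30 degrees C
            log.write("%.2f\n" % random.uniform(20.0, 30.0))
        time.sleep(delay)
```

For example, `fake_logger(count=100)` produces a little over a minute of fake data.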

The next step in this project was visualizing the data. I've used matplotlib[5] before, and this time I wanted to see if I could write the program to update the plot live as it receives data. My first attempt was a miserable disaster. Most of the solutions I could find involved setting up an infinite loop with a short sleep in it. That works fine, except that it puts the thread running the plot to sleep, which makes it impossible to resize the plot or do anything at all with the GUI. So that obviously wasn't going to work.

After poking around for other solutions, and crashing my computer once by spawning an endless number of plot instances, I gave up for a bit, only to discover an example in the documentation that wasn't obviously named. From there I quickly found a much better way to do this, and I even added some pretty annotations.

from gi.repository import GLib  # PyGObject; provides the old gobject.timeout_add
import matplotlib
matplotlib.use('GTK3Agg')

import matplotlib.pyplot as plt

PAD = 5.0          # degrees F of head-room above and below the data
MAX_POINTS = 750   # number of readings kept on screen

f = plt.figure()

def update(state):
    # state is a dict shared between calls. It is modified in place so the
    # next call sees the new file offset and temperature list.

    # Read anything added to the log since the last call. Only complete
    # lines are consumed; a half-written reading waits for the next call.
    with open("out.txt", "rb") as data:
        data.seek(state["pos"])
        chunk = data.read()
    end = chunk.rfind(b"\n") + 1
    state["pos"] += end

    # The Arduino sends Celsius; convert to Fahrenheit (F = C * 9/5 + 32)
    new_temps = [float(x) * 9.0 / 5.0 + 32.0
                 for x in chunk[:end].decode("ascii", "ignore").split()]

    # If we got new data, append it and keep only the latest 750 points
    if new_temps:
        state["temps"] = (state["temps"] + new_temps)[-MAX_POINTS:]
    temps = state["temps"]
    if not temps:
        return True  # nothing to plot yet; keep the timer running

    f.clear()
    f.suptitle("Live Temperature")
    a = f.add_subplot(111)
    a.grid(True)
    a.plot(temps)
    a.set_xlabel("Reading (roughly 750ms apart)")
    a.set_ylabel(r'Temperature $^{\circ}$F')

    # Get the minimum and maximum temperatures; these are
    # used for annotations and for scaling the plot
    min_t = min(temps)
    max_t = max(temps)

    # Add annotations for minimum and maximum temperatures
    a.annotate(r'Min: %0.2f$^{\circ}$F' % (min_t),
        xy=(temps.index(min_t), min_t),
        xycoords='data', xytext=(20, -20),
        textcoords='offset points',
        bbox=dict(boxstyle="round", fc="0.8"),
        arrowprops=dict(arrowstyle="->",
        shrinkA=0, shrinkB=1,
        connectionstyle="angle,angleA=0,angleB=90,rad=10"))

    a.annotate(r'Max: %0.2f$^{\circ}$F' % (max_t),
        xy=(temps.index(max_t), max_t),
        xycoords='data', xytext=(20, 20),
        textcoords='offset points',
        bbox=dict(boxstyle="round", fc="0.8"),
        arrowprops=dict(arrowstyle="->",
        shrinkA=0, shrinkB=1,
        connectionstyle="angle,angleA=0,angleB=90,rad=10"))

    # Set the axis limits to make the data more readable
    a.axis([0, len(temps), min_t - PAD, max_t + PAD])

    f.canvas.draw_idle()

    # Returning True keeps the GLib timeout active
    return True

state = {"temps": [], "pos": 0}

# Execute the update function every 500ms
GLib.timeout_add(500, update, state)

# Display the plot (this also runs the GTK main loop)
plt.show()

This code generates a plot that updates every 500ms. It is based on one of the matplotlib examples[6]. An example of the program's output can be seen below.

I could probably have made this simpler by not using the GTK libraries, which are a pain to install: there are three or four modules you need to make all of this work, including the GTK+ runtime. I may come back later and post a version written with Tk, since it can be used without installing extra modules.
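For the Tk route, here is one possible shape, offered as a sketch rather than a finished program. It swaps the GTK-specific timeout for matplotlib's backend-neutral timer (`fig.canvas.new_timer`), which is a different mechanism from the simple_anim_gtk.py example this post is based on. `read_new_temps` is a hypothetical helper that reads the log in binary and consumes only complete lines; the file name, the 750-point window and the 5-degree padding mirror the GTK script.

```python
LOG_FILE = "out.txt"   # same log the serial logger appends to
MAX_POINTS = 750
PAD = 5.0

def read_new_temps(path, pos):
    """Return (new Fahrenheit readings, new offset), consuming only complete lines."""
    try:
        with open(path, "rb") as data:
            data.seek(pos)
            chunk = data.read()
    except FileNotFoundError:
        return [], pos
    end = chunk.rfind(b"\n") + 1  # 0 when no complete line has arrived yet
    celsius = chunk[:end].decode("ascii", "ignore").split()
    return [float(c) * 9.0 / 5.0 + 32.0 for c in celsius], pos + end

def main():
    # Imported here so read_new_temps() can be reused without matplotlib
    import matplotlib
    matplotlib.use("TkAgg")  # Tk ships with most Python installs
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    state = {"temps": [], "pos": 0}

    def update():
        new, state["pos"] = read_new_temps(LOG_FILE, state["pos"])
        state["temps"] = (state["temps"] + new)[-MAX_POINTS:]
        temps = state["temps"]
        if not temps:
            return
        ax.clear()
        ax.grid(True)
        ax.plot(temps)
        ax.set_title("Live Temperature")
        ax.set_ylabel(r"Temperature $^{\circ}$F")
        ax.set_xlim(0, len(temps))
        ax.set_ylim(min(temps) - PAD, max(temps) + PAD)
        fig.canvas.draw_idle()

    # matplotlib's own timer runs on whichever GUI backend is active,
    # so no GTK or PyGObject import is needed
    timer = fig.canvas.new_timer(interval=500)
    timer.add_callback(update)
    timer.start()
    plt.show()
```

Call `main()` to start it. Because the timer belongs to matplotlib rather than GTK, switching `"TkAgg"` for another interactive backend should work the same way.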

  1. DS18S20 Digital Thermometer Datasheet
  2. Temperature Measurement using the Dallas DS18B20 by Peter H. Anderson
  3. Somewhere in the range of 750ms between readings, since it is in parasite mode; I may change this later to run in non-parasite mode.
  4. pySerial Python Library
  5. matplotlib Python Library
  6. Animation example code: simple_anim_gtk.py