A Little Off Code, Computers, Photography and Guns

11Aug/100

Choosing an SSD

Before I started my new job I had an inordinate amount of free time and, for most of it, nothing to spend it on[1]. I was still thinking about my desktop wishlist[2] and about choosing a better SSD than the one I had previously selected[3].

A long time ago, when I was following the HDD market because I wanted to buy some bulk storage, I wrote a PHP script that loaded newegg's product list by passing the search parameters you provided to newegg's productlist.xml[4]. The script would then parse the list and sort it by price per gigabyte, which is useful when you're in the market for capacity[5].
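A minimal sketch of that price-per-gigabyte ranking, written in Python 3 rather than the original PHP. The drive names, capacities and prices are made-up placeholders, not newegg data, and `price_per_gb` is a hypothetical helper:

```python
# Hypothetical drives: (name, capacity in GB, price in USD). Placeholder data.
drives = [
    ("Drive A", 1000, 80.00),
    ("Drive B", 1500, 105.00),
    ("Drive C", 2000, 180.00),
]

def price_per_gb(drive):
    """Return the price in USD per gigabyte of capacity."""
    _name, capacity_gb, price = drive
    return price / capacity_gb

# Cheapest storage per gigabyte first.
ranked = sorted(drives, key=price_per_gb)

for drive in ranked:
    print("%s\t$%.3f/GB" % (drive[0], price_per_gb(drive)))
```

Sorting on a key function keeps the original records intact, so the full listing can still be printed in ranked order.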

I decided to do more or less the same thing with SSDs, except this time in Python, since I'm rusty on PHP and didn't want to mess with setting up a web server to test on. I got started by doing a power search on newegg for the specific flavor of SSD I was looking for.

The search parameters are as follows:

  • 2.5" Form Factor
  • SATA II/III
  • 120GB or Greater
  • Less than $300
  • Retail or OEM
  • Support TRIM Command

As of this writing, those search parameters narrow the results to 17 SSDs. Now comes the code. Before I started coding I needed some way to sort the drives according to what I thought was important. My first metric was as follows:

$$\frac{\text{Read} \times \text{Write} \times \text{Capacity}}{|\text{Read} - \text{Write}| \times \text{Price}}$$

After looking closer at the scores this produces, I noticed that it heavily penalizes drives with large differences between read and write speeds, which effectively weeds out drives that still have acceptable read/write speeds. So I removed that term from the metric, producing:

$$\frac{\text{Read} \times \text{Write} \times \text{Capacity}}{\text{Price}}$$

The basic idea behind this scoring measure is that sequential read and write speeds are important, as is capacity. Price and the difference between sequential read and write speeds count against a drive[6]. In the equation, Read and Write refer to sequential read and write speeds. The resulting ratio scores an SSD's overall value across capacity, read/write speeds and price.
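To see how much the speed-difference term changes things, here is a small Python 3 sketch of both versions of the metric. The two drives are hypothetical, and the function names are mine. Note that the first formula is undefined when read and write speeds are equal; returning infinity in that case is my own assumption, not something the original script handled.

```python
def score_with_penalty(read, write, capacity, price):
    """First metric: divides by the read/write speed gap as well as price."""
    gap = abs(read - write)
    if gap == 0:
        # Assumption: treat the undefined case (equal speeds) as infinity.
        return float("inf")
    return (read * write * capacity) / (gap * price)

def score(read, write, capacity, price):
    """Revised metric: read * write * capacity / price."""
    return (read * write * capacity) / price

# Two hypothetical 120GB drives at $250: same read speed, different write speed.
balanced = dict(read=250, write=240, capacity=120, price=250.0)
lopsided = dict(read=250, write=170, capacity=120, price=250.0)

for name, drive in (("balanced", balanced), ("lopsided", lopsided)):
    print(name, score_with_penalty(**drive), score(**drive))
```

With these made-up numbers the first metric rates the balanced drive about 11 times higher than the lopsided one, while the revised metric puts the gap at about 1.4 times.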

The code is relatively simple in purpose: load the data, parse each item into a dictionary, then sort based on the metric above.

import urllib2, re

# url = "
# http://www.newegg.com/Product/ProductList.aspx?Submit=Property&Subcatego
# ry=636&Description=&Type=&N=100008120&IsNodeId=1&srchInDesc=&MinPrice=&M
# axPrice=&OEMMark=1&OEMMark=0&PropertyCodeValue=4213:30854&PropertyCodeVa
# lue=4214:30848&PropertyCodeValue=4214:39416&PropertyCodeValue=4214:30849
# &PropertyCodeValue=4214:39415&PropertyCodeValue=4215:55552&PropertyCodeV
# alue=4215:41071&PropertyCodeValue=4215:46319"

# data = open("temp.html", "w")
# data.write(urllib2.urlopen(url).read())
# data.close()
raw = open("temp.html").read()

item_re = re.compile(r'<div class="itemCell".*?>(.*?)<br class="clear".*?</div>')
feature_re = re.compile(r"<li>&nbsp;(.*?)</li>")
feature_list_re = re.compile(r'<b>(.*?)\s?\#?\s?:\s?</b>\s?(.*?)</li>')
speed_re = re.compile(r"(up to )?(\d+).*?MB/s")
capacity_re = re.compile(r"(\d+)GB")
price_re = re.compile(r"</span>\$<strong>(\d+)</strong><sup>.(\d+)</sup>")

item_list = []
valid = ['Read', 'Item', 'Interface', 'Capacity', 'Model', 'Write', 'Size']

for item in item_re.findall(raw):
    current = {}
    # The first three unlabeled bullets are stored as size, capacity and interface.
    features = feature_re.findall(item)
    current["Size"] = features[0]
    current["Capacity"] = features[1]
    current["Interface"] = features[2]

    # Labeled features look like "<b>Label:</b> value"; keep only the text
    # before the first carriage return of each value.
    for label, value in feature_list_re.findall(item):
        current[label] = value.split("\r")[0]
    current["Read"] = int(speed_re.findall(current["Sequential Access - Read"])[0][1])
    current["Write"] = int(speed_re.findall(current["Sequential Access - Write"])[0][1])
    current["Capacity"] = int(capacity_re.findall(current["Capacity"])[0])
    # Drop every field we don't report. Iterate over a copy of the keys so
    # the dictionary isn't modified while it is being iterated.
    for feature in list(current.keys()):
        if feature not in valid:
            del current[feature]
    current["Price"] = float('.'.join(price_re.findall(item)[0]))
    current["Item"] = "http://www.newegg.com/Product/Product.aspx?Item=%s" % (current["Item"])
    item_list.append(current)
   
# Rank by score, highest first. Sorting the list directly (instead of keying
# a dictionary by score) keeps drives with identical scores from overwriting
# each other and avoids shadowing the built-in sorted().
def score(item):
    return (item["Read"] * item["Write"] * item["Capacity"]) / item["Price"]

item_list.sort(key=score, reverse=True)
for item in item_list:
    #print('\t'.join(map(str, item.keys())))
    print('\t'.join(map(str, item.values())))

Since that is quite a lot of data to present and analyze all at once, I decided it would be easiest to just provide you with a pretty graph[7].
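The footnote only says the scores were normalized to 100%. One plausible reading, and it is an assumption on my part, is that every score was divided by the highest one. A minimal Python 3 sketch with made-up scores:

```python
def normalize(scores):
    """Scale scores so the highest one becomes 100.0."""
    top = max(scores)
    return [100.0 * s / top for s in scores]

raw_scores = [32000.0, 24000.0, 8000.0]  # made-up scores, not the real results
percentages = normalize(raw_scores)
print(percentages)
```

Scaling this way leaves the ranking untouched and only makes the bars easier to compare.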


If you look closely at the scores of all the disks in the query, you'll see a noticeable gap between the top 3 and the rest. The top 3 are as follows:

Manufacturer:  A-DATA               Patriot              G.Skill
Series:        S599                 Inferno              Phoenix Series
Capacity:      128GB                120GB                120GB
Read:          280MB/s              285MB/s              285MB/s
Write:         270MB/s              275MB/s              275MB/s
Item:          N82E16820211471[8]   N82E16820220510[9]   N82E16820231372[10]
Price:         $295.99              $289.99              $299.00


I noticed that if you ignore capacity in the metric, the Patriot Inferno is the clear winner. So, as it turns out, the Western Digital SiliconEdge I had selected when I first wrote the wishlist wasn't the best drive for my needs, though I've always had a soft spot for Western Digital. I'm now convinced that the Patriot Inferno is the SSD I'll be getting, unless there are better options by the time I get around to buying one[11].
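The effect of ignoring capacity can be checked against the three drives listed above. A quick Python 3 sketch using the revised metric with the listed specs and prices; the dictionary layout and the `use_capacity` flag are my own naming, not part of the original script:

```python
# Specs and prices of the top three drives, as listed in the post.
top_three = {
    "A-DATA S599": dict(read=280, write=270, capacity=128, price=295.99),
    "Patriot Inferno": dict(read=285, write=275, capacity=120, price=289.99),
    "G.Skill Phoenix": dict(read=285, write=275, capacity=120, price=299.00),
}

def score(drive, use_capacity=True):
    """Read * write * capacity / price, optionally leaving capacity out."""
    value = drive["read"] * drive["write"] / drive["price"]
    if use_capacity:
        value *= drive["capacity"]
    return value

best_with_capacity = max(top_three, key=lambda name: score(top_three[name]))
best_without_capacity = max(
    top_three, key=lambda name: score(top_three[name], use_capacity=False)
)

for name, drive in top_three.items():
    print("%-16s %10.1f %8.1f" % (name, score(drive), score(drive, use_capacity=False)))
print("Best with capacity:", best_with_capacity)
print("Best without capacity:", best_without_capacity)
```

With capacity included, the A-DATA S599 edges ahead thanks to its extra 8GB. Without it, the Patriot Inferno comes out on top.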

  1. Nothing worthwhile anyway.
  2. See previous post: Wishlist.
  3. Western Digital SiliconEdge 128GB SSD
  4. Which no longer exists in its original form.
  5. Which I was.
  6. Although we're excluding read/write speed difference.
  7. Scores have been normalized to 100%.
  8. A-Data S599
  9. Patriot Inferno
  10. G.Skill Phoenix Series
  11. Which there probably will be.