Choosing an SSD
Before I started my new job I had an inordinate amount of free time and, for most of it, nothing to spend it on[1]. I was still thinking about my desktop wishlist[2] and about choosing a better SSD than the one I had previously selected[3].
A long time ago, when I was following the HDD market because I wanted to buy some bulk storage, I wrote a PHP script that loaded Newegg's product list, productlist.xml[4], using search parameters you provided. The script would then parse the list and sort it by price per gigabyte, which is useful when you're in the market for capacity[5].
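The original PHP script isn't shown here, but the price-per-gigabyte sort it performed can be sketched in a few lines of Python. The drive names, capacities, and prices below are made up for illustration, and `price_per_gb` is a hypothetical helper rather than anything from the original script.

```python
# Hypothetical drive listings; names, capacities (GB), and prices (USD) are invented.
drives = [
    {"name": "Drive A", "capacity_gb": 1000, "price": 79.99},
    {"name": "Drive B", "capacity_gb": 2000, "price": 129.99},
    {"name": "Drive C", "capacity_gb": 1500, "price": 109.99},
]


def price_per_gb(drive):
    """Return the cost of one gigabyte of storage for a drive."""
    return drive["price"] / drive["capacity_gb"]


# Cheapest storage per gigabyte first.
by_value = sorted(drives, key=price_per_gb)

for drive in by_value:
    print("%-8s $%.4f/GB" % (drive["name"], price_per_gb(drive)))
```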
I decided to do more or less the same thing with SSDs, except this time I did it in Python, since I'm rusty on PHP and didn't want to mess with setting up a web server to test on. I got started by doing a power search on Newegg for the specific flavor of SSD I was looking for.
The search parameters are as follows:
- 2.5" Form Factor
- SATA II/III
- 120GB or Greater
- Less than $300
- Retail or OEM
- Support TRIM Command
As of this writing, those search parameters narrow the results to 17 SSDs. Now comes the code. Before I started coding I needed some way to sort them according to what I thought was important. The metric is as follows:
After looking closer at the scores this produces, I noticed that it heavily penalizes drives with large differences between read and write speeds, which weeds out drives that still have acceptable read/write speeds. So I removed that part of the metric, producing:
score = (read × write × capacity) / price
The basic idea behind this scoring measure is that sequential read and write speeds are important, as is capacity. Price and the difference between sequential read and write speeds are considered bad[6]. In the equation, read and write refer to sequential read and write speeds. The resulting ratio scores each SSD's overall value in terms of capacity, read/write speed, and price.
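As a rough sketch, the simplified metric can be written as a small Python function that mirrors the ratio computed in the script. The function name `score` and the sample numbers are illustrative assumptions, not values from any listing.

```python
def score(read_mb_s, write_mb_s, capacity_gb, price_usd):
    """Higher is better: speed and capacity in the numerator, price in the denominator."""
    return (read_mb_s * write_mb_s * capacity_gb) / float(price_usd)


# Illustrative numbers only (not taken from any listing).
base = score(250, 200, 120, 250.00)
bigger = score(250, 200, 240, 250.00)    # double the capacity
pricier = score(250, 200, 120, 500.00)   # double the price

print(bigger / base)   # doubling capacity doubles the score
print(pricier / base)  # doubling price halves the score
```

Because every term is multiplied or divided, the score has no natural unit; it is only meaningful for ranking drives against each other.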
The code is relatively simple in purpose: load the data, parse it into a dictionary, then sort based on the metric above.
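The parsing step leans on a handful of regular expressions. Before the full script, here is a small Python 3 check of what three of those patterns extract. The sample strings are hand-written assumptions for illustration; Newegg's actual markup at the time may have differed.

```python
import re

# The same patterns the script uses for speed, capacity, and price.
speed_re = re.compile(r"(up to )?(\d+).*?MB/s")
capacity_re = re.compile(r"(\d+)GB")
price_re = re.compile(r"</span>\$<strong>(\d+)</strong><sup>.(\d+)</sup>")

# Hand-written samples (hypothetical, not copied from a real listing).
speed_text = "up to 285 MB/s"
capacity_text = "120GB"
price_html = "<span></span>$<strong>289</strong><sup>.99</sup>"

# findall returns one tuple per match; index 1 holds the digits.
read_speed = int(speed_re.findall(speed_text)[0][1])
capacity = int(capacity_re.findall(capacity_text)[0])
# The price arrives as (dollars, cents) and is glued back together.
price = float(".".join(price_re.findall(price_html)[0]))

print(read_speed, capacity, price)
```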
    import re

    # One-time download of the search results, left commented out so the
    # script works from the cached copy. urllib2 is Python 2 only.
    # import urllib2
    # url = "
    # http://www.newegg.com/Product/ProductList.aspx?Submit=Property&Subcatego
    # ry=636&Description=&Type=&N=100008120&IsNodeId=1&srchInDesc=&MinPrice=&M
    # axPrice=&OEMMark=1&OEMMark=0&PropertyCodeValue=4213:30854&PropertyCodeVa
    # lue=4214:30848&PropertyCodeValue=4214:39416&PropertyCodeValue=4214:30849
    # &PropertyCodeValue=4214:39415&PropertyCodeValue=4215:55552&PropertyCodeV
    # alue=4215:41071&PropertyCodeValue=4215:46319"
    # data = open("temp.html", "w")
    # data.write(urllib2.urlopen(url).read())
    # data.close()

    raw = open("temp.html").read()

    # Patterns for pulling product cells and their fields out of the HTML.
    item_re = re.compile(r'<div class="itemCell".*?>(.*?)<br class="clear".*?</div>')
    feature_re = re.compile(r"<li> (.*?)</li>")
    feature_list_re = re.compile(r'<b>(.*?)\s?\#?\s?:\s?</b>\s?(.*?)</li>')
    speed_re = re.compile(r"(up to )?(\d+).*?MB/s")
    capacity_re = re.compile(r"(\d+)GB")
    price_re = re.compile(r"</span>\$<strong>(\d+)</strong><sup>.(\d+)</sup>")

    item_list = []
    # Fields kept for each drive; everything else is discarded.
    valid = ['Read', 'Item', 'Interface', 'Capacity', 'Model', 'Write', 'Size']

    for item in item_re.findall(raw):
        current = {}

        # The first three matches are taken as size, capacity, and interface.
        features = feature_re.findall(item)
        current["Size"] = features[0]
        current["Capacity"] = features[1]
        current["Interface"] = features[2]

        # Labeled features ("<b>Label:</b> value"); keep only the text before any "\r".
        for feature in feature_list_re.findall(item):
            if feature[1].find("\r") != -1:
                current[feature[0]] = feature[1].split("\r")[0]
            else:
                current[feature[0]] = feature[1]

        # Convert the fields used by the metric to numbers.
        current["Read"] = int(speed_re.findall(current["Sequential Access - Read"])[0][1])
        current["Write"] = int(speed_re.findall(current["Sequential Access - Write"])[0][1])
        current["Capacity"] = int(capacity_re.findall(current["Capacity"])[0])

        # Copy the keys first so entries can be deleted while looping.
        for feature in list(current.keys()):
            if feature not in valid:
                del current[feature]

        current["Price"] = float('.'.join(price_re.findall(item)[0]))
        current["Item"] = "http://www.newegg.com/Product/Product.aspx?Item=%s" % (current["Item"])
        item_list.append(current)


    def score(item):
        # Read * write * capacity, divided by price; higher is better.
        return (item["Read"] * item["Write"] * item["Capacity"]) / item["Price"]


    # Sort the list directly instead of keying a dict by score, so two drives
    # with the same score cannot overwrite each other. Highest score first.
    for item in sorted(item_list, key=score, reverse=True):
        # print('\t'.join(str(key) for key in item.keys()))
        print('\t'.join(str(value) for value in item.values()))
Since there is quite a lot of data to present and analyze all at once, I've decided it would be easiest to just provide you with a pretty graph[7]:
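The footnote on the graph says the scores were normalized to 100%. One way to do that, assuming it means scaling every score relative to the highest one, is sketched below. The raw scores are invented placeholders and `normalize` is a hypothetical helper.

```python
def normalize(scores):
    """Scale scores so that the best one becomes 100.0."""
    best = max(scores)
    return [100.0 * s / best for s in scores]


# Placeholder raw scores, not the real results.
raw_scores = [32000.0, 24000.0, 16000.0]
percent = normalize(raw_scores)
print(percent)
```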

If you look closely at the scores of all the disks in the query, you'll notice that there is a noticeable gap between the top 3 and the rest. They are as follows:
| Manufacturer: | A-DATA | Patriot | G.Skill |
| Series: | S599 | Inferno | Phoenix Series |
| Capacity: | 128GB | 120GB | 120GB |
| Read: | 280MB/s | 285MB/s | 285MB/s |
| Write: | 270MB/s | 275MB/s | 275MB/s |
| Item: | N82E16820211471[8] | N82E16820220510[9] | N82E16820231372[10] |
| Price: | $295.99 | $289.99 | $299.00 |
I noticed that if you ignore capacity in the metric, the Patriot Inferno is the clear winner here. So, as it turns out, the Western Digital SiliconEdge I had selected when I first wrote the wishlist wasn't the best drive for my needs, though I've always had a soft spot for Western Digital. I'm now convinced that the Patriot Inferno is the SSD I'll be getting, unless there are better options by the time I get around to buying one[11].
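The capacity-free comparison can be checked directly from the table. The sketch below scores the three drives with and without the capacity term. The `score` and `ranking` helpers and the `use_capacity` flag are names introduced here for illustration; the figures come from the table above.

```python
# Figures copied from the top-three table above.
drives = [
    {"name": "A-DATA S599", "read": 280, "write": 270, "capacity": 128, "price": 295.99},
    {"name": "Patriot Inferno", "read": 285, "write": 275, "capacity": 120, "price": 289.99},
    {"name": "G.Skill Phoenix", "read": 285, "write": 275, "capacity": 120, "price": 299.00},
]


def score(drive, use_capacity=True):
    """Read * write (* capacity) divided by price; higher is better."""
    value = drive["read"] * drive["write"]
    if use_capacity:
        value *= drive["capacity"]
    return value / drive["price"]


def ranking(use_capacity):
    """Return drive names ordered from best to worst score."""
    ordered = sorted(drives, key=lambda d: score(d, use_capacity), reverse=True)
    return [d["name"] for d in ordered]


print(ranking(use_capacity=True))
print(ranking(use_capacity=False))
```

With capacity included, the A-DATA S599 edges ahead thanks to its extra 8GB; without it, the Patriot Inferno comes out on top.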
1. Nothing worthwhile, anyway.
2. See previous post: Wishlist.
3. Western Digital SiliconEdge 128GB SSD
4. Which no longer exists in its original form.
5. Which I was.
6. Although we're excluding the read/write speed difference.
7. Scores have been normalized to 100%.
8. A-Data S599
9. Patriot Inferno
10. G.Skill Phoenix Series
11. Which there probably will be.

