System Memory Analysis
Choosing the best RAM for your system can be difficult, as there are a lot of things to consider. Comparing products by hand can net you some pretty decent results, but working out the best price per capacity per ... gets complicated fast.
You may remember my SSD analysis script[1] from a while ago, which scraped HTML from Newegg to calculate a score for each product and pick the best one. I've since discovered that Newegg does indeed have an API[2], and it greatly simplifies this whole process[3].
Once I had explored Newegg's API enough to get the data I needed, I set to work updating the SSD script and writing a few new ones for HDDs and system memory. Of these, the system memory script turned out to be particularly useful, as it made finding great deals very easy. It also showed that popular brands aren't always the best deal.
The first major improvement over the previous scripts was using threads to make multiple API requests in parallel, which sped things up quite a bit. While Python's threading library doesn't allow CPU parallelism[4], it does allow concurrent I/O such as network requests, since CPython releases its interpreter lock while a thread is blocked waiting on I/O. Below is the class used for fetching URLs throughout the script.
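```python
import threading
import urllib, urllib2
import json, re
from Queue import Queue

class GetURL(threading.Thread):
    """Worker thread: fetches (itemNumber, url) pairs and queues the parsed JSON."""
    def __init__(self, urlQueue, jsonQueue):
        threading.Thread.__init__(self)
        self.urlQueue = urlQueue
        self.jsonQueue = jsonQueue

    def run(self):
        while True:
            itemNumber, url = self.urlQueue.get()
            try:
                raw = urllib2.urlopen(url).read()
                # Use the queue handed to this thread, not a global jsonQueue.
                self.jsonQueue.put((itemNumber, json.loads(raw)))
            except Exception as e:
                # Log and keep going so one bad request doesn't kill the worker.
                print 'Failed to fetch %s: %s' % (itemNumber, e)
            finally:
                # Always mark the task done so urlQueue.join() can't hang.
                self.urlQueue.task_done()
```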
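To see why threads help here despite the interpreter lock, here is a minimal Python 3 sketch of the same queue-and-worker pattern. It's an illustration under assumptions, not the script itself: the network call is replaced by a hypothetical `fake_fetch` that just sleeps (like a blocking socket read, `time.sleep` releases the lock), and the names `fake_fetch` and `fetch_all` are made up for this example.

```python
import json
import queue
import threading
import time


def fake_fetch(url):
    """Hypothetical stand-in for a network read: blocks for 0.2 s, then returns JSON text."""
    time.sleep(0.2)  # like a blocking socket read, sleep releases the GIL
    return json.dumps({"url": url})


class GetURL(threading.Thread):
    """Worker thread: takes (key, url) pairs and queues (key, parsed JSON) results."""

    def __init__(self, url_queue, json_queue, fetch):
        super().__init__(daemon=True)  # daemon: don't keep the program alive
        self.url_queue = url_queue
        self.json_queue = json_queue
        self.fetch = fetch

    def run(self):
        while True:
            key, url = self.url_queue.get()
            try:
                result = json.loads(self.fetch(url))
            except Exception as exc:  # record the failure instead of killing the worker
                result = exc
            self.json_queue.put((key, result))
            self.url_queue.task_done()


def fetch_all(urls, workers, fetch=fake_fetch):
    """Fetch every URL in the {key: url} dict using `workers` threads."""
    url_queue, json_queue = queue.Queue(), queue.Queue()
    for key, url in urls.items():
        url_queue.put((key, url))
    for _ in range(workers):
        GetURL(url_queue, json_queue, fetch).start()
    url_queue.join()  # returns once every queued URL has been marked done
    results = {}
    while not json_queue.empty():
        key, result = json_queue.get()
        results[key] = result
    return results


if __name__ == "__main__":
    urls = {n: "http://example.invalid/item/{}".format(n) for n in range(8)}
    for workers in (1, 8):
        start = time.perf_counter()
        fetch_all(urls, workers)
        print("{} worker(s): {:.2f}s".format(workers, time.perf_counter() - start))
```

With one worker the eight simulated requests run back to back (roughly 1.6 s); with eight workers the waits overlap and the batch finishes in roughly the time of a single request.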
Newegg's API paginates its results, since the Android app displays them directly to the user, so there's no easy way to retrieve all results in a single request. Instead, you have to make successive calls, incrementing the page number until every result for the query has been retrieved.
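The successive-calls approach boils down to a small loop. This is a minimal Python 3 sketch, assuming a hypothetical `fetch_page(page)` callable that wraps the request to the search endpoint and returns that page's list of products; the actual request and response fields of Newegg's API aren't shown in this post, so they're left abstract. The loop stops at the first page shorter than `page_size`.

```python
def fetch_all_pages(fetch_page, page_size):
    """Collect results from fetch_page(1), fetch_page(2), ... until a short page arrives.

    fetch_page(page) must return the list of results on that 1-based page.
    """
    if page_size < 1:
        raise ValueError("page_size must be positive")
    results = []
    page = 1
    while True:
        batch = fetch_page(page)
        results.extend(batch)
        if len(batch) < page_size:  # a short (or empty) page means we've hit the end
            return results
        page += 1


if __name__ == "__main__":
    catalog = ["item-{}".format(i) for i in range(23)]  # fake search results

    def fake_page(page, size=10):
        start = (page - 1) * size
        return catalog[start:start + size]

    print(len(fetch_all_pages(fake_page, 10)))  # 23
```

If the response happens to report a total result count, looping up to that count instead avoids the one extra, empty request when the total is an exact multiple of the page size.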
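```python
itemSpecURL = "http://www.ows.newegg.com/Products.egg/{}/Specification"
searchURL = "http://www.ows.newegg.com/Search.egg/Advanced"

# getItems() is not shown; it queries searchURL, requesting successive pages.
itemList = getItems()

urlQueue = Queue()
jsonQueue = Queue()
items = {}

# Queue one specification request per item and keep its basic data by item number.
for item in itemList:
    specURL = itemSpecURL.format(item["ItemNumber"])
    urlQueue.put((item["ItemNumber"], specURL))
    items[item["ItemNumber"]] = item

# Start two daemon worker threads; they exit along with the main program.
for worker in xrange(2):
    t = GetURL(urlQueue, jsonQueue)
    t.setDaemon(True)
    t.start()

# Block until every queued spec URL has been processed.
urlQueue.join()
```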
This grabs each item's basic data, including price, as well as its detailed specifications. The setup so far is fairly generic and can be used to analyze just about any product on Newegg; anything beyond this point, however, is specific to the type of product you're analyzing. I should also note that the parameters passed to the API in getItems are generated using the query builder available in the post about Newegg's API.
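For comparison, the generic fetching step can also be written in Python 3 with `concurrent.futures.ThreadPoolExecutor`, which manages the worker threads and queues for you. This is a sketch, not the script's code: `fetch_specs` is a hypothetical helper, and `fetch(url)` is an assumed callable returning the raw JSON text, which keeps the example testable offline.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def fetch_specs(item_numbers, fetch, url_template, workers=2):
    """Fetch and parse one spec document per item number using a thread pool.

    fetch(url) must return the raw JSON text (str or bytes) for that URL.
    Returns a dict mapping item number -> parsed JSON.
    """
    def fetch_one(item_number):
        raw = fetch(url_template.format(item_number))
        return item_number, json.loads(raw)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order and re-raises any worker exception here
        return dict(pool.map(fetch_one, item_numbers))
```

Against the real service, `fetch` could be `lambda url: urllib.request.urlopen(url).read()` with the `itemSpecURL` template from above. One design difference from the hand-rolled queue: a failed request raises in the caller instead of silently leaving `join()` waiting.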
```python
speed_re = re.compile(r'DDR\d\s(\d+).*')
capacity_re = re.compile(r'(\d+)GB\s\((\d+)\sx\s(\d+)GB\)')
timing_re = re.compile(r'(\d+-\d+-\d+-\d+)')

features = ['Brand', 'Model', 'ItemNumber', 'Price', 'Speed',
            'Capacity', 'Dimms', 'Timing', 'Voltage']

while not jsonQueue.empty():
    itemNumber, specs = jsonQueue.get()
    jsonQueue.task_done()  # mark done up front so the `continue` below can't skip it
    item = {}

    # Flatten the grouped key/value spec pairs, keeping only the features we want.
    for group in specs['SpecificationGroupList']:
        for pair in group['SpecificationPairList']:
            if pair['Key'] in features:
                item[pair['Key']] = pair['Value'].encode('ascii', errors='ignore')

    # "16GB (4 x 4GB)" -> total capacity and number of DIMMs
    if 'Capacity' in item:
        capacity = capacity_re.match(item['Capacity'])
        if capacity:
            item['Capacity'] = capacity.group(1)
            item['Dimms'] = capacity.group(2)

    # "DDR3 1333 ..." -> speed
    if 'Speed' in item:
        speed = speed_re.match(item['Speed'])
        if speed:
            item['Speed'] = speed.group(1)

    # "9-9-9-24" -> tab-separated timings; skip items whose timing can't be parsed
    if 'Timing' in item:
        timing = timing_re.match(item['Timing'])
        if timing:
            item['Timing'] = timing.group(1).replace('-', '\t')
        else:
            continue

    item['Price'] = items[itemNumber]['FinalPrice']
    item['ItemNumber'] = specs['NeweggItemNumber']

    # Print one tab-separated row; items missing any feature are skipped.
    try:
        print '\t'.join(item[x] for x in features)
    except KeyError:
        pass
```
The code above goes through each item and parses each feature into usable data[5]. Once the data has been formatted and printed, I do the rest of the filtering and analysis in Microsoft Excel.
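Converting these fields to numbers before they reach the spreadsheet makes derived columns easier. Below is a Python 3 sketch of the same parsing step, reusing the script's regexes. `parse_memory_specs` is a hypothetical helper, and the sample strings are invented to match the patterns rather than copied from Newegg's responses.

```python
import re

# Same patterns as the script above, written as raw strings.
SPEED_RE = re.compile(r"DDR\d\s(\d+)")
CAPACITY_RE = re.compile(r"(\d+)GB\s\((\d+)\sx\s(\d+)GB\)")
TIMING_RE = re.compile(r"(\d+-\d+-\d+-\d+)")


def parse_memory_specs(specs):
    """Return a copy of `specs` with numeric Capacity/Dimms/Speed/Timing fields.

    Returns None when the timing is missing or unparseable, since such items
    end up skipped by the original script too.
    """
    item = dict(specs)
    capacity = CAPACITY_RE.match(item.get("Capacity", ""))
    if capacity:  # "16GB (4 x 4GB)" -> 16 GB total on 4 DIMMs
        item["Capacity"] = int(capacity.group(1))
        item["Dimms"] = int(capacity.group(2))
    speed = SPEED_RE.match(item.get("Speed", ""))
    if speed:  # "DDR3 1333 ..." -> 1333
        item["Speed"] = int(speed.group(1))
    timing = TIMING_RE.match(item.get("Timing", ""))
    if not timing:
        return None
    # "9-9-9-24" -> [9, 9, 9, 24]
    item["Timing"] = [int(t) for t in timing.group(1).split("-")]
    return item


if __name__ == "__main__":
    sample = {"Capacity": "16GB (4 x 4GB)", "Speed": "DDR3 1333 (PC3 10666)", "Timing": "9-9-9-24"}
    print(parse_memory_specs(sample))
```

With `Capacity` stored as an integer, a column such as price per GB becomes a single division.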
The equation used to calculate a score for each set of system memory is as follows:
Currently, it looks like G.Skill has the best to offer in the DDR3 memory market if you're looking for a quad-channel set for Sandy Bridge's enthusiast hardware, due out next quarter[6].
[1] Choosing an SSD (A more different S)
[2] Newegg's JSON API
[3] Even though it was never intended for what I'm using it for.
[4] CPython is crippled in this way due to its global interpreter lock (GIL).
[5] Or at least the data we're interested in using for analysis.
[6] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 1333MHz

