Newegg’s JSON API
For the longest time I've wanted access to Newegg's product list. For me they've been one of the better and more structured websites for buying computer hardware. So naturally they're usually my first choice when it comes to finding a good deal on a particular piece of hardware. They're also rather useful for seeing what's out there since their product catalog is fairly complete.
A while back I started wanting to sort through items to heuristically pick the best deal based on a number of features Newegg generally provides for each item. This method works pretty well on SSDs and system memory. But until a recent discovery I was limited to scraping Newegg's website in order to get any kind of information from them. If you've ever tried this sort of thing you know that it is messy and generally a bad idea, because every time Newegg changes the structure of their website, or any minute detail of it, your scraping script will almost always break.
The discovery came in the form of a mobile application for Android[1]. The mobile app lets you browse their website in a clean and fast manner. But what got me thinking is that unlike some other mobile applications out there that are just application wrappers for the mobile version of their websites this one operates directly through the native GUI. Now this is where it got interesting. I knew that if Newegg had written the app to use the native GUI then they had to be providing the data to it somehow and I knew it had to be more structured than HTML scraping like what I've been doing[2]. You have no idea how happy I was to discover that I was right.
First thing I did was connect my Droid 2 Global to my home network via WiFi in order to sniff some of the traffic going to and from the mobile app. This was accomplished by mounting a CIFS drive from my Windows 7 desktop to my router running Tomato based firmware. The share had a binary for TCPDump which I then used to sniff for traffic originating or going to my phone's IP address. After setting this up and performing all of the basic operations I would need in order to "reverse engineer" the data source I got to work on filtering the important bits.
In Wireshark I immediately discovered that they had a subdomain they were using for these operations. All of the web requests that weren't images or for customer metrics and tracking went to this host:
www.ows.newegg.com
Because this API is structured more or less the same as navigating their site and the identifiers are different I decided to start with writing a query builder. Basically the purpose was to allow me to browse to the particular category I was interested in analyzing and filter it down to just a few simple requirements to simplify the analysis.
The first major entry point in the process of browsing to what you're interested in pulling is:
http://www.ows.newegg.com/Stores.egg/Menus
This takes no parameters and provides the main menu:
    [
        {
            "StoreDepa": "ComputerHardware",
            "StoreID": 1,
            "ShowSeeAllDeals": true,
            "Title": "Computer Hardware"
        },
        {
            "StoreDepa": "PCNotebook",
            "StoreID": 3,
            "ShowSeeAllDeals": true,
            "Title": "PCs & Laptops"
        },
        {
            "StoreDepa": "Electronics",
            "StoreID": 10,
            "ShowSeeAllDeals": true,
            "Title": "Electronics"
        },
        ...
Once you've selected a store to browse the next uri is:
http://www.ows.newegg.com/Stores.egg/Categories/{StoreID}
The only parameter it takes is StoreID which you'll find in the first query. This will return all of the categories within a store. I haven't really explored this very much as I'm only really interested in browsing system memory and SSD's. Using the Computer Hardware store the output is as follows:
    [
        {
            "Description": "Backup Devices & Media",
            "StoreID": 1,
            "NodeId": 6642,
            "ShowSeeAllDeals": true,
            "CategoryType": 0,
            "CategoryID": 2
        },
        {
            "Description": "Barebone / Mini Computers",
            "StoreID": 1,
            "NodeId": 6668,
            "ShowSeeAllDeals": true,
            "CategoryType": 0,
            "CategoryID": 3
        },
        {
            "Description": "CD / DVD Burners & Media",
            "StoreID": 1,
            "NodeId": 6646,
            "ShowSeeAllDeals": true,
            "CategoryType": 0,
            "CategoryID": 10
        },
        ...
StoreID is echoed back from the parameters of the request. I'm not exactly sure how to describe the purpose of NodeId, but it appears to be a distinguishing feature of a category or subcategory. CategoryID is used for filtering results down to a specific category and can be either a root category or a subcategory. CategoryType determines whether CategoryID is a root category or whether it contains subcategories. A value of 1 for CategoryType indicates a root category, which you can search directly; any other value means it contains subcategories.
Now depending on CategoryType you either move straight to the search query or onto a navigation query. The navigation query is used if there are subcategories:
http://www.ows.newegg.com/Stores.egg/Navigation/{StoreID}/{CategoryID}/{NodeID}
This query takes StoreID, CategoryID and NodeID, which you can get from the category listing of a particular store. It will return a subcategory list. Below is the subcategory listing for the memory category.
    [
        {
            "Description": "Desktop Memory",
            "StoreID": 1,
            "NodeId": 7611,
            "ShowSeeAllDeals": false,
            "CategoryType": 1,
            "CategoryID": 147
        },
        {
            "Description": "Flash Memory",
            "StoreID": 1,
            "NodeId": 8038,
            "ShowSeeAllDeals": false,
            "CategoryType": 1,
            "CategoryID": 68
        },
        {
            "Description": "Laptop Memory",
            "StoreID": 1,
            "NodeId": 7609,
            "ShowSeeAllDeals": false,
            "CategoryType": 1,
            "CategoryID": 381
        },
        ...
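The browsing flow up to this point can be sketched as a couple of URL helpers. This is only a rough Python 3 sketch built from the URLs and fields shown above; the function names (`categories_url`, `navigation_url`, `next_step`) are my own inventions and not part of Newegg's interface, and nothing here touches the network.

```python
import json

BASE = "http://www.ows.newegg.com"


def categories_url(store_id):
    """URL listing every category in a store."""
    return "{}/Stores.egg/Categories/{}".format(BASE, store_id)


def navigation_url(store_id, category_id, node_id):
    """URL listing the subcategories of a category."""
    return "{}/Stores.egg/Navigation/{}/{}/{}".format(
        BASE, store_id, category_id, node_id)


def next_step(category):
    """Decide where to go from one entry of a category listing.

    CategoryType 1 means the entry can be searched directly; anything
    else means it still has subcategories to walk through.
    """
    if category["CategoryType"] == 1:
        return ("search", None)
    return ("navigate", navigation_url(
        category["StoreID"], category["CategoryID"], category["NodeId"]))


# Two entries copied from the listings in this post.
sample = json.loads("""[
  {"Description": "Backup Devices & Media", "StoreID": 1, "NodeId": 6642,
   "ShowSeeAllDeals": true, "CategoryType": 0, "CategoryID": 2},
  {"Description": "Desktop Memory", "StoreID": 1, "NodeId": 7611,
   "ShowSeeAllDeals": false, "CategoryType": 1, "CategoryID": 147}
]""")

for entry in sample:
    print(entry["Description"], next_step(entry))
```

The first entry still needs a navigation request, while the second can go straight to the search query.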
From here you go to the search query[3]. At this point it gets a little tricky: the parameters for the query are no longer sent via GET, they are sent using POST[4], which basically requires a programmatic method for making a search query. Given a category, store and node, the search query returns quite a lot of things. The first thing in the response is a set of search filtering parameters, which allow you to limit the products shown in the listing.
The server returns a 404 unless some data is posted. If you really wanted to, you could send an empty dictionary, which would query Newegg's entire product list. Any of the query options can be omitted; integer values can also be left unset by sending -1 in their place.
The URL to post the JSON-formatted data to, and the parameters you should concern yourself with, are as follows:
http://www.ows.newegg.com/Search.egg/Advanced
    data = {
        "SubCategoryId": 147,
        "NValue": "",
        "StoreDepaId": 1,
        "NodeId": 7611,
        "BrandId": -1,
        "PageNumber": 1,
        "CategoryId": 17
    }
NValue is a space separated list of NValues from the search parameters. Mind you, you cannot filter against more than one item in any category of search filters. For example in system memory you can't select DDR3 1333 (PC3 10600), DDR3 1333 (PC3 10660) and DDR3 1333 (PC3 10666) at the same time; the query will return an unsuccessful search result. The rest of the parameters are fairly self-explanatory.
The result returned will contain the following elements: RelatedLinkList, CoremetricsInfo, NavigationContentList, PaginationInfo and ProductListItems. CoremetricsInfo and RelatedLinkList can usually be ignored. The elements we're interested in are NavigationContentList, which is a list of search parameters/filters you can apply to the search; PaginationInfo, which describes how many elements were returned, what page we're on and how many elements there are per page; and last but not least ProductListItems, which provides a list of the products returned by the query along with some basic listing info for each one.
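Here is a rough Python 3 sketch of putting that POST together. The helper name `build_search_request` and its keyword arguments are my own; only the URL and the payload keys come from what the mobile app sends. It builds the request without sending it, so you can inspect the body first.

```python
import json
import urllib.request

SEARCH_URL = "http://www.ows.newegg.com/Search.egg/Advanced"


def build_search_request(store_id=None, category_id=None,
                         subcategory_id=None, node_id=None,
                         nvalues=(), page=1):
    """Build (but do not send) the POST request for a product search."""
    def or_default(value):
        # Integer options are left unset by sending -1.
        return -1 if value is None else int(value)

    payload = {
        "StoreDepaId": or_default(store_id),
        "CategoryId": or_default(category_id),
        "SubCategoryId": or_default(subcategory_id),
        "NodeId": or_default(node_id),
        "BrandId": -1,
        # NValue is a space separated list of filter values.
        "NValue": " ".join(sorted(set(nvalues))),
        "PageNumber": page,
    }
    body = json.dumps(payload).encode("utf-8")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(SEARCH_URL, data=body)


request = build_search_request(store_id=1, category_id=17,
                               subcategory_id=147, node_id=7611)
print(request.get_method())
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request).read()
```

Whether the service still answers at that address is another matter, so treat the last comment as untested.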
Below is a portion of the NavigationContentList:
    {
        "NavigationContentList": [
            {
                "NavigationItemList": [
                    {
                        "SubCategoryId": -1,
                        "Description": "Free Shipping",
                        "StoreDepaId": 94,
                        "NValue": "100007611 600006050 600052012 4808",
                        "BrandId": -1,
                        "StoreType": 4,
                        "ItemCount": 194,
                        "CategoryId": -1,
                        "ElementValue": "4808"
                    },
                    {
                        "SubCategoryId": -1,
                        "Description": "Top Sellers",
                        "StoreDepaId": -1,
                        "NValue": "100007611 600006050 600052012 4802",
                        "BrandId": -1,
                        "StoreType": -1,
                        "ItemCount": 39,
                        "CategoryId": -1,
                        "ElementValue": "4802"
                    },
                    ...
This section will also contain a group name:
    ...
    "TitleItem": {
        "SubCategoryId": -1,
        "Description": "Useful Links",
        "StoreDepaId": -1,
        "NValue": "4800",
        "BrandId": -1,
        "StoreType": -2,
        "ItemCount": 0,
        "CategoryId": -1,
        "ElementValue": "4800"
    }
    ...
The PaginationInfo and ProductListItems elements will look like the following:
    ...
    "PaginationInfo": {
        "TotalCount": 233,
        "PageNumber": 1,
        "PageSize": 20
    },
    "ProductListItems": [
        {
            "SellerId": null,
            "ItemOwnerType": 0,
            "Title": "Crucial Ballistix 4GB (2 x 2GB) 240-Pin DDR3 SDRAM DDR3 2133 (PC3 17000) Desktop Memory with Thermal Sensor Model BL2KIT25664FN2139",
            "ItemGroupID": 0,
            "ReviewSummary": {
                "Rating": 5,
                "TotalReviews": "[1]"
            },
            "IsCellPhoneItem": false,
            "Discount": null,
            "FinalPrice": "$104.99",
            "ItemNumber": "20-148-372",
            "MappingFinalPrice": "$104.99",
            "FreeShippingFlag": true,
            "OriginalPrice": "$104.99",
            "IsComboBundle": false,
            "MailInRebateText": null,
            "ProductStockType": 0,
            "Model": "BL2KIT25664FN2139",
            "ShowOriginalPrice": false,
            "Image": {
                "FullPath": "http://images17.newegg.com/is/image/newegg/20-148-372-TS?$S125W$",
                "SmallImagePath": null,
                "ThumbnailImagePath": null,
                "Title": null
            },
            "SellerName": null,
            "ParentItem": null
        },
        ...
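Two small things fall out of that response: how many pages there are to walk, and prices that arrive as strings. A minimal Python 3 sketch follows, with helper names (`page_count`, `parse_price`) of my own choosing. Stripping thousands separators is an assumption on my part, since the sample only shows a price under $1,000.

```python
import math


def page_count(pagination):
    """Number of result pages implied by a PaginationInfo element."""
    return int(math.ceil(
        pagination["TotalCount"] / float(pagination["PageSize"])))


def parse_price(text):
    """Turn a price string such as "$104.99" into a float, or None."""
    if not text:
        return None
    # Assumption: larger prices use a comma as the thousands separator.
    return float(text.replace("$", "").replace(",", ""))


pagination = {"TotalCount": 233, "PageNumber": 1, "PageSize": 20}
pages = page_count(pagination)
print(pages)  # 233 items at 20 per page is 12 pages

# Page numbers still left to request after the current one.
remaining = list(range(pagination["PageNumber"] + 1, pages + 1))
print(remaining)

print(parse_price("$104.99"))
```

Each remaining page number would go into PageNumber of the search payload.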
At this point you might be wondering: what good will all this do me if I can't get specifications on an item? Well, you can, and here's how. In each ProductListItems element you'll find an ItemNumber; this is essentially the primary key that each product is related to within this interface to Newegg's product list. Using the following URL you can obtain the full details page on any given item using its ItemNumber:
http://www.ows.newegg.com/Products.egg/{ItemNumber}/Specification
    {
        "SpecificationGroupList": [
            {
                "GroupName": "Model",
                "SpecificationPairList": [
                    {
                        "Value": "Crucial",
                        "Key": "Brand"
                    },
                    {
                        "Value": "Ballistix",
                        "Key": "Series"
                    },
                    {
                        "Value": "BL2KIT25664FN2139",
                        "Key": "Model"
                    },
                    {
                        "Value": "240-Pin DDR3 SDRAM",
                        "Key": "Type"
                    }
                ]
            },
            {
                "GroupName": "Tech Spec",
                "SpecificationPairList": [
                    {
                        "Value": "4GB (2 x 2GB)",
                        "Key": "Capacity"
                    },
                    {
                        "Value": "DDR3 2133 (PC3 17000)",
                        "Key": "Speed"
                    },
                    {
                        "Value": "9",
                        "Key": "Cas Latency"
                    },
                    {
                        "Value": "9-10-9-24",
                        "Key": "Timing"
                    },
                    {
                        "Value": "1.65V",
                        "Key": "Voltage"
                    },
                    {
                        "Value": "No",
                        "Key": "ECC"
                    },
                    {
                        "Value": "Unbuffered",
                        "Key": "Buffered/Registered"
                    },
                    {
                        "Value": "Dual Channel Kit",
                        "Key": "Multi-channel Kit"
                    }
                ]
            },
            {
                "GroupName": "Manufacturer Warranty",
                "SpecificationPairList": [
                    {
                        "Value": "Lifetime limited",
                        "Key": "Parts"
                    },
                    {
                        "Value": "Lifetime limited",
                        "Key": "Labor"
                    }
                ]
            }
        ],
        "NeweggItemNumber": "N82E16820148372",
        "Title": "Crucial Ballistix 4GB (2 x 2GB) 240-Pin DDR3 SDRAM DDR3 2133 (PC3 17000) Desktop Memory with Thermal Sensor Model BL2KIT25664FN2139"
    }
From this point on you can grab all of the features and specifications of any particular item you're interested in. In the near future I'll be writing a new post for both my memory and SSD analysis scripts using this interface.
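As a rough idea of what "grabbing the features" might look like, here is a small Python 3 sketch that flattens the specification response into a lookup table. The function names are hypothetical and the sample is a trimmed copy of the response above, so no request is made.

```python
def specification_url(item_number):
    """URL of the specification page for an ItemNumber like 20-148-372."""
    return "http://www.ows.newegg.com/Products.egg/{}/Specification".format(
        item_number)


def flatten_specifications(details):
    """Collapse SpecificationGroupList into {(group, key): value}."""
    flat = {}
    for group in details.get("SpecificationGroupList") or []:
        for pair in group.get("SpecificationPairList") or []:
            flat[(group["GroupName"], pair["Key"])] = pair["Value"]
    return flat


# Trimmed copy of the response shown above.
details = {
    "SpecificationGroupList": [
        {"GroupName": "Model", "SpecificationPairList": [
            {"Value": "Crucial", "Key": "Brand"},
            {"Value": "Ballistix", "Key": "Series"}]},
        {"GroupName": "Tech Spec", "SpecificationPairList": [
            {"Value": "4GB (2 x 2GB)", "Key": "Capacity"},
            {"Value": "DDR3 2133 (PC3 17000)", "Key": "Speed"}]},
    ],
    "NeweggItemNumber": "N82E16820148372",
}

specs = flatten_specifications(details)
print(specification_url("20-148-372"))
print(specs[("Tech Spec", "Capacity")])
```

Keying on both the group name and the key avoids collisions such as "Model" appearing both as a group and as a key.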
The full code for my query builder is as follows, though you should note this was a quick script and is in no way complete or fully functional. As soon as it got to a usable point I moved on to the main point of this whole ordeal. You should also note that this requires CherryPy[5] and lxml[6]. The end result of this program is a query which you can use to retrieve a list of products matching the options you've selected. This is mainly to simplify product list selection and to minimize the need to hardcode certain values, as Newegg has a tendency to change things around on a regular basis.
    import cherrypy, json, urllib, urllib2
    from lxml import etree
    from lxml.builder import E

    class QueryBuilder(object):
        def index(self):
            request = urllib2.urlopen("http://www.ows.newegg.com/Stores.egg/Menus")
            response = request.read()
            data = json.loads(response)

            body = E.body()
            ul = E.ul()
            for store in data:
                ul.append(E.li(E.a(
                    store['Title'],
                    href='/Store?StoreID={}'.format(store['StoreID'])
                )))
            page = E.html(E.body(ul))
            return etree.tostring(page, pretty_print=True)
        index.exposed = True

        def Store(self, StoreID=None):
            if StoreID is not None:
                request = urllib2.urlopen("http://www.ows.newegg.com/Stores.egg/Categories/{}".format(StoreID))
                response = request.read()
                data = json.loads(response)

                body = E.body()
                ul = E.ul()
                for category in data:
                    if category['CategoryType'] == 1:
                        ul.append(E.li(E.a(
                            category['Description'],
                            href='/Search?StoreID={}&CategoryID={}&NodeID={}'.format(StoreID, category['CategoryID'], category['NodeId'])
                        )))
                    else:
                        ul.append(E.li(E.a(
                            category['Description'],
                            href='/Category?StoreID={}&CategoryID={}&NodeID={}'.format(StoreID, category['CategoryID'], category['NodeId'])
                        )))
                page = E.html(E.body(ul))
                return etree.tostring(page, pretty_print=True)
            else:
                return "Invalid parameters."
        Store.exposed = True

        def Category(self, StoreID, CategoryID, NodeID):
            if None not in [StoreID, CategoryID, NodeID]:
                request = urllib2.urlopen("http://www.ows.newegg.com/Stores.egg/Navigation/{}/{}/{}".format(StoreID, CategoryID, NodeID))
                response = request.read()
                data = json.loads(response)

                body = E.body()
                ul = E.ul()
                for subcategory in data:
                    ul.append(E.li(E.a(
                        subcategory['Description'],
                        href='/Search?StoreID={}&CategoryID={}&SubCategoryID={}&NodeID={}'.format(StoreID, CategoryID, subcategory['CategoryID'], subcategory['NodeId'])
                    )))
                page = E.html(E.body(ul))
                return etree.tostring(page, pretty_print=True)
            else:
                return "Invalid parameters."
        Category.exposed = True

        def Search(self, StoreID=None, CategoryID=None, SubCategoryID=None, NodeID=None):
            url = "http://www.ows.newegg.com/Search.egg/Advanced"
            data = {
                "IsUPCCodeSearch": False,
                "IsSubCategorySearch": True,
                "isGuideAdvanceSearch": False,
                "StoreDepaId": StoreID,
                "CategoryId": CategoryID,
                "SubCategoryId": SubCategoryID,
                "NodeId": NodeID,
                "BrandId": -1,
                "NValue": "",
                "Keyword": "",
                "Sort": "FEATURED",
                "PageNumber": 1
            }

            params = json.dumps(data).replace("null", "-1")
            request = urllib2.Request(url, params)
            response = urllib2.urlopen(request)
            data = json.loads(response.read())

            if data['NavigationContentList'] is None:
                return etree.tostring(E.pre(json.dumps(data, indent=4)), pretty_print=True)

            body = E.body()
            form = E.form(name='PowerSearch', action='GenerateURL', method='GET')
            table = E.table()
            form.append(table)

            for section in data['NavigationContentList']:
                index = 0
                tr = E.tr(E.td(section['TitleItem']['Description'], colspan='3'))
                table.append(tr)
                for option in section['NavigationItemList']:
                    if index % 3 == 0:
                        tr = E.tr()
                        table.append(tr)
                    index += 1
                    checkbox = E.td(E.input(option["Description"], type="checkbox", name=section['TitleItem']['Description'].replace(" ", ""), value=option['NValue']))
                    tr.append(checkbox)

            for param, value in [('StoreID', StoreID), ('CategoryID', CategoryID), ('SubCategoryID', SubCategoryID), ('NodeID', NodeID)]:
                try:
                    form.append(E.input(type='hidden', name=param, value=value))
                except KeyError:
                    pass

            form.append(E.input(type='submit', value='Submit'))
            page = E.html(E.body(form))
            return etree.tostring(page, pretty_print=True)
        Search.exposed = True

        def GenerateURL(self, StoreID=None, CategoryID=None, SubCategoryID=None, NodeID=None, **kwargs):
            NValue = set([])
            for arg in kwargs:
                if type(kwargs[arg]) == list:
                    for value in kwargs[arg]:
                        NValue.add(value)
                else:
                    NValue.add(kwargs[arg])
            NValue = list(NValue)
            NValue.sort()

            if StoreID is None: StoreID = -1
            if CategoryID is None: CategoryID = -1
            if SubCategoryID is None: SubCategoryID = -1
            if NodeID is None: NodeID = -1

            data = {
                "StoreDepaId": int(StoreID),
                "CategoryId": int(CategoryID),
                "SubCategoryId": int(SubCategoryID),
                "NodeId": int(NodeID),
                "BrandId": -1,
                "NValue": ' '.join(NValue),
                "PageNumber": 1
            }
            return etree.tostring(E.pre(json.dumps(data, indent=4)), pretty_print=True)
        GenerateURL.exposed = True

    cherrypy.quickstart(QueryBuilder())
[1] And iOS devices as well, I assume.
[2] Because let's face it, that would be stupid.
[3] ... or get to the search query by selecting a root category in the main category listing for a store.
[4] At least this is the method used by the mobile app.
[5] CherryPy: a Pythonic, object-oriented HTTP framework.
[6] lxml: a Pythonic binding for the C libraries libxml2 and libxslt.
Installing Ubuntu via Network
At some point in the last 6 months or so I may or may not have accidentally left my 1GB Sandisk Cruzer in a pair of jeans when they went through the washer AND the dryer. As such it's not exactly in peak physical condition[1] and for whatever reason I've had issues with using it for installing certain things[2] lately[3].
Anyway, it has become time again to get my file server back up and running and I needed to reinstall Ubuntu on it. Given my extreme laziness when it comes to doing this sort of stuff I was in no mood to move everything to the top of my desktop[4], so I opted to try PXE booting[5] again.
I've messed with PXE booting in the past, particularly with GeeXboX[6] for my media center. That was a nightmare at the time and essentially required you to have a Linux system in order to do it. Since then a wonderful application has made its way onto the internet: tftpd32[7]. Tftpd32 greatly simplifies the whole process by not requiring you to install anything or make any major system changes.
Before you continue take note, these instructions assume a few things:
- You're serving the netboot images from a Windows system.
- You have a Tomato-based router, although these instructions can be easily modified to work with any router firmware that uses Dnsmasq or allows you to change advanced settings for the DHCP server.
Things you'll need:
- Ubuntu Alternate ISO: This will be used for setting up the local HTTP repository.
- Ubuntu NetBoot Image: Grab netboot.tar.gz
- tftpd32: This will be used for serving files during PXE booting.
- HFS ~ HTTP File Server: This will be used for setting up a local HTTP repository so we can install from our local network instead of having Ubuntu download everything from a mirror.
Router Settings:
- Advanced -> DHCP / DNS -> Dnsmasq Custom Configuration
- dhcp-boot=pxelinux.0,,[tftpd32 server ip address]
- Save.
For ease of readability from this point forward files will be bolded and directories will be italicized.
- Untar netboot.tar.gz into a folder, which I'll refer to as netboot from now on.
- Delete pxelinux.0 and pxelinux.cfg from netboot/ as these are symlinks which will not work in Windows.
- Create the directory netboot/pxelinux.cfg/
- Copy pxelinux.0 from netboot/ubuntu-installer/i386/ to netboot/
- Copy syslinux.cfg from netboot/ubuntu-installer/i386/boot-screens/ to netboot/pxelinux.cfg/
- Rename netboot/pxelinux.cfg/syslinux.cfg to netboot/pxelinux.cfg/default
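If you'd rather not shuffle the files around by hand, the folder preparation above can be sketched in a few lines of Python 3. This is only a sketch under the assumptions already listed; the function name `prepare_netboot` and its parameters are mine, and `config_name` is whatever the boot-screens config file from the copy step is called in your netboot image.

```python
import shutil
import sys
from pathlib import Path


def prepare_netboot(netboot, config_name, arch="i386"):
    """Rearrange an extracted netboot.tar.gz so it can be served from Windows.

    netboot     -- folder netboot.tar.gz was extracted into
    config_name -- name of the boot menu config inside boot-screens/
    """
    netboot = Path(netboot)
    installer = netboot / "ubuntu-installer" / arch

    # Remove whatever the two symlinks turned into when extracted.
    for name in ("pxelinux.0", "pxelinux.cfg"):
        leftover = netboot / name
        if leftover.is_dir() and not leftover.is_symlink():
            shutil.rmtree(leftover)
        elif leftover.exists() or leftover.is_symlink():
            leftover.unlink()

    # Recreate them as a real directory and a real file.
    (netboot / "pxelinux.cfg").mkdir()
    shutil.copy(installer / "pxelinux.0", netboot / "pxelinux.0")
    # Copying straight to "default" covers the copy and rename steps.
    shutil.copy(installer / "boot-screens" / config_name,
                netboot / "pxelinux.cfg" / "default")
    return netboot


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python prepare_netboot.py <netboot folder> <config file name>
    print(prepare_netboot(sys.argv[1], sys.argv[2]))
```

Afterwards netboot/ should contain a real pxelinux.0 and a pxelinux.cfg/default file, which is all tftpd32 needs to find.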
Preparing tftpd32:
- Run tftpd32
- Browse to the netboot folder we just finished setting up.
- Tftpd32 should be serving the files in that directory at this point.
Preparing the local HTTP Ubuntu Repository:
- Run HFS.exe
- Extract all of the files from ubuntu-10.10-alternate-i386.iso to a folder which I'll refer to as ubuntu-alt from this point on.
- In the Virtual File System pane right click -> Add Folder from disk...
- Browse to and select ubuntu-alt
- When HFS prompts you to ask what kind of folder it should be added as, select Real Folder
- Note the link in the address bar next to Open in browser; you'll use this link when installing Ubuntu.
Installing Ubuntu:
- Boot the system you're attempting to install Ubuntu on from your network device.
- If you have tftpd32 up on another monitor at this point you should see a deluge of requests in the tftp server tab.
- Ubuntu should show a boot menu; select Install.
- I'm not going to go into full detail on how to install Ubuntu, but when you get to mirror selection there should be an option at the very top of the list to enter a mirror manually. This is where you should enter the address from the address bar in HFS. Be sure to also include the port value.
- If all goes well it should start installing and you should see another deluge of requests in HFS.
[1] In fact it's pretty far from peak physical condition.
[2] Like Ubuntu, for example.
[3] I'm not entirely sure if this is due to washing it or just from it being nearly 5 years old.
[4] So the cable for the USB adapter I've got my DVD drive connected to in my desktop can reach my mini-ITX board.
[5] Preboot Execution Environment
[6] GeeXboX
[7] tftpd32: An open-source TFTP/DHCP/syslog server for Windows.
Choosing an SSD (A more different S)
I've been periodically going back and revisiting the results for my SSD analysis script for newegg.com. The last few times I ran it I noticed that it was broken. It looks like Newegg has modified a few things in their power search results page. One thing which is a little obnoxious[1] is that they no longer include the capacity in the description of the item or as a feature in the feature list when viewing the results page. This only seems to be an issue on the SSD page, although I can't figure out why they decided it didn't need to be there in the first place. I see it this way: SSDs are first and foremost storage devices, so you'd think the capacity at least would be listed with every one of them.
Anyway, this change broke my script, which I had been meaning to rewrite since regular expressions are definitely not the most efficient or cleanest way to parse HTML. I've been working with XML more often lately despite my original prejudice against it for being a really bloated way to transfer data. One thing I discovered that makes XML a lot less painful is XPath[2], which is an incredibly useful "language" for selecting data from an XML document.
Once I had gone through and read several tutorials and references about XPath I set out to use it in writing a show calendar script which parses data from tvrage.com's XML API. After that useful exercise I realized I could very easily and cleanly apply it to my SSD analysis script. Since HTML is similar in nature to XML[3] I set out to parse Newegg's results page using XPath. This presented the first problem: Newegg's page isn't strictly XML or even XHTML for that matter. After a great deal of googling and research I landed on the lxml[4] website which as it turns out has an HTML parser for navigating and extracting data from HTML in the same way you would from an xml.etree.ElementTree[5]. With this in mind I immediately began rewriting the script.
First off let's consider my criteria for a "good" SSD on Newegg. The SSD can be either the typical 2.5" form factor, or a PCI-Express card[6]. The interface can be SATA II, SATA III or PCI-Express. Capacity must be greater than or equal to 120GB[7]. Last but not least, the disk should cost less than $300[8].
The above requirements give us the following power search[9] which we will be using as the source for the script:
    url = "http://www.newegg.com/Product/ProductList.aspx?Submit=Property&N=100008" + \
          "120&IsNodeId=1&maxPrice=300&OEMMark=1,0&PropertyCodeValue=4213:30854,421" + \
          "3:41472,4213:47725,4214:46019,4214:72313,4214:57574,4214:58118,4214:3941" + \
          "6,4214:47732,4214:30849,4214:47171,4214:46300,4214:77918,4214:72311,4214" + \
          ":77919,4214:55178,4214:47733,4214:57755,4214:44038,4215:55552,4215:47726" + \
          ",4215:41071&bop=And&Pagesize=100"
Now the first thing that made me cringe as I was rewriting this was the fact that I would basically have no choice but to load each individual product page from the results page as capacity is no longer included in either the description or the features list of each product in the results page. Eventually I will get around to multi-threading this to make it a little less painful, or I'll get lucky and Newegg will add the capacity feature back to the item listing in power searches for SSD's. The following is the full source code of the parser:
    import re, math
    from lxml import etree

    url = "http://www.newegg.com/Product/ProductList.aspx?Submit=Property&N=100008" + \
          "120&IsNodeId=1&maxPrice=300&OEMMark=1,0&PropertyCodeValue=4213:30854,421" + \
          "3:41472,4213:47725,4214:46019,4214:72313,4214:57574,4214:58118,4214:3941" + \
          "6,4214:47732,4214:30849,4214:47171,4214:46300,4214:77918,4214:72311,4214" + \
          ":77919,4214:55178,4214:47733,4214:57755,4214:44038,4215:55552,4215:47726" + \
          ",4215:41071&bop=And&Pagesize=100"

    featureMap = {
        'Capacity': 'capacity',
        'Sequential Access - Write:': 'write',
        'Sequential Access - Write': 'write',
        'Sequential Access - Read:': 'read',
        'Sequential Access - Read': 'read',
        'Interface Type': 'interface',
        'Brand': 'brand',
        'Model': 'model',
        'Series': 'series'
    }

    speed_re = re.compile(r'(\d+)\s?MB/s')
    capacity_re = re.compile(r'(\d+)GB')

    parser = etree.HTMLParser()
    # tree = etree.parse("temp.html", parser)
    tree = etree.parse(url, etree.HTMLParser())
    root = tree.getroot()

    items = []
    for node in root.findall(".//div[@class='itemCell']"):
        item = {}

        # Get link
        link = node.find(".//a[@title='View Details']")
        item["link"] = link.attrib["href"]

        # Get feature list (loads each item's url, should multi-thread this in the future)
        itemPage = etree.parse(item["link"], etree.HTMLParser()).getroot()
        featureList = map(lambda n: n.text, itemPage.findall(".//fieldset/dl/dt"))
        valueList = map(lambda n: n.text, itemPage.findall(".//fieldset/dl/dd"))
        features = zip(featureList, valueList)
        for feature, value in features:
            if value is not None and feature in featureMap:
                # If it's a speed feature parse out the speed
                if featureMap[feature] in ("read", "write"):
                    item[featureMap[feature]] = min(map(lambda x: int(x), speed_re.findall(value)))
                # If it's a capacity feature, parse out the capacity
                elif featureMap[feature] == "capacity":
                    item[featureMap[feature]] = min(map(lambda x: int(x), capacity_re.findall(value)))
                # If the value doesn't need to be parsed, just store the value in item
                else:
                    item[featureMap[feature]] = value.strip()

        # Get price
        price = map(lambda n: n.text, node.findall(".//li[@class='priceFinal']/*"))
        item["price"] = float(''.join(price[1:]))

        # Only add the item if it has the features we need in it
        if "read" in item and "write" in item and "capacity" in item and "series" in item:
            score = (item["read"] * item["write"] * item["capacity"]) / ((math.log(abs(item["read"] - item["write"])) + 1) + item["price"])
            item["score"] = score
            items.append(item)

    sorted = {}
    for item in items:
        # Open addressing like in a hash table, so we don't wind
        # up with any collisions, unlikely but good practice anyway
        score = item["score"]
        while score in sorted:
            score += 1
        sorted[score] = item

    sortOrder = sorted.keys()
    sortOrder.sort()
    sortOrder.reverse()

    headers = ['brand', 'series', 'model', 'link', 'interface', 'price', 'capacity', 'read', 'write', 'score']
    print '\t'.join(headers)
    for key in sortOrder:
        item = sorted[key]
        print '\t'.join(map(lambda x: str(item[x]), headers))
At this point if you've gone through and read the entire script you'll probably notice that I've made a slight change to the scoring equation, it has been changed from the following:
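To the following, which is the expression the script computes:
$$\text{score} = \frac{\text{read} \times \text{write} \times \text{capacity}}{\left(\ln\lvert \text{read} - \text{write} \rvert + 1\right) + \text{price}}$$
I discovered that using the raw difference in read/write speed heavily penalized drives with anything greater than a 10MB/s difference. So I figured it would be a little more subtle to penalize drives based on the order of magnitude of the difference instead, which is what taking the logarithm does.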
Now you're probably wondering: "When is this blathering idiot going to get to the damned results already?". And you'd be pleasantly surprised to know that I'm getting to them as you waste your time reading this.
| Manufacturer: | OCZ | OCZ | G.Skill | OCZ |
| Series: | RevoDrive | Vertex 2 | Phoenix Pro Series | Agility 2 |
| Capacity: | 120GB | 180GB | 120GB | 120GB |
| Read: | 540MB/s | 285MB/s | 285MB/s | 285MB/s |
| Write: | 490MB/s | 275MB/s | 275MB/s | 275MB/s |
| Item: | N82E16820227578[10] | N82E16820227602[11] | N82E16820231378[12] | N82E16820227593[13] |
| Price: | $299.99 | $294.99 | $214.99 | $214.99 |
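To put numbers on the table, here is the scoring expression from the script as a small Python 3 function, fed with the four drives listed. The one thing I've added is an assumption of my own: when read and write speeds are identical the logarithm is undefined, so this sketch treats that case as no penalty.

```python
import math


def score(read, write, capacity, price):
    """Score a drive the way the script above does.

    read and write are in MB/s, capacity in GB, price in dollars.
    """
    gap = abs(read - write)
    # Assumption: identical read and write speeds mean no penalty,
    # since log(0) is undefined and the script does not cover that case.
    penalty = math.log(gap) if gap > 0 else 0.0
    return (read * write * capacity) / ((penalty + 1) + price)


# (name, read, write, capacity, price) taken from the table above.
drives = [
    ("OCZ RevoDrive", 540, 490, 120, 299.99),
    ("OCZ Vertex 2", 285, 275, 180, 294.99),
    ("G.Skill Phoenix Pro", 285, 275, 120, 214.99),
    ("OCZ Agility 2", 285, 275, 120, 214.99),
]

ranked = sorted(drives, key=lambda d: score(*d[1:]), reverse=True)
for name, read, write, capacity, price in ranked:
    print("{:<22}{:>12.1f}".format(name, score(read, write, capacity, price)))
```

The RevoDrive lands at more than twice the score of the next drive, and the two 120GB SandForce drives tie exactly because their inputs are identical.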
As you can see the RevoDrive far out-scores all the rest of the SSDs considered in this analysis. The main reason is that they've essentially included two 60GB SSDs on the same card and you're expected to perform software RAID on them in your own system[14]. Despite the incredible speeds they boast I don't think I would purchase one of these to use as my OS/Program disk because compatibility is a major limitation. You must be sure that your motherboard's BIOS supports booting via PCI-Express cards. And last but not least, the main reason I would pass up this card is the lack of TRIM support. As far as I can tell these cards do not support TRIM, which is a major downside as far as I'm concerned.
The second disk in the list is the OCZ Vertex 2 180GB version. I'd probably skip this one just because I don't really consider the extra 60GB worth the extra $80.
Which leaves me with the last two disks, which are, as far as my analysis is concerned, identical. If you take into account the detailed features you'll notice that the G.Skill claims 50k IOPS on the 4k random write test, which seems a bit... optimistic. The OCZ makes no such claim and as far as I'm concerned both disks are more or less the same thing. So it's pretty much up to brand preference at this point.
[1] I've already sent feedback to them suggesting that they fix this.
[2] Only if the XML parser you're using supports it, which it seems is not a whole lot of them. At least not all of them support the full specification, which is annoying since nobody really seems to document which bits and pieces they support and which they don't.
[3] Although not necessarily XML depending on the particular doctype you've chosen; Newegg's is transitional HTML.
[4] lxml: http://codespeak.net/lxml/
[5] xml.etree.ElementTree: http://docs.python.org/library/xml.etree.elementtree.html
[6] Some of the PCI-Express SSDs are stupidly fast and more expensive, except that it doesn't look like any of them support TRIM yet, which is a major problem for me.
[7] It is rare that I have a matured (read: haven't reformatted in a while) install of Windows along with all of my most commonly used programs and games that exceeds 60GB, so I estimate that doubling this should accommodate any sudden urges to install really big things.
[8] I can't really justify spending much more than $300 on a single storage device. It had better be one hell of a storage device if I ever find myself spending more than $300 on it.
[9] This will likely need to be updated at least once a month as Newegg is constantly adding new criteria and changing things.
[10] OCZ RevoDrive
[11] OCZ Vertex 2
[12] G.SKILL Phoenix Pro Series
[13] OCZ Agility 2
[14] They show up as two separate physical devices despite being located on the same card.
Choosing an SSD (Update)
My brother is in the planning stages of building a new desktop. One of the things he's planning on doing differently from his last build[1] is using an SSD for OS + Programs.
I had mentioned to him previously that I wrote a program to help choose an SSD based on what SSD's are meant for and are good at doing. So he asked if I could recommend one. Below are the results of the latest run of my script based on the most current listings[2] of SSD's Newegg offers.
| Manufacturer: | OCZ | G.Skill | OCZ |
| Series: | Agility 2 | Phoenix Pro | Vertex 2 |
| Capacity: | 120GB | 120GB | 120GB |
| Read: | 285MB/s | 285MB/s | 285MB/s |
| Write: | 275MB/s | 275MB/s | 275MB/s |
| Item: | N82E16820227543[3] | N82E16820231378[4] | N82E16820227551[5] |
| Price: | $235.99 | $239.99 | $240.00 |
It looks like OCZ has two of the top three places this run and G.Skill is still maintaining one of the top three from before. Between the three of them I think I would still go for the G.Skill just because of personal preference, since there aren't really any significant differences between the three. Excepting price, of course.
- Which incidentally was right when SSD's were just becoming available to the average consumer. [↩]
- As of this date 10/09/2010. [↩]
- OCZ Agility 2 [↩]
- G.Skill Phoenix Pro [↩]
- OCZ Vertex 2 [↩]
Choosing an SSD
Before I started my new job I had an inordinate amount of free time and for a majority of that time, nothing to spend it doing[1]. I was still thinking about my desktop wishlist[2] and about choosing a better SSD than the one I had previously selected[3].
A long time ago, when I was following the HDD market because I was looking to buy some bulk storage, I wrote a PHP script which loaded Newegg's product list from their productlist.xml[4] based on search parameters you provided. The script would then parse the list and produce a list sorted by price per gigabyte, which is useful when you're in the market for capacity[5].
I decided to do more or less the same thing with SSD's, except this time I did it in Python since I'm rusty on PHP and I didn't want to mess with setting up a web server to test on. So I got started by doing a power search on Newegg for the specific flavor of SSD I was looking for.
The search parameters are as follows:
- 2.5" Form Factor
- SATA II/III
- 120GB or Greater
- Less than $300
- Retail or OEM
- Support TRIM Command
As of this writing those particular search parameters narrow the results to 17 SSD's. Now comes the code. Before I started coding I needed some way to sort them according to what I thought was important. My first metric rewarded sequential read speed, write speed and capacity, and penalized price and the difference between read and write speeds.
After looking closer at the scores this produced I noticed that it heavily penalizes drives with huge differences between read and write speeds, which effectively weeds out drives that still have acceptable read/write speeds. So I removed that section of the metric, producing:
score = (read × write × capacity) / price
The basic idea behind this scoring measure is that sequential read and write speeds are important, as well as capacity. Price and difference between sequential read/write are considered bad[6]. In the equation, read and write refer to sequential read and write speeds. The ratio produces a score of the SSD's overall performance for capacity, read/write speeds and price.
The code is relatively simple in purpose. Load the data and parse it into a dictionary then sort based on the metric above.
```python
# Python 2 script (urllib2, print statement).
import urllib2, re

# Uncomment to fetch a fresh copy of the search results into temp.html.
# url = ("http://www.newegg.com/Product/ProductList.aspx?Submit=Property"
#        "&Subcategory=636&Description=&Type=&N=100008120&IsNodeId=1"
#        "&srchInDesc=&MinPrice=&MaxPrice=&OEMMark=1&OEMMark=0"
#        "&PropertyCodeValue=4213:30854&PropertyCodeValue=4214:30848"
#        "&PropertyCodeValue=4214:39416&PropertyCodeValue=4214:30849"
#        "&PropertyCodeValue=4214:39415&PropertyCodeValue=4215:55552"
#        "&PropertyCodeValue=4215:41071&PropertyCodeValue=4215:46319")
# data = open("temp.html", "w")
# data.write(urllib2.urlopen(url).read())
# data.close()

raw = open("temp.html").read()

# Patterns for pulling each item cell and its features out of the page.
item_re = re.compile(r'<div class="itemCell".*?>(.*?)<br class="clear".*?</div>')
feature_re = re.compile(r"<li> (.*?)</li>")
feature_list_re = re.compile(r'<b>(.*?)\s?\#?\s?:\s?</b>\s?(.*?)</li>')
speed_re = re.compile(r"(up to )?(\d+).*?MB/s")
capacity_re = re.compile(r"(\d+)GB")
price_re = re.compile(r"</span>\$<strong>(\d+)</strong><sup>.(\d+)</sup>")

item_list = []
valid = ['Read', 'Item', 'Interface', 'Capacity', 'Model', 'Write', 'Size']
for item in item_re.findall(raw):
    current = {}
    no_label = []
    # The first three unlabeled features are size, capacity and interface.
    features = feature_re.findall(item)
    current["Size"] = features[0]
    current["Capacity"] = features[1]
    current["Interface"] = features[2]
    # Labeled features come as "<b>Label:</b> value".
    for feature in feature_list_re.findall(item):
        if feature[1].find("\r") != -1:
            current[feature[0]] = feature[1].split("\r")[0]
        else:
            current[feature[0]] = feature[1]
    current["Read"] = int(speed_re.findall(current["Sequential Access - Read"])[0][1])
    current["Write"] = int(speed_re.findall(current["Sequential Access - Write"])[0][1])
    current["Capacity"] = int(capacity_re.findall(current["Capacity"])[0])
    # Drop everything that is not needed for the ranking.
    for feature in current.keys():
        if feature not in valid:
            del current[feature]
    current["Price"] = float('.'.join(price_re.findall(item)[0]))
    current["Item"] = "http://www.newegg.com/Product/Product.aspx?Item=%s" % (current["Item"])
    item_list.append(current)

# Score every item and index the items by score.
sorted = {}
for item in item_list:
    ratio = (item["Read"] * item["Write"] * item["Capacity"]) / (item["Price"])
    sorted[ratio] = item

# Print the items from highest to lowest score.
sort_order = sorted.keys()
sort_order.sort()
sort_order.reverse()
for key in sort_order:
    #print '\t'.join(map(lambda x: str(x), sorted[key].keys()))
    print '\t'.join(map(lambda x: str(x), sorted[key].values()))
```
Now given that there is quite a lot of data to present and analyze all at once I've decided it would be easiest to just provide you with a pretty graph[7]:

If you look closely at the scores of all the disks in the query, you'll see a noticeable gap between the top three and the rest. The top three are as follows:
| Manufacturer: | A-DATA | Patriot | G.Skill |
| Series: | S599 | Inferno | Phoenix Series |
| Capacity: | 128GB | 120GB | 120GB |
| Read: | 280MB/s | 285MB/s | 285MB/s |
| Write: | 270MB/s | 275MB/s | 275MB/s |
| Item: | N82E16820211471[8] | N82E16820220510[9] | N82E16820231372[10] |
| Price: | $295.99 | $289.99 | $299.00 |
I noticed that if you ignore capacity in the metric then the Patriot Inferno is the clear winner here. So as it turns out the Western Digital SiliconEdge I had selected when I first wrote the wishlist wasn't the best drive for my needs. But then I've always had a soft-spot for Western Digital. But now I'm convinced that the Patriot Inferno is the SSD I'll be getting unless by the time I get around to buying one there are better options[11].
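To make the capacity observation concrete, here is a minimal Python 3 sketch that applies the metric to the three drives in the table above, once with capacity and once without. The specs are copied from the table; the function and variable names are illustrative and are not taken from the original script.

```python
# Specs copied from the top-three table:
# (name, read MB/s, write MB/s, capacity GB, price USD)
DRIVES = [
    ("A-DATA S599", 280, 270, 128, 295.99),
    ("Patriot Inferno", 285, 275, 120, 289.99),
    ("G.Skill Phoenix Series", 285, 275, 120, 299.00),
]


def score(read, write, capacity, price, use_capacity=True):
    """Return read * write * capacity / price, optionally leaving capacity out."""
    value = read * write / price
    return value * capacity if use_capacity else value


def rank(drives, use_capacity=True):
    """Return drive names ordered from best to worst score.

    Sorting a list with a key function, instead of using the score as a
    dict key, keeps drives that happen to tie on score.
    """
    ordered = sorted(
        drives,
        key=lambda d: score(d[1], d[2], d[3], d[4], use_capacity),
        reverse=True,
    )
    return [d[0] for d in ordered]


with_capacity = rank(DRIVES)
without_capacity = rank(DRIVES, use_capacity=False)
print(with_capacity)
print(without_capacity)
```

With these numbers the A-DATA comes first when capacity counts, and the Patriot Inferno moves to the top once capacity is dropped.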
- Nothing worth-while anyway [↩]
- See previous post: Wishlist. [↩]
- Western Digital SiliconEdge 128GB SSD [↩]
- Which no longer exists in its original form. [↩]
- Which I was. [↩]
- Although we're excluding read/write speed difference. [↩]
- Scores have been normalized to 100%. [↩]
- A-Data S599 [↩]
- Patriot Inferno [↩]
- G.Skill Phoenix Series [↩]
- Which there probably will be. [↩]
Wishlist
I've noticed recently that I tend to spend a lot of time shopping for things I can't afford when I don't have any excess income. I can't really tell if it's just because I'm bored a lot more often over the summer, especially this one since I've been unemployed for the majority of it so far[1].
As it stands there is a rather long list of things I intend on buying/upgrading/replacing in the future. First and foremost on this list is a new laptop since my current ASUS Eee-PC 1000H is driving me nuts. It's useful for... writing, and not even that sometimes. For the last year I've used it almost strictly for taking notes in class, which it does well enough. But using it for anything else is essentially impossible. I've found this to be even more true in the last few weeks since I've been spending every other weekend with my parents on their ranch or in Cheyenne. I've just been using it when I went since it's pretty impractical to take my desktop with me every time. A lot of the stuff I work on also needs a decent amount of bandwidth, and my parents' internet connection is satellite based (on their ranch at least), so it would be pointless to try and get any real work done.
I've essentially decided that my next laptop will be a 13" Macbook Pro. The main reason is that for the amount of money I intend on spending on a new laptop the Macbook Pro is far superior in both build quality and components to the equivalent Dell, which is the manufacturer I've used for all my mobile computing needs until my netbook. Easy decision, don't you think?
The next item on my list was building a new desktop. I only really need to replace the core components of my desktop since everything else is more or less in good working order. But that's boring, so I've made an entire list of components, both core and secondary, to build a new desktop, excluding the optical drive and hard drives[2].
The first part I always start with when building a wishlist[3] is the CPU and for this particular one it was a pretty simple choice. Intel's Core i7 series is pretty much the way to go when building a workstation. In this case there were only really two requirements I had for selecting the particular Core i7 I need for this build.
- $0 < Price < $500
- Supports Triple Channel DDR3
These requirements narrow down the selection to two processors: the Core i7-920 and the Core i7-930. There are only two differences. The 930 runs at 2.8GHz where the 920 runs at 2.66GHz, and the 930 is $10 cheaper than the 920, so it's pretty obvious which one to go with.
Intel Core i7-930 Bloomfield 2.8Ghz LGA 1366 130W Quad-Core
Model #: BX80601930
Item #: N82E16819115225
Price: $289.99
The second component I select after the CPU is the motherboard. Now this is where it gets tricky because the restrictions I use for selecting a motherboard have a lot less to do with technical capabilities than they do with reliability and proper functionality. This is where newegg becomes the right place to shop. Their product review system is by far the best in the online tech shopping world. I tend to score motherboards based on the number of reviews they receive and the score of the review. This is of course after I've removed motherboards incompatible with the other components I intend on using in the system.
- LGA 1366
- Intel X58
- Intel ICH10R
- ATX form factor
The motherboard that comes out on top after these restrictions is an EVGA board.
EVGA E758-TR Intel X58
http://www.newegg.com/Product/Product.aspx?Item=N82E16813188046
Model #: 132-BL-E758-TR
Item #: N82E16813188046
Price: $269.99 IMIR -$40.00: $229.99
Next up is system memory. This almost always follows from motherboard since some motherboards support odd RAM speeds and I tend to stick with standard speeds since they are a lot less prone to compatibility issues and just work. In this case the motherboard calls for DDR3 1333[4]. I usually filter out for the specific kind of RAM I want which leaves me with a few dozen sets. Then I score based on CAS latency and price. I've used G.SKILL before and was pleased with it and in this case a G.SKILL set won on both price and CAS latency.
G.SKILL 3GB (3 x 1GB) DDR3 1333 (PC10666) Triple Channel
Model #: F3-10666CL7T-3GBPK
Item #: N82E16820231229
Price: $84.99
Now you're probably thinking "Why does he only want 3GB? That's puny!" There are some very good reasons for it. First of all I'm not really all that into 64-bit yet; I still have a few devices without 64-bit compatible drivers[5]. For the most part 2GB of RAM has suited me just fine for nearly anything I've ever needed/wanted to do on my desktop until this point, so why should I pile in twice or even three times that amount? Besides, if I so desire I could just purchase a second set in the future. The only reason I might consider doing that is if I suddenly became obsessed with running a dozen virtual machines simultaneously[6].
Next in line isn't exactly a component I need to buy, but I've been wanting to upgrade for a long time now and I figure a wishlist is the best place to do it. Ever since I saw an article on Gizmodo[7] about the new NVIDIA GeForce GTX 460 I've pretty much been set on that specific chipset. It was pretty easy to select a brand since they're all exactly the same price at this point and I only want the 768MB model. I've been wanting to do some CUDA development so here's my chance.
EVGA NVIDIA GeForce GTX 460 (Fermi) 768MB
Model #: 768-P3-1360-TR
Item #: N82E16814130562
Price: $199.99
Another component I consider to be core, even though it isn't necessarily a core component, is the storage device used for the OS. In the past I've strictly used HDD's for my desktop. But since I installed a Patriot 32GB SSD in my laptop I've fallen in love with SSD's for the OS/Programs drive. You might hear people moan and complain about SSD's being disproportionately priced for their capacity. Well I've got news for you: you don't buy SSD's for capacity, you buy them for speed. Anyone reasonably knowledgeable about computer components and their functionality would know that. I'm not interested in price per GB as quite a lot of people might be, at least not for SSD's[8]. I'm a lot more interested in price per MB/s of sequential read and write. The particular disk that won in this case is one of the new Western Digital SSD's.
Western Digital SiliconEdge Blue 128GB SSD MLC 220/170 SR/SW
Model #: SSC-D0128SC-2100
Item #: N82E16820250002
Price: $199.99
In case the title of that product doesn't make sense the drive is 220MB/s Sequential Read and 170MB/s Sequential Write.
Everything from this point on I consider to be secondary components as they don't directly do any computation or data transfer/storage.
For this build I don't strictly need a new case, but I've added one to the list anyway since my current case is due for an upgrade, especially in regards to aesthetics. I've been ogling this particular case for quite a while now, at least since it replaced its predecessor. This case won by a long shot in aesthetics and functionality.
Antec P183 Black Aluminum/Steel ATX-Mid
Model #: P183
Item #: N82E16811129061
Price: $179.99 IMIR -$25.00: $154.99
The next item due for an upgrade was actually necessary considering the major increase in power needs for the core components. I've always hated shopping for power supplies because there are far more factors to consider when it comes to selecting one that matches your needs and is of reasonable build quality. If you don't have a decent power supply you may as well just give up. In this case I stuck as close as possible to the power supply I have now. I was only really interested in making sure that there were enough PCI-e power connectors since my current PSU has none. I let the reviews do the majority of selecting for me in this case.
Corsair 650W (ATX|EPS)12V
Model #: CMPSU-650TX
Item #: N82E16817139005
Price: $119.99 IMIR -$30.00: $89.99
This power supply most closely matched the one I have now: it's simple, doesn't have too many "certifications" and marketing nonsense tacked onto the name, and the cables are sheathed in black mesh[9].
Now I don't normally bother with purchasing a 3rd party heatsink/cooling system for my CPU, but in this case I had heard mention of a self-contained water cooling system with radiator, pump and CPU waterblock from Corsair. So I checked it out and I am impressed. Since it is self-contained it removes a lot of the frustration of dealing with reservoirs and replacing the coolant on a regular basis.
Corsair H50 CPU Cooler
Model #: CWCH50-1
Item #: N82E16835181010
Price: $74.99
The last item in the list is more for interior neatness and organization. I've always hated leaving components inside the case without a proper fastening. In this situation the SSD I selected[10] is 2.5" form factor, suitable for notebooks and less suitable for desktops. So I looked around for a set of 2.5" to 3.5" brackets to secure the drive in one of the HDD bays.
iStarUSA 2.5" to 3.5" HDD Bracket
Model #: DIY-RP-HDD2.5
Item #: N82E16816215157
Price: $5.99
The subtotal for the build, excluding shipping and including all instant and mail-in rebates, comes out to $1330.91. Pretty good, wouldn't you say? That buys a decently beefy workstation that would likely last me another 5-6 years before upgrading again. I'm currently in the 5th year since a major overhaul of my current system, an Intel Core 2 Duo based rig.
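The subtotal is easy to double-check. The sketch below adds up the after-rebate prices listed in this post, using Decimal so the cents stay exact; the part labels are shorthand chosen for the example.

```python
from decimal import Decimal

# After-rebate prices as listed in this post.
PARTS = {
    "Intel Core i7-930": "289.99",
    "EVGA E758-TR motherboard": "229.99",
    "G.SKILL 3GB DDR3 1333": "84.99",
    "EVGA GeForce GTX 460 768MB": "199.99",
    "WD SiliconEdge Blue 128GB": "199.99",
    "Antec P183 case": "154.99",
    "Corsair 650W power supply": "89.99",
    "Corsair H50 cooler": "74.99",
    "iStarUSA 2.5-to-3.5 bracket": "5.99",
}

# Parse each price as an exact decimal and add them up.
subtotal = sum(Decimal(price) for price in PARTS.values())
print(subtotal)
```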
- I do have an interview coming up so wish me luck. [↩]
- Again, excluding SSD from this list of parts I don't intend to buy. [↩]
- Always on Newegg.com, they're the de facto standard in online computer components. [↩]
- DDR3 SDRAM PC10666 [↩]
- And will probably never be compatible for that matter. [↩]
- Which I won't, so I won't. [↩]
- I think. [↩]
- The only component I consider price per GB on is standard HDD's. [↩]
- Which lends nicely to aesthetics should I ever decide to show someone my desktop's innards. [↩]
- Like all SSD's. [↩]
Open-Source Print Server
One of the things I had always wanted to play with until I moved was a wireless router that was "deluxe" enough to have a USB port installed.
Most of the time these sorts of routers expect that you're just going to plug an external hard drive into them and pretend it's a full-fledged NAS, because they typically manage to squeeze CIFS into the router's firmware, and for the most part this is good enough for the most basic of NAS needs. Except they tend to forget that throughput with such a simple processor is usually miserable. I've never actually tested any of this myself, but all of the forums I've read on the subject seem to report a throughput ceiling of around 5-6MB/s, which is pretty bad.
The other purpose for the USB port, and the one I bet people would get more use out of, is the ability to use your router as a simple print server. The first reason this is a lot more practical is that nearly all consumer printers these days use USB for their interface. This means that essentially any printer that can be made to work with your current computer over USB can also be put behind a print server without too much trouble. The other reason is the growing number of laptops you'll find in any given household. Except for the minority of people who do enough computer work that owning a desktop is a necessity, most houses will mostly contain laptops, and all of them will likely communicate over WiFi.
Every time you need to print something from your laptop you'll have to take it back to wherever you've parked your printer and plug it in. But you'll also need to make sure that you plug it back into the same USB port you installed it with in the first place, or your computer will act as though it has never seen the printer before: "Duhhh... new printer! I'll look for drivers for your awesome new printer."
Now you see where the real hassle is. But this is the point at which a lot of people go the wrong direction. The first thought a lot of people have[1] is that they'll just solve this problem by throwing money at it which usually means shopping for and buying a printer with network capabilities built in.
The main problem with this is that network attached printers are expensive, quite a lot more expensive than their non-network-aware brethren. The secondary problem is that the vast majority of home networks rely on DHCP[2], which hands out IP addresses as devices connect to the network. This is a problem because the main method for connecting to a network attached printer involves knowing the printer's IP address. Every time the lease is up for your printer's IP address there is the possibility that the printer gets a new one, and then the computers pointed at the old address can no longer find it. So unless you are tech-savvy enough to set up static DHCP leases this will cause problems.
The next option for a lot of people is to shoehorn a round peg into a square hole by buying a print server to make their current printer network friendly. See the above problem for why this isn't an optimal solution.
In comes the router with a USB port. I recently purchased a Netgear WNR3500L[3] from newegg. If you're interested in my adventurous experience with installing Tomato on it see my previous[4] post about that.
Now you're probably wondering how on earth a router with a USB port is any better than a print server or a network attached printer. The reason it is better is that in typical household networks the router will have the same IP address no matter how you've got DHCP configured. For the most part your router lives at 192.168.1.1 or 192.168.0.1 and this will for most situations never change.
So given a router with a USB port and proper firmware to allow for print serving you can host your printer on your router which will almost always have the same IP address and you can do this all with one device instead of having to buy a separate device.
In my situation I got lucky. I bought the router originally only intending to install dd-wrt[5] on it. Later I found Tomato[6] which looked like it would suit my needs a lot better than dd-wrt would. Except there was some initial stupidity on my part and eventually I got that all sorted out by installing a fork[7] of the Tomato project which added USB support to Tomato.
For now the TomatoUSB fork only supports Broadcom based routers, like the original firmware, but adds support for a few others which have USB ports. This is where I got lucky: I had never checked before I bought my router to see if it would be supported, since I originally only intended on installing dd-wrt on it, and it just happened to be supported.
Eventually I got around to unpacking my printer and decided to give it a try. While watching the USB section of the web interface of my router, which at this point was running Tomato[8], I plugged my printer's USB cable into my router. About 3 seconds later[9] the printer showed up and the router began serving it using raw data on port 9100 and with LPR[10] queue lp0.
So that was easy... a little too easy. Lo and behold, it was just that easy. All that was left to do was add the printer using a TCP/IP port with 192.168.1.1 as the address and use the driver that I had previously installed to use the printer via USB. That was pretty much all I had to do; it worked exactly as it was meant to the first time I tried it.
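For the curious, "raw data on port 9100" means the print server simply forwards whatever bytes arrive on that TCP port to the printer. Below is a minimal Python sketch of that idea. The function name is hypothetical, the default address is the router address mentioned above, and the payload must already be in a language the printer understands (plain text only works on printers that accept it), so treat this as an illustration and not as a replacement for the normal driver setup.

```python
import socket


def send_raw(data, host="192.168.1.1", port=9100, timeout=10.0):
    """Open a TCP connection to a raw print port and write the bytes as-is.

    Returns the number of bytes handed to the socket.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(data)
    return len(data)


# Example call (needs a real printer behind the router, so it is left
# commented out): one line of text followed by a form feed.
# send_raw(b"Hello from the router print server\r\n\f")
```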
- If they're tech-savvy enough to realize that this is possible. [↩]
- Dynamic Host Configuration Protocol [↩]
- Netgear WNR3500L [↩]
- 3rd Party Router Firmware [↩]
- http://www.dd-wrt.com/ [↩]
- Tomato Firmware [↩]
- http://tomatousb.org/ [↩]
- Which has in my opinion a much more beautiful interface than dd-wrt does. [↩]
- The default refresh time of pages that have dynamic content on Tomato. [↩]
- Line Printer Remote [↩]
3rd Party Router Firmware
Until Monday I'm without any kind of proper internet connection which means until this point I've just been using my phone to browse the internet and chat with friends, which is frustrating to say the least. This prompted me to do a little more research on 3rd party firmware for wireless routers.
The first project that comes to mind is obviously DD-WRT[1] the most popular and probably the most powerful of them all right out of the box. Before I moved here I researched and purchased a new router to use at my new place. I stumbled upon a Netgear router which seemed to match all the features I needed at a reasonable price. I've never really been a huge fan of Netgear routers but I checked that it was compatible at least with DD-WRT before I bought it. The router I'm talking about is a Netgear WNR3500L[2] which includes:
- 802.11n WiFi
- 4x 10/100/1000 Ethernet ports
- 1x USB 2.0 port
I think grand total it was ~$90[3]. Anyway first thing I did was install DD-WRT which is standard practice for me. Ran exactly as intended except when I tried to set it up to act as a wifi client which failed miserably, I never did figure out how to make it do what I wanted. Everything else worked as intended. But I recently discovered a new firmware I wanted to try, which was Tomato[4] another open-source project like DD-WRT.
Tomato is essentially a watered down version of DD-WRT. I think the only useful feature it's missing that DD-WRT has is virtual wifi interfaces, but that's not such a big deal. On the other hand though Tomato has the most polished bandwidth monitoring features of any other project I've ever seen or used. DD-WRT has essentially the same feature but it's much weaker and not nearly as thought out and well designed as Tomato's is.
My main gripe with Tomato is its lack of community support. DD-WRT is so popular that just about any router you can get your hands on has a forum post somewhere about someone's woes with installing something on it and getting it working the way they needed it. Tomato isn't quite the same way. Also it appears that Tomato mostly only supports Broadcom chipsets, which is what my new router has.
Well anyway I just decided to download the latest version and try it out. Bad idea. I hadn't really considered that putting firmware on it that doesn't support USB would brick it. Figured USB just wouldn't work. WRONG. Got the firmware uploaded and reset it: nothing. Nothing at all. So first thing I did was look up instructions for uploading a new firmware (one that I knew worked) using tftp. No luck; the router responds to pings for about 2 seconds immediately after booting but then ceases to respond. This was a good sign at least, since it means some basic features were still working properly. Also discovered that because Windows 7 implements CTCP instead of a simpler TCP protocol this breaks most ability to upload new firmware via tftp. So I downloaded an atftp.deb for my Linux box and that didn't work either.
Eventually I stumbled upon an article about using a USB-TTL cable to unbrick the router. This article was mostly useless because all they were doing was using a USB serial adapter and dissecting the cable to work with the serial connection on the routers board, which I could just as easily have done with my arduino. But hidden deep in the comments was a far simpler trick than that. Only thing I needed to buy was a torx screw driver set to get the thing open. I looked up the chip used for storing settings and found that there were two pins on it that could be shorted to erase nvram (the structure responsible for storing all the router's settings).
So I busted open the router and proceeded to power on the router while shorting the two pins. No effect. I tried powering it on and then shorting the pins. No effect. Finally I tried shorting the pins exactly when the router responded to pings during that 1-2 second window. Success!
Then I fired up tftp and uploaded a modified version of Tomato that supports USB and voilà! Tomato works properly on my router now, as do the wifi client mode and wifi bridge mode. Anyway that occupied several hours of tinkering where I would have otherwise been bored out of my mind.
Analog Monitor Calibration
Many of you are probably already aware that LCD monitors typically come with an analog VGA port[1]. But LCD monitors work by addressing individual pixels, where CRT's use a scanning electron beam to illuminate pixels. Because of this, LCD monitors usually come with an "Auto" or "Auto-calibrate" button which aligns the analog image to the displayable area of the monitor. But doing this with just any image won't always give you the sharpest alignment and calibration.
The method I've used for quite some time is a program that generates a black and white cross-hatch that basically makes a checkerboard of every single pixel on the screen. This makes it very easy for the auto-calibration feature of the monitor to align the image almost exactly to the physical pixels. The program I use to do this is called lcdtest[2]. There are even Windows binaries[3] that can be found by digging through the page; I've provided a link in the footnotes to the page I usually get them from.
The program starts and immediately draws the test pattern on the entire screen. You'll want to press w to change the color to white on black. Then press x to change the pattern to crosshatch. Then use the - key to zoom out until it looks nearly grey. This is a good point at which to see how well your monitor is already calibrated. If it looks very grey this is a sign that the scanning frequencies may not be exact. You might also see waves where some areas are clear and others are blurry; these will likely go away once we're done. At this point you should use the auto-calibration feature of your monitor to calibrate the display using the crosshatch being displayed.
For those of you running more than one monitor at a time, this program will only display the pattern on the primary monitor. But never fear, there is an easy fix. Using paint or paint.net or your favorite image editing program you can create an image of the test pattern and set it as the wallpaper for the other monitors and calibrate them as well. Using Alt+PrtScn to capture only the current application's area of the screen (the test pattern) and pasting into a new image you can then save the image as a png[4] preferably as it will compress it losslessly.
Once you've saved the image, simply set it as the wallpaper of your system (use tiling if your other monitor is not the same dimensions as your primary monitor) and then run the auto-calibration process on the remaining monitors. You'll find that this will significantly improve the quality of the image displayed on your monitors if they were improperly calibrated to begin with. But of course you could completely avoid this by switching to DVI[5].
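If taking a screenshot of lcdtest is inconvenient, a similar one-pixel checkerboard can be generated directly. The following is a minimal sketch using only the Python standard library. It is not how lcdtest draws its pattern; the function names, the 1920x1080 default and the output filename are all assumptions made for the example.

```python
import struct
import zlib


def _chunk(tag, payload):
    """Build one PNG chunk: length, tag, payload, CRC over tag + payload."""
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def checkerboard_png(width=1920, height=1080):
    """Return PNG bytes for an 8-bit grayscale image of alternating pixels."""
    even = bytes(255 * (x % 2) for x in range(width))        # row starts black
    odd = bytes(255 * ((x + 1) % 2) for x in range(width))   # row starts white
    # Every scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + (odd if y % 2 else even) for y in range(height))
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # default compression, default filter, no interlacing.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", header)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


def write_pattern(path="checkerboard.png", width=1920, height=1080):
    """Write the pattern to disk so it can be set as a wallpaper."""
    with open(path, "wb") as handle:
        handle.write(checkerboard_png(width, height))
```

Calling write_pattern() produces the file. Because the pattern repeats every two pixels, an image with even width and height also tiles cleanly across a larger monitor.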
- [W: VGA_connector] [↩]
- http://www.brouhaha.com/~eric/software/lcdtest/ [↩]
- http://code.google.com/p/lcdtest-win32/ [↩]
- [W: Portable_Network_Graphics] [↩]
- [W: Digital_Visual_Interface] [↩]
DVD Ripping Made Easy
Reading through my normal list of RSS feeds I stumbled upon a post claiming to have found some software that greatly simplifies the process of decrypting and ripping DVD's. And surprisingly for the most part they were right.
The software in question is called MakeMKV[1]. The software seems to do a decent job of both decrypting and ripping DVD's. Mind you this software is not meant for transcoding DVD video into a different format.
The software functions much like most DVD decrypting software does. DVDFab[2] and DVD Decryptor[3] provide the same basic functions as MakeMKV with one major exception. Where both DVDFab Decryptor and DVD Decryptor let you decrypt a DVD and dump its contents to a directory, MakeMKV muxes all of the video, audio and subtitle streams into a single container instead of leaving you with several VOB files from the entire DVD. Each title on the disk is muxed into its own container, which really simplifies the process when backing up TV seasons from DVD to your computer since each episode is typically its own title.
All that is left to do once you've ripped to a Matroska[4] container is either leave it as is, since it's a perfectly fine container and format (albeit nearly the same size as the original content), or transcode it into your favorite format/container. Typically when ripping DVD's I use Handbrake and encode the DVD using the High Profile preset, which performs decombing and detelecine. The High Profile preset also uses constant quality encoding, which seems to be the preferred method for encoding these days since it provides the best perceptible quality vs. compression ratio.
Now the first thing that bothered me about MakeMKV is the fact that the site specifically states that it is free for the beta. But when you read closer it does get better:
Functionality to open DVD discs is free and will always stay free.
So that's promising. At least if you're only interested in ripping DVD's and not Blu-ray or HD DVD then you'll be golden with this software.
Overall I've decided to just stick with MakeMKV for all my decrypting/backup needs from now on since it seems to do as good a job as, or better than, any of the other DVD rippers on the market at the moment.

