# Newegg’s JSON API

## Update

I’ve posted more complete documentation to a GitHub repository: NeweggMobileAPI

## The Goods

For the longest time I’ve wanted access to Newegg’s product list. For me they’ve been one of the better and more structured websites for buying computer hardware. So naturally they’re usually my first choice when it comes to finding a good deal on a particular piece of hardware. They’re also rather useful for seeing what’s out there since their product catalog is fairly complete.

A while back I started wanting to sort through items to heuristically pick the best deal based on the features Newegg generally provides for each item. This method works pretty well on SSDs and system memory. But until a recent discovery I was limited to scraping Newegg’s website to get any kind of information from them. If you’ve ever tried this sort of thing you know it is messy and generally a bad idea: every time Newegg changes the structure of their website, or any minute detail of it, your scraping script will almost always break.

The discovery came in the form of a mobile application for Android[1]. The mobile app lets you browse their website in a clean and fast manner. What got me thinking is that, unlike some other mobile applications out there that are just wrappers around the mobile version of a website, this one operates directly through the native GUI. Now this is where it got interesting. I knew that if Newegg had written the app to use the native GUI then they had to be providing the data to it somehow, and I knew it had to be more structured than the HTML I’d been scraping[2]. You have no idea how happy I was to discover that I was right.

The first thing I did was connect my Droid 2 Global to my home network via WiFi in order to sniff some of the traffic going to and from the mobile app. This was accomplished by mounting a CIFS share from my Windows 7 desktop on my router, which runs Tomato-based firmware. The share had a binary for tcpdump, which I then used to sniff for traffic originating from or going to my phone’s IP address. After setting this up and performing all of the basic operations I would need in order to “reverse engineer” the data source, I got to work on filtering the important bits.

In Wireshark I immediately discovered that they had a sub-domain they were using for these operations. All of the web requests that weren’t for images or for customer metrics and tracking went to this host:

http://www.ows.newegg.com/

Because this API is structured more or less like navigating their site, but uses different identifiers, I decided to start by writing a query builder. Its purpose was to let me browse to the particular category I was interested in analyzing and filter it down to just a few simple requirements to simplify the analysis.

The first major entry point in the process of browsing to what you’re interested in pulling is:

This takes no parameters and provides the main menu:

    [
        {
            "StoreDepa": "ComputerHardware",
            "StoreID": 1,
            "ShowSeeAllDeals": true,
            "Title": "Computer Hardware"
        },
        {
            "StoreDepa": "PCNotebook",
            "StoreID": 3,
            "ShowSeeAllDeals": true,
            "Title": "PCs & Laptops"
        },
        {
            "StoreDepa": "Electronics",
            "StoreID": 10,
            "ShowSeeAllDeals": true,
            "Title": "Electronics"
        },
        ...
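
As a minimal sketch of how the main menu could be consumed, the snippet below parses the three stores shown above and maps each title to its StoreID. The helper name `stores_by_title` is my own and not part of the API; the sample data is copied from the truncated listing.

```python
import json

# Sample copied from the truncated main-menu listing above.
MENU_JSON = """
[
    {"StoreDepa": "ComputerHardware", "StoreID": 1,
     "ShowSeeAllDeals": true, "Title": "Computer Hardware"},
    {"StoreDepa": "PCNotebook", "StoreID": 3,
     "ShowSeeAllDeals": true, "Title": "PCs & Laptops"},
    {"StoreDepa": "Electronics", "StoreID": 10,
     "ShowSeeAllDeals": true, "Title": "Electronics"}
]
"""


def stores_by_title(menu):
    """Map each store's display title to its StoreID (hypothetical helper)."""
    return {store["Title"]: store["StoreID"] for store in menu}


stores = stores_by_title(json.loads(MENU_JSON))
hardware_store_id = stores["Computer Hardware"]
```

The StoreID looked up this way is what the category query in the next step expects.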

Once you’ve selected a store to browse, the next URI is:

http://www.ows.newegg.com/Stores.egg/Categories/{StoreID}

The only parameter it takes is StoreID, which you’ll find in the first query. This will return all of the categories within a store. I haven’t explored this very much, as I’m only really interested in browsing system memory and SSDs. Using the Computer Hardware store the output is as follows:

    [
        {
            "Description": "Backup Devices & Media",
            "StoreID": 1,
            "NodeId": 6642,
            "ShowSeeAllDeals": true,
            "CategoryType": 0,
            "CategoryID": 2
        },
        {
            "Description": "Barebone / Mini Computers",
            "StoreID": 1,
            "NodeId": 6668,
            "ShowSeeAllDeals": true,
            "CategoryType": 0,
            "CategoryID": 3
        },
        {
            "Description": "CD / DVD Burners & Media",
            "StoreID": 1,
            "NodeId": 6646,
            "ShowSeeAllDeals": true,
            "CategoryType": 0,
            "CategoryID": 10
        },
        ...

StoreID is echoed back from the parameters of the request. I’m not exactly sure how to describe the purpose of NodeId, but it appears to be a distinguishing feature of a category or subcategory. CategoryID is used for filtering results down to a specific category and can be either a root category or a subcategory. CategoryType determines whether CategoryID is a root category or contains subcategories: a value of 1 indicates a root category, while the 0 seen in the listing above goes with categories that have subcategories to navigate through.
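
Here is a rough sketch of the category step in Python using only the standard library. The function names are mine, and `fetch_categories` assumes the host still answers plain GET requests with JSON, which I have not verified; it is defined but not called here.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://www.ows.newegg.com"


def categories_url(store_id):
    """Build the category-listing URL for a store."""
    return f"{BASE_URL}/Stores.egg/Categories/{store_id}"


def fetch_categories(store_id, timeout=10):
    """GET and decode the category list (network call, assumption: no
    special headers are required)."""
    with urlopen(categories_url(store_id), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def find_category(categories, description):
    """Return the first category whose Description matches, else None."""
    for category in categories:
        if category["Description"] == description:
            return category
    return None


# Two entries copied from the Computer Hardware listing above.
sample_categories = [
    {"Description": "Backup Devices & Media", "StoreID": 1, "NodeId": 6642,
     "ShowSeeAllDeals": True, "CategoryType": 0, "CategoryID": 2},
    {"Description": "Barebone / Mini Computers", "StoreID": 1, "NodeId": 6668,
     "ShowSeeAllDeals": True, "CategoryType": 0, "CategoryID": 3},
]
barebone = find_category(sample_categories, "Barebone / Mini Computers")
```

Matching on Description is fragile if Newegg renames a category, so in practice you may prefer to store the CategoryID and NodeId once you have found them.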

Now depending on CategoryType you either move straight to the search query or onto a navigation query. The navigation query is used if there are subcategories:

This query takes StoreID, CategoryID and NodeId, which you can get from the category listing of a particular store. It will return a subcategory list. Below is the subcategory listing for the memory category.

    [
        {
            "Description": "Desktop Memory",
            "StoreID": 1,
            "NodeId": 7611,
            "ShowSeeAllDeals": false,
            "CategoryType": 1,
            "CategoryID": 147
        },
        {
            "Description": "Flash Memory",
            "StoreID": 1,
            "NodeId": 8038,
            "ShowSeeAllDeals": false,
            "CategoryType": 1,
            "CategoryID": 68
        },
        {
            "Description": "Laptop Memory",
            "StoreID": 1,
            "NodeId": 7609,
            "ShowSeeAllDeals": false,
            "CategoryType": 1,
            "CategoryID": 381
        },
        ...
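
The branching on CategoryType can be sketched like this. The function name and the "search"/"navigation" labels are my own shorthand; the only rule taken from the listings is that a CategoryType of 1 means you can go straight to the search query.

```python
def next_query(category):
    """Pick the next request for a category entry.

    CategoryType 1 goes straight to the search query; anything else
    is treated as having subcategories to navigate first.
    """
    return "search" if category["CategoryType"] == 1 else "navigation"


# Entries copied from the listings above.
backup_devices = {"Description": "Backup Devices & Media", "StoreID": 1,
                  "NodeId": 6642, "CategoryType": 0, "CategoryID": 2}
desktop_memory = {"Description": "Desktop Memory", "StoreID": 1,
                  "NodeId": 7611, "CategoryType": 1, "CategoryID": 147}

steps = [next_query(backup_devices), next_query(desktop_memory)]
```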

From here you will go to the search query[3]. At this point it gets a little tricky: the parameters for the query are no longer sent via GET but via POST[4], so you’ll need to make the search query programmatically. Given a category, store and node, the search query returns quite a lot. The first thing in the response is the list of search filtering parameters, which let you limit the products shown in the listing.

The server returns a 404 unless data is posted with the request. If you really wanted to, you could send an empty dictionary, which would query Newegg’s entire product list. Any of the query options can be omitted; integer values are omitted by setting them to -1.

The parameters you should concern yourself with are as follows along with the URL the data should be posted in JSON format to:

    data = {
        "SubCategoryId": 147,
        "NValue": "",
        "StoreDepaId": 1,
        "NodeId": 7611,
        "BrandId": -1,
        "PageNumber": 1,
        "CategoryId": 17
    }

NValue is a space-separated list of NValues from the search parameters. Mind you, you cannot filter against more than one item within a single group of search filters. For example, in system memory you can’t select DDR3 1333 (PC3 10600), DDR3 1333 (PC3 10660) and DDR3 1333 (PC3 10666) together; the query will return an unsuccessful search result. The rest of the parameters are fairly self-explanatory.
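
Below is a hedged sketch of building and posting the search payload with the standard library. `build_search_payload` and `post_search` are names I made up, the search URL is deliberately left as an argument for the caller to supply, and the `Content-Type` header is an assumption rather than something captured from the app’s traffic.

```python
import json
from urllib.request import Request, urlopen


def build_search_payload(store_id, category_id, subcategory_id, node_id,
                         nvalues=(), brand_id=-1, page=1):
    """Assemble the search dictionary; -1 marks an omitted integer option."""
    return {
        "SubCategoryId": subcategory_id,
        "NValue": " ".join(str(value) for value in nvalues),
        "StoreDepaId": store_id,
        "NodeId": node_id,
        "BrandId": brand_id,
        "PageNumber": page,
        "CategoryId": category_id,
    }


def post_search(search_url, payload, timeout=10):
    """POST the payload as JSON to a caller-supplied URL (network call)."""
    body = json.dumps(payload).encode("utf-8")
    request = Request(search_url, data=body,
                      headers={"Content-Type": "application/json"})  # assumed
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Reproduces the example dictionary shown above.
payload = build_search_payload(store_id=1, category_id=17,
                               subcategory_id=147, node_id=7611)
```

Passing `nvalues` as a sequence keeps the space-separated formatting in one place; remember to pick at most one value per filter group.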

The result returned will contain the following elements: RelatedLinkList, CoremetricsInfo, NavigationContentList, PaginationInfo and ProductListItems. CoremetricsInfo and RelatedLinkList can usually be ignored. The elements we’re interested in are NavigationContentList, a list of search parameters/filters you can apply to the search; PaginationInfo, which describes how many elements were returned, what page we’re on and how many elements there are per page; and last but not least ProductListItems, which provides a list of the products returned by the query along with some basic listing info for each one.

Below is a portion of the NavigationContentList:

    {
        "NavigationContentList": [
            {
                "NavigationItemList": [
                    {
                        "SubCategoryId": -1,
                        "Description": "Free Shipping",
                        "StoreDepaId": 94,
                        "NValue": "100007611 600006050 600052012 4808",
                        "BrandId": -1,
                        "StoreType": 4,
                        "ItemCount": 194,
                        "CategoryId": -1,
                        "ElementValue": "4808"
                    },
                    {
                        "SubCategoryId": -1,
                        "Description": "Top Sellers",
                        "StoreDepaId": -1,
                        "NValue": "100007611 600006050 600052012 4802",
                        "BrandId": -1,
                        "StoreType": -1,
                        "ItemCount": 39,
                        "CategoryId": -1,
                        "ElementValue": "4802"
                    },
                    ...

This section will also contain a group name:

    ...
                "TitleItem": {
                    "SubCategoryId": -1,
                    "Description": "Useful Links",
                    "StoreDepaId": -1,
                    "NValue": "4800",
                    "BrandId": -1,
                    "StoreType": -2,
                    "ItemCount": 0,
                    "CategoryId": -1,
                    "ElementValue": "4800"
                }
                ...
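
One way to make the filters easier to work with is to index them by group name. This sketch assumes each entry of NavigationContentList holds a TitleItem next to its NavigationItemList, which is how I read the two excerpts above; `filters_by_group` is a hypothetical helper and the sample is trimmed to the fields it uses.

```python
def filters_by_group(navigation_content_list):
    """Map group name -> {filter description: NValue string}."""
    groups = {}
    for group in navigation_content_list:
        title = group["TitleItem"]["Description"]
        groups[title] = {
            item["Description"]: item["NValue"]
            for item in group["NavigationItemList"]
        }
    return groups


# Trimmed sample assembled from the two excerpts above (assumed to be
# parts of the same group).
sample_navigation = [
    {
        "NavigationItemList": [
            {"Description": "Free Shipping", "ItemCount": 194,
             "NValue": "100007611 600006050 600052012 4808"},
            {"Description": "Top Sellers", "ItemCount": 39,
             "NValue": "100007611 600006050 600052012 4802"},
        ],
        "TitleItem": {"Description": "Useful Links", "NValue": "4800"},
    }
]

filters = filters_by_group(sample_navigation)
free_shipping_nvalue = filters["Useful Links"]["Free Shipping"]
```

The NValue found this way can be placed in the NValue field of the next search request.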

The PaginationInfo and ProductListItems elements will look like the following:

    ...
        "PaginationInfo": {
            "TotalCount": 233,
            "PageNumber": 1,
            "PageSize": 20
        },
        "ProductListItems": [
            {
                "SellerId": null,
                "ItemOwnerType": 0,
                "Title": "Crucial Ballistix 4GB (2 x 2GB) 240-Pin DDR3 SDRAM DDR3 2133 (PC3 17000) Desktop Memory with Thermal Sensor Model BL2KIT25664FN2139",
                "ItemGroupID": 0,
                "ReviewSummary": {
                    "Rating": 5,
                    "TotalReviews": "[1]"
                },
                "IsCellPhoneItem": false,
                "Discount": null,
                "FinalPrice": "$104.99",
                "ItemNumber": "20-148-372",
                "MappingFinalPrice": "$104.99",
                "FreeShippingFlag": true,
                "OriginalPrice": "$104.99",
                "IsComboBundle": false,
                "MailInRebateText": null,
                "ProductStockType": 0,
                "Model": "BL2KIT25664FN2139",
                "ShowOriginalPrice": false,
                "Image": {
                    "FullPath": "http://images17.newegg.com/is/image/newegg/20-148-372-TS?$S125W$",
                    "SmallImagePath": null,
                    "ThumbnailImagePath": null,
                    "Title": null
                },
                "SellerName": null,
                "ParentItem": null
            },
            ...
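
Two small helpers make this part of the response easier to use: one works out how many pages a result set spans, and one turns the price strings into numbers. Both names are my own, and the price parser assumes prices always look like "$104.99", possibly with thousands separators.

```python
import math
from decimal import Decimal


def page_count(pagination):
    """Number of pages needed to cover TotalCount items."""
    return math.ceil(pagination["TotalCount"] / pagination["PageSize"])


def parse_price(text):
    """Convert a price string such as "$104.99" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", ""))


# Values copied from the response excerpt above.
pagination = {"TotalCount": 233, "PageNumber": 1, "PageSize": 20}
pages = page_count(pagination)
price = parse_price("$104.99")
```

With the sample above, 233 items at 20 per page come to 12 pages, so a full crawl of this category means requesting PageNumber 1 through 12.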

At this point you might be wondering: what good will all this do me if I can’t get specifications on an item? Well, you can, and here’s how. In each ProductListItems element you’ll find an ItemNumber. This is essentially the primary key for each product within this interface to Newegg’s product list. Using the following URL you can obtain the full specifications for any given item using its ItemNumber:

http://www.ows.newegg.com/Products.egg/{ItemNumber}/Specification

    {
        "SpecificationGroupList": [
            {
                "GroupName": "Model",
                "SpecificationPairList": [
                    {
                        "Value": "Crucial",
                        "Key": "Brand"
                    },
                    {
                        "Value": "Ballistix",
                        "Key": "Series"
                    },
                    {
                        "Value": "BL2KIT25664FN2139",
                        "Key": "Model"
                    },
                    {
                        "Value": "240-Pin DDR3 SDRAM",
                        "Key": "Type"
                    }
                ]
            },
            {
                "GroupName": "Tech Spec",
                "SpecificationPairList": [
                    {
                        "Value": "4GB (2 x 2GB)",
                        "Key": "Capacity"
                    },
                    {
                        "Value": "DDR3 2133 (PC3 17000)",
                        "Key": "Speed"
                    },
                    {
                        "Value": "9",
                        "Key": "Cas Latency"
                    },
                    {
                        "Value": "9-10-9-24",
                        "Key": "Timing"
                    },
                    {
                        "Value": "1.65V",
                        "Key": "Voltage"
                    },
                    {
                        "Value": "No",
                        "Key": "ECC"
                    },
                    {
                        "Value": "Unbuffered",
                        "Key": "Buffered/Registered"
                    },
                    {
                        "Value": "Dual Channel Kit",
                        "Key": "Multi-channel Kit"
                    }
                ]
            },
            {
                "GroupName": "Manufacturer Warranty",
                "SpecificationPairList": [
                    {
                        "Value": "Lifetime limited",
                        "Key": "Parts"
                    },
                    {
                        "Value": "Lifetime limited",
                        "Key": "Labor"
                    }
                ]
            }
        ],
        "NeweggItemNumber": "N82E16820148372",
        "Title": "Crucial Ballistix 4GB (2 x 2GB) 240-Pin DDR3 SDRAM DDR3 2133 (PC3 17000) Desktop Memory with Thermal Sensor Model BL2KIT25664FN2139"
    }

From this point on you can grab all of the features and specifications of any particular item you’re interested in. In the near future I’ll be writing a new post for both my memory and SSD analysis scripts using this interface.
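
For analysis it helps to flatten the specification response into a simple lookup. The sketch below keys values by group and key; `specification_url` and `flatten_specifications` are hypothetical names, and the sample is a trimmed copy of the response above. If a group repeats a key, the last value wins, so adjust this if you need to keep duplicates.

```python
BASE_URL = "http://www.ows.newegg.com"


def specification_url(item_number):
    """Build the specification URL for an ItemNumber."""
    return f"{BASE_URL}/Products.egg/{item_number}/Specification"


def flatten_specifications(spec):
    """Return {group name: {key: value}} from a specification response."""
    flat = {}
    for group in spec["SpecificationGroupList"]:
        pairs = flat.setdefault(group["GroupName"], {})
        for pair in group["SpecificationPairList"]:
            pairs[pair["Key"]] = pair["Value"]  # last duplicate wins
    return flat


# Trimmed copy of the specification response shown above.
sample_spec = {
    "SpecificationGroupList": [
        {"GroupName": "Model", "SpecificationPairList": [
            {"Value": "Crucial", "Key": "Brand"},
            {"Value": "BL2KIT25664FN2139", "Key": "Model"},
        ]},
        {"GroupName": "Tech Spec", "SpecificationPairList": [
            {"Value": "4GB (2 x 2GB)", "Key": "Capacity"},
            {"Value": "9", "Key": "Cas Latency"},
        ]},
    ],
    "NeweggItemNumber": "N82E16820148372",
}

specs = flatten_specifications(sample_spec)
capacity = specs["Tech Spec"]["Capacity"]
url = specification_url("20-148-372")
```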

The full code for my query builder is as follows, though you should note this was a quick script and is in no way complete or fully functional. As soon as it reached a usable point I moved on to the main point of this whole ordeal. You should also note that it requires CherryPy[5] and lxml[6]. The end result of this program is a query you can use to retrieve a list of products matching the options you’ve selected. This is mainly to simplify product list selection and to minimize the need to hardcode certain values, as Newegg has a tendency to change things around on a regular basis.

1. And iOS devices as well, I assume.
2. Because let’s face it, that would be stupid.
3. … or get to the search query by selecting a root category in the main category listing for a store.
4. At least this is the method used by the mobile app.
5. CherryPy: CherryPy is a pythonic, object-oriented HTTP framework.
6. lxml: A Pythonic binding for the C libraries libxml2 and libxslt.

• Erol Akarsu

Thanks for information.

I was able to write Java application also to fetch product data from newegg. Please let me know if you want to add it to this blog.

• Andy

If you don’t mind, can you please post a sample of your java code? Thanks.

• http://www.bemasher.net bemasher

When he provides some code I’ll be sure to post it (or link to it).

• Erol Akarsu

I have zipped all java code as eclipse project but can not attach to here. Can you mail me so I can reply with attachment?

Thanks

• Tim

Seriously. This is 2011. Put it as a Gist on github.com or something. There are about a million sites which let you post code. Even better make a real project and post it on github and not just as a gist.

• http://disruptiontheory.com Eric Wehrly

I wrote an app that covers a few other functions. It uses a good deal of the same code, just goes into different depth. You can easily recreate these functions in Java using the base I’ve provided. The code is available here:
https://github.com/DisruptionTheory/Eggscraper/tree/master/src/com/disruptiontheory/eggfetcher

• Tom

You are the man of the hour, but I get a syntax error with the else: statement on line 49. Is this something wrong on my end?

• http://www.bemasher.net bemasher

You may want to check that the whitespace is correct as copying and pasting from code blocks online will usually produce some issues with that. At some point I’ll post all of my code to a github repo so you could just check it out and avoid this problem altogether.

• Tom

Awesome. Great post. Thanks.

• http://www.indolering.com Zach Lym

Still not quite as good as a real api, item objects do not contain status (out of stock, deactivated, etc) and their key:value pairs still mix up presentation/structure:

Key:PCI Express 2.0 x16, Value:4 x PCI Express 2.0 x16\n\nSlot configurations:\nDual CrossFire etc
Key:PCI Express 2.0 x16, Value:7

Thank god for regular expressions…

• http://www.indolering.com Zach Lym

You should get a public method reference wiki going…

• http://www.bemasher.net bemasher

Not a bad idea. I’ll have to do some more traffic sniffing to get all of the protocol down. I only really documented stuff for getting product info.

• http://www.bemasher.net bemasher

Ah yes, something I’ve found very, very useful for cleaning up the data you get from it is Google Refine. But I’m still much happier with this than I ever was with scraping HTML. At least the structure of this data doesn’t change too much as they update the site.

• http://www.indolering.com Zach Lym

Does the catalog list deactivated items? Newegg also has XML sitemaps (broken down by category) that link to every product they have ever carried (SEO) -when you query a specific one it will show up.

Worried about hammering their servers and getting blocked. They do have an affiliate’s API, and I am thinking about a newegg price comparison service…

• http://www.bemasher.net bemasher

I haven’t noticed any. I imagine it doesn’t unless explicitly queried as this is just the json interface for the mobile application. No sense in showing the user items they can’t buy on the mobile app.

So far the largest query i’ve done was a little over 500 items and that’s retrieving 25 pages of items then specifications for each one.

I’m mostly using the results to create scoring functions for determining the best product within a price range for your needs. As it turns out this works very well for things like hard drives, SSD’s, system memory and flash memory.

• http://indolering.com Zach Lym

I specifically need specs for previous gen motherboards. Even if NewEgg has deactivated a mobo, that doesn’t mean there aren’t fresh ones in someone else’s inventory.

I need to know the specific number of PCI Express slots, which is very difficult to search for as there is no standard notation for the channels (even Newegg is bad at this). However, I can query against the NewEgg DB, find all matching mobos and then do the same analysis you are doing but against used equipment. I also want to factor in CPU costs…

-Zach

• http://rigset.com Trevor Senior

Thank you for this article – it’s perfect for what I need. I am writing a Java equivalent for use in my GWT project and so far it’s going well.

If I ever get the free time I’ll re-write it the Java equivalent separate from my program and link it here. Thanks again.

• http://www.albertbori.com Albert

I’m currently building a C# implementation for this, thanks to your hard work figuring out the API. I have hit the road block of not knowing how to do an item search using a text query.

Also, what kind of data goes in the “NValue” property of the advanced search? Is it a space-separated list of strings that you are searching for?

• http://edwardbetts.com/price_per_tb/internal_hdd/ Edward Betts

Thanks for figuring this out. I made lists of hard drives and SSDs sorted by price per TB:

http://edwardbetts.com/price_per_tb/internal_hdd/

• http://www.disruptiontheory.com Eric Wehrly

Do you know if doing a search by BrandID returns all of the items for that brand (if you do repeated searches for pagination)?
It seems like if I only specify page and brand, I get a list about the size of the largest Category in that brand, and I’m wondering if I’ll need to troll through the NavigationItemList data.

• Some Guy

I am using PHP, trying to update the price of about 400 products that I have their NewEgg ID’s stored in the Database.

Currently I use file_get_contents on http://www.ows.newegg.com/Products.egg/theNewEggID, then json_decode, and take the price out of there.

I always get a timeout / error at some point. ( like, after 100 products for example )

What do you think I’m doing wrong, or how can I get this to work properly?

• http://www.bemasher.net bemasher

How often are you running your program? Newegg might be blocking your requests if it looks malicious, which could be just because of update frequency.

• denz

Do you know what update frequency is required to not look malicious?

• jesterjunk

The following will retrieve user feedback if any exist.

http://www.ows.newegg.com/Products.egg/{ItemNumber}/reviews

• sean

this is free?

• David

can your script also get the bottom price for this item?

http://www.newegg.com/Product/Product.aspx?Item=N82E16889102667

where the retail price is only shown at the checkout page

• Aaron

This is really helpful, but I was wondering if by now Newegg has an official documented API for developers? A google search does not seem to turn up anything official.

• http://www.bemasher.net bemasher

This is highly unlikely for the same reason Digikey had and got rid of their API several years ago. Online retailers who provide an API make it really easy for competitors to automatically undercut their prices.

• http://disruptiontheory.com Eric Wehrly

Thanks to this article, I put together a script a little while ago to get product ratings across an entire manufacturer. It deals with a few other uses in the API not discussed in this article, and may be useful to someone looking for something other than pricing information.
I wrote up an article on it, here: http://disruptiontheory.com/new-fiddle-eggratings/ and posted the (Java) source code on GitHub, which is linked in the article.

• stepdragon

In any of your tinkering on Newegg, have you found any sources for the comments? I’m looking to pull and aggregate comment information for the sake of product research… The JSON API you listed looks promising, but I do not know how to discover anything other than what is spelled out in black and white in this list… In other words I can search for a product, or get product information, but you have nothing on comments. Are they available in this source? If so, where? If it’s possible to search the hierarchy of the API to figure these things out on my own, that would be great, but I don’t even know that. Any information would be awesome. Any and all searches on Newegg API seem to link back here.

• http://www.gooeypc.com Richard Fox

I believe you may have just made my life much easier. I’ve suffered through three changes at NewEgg. This last one was the last straw, so I sought this out – and found it. I may have to rebuild parts of my factory to parse the data into something I can use, but like an earlier poster said, thank God for Regular Expressions. Thank you. Thank you.

• Oversea

Thank you for your great article.

From your code, it looks that we can search products by UPC code, “IsUPCCodeSearch”.

So, do you know how to search by UPC code?

• bemasher

I haven’t investigated that yet. When I get some time I’ll try out the UPC scanning feature and see what sort of requests it makes.