User talk:Xerxes: Difference between revisions

→‎JALT01: Scraping info


So you're telling me that every digital Wii game has publicly available title information sorted in this exact consistent format across all regions? What kind of tools are you using for the data scraping? Are you comparing against a database (like ours) or doing it manually or what? I've never really trusted GameTDB, so being able to stop relying on them for IDs would be awesome if it can be done. - [[User:Xerxes|Xerxes]] ([[User talk:Xerxes|talk]]) 01:15, 5 October 2017 (CEST)
:Yep. Every digital Wii game has publicly available information. Even the ones that Nintendo's removed from the Wii Shop are still up there. I'm using a simple curl command with some basic assumptions. I'm assuming that System Codes and Region Codes are only uppercase letters, and that the two middle characters of a Wii Shop GameID can only be uppercase letters or digits. These assumptions were based on WiiBrew's title database and were mostly made to shrink the search space so I wouldn't be waiting around all day for the scrape to finish.
curl -f http://ccs.cdn.shop.wii.com/ccs/download/00010001{41,42,43,44,45,46,47,48,49,4A,4B,4C,4D,4E,4F,50,51,52,53,54,55,56,57,58,59,5A}{30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46,47,48,49,4A,4B,4C,4D,4E,4F,50,51,52,53,54,55,56,57,58,59,5A}{30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46,47,48,49,4A,4B,4C,4D,4E,4F,50,51,52,53,54,55,56,57,58,59,5A}{41,42,43,44,45,46,47,48,49,4A,4B,4C,4D,4E,4F,50,51,52,53,54,55,56,57,58,59,5A}/tmd --create-dirs --output ..\..\TMDs\00010001#1#2#3#4.tmd
This is the command I used. Because of the -f flag, it only saves a TMD file if there actually is one. That matters because otherwise it would save the 404 error HTML page for every wrong GameID, which would make verification a hassle. My scrape should be done in a couple of hours, so we'll be able to take full advantage of it. If I were to throw away all assumptions about which GameIDs are allowed, I'd end up with a search space of 4,294,967,296 items (every possible 4-byte value), which might take until 2019 to fully scrape and would be extremely unlikely to find anything more. - [[User:PowerKitten|PowerKitten]] ([[User talk:PowerKitten|talk]]) 01:38, 5 October 2017 (CEST)
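For anyone who wants to check the numbers, here is a minimal Python 3 sketch of the enumeration the curl globs above perform. It only builds the candidate title IDs and counts them; it does not touch the network. The names <code>LETTERS</code>, <code>ALNUM</code>, <code>to_title_id</code> and <code>title_ids</code> are made up for illustration and are not part of any existing tool.

```python
import string
from itertools import product

# Assumptions taken from the comment above, not from any official spec:
LETTERS = string.ascii_uppercase                # system and region codes: A-Z
ALNUM = string.digits + string.ascii_uppercase  # middle characters: 0-9 then A-Z
PREFIX = "00010001"                             # upper half of the title ID, as in the URL


def to_title_id(game_id):
    """Hex-encode a 4-character GameID the way it appears in the URL."""
    return PREFIX + game_id.encode("ascii").hex().upper()


def title_ids():
    """Yield (game_id, title_id) pairs in the same order as the curl globs."""
    for chars in product(LETTERS, ALNUM, ALNUM, LETTERS):
        game_id = "".join(chars)
        yield game_id, to_title_id(game_id)


restricted = len(LETTERS) * len(ALNUM) ** 2 * len(LETTERS)  # 26 * 36 * 36 * 26
unrestricted = 256 ** 4                                     # any 4 bytes

print(restricted)           # 876096
print(unrestricted)         # 4294967296
print(next(title_ids()))    # ('A00A', '0001000141303041')
print(to_title_id("JALT"))  # 000100014A414C54
```

Under these assumptions the restricted scrape covers 876,096 candidates, roughly 4,900 times fewer than the full 4-byte space.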