Auto Scraper v0.6
Tagged: scraper
- This topic has 117 replies, 20 voices, and was last updated 8 years, 11 months ago by rafaelr.
10/13/2014 at 03:54 #81716 sselph Participant
This is an auto-scraper that runs from the command line and supports:
NES, SNES, N64, GB, GBC, GBA, MD, SMS, 32X, GG, PCE, A2600, LNX, MAME (see below) ROMs
It works by crawling a directory of ROM files looking for known extensions. When it finds a file, it hashes the ROM data minus any headers or special file formatting, with the goal of hashing only the data pulled from the original game. It compares this hash to a DB, downloads the metadata, and builds the gamelist.xml file.
You can find the source on GitHub:
https://github.com/sselph/scraper
And for trusting people who don’t want to bother compiling, I cross-compiled for several platforms, even the RPi:
https://github.com/sselph/scraper/releases
Basic Instructions:
Download/build the scraper executable for the system of choice.
Copy it into your ROM folder.
Make sure it is executable on Linux/Mac (chmod +x scraper).
Run it (./scraper -thumb_only).
You will see it processing the ROMs, and it will write a gamelist.xml file in the ROM folder.
If you are not running this on the RPi, copy everything (roms, image folder, gamelist.xml) to the RetroPie.
If you want to run directly on an RPi or RPi 2, you can follow these instructions:
https://github.com/sselph/scraper#install-from-my-binaries
MAME works slightly differently and is matched by name. You need to start the script with the -mame flag:
./scraper -mame
If these instructions are unclear, you can check out Floob’s videos.
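To illustrate the "hash the ROM data minus any headers" idea, here is a minimal Python sketch for one format only. It is not the scraper's actual code (the scraper is written in Go), and it assumes two things not spelled out in this post: that the hash is SHA-1, and that the header to skip is the 16-byte iNES header used by .nes files. The function name rom_sha1 is made up for this example.

```python
import hashlib

INES_MAGIC = b"NES\x1a"   # first four bytes of an iNES header
INES_HEADER_SIZE = 16     # the iNES header is 16 bytes long


def rom_sha1(data: bytes) -> str:
    """Return the SHA-1 of the ROM payload, skipping an iNES header if one is present."""
    if data[:4] == INES_MAGIC:
        data = data[INES_HEADER_SIZE:]
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    payload = bytes(range(256)) * 4
    headered = INES_MAGIC + bytes(12) + payload
    # Both files hash to the same value because only the payload is hashed.
    print(rom_sha1(headered) == rom_sha1(payload))
```

The point of stripping the header first is that two dumps of the same game with different headers still map to the same DB entry.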
10/13/2014 at 23:53 #81730 Floob Member
Hi,
I just gave that a go in this dir:
/home/pi/RetroPie/roms/gba
I ran it with this (using the RPi scraper version here: https://github.com/sselph/scraper/releases):
./scraper -thumb_only
It created a gamelist.xml with this content only:
<gameList></gameList>
I imagine I’m doing something wrong, can you see what?
10/14/2014 at 03:22 #81735 sselph Participant
The script is looking at the file extensions and doesn’t know what to do with zip files. From what I understood, EmulationStation couldn’t find ROMs inside zip files, so I didn’t set up code to look inside them. If that has changed I can make a few modifications. For now, if you unzip them it should see the .gba files.
You can issue something like the following to unzip all your files (the quotes around $f keep file names with spaces intact):
for f in *.zip; do unzip "$f"; done
10/14/2014 at 16:14 #81758 Floob Member
Thanks, I should have realised I shouldn’t have them zipped.
I now have this error appearing:
2014/10/14 14:10:08 INFO: Starting: 0035 - Namco Museum (U).gba
2014/10/14 14:10:24 ERR: error processing 0035 - Namco Museum (U).gba: XML syntax error on line 22: element <meta> closed by </head>
2014/10/14 14:10:24 INFO: Starting: 0035 - Namco Museum (U).gba
2014/10/14 14:10:39 ERR: error processing 0035 - Namco Museum (U).gba: XML syntax error on line 22: element <meta> closed by </head>
2014/10/14 14:10:39 INFO: Starting: 0083 - Final Fight One (E).gba
2014/10/14 14:10:55 ERR: error processing 0083 - Final Fight One (E).gba: XML syntax error on line 22: element <meta> closed by </head>
2014/10/14 14:10:55 INFO: Starting: 0083 - Final Fight One (E).gba
2014/10/14 14:11:11 ERR: error processing 0083 - Final Fight One (E).gba: XML syntax error on line 22: element <meta> closed by </head>
2014/10/14 14:11:11 INFO: Starting: 0083 - Final Fight One (E).gba
2014/10/14 14:11:27 ERR: error processing 0083 - Final Fight One (E).gba: XML syntax error on line 22: element <meta> closed by </head>
2014/10/14 14:11:27 INFO: Starting: 0070 - Kaze no Klonoa - Yumemiru Teikoku (J).gba
2014/10/14 14:11:43 ERR: error processing 0070 - Kaze no Klonoa - Yumemiru Teikoku (J).gba: XML syntax error on line 22: element <meta> closed by </head>
2014/10/14 14:11:43 INFO: Starting: 0070 - Kaze no Klonoa - Yumemiru Teikoku (J).gba
2014/10/14 14:11:59 ERR: error processing 0070 - Kaze no Klonoa - Yumemiru Teikoku (J).gba: XML syntax error on line 22: element <meta> closed by </head>
10/14/2014 at 16:34 #81761 sselph Participant
Seems like thegamesdb.net is having issues and my script doesn’t present a nice error for that. It should hopefully work if you try again in a few minutes.
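An error like "element <meta> closed by </head>" usually means the server sent back an HTML error page where XML was expected. As a rough Python sketch of how a client could turn that into a friendlier message (this is an illustration, not what the scraper does; parse_metadata and the sample element names are invented for the example):

```python
import xml.etree.ElementTree as ET


def parse_metadata(body: str) -> ET.Element:
    """Parse a metadata response, raising a readable error if it is not valid XML."""
    try:
        return ET.fromstring(body)
    except ET.ParseError as err:
        # An HTML error page typically fails here with a mismatched-tag error.
        raise RuntimeError(
            "The metadata source did not return valid XML; it may be down. "
            f"Parser message: {err}"
        ) from err


if __name__ == "__main__":
    try:
        parse_metadata("<html><head><meta charset='utf-8'></head></html>")
    except RuntimeError as err:
        print(err)
```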
10/16/2014 at 13:50 #81807 sselph Participant
Seems like thegamesdb.net is back up now.
10/18/2014 at 22:02 #81874 Floob Member
Looks like a great script, thanks very much.
I put a basic video together for it here. Let me know if I should add anything to it.
10/18/2014 at 23:10 #81876 sselph Participant
Very nice video, and thanks for pointing out the bug about players. I didn’t notice it in the gamelist.xml spec. I’ll get that added to the output. If you have any other feedback on issues, improvements, or platforms to add, let me know.
10/19/2014 at 00:53 #81879 Floob Member
Would be great if it could support the Megadrive as well.
Maybe it could start by reporting whether it could connect to thegamesdb, as earlier it confused me when it couldn’t get the data.
10/23/2014 at 17:44 #81964 sselph Participant
Released a new version of the script to add the players, the check for whether thegamesdb is up, and Megadrive support.
A note on the MD support.
There seem to be 4 accepted extensions (bin, md, smd, zip). BIN is trivial and should work without issue. MD and SMD are interleaved bin files and I have to deinterleave them before computing the hash. Since I don’t have any of these files, I had to work from documentation I found and files I made that hopefully conformed to the format. Let me know if it doesn’t work. Also, the SMD format seems to support splitting files; I didn’t add any support for that. For ZIP I wasn’t sure how they are handled and need to do some hands-on testing. Does ES treat it as a single file, with the emulator just choosing the largest or first valid file from the zip, or is it treated as a directory, i.e. (file.zip/rom.bin, file.zip/rom2.bin)?
10/23/2014 at 21:15 #81966 Floob Member
Thanks very much for this.
I’d really like to try it, but all my Megadrive ROMs are .gen.
Do you think this is a .bin?
http://www.openthefile.net/extension/gen/2908
http://yoyofr.proboards.com/thread/731
For me the EmulationStation config looks for these extensions:
<extension>.smd .SMD .bin .BIN .gen .GEN .md .MD .zip .ZIP</extension>
10/23/2014 at 21:38 #81967 sselph Participant
Ah, I must have an older version or something. You could try renaming a few of them to .bin, .smd, or .md to see if the script can hash them to a known hash. I might just need to add an alias for that format to one of the other formats. I would assume it is similar to .bin, since .bin is a very generic extension for a binary file, so someone probably created .gen to make them easier to sort.
10/23/2014 at 22:13 #81971 Floob Member
I gave it a go and am getting this at the moment:
pi@raspberrypi ~/RetroPie/roms/megadrive $ ./scraper -thumb_only
It appears that thegamesdb.net isn't up. If you are sure it is use -skip_check to bypass this error.
thegamesdb.net seems down for me manually checking as well.
10/23/2014 at 22:18 #81973 sselph Participant
If you use -skip_check you can bypass that error, and when it is processing the ROM you’ll see either a hash not found or the XML syntax error. If you see the syntax error, it means it found the hash but couldn’t download the data.
10/23/2014 at 22:27 #81975 Floob Member
XML syntax error on the files I renamed .bin, so looking good :)
10/24/2014 at 00:04 #81984 sselph Participant
I added the .gen support to mimic .bin. Hopefully that works.
10/24/2014 at 16:16 #81997 ceuse Participant
Great tool, I ran into a problem though.
I have subfolders in my ROM directory (translation, europe, Japan, us). I ran the script in every folder separately but the Pi does not recognise the gamelist file.
Is there a way that you can implement subfolder scraping, or at least tell me how I get EmulationStation to recognise my subfolders with gamelist.xml files?
Thanks in advance
10/25/2014 at 02:29 #82011 sselph Participant
Sure, I’ll take a look this weekend. It should be possible to recursively crawl the subdirectories and generate a single gamelist.xml.
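A rough Python sketch of the recursive crawl idea, for anyone curious what it involves. It walks a ROM folder and its subfolders and returns paths relative to the top folder, which is the form a single gamelist.xml at the top level would need. The extension set and the name find_roms are placeholders for this example, not the scraper's real list or code.

```python
import os

# Illustrative subset only; the real scraper knows many more extensions.
ROM_EXTENSIONS = {".gba", ".gbc", ".nes"}


def find_roms(root: str) -> list[str]:
    """Return ROM paths under root, relative to root and written with forward slashes."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in ROM_EXTENSIONS:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append("./" + rel.replace(os.sep, "/"))
    return sorted(found)


if __name__ == "__main__":
    for path in find_roms("."):
        print(path)
```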
10/25/2014 at 10:31 #82014 Floob Member
Are there any other data sources that you could use besides thegamesdb.net?
It seems to be down so often at the moment.
Is this one possible?
archive.vg
10/25/2014 at 17:32 #82019 sselph Participant
I’ll look into adding more sources. I have to be careful, since the way I’m matching is by taking the hash of the ROM data (minus headers, etc.) and matching that to a thegamesdb game ID. I do this with a CSV file I manually create. I don’t want to be manually creating a second set of IDs since the process is time-consuming.
archive.vg has API calls that accept hashes of the ROM files. This might work well. If I can figure out how to get an API key, I’ll see about adding it.
There is also https://github.com/OpenVGDB/OpenVGDB/releases which maps the ROM hash to a name, image link, and description. The image CDN appears to be down, so if that comes back up I can look into adding it as well.
10/25/2014 at 18:28 #82022 Floob Member
Ah I see. I imagine the single source will be fine most of the time. Not sure if this helps: http://api.archive.vg/2.0/
I don’t know how difficult it would be, but a lot of people would love MAME support, as that’s obviously a key system for emulation. If that got added at some point it would be great.
Separately, a check to see if the ‘image’ directory exists before running it would help forgetful people. Like me……
10/26/2014 at 03:23 #82031 Floob Member
Do you know why the <releasedate> node looks odd in the gamelist.xml but displays ok in Emulation Station?
<releasedate>19921220T000000</releasedate>
10/26/2014 at 04:36 #82033 ex Participant
First off the scraper works great! Pulled about 90% of my titles with no problem and I have a ton! Thanks for your work!
The only issue I am having is with unzipped .md roms. It seems to flag an error because of the file extension. Is there any workaround for this?
10/26/2014 at 13:48 #82039 sselph Participant
[quote=82031]Do you know why the <releasedate> node looks odd in the gamelist.xml but displays ok in Emulation Station?
<releasedate>19921220T000000</releasedate>
[/quote]
This is the way EmulationStation chose to encode a datetime:
https://github.com/Aloshi/EmulationStation/blob/unstable/GAMELISTS.md
The format is YYYYMMDDTHHMMSS. Since no releases have an exact time, the second half is always T000000, giving YYYYMMDDT000000.
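As a small Python illustration of that format (not the scraper's code; es_releasedate is a name made up for this example), a plain date can be turned into the gamelist.xml form by pinning the time to midnight:

```python
from datetime import date, datetime


def es_releasedate(d: date) -> str:
    """Format a date as YYYYMMDDTHHMMSS with the time fixed at midnight."""
    midnight = datetime(d.year, d.month, d.day)
    return midnight.strftime("%Y%m%dT%H%M%S")


if __name__ == "__main__":
    print(es_releasedate(date(1992, 12, 20)))  # prints 19921220T000000
```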
[quote=82033]The only issue I am having is with unzipped .md roms. It seems to flag an error because of the file extension. Is there any workaround for this?
[/quote]
What error are you seeing exactly? Is it just not matching hashes for any MD files, or is it throwing some other error? I suspect there are issues in the way I’m converting these back to bin files for hashing. I just found an issue with my SMD code. I’ll write some code to convert the bin file I have to an MD and an SMD and see if the emulator plays them; then I’ll know that the code is working.
10/26/2014 at 21:17 #82054 ceuse Participant
Thanks for adding my subfolder scraping :-)
Info for everybody: you need to create an image folder in the root with a folder for each subfolder you’re scraping, so basically images\europe, Images\Usa, images\Japan, etc.
Edit: I think you have an error in there though (at least in the Windows version).
My XML shows: .\images/Europe\ … I checked the original XMLs and there it is always / .. perhaps it’s just the Windows version with this problem. Anyway, I just replaced every \ with / in Notepad++ and it works fine now. Thanks for the great tool :) Now just add more and more systems *ggg*
10/27/2014 at 01:08 #82062 sselph Participant
Thanks for the Windows test. I used the golang functions to join paths, but they are OS-dependent and Windows uses \. I shouldn’t do that for the gamelist.xml portions since the gamelist.xml will always be read on Linux. I saw some issues with RetroPie displaying the data for ROMs inside folders, but I’m on an older version so that might be working now.
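The same pitfall exists in most languages, so here is a hedged Python sketch of the fix rather than the scraper's Go code: build the paths that go into gamelist.xml with a join that always uses /, whatever OS the tool runs on. The helper name gamelist_path is invented for this example, and it assumes no ROM or image file name contains a literal backslash.

```python
import posixpath


def gamelist_path(*parts: str) -> str:
    """Join path parts with '/', regardless of the host OS."""
    # Normalise any Windows-style separators before joining.
    cleaned = [p.replace("\\", "/") for p in parts]
    return posixpath.join(*cleaned)


if __name__ == "__main__":
    print(gamelist_path(".", "images", "Europe", "game.png"))
    print(gamelist_path(".\\images", "Europe\\game.png"))
```

Both calls produce the same forward-slash path, which is what a gamelist.xml read on Linux expects.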
I didn’t intend for the images to require subfolders, but it might be a good thing. I added support to create the single images directory to make things easier, but I’ll clean up the code so you don’t have to create a bunch of extra folders, either by having it create them for you or by flattening the structure.
I also researched more about Megadrive ROMs. The documentation I saw had .md as a Multi Game Doctor file, but other documentation has this as .mgd. Looking at the emulator in EmulationStation, they don’t support the Multi Game Doctor format, only raw binary and the SMD format, so I will assume .md is actually raw binary like .bin and .gen. I also fixed the SMD block size and it appears to be working.
The next things I’ll work on are fixing the issues that ceuse has found and then adding .zip support.
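For readers wondering what deinterleaving an SMD file involves, here is a Python sketch based on commonly circulated descriptions of the format, not on the scraper's source. The assumptions are: a 512-byte header, followed by 16 KB blocks in which the first 8 KB holds the odd-offset bytes and the second 8 KB holds the even-offset bytes of the raw data. Treat those numbers and the function name smd_to_bin as assumptions to check against real files.

```python
SMD_HEADER_SIZE = 512        # assumed size of the copier header
SMD_BLOCK_SIZE = 16 * 1024   # assumed size of one interleaved block
HALF = SMD_BLOCK_SIZE // 2


def smd_to_bin(smd: bytes) -> bytes:
    """Convert interleaved SMD data to raw binary under the assumptions above."""
    body = smd[SMD_HEADER_SIZE:]
    if len(body) % SMD_BLOCK_SIZE != 0:
        raise ValueError("body is not a whole number of 16 KB blocks")
    out = bytearray(len(body))
    for start in range(0, len(body), SMD_BLOCK_SIZE):
        end = start + SMD_BLOCK_SIZE
        block = body[start:end]
        out[start + 1:end:2] = block[:HALF]  # first half goes to odd offsets
        out[start:end:2] = block[HALF:]      # second half goes to even offsets
    return bytes(out)
```

Once the data is back in raw form it can be hashed the same way as a .bin file.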
10/27/2014 at 14:22 #82083 ramchip Participant
This looks so awesome! I cannot wait for GamesDB to get back up so I can test this!! Is it possible to add Master System, MAME and FBA? I am extremely excited to use this, as scraping has been the weak point of EmulationStation! I should have a perfect build with XBMC, OwnCloud, PS3 controllers and my favorite emulators/games after this!
10/27/2014 at 14:46 #82084 sselph Participant
Console games are much easier since I have a DB of hash values mapped to names from no-intro. I’ve also only ever worked with console ROMs. For MAME I’ll have to hunt down a list, ask for help creating one, or find a DB that already has them mapped by hash. After I get zip support added and add support for at least one extra data source, I’ll take a look at adding more systems.
10/27/2014 at 20:50 #82094 ramchip Participant
I got to try this finally and I LOVE IT!! Here are my findings: NES 90%, SNES 80%, GB 90%, GBA 85% and GBC 15%. For some reason it barely found any of my GBC roms but those go through the scraper really well anyways! Thanks for your hard work on this, it has massive potential and should be included with RetroPie in my opinion!
10/27/2014 at 22:57 #82099 ceuse Participant
[quote=82094]I got to try this finally and I LOVE IT!! Here are my findings: NES 90%, SNES 80%, GB 90%, GBA 85% and GBC 15%. For some reason it barely found any of my GBC roms but those go through the scraper really well anyways! Thanks for your hard work on this, it has massive potential and should be included with RetroPie in my opinion![/quote]
Just wanted to report the same thing… a lot of GBC ROMs won’t scrape even though I randomly checked a few and they definitely are in thegamesdb.net. Is there a way to check whether the hashes of my / our ROMs are different, or whether it’s a problem with the code?
10/28/2014 at 00:38 #82101 sselph Participant
Sure. Game Boy Color ROMs are a simple raw binary format, so you can do shasum *.gbc and get a list of hashes and file names. Feel free to send me the list in a file. If you want to troubleshoot you can look at the CSV here:
https://stevenselph.appspot.com/csv/hash.csv
I can think of one issue. If these games were clones of a normal Game Boy game just with added color, they could be listed in thegamesdb as Game Boy and not Game Boy Color. I might just need to expand my search.
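If you would rather script the comparison than eyeball the shasum output, something along these lines could work. It is a Python sketch under stated assumptions: you have already collected the known SHA-1 values into a set yourself (the layout of the CSV is not described here, so loading it is left out), and the names sha1_of_file and unmatched_roms are made up for this example. Hashes are compared in lowercase because some tools print them in uppercase.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path) -> str:
    """Return the lowercase hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unmatched_roms(folder: Path, known_hashes: set[str], pattern: str = "*.gbc") -> dict[str, str]:
    """Map file name to SHA-1 for every matching file whose hash is not in known_hashes."""
    known = {h.lower() for h in known_hashes}
    result = {}
    for path in sorted(folder.glob(pattern)):
        file_hash = sha1_of_file(path)
        if file_hash not in known:
            result[path.name] = file_hash
    return result
```

Anything this returns is a file whose raw SHA-1 is missing from the known list, which points either at a bad dump or at a gap in the list.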
10/28/2014 at 00:52 #82102 ceuse Participant
Generated with a tool from one GBC file (all hail the GUI!) which the scraper doesn’t find:
MD5 Checksum: BA85A2AE8AA5829C440EEF2D5549506C
SHA-1 Checksum: 4E6F676EC15E0E6238CB81853B5A74BBB20657A1
SHA-256 Checksum: 8EB56E0D55A04AA3FCF940F172757F4F60BAA6C53C82707DEF8AE4E78844B1DA
SHA-512 Checksum: BB5B8C43865D38B3609EA8D1E818A6F2019D9AFFD8538F0D0F05A84F56A55ABFF8334F8B3A276467B54F9B60A6A2E6616E3AB1356E5F54A03F9A2E049577FE55
Generated by MD5 & SHA Checksum Utility @ http://raylin.wordpress.com/downloads/md5-sha-1-checksum-utility
thegamesdb.net link: https://thegamesdb.net/game/21997/
Can’t find the SHA-1 in the CSV.. is the ROM broken or the list somewhat off? As the previous poster said, the success rates are quite low compared to GB and all the other supported platforms. At least something seems off.
10/28/2014 at 01:25 #82104 sselph Participant
The ROM seems fine and I see the exact hash in the no-intro set. I must have made some mistake generating my CSV. I’ll go back over the GBC dataset to figure it out. Thanks for finding the issue.
10/28/2014 at 17:08 #82129 ramchip Participant
This is incredible! Thanks man! Between your tool and the MAME/FBA scraper I have 100% images and info!!
10/28/2014 at 17:35 #82130 Floob Member
Yes, it really is a very effective tool that is pretty easy to use.
I’m sure it will help a lot of people make their RetroPie experience even better.
It would be interesting if EmulationStation could hook it into their GUI.
- The forum ‘Everything else related to the RetroPie Project’ is closed to new topics and replies.