New version of sselph/scraper v0.9.0-beta
- This topic has 58 replies, 20 voices, and was last updated 8 years, 8 months ago by sselph.
07/04/2015 at 21:38 #101350 sselph (Participant)
Hi Everyone,
I’ve been refactoring much of my scraper’s code to make it easier to add features, and I’ve added a few features since I last posted.
https://github.com/sselph/scraper
New Features:
- MAME/Arcade descriptions – I added information from arcade-history so that MAME and other arcade systems have more complete data.
- PSX Support – I added support for bin/cue PSX games using redump dat files. It creates a single entry for each cue file.
- Dreamcast Support – I added support for gdi/bin games using redump dat files. It seems reicast supports this format, but it isn’t enabled in es_systems.cfg.
- Zip/Gzip support – Since RetroArch added zip/gzip support, I now look inside zip files for the first file that looks like a ROM and scan it.
- More accurate and complete scraping on several systems. Thanks to @robertybob for adding literally ~1000 games to thegamesdb.
- Ability to append to a gamelist – You can now use -append to skip files that are already in the gamelist.xml file.
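As a rough illustration of the zip scanning described above, here is a minimal Go sketch that walks a zip archive in order and returns the first entry whose extension looks like a ROM. The extension list and the names `romExts`, `firstROM`, and `buildZip` are hypothetical; the post doesn't say how the scraper decides what "looks like a ROM", and gzip (a single compressed stream) isn't covered here.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"path"
	"strings"
)

// romExts is a hypothetical, deliberately short list of extensions that
// "look like a ROM"; the scraper's real rules are not described in the post.
var romExts = map[string]bool{
	".nes": true, ".sfc": true, ".smc": true, ".gb": true,
	".gbc": true, ".gba": true, ".md": true, ".bin": true,
}

// firstROM returns the first entry, in archive order, whose extension is in romExts.
func firstROM(r *zip.Reader) (*zip.File, error) {
	for _, f := range r.File {
		// Zip entry names always use forward slashes, so package path applies.
		if romExts[strings.ToLower(path.Ext(f.Name))] {
			return f, nil
		}
	}
	return nil, errors.New("no ROM-like file in archive")
}

// buildZip creates an in-memory archive with the given entry names (demo only).
func buildZip(names []string) *zip.Reader {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range names {
		f, err := w.Create(name)
		if err != nil {
			panic(err)
		}
		f.Write([]byte("payload"))
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	r, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		panic(err)
	}
	return r
}

func main() {
	f, err := firstROM(buildZip([]string{"readme.txt", "Game (USA).nes"}))
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Name) // Game (USA).nes
	// f.Open() would then provide the bytes to hash.
}
```

Stopping at the first hit keeps the work small, but an archive holding several ROMs would only ever be matched on one of them.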
Guide:
Thanks to Floob there is a very nice video guide that is still valid:
Issues:
Since I’ve changed most of the code and don’t have a lot of tests, I’m sure I have created bugs. Please create issues here:
https://github.com/sselph/scraper/issues
07/04/2015 at 23:18 #101360 Floob (Member)
Thanks very much for the update. It’s great!
Loving the extra description detail on MAME ROMs. I’ve added an error I found to your issue list, though it may just be me doing something odd.
I like the PSX support, although since I have very few PSX games and I don’t use a .cue for single-track games, I’ll probably still use the built-in scraper for those.
Thanks again for all the work you put into this; it makes EmulationStation so much nicer to use.
I’ll try to sort out a new video for all these updates!
07/05/2015 at 03:05 #101368 sselph (Participant)
Thanks for the report. I’ll release a fix soon if I don’t hear about any other issues.
Regarding bin/cue: the scraper will still scrape the bin file if there isn’t a cue file. Here’s how it works: it looks for cue files, parses each one to get the list of associated bin files, then hashes the cue/track1/track2/etc. files in turn until it finds a match and uses that. So if there isn’t a cue, it just treats the .bin as a binary and hashes it like normal.
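The lookup order described above can be sketched in Go. This minimal version assumes the usual `FILE "name" BINARY` line syntax of cue sheets, collects the referenced bin files, then hashes the cue and each track in order until one hash is found in a known table. The SHA-1 choice, the `known` map, the `read` callback (a stand-in for reading from disk), and the function names are all hypothetical; the scraper's real matching code may differ.

```go
package main

import (
	"bufio"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strings"
)

// cueFiles returns the file names referenced by FILE lines in a cue sheet,
// e.g. FILE "Game (Track 1).bin" BINARY.
func cueFiles(cue string) []string {
	var files []string
	sc := bufio.NewScanner(strings.NewReader(cue))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(strings.ToUpper(line), "FILE ") {
			continue
		}
		// The name is quoted; take the text between the first and last quote.
		first, last := strings.Index(line, `"`), strings.LastIndex(line, `"`)
		if first >= 0 && last > first {
			files = append(files, line[first+1:last])
		}
	}
	return files
}

func sha1Hex(data []byte) string {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:])
}

// matchCue hashes the cue itself, then each referenced track in order, and
// returns the ID for the first hash present in known.
func matchCue(cueName string, read func(name string) ([]byte, bool), known map[string]string) (string, bool) {
	cue, ok := read(cueName)
	if !ok {
		return "", false
	}
	candidates := append([]string{cueName}, cueFiles(string(cue))...)
	for _, name := range candidates {
		data, ok := read(name)
		if !ok {
			continue // a missing track is skipped, not fatal
		}
		if id, found := known[sha1Hex(data)]; found {
			return id, true
		}
	}
	return "", false
}

func main() {
	files := map[string][]byte{
		"game.cue":           []byte("FILE \"game (Track 1).bin\" BINARY\n  TRACK 01 MODE2/2352\nFILE \"game (Track 2).bin\" BINARY\n  TRACK 02 AUDIO\n"),
		"game (Track 1).bin": []byte("track one data"),
		"game (Track 2).bin": []byte("track two data"),
	}
	read := func(name string) ([]byte, bool) { b, ok := files[name]; return b, ok }
	// Pretend the database only knows track 1's hash (made-up ID).
	known := map[string]string{sha1Hex(files["game (Track 1).bin"]): "12345"}
	fmt.Println(matchCue("game.cue", read, known)) // 12345 true
}
```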
07/05/2015 at 03:29 #101369 Floob (Member)
Ah, I see – that’s great. I’ll give it a go.
Can you remind me how the MAME lookup works – which database does it check?
For example, I’ve got ddp2.zip, which is:
http://www.progettoemma.net/index.php?gioco=ddp2&lang=en
but nothing scraped?
07/05/2015 at 03:39 #101370 Floob (Member)
Scrap that – it found it this time – just no image returned.
One it didn’t find was wyvernf0.zip:
http://www.progettoemma.net/index.php?gioco=wyvernf0&lang=en
07/05/2015 at 03:48 #101371 sselph (Participant)
It uses mamedb.com. It strips off the file extension and fetches the URL http://www.mamedb.com/game/wyvernf0
mamedb.com is based on MAME .147, and wyvernf0 is from .154.
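The lookup sselph describes is simple string work: drop the extension and append the remaining short name to the mamedb.com game path. A minimal Go sketch (the function name `mameDBURL` is hypothetical, not the scraper's):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// mameDBURL strips the directory and file extension from a ROM path and
// appends the remaining short name to the mamedb.com game path.
func mameDBURL(romPath string) string {
	base := filepath.Base(romPath)
	name := strings.TrimSuffix(base, filepath.Ext(base))
	return "http://www.mamedb.com/game/" + name
}

func main() {
	fmt.Println(mameDBURL("wyvernf0.zip")) // http://www.mamedb.com/game/wyvernf0
}
```

Because the URL is built from the name alone, a game mamedb.com doesn’t know about (such as a set added after .147) simply has no page to fetch.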
07/05/2015 at 03:49 #101372 Floob (Member)
Also, when processing mame4all ROMs I seem to periodically get these errors:
I don’t think it’s ROM-specific, though: it’s a consecutive batch, and on the next scrape those are fine and others complain.
/07/05 01:47:12 INFO: Starting: bosco.zip
2015/07/05 01:47:12 ERR: error processing bosco.zip: ILM Bad HTML
2015/07/05 01:47:12 INFO: Starting: bouldash.zip
2015/07/05 01:47:12 ERR: error processing bouldash.zip: ILM Bad HTML
2015/07/05 01:47:12 INFO: Starting: bouldash.zip
2015/07/05 01:47:12 ERR: error processing bouldash.zip: ILM Bad HTML
2015/07/05 01:47:12 INFO: Starting: bouldash.zip
2015/07/05 01:47:12 ERR: error processing bouldash.zip: ILM Bad HTML
2015/07/05 01:47:12 INFO: Starting: brain.zip
2015/07/05 01:47:13 ERR: error processing brain.zip: ILM Bad HTML
2015/07/05 01:47:13 INFO: Starting: brain.zip
2015/07/05 01:47:13 ERR: error processing brain.zip: ILM Bad HTML
2015/07/05 01:47:13 INFO: Starting: brain.zip
2015/07/05 01:47:13 ERR: error processing brain.zip: ILM Bad HTML
2015/07/05 01:47:13 INFO: Starting: breakers.zip
2015/07/05 01:47:13 ERR: error processing breakers.zip: ILM Bad HTML
2015/07/05 01:47:13 INFO: Starting: breakers.zip
2015/07/05 01:47:13 ERR: error processing breakers.zip: ILM Bad HTML
2015/07/05 01:47:13 INFO: Starting: breakers.zip
2015/07/05 01:47:14 ERR: error processing breakers.zip: ILM Bad HTML
2015/07/05 01:47:14 INFO: Starting: brkthru.zip
2015/07/05 01:47:14 ERR: error processing brkthru.zip: ILM Bad HTML
2015/07/05 01:47:14 INFO: Starting: brkthru.zip
2015/07/05 01:47:14 ERR: error processing brkthru.zip: ILM Bad HTML
2015/07/05 01:47:14 INFO: Starting: brkthru.zip
2015/07/05 01:47:14 ERR: error processing brkthru.zip: ILM Bad HTML
2015/07/05 01:47:14 INFO: Starting: brubber.zip
2015/07/05 01:47:15 ERR: error processing brubber.zip: ILM Bad HTML
2015/07/05 01:47:15 INFO: Starting: brubber.zip
07/05/2015 at 03:49 #101373 Floob (Member)
[quote=101371]It uses mamedb.com. It strips off the file extension and pulls the url http://www.mamedb.com/game/wyvernf0
[/quote]
Ah – ok, that explains it. Thanks.
07/05/2015 at 03:56 #101375 sselph (Participant)
Hmm, those errors come from the MAME scraper fetching the URL and getting back a response it can’t parse. Since it happens with different ROMs and in bursts, it might be throttling or some other issue with the website.
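The log above shows several ROMs being started three times within about a second. If the site is throttling, spacing attempts out is a common mitigation. This is a hypothetical Go sketch of retrying with exponential backoff; `retry` and `errBadHTML` are illustrative names, and the thread doesn’t say the scraper works this way.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBadHTML stands in for a fetch whose response could not be parsed.
var errBadHTML = errors.New("bad HTML")

// retry calls fn up to attempts times, doubling the wait after each failure.
// It returns nil on the first success, or the last error otherwise.
func retry(attempts int, base time.Duration, fn func() error) error {
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}

func main() {
	calls := 0
	// Simulate a site that rejects the first two requests, then recovers.
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errBadHTML
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```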
07/05/2015 at 03:58 #101376 Floob (Member)
Could a backup DB query work like this?
http://www.progettoemma.net/gioco.php?game=wyvernf0
with the image being:
http://www.progettoemma.net/snap/wyvernf0/0000.png
Just a thought. I’m more than impressed with what it does already!
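Floob’s suggestion amounts to a second pair of URL templates keyed on the same short name. A minimal Go sketch, assuming (not verified) that the two progettoemma.net patterns above hold for every game; `shortName` and `emmaURLs` are hypothetical helpers, and the thread doesn’t say whether the scraper ever adopted this fallback.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// shortName strips the directory and extension, e.g. "roms/wyvernf0.zip" -> "wyvernf0".
func shortName(romPath string) string {
	base := filepath.Base(romPath)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

// emmaURLs returns the info page and first snapshot URL on progettoemma.net,
// using the URL patterns from the post above.
func emmaURLs(romPath string) (info, snap string) {
	name := shortName(romPath)
	return "http://www.progettoemma.net/gioco.php?game=" + name,
		"http://www.progettoemma.net/snap/" + name + "/0000.png"
}

func main() {
	info, snap := emmaURLs("wyvernf0.zip")
	fmt.Println(info) // http://www.progettoemma.net/gioco.php?game=wyvernf0
	fmt.Println(snap) // http://www.progettoemma.net/snap/wyvernf0/0000.png
}
```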
07/05/2015 at 04:02 #101377 sselph (Participant)
Yeah, we can create a backup DB. For the metadata, I could probably download another dat file, parse it, and put it in the same data store I’m using for the history data, then either point to images on another site or see how taxing it would be to host them.
07/05/2015 at 04:03 #101378 Floob (Member)
[quote=101375]Hmm those errors are from the mame scraper trying to parse the result of getting the URL and getting a response it can’t parse. Since it happens with different roms and in bursts might be some throttling or issues with the website.
[/quote]
Just tried it again, and it’s fine now. Must have been a temporary bottleneck like you said.
07/05/2015 at 13:25 #101396 Floob (Member)
Just had a major meltdown with some atarilynx ROM scraping, which seemed fine before. Can you see where the issue may be?
github.com/sselph/scraper/ds.(*Hasher).Hash(0x1080aa90, 0x10f1d320, 0x23, 0x0, 0x0, 0x0, 0x0)
	/home/sselph/go/src/github.com/sselph/scraper/ds/hasher.go:32 +0x170 fp=0x1a462a4c sp=0x1a4629e0
github.com/sselph/scraper/ds.(*Hasher).Hash(0x1080aa90, 0x10f1d320, 0x23, 0x0, 0x0, 0x0, 0x0)
	/home/sselph/go/src/github.com/sselph/scraper/ds/hasher.go:32 +0x170 fp=0x1a462ab8 sp=0x1a462a4c
github.com/sselph/scraper/ds.(*Hasher).Hash(0x1080aa90, 0x10f1d320, 0x23, 0x0, 0x0, 0x0, 0x0)
	/home/sselph/go/src/github.com/sselph/scraper/ds/hasher.go:32 +0x170 fp=0x1a462b24 sp=0x1a462ab8
...additional frames elided...
created by main.CrawlROMs
	/home/sselph/go/src/github.com/sselph/scraper/scraper.go:173 +0x5e4
goroutine 1 [chan send]:
main.CrawlROMs(0x11522cc0, 0x10a48010, 0x1, 0x1, 0x10810140, 0x1080aa88, 0x0, 0x0)
	/home/sselph/go/src/github.com/sselph/scraper/scraper.go:184 +0xf98
main.Scrape(0x10a48010, 0x1, 0x1, 0x10810140, 0x1080aa88, 0x0, 0x0)
	/home/sselph/go/src/github.com/sselph/scraper/scraper.go:285 +0x194
main.main()
	/home/sselph/go/src/github.com/sselph/scraper/scraper.go:414 +0xf54
goroutine 5 [syscall]:
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:21 +0x1c
created by os/signal.init·1
	/usr/local/go/src/os/signal/signal_unix.go:27 +0x40
goroutine 15 [chan receive]:
main.func·003()
	/home/sselph/go/src/github.com/sselph/scraper/scraper.go:187 +0x60
created by main.CrawlROMs
	/home/sselph/go/src/github.com/sselph/scraper/scraper.go:184 +0x938
goroutine 14 [chan receive]:
main.func·002()
	/home/sselph/go/src/github.com/sselph/scraper/scraper.go:177 +0x94
created by main.CrawlROMs
	/home/sselph/go/src/github.com/sselph/scraper/scraper.go:180 +0x6b8
goroutine 10 [select]:
net.func·019()
	/usr/local/go/src/net/dnsclient_unix.go:241 +0x310
07/05/2015 at 15:39 #101406 sselph (Participant)
Thanks!
I think I see the error; I’ve submitted a fix and am releasing a new version. Hopefully I catch all the issues before I hit 1.0.0 :)
07/06/2015 at 23:05 #101522 robertybob (Participant)
Keep up the great work, sselph! If you ever want to add more systems and want someone to help you match up IDs or whatever, just ask me :)
07/08/2015 at 14:01 #101618 ekstreme (Participant)
Working well for me. Just scraped my GnGeo set.
07/19/2015 at 08:18 #102257 socalretrogamer (Participant)
Thanks for this scraper! It works great! Much, much better than the scraper in EmulationStation. There are still a lot of games it didn’t scrape, but I think that’s because some of the ROM file names are truncated. For example, “Zelda2” (no space) didn’t scrape for NES. At some point, I plan on renaming all the files that didn’t scrape, so would you have any tips to ensure the scraper recognizes the title? Particularly a sequel like Zelda 2? Thanks again!
07/19/2015 at 13:17 #102273 sselph (Participant)
Hi socalretrogamer,
To minimize false positives, on consoles I’m not actually using the file name, only the extension, so you could name them 1.nes, 2.nes and it should still work. The scraper uses the ROM data itself: it hashes it and compares the hash against a hash-to-ID mapping database I generated by hand for each system it supports.
So there are several reasons it may not have scraped a rom:
- Different ROM dump and therefore a different hash. The Zelda2 could be bad, hacked, overdumped, or a revision not in the No-Intro hashes I used.
- No entry in thegamesdb. For SNES there are 3385 No-Intro ROMs but only 1055 games in the GDB; counting clones, I matched 2434.
- No entry in my DB. Because I add each hash-to-ID mapping manually, new entries don’t appear automatically.
07/19/2015 at 17:02 #102294 gutossn (Participant)
The scraper is amazing! Very fast, and it doesn’t freeze ES. Could you add WonderSwan and Neo Geo Pocket (and Color too) to the database? Thank you.
07/19/2015 at 17:59 #102297 sselph (Participant)
I have issues on GitHub tracking requests for new systems. Adding one is a function of: are there available hashes, what are the file formats, are there entries in thegamesdb.net, how many games, how busy I am, etc.
Feel free to add issues for each system but I can’t make any promises until I look more closely.
08/02/2015 at 02:46 #103154 Anonymous (Inactive)
Hi sselph!!
First of all, many thanks for this awesome scraper!!! I have one question I can’t find a solution to (maybe I’m too much of a newbie ;)
If I start a scraper session and, for any reason (for example I abort with Ctrl+C, or the scraper shows errors and exits), it doesn’t finish a whole ROM directory, how can I continue the session without re-analyzing all the ROMs I’ve already scraped correctly?
Thanks again for your hard work with this great super-tool!! :)
*EDIT*
Ok, I think I need to use the -append=true param…
08/02/2015 at 03:03 #103156 sselph (Participant)
Hi,
Yes, the -append flag should be what you’re looking for, although the scraper skips downloading any images that already exist, so it should catch back up quickly either way.
I have too many flags :)
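What -append implies can be sketched in Go, assuming EmulationStation’s gamelist.xml layout (a `<gameList>` root holding `<game>` entries, each with a `<path>` child): read the existing list and only scrape ROMs whose paths aren’t already in it. The type and function names are hypothetical; this is not the scraper’s code.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// gameList mirrors only the parts of gamelist.xml this sketch needs.
type gameList struct {
	Games []struct {
		Path string `xml:"path"`
	} `xml:"game"`
}

// existingPaths parses a gamelist and returns the set of ROM paths it lists.
func existingPaths(data []byte) (map[string]bool, error) {
	var gl gameList
	if err := xml.Unmarshal(data, &gl); err != nil {
		return nil, err
	}
	seen := make(map[string]bool, len(gl.Games))
	for _, g := range gl.Games {
		seen[g.Path] = true
	}
	return seen, nil
}

// toScrape keeps only the ROMs that are not already in the gamelist.
func toScrape(roms []string, seen map[string]bool) []string {
	var out []string
	for _, r := range roms {
		if !seen[r] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	data := []byte(`<gameList>
  <game><path>./mario.nes</path><name>Super Mario Bros.</name></game>
</gameList>`)
	seen, err := existingPaths(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(toScrape([]string{"./mario.nes", "./zelda2.nes"}, seen)) // [./zelda2.nes]
}
```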
08/17/2015 at 16:08 #104126 Omnija (Participant)
Will there be support for the PSX .pbp format?
08/18/2015 at 03:45 #104178 sselph (Participant)
I don’t know enough about the pbp file format to say whether I could translate its contents back into what would have been in the original bin file, so that it could be matched against the redump hashes.
08/19/2015 at 10:35 #104268 Anonymous (Inactive)
Great work on version 1.0.0, sselph!!
I have a question. I have a complete collection of PAL Mega Drive box art… why, you may ask? Well, I feel the PAL box art is much more appealing to me (being from the UK), and it actually has “Mega Drive” on it. Is there a way we could implement scraping just PAL box art for the Mega Drive? I can upload these images wherever you like, if that would help make this idea a reality.
08/19/2015 at 17:08 #104281 sselph (Participant)
There are a couple of issues with the whole Mega Drive/Genesis situation. First, when I mapped hashes to thegamesdb IDs, I didn’t really care which version I chose as long as there was a match. So if there was both a US and an EU version, I just chose one at random; sometimes I looked to see which one had the better description or clearer image. The other issue is data quality in thegamesdb: several Mega Drive entries have Genesis art, and possibly vice versa.
When I have time to remap MD and GEN, I’ll take more care to only give an MD ROM a GEN match if there isn’t an MD entry in the DB, and vice versa. Ideally we could get the entries in thegamesdb fixed and improved so that other projects benefit as well.
I have tinkered with the idea of setting up a repository of my own to improve some of the MAME stuff but haven’t had time. If I do, I’ll see if I could do something similar for other systems but I imagine the cost would be prohibitive and I won’t actually do any of it :)
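The remapping rule sselph describes (prefer an entry on the ROM’s own platform and fall back to the other platform only when none exists) can be sketched in Go. The `candidate` type, the platform strings, and the IDs are hypothetical; the post doesn’t describe how the mapping is actually stored.

```go
package main

import "fmt"

// candidate is a hypothetical database match for one ROM hash.
type candidate struct {
	ID       string
	Platform string
}

// pickForPlatform returns the first candidate on the ROM's own platform and
// falls back to the first candidate overall only when none exists.
func pickForPlatform(cands []candidate, platform string) (candidate, bool) {
	for _, c := range cands {
		if c.Platform == platform {
			return c, true
		}
	}
	if len(cands) > 0 {
		return cands[0], true
	}
	return candidate{}, false
}

func main() {
	// Made-up IDs: the same game listed under both platforms.
	cands := []candidate{{"101", "Sega Genesis"}, {"202", "Sega Mega Drive"}}
	c, _ := pickForPlatform(cands, "Sega Mega Drive")
	fmt.Println(c.ID) // 202
	c, _ = pickForPlatform(cands[:1], "Sega Mega Drive")
	fmt.Println(c.ID) // 101 (no Mega Drive entry, so the Genesis one is used)
}
```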
08/23/2015 at 14:17 #104523 greyhulk (Participant)
Hi guys, I’m using the built-in scraper on PSX games. It finds the relevant artwork etc., but when I restart my Pi it’s all missing again. Any advice?
thanks
steve
08/23/2015 at 17:18 #104533 herbfargus (Member)
It may not be writing manual changes unless you cleanly exit EmulationStation. So select Quit EmulationStation from the Start menu, and when it reloads, see whether your changes were saved.
08/28/2015 at 17:52 #104907 Anonymous (Inactive)
Is there a build for Windows at all?
08/29/2015 at 02:43 #104946 sselph (Participant)
I make several prebuilt binaries available at https://github.com/sselph/scraper/releases
or, if you’re the type who likes compiling it yourself, there are no special instructions for doing it on Windows.
08/31/2015 at 14:08 #105092 Anonymous (Inactive)
Nice, thanks!
11/12/2015 at 19:53 #109770 phantom27 (Participant)
Ok… So I might be dumb…. No… I’m pretty sure I am… but I need help.
I have a ROM database that I tried running this on. I did it on my Mac. It looked like it worked. It even said saving session… etc. But I can’t find the gamelist.xml file. I even searched my Mac for it.
I’m probably doing something wrong.
11/12/2015 at 19:55 #109771 phantom27 (Participant)
Yep, I’m an idiot apparently. I didn’t realize it would put it in my ‘home’ folder. Found it.
Ok, stupid question. If I put this file in my ROM folder on my Pi, will it work, or are the paths all messed up since I ran it on my Mac?
11/16/2015 at 14:48 #110041 sselph (Participant)
Hmm, the gamelist should be written to the same directory where you ran the script. I’ve heard some other complaints about this, so maybe something has changed.
Anyway, if you ran the script from inside a folder with a bunch of ROMs and didn’t change any of the flags, all the paths should be correct; just put the gamelist in the ROM folder along with all the ROMs and the images folder.
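The reason the gamelist can move from a Mac to a Pi is that EmulationStation accepts paths relative to the system’s ROM folder, written like `./game.nes`. A minimal Go sketch of producing such a path with `filepath.Rel`; the folder names and the helper `relROMPath` are hypothetical, and the thread doesn’t show how the scraper actually builds its paths.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// relROMPath expresses romPath relative to romDir in the "./name" form used
// in gamelist.xml, so the list still works after the folder is copied to
// another machine.
func relROMPath(romDir, romPath string) (string, error) {
	rel, err := filepath.Rel(romDir, romPath)
	if err != nil {
		return "", err
	}
	return "./" + filepath.ToSlash(rel), nil
}

func main() {
	p, err := relROMPath("/Users/me/roms/nes", "/Users/me/roms/nes/Zelda II.nes")
	fmt.Println(p, err) // ./Zelda II.nes <nil>
}
```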
01/23/2016 at 03:41 #114810 proxycell (Participant)
Hey Steven,
It’s been a long time since I last used your scraper. I hope this thread is the right one for this kind of question:
How would I go about ADDING to this database? I have every fan-translated game there is, and I would love for them to be scraped as the original game.
- The forum ‘Everything else related to the RetroPie Project’ is closed to new topics and replies.