Auto Scraper v0.6
Tagged: scraper
- This topic has 117 replies, 20 voices, and was last updated 8 years, 11 months ago by rafaelr.
10/28/2014 at 18:30 #82131 ceuse (Participant)
Hey mate, I saw you fixed the error with backslashes and image subfolders, although I kind of liked the way it organised things into image subfolders. It would be great if you put in a switch for that (-seperate_folders or something).
Also, another problem GBC ROM and its hashes:
MD5 Checksum: 1F1FB3CF8783F880BC796D667BE60231
SHA-1 Checksum: DD6E952B730C4BD85F8734156D43A2616B68C053
SHA-256 Checksum: 4B9EDBB8BFA01AF9FE525E3B645D396493AF7E5ABAFFD1245D11DE698A23257F
SHA-512 Checksum: 68E26E68D3B76B44579C50CE78C80948C474C995C36170A5FF9E6742A428DE2FF9A3009D64CEFC6A06050FC384FAEC8D1174960439CBA599692A0802C37E7BDF
Generated by MD5 & SHA Checksum Utility @ http://raylin.wordpress.com/downloads/md5-sha-1-checksum-utility
It scrapes this image/game: https://thegamesdb.net/game/238/
but it should actually scrape: https://thegamesdb.net/game/20734/
Hope this info helps (since it shows something other than "not found").
10/29/2014 at 04:06 #82154 sselph (Participant)
I hopefully fixed the GBC entries in the DB. I also added zip support. Right now it is a little dumb in that it searches the files in the zip for the first one with a valid extension and attempts to hash it. It doesn’t look to see if the extension is one that should be zipped (currently only MD).
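A minimal sketch of the zip handling described above: take the first archive member with a known ROM extension and hash it. It is written in Python for illustration, not the scraper's own Go code. The extension list and the function name are assumptions, and SHA-1 is assumed because that is the hash requested later in the thread.

```python
import hashlib
import zipfile
from typing import Optional

# Hypothetical extension list for illustration; the scraper's real list is not shown in this thread.
ROM_EXTENSIONS = (".md", ".gbc", ".sms", ".pce")


def sha1_of_first_rom(zip_path: str) -> Optional[str]:
    """Return the SHA-1 hex digest of the first member with a known ROM extension, or None."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(ROM_EXTENSIONS):
                # Hash the uncompressed ROM bytes, not the zip file itself.
                return hashlib.sha1(archive.read(name)).hexdigest()
    return None
```

As in the post, this version does not check whether the member's extension is one that is expected to be zipped.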
ceuse, I’ll take a look at separating the images into subfolders, and will make sure the script generates the folder structure for you. It originally did this because of a bug in the parsing of paths on Windows, so when I fixed the path issues this side effect went away.
10/31/2014 at 04:41 #82252 sselph (Participant)
While theGamesDB was up today I scraped all the games my script can match and propped up a service to serve images and XML to mimic the API. Now if the DB is down you can use -use_cache to query my service. To save bandwidth I only downloaded the thumbnail-sized images, but it is better than nothing.
10/31/2014 at 17:24 #82268 ceuse (Participant)
Hey mate, tried the newest beta (Sega Master System support).
Sadly nothing scraped (amd64 release used).
Example game:
https://thegamesdb.net/game/2679/
Hash:
MD5 Checksum: 0713F2E55A1EEA0D9E2FB7044740261B
SHA-1 Checksum: 2A9090ED365E7425CA7A59F87B942C16B376F0A3
SHA-256 Checksum: B9E65FF66D3F82006E9A3D0DDF98DC2525960903D7F8F557CE81185F28BCD9E2
SHA-512 Checksum: E7F178C80224F87066F756B4C7777F69433AF1B9EDDA86BF9C6A4BC676758906523E382896CB8F9107A0A99E1D63AE4C3336FB3BDB5B16BBB097C96AFD1FA6D1
Generated by MD5 & SHA Checksum Utility @ http://raylin.wordpress.com/downloads/md5-sha-1-checksum-utility
Hope my posts help you out. I don’t mean to annoy you, just trying to help :)
10/31/2014 at 18:14 #82272 sselph (Participant)
Haha oops. I forgot to actually push the new CSV file. It should work now. If you’ve run the script in the past 30m, just rm /tmp/hash.csv to clear the cached copy of the hashes.
The feedback definitely helps.
10/31/2014 at 18:25 #82273 ceuse (Participant)
Works great now :-)
BTW, did you edit anything in the GBC hash.csv yet? And how about that subfolder switch now? ;-)
You’re awesome mate. Slowly but surely my collection gets imaged :)
11/01/2014 at 01:54 #82285 sselph (Participant)
Yeah, the GBC issues should be much better. I think there was ~60% coverage in theGamesDB. I also just added your new flag, -nested_img_dir, which tells it to create the nested subdirectories under images; it should create them all for you.
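A rough sketch of what a nested image layout could look like: mirror each ROM's subfolder under the images directory and create any missing folders. This is a Python illustration only. The function name, the `.jpg` suffix, and the exact layout are assumptions, not the behaviour of -nested_img_dir as implemented.

```python
from pathlib import Path


def nested_image_path(rom_dir: str, rom_path: str, image_dir: str, suffix: str = ".jpg") -> Path:
    """Map a ROM path to an image path that mirrors the ROM's subfolders (hypothetical layout)."""
    # Position of the ROM relative to the ROM root, e.g. usa/Tetris (U).gbc
    relative = Path(rom_path).relative_to(rom_dir)
    target = Path(image_dir) / relative.with_suffix(suffix)
    # Create the nested folders so the image can be written straight away.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```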
11/01/2014 at 10:06 #82289 ceuse (Participant)
I love you *blush* Just saw the latest commits, can’t wait for the next release :-)
11/01/2014 at 21:42 #82317 ceuse (Participant)
Tried the new version :-) 32X and GG worked great. TurboGrafx-16 seems to have an awfully low find rate.
Another example:
https://thegamesdb.net/game/23253/
Hash from my ROM file:
MD5 Checksum: 628F051FD0F90EB422664A1AAA53670B
SHA-1 Checksum: 37FD9288404B749739DB1CA228EEE220932040FB
SHA-256 Checksum: 9B074EEBF105E34F18735D54A5EC345AC3418E819C7833C7B0FAB8F1C23B4078
SHA-512 Checksum: DCBD7249EEA90532FD9F95CC7E7EEF299A9F74C5DB8FB5D1850C62408986C19612C2F3983493B0262C84075C19CFBDF0D16811BFEC5219FA43BCADB703805BBE
Generated by MD5 & SHA Checksum Utility @ http://raylin.wordpress.com/downloads/md5-sha-1-checksum-utility
Hope it helps. Keep up the good work (I love the nested flag :-)
Edit:
I think I broke my installation :-( The complete UI is broken; I can’t even get into the menu anymore, everything is blank. I think I will reinstall and copy everything over again. It all started with white ROM pictures and got worse from there. Has anyone had this problem before?
11/02/2014 at 01:07 #82327 sselph (Participant)
Yeah, TurboGrafx-16 had a low hit rate. Of the hashes I did have, ~50% matched, and it appears the hashes from No-Intro aren’t complete. The hash you have doesn’t match the hash I have for that game. If you want, you can send me all the SHA-1 hashes along with the file names for your collection and I can update my DB. Running shopt -s globstar && shasum **/*.pce on a Linux machine will do the trick easily enough, or if the tool you are using can produce similar output, that would work. You can send it to me in a PM or email.
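For anyone without a Linux shell handy, a small Python sketch that produces a similar hash-plus-filename listing for every .pce file under a folder. The function name is an assumption; the output imitates the usual `shasum` layout of digest, two spaces, then path.

```python
import hashlib
from pathlib import Path
from typing import List


def sha1_listing(root: str, pattern: str = "*.pce") -> List[str]:
    """Return 'digest  relative/path' lines for all files matching pattern under root."""
    lines = []
    # rglob searches root and all of its subfolders, like **/ with globstar enabled.
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(root)}")
    return lines
```

Calling `print("\n".join(sha1_listing("roms/pcengine")))` with your own ROM folder (the path here is only an example) gives text that can be pasted into a PM or email.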
11/02/2014 at 12:19 #82339 ceuse (Participant)
A binary update fixed my installation :-) I’m slowly getting the hang of Linux.
Also sent you the hash file via PM.
Edit: OK, as soon as I re-add the TurboGrafx games and gamelist, the whole of EmulationStation gets fucked up.
After I removed them again it’s still messed up (I notice it from a lot of [][][][] in the "sort by" options in the gamelist, and also quite a few white pictures in the gamelist. Strange stuff). Going to update the binaries again and stay away from TurboGrafx games for a while.
11/02/2014 at 15:14 #82347 sselph (Participant)
Odd. Is this only with the gamelist.xml there? If so, maybe there is something about certain file names or the downloaded data that I’m not escaping correctly and/or ES isn’t properly handling. I’ll try to reconstruct your gamelist.xml from the information you sent and see if I spot something that might cause that. I know ES doesn’t like Unicode, so I may double-check and make sure to remove anything that can’t be encoded in ASCII.
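One common way to strip text down to ASCII, sketched in Python as an illustration (the scraper itself is written in Go and may do this differently). Accented letters are first decomposed so the base letter survives, then anything still outside ASCII is dropped. The function name is an assumption.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Drop everything that cannot be encoded as ASCII, keeping base letters of accented characters."""
    # NFKD splits e.g. "é" into "e" plus a combining accent, which the encode step then discards.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

For example, `to_ascii("Pokémon")` yields `"Pokemon"`. Characters with no ASCII base, such as Japanese titles, disappear entirely, so a fallback to the file name may be needed.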
11/02/2014 at 15:29 #82348 ceuse (Participant)
Unicode could be a thing. I probably need to check the gamelist.xml files I edited manually, in case they got encoded wrong. Is UTF-8 without BOM the right encoding, or is ANSI better?
Edit: the first Google result said ANSI would be better. That was probably my problem, since I also edited the gamelist.xml manually. Well, first EmulationStation needs to finish compiling ^^
Edit 2: it also seems to be a memory issue that appeared when I added another system to the GUI. I’ll have to reallocate memory, according to friend Google.
Edit 3: yep, it was the number of systems. 12 work fine; when I added 2 more it broke.
11/07/2014 at 17:00 #82493 ceuse (Participant)
Sooo, how about Atari 2600 and GBA support? :-)
11/07/2014 at 17:46 #82494 sselph (Participant)
I thought I had GBA support already but never really tested it, so maybe there are issues. I can look into the Atari support. I was looking at what it would take to add MAME, but with something like 28000 games it might be easier to add Atari.
11/08/2014 at 19:40 #82525 ceuse (Participant)
Me neither, I didn’t know GBA worked as well since it wasn’t described anywhere :-) It works quite well :-) And I ask about Atari because I’ve got a lot of ROMs for that system ;-)
11/12/2014 at 02:56 #82644 sselph (Participant)
I added some Atari 2600 hashes to the mix. The coverage wasn’t as good as I would’ve liked. I’m in the process of refactoring so I can pull in other data sources, which will hopefully fill in some of the gaps with some basic data. Sorry it took a little longer than normal; I don’t have as much time these days.
11/15/2014 at 02:18 #82744 sselph (Participant)
Added OpenVGDB as an alternative data source. It should hopefully fill in some of the gaps in theGamesDB’s DB. It will use this DB when it can’t find data in my original hash.csv lookup. I’ll get back to trying to add MAME now.
11/15/2014 at 10:56 #82757 ceuse (Participant)
Thanks for the Atari code :-) OpenVGDB seems rather bad though. I checked with Atari 2600 and got no images and mostly wrong data. I don’t know if Atari 2600 is just bad there or OpenVGDB is just bad in general.
11/15/2014 at 13:38 #82762 sselph (Participant)
That is unfortunate because it looked like it could be a good alternative. The stuff I looked at was okay, but I didn’t look at Atari. I wanted to start trying to get basic information even if it was not as complete, but the last thing I want is something being incorrectly identified. I can filter out results from OpenVGDB that don’t have a description and images, which might help. I can also switch the -use_ovgdb flag to default to false, but in the meantime you can pass -use_ovgdb=false to disable it.
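The proposed filter could look roughly like the Python sketch below. It reads "don't have a description and images" as requiring both fields to be present, which is one interpretation of the post. The record shape and the key names `description` and `image` are assumptions, not OpenVGDB's real schema.

```python
from typing import Dict, Iterable, List


def usable_results(results: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only records that have both a non-empty description and a non-empty image URL."""
    # Hypothetical keys; a missing key is treated the same as an empty value.
    return [r for r in results if r.get("description") and r.get("image")]
```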
11/21/2014 at 23:55 #82960 ceuse (Participant)
Sooooo… how about even more systems? :-) Atari 5200 and 7800? Or whatever else is missing :-)
11/29/2014 at 17:36 #83207 ceuse (Participant)
OK, I tried the scraper again today and found some strange errors while using it:
Any idea what this is?
Edit: OK, I just think Atari 2600 is generally broken. A lot of double scrapes, completely wrong scrapes, and I even got a PS4 picture scraped. Strange stuff.
12/31/2014 at 00:35 #84310 imsuperduckie (Participant)
Tried out both version 54 and 53 and am getting the following errors. Can someone help?
2014/12/30 13:47:15 ERR: error processing 2020 Super Baseball (U).smc: image: unknown format
12/31/2014 at 02:59 #84317 sselph (Participant)
This error comes from Go trying to detect the format of one of the images it downloaded and failing to match it to JPEG or PNG. If the error is happening consistently, maybe there is something odd with the image or thumbnail on thegamesdb; you could try -use_cache to use my copy of the images. Maybe Google’s caching service will have fixed the issue.
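To see what "unknown format" means in practice, here is a minimal Python sketch of format sniffing by file signature. It illustrates the general technique, not the scraper's Go code. The function name is an assumption; the signatures are the standard PNG and JPEG magic bytes.

```python
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG start-of-image marker


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return 'png' or 'jpeg' based on the leading bytes, or None if neither matches."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None
```

A download that returns an HTML error page or a truncated file instead of image data would give `None` here, which is the kind of situation that produces an unknown-format error.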
12/31/2014 at 05:05 #84323 techstep (Participant)
Doesn’t scrape Atari ST. The hash updates and then it just stops.
12/31/2014 at 05:07 #84324 techstep (Participant)
Atari 2600 scraped perfectly for me. 375 games, all with pictures and descriptions.
12/31/2014 at 16:11 #84340 nolageek (Participant)
Running it now, can’t wait! :)
Question: do we have to have the scraper file in each of the ROM directories, or can we just keep it in /roms/ and then run ‘scraper snes’ to process the snes directory?
Edit: I see this is already an option – I should have done ‘./scraper -?’ before posting! :)
12/31/2014 at 16:37 #84342 nolageek (Participant)
SNES seemed to work really well; Atari 2600, not so much. Probably 80%-90% failed to process the .bin file, and almost all had “hash not found” errors. NES is having a few errors with hashes not being found.
Do these mean there’s an issue with the ROM file?
12/31/2014 at 17:50 #84353 sselph (Participant)
I originally designed this to build the gamelist.xml on my desktop and then copy everything over to the Pi, so I added options to give the directories on the local and remote systems, but the default, to make things easy, was to copy it to the ROM dir. It would be possible to add an option to detect if it is running on a RetroPie installation to be smarter about directories.
The hash not found means that it hashed the ROM file and didn’t find a match in the list I compiled. This means either the hash wasn’t part of the No-Intro set, which only has known good ROM hashes, or I didn’t find the game on thegamesdb. With Atari the coverage is not great; I only found ~50% of the No-Intro ROMs in thegamesdb.
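In outline, the lookup that leads to "hash not found" can be sketched as below. This is a Python illustration under stated assumptions: the real hash.csv column layout is not given in this thread, so a two-column `sha1,game_id` layout is assumed, and both function names are made up for the example.

```python
import csv
import hashlib
import io
from typing import Dict, Optional


def load_hashes(csv_text: str) -> Dict[str, str]:
    """Parse rows of 'sha1,game_id' (assumed layout) into a lookup table keyed by lowercase hash."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            table[row[0].strip().lower()] = row[1].strip()
    return table


def lookup_game_id(rom_bytes: bytes, table: Dict[str, str]) -> Optional[str]:
    """Hash the ROM and look it up; None corresponds to a 'hash not found' result."""
    digest = hashlib.sha1(rom_bytes).hexdigest()
    return table.get(digest)
```

A single changed byte (a different dump, a header, or a hack) gives a completely different digest, so a miss does not by itself mean the ROM is broken; it only means that exact file is not in the list.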
12/31/2014 at 20:36 #84367 nolageek (Participant)
This script has been a game changer for me (almost literally!), thanks so much!
One request I have would be a way to include our own database file somehow. I have quite a few homebrew games that I’ve downloaded, and I’d like those not to be overwritten if I have to run this again, since I have to add them by hand.
12/31/2014 at 21:36 #84368 sselph (Participant)
I’m glad it helped. For the DB, the simple solution might be to just provide a base XML file that would be appended with missing information. The other option is to have you provide a LevelDB- or SQLite-type DB with the hash and information, but that might be a lot of overhead.
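The "base XML file" idea could work along these lines: keep every hand-written entry and only append scraped games whose path is not already listed. This is a Python sketch of one possible merge, not a feature the scraper is confirmed to have. It assumes the usual EmulationStation layout of `<game>` elements with a `<path>` child under a `<gameList>` root, and the function name is made up.

```python
import xml.etree.ElementTree as ET


def merge_gamelists(base_xml: str, scraped_xml: str) -> str:
    """Append scraped <game> entries to the base list unless their <path> is already present."""
    base = ET.fromstring(base_xml)
    known_paths = {game.findtext("path") for game in base.findall("game")}
    for game in ET.fromstring(scraped_xml).findall("game"):
        path = game.findtext("path")
        if path not in known_paths:
            base.append(game)
            known_paths.add(path)
    # Hand-edited entries in the base file are never modified.
    return ET.tostring(base, encoding="unicode")
```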
01/02/2015 at 21:16 #84436 proxycell (Participant)
If I combined -use_cache, -use_gdb and -use_ovgdb together, what would the process of the app be?
And would using -skip_check still use the GDB but just not check first?
01/02/2015 at 22:06 #84438 sselph (Participant)
-use_cache affects -use_gdb in that it doesn’t actually download data from the GDB but uses my cached version of the GDB data. -use_cache with -use_gdb=false doesn’t change anything. -use_ovgdb and -use_gdb together will check the GDB first, then fall back to OpenVGDB if there isn’t a match in my hashes. So all three together would use my cached version of the GDB if I have a matching hash in my DB, then fall back to OpenVGDB if there isn’t a match.
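The flag interaction described above, written out as a small Python sketch. The three lookup callables are hypothetical stand-ins for the live GDB, the cached GDB copy, and OpenVGDB; only the ordering and fallback are taken from the post.

```python
from typing import Callable, Dict, Optional

Lookup = Callable[[str], Optional[Dict[str, str]]]


def find_game(sha1: str, use_gdb: bool, use_cache: bool, use_ovgdb: bool,
              gdb_live: Lookup, gdb_cache: Lookup, ovgdb: Lookup) -> Optional[Dict[str, str]]:
    """Try the GDB first (live or cached), then fall back to OpenVGDB if enabled."""
    if use_gdb:
        # use_cache only changes where GDB data comes from; it has no effect without use_gdb.
        source = gdb_cache if use_cache else gdb_live
        result = source(sha1)
        if result is not None:
            return result
    if use_ovgdb:
        return ovgdb(sha1)
    return None
```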
-skip_check
I just noticed some issues with the GDB check, but how it is supposed to work is: if you are using the GDB and not using the cache, I try to determine whether the GDB is up first by fetching the game with id=1. I do this so I can give a nice-looking error up front. The -skip_check flag is there in case there are issues with fetching this game but the user knows for sure that the GDB is otherwise working. Right now the GDB check runs even if you aren’t using the GDB; I’ll fix that.
01/06/2015 at 01:16 #84655 imsuperduckie (Participant)
Thanks, the -use_cache works perfectly.
Any chance you have the scraper built for MAME yet? If not, what is the preferred method for MAME on the Pi? I’ve tried the built-in scraper and, like everyone else mentioned, it’s pretty much crap. Not to mention, I only got descriptions and blank/black thumbnails.
Any advice is appreciated. Thanks!
01/06/2015 at 02:00 #84657 sselph (Participant)
I’ve been working on MAME but it has been slow going for a few reasons. We had our first child a few months ago, which reduced the amount of free time; I’m not familiar with MAME; the process and data sources are different from what I have now; and there are just so many titles. I plan to add a MAME/FBA mode that switches from hash matching to name matching and uses a different source of data for the images.
Floob has a video on getting MAME/FBA gamelists built:
- The forum ‘Everything else related to the RetroPie Project’ is closed to new topics and replies.