Auto Scraper v0.6
Tagged: scraper
- This topic has 117 replies, 20 voices, and was last updated 8 years, 9 months ago by rafaelr.
01/06/2015 at 17:52 #84677sur0xParticipant
The MAME/FBA scraper from Floob works perfectly; the only drawback is that it doesn’t really scrape the metadata, so no description, info, rating, etc. :(
01/06/2015 at 19:34 #84688imsuperduckieParticipantThanks, will check it out. Congrats on your newborn!
01/06/2015 at 20:32 #84693petrockblogKeymastercongrats on the new arrival :)
01/20/2015 at 07:56 #85464sselphParticipantThanks!
Some good news. I was able to rewrite the core functionality of my scraper in C++ and get it merged into EmulationStation, so it will eventually be an option in the list of scrapers.
While I was there I noticed a pull request to add a MAME scraper, so I ported that over to Go and added it to my scraper; you can access it with the -mame flag. It should get name, image, player count, rating, developer, genre, and date. I don’t have any MAME ROMs so I haven’t fully tested it.
01/20/2015 at 21:20 #85510FloobMemberThat’s great news! It will be good to see it part of EmulationStation.
The update to your scraper to support MAME is amazing! It works very well.
Would a future update be possible to get the description as well?
01/20/2015 at 22:07 #85519sselphParticipantThe data is coming from mamedb.com and there doesn’t seem to be a description. If there is another source that is indexed by filename and has the description, let me know and I’ll see if I can pull it in.
01/20/2015 at 22:21 #85521RooParticipantMAME’s history.dat is kind of the de facto standard for that information.
http://www.arcade-history.com/
Not sure where to get an older version, since some ROM names have changed…
01/20/2015 at 22:48 #85523FloobMemberIs there a way of parsing data from this file?
https://code.google.com/p/romcollectionbrowser/wiki/HowToAddMAMEOffline
Or a way to grab the description here (URL has filename):
http://caesar.logiqx.com/php/history.php?id=bloodbro
01/20/2015 at 23:11 #85527RooParticipantI’m no programmer. That said, MAMEUI and other front ends and emulators use history.dat to provide info in the GUI. So it can’t be too hard.
It looks to me like the format is:
$info=romname, $bio blaa blaa blaa history here $end
So search the file for the rom name and just grab the bio section, right? It may sound like I’m being a smart-ass but I promise you I’m not :) Just trying to be helpful
01/21/2015 at 02:46 #85547sselphParticipantParsing the history file doesn’t look too difficult. Find $info=<name>\n$bio, then grab text until I get to a – UPPERCASEWORD – line or an $end. That appears to be the description portion. I’m clueless when it comes to MAME. The data I’m pulling from mamedb says it is 1.47, and I would’ve assumed 1.57 was just a more complete version of 1.47, but you are saying that some files were renamed. This file I found seems to show the renaming: http://www.progettosnaps.net/renameset/pS_renameSET.txt but it seems like they are shuffling things around with every version.
What version is RetroPie supposed to be using? From there I could verify that the files are correctly named based on hashes of the internal files, then go through the renaming to get them to 1.47 for the mamedb data and to 1.57 for the history.
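The parsing approach described above can be sketched roughly as follows. This is only an illustration of the idea, not the scraper's actual code: the function name `parse_history`, the section-heading pattern, and the sample text are all assumptions made for the example, and a real history.dat may have quirks this does not handle.

```python
import re

# Assumed shape of a section heading such as "- TECHNICAL -".
# Both a plain hyphen and an en dash are accepted.
SECTION_RE = re.compile(r"^[-\u2013] [A-Z][A-Z0-9 ]*[-\u2013]$")


def parse_history(text):
    """Map each ROM name to the description part of its $bio block."""
    descriptions = {}
    names = []       # ROM names from the most recent $info= line
    lines = None     # collected description lines, None outside a $bio block
    in_desc = False  # False once the first section heading has been seen
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("$info="):
            # One entry can list several comma-separated ROM names.
            names = [n for n in line[len("$info="):].split(",") if n]
        elif line == "$bio":
            lines, in_desc = [], True
        elif line == "$end":
            if lines is not None:
                desc = "\n".join(lines).strip()
                for name in names:
                    descriptions[name] = desc
            names, lines, in_desc = [], None, False
        elif lines is not None and in_desc:
            if SECTION_RE.match(line):
                in_desc = False  # description ends at the first heading
            else:
                lines.append(line)
    return descriptions


# Made-up sample entry, only to show the expected layout.
SAMPLE = """$info=bloodbro,
$bio

Placeholder description text.

- TECHNICAL -

Placeholder technical notes.

$end
"""

print(parse_history(SAMPLE)["bloodbro"])
```

Looking up a description is then a dictionary access by ROM name, which is why the renaming between MAME versions matters: a renamed ROM would simply miss.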
01/21/2015 at 05:10 #85550proxycellParticipantHey Steven, I once again wanted to heap praise upon you for not only creating this amazing tool but implementing so many of the suggestions that have been put forth.
Now I have yet another suggestion. I’m never certain how much work these require of you, but please consider adding Vectrex support. I’m a fan of any controller-based console that lets me exit back to ES lol…
01/21/2015 at 06:07 #85552RooParticipant[quote=85547]
What version is RetroPie supposed to be using? From there I could verify that the files are correctly named based on hashes of the internal files, then go through the renaming to get them to 1.47 for the mamedb data and to 1.57 for the history.
[/quote]
RetroPie’s (v2.4x beta, not sure about earlier versions) default MAME emulator is MAME4ALL, which is a fork from MAME v0.37beta5. Which is [b]old[/b] :) Like circa 2000 old.
Not sure why this was the magical spot they forked from, but in general the older version performs better on the Pi.
01/23/2015 at 03:27 #85749sselphParticipant[quote=85550]please consider adding in Vectrex support[/quote]
Adding new systems isn’t too difficult, especially when there are only 30ish games. The issue with this platform is that thegamesdb doesn’t contain the platform or the games. If you get them added to the DB I’ll happily get the hashes and IDs added to my mapping.
[quote=85552]RetroPie’s (v2.4x beta, not sure about earlier versions) default MAME emulator is MAME4ALL, which is a fork from MAME v0.37beta5.[/quote]
Thanks. When I get a moment I’ll see if I can do a more accurate job of scraping the MAME games for RetroPie.
01/24/2015 at 01:47 #85831FloobMemberI made a quick MAME based update here:
02/25/2015 at 21:47 #89136ceuseParticipantI have another request. I messed up my manually modified gamelist because I forgot that the tool creates a completely new list :-(
Could you build in an optional switch, e.g. "Found something different in the gamelist already. Do you want to overwrite it?"
Also, I would love to have a way to manually create a gamelist with thegamesdb IDs (somewhat like a batch program: manuscrape.exe -attend -path .\gamelist.xml -imgdir ".\imgs" -thegamesdbid 1255) that would add the completely scraped data for that ID to the text file (for faster manual adding of missing stuff).
02/26/2015 at 14:19 #89268zbh23ParticipantI’m not sure if you guys have seen this:
But a co-worker and I got this working brilliantly on a rpi B+
02/27/2015 at 00:10 #89372sselphParticipantceuse:
I’ve got an open issue on GitHub to allow appending to an existing gamelist, and if I get that implemented it will have options on how to handle conflicts.
I have a tool that will report missing ROM data and create a CSV. If I modified that slightly to allow you to add your own ID information, that could be used as input to the script to fetch the missing data for those files. It has the side effect of allowing you to send it to me so that I could improve my DB in the future.
zbh23:
Nice, I hadn’t seen that. I wrote this because the older version of that script and the ES scraper weren’t very good unless you manually chose everything. Maybe people can use it to find things my scraper doesn’t know about.
02/27/2015 at 21:55 #89491killer101ParticipantJust tried your scraper. Very good work, saves quite some time.
There is one function that I miss. It would be nice to have an option to “fake images”!
To explain …
I have all the screenshots I need right at hand and don’t want to use the ones provided by whoever. A function to add the image information to the gamelist file whether or not it is found in a database would be great!
02/27/2015 at 22:59 #89495sselphParticipantThe scraper will skip the image download if it sees a file named the same as the one it would save. So if you have a ROM roms/nes/rom.nes and an image named rom.jpg, you could place the images in roms/nes/images and add the flags -image_suffix="" -no_thumb
The suffix defaults to "-image" (rom-image.jpg), which I think I copied from ES’s scraper. -no_thumb says to skip the thumbnail, which isn’t used by ES; I just include it since it is part of the gamelist spec. I always convert the image to JPG and don’t include an option to change that right now, so if you have PNGs you could convert them, or I can expose a flag to choose the image format.
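As a rough illustration of the naming rule described above, the sketch below derives the image path a ROM would map to and checks whether a download could be skipped. The helper names `image_path` and `should_download` are hypothetical, and the "images" folder next to the ROMs is taken from the example in the post rather than from the scraper's source.

```python
from pathlib import Path


def image_path(rom_path, image_dir="images", suffix="-image"):
    """Return the .jpg path an image for this ROM would be saved under.

    Assumes images live in an `image_dir` folder next to the ROM and are
    always stored as .jpg, as described in the post.
    """
    rom = Path(rom_path)
    return rom.parent / image_dir / f"{rom.stem}{suffix}.jpg"


def should_download(rom_path, image_dir="images", suffix="-image"):
    """Skip the download when a file with the target name already exists."""
    return not image_path(rom_path, image_dir, suffix).is_file()


# Default suffix, then the empty suffix used for pre-existing screenshots.
print(image_path("roms/nes/rom.nes").as_posix())
print(image_path("roms/nes/rom.nes", suffix="").as_posix())
```

With the empty suffix, a screenshot named after the ROM itself is treated as already downloaded, which is the effect the -image_suffix="" flag is used for here.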
02/28/2015 at 00:54 #89501killer101ParticipantExcellent, thanks!
02/28/2015 at 02:05 #89513killer101ParticipantI found another slight issue.
I have images which the scraper doesn’t find online. So there is no image entry in the gamelist created and ES can’t display the images I already have.
02/28/2015 at 02:41 #89520sselphParticipantAh okay. Let me make a few tweaks to fix that.
02/28/2015 at 03:23 #89527sselphParticipantI added checks to see if the file exists locally even if there isn’t a file on the server. Also added a -download_images flag that you can set to false to force it to only look locally. It is in the process of pushing the new release now.
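A local-only lookup of this kind could look something like the sketch below, which adds an image entry to any game in a gamelist that lacks one when a matching file exists on disk. It is an assumption-laden illustration, not the scraper's code: the function name `add_local_images`, the relative "./images/..." value written into the image element, and the .jpg-only matching are all choices made for the example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def add_local_images(gamelist_xml, rom_dir, image_dir="images", suffix=""):
    """Return gamelist XML with <image> filled in from local .jpg files."""
    root = ET.fromstring(gamelist_xml)
    for game in root.iter("game"):
        path = game.findtext("path")
        if not path or game.findtext("image"):
            continue  # no ROM path to match, or an image is already listed
        stem = Path(path).stem
        candidate = Path(rom_dir) / image_dir / f"{stem}{suffix}.jpg"
        if candidate.is_file():
            image = game.find("image")
            if image is None:
                image = ET.SubElement(game, "image")
            # Hypothetical path style, relative to the ROM folder.
            image.text = f"./{image_dir}/{candidate.name}"
    return ET.tostring(root, encoding="unicode")
```

Games without a matching local file are left untouched, so the function can be run repeatedly as screenshots are added.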
02/28/2015 at 03:32 #89530killer101ParticipantThanks, will give it a try!
02/28/2015 at 16:04 #89563killer101ParticipantTried it, doesn’t really work.
Before scraping I put all my images into the images folder and start scraping with this command …
scraper -add_not_found=true -download_images=false -image_suffix="" -mame -no_thumbs
Gamelist is generated, but no images on not found games.
For example …
I have the screenshot for the game abcop.zip. This is what the gamelist looks like:
<game id="abcop" source="mamedb.com">
  <path>./abcop.zip</path>
  <name>A.B. Cop (World, FD1094 317-0169b) </name>
  <desc></desc>
  <rating>0.833</rating>
  <releasedate>1990</releasedate>
  <developer>Sega</developer>
  <publisher></publisher>
  <genre>Driving / Race (chase view) Bike</genre>
  <players>1</players>
</game>
02/28/2015 at 19:01 #89583sselphParticipantAh sorry, my MAME handling is completely separate from the console handling. I just implemented the console part. I’ll go back in this evening to add similar code to MAME.
02/28/2015 at 19:54 #89599vretroParticipantHello sselph, nice work!
Have you considered adding Amiga to your compatible systems?
There are a few resources online which catalogue Commodore Amiga box art and screen shots.
I would guess you’d have to use a similar technique to how you handle MAME name lookup because of the nature of Amiga adf files.
Useful resources to help, if you consider this:
http://www.exotica.org.uk/wiki/Amiga_Game_Box_Scans (wiki style, box art)
http://hol.abime.net/hol_search.php (large collection of 6616 entries, screenshots, box art)
http://www.lemonamiga.com (3518 entries, screen shots, title screens, some box art)
https://archive.org/details/Commodore_Amiga_TOSEC_2012_04_10 (xml file containing disk titles, meta data and old reviews – perhaps newer versions of these files are available in the same format?)
Thank you for your efforts
03/01/2015 at 13:46 #89704killer101ParticipantI tried to scrape GBA today, but it doesn’t work for me either.
Command:
scraper -add_not_found=true -image_suffix="" -no_thumb
I have this game “007 – Everything or Nothing (UE) (M3) [!].gba” and the corresponding screenshot.
This is what the gamelist looks like:
<game id="" source="">
  <path>./007 – Everything or Nothing (UE) (M3) [!].gba</path>
  <name>007 – Everything or Nothing (USA, Europe) (En,Fr,De)</name>
  <desc></desc>
  <releasedate></releasedate>
  <developer></developer>
  <publisher></publisher>
  <genre></genre>
</game>
Image is still missing.
03/01/2015 at 18:27 #89723sselphParticipantkiller101:
Thanks for testing. I obviously didn’t think this change through all the way. I’ve updated the change so that if something is being written to XML with empty image lines, it will check to see if the images exist and add them. This will hopefully cover all the different options.
vretro:
My initial thought is that Amiga seems difficult. The hash data I’ve found is for the IPF format, not the ADF format, and it doesn’t seem possible to convert from one to the other. The HOL site has the best set of data but nothing is keyed off file name. So I would end up relying on search, similar to the built-in scraper, which would have similar issues of being unreliable. I’ll continue to look into it.
03/01/2015 at 22:16 #89748killer101ParticipantScraped around a bit, works really well now. Thanks!
I stumbled over 2 bugs, I think.
When scraping FBA or NeoGeo with the -mame and the -no_thumb switches, the scraper generates the thumb listings anyway. Scraping MAME with these 2 switches, no problem.
When scraping Megadrive, I get an unexpected EOF error on nearly every file. I looked a bit closer: it seems to occur if the file extension is anything other than .md!
03/03/2015 at 02:37 #89954sselphParticipantI’m assuming these are .SMD since .MGD aren’t accepted by the emulator.
This could occur if the file’s size wasn’t a multiple of 16 kB. The SMD file format breaks the file into blocks of 16 kB, then swaps bytes around so that all the even bytes are at the beginning of the block and the odd bytes at the end. So I read the file in chunks of 16 kB, assuming that is possible since all No-Intro entries except a couple of prototypes have sizes divisible by 16384.
The other possibility is that there is a bug. I’ll read over the code and write some tests. In the meantime, do you mind checking the size of a couple of these to confirm they are indeed divisible by 16384?
03/03/2015 at 13:31 #89985killer101ParticipantI checked about 15 out of 140. All are divisible by 16384. I checked the filesize of the .smd files. All my files are zipped by the way!
03/04/2015 at 04:44 #90094sselphParticipantThink I found the issue. SMD files are supposed to have a 512-byte header, so the size shouldn’t be divisible by 16384. I went ahead and refactored the logic for MD: I now read the file’s content to try to determine the format instead of relying on the extension, and the 512-byte header is optional.
Thanks again for testing this.
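For readers following the SMD discussion, here is a minimal sketch of the block handling with an optional header. It is not the scraper's implementation: the post says the format is detected from the file's content, whereas this example simply uses the file size to guess whether the 512-byte header is present. It also assumes that, counting byte offsets from zero, the first half of each 16 KiB block holds the odd-offset bytes and the second half the even-offset ones; swap the two slice assignments if your files are laid out the other way round. All names are hypothetical.

```python
BLOCK = 16384  # SMD block size in bytes
HEADER = 512   # size of the optional SMD header in bytes


def has_smd_header(size):
    """Guess from the size alone whether a 512-byte header is present."""
    return size % BLOCK == HEADER


def deinterleave_block(block):
    """Undo the interleaving of one block (layout is an assumption)."""
    half = len(block) // 2
    out = bytearray(len(block))
    out[1::2] = block[:half]  # first half goes to odd offsets
    out[0::2] = block[half:]  # second half goes to even offsets
    return bytes(out)


def smd_to_bin(data):
    """Strip the header if there is one and de-interleave every block."""
    if has_smd_header(len(data)):
        data = data[HEADER:]
    if len(data) % BLOCK != 0:
        raise ValueError("payload is not a whole number of 16 KiB blocks")
    return b"".join(
        deinterleave_block(data[i:i + BLOCK])
        for i in range(0, len(data), BLOCK)
    )
```

Under this size-based guess, a file whose length is exactly divisible by 16384 is treated as headerless, which matches the files reported earlier in the thread; reading in 16 KiB chunks while expecting a header is what would run past the end of such a file.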
03/05/2015 at 19:57 #90258AnonymousInactiveIs there any way to scrape just PAL Megadrive box art?
03/05/2015 at 20:15 #90260FloobMember[quote=90258]Is there any way to scrape just PAL Megadrive box art?[/quote]
There are 526 PAL boxart covers available from Emumovies if that helps:
http://emumovies.com/forums/index.php/page/portal
You could then use these local images for the scrape.
- The forum ‘Everything else related to the RetroPie Project’ is closed to new topics and replies.