Auto Scraper v0.6
Tagged: scraper
- This topic has 117 replies, 20 voices, and was last updated 9 years, 1 month ago by
rafaelr.
01/06/2015 at 17:52 #84677
sur0x
ParticipantMAME/FBA scraper from Floob works perfectly; the only drawback is that it doesn’t really scrape the metadata, so no description, info, rating, etc. :(
01/06/2015 at 19:34 #84688imsuperduckie
ParticipantThanks, will check it out. Congrats on your newborn!
01/06/2015 at 20:32 #84693petrockblog
Keymastercongrats on the new arrival :)
01/20/2015 at 07:56 #85464sselph
ParticipantThanks!
Some good news. I was able to rewrite the core functionality of my scraper in C++ and get it merged into EmulationStation, so it will eventually be an option in the list of scrapers.
While I was there I noticed a pull request to add a MAME scraper, so I ported that over to Go and added it to my scraper; you can access it with the -mame flag. It should get name, image, player count, rating, developer, genre, and date. I don’t have any MAME ROMs, so I haven’t fully tested it.
01/20/2015 at 21:20 #85510Floob
MemberThat’s great news! It will be good to see it as part of EmulationStation.
The update to your scraper to support MAME is amazing! It works very well.
Would a future update be possible to get the description as well?
01/20/2015 at 22:07 #85519sselph
ParticipantThe data is coming from mamedb.com and there doesn’t seem to be a description. If there is another source that is indexed based on filename that has the description, let me know and I’ll see if I can pull it in.
01/20/2015 at 22:21 #85521Roo
ParticipantMAME’s history.dat is kind of the de facto standard for that information.
http://www.arcade-history.com/
Not sure where to get an older version, since some ROM names have changed…
01/20/2015 at 22:48 #85523Floob
MemberIs there a way of parsing data from this file?
https://code.google.com/p/romcollectionbrowser/wiki/HowToAddMAMEOffline
Or a way to grab the description here (URL has the filename):
http://caesar.logiqx.com/php/history.php?id=bloodbro
01/20/2015 at 23:11 #85527Roo
ParticipantI’m no programmer. That said, MAMEUI and other front ends and emulators use history.dat to provide info in the GUI. So it can’t be too hard.
It looks to me like the format is:
$info=romname, $bio blaa blaa blaa history here $end
So search the file for the rom name and just grab the bio section, right? It may sound like I’m being a smart-ass but I promise you I’m not :) Just trying to be helpful
01/21/2015 at 02:46 #85547sselph
ParticipantParsing the history file doesn’t look too difficult. Find $info=<name>\n$bio, then grab text until I get to a – UPPERCASEWORD – line or an $end. That appears to be the description portion. I’m clueless when it comes to MAME. The data I’m pulling from mamedb says it is 1.47; I would’ve assumed 1.57 was just a more complete version of 1.47, but you are saying that some files were renamed. This file I found seems to show the renaming: http://www.progettosnaps.net/renameset/pS_renameSET.txt, but it seems like they are shuffling things around with every version.
What version is RetroPie supposed to be using? From there I could verify that the files are correctly named based on hashes of the internal files, then go through the renaming to get them to 1.47 for the mamedb data and to 1.57 for the history.
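A minimal Python sketch of the parsing approach described above, assuming the layout given in this thread: an $info= line with one or more comma-separated ROM names, a $bio line, free text, and a closing $end. The function name find_description and the SAMPLE text are made up for illustration; this is not the scraper's actual code, and a real history.dat may have quirks this does not handle.

```python
import re

# A section header inside a $bio block, e.g. "- TECHNICAL -".
# Assumption: headers are an uppercase word or words between two dashes.
SECTION_RE = re.compile(r"^[-\u2013] [A-Z][A-Z ]*[-\u2013]$")


def find_description(history_text, rom_name):
    """Return the description text for rom_name, or None if not found."""
    state = "search"  # search -> matched -> bio
    desc = []
    for raw in history_text.splitlines():
        line = raw.strip()
        if state == "search":
            if line.startswith("$info="):
                # One entry can list several ROM names separated by commas.
                names = [n.strip() for n in line[len("$info="):].split(",")]
                if rom_name in names:
                    state = "matched"
        elif state == "matched":
            if line == "$bio":
                state = "bio"
            elif line == "$end":
                return None  # entry had no $bio block
        else:  # state == "bio"
            # Stop at the first section header or at the end of the entry.
            if line == "$end" or SECTION_RE.match(line):
                break
            desc.append(line)
    if state != "bio":
        return None
    return "\n".join(desc).strip()


# Made-up sample data in the assumed layout (not real history.dat content).
SAMPLE = """$info=romone,
$bio

First line.

Second line.

- TECHNICAL -

Ignored text.

$end
$info=romtwo,romtwoa,
$bio
blaa blaa
$end
"""

print(find_description(SAMPLE, "romone"))
```

Stopping at the first section header keeps only the opening description, which is the part the gamelist's desc field would hold.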
01/21/2015 at 05:10 #85550proxycell
Participanthey Steven, I once again wanted to heap praise upon you for not only creating this amazing tool but implementing so many of the suggestions that have been put forth
now i have yet another suggestion. i’m never certain how much work these require of you, but please consider adding in Vectrex support – I’m a fan of any controller-based console that lets me exit back to ES lol…
01/21/2015 at 06:07 #85552Roo
Participant[quote=85547]
What version is retropie supposed to be using? From there I could verify that the files are correctly named based on hashes of the internal files then go through the renaming to get them to 1.47 for mamedb data and to 1.57 for the history.
[/quote]
RetroPie’s (v2.4x beta, not sure about earlier versions) default MAME emulator is MAME4ALL, which is a fork from MAME v0.37beta5. Which is [b]old[/b] :) Like circa 2000 old.
Not sure why this is the magical spot they forked from, but in general the older version performs better on the Pi.
01/23/2015 at 03:27 #85749sselph
Participant[quote=85550]please consider adding in Vectrex support[/quote]
Adding new systems isn’t too difficult, especially when there are only 30ish games. The issue with this platform is that thegamesdb doesn’t contain the platform or the games. If you get them added to the DB, I’ll happily get the hashes and IDs added to my mapping.
[quote=85552]RetroPie’s (v2.4x beta, not sure about earlier versions) default MAME emulator is MAME4ALL, which is a fork from MAME v0.37beta5.[/quote]
Thanks. When I get a moment I’ll see if I can do a more accurate job of scraping the MAME games for RetroPie.
01/24/2015 at 01:47 #85831Floob
MemberI made a quick MAME based update here:
02/25/2015 at 21:47 #89136ceuse
ParticipantI have another request. I fucked up my manually modified gamelist because I forgot that the tool creates a completely new list :-(
Could you build in an optional switch, e.g. “found something different in the gamelist already, do you want to overwrite it?”
Also, I would love to have a way to manually create a gamelist with thegamesdb IDs (somewhat like a batch program: manuscrape.exe -attend -path .\gamelist.xml -imgdir ".\imgs" -thegamesdbid 1255) that would add the completely scraped data for that ID to the text file (for faster manual adding of missing stuff).
02/26/2015 at 14:19 #89268zbh23
ParticipantI’m not sure if you guys have seen this:
But a co-worker and I got this working brilliantly on a rpi B+
02/27/2015 at 00:10 #89372sselph
Participantceuse:
I’ve got an open issue on GitHub to allow appending to an existing gamelist, and if I get that implemented it will have options for how to handle conflicts.
I have a tool that will report missing ROM data and create a CSV. If I modified that slightly to allow you to add your own ID information, that could be used as input to the script to fetch the missing data for those files. It has the side effect of allowing you to send it to me so that I could improve my DB in the future.
zbh23:
Nice, I hadn’t seen that. I wrote this because the older version of that script and the ES scraper weren’t very good unless you manually chose everything. Maybe people can use it to find things my scraper doesn’t know about.
02/27/2015 at 21:55 #89491killer101
ParticipantJust tried your scraper. Very good work, saves quite some time.
There is one function that I miss. It would be nice to have an option to “fake images”!
To explain …
I have all the screenshots I need right at hand and don’t want to use the ones provided by whoever. A function to add the image information to the gamelist file whether or not it is found in a database would be great!
02/27/2015 at 22:59 #89495sselph
ParticipantThe scraper will skip the image download if it sees a file named the same as the one it would save, so if you have a ROM roms/nes/rom.nes and an image named rom.jpg, you could place the images in roms/nes/images and add the flags -image_suffix="" -no_thumb
The suffix defaults to "-image" (rom-image.jpg), which I think I copied from ES’s scraper. -no_thumb says to skip the thumbnail, which isn’t used by ES; I just include it since it is part of the gamelist spec. I always convert the image to JPG and don’t include an option to change that right now, so if you have PNGs you could convert them, or I can expose a flag to choose the image format.
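To make the naming rule concrete, here is a small Python sketch of how the expected image file name could be derived from a ROM path. It assumes the images folder sits next to the ROMs (roms/nes/images, as in the example above) and that the output is always .jpg; the function name expected_image_path and its parameters are hypothetical, not the scraper's real API.

```python
import posixpath


def expected_image_path(rom_path, image_suffix="-image", image_dir="images"):
    """Build the image path the scraper would look for, per the rule above.

    Assumption: <rom dir>/<image_dir>/<rom base name><suffix>.jpg
    """
    rom_dir = posixpath.dirname(rom_path)
    # splitext only strips the final extension, so names that contain
    # other dots or brackets keep the rest of their name intact.
    base = posixpath.splitext(posixpath.basename(rom_path))[0]
    return posixpath.join(rom_dir, image_dir, base + image_suffix + ".jpg")


# Default suffix, then the empty suffix set with -image_suffix=""
print(expected_image_path("roms/nes/rom.nes"))
print(expected_image_path("roms/nes/rom.nes", image_suffix=""))
```

With the empty suffix, an existing screenshot only needs the same base name as its ROM to be picked up instead of downloaded.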
02/28/2015 at 00:54 #89501killer101
ParticipantExcellent, thanks!
02/28/2015 at 02:05 #89513killer101
ParticipantI found another slight issue.
I have images which the scraper doesn’t find online. So there is no image entry in the gamelist created and ES can’t display the images I already have.
02/28/2015 at 02:41 #89520sselph
ParticipantAh okay. Let me make a few tweaks to fix that.
02/28/2015 at 03:23 #89527sselph
ParticipantI added checks to see if the file exists locally even if there isn’t a file on the server. Also added a -download_images flag that you can set to false to force it to only look locally. It is in the process of pushing the new release now.
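The decision described above (prefer a local file, otherwise download unless -download_images is false) could be sketched in Python roughly like this. The function name resolve_image and the returned labels are made up for illustration; only the ordering of the checks comes from the post.

```python
import os


def resolve_image(local_path, remote_url, download_images=True,
                  exists=os.path.exists):
    """Decide where a game's image should come from.

    Returns a (action, value) pair:
      ("use_local", path) if the file is already on disk,
      ("download", url)   if the server has one and downloads are allowed,
      ("none", None)      otherwise.
    The exists parameter is injectable so the logic can be tested
    without touching the file system.
    """
    if exists(local_path):
        return ("use_local", local_path)
    if download_images and remote_url:
        return ("download", remote_url)
    return ("none", None)


# Example: nothing on the server, but the screenshot is already local.
print(resolve_image("images/abcop.jpg", None, exists=lambda p: True))
```

Checking the local file first means a game with no server image still gets an image entry in the gamelist.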
02/28/2015 at 03:32 #89530killer101
ParticipantThanks, will give it a try!
02/28/2015 at 16:04 #89563killer101
ParticipantTried it, doesn’t really work.
Before scraping I put all my images into the images folder and started scraping with this command …
scraper -add_not_found=true -download_images=false -image_suffix="" -mame -no_thumbs
Gamelist is generated, but no images on not found games.
For example …
I have the screenshot for the game abcop.zip. This is what the gamelist looks like:
<game id="abcop" source="mamedb.com">
<path>./abcop.zip</path>
<name>A.B. Cop (World, FD1094 317-0169b) </name>
<desc></desc>
<rating>0.833</rating>
<releasedate>1990</releasedate>
<developer>Sega</developer>
<publisher></publisher>
<genre>Driving / Race (chase view) Bike</genre>
<players>1</players>
</game>
02/28/2015 at 19:01 #89583sselph
ParticipantAh sorry, my mame handling is completely separate from the console handling. I just implemented the console part. I’ll go back in this evening to add similar code to mame.
02/28/2015 at 19:54 #89599vretro
ParticipantHello sselph, nice work!
Have you considered adding Amiga to your compatible systems?
There are a few resources online which catalogue Commodore Amiga box art and screen shots.
I would guess you’d have to use a similar technique to how you handle MAME name lookup because of the nature of Amiga adf files.
Useful resources to help, if you consider this:
http://www.exotica.org.uk/wiki/Amiga_Game_Box_Scans (wiki style, box art)
http://hol.abime.net/hol_search.php (large collection of 6616 entries, screenshots, box art)
http://www.lemonamiga.com (3518 entries, screen shots, title screens, some box art)
https://archive.org/details/Commodore_Amiga_TOSEC_2012_04_10 (xml file containing disk titles, meta data and old reviews – perhaps newer versions of these files are available in the same format?)
Thank you for your efforts
03/01/2015 at 13:46 #89704killer101
ParticipantI tried to scrape GBA today, but it doesn’t work for me either.
Command..
scraper -add_not_found=true -image_suffix="" -no_thumb
I have this game “007 – Everything or Nothing (UE) (M3) [!].gba” and the corresponding screenshot.
This is what the gamelist looks like ..
<game id="" source="">
<path>./007 – Everything or Nothing (UE) (M3) [!].gba</path>
<name>007 – Everything or Nothing (USA, Europe) (En,Fr,De)</name>
<desc></desc>
<releasedate></releasedate>
<developer></developer>
<publisher></publisher>
<genre></genre>
</game>
Image is still missing.
03/01/2015 at 18:27 #89723sselph
Participantkiller101:
Thanks for testing. I obviously didn’t think this change through all the way. I’ve updated the change so that if something is being written to XML with empty image lines, it will check to see if they exist and add them. This will hopefully cover all the different options.
vretro:
My initial thought is that Amiga seems difficult. The hash data I’ve found is for the IPF format, not the ADF format, and it doesn’t seem possible to convert from one to the other. The HOL site has the best set of data, but nothing is keyed off the file name, so I would end up relying on search, similar to the built-in scraper, which would have similar issues of being unreliable. I’ll continue to look into it.
03/01/2015 at 22:16 #89748killer101
ParticipantScraped around a bit, works really good now. Thanks!
I stumbled over 2 bugs, I think.
When scraping FBA or NeoGeo with the -mame and the -no_thumb switches, the scraper generates the thumb listings anyway. Scraping MAME with these 2 switches, no problem.
When scraping Megadrive, I get an unexpected EOF error on nearly every file. Looked a bit closer. It seems to occur if the file ending is anything other than .md!
03/03/2015 at 02:37 #89954sselph
ParticipantI’m assuming these are .SMD since .MGD aren’t accepted by the emulator.
This could occur if the file’s size wasn’t a multiple of 16 kB. The SMD file format breaks the file into blocks of 16 kB, then swaps bytes around so that all the even bytes are at the beginning of the block and the odd bytes at the end. So I read the file in chunks of 16 kB, assuming that is possible, since all No-Intro entries except a couple of prototypes have sizes divisible by 16384.
The other possibility is that there is a bug. I’ll read over the code and write some tests. In the meantime, do you mind checking the size of a couple of these to confirm they are indeed divisible by 16384?
03/03/2015 at 13:31 #89985killer101
ParticipantI checked about 15 out of 140. All are divisible by 16384. I checked the filesize of the .smd files. All my files are zipped by the way!
03/04/2015 at 04:44 #90094sselph
ParticipantThink I found the issue. SMD files are supposed to have a 512-byte header, so the size shouldn’t be divisible by 16384. I went ahead and refactored the logic for MD: I now read the file’s content to try to determine the format instead of relying on the extension, and I made the 512-byte header optional.
Thanks again for testing this.
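The size arithmetic behind this fix can be sketched in a few lines of Python. This only covers the "optional 512-byte header" part: a file with the header leaves a remainder of 512 when divided by 16384, a headerless one leaves 0. The function name smd_layout is hypothetical, and the real scraper is described as inspecting the file content as well, which this sketch does not attempt.

```python
BLOCK_SIZE = 16384   # SMD data is stored in 16 kB blocks
HEADER_SIZE = 512    # optional header in front of the blocks


def smd_layout(file_size):
    """Guess from the size alone whether a 512-byte header is present."""
    remainder = file_size % BLOCK_SIZE
    if remainder == HEADER_SIZE and file_size > HEADER_SIZE:
        return "header"      # 512 bytes + whole blocks
    if remainder == 0 and file_size > 0:
        return "no_header"   # whole blocks only
    return "unknown"         # neither layout fits


# A 512-byte header plus 32 blocks, then 32 blocks with no header.
print(smd_layout(HEADER_SIZE + 32 * BLOCK_SIZE))
print(smd_layout(32 * BLOCK_SIZE))
```

For zipped ROMs the size to check is that of the file inside the archive, not of the .zip itself.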
03/05/2015 at 19:57 #90258Anonymous
InactiveIs there any way to scrape just PAL Megadrive boxart?
03/05/2015 at 20:15 #90260Floob
Member[quote=90258]Is there any way to scrape just PAL Megadrive boxart?[/quote]
There are 526 PAL boxart covers available from Emumovies if that helps:
http://emumovies.com/forums/index.php/page/portal
You could then use these local images for the scrape.