User:Keller999/projects/gamepageupdate

I am working on doing several things for all existing game pages:


 * If it's not already in place, copy/paste a cleaned-up Infobox from Wikipedia into the game page
 * Update templates to use the most recent versions, rather than redirected versions
 * Ensure that all sections that need user updates have a template to copy-paste, or a link to further documentation
 * Update pages to conform to the standard game page layout -- Problems, Configuration, Version Compatibility, Testing, Gameplay Videos
 * Remove unused config variables if they are present in the page

Scripts
Hooray for a programming mini-project! I put together a script that takes the Infobox and summary from Wikipedia, then the existing Dolphin Wiki article, and creates a standards-compliant merge of the two. For an example of what this script does, please see Call of Duty: Black Ops/sandbox. Revision 23655 is before, Revision 23677 is after. I did NOT do any additional touchup between these revs, as I wanted to show exactly what the script was doing.

I have updated the page to match what I believe to be the current standard. I intend to start running this script for many pages starting tomorrow evening, so PLEASE review the newest revision of Call of Duty: Black Ops/sandbox to ensure that I'm not missing anything! I would hate to do a bunch of mass changes and then have to go repair them all.

Also, expect this script to have a bug or two until I can get it working and find out the issues with it.

Let me know your thoughts!

--Keller999 13:15, 18 August 2011 (CEST)

Version 1.02 Updated
I started out regenerating every page with the latest Wikipedia information, but quickly realized that this was going to be really slow, and I also didn't think it was right to replace all the hard work that's been done on the game pages. So if a page already has good information, I'm running it through the script without inputting Wikipedia information. Fortunately, the script handles that just fine and still cleans up the article and adds new categories as expected.

I've updated the script twice; it is now at v1.02. Most of the changes are a result of bugs I've hit as I've been processing. At some point, I may see if I can script this process JUST for article cleanup, not for new Wikipedia info. I would have to be extremely careful and go through a lot of testing, but this is definitely an automate-able process.

Did several game page updates tonight, working through Category:Pages using the outdated Testing template. Once I'm through that, I'll see how the other "Pages with..." categories are looking. This one should take me at LEAST several nights, though. =P

--Keller999 12:48, 20 August 2011 (CEST)

 * Some feedback...
 * "Gameplay" is a non-standard section, and should be purged rather than specially handled. Was confused by the handler for "Gameplay Videos"; may still need to watch out for the undesired "Gameplay" section on some pages, though.
   * This should be handled properly. The regex looking for "Gameplay Videos" is /^\ *\={1,4}\ *Gameplay\ {1,}Videos\ *\=*$/i, so the key term Videos is required for it to be detected. Anything in the Gameplay section at the top of the page will get stored in the description, and will be re-used if no Wikipedia data is supplied. I would consider this a manual cleanup need -- I'm erring on the side of KEEPING data with the script, in case we ever automate it.
 * Some pages have a section at their end for outside links (e.g. Wikipedia, Dolphin forum posts); it's not clear that's handled here. That will require some cleanup, as I think different section titles are used on different pages, though I believe it is consistently the last section on the page.
   * Currently anything after Gameplay Videos but before categories/nav will be lumped in with Gameplay Videos. I'm not currently doing any processing on Gameplay Videos at all, so the entry would be preserved exactly as-is. This may need further investigation. Do you have a sample page?
 * Regarding "Platforms" in the infobox, generally we're only interested in platforms supported by Dolphin (i.e. GameCube, Wii, Virtual Console systems). I'd actually prefer to omit this line, as it's confusing for titles on multiple platforms which have multiple pages on the Wiki (though it would be good to interlink the platform page pairs where they exist).
   * The script is purging any platform that is not supported by Dolphin, and then wikifying the ones that remain. If we were to agree on a standard for how pages should be named when they are available on more than one platform, it would be easy to add a 'This page is about the GameCube version of Super Tonka Trucks. You may also be interested in Super Tonka Trucks (Wii).' to the top of the page, a la Wikipedia style. It might also make sense to make disambiguation pages, and link to those. Your thoughts?
 * Auto-categories: I'm not clear why the auto-generated categories need to be specially labeled (is it that we don't trust that they are correct?). I think that may be bad for maintenance in the future.
   * This is a just-in-case. As not all Wikipedia articles are formatted the same, there exists the small chance that nonsense categories could be created. When I'm processing pages, I am checking to make sure that the category links are linking to existing categories, and the indication line lets me know which ones the script did and which ones were existing. Another option would be to automatically add the categories and then manually remove them as they're checked with human eyes. I'm just trying to keep us from having pages in the [[Category: Baseball Sports Games games]] category. =P
 * Kolano 00:45, 21 August 2011 (CEST)
   * Thanks for your feedback, keep em coming. --Keller999 01:10, 21 August 2011 (CEST)
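
To make the "Gameplay" versus "Gameplay Videos" point concrete, here is a small sketch that runs the heading regex quoted above against a few headings. The sample headings and the helper name is_videos_heading are made up for illustration; only the pattern itself comes from the script.

```perl
use strict;
use warnings;

# Same pattern the script uses to detect the "Gameplay Videos" heading.
my $videos_heading = qr/^\ *\={1,4}\ *Gameplay\ {1,}Videos\ *\=*$/i;

# Hypothetical helper: returns 1 if the line is a "Gameplay Videos" heading.
sub is_videos_heading {
    my ($line) = @_;
    return $line =~ $videos_heading ? 1 : 0;
}

# Hypothetical sample headings, for illustration only.
my @samples = (
    '== Gameplay Videos ==',
    '=== gameplay   videos ===',
    '== Gameplay ==',
    '== Videos ==',
);

for my $heading (@samples) {
    my $verdict = is_videos_heading($heading) ? 'videos section' : 'not matched';
    printf "%-28s %s\n", $heading, $verdict;
}
```

A bare "Gameplay" heading is not matched, so its lines stay in whatever section was active before it, which is why such pages still need a manual look.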

dolphinpageupdate.pl
Runs great on my Linux box, uses no special modules. I know for a fact that system('clear') does NOT work in Windows, but you could probably replace it with system('cls') and get the same effect.
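
If anyone wants to try the script on Windows, one possible way to avoid editing every system('clear') call is to pick the command from Perl's built-in $^O variable. This is only a sketch and has not been tried with the script itself; clear_command and clear_screen are hypothetical helper names.

```perl
use strict;
use warnings;

# Pick the screen-clearing command for a given OS name.
# $^O is Perl's built-in OS identifier; it is 'MSWin32' on native Windows.
sub clear_command {
    my ($os) = @_;
    return $os eq 'MSWin32' ? 'cls' : 'clear';
}

# Hypothetical drop-in replacement for the script's system('clear') calls.
sub clear_screen {
    system(clear_command($^O));
}

print "This machine would run: ", clear_command($^O), "\n";
```

Each system('clear') in the script would then become a call to clear_screen().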

Version History
1.0 1.01 1.02 1.02.1
 * Initial release
 * Image is now always parsed, and will set the size based on platforms detected. Defaults to Wii (300px)
 * If no Wikipedia Infobox is supplied, re-use the one from the Dolphin wiki page
 * If no Infobox is supplied at all, generate a generic one
 * Better supports not providing information (for example, if you don't supply a Wikipedia entry, the description from the Dolphin page will be reused)
 * Added shortcut to just generate a generic template
 * input and platforms are now preferred from the original article, if they exist
 * added some more regex-magic to clean up formatting I found in wikipedia articles
 * whenever the script reads in Infobox params, it now picks them apart and recompiles them so that they always look the same. About the only things NOT being recompiled now are the Problems and Video sections.
 * instead of using the existing article's Infobox as-is, we now treat it like it came from Wikipedia so that it gets the same processing
 * The "Automatic Categories" note now only shows up if automatic categories are, in fact, generated
 * Virtual Console handling in place. Unfortunately, the script strips out the platform being emulated -- need to work on this.
 * Categories are now always capitalized
 * Mention of any of our supported systems in description text is now automatically Wiki-fied
 * Aligned format shown in testing section comment
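
As an illustration of the "pick apart and recompile" idea for Infobox params mentioned in the list above, here is a minimal sketch. It uses a slightly stricter pattern than the one in the script (it tolerates spaces after the pipe and trims trailing whitespace), so treat it as a variant rather than the script's exact behavior. The helper name normalize_param and the sample lines are made up.

```perl
use strict;
use warnings;

# Rewrite an Infobox parameter line as "|name = value".
# Lines that are not parameters are returned unchanged.
sub normalize_param {
    my ($line) = @_;
    if ($line =~ /^\|\s*([^=\s]+)\s*=\s*(.*?)\s*$/) {
        return "|$1 = $2";
    }
    return $line;
}

# Hypothetical input lines, for illustration only.
my @lines = (
    '|developer=Example Studio',
    '| genre   =  Racing ',
    '}}',
);

print normalize_param($_), "\n" for @lines;
```

Because every parameter goes through the same rewrite, lines from Wikipedia and from the existing Dolphin page end up in one consistent format.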


#!/usr/bin/perl

my (@wikipedia, @originalArticle);
my (@infoboxSection, @descriptionSection, @problemSection, @configurationSection,
    @versionCompatSection, @testingSection, @videoSection, @categorySection);
my @finalResult;

$imageSize = 300; # Assuming Wii by default

my ($image, $savedSizeLine, $savedInputLine, $savedPlatformsLine);

#########
# INPUT #
#########
system('clear');

print "*********************************\n";
print "* Dolphin Wiki Page Update v1.02 *\n";
print "*********************************\n\n";

print "First, copy and paste the game's Wikipedia article from the top to wherever you'd like to end the new description. Zero input is fine.  Enter \'-1\' to indicate the end, or -2 to just get a blank template.\n\n";

# Wikipedia input
while (($line ne "-1") and ($line ne "-2")) {
    $line = <STDIN>;
    chomp($line);

    if (($line ne "-1") and ($line ne "") and ($line ne "-2")) {
        push (@wikipedia,$line);
    }
}

if ($line eq "-2") {
    # Blank template requested: skip the Dolphin page prompt
    $line = "-1";
}
else {
    $line = "";
}

system('clear');

print "*********************************\n";
print "* Dolphin Wiki Page Update v1.02 *\n";
print "*********************************\n\n";

print "Now, copy and paste the existing Dolphin wiki article to import existing information. Zero input is fine.  Enter \'-1\' to indicate the end.\n\n";

# Dolphin page input
while ($line ne "-1") {
    $line = <STDIN>;
    chomp($line);

    if (($line ne "-1") and ($line ne "")) {
        push (@originalArticle,$line);
    }
}


###############################
# EXISTING ARTICLE PROCESSING #
###############################

# (@infoboxSection, @descriptionSection, @problemSection, @configurationSection, @versionCompatSection, @testingSection, @videoSection, @categorySection);
$currentSection = "none";
$foundDolphinInfobox = 0;

foreach (@originalArticle) {
    $newLine = $_;
    $checkLine = $_;

    # Work out which section this line starts, if any
    if    ($newLine =~ /^\{\{\ *Infobox.*/i) { $currentSection = "infobox"; $foundDolphinInfobox = 1; }
    elsif ($newLine =~ /^\ *\={1,4}\ *Problems\ *\=*$/i) { $currentSection = "problems"; }
    elsif ($newLine =~ /^\ *\={1,4}\ *Configuration\ *\=*$/i) { $currentSection = "configuration"; }
    elsif ($newLine =~ /^\ *\={1,4}\ *Version Compatibility\ *\=*$/i) { $currentSection = "versionCompat"; }
    elsif ($newLine =~ /^\ *\={1,4}\ *Testing\ *\=*$/i) { $currentSection = "testing"; }
    elsif ($newLine =~ /^\ *\={1,4}\ *Gameplay\ {1,}Videos\ *\=*$/i) { $currentSection = "videos"; }
    elsif ($newLine =~ /^\ *\[\[Category\:.*\]\]$/i) { $currentSection = "categories"; }
    elsif ($newLine =~ /^\ *\{\{Navigation\ .*\}\}$/i) { $currentSection = "categories"; }

    if ($currentSection eq "infobox") {
        if ($newLine =~ /^\{\{\ *Infobox.*/i) {
            # This is the start of the Infobox, ignore
        }
        elsif ($newLine =~ /^\}\}$/) {
            # This is the end of the Infobox, change section and ignore
            $currentSection = "description";
        }
        else {
            # We want to keep anything else in the infobox. This is in case the user
            # didn't give us a Wikipedia article to generate a new one from
            $newLine =~ s/^\|(\S*)\ *=\ *(.*)/\|$1 \= $2/i;
            push (@infoboxSection, $newLine);

            # Now we save the image filename and size to be added onto Wikipedia's infobox, if it's provided
            if ($newLine =~ /^\|\ *image\ *=\ *.+/gi) {
                $image = $newLine;
                # just keep the filename, and format
                $image =~ s/\|image\ *\=\ *\[\[(?:File|Image)\ *\:\ *(.*?)(?:\||\]){1,2}.*/$1/i;
            }
            elsif ($newLine =~ /^\|\ *size\ *=\ *.+/gi) {
                $savedSizeLine = $newLine; # saved for later
                $savedSizeLine =~ s/^\|size\ *=\ *(.*)/\|size \=\ $1/i;
            }
            elsif ($newLine =~ /^\|\ *input\ *=\ *.+/gi) {
                $savedInputLine = $newLine; # saved for later
                $savedInputLine =~ s/^\|input\ *=\ *(.*)/\|input \=\ $1/i;
            }
            elsif ($newLine =~ /^\|\ *platforms{0,1}\ *=\ *.+/gi) {
                $savedPlatformsLine = $newLine; # saved for later
                $savedPlatformsLine =~ s/^\|platforms{0,1}\ *=\ *(.*)/\|platforms \=\ $1/i;
            }
        }
    }
    elsif ($currentSection eq "description") {
        # We're pretty much going to assume that if it's in this section, we want to keep it all as-is
        push (@descriptionSection, $newLine);
    }
    elsif ($currentSection eq "problems") {
        if ($newLine =~ /^\ *\={1,4}\ *Problems\ *\=*$/i) {
            # Matches the section heading, ignore
        }
        elsif ($newLine =~ /^\ *\={1,4}\ *(.*?)\ *\=*\ *$/gm) {
            # This is a sub-heading. Reformat it
            push (@problemSection, "\=\=\= $1 \=\=\=");
        }
        else {
            # This is something else in the problem section, like user input. Keep it
            push (@problemSection, $newLine);
        }
    }
    elsif ($currentSection eq "configuration") {
        if ($newLine =~ /\|\ *.*\=\ *\S+.*$/gi) {
            # This is a config entry that has been filled out
            push (@configurationSection, $newLine);
        }
        # Anything besides filled-out config params is not needed, the rest will be regenerated
    }
    elsif ($currentSection eq "versionCompat") {
        if ($newLine =~ /^\{\{VersionCompatibilityVersion\|\s*(.+)\s*\|\s*(.+)\s*(\|\s*(.+)\s*)?\}\}$/gi) {
            # Version compat report that's been filled out
            $versionCompatEntry = "\{\{VersionCompatibilityVersion\|$1\|$2".(($3)?"\|$3":"")."\}\}";
            push (@versionCompatSection, $versionCompatEntry);
        }
        # Anything besides filled-out compat reports is not needed, the rest will be regenerated
    }
    elsif ($currentSection eq "testing") {
        if ($newLine =~ /^\{\{.+?\|revision\=\s*(.*?)\s*\|os\=\s*(.*?)\s*\|cpu\=\s*(.*?)\s*\|gpu\=\s*(.*?)\s*\|result\=\s*(.*?)\s*(\|tester\=\s*(.*?)\s*)?\}\}/i) {
            # Matches test reports with all variables set (tester is optional) and dissects for reassembly (muahahaha!)
            # $7 is the tester name on its own; $6 would also carry the "|tester=" prefix
            $testResult = "\{\{testing\/entry\|revision\=$1\|OS\=$2\|CPU\=$3\|GPU\=$4\|result\=$5\|tester\=".(($7)?"$7":"")."\}\}";
            push (@testingSection, $testResult);
        }
    }
    elsif ($currentSection eq "videos") {
        if ($newLine =~ /^\ *\={1,4}\ *.*Videos\ *\=*$/i) {
            # Matches the section heading, ignore
        }
        else {
            # Keep everything else
            push (@videoSection, $newLine);
        }
    }
    elsif ($currentSection eq "categories") {
        # We keep all existing categories, and add some new auto-generated ones!
        if ($newLine =~ /^\[\[Category\:\ *(.*)\ *\]\]/i) {
            # This is a category entry
            push (@categorySection, "\[\[Category:" . $1 . "\]\]");
        }
        elsif ($newLine =~ /^\{\{Navigation\ *(.*)\ *\}\}/i) {
            # This is a navigation entry
            push (@categorySection, "\{\{Navigation " . $1 . "\}\}");
        }
    }
}


########################
# WIKIPEDIA PROCESSING #
########################

$foundWikipediaInfobox = 0;
$insideInfobox = 0;

my @autoCategory;

system('clear');

if (not(@wikipedia)) {
    # If we didn't get anything from Wikipedia, use the existing info from the article
    push (@wikipedia, '');
    push (@wikipedia, @descriptionSection);
}

foreach (@wikipedia) {
    $newLine = $_;

    $platforms = "\|platforms \= ";
    $platformAltered = 0;
    $skip = 0;
    $skipUnWiki = 0;

    if ($newLine =~ /^\{\{\ *Infobox.*/i) {
        push (@finalResult, '{{Infobox VG');
        $foundWikipediaInfobox = 1;
        $insideInfobox = 1;
        $skip = 1;
    }

    #Platforms
    if ($newLine =~ /\|\ *platforms/gi) {

        # If the previous article had a platforms list, we trust it over the Wikipedia one
        if ($savedPlatformsLine) {
            $newLine = $savedPlatformsLine;
        }

        #Wii
        if ($newLine =~ /.*Wii.*/i) {
            $platforms .= 'Wii ';
            if (@autoCategory eq 0) { push (@autoCategory, ''); }
            push (@autoCategory, "");
            $platformAltered = 1;
            $imageSize = 300;
        }

        #GameCube
        if ($newLine =~ /GameCube/i) {
            $platforms .= 'GameCube ';
            if (@autoCategory eq 0) { push (@autoCategory, ''); }
            push (@autoCategory, "");
            $platformAltered = 1;
            $imageSize = 300;
        }

        #WiiWare
        if ($newLine =~ /WiiWare/i) {
            $platforms .= 'WiiWare ';
            if (@autoCategory eq 0) { push (@autoCategory, ''); }
            push (@autoCategory, "");
            $platformAltered = 1;
            $imageSize = 175;
        }

        #Virtual Console
        #TODO: If Virtual Console is found, we need to include the WHOLE list of platforms without filtering
        if ($newLine =~ /.*Virtual\ *Console.*/i) {
            $platforms .= 'Virtual Console ';
            if (@autoCategory eq 0) { push (@autoCategory, ''); }
            push (@autoCategory, "");
            $platformAltered = 1;
            $imageSize = 300;
        }

        #TriForce
        if ($newLine =~ /TriForce/i) {
            $platforms .= 'Triforce ';
            if (@autoCategory eq 0) { push (@autoCategory, ''); }
            push (@autoCategory, "");
            $platformAltered = 1;
            $imageSize = 300;
        }

        push (@finalResult,$platforms);
        $skip = 1;
        $skipUnWiki = 1;
    }

    #Purge un-used parameters
    if ($insideInfobox) {
        #Replace whatever title Wikipedia is using with our own
        if ($newLine =~ /^\|\ *title\ *=\ *.+/gi) {
            $newLine = '|title = ';
        }

        # If the parameter is not in our list, it's ignored
        if ($newLine =~ /^\|\ *(title|developer|publisher|distributor|director|producer|designer|programmer|artist|composer|license|series|engine|resolution|released|genre|mode|ratings|size|fps|dspcode|dtkadpcm|channeltype|mode|modes)\ *=\ *.{2,}/gi) {
            $skip = 0;
            $newLine =~ s/^\|(\S*)\ *=\ *(.*)/\|$1 \= $2/i;
        }
        else {
            $skip = 1;
        }
    }

    #Un-wiki-fy everything
    if ($skipUnWiki eq 0) {
        # remove citation references (pattern assumed to target <ref>...</ref> tags)
        $newLine =~ s/\<ref.*?\>.*?\<\/ref\>//gi;
        $newLine =~ s/\{\{vgy\|([0-9]{4})\}\}/$1/gi; # we don't do Template:vgy, removing
        $newLine =~ s/\{\{cite.*?\}\}//gi; # remove references
        $newLine =~ s/\[\[(([.]|[^\|])+?)\]\]/$1/g; # un-wiki-fy wiki links in the format [[link]]
        $newLine =~ s/\[\[.+?\|(.+?)\]\]/$1/g; # un-wiki-fy wiki links in the format [[link|name]]
    }

    #Set genre categories
    if ($newLine =~ /^\|\ *genre/ ne "") {
        $genreLine = $newLine;
        $genreLine =~ s/^\|\ *genre\ *\=\ *//i;
        $genreLine =~ s/(\<.+?\>)|\(|\)/\,/gi; # try to clean this line up a bit
        @genres = split (/\,/, $genreLine); # tags and parentheses were turned into commas above
        foreach (@genres) {
            if (@autoCategory eq 0) { push (@autoCategory, ''); }
            $line = $_;

            if ($line ne "") {
                $line =~ s/^\ *//;
                $line =~ s/\ +$//;
                $line =~ s/\ *(game|games)\ *//i;
                $line = ucfirst $line;
                push (@autoCategory, "\[\[Category:" . $line . " games\]\]");
            }
        }
    }

    #Set mode categories
    if ($newLine =~ /^\|\ *mode/ ne "") {
        $modeLine = $newLine;
        $modeLine =~ s/^\|\ *modes{0,1}\ *\=\ *//i;
        $modeLine =~ s/(\<.+?\>)|\(|\)/\,/gi; # try to clean this line up a bit
        @modes = split (/\,/, $modeLine); # tags and parentheses were turned into commas above
        foreach (@modes) {
            if (@autoCategory eq 0) { push (@autoCategory, ''); }
            $line = $_;

            if ($line ne "") {
                $line =~ s/(\<.+?\>)|\(|\)/\,/gi;
                $line =~ s/^\ *//;
                $line =~ s/\ +$//;
                $line = ucfirst $line;
                push (@autoCategory, "\[\[Category:" . $line . " games\]\]");
            }
        }
    }

    #New-line if this is the end of the Infobox
    if ($newLine =~ /^\}\}$/) {
        if ($savedSizeLine ne "") { push (@finalResult, $savedSizeLine); } # Saved size
        if ($savedInputLine ne "") { push (@finalResult, $savedInputLine); } # Saved input
        if ($image ne "") {
            # Saved image line
            push (@finalResult, '|image = ');
        }
        elsif ($image eq "") {
            push (@finalResult, '|image = ');
        }

        push (@finalResult, $newLine);
        push (@finalResult, " ");
        $skip = 1;
        $insideInfobox = 0;
    }

    if (($foundWikipediaInfobox eq 1) and ($skip eq 0)) {
        if ($insideInfobox eq 0) {
            $newLine =~ s/GameCube/GameCube/i;
            $newLine =~ s/Nintendo GameCube/GameCube/i;
            $newLine =~ s/\ Wii\ /Wii/i;
            $newLine =~ s/WiiWare/WiiWare/i;
            $newLine =~ s/Virtual Console/Virtual Console/i;
            $newLine =~ s/Triforce/Triforce/i;
        }

        push (@finalResult,$newLine);
    }
}


########################
# COMPILE FINAL RESULT #
########################

# At this point, the Infobox and the summary are in place.  Now, time to add our checked and re-formatted content sections.

# CATEGORY PROCESSING
# Need to combine our categories and make sure there are no duplicates
foreach (@autoCategory) {
    $fullCat = $_;
    my $shortCat;
    $dupe = 0;

    if (($fullCat =~ /^\[\[Category\:\ *(.*)\ *\]\]/i) or ($fullCat =~ /^\{\{Navigation\ *(.*)\ *\}\}/i)) {
        $shortCat = $1;
    }

    foreach (@categorySection) {
        $compareCatLong = $_;
        my $compareCatShort;

        if (($compareCatLong =~ /^\[\[Category\:\ *(.*)\ *\]\]/i) or ($compareCatLong =~ /^\{\{Navigation\ *(.*)\ *\}\}/i)) {
            $compareCatShort = $1;
        }

        if ($compareCatShort eq $shortCat) {
            $dupe = 1;
        }
    }

    if ($dupe eq 0) {
        push (@categorySection, $_);
    }
}

# INFOBOX PROCESSING
if (($foundWikipediaInfobox eq 1) and ($foundDolphinInfobox eq 1)) {
    # If a Wikipedia Infobox was received, it was already pushed into @finalResult.
    # TODO: Wikipedia parsing should push into its own array, which is then added in this section
}
elsif (($foundWikipediaInfobox eq 0) and ($foundDolphinInfobox eq 0)) {
    # We didn't get ANY infoboxes, create a generic one
    push (@finalResult, '');
}

# DESCRIPTION PROCESSING
if ((@descriptionSection) and ($foundWikipediaInfobox eq 0)) {
    # Only re-use the original description if there was one and we didn't get Wikipedia data
    push (@finalResult, "\n");
    push (@finalResult, @descriptionSection);
}

push (@finalResult, "\n\=\= Problems \=\=");
push (@finalResult, @problemSection);

push (@finalResult, "\n\=\= Configuration \=\=");
push (@finalResult, '');
push (@finalResult, "\{\{Config");
push (@finalResult, @configurationSection);
push (@finalResult, "\}\}");

push (@finalResult, "\n\=\= Version Compatibility \=\=");
push (@finalResult, "\{\{VersionCompatibility\}\}");
push (@finalResult, '');
push (@finalResult, @versionCompatSection);
push (@finalResult, "\{\{VersionCompatibilityClose\}\}");

push (@finalResult, "\n\=\= Testing \=\=");
push (@finalResult, "\{\{testing\/start\}\}");
push (@finalResult, '');
push (@finalResult, @testingSection);
push (@finalResult, "\{\{testing\/end\}\}");

push (@finalResult, "\n\=\= Gameplay Videos \=\=");
push (@finalResult, @videoSection);

push (@finalResult, "\n");
push (@finalResult, @categorySection);


################
# FINAL OUTPUT #
################

system ('clear');

print "****************\n";
print "* FINAL OUTPUT *\n";
print "****************\n";

foreach (@finalResult) {
    print $_. "\n";
}

print "\n\n";