The Vault Preservation Project


161 replies to this topic

#26
Eric of Atrophy
  • Members
  • 4 messages
A thought - would .pdf captures of the pages be workable? In addition to getting the files, of course...

#27
kamal_
  • Members
  • 5 246 messages
Why not use a web scraper?

#28
Rolo Kipp
  • Members
  • 2 791 messages
<dancing...>

WooHOO! You're my favorite evilly grinning cat! =)

Tarot Redhand wrote...
Rolo I would love to help but I am on pre-paid inet at the moment. Having looked through your list, I had a thought. As there is a lot of nwn2 stuff have you posted a link in the nwn2 custom content section of these boards. I'm sure that there must be people in there who would love to come on-board. Actually now I think about it, it wouldn't hurt to post a link to this thread in the modules, toolset and scripting forums of the nwn section either.

I know you have bandwidth issues Tarot. I think you're amazing doing anything at all! Maybe Brits do remember how to rock! ;-)

Tchos posted a link on the NwN2 forums. Cross-posting here isn't a bad idea, but my time is *really* limited today ;-/ Anyone want to do it?

Couple of practicalities spring to mind.

  • Do you want the pages saving as a web archive (mht) or just html with separate pictures etc.?
  • What do you want to do with multi-page comments?

Basically, all the info will have to be cut and pasted when I get the VPP content type built (sometime tomorrow). So whatever method is easiest for you to work with. You can also save it as a PDF if that works for you. The idea is to keep the structure of the Vault entries for compatibility as a mirror going forward, while migrating the content to a viable CMS (Content Management System).

<...like a fool with an invisible cat from cheshire>

#29
Rolo Kipp
  • Members
  • 2 791 messages
<dancing...>

WooHOO! My favorite... er, new friend =)

Eric of Atrophy wrote...
A thought - would .pdf captures of the pages be workable? In addition to getting the files, of course...

Whatever would be easiest for you to cut and paste into the form when I get that ready (sometime tomorrow - see response to Tarot Redhand).

<...like a withered old man>

#30
Rolo Kipp
  • Members
  • 2 791 messages
<just a little...>

kamal_ wrote...
Why not use a web scraper?

(see my response to Virusman) Basically, I couldn't get it working efficiently. After several hours of spidering down every link (including ads) I had very few pages of actual projects. If someone has a lot better luck (or skill), that would be a great option.

On the other hand, I'm getting damn good vibes from the people rallying 'round. It really demonstrates the great beating heart of this community =)

Can't beat that with a stick!

<...frustrated with wget>

#31
Pstemarie
  • Members
  • 2 745 messages
I sent Barry a message about this so he can let the CEP Team know. Maybe we'll pull some volunteers from them as well.

Edited by Pstemarie, 29 September 2012 - 02:45.


#32
Bannor Bloodfist
  • Members
  • 924 messages
Are you planning on "editing as an editor" the posts/replies in the forum contents?  I know we get huge amounts of "hey, great idea, can't help you" type of responses etc, that just waste space and bandwidth, and also frustrate folks looking for an actual answer.
<ducking all the flying cow chips I see coming my way>
I know, I know, "editing" is a pain, and sometimes considered negative, BUTT (where is that smiley with the shaking derriere anyway?), BUTT, removing extraneous stuff for something important like this could really prove invaluable.  Not sure how to handle that without further discussion, as a community needs to feel like they "belong."   BUTT the extraneous posts make it sometimes very frustrating for someone just looking for an "answer" instead of a discussion about how pretty the MOON is tonight. 

Also, beyond removing extraneous postings, editing for spelling and clarity might also prove useful.  Most especially for important aspects of the thread.  Things like spelling issues where someone accidentally typed .DDA when we ALL know they meant .DDS instead, except of course the newbie that is left desperately searching for that danged .DDA file that is causing all the problems and can't find any anywhere.
<lift my head to see if the incoming is done>
<<smack>>
<spluttering, dang it, knew I should have kept my mouth shut!>

If this "group" of folks copying this stuff and formalizing it into a final format wishes, I will offer my services as an editor prior to final posting wherever you all decide to post the online version.  Provided of course, that whatever format/document style you decide upon is something that I can fairly easily figure out and use.  (I hate .wiki as .wiki may be powerful, but takes a HUGE amount of extra code typing to make it work.)  BBCODE comes to mind as a fairly simple, yet still very powerful way to offer formatting, assuming (yes, I know, A S S out of me) of course that the final product is intended as a web available resource, not a .PDF which requires a special reader and is a bit of a pain to create as it typically also requires a paid for application.  (Remember, I am in hospital, waiting for Medicaid to decide whether or not I get to keep my leg or they will just cut it off since I don't have insurance, a job, nor any future hope of any such).

Anyway, I have time, and some limited skill, and I can read/write with the best 4th grader, so I will help if I can.
<after finally getting my mouth clear, I peek over the top again, and notice something that looks like an automated firing mechanism aimed directly at me but just can't seem to duck down fast enough as it fires again>

Edit: Sorry Rolo, attempting to attempt to follow your weird double or is it triple talking style with very limited success.

#33
NWN_baba yaga
  • Members
  • 1 232 messages
Well i had something very easy in mind that is like this process:
1: Grab the creature hak + authors information + a screenshot
2: Create an entry on our new cc page with a title like "berzerk ogre" by "author's name"
3: upload the hak/ rar or zip and just paste the original authors information + a screenshot into the entry page.

So every piece of content has one page of its own, with a title like on the vault.

That's it basically!

p.s.
and a link to the original vault entry of course!!!

pp.s.
and maybe we state once, somewhere on the main page, that we do this without the authors' explicit permission and only to keep their work available if something happens (which we hope it won't...) to the Vault's entries at some point in the future.

Edited by NWN_baba yaga, 29 September 2012 - 04:42.


#34
Rolo Kipp
  • Members
  • 2 791 messages
<adding in all kinds...>

*shakes head sadly at BB's pitiful attempt* My dear... er, beast. It's really quite simple. <for him>
I say things and the bird... <that's lady stormshadow, wizard. since you're being all patronizing and everything>
...whom I call Bother for very good reason... <keep it up, old man. *i* know how to operate the chip-chucker>
...comments. Occasionally, Cestus Dei... <stumpy & grumpy to his... friends>
...also er, chips in. :-) <graceful, boss. smooth, even>

So you see, extraneous text can add flavor, ambiance, humor and interest. :-) <eye-strain>
That, too.

In regards to the VPP, the framework I'm using (drupal) has great search functions and much more editing control for authors over their own posts. That said, I understand what you are talking about.

I have made several of the volunteers "Moderators" (I have no problem making moderators - You're one =) so they *can* do editing if it really warrants it - as in the case of obvious misspellings, etc.

OTOH, any editing done is at the discretion of the author and, if necessary, editing can be reversed (all revisions are stored and can be rolled back at need).

On the third hand, you produce great tutorials and are a constant source of reference. If you volunteer to er, "massage" posts and authors agree, the community will benefit. And you *do* have some time... though the reason for that still angers me :-/ <easy, boss>
Heh. Right.

Short: You *have* the *power*, he-beast! Now you just have to exercise it responsibly, eh?

<...of extraneous stuff for flavor>

#35
Rolo Kipp
  • Members
  • 2 791 messages
<scribbling legalese...>

NWN_baba yaga wrote...
Well i had something very easy in mind that is like this process:
1: Grab the creature hak + authors information + a screenshot
2: Create an entry on our new cc page with a title like "berzerk ogre" by "author's name"
3: upload the hak/ rar or zip and just paste the original authors information + a screenshot into the entry page.

So every piece of content has one page of its own, with a title like on the vault.

The form entry will mirror the Vault's current form. The same info will need to be entered (including Author). The idea is to mirror, to *preserve* the Vault, so it will be very much the same.

p.s.
and a link to the original vault entry of course!!!

pp.s.
and maybe we state once, somewhere on the main page, that we do this without the authors' explicit permission and only to keep their work available if something happens (which we hope it won't...) to the Vault's entries at some point in the future.

I should have pointed out earlier that I *am* an admin on the Vault (until they give me the boot :-P ) and I have no intention of abusing that trust. This entire project is not to take anything away from the Vault, but to safeguard it. I consider all of the content we are preserving to retain the rights granted and reserved by both the original author and the Vault. That will be posted prominently in the VPP.

<...that makes little sense>

#36
OldTimeRadio
  • Members
  • 1 400 messages
If there are any coders out there, I just found that the source code for the software I used to mirror off the old Bioware forums is available here.  A custom build which captured metadata from the various fields specific to the vault entries (such as for modules) could go a long way to making full searching a lot easier.  Not to mention the default functionality of preserving multiple pages of comments, some of which hold useful information about how (whatever) is used but don't appear in the main description.

Edited by OldTimeRadio, 29 September 2012 - 05:51.


#37
Michael DarkAngel
  • Members
  • 368 messages
Thanks OTR, I'll take a look at it next chance I get some free time.  It's a birthday party weekend, places to go, people to see, songs to sing and little, valuable free time for me :(

  MDA

#38
Bannor Bloodfist
  • Members
  • 924 messages

Rolo Kipp wrote...
<snip>

I have made several of the volunteers "Moderators" (I have no problem making moderators - You're one =) so they *can* do editing if it really warrants it - as in the case of obvious misspellings, etc.



Interesting, since I can't seem to remember ever having an account there?

#39
meaglyn
  • Members
  • 807 messages
I think it would be really valuable to have all this data in another location.

That said, doing it all by hand is the path to madness. This kind of thing is what computers
are for (and NWN of course). All of us have more interesting things we could be doing than
tediously and error-proneously copying dynamic web content from one form to another.

If we could get the database and access to the content storage it would be a matter of teaching
the new web interface to query and display it. If that's not possible a tool could be written to
scrape the pages and dump to a storage directory and xml or simpler key=value metadata file.

95% of the links and such on those pages is noise. I'm not surprised wget mirror choked. The interesting bits are more or less between the "<!-- Output Meta Data Rows -->" and the "<!-------Start Network Connections Box -->" comments. That significantly reduces the parsing needed and the number of links to deal with.
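
A minimal Python sketch of that trimming step, assuming the two marker comments appear verbatim (and once each) in every entry page. The function name `extract_entry_block` is made up for illustration; it only slices the text and does no HTML parsing.

```python
from typing import Optional

# Marker comments quoted in the post above; assumed to appear verbatim in each page.
START_MARKER = "<!-- Output Meta Data Rows -->"
END_MARKER = "<!-------Start Network Connections Box -->"


def extract_entry_block(page_html: str) -> Optional[str]:
    """Return the HTML between the two markers, or None if either one is missing."""
    start = page_html.find(START_MARKER)
    if start == -1:
        return None
    start += len(START_MARKER)
    # Search for the end marker only after the start marker.
    end = page_html.find(END_MARKER, start)
    if end == -1:
        return None
    return page_html[start:end].strip()
```

Returning None instead of raising lets a batch run log the pages that do not match the expected layout and keep going.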

The forum downloader OTR mentioned might do most of that already. It may be a matter of going
back and getting the actual downloads. That may be the place to start...

If I have time I can try to come up with something. I'm more of a systems person than a web
programmer but perl is always fun...

Cheers,

Meaglyn

#40
Tarot Redhand
  • Members
  • 2 674 messages
As you requested Rolo, I have done it. At the risk of being accused of spamming, I have posted a short message and a link to this forum in the modules, toolset, scripting and persistent worlds thread areas. Hope it helps.

TR

#41
acomputerdood
  • Members
  • 219 messages
apologies, that i've not read through the whole thread yet. it's pretty late. just wanted to give a few quick thoughts:

started looking at the "scripts" section, and noticed they followed the pattern:

http://nwvault.ign.c....Detail&id=3795

where 3795 is the entry number. changing that (up or down) will yield a new entry.

so it should be a simple case to just enumerate from 1 ... n, curl-ing each page into a file.
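
A rough Python sketch of that enumeration, assuming the Scripts URL pattern used in the script below holds for every id. `mirror_entries`, `fetch_url` and the `<id>.html` file naming are illustrative choices, not anything the Vault or this project defines; the `fetch` parameter exists so the loop can be tried without touching the network.

```python
from pathlib import Path
from typing import Callable, Iterable, List
from urllib.request import urlopen

# URL pattern taken from the Scripts section; other sections are assumed to differ only in "view=".
URL_TEMPLATE = "http://nwvault.ign.com/View.php?view=Scripts.Detail&id={entry_id}"


def fetch_url(url: str) -> bytes:
    """Download one page over HTTP (stands in for the curl call)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def mirror_entries(
    entry_ids: Iterable[int],
    out_dir: Path,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[Path]:
    """Save each entry page as <out_dir>/<id>.html and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for entry_id in entry_ids:
        url = URL_TEMPLATE.format(entry_id=entry_id)
        try:
            body = fetch(url)
        except OSError:
            # Unreachable or failing entry (urllib errors derive from OSError): skip it.
            continue
        path = out_dir / f"{entry_id}.html"
        path.write_bytes(body)
        written.append(path)
    return written
```

Something like `mirror_entries(range(1, 4000), Path("scripts_pages"))` would then walk the ids; the upper bound here is a placeholder, since the real value of n is not known.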

i started work on a perl script to then scan the curl'd file until it got to the "Files" section, and i parsed out the download links. i've not tested it much, but it looks like it's gonna work.


#!/usr/bin/perl
use strict;
use warnings;

# Fetch one Scripts entry page. The URL is single-quoted for the shell so
# the "&" is not treated as a background operator.
my $url  = 'http://nwvault.ign.com/View.php?view=Scripts.Detail&id=3794';
my $page = `curl -s '$url'`;

my @lines = split /\n/, $page;

# $files is true only while we are inside the "Files" table.
my $files = 0;
foreach my $l (@lines) {
        if ($l =~ /<a name="Files"><\/a>Files/) {
                $files = 1;
        }
#       if ($l =~ /Post New Comments/) {
        if ($l =~ /<\/TABLE>/) {
                $files = 0;
        }
        next if !$files;

        # For now, print a wget command for each download link instead of running it.
        if ($l =~ /<a href="(fms\/Download\.php.*?)"/) {
#               `wget 'http://nwvault.ign.com/$1'`;
                print "wget 'http://nwvault.ign.com/$1'\n";
        }
}

it's VERY unfinished, but it's too late for me to work on it anymore tonight.

it should be a simple exercise of saving the files to whatever directory structure/format rolo set up in his post.


i'll give it another go tomorrow and try to get a "final version" of the script done, unless somebody takes the effort over.

#42
AndarianTD
  • Members
  • 701 messages
A quick observation on this effort: you may need to be careful to distinguish between content that's hosted on the Vault, and content that's linked to on the Vault but actually hosted on another site. Mirroring the links for the latter should be fine, but mirroring the content could be problematic and require the hosting site's permission. (In that case you would not only be mirroring the Vault, but also mirroring some content from other sites as well.) Some of us host our work on our own sites and only link to it from the Vault, and wouldn't want it to be re-hosted anywhere else.
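
One way a scraper could make that distinction is by looking at the host of each link, sketched here in Python. It assumes that "hosted on the Vault" means the link resolves to nwvault.ign.com, which may be too narrow if downloads are served from other IGN hosts; `classify_link` is a hypothetical helper name.

```python
from urllib.parse import urljoin, urlparse

# Assumption: Vault-hosted content is served from this host only.
VAULT_BASE = "http://nwvault.ign.com/"
VAULT_HOST = "nwvault.ign.com"


def classify_link(href: str, base: str = VAULT_BASE) -> str:
    """Return 'vault' for links served by the Vault host, 'external' otherwise."""
    # Relative links such as "fms/Download.php?..." are resolved against the Vault first.
    absolute = urljoin(base, href)
    host = (urlparse(absolute).hostname or "").lower()
    return "vault" if host == VAULT_HOST else "external"
```

Links classified as external would be recorded as links only and never downloaded.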

#43
Just a ghost
  • Members
  • 146 messages
I think acomputerdood's approach is the way to go. You really don't want to do this by hand. I can imagine you may want to put some filter on the output, though. I don't think we need all those uncommented module packs in a central archive.

#44
meaglyn
  • Members
  • 807 messages
Computerdude's on the right track. That's what I was looking at doing. Once you get the interesting bits you can feed it to something that understands html. Then it should be easier to capture all the metadata as well. The comments, for multipage comment lists will take a bit more...

Rolo's objective was to get more than just the files themselves.

#45
Tarot Redhand
  • Members
  • 2 674 messages
I looked into the raw html of a single vault page and went :blink: :pinched::crying::(:sick: as it is tables within tables within tables. So I am glad others with more recent experience of web programming are looking at the problem.

TR

#46
Rolo Kipp
  • Members
  • 2 791 messages
<cutting off...>

AndarianTD wrote...
A quick observation on this effort: you may need to be careful to distinguish between content that's hosted on the Vault, and content that's linked to on the Vault but actually hosted on another site. 
...
Some of us host our work on our own sites and only link to it from the Vault, and wouldn't want it to be re-hosted anywhere else.

Absolutely. In addition, there are many projects that are quite dead (my dear old friend Ratbert #CP# being a good, and rather sorrowful example). In this case, what is the value of the project being included at all, except for historical purposes (it does still link to the Mad Lemur's seldom used blog... *sigh* )? I *do* still wish to include them, but... Perhaps I can flag them for research to try to recover the content at leisure. Flags are easy ;-)

There is also the case where several projects have links to other projects (like the haks required by a series of mods). In that case also the *links* should be preserved and *not* followed.

<...the branches that lead to infinity>

#47
icywind1980
  • Members
  • 310 messages
I followed the link from the PW section. I can do very little to help as my HD is currently sitting at 8gb free space and crashes out daily, but I wanted to offer a sincere and hearty thanks to everyone involved. This has been and always will be my favorite game and seeing the community come together like this makes me feel proud to be a part of it. Kudos everyone!

#48
Tarot Redhand
  • Members
  • 2 674 messages
It's just a thought, but wouldn't "grab the pages and download the downloads now, worry about extracting the data from the pages later" be the way to go? At least that way the stuff we are concerned about is preserved even if by not being immediately processed it takes up more room.

On the topic of extracting the information. Using Nvu it would appear that the information required can be split into 2 sections. The first is a table that contains 2 sub-tables. The first of these sub-tables contains the details of the submission and the second sub-table contains the submission itself. The second is a table containing 0 to many sub-tables which host the comments.
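
A small Python sketch for checking that layout on a saved page, using only the standard library's `html.parser`. It just records how deeply each table is nested, which is enough to tell top-level tables from sub-tables. The class and function names are illustrative, and it assumes the saved fragment has reasonably balanced table tags.

```python
from html.parser import HTMLParser
from typing import List


class TableOutline(HTMLParser):
    """Record the nesting depth of every <table> opened in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.depths: List[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TABLE> and <table> both match.
        if tag == "table":
            self.depth += 1
            self.depths.append(self.depth)

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def table_depths(page_html: str) -> List[int]:
    """Return the depth of each table in document order (1 = top level)."""
    parser = TableOutline()
    parser.feed(page_html)
    parser.close()
    return parser.depths
```

For a fragment shaped as described (one table with two sub-tables, then a comments table with no comments), the depths come out as 1, 2, 2, 1.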

TR

Edited by Tarot Redhand, 30 September 2012 - 09:55.


#49
Rolo Kipp
  • Members
  • 2 791 messages
<nodding and...>

How about this then, put in the required fields (title, author and a couple others) and upload the captured page as an archive with the file. That is one file is a .7z of the snapshotted page and the other files are the downloads...? That way the stuff is there, just not convenient... as you said.

Damn, I forgot the original link field for several of the content types :-P

Edit: Did two samples of the textures. Ugly but working.

<...waving from the hole he's dug himself into>

Edited by Rolo Kipp, 01 October 2012 - 12:15.


#50
Rolo Kipp
  • Members
  • 2 791 messages
 <coming up...>

Ok, so far I've built (bare bones, no tweaks to display or anything cool :-P ):
  • NwN Character
  • NwN Screenshots
  • NwN Creatures
  • NwN Models
  • NwN Other
  • NwN Textures
If your category is listed, try uploading stuff and give me feedback on just how yucky it is :-P

Note: Keeping the Vault's hierarchy, the add content menu is something like:

Add Content -> VPP -> NwN -> Community (nothing there)
                          -> Files -> Characters
                                   -> Creatures
                                   -> Models
Etc. Hover over the menu to the left and explore :-P

<...for air...and java>

Edited by Rolo Kipp, 30 September 2012 - 10:21.