Like many of you, I have been spending a considerable amount of time reclaiming my data and spaces online. A lot of that is focused on downloading and archiving my data (especially blog posts, reviews, comments, etc) from a myriad of websites I have used through the years. Well, decades now. I don’t know if this post will be of interest to anyone, but it will be a record (Jim Groom-style) for me – and hopefully someone will stumble across a couple of problems I have run into and have some suggestions for me.
So this all started several years (or more) ago when I ran into the idea of the IndieWeb and realized I didn’t have to lose data to dying websites like MySpace and Jaiku. I could take a proactive approach by collecting my information and storing it on my own (and the awesome folks at Reclaim Hosting make it super easy in many ways). So I started downloading data from various websites, and importing blog or informational posts from any website that I could. Then I realized two email addresses I used for a lot of websites through the years could possibly die someday, so I started going back to wherever I could find those email addresses and reclaimed access to those services. Most of those turned out to be dead or dying websites, but the search uncovered more posts and blogs to archive. Then several unexpected unfortunate events happened to me last year and this year. Finding out my job in academia was being eliminated caused me to comb through 15 years of signing up for all kinds of services and journals and all kinds of things to discover even more stuff to reclaim. Then an unexpected divorce also caused me to have to comb through even more stuff online, bringing even more stuff to reclaim to light. So here are the basics of what I found out.
Downloading your data from websites is usually the most straightforward process, as long as the site offers a data download option or an export feature for your posts. One thing I have noticed is that the data that is downloaded does change from time to time – for instance, a good friend of mine suddenly died a few years ago and his family deleted all of his online accounts. So now there are posts on Facebook where he and I had long conversations that just look like I am arguing with myself. So instead of replacing previous data downloads with new, fresh downloads, I keep an archive of past exports. Did a past one capture those conversations that are now one-sided? I don’t know, but I should go look. I really hope so.
Then there were things like Jaiku that are long gone, but I never got a chance to download the data. Bummer. However, thanks to the work of the Internet Archive I did find a lot of my Jaiku posts in their archives. So I decided to copy the html and stitch together my own archive of some of my jaikus – including a few comments that I could also find and some pages from the Jaiku site just for nostalgia. Clicking on any avatar on that page leads to me. Some of the other links work as well. But this little archive shows that even 12 years ago Jaiku was way more interesting than Twitter. I also archived as much as I could of the EduGeek Journal Jaiku channel. Interesting that this is where Twitter Hashtags directly got the # from (even though technically it came from older sources, it was Jaiku’s Channels that made Twitter users start using the # to mimic the function).
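For anyone hunting for their own lost pages, here is a minimal Python sketch of asking the Internet Archive whether it holds a copy of a page. It is not how I built the Jaiku archive (that was copy and paste by hand); it assumes the Wayback Machine's public availability endpoint, and the page address in the usage note is only a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint (assumed reachable from your machine).
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD-style hint."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response):
    """Return the closest archived copy's URL from a decoded reply, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Ask the archive over the network and return a snapshot URL or None."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `find_snapshot("jaiku.com", "2008")` would return the address of the nearest capture if one exists, which you can then open and save. It only finds one page at a time, so stitching an archive together is still manual work.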
One site that is sadly long gone is MySpace. I can’t even sign in or reset my password anymore (probably hacked a long time ago). But the important data is gone – it seems MySpace lost or deleted most of it. I should have captured the html and custom CSS I worked for hours on way back in the day. But even the mighty Internet Archive didn’t capture any of that. However, after digging around some, I found this form to submit a support ticket, and then a GitHub project that has Tom’s MySpace profile html. And then searching through my files at home, of course I kept a copy of the CSS I created to customize my profile. So I might have to just make up a bunch of stuff about myself to replace the stuff about Tom, but I could actually have an archive of all of the time I wasted…. errr… “invested” in learning how to hack a custom MySpace profile.
Of course, the biggest project has been capturing my blogs. I thought I only had a handful of Blogger sites to import to WordPress, but then I kept digging up more. WordPress sites for several grad classes. Old conference blogs. Old work blogs. Some attempts to use Known. Even a short attempt at Tumblr. So many short blogs. So I imported all that I could into one WordPress blog archive on my own site. All of that is easy. For some of the blogs that I liked, I even created html archives of the layout. The one that I am having trouble with is Instagram. I would love to import all of my Instagram posts to a WordPress blog with a template like the one I set up for my artwork gallery. I found some suggestions online for how to do that, but they only import the last 20 entries. I can import the rest one by one using copy and paste if I want to, but hopefully someone will come up with a way to automate it. Any ideas?
Of course, some of these blogs were older WordPress installations on my website, while others were attached to classes like the HumanMOOC that only make sense as a complete package. But it’s a pain to keep over a dozen WordPress installations updated and working. So I decided it was time to archive some sites as they are as html exports and shut down the WordPress version. The problem is, I really wanted a standalone html export that could be moved to any folder or website and still work. The most recommended WordPress html export tool that I found when I started a few years ago (WP Static) doesn’t really work well for the relative links needed to do that. I could export to a defined folder on my site and it would hard code those specific links into every page, but then I can’t move it around (the Jaiku archive I created above can work anywhere I put it, or even offline if needed). WP Static does have a relative link function, but it keeps messing up the number of “../”s you need to make links work. Half the time, it just gets lost and serves up a blank page. Even a quick search and replace on a page doesn’t fix it.
So I looked around at other options, but none worked any differently. Even desktop-based site suckers, well… they suck too much. What I mean is, if there is a link to another website on your site, it will try to suck that entire site as well! Finally, I found Simply Static. It has a relative link function as well, and it doesn’t work right out of the download either. But it only messes up in one way, and a quick find and replace on a page makes your archived page spring to life. The only problem is that because of the layers upon layers of subdirectories that WordPress uses, you have to do a find and replace per page to get the number of “../”s right. So it’s a quick process on simple sites… but a longer process on more complex sites. But it works in the end. I have a standalone html archive of the HumanMOOC that I helped to co-design and co-teach that will work wherever I put it. A bonus feature is that I got to finally fix some of the things that I didn’t have time to get right in the WordPress version. The activity bank images never worked right, but now I can have an image per activity. The blog hub now has individual avatars per person so you can see who posted what. The DALMOOC, OpenEdMOOC, and Pivot MOOC should be coming soon. ish.
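The per-page find and replace is the kind of chore a short script could take over. Here is a rough Python sketch of the idea, not what I actually did (I used find and replace by hand). It assumes the export leaves the same wrong link prefix on every page; the `wrong_prefix` argument and both function names are made up for illustration, and you would need to look at your own exported html to see what the bad prefix really is.

```python
from pathlib import Path


def relative_prefix(page: Path, root: Path) -> str:
    """Return the "../" run that climbs from the page's folder back to root."""
    # Count the folders between the export root and the page itself.
    depth = len(page.relative_to(root).parts) - 1
    return "../" * depth if depth else "./"


def fix_links(root: Path, wrong_prefix: str) -> int:
    """Swap wrong_prefix for the right "../" run in every .html file under root.

    wrong_prefix is a hypothetical stand-in for whatever broken prefix the
    export tool wrote. Returns the number of files that were changed.
    """
    changed = 0
    for page in sorted(root.rglob("*.html")):
        html = page.read_text(encoding="utf-8")
        fixed = html.replace(wrong_prefix, relative_prefix(page, root))
        if fixed != html:
            page.write_text(fixed, encoding="utf-8")
            changed += 1
    return changed
```

A page two folders deep gets “../../” and a page in the top folder gets “./”, so the same call covers the whole export. Run it on a copy of the export first, since it overwrites the files in place.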
Then there were other random things I needed to archive. All of my Storify archives, which neatly exported to html, but are slowly dying out as people close accounts, or Twitter changes how they display pictures, or a hundred other reasons. Is it worth going through each one and grabbing what is left? Several chatbots I created are still kicking around, but also falling apart as I need to apparently update the code to not point to the dead LINK Lab website. Add that one to my massive to-do list. Even an old OLC presentation that I did “choose your own presentation topic” style with the audience.
Oh, and going way back there are a good number of html websites I designed from 1999 to 2005 that I am still keeping around for memory’s sake. Most are too embarrassing to link to, but the one I like the most is the one that I mention in several bios – the website I created to help students when I was an 8th grade Science teacher: Mr. Crosslin’s Class Online. It was also my first serious attempt at putting course work online.
Speaking of old sites, I have so many sites that I built in Flash that I have been trying to figure out what to do with for years. I can still open Flash on an ancient computer I have, so I have exported all of my Flash files to image and/or movie files. But some are still a bit complex for that, and even the less complex ones are no fun to watch as a movie. Is there a way to convert FLA files to HTML5? I have looked a little and didn’t like what I found. If anyone knows of a way, even if I have to pay, please let me know.
So I thought for a while that my archives of several websites I created with Flash would be limited to still images of what happened. But then I came across Ruffle. You drop a couple of files on your site, and a few lines of code on your page, and – BAM! – your Flash files start magically working. So now I can get the old U Monthly Magazine archives back online (a lot is still missing, but I will dig it out eventually). My favorite Flash website I (mostly) created is the E-SPY X-500 – a goofy attempt at an educational game that I created for a company that I worked for after teaching. Go ahead and kick around in there – not everything works (yet, but it’s on the list), but see if you can find the hidden Easter eggs. You can log in with any username or password over three characters. It has been totally disconnected from the MySQL database, so no data is collected. I should point out that the cartoon characters you will see once inside were not drawn by me, but by our staff artist at the time, Samuel Torres.
Of course, I have also been going through and making sure that my main portfolio is up to date, because it really serves as an archive of papers, presentations, videos, artwork, and other projects as well. I have also been working on things like a games archive. All kinds of random attempts to create games are in there, including some of the ones I mentioned above (I still need to create a Twine environment for the This Picture app game idea). Oh, and somewhere in the middle of all of this, I am also trying to work with my Mom to create a tribute site to my Grandfather’s artwork, since he sold paintings and worked as a staff artist for a newspaper in a major city.
Changing over email addresses is quite the chore. I had to look for old accounts with two old email addresses in them, and then I had to go through 15 years of work emails to see which accounts I would want to keep after leaving (mostly access to journals I published in, review accounts, professional website accounts, and others like that). Most places were pretty straightforward. Some places were not. It took a lot of work to get control of my Flickr account. I still can’t get control of my MySpace account – does their support team still even exist? A lot of these accounts I will probably shut down. But I was surprised at how haphazard I was in using whatever email address to sign up for whatever account. At least it’s all back with me again. And, of course, trying to separate 20 years of joint accounts from my former marriage was a huge undertaking. Some places make it nearly impossible to do that. But then I had to go back through all of these accounts I got back or websites I created and update bio listings about family where needed.
So, even though there isn’t a light at the end of the tunnel yet, I know that a sighting of that light should come soon. Despite all that is left, I still feel that I have cut back my online presence to a streamlined, manageable amount. Someday I will be shutting down some massive websites like this one, so I hope to find even better ways to convert WordPress to html as well. Which I guess I will… give to my son some day? Donate to a museum? Will people even care about archives like this in a few decades? I guess I will figure that out someday…
Matt is currently an Instructional Designer II at Orbis Education and a Part-Time Instructor at the University of Texas Rio Grande Valley. Previously he worked as a Learning Innovation Researcher with the UT Arlington LINK Research Lab. His work focuses on learning theory, Heutagogy, and learner agency. Matt holds a Ph.D. in Learning Technologies from the University of North Texas, a Master of Education in Educational Technology from UT Brownsville, and a Bachelor of Science in Education from Baylor University. His research interests include instructional design, learning pathways, sociocultural theory, heutagogy, virtual reality, and open networked learning. He has a background in instructional design and teaching at both the secondary and university levels and has been an active blogger and conference presenter. He also enjoys networking and collaborative efforts involving faculty, students, administration, and anyone involved in the education process.