
Bulk harvesting of newspaper articles from Trove on MacOS 10.9 or 10.10 using Retailer – Instructions

The following is a cut-and-paste from my blog entry on this subject :-)

Conal Tuohy (@conal_tuohy) presented a session at THATCamp Canberra 2014 on Retailer, a tool he’s developing to give the National Library of Australia's Trove service an interface compliant with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

The aim of the session was to get attendees to install Retailer on their laptops and then perform some searches.
It turned out that installing Retailer on the Mac laptops present wasn’t quite as straightforward as might have been hoped (the Linux-heads present had no such problems).

During the session, we worked out a procedure that does work for users of MacOS 10.9 (Mavericks) and MacOS 10.10 (Yosemite). This procedure is explained, step-by-step, below. Please read through these instructions in their entirety before you try to install Retailer on your Mac, so that you don’t make incorrect assumptions about the following steps :-) Please note that I’m going to make the following assumptions:

  1. You haven’t changed your default Downloads location (i.e. the Downloads folder in your home directory).
  2. You know how to open the Applications folder to see the complete list of your installed applications.
  3. You’ve applied for, and received, a Trove API key. You’re not going to get far without one.

The installation instructions

  1. Start by reading Con’s blog post introducing Retailer. You may not understand all of it, and it’s very Debian-centric, but read it anyway, so you understand what Retailer is and how it works, and why you need to download various pieces of software.
  2. Download the Java Development Kit (JDK) installer. Yes, you want the JDK (which installs a full Java compiler and tools), not the Java Runtime Environment (JRE), which is just a plugin for your web browsers. You also need to make sure you download Java 8 update 25 or newer; earlier versions of the installer weren’t aware of MacOS 10.10 (the latest, greatest), treated it as 10.1 (ye olde ancient version from the early 2000s), and refused to install because they thought your OS was too old. You can download the installer from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html You need to click on the radio button that says “Accept Licence Agreement”, then you can download jdk-8u25-macosx-x64.dmg. Let it go into your default Downloads location. Do not install it at this time.
  3. Download Apache Tomcat v. 8 from https://tomcat.apache.org/download-80.cgi Look in the section labelled “Binary Distributions”. The first sub-section is labelled “Core”. You should download the tar.gz version. Do not unpack the compressed file at this time.
  4. Download jOAI from http://www.dlese.org/dds/services/joai_software.jsp Click on the “Download from SourceForge” link, and let it put the download in your default Downloads folder. Do not unpack the download at this time.
  5. Download Retailer from https://github.com/Conal-Tuohy/Retailer/releases You should click on the green button with the down arrow and “retailer.war” on it. Let it put it in your default download location. Do not do anything with this file at this time.
  6. OK, at this point your default Downloads location should contain the following files (the version numbers were current at the time of writing; your mileage may vary):
    1. apache-tomcat-8.0.14.tar.gz
    2. jdk-8u25-macosx-x64.dmg
    3. joai_v3.1.1.3.zip
    4. retailer.war
  7. Now it’s time to visit our friend the command line. Open a Terminal window (Terminal is in the “Utilities” folder inside your “Applications” folder). Do not close this Terminal window until you’re told it’s safe to do so, much later; you’re going to be making a great deal of use of it.
  8. You need to decide where you want to put the apache-tomcat installation. I recommend the /Users/Shared folder. Type
    cd /Users/Shared
    into the terminal window, and hit return.
  9. Now type the following three lines into the Terminal, hitting the return key after you’ve typed each line. The first line unpacks the tomcat server, the second line copies retailer.war to where it needs to be, and the third line extracts oai.war from the archive and puts it where it needs to be.
    tar -xvf ~/Downloads/apache-tomcat-8.0.14.tar.gz --gunzip
    cp ~/Downloads/retailer.war apache-tomcat-8.0.14/webapps/
    unzip -j ~/Downloads/joai_v3.1.1.3.zip joai_v3.1.1.3/oai.war -d apache-tomcat-8.0.14/webapps
    
  10. OK, now you should install the Java Development Kit. Double click on the jdk-8u25-macosx-x64.dmg file to open the disc image, then run the enclosed installer. Once the installation has completed, eject the disc image.
  11. Go back to the Terminal. Type
    java -version
    If all has gone well, you should see something like:

    java version "1.8.0_25"
    Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
    Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
  12. Now type
    ./apache-tomcat-8.0.14/bin/startup.sh
    If all goes well, you should see something like:

    Using CATALINA_BASE: /Users/Shared/apache-tomcat-8.0.14
    Using CATALINA_HOME: /Users/Shared/apache-tomcat-8.0.14
    Using CATALINA_TMPDIR: /Users/Shared/apache-tomcat-8.0.14/temp
    Using JRE_HOME: /Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home
    Using CLASSPATH: /Users/Shared/apache-tomcat-8.0.14/bin/bootstrap.jar:/Users/Shared/apache-tomcat-8.0.14/bin/tomcat-juli.jar
    Tomcat started.
  13. Start your web browser of choice, and point it at:
    http://localhost:8080
    If all goes well, you should see a web page for Apache Tomcat.
  14. When Tomcat started up, it should have unpacked the two .war files into separate directories for you. You need to edit Retailer’s configuration file. Go back to your Terminal window, and type

    open -a TextEdit apache-tomcat-8.0.14/webapps/retailer/WEB-INF/web.xml
    to open the file in TextEdit. Replace the text “INSERT TROVE API KEY HERE” with your Trove API key.
    Now you need to add an additional parameter, to tell Retailer that you’re going to use it to perform Trove searches. Add the following lines just before the <servlet> line:

    <context-param>
    <param-name>xslt</param-name>
    <param-value>trove.xsl</param-value>
    </context-param>

    Save the file and exit TextEdit.

  15. Go back to the Terminal window, and type
    cp apache-tomcat-8.0.14/webapps/retailer/WEB-INF/web.xml /Users/Shared/retailer-config-backup.xml
    This will make a backup of your configuration file outside the Retailer web app; I’ve had my web.xml “restored” to the default a couple of times through no action of my own, so having a backup on hand has been useful.
  16. Point your web browser at:
    http://localhost:8080/oai/admin/harvester.do
    and click on “Add new harvest”.
  17. Fill in the settings as per Con’s blog post. For your first harvest, I suggest you use “search: international cometary explorer”; this doesn’t match too many items (most are in The Canberra Times, post-1954). Note the section “Save files from this harvest:”.
    The default harvest location is
    /Users/Shared/apache-tomcat-8.0.14/webapps/oai/WEB-INF/harvested_records
    You’ll probably want to put these somewhere else, so select “at a location I specify…” and type in a folder path (e.g. /Users/Shared/harvested_records/ICE). Click on the “save” button.
  18. Click on “All” under “Manual Harvest”. You’ll be asked if you want to replace the results of a previous harvest. Since you haven’t harvested before, your answer should be “OK” (in future, you’ll be better off clicking on the “New” button to add any new results to your pre-existing harvest).
  19. Wait. Depending upon your search parameters, your harvest may take some time. You can keep an eye on it by clicking on “View harvest history and progress” and then occasionally refreshing the page.
  20. Your harvested records will be stored in the location you specified.
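If you want to double-check steps 6 and 11 from the Terminal, both checks can be scripted. The following is a minimal sketch that is not part of the original procedure: the function names `check_downloads`, `jdk_version_ok` and `installed_java_version` are hypothetical, and the file names are the versions current at the time of writing, so edit them if your downloads differ.

```shell
#!/bin/bash
# Pre-flight checks for steps 6 and 11 (a sketch, not part of the original
# instructions). Function names are hypothetical; the file names are the
# versions current at time of writing, so edit them to match your downloads.

# check_downloads DIR: print any expected file missing from DIR.
# Returns 0 if all four files are present, 1 otherwise.
check_downloads() {
  local dir="$1" missing=0 f
  for f in apache-tomcat-8.0.14.tar.gz jdk-8u25-macosx-x64.dmg \
           joai_v3.1.1.3.zip retailer.war; do
    if [ ! -f "$dir/$f" ]; then
      echo "Missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# jdk_version_ok VERSION: succeed if VERSION, as printed in quotes by
# `java -version` (e.g. 1.8.0_25), is Java 8 update 25 or newer.
jdk_version_ok() {
  local v="$1" rest major update
  case "$v" in
    "") return 1 ;;      # no Java version found at all
    1.*) ;;              # pre-Java 9 scheme: 1.<major>.0_<update>
    *) return 0 ;;       # Java 9 and later (9, 11.0.2, ...) are newer than 8
  esac
  rest="${v#1.}"         # 1.8.0_25 -> 8.0_25
  major="${rest%%.*}"    # 8.0_25   -> 8
  update=0
  case "$v" in
    *_*) update="${v##*_}" ;;  # 1.8.0_25 -> 25
  esac
  update="${update%%[!0-9]*}"  # drop any non-numeric suffix, e.g. 25-ea -> 25
  update="${update:-0}"
  if [ "$major" -gt 8 ]; then return 0; fi
  if [ "$major" -lt 8 ]; then return 1; fi
  [ "$update" -ge 25 ]
}

# installed_java_version: print the quoted version string from
# `java -version` (which writes to stderr), e.g. 1.8.0_25.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}
```

For example, after saving the functions to a file and loading it with `source`, you could run `check_downloads ~/Downloads` at step 6, and `jdk_version_ok "$(installed_java_version)" && echo "JDK OK"` once the JDK is installed in step 10. An empty version string (no JDK installed yet) is treated as a failure.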

Please note that unless you specifically turn it off, the Tomcat server will continue running until your computer is shut down or rebooted; even if you log out and log in as a different user, the Tomcat server will continue running. You can turn it off by typing
/Users/Shared/apache-tomcat-8.0.14/bin/shutdown.sh
into the Terminal window.
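If you’d rather not hand-edit web.xml in TextEdit (step 14), the same two changes can be made from the Terminal. This is a sketch under stated assumptions, not part of Retailer itself: `configure_retailer` is a hypothetical helper name, and it assumes the file contains the literal placeholder text “INSERT TROVE API KEY HERE” and a `<servlet>` line, as described in step 14, and that your API key contains only letters and digits.

```shell
#!/bin/bash
# configure_retailer WEB_XML API_KEY
# Hypothetical helper (not part of Retailer) that makes the two step-14
# edits from the command line:
#   1. replaces the placeholder text with your Trove API key;
#   2. adds the xslt/trove.xsl context-param just before the first <servlet> line.
# Assumes the key contains only letters and digits, since awk's gsub treats
# "&" and "\" in the replacement text specially.
configure_retailer() {
  local file="$1" key="$2" add_param=1
  if [ ! -f "$file" ]; then
    echo "No such file: $file" >&2
    return 1
  fi
  # If trove.xsl is already configured, don't add a second copy, so the
  # function is safe to run more than once.
  if grep -q '<param-value>trove.xsl</param-value>' "$file"; then
    add_param=0
  fi
  # Write to a temporary file, then replace the original only if awk succeeded.
  awk -v key="$key" -v add="$add_param" '
    add == 1 && !done && /<servlet>/ {
      print "<context-param>"
      print "<param-name>xslt</param-name>"
      print "<param-value>trove.xsl</param-value>"
      print "</context-param>"
      done = 1
    }
    { gsub(/INSERT TROVE API KEY HERE/, key); print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, from /Users/Shared you might run `configure_retailer apache-tomcat-8.0.14/webapps/retailer/WEB-INF/web.xml YOUR_TROVE_KEY`, then make the backup copy from step 15 as usual. The `/<servlet>/` pattern matches only the opening `<servlet>` tag, not `<servlet-name>` or `<servlet-mapping>`.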

Instant cooperative editing of a Wikipedia article – so what’s new?

A quick, last-minute proposal for a ‘play’ session, just to engage some expertise and find out how ‘quick’ cooperative work is (and how good the Wikipedia engine is).

Wikipedia is the pre-eminent example of a wiki – software providing a place for cooperative development of content on a given subject, both writing it and changing it. In this session we would:

1) select a Wikipedia article to edit

2) individuals or ad hoc groups edit that article (at this point separately, i.e. save rather than publish)

3) compare our edits

It would be interesting to see how people combine their expertise in a (non-competitive) way to edit something quickly. The first challenge of this game would be selecting a suitable article, presumably one on which at least one participant has expert knowledge. Given that ‘camp’ participants are in general self-selected for interest in both the humanities and technology, the available fields will be many. Even this natural assumption of a good starting point could be dropped if we feel like it, though. Either way, contributions to the article should be made by everyone, not just the main subject-matter expert(s) (if any have been identified).

NB: If time permits, we should also discuss the fact that we would not really be replicating Wikipedia’s collaborative paradigm, as we will all be in the same room and talking.

(Susan Ford)

Bring your ideas!

The unconference part of THATCamp Canberra kicks off tomorrow morning. Hopefully the workshops today will have inspired some ideas, or raised some new problems you’d like to discuss. If so, propose a session! Either log in to the site and add a post, or bring your idea along to the scheduling session.

Remember, you don’t have to be an expert in the topic you propose. Some of the best discussions start with a problem or a question.

And if you’ve got something you’d like to share but don’t think it’s enough for a whole session — remember we’ll also have a series of lightning talks or Speedos after lunch. Show off your latest projects or a favourite website — it’s up to you (as long as it only takes 3 minutes)!

The return of THATCamp Canberra

In 2010 THATCamp Canberra was born — the first THATCamp in Australia, the first in the southern hemisphere. And we discovered there were other people like us, people interested in the intersection of technology and the humanities. We were a community.

In 2011 the legend continued. But with more workshops.

And then silence…

Until now.

This Trovember, THATCamp Canberra returns to your screens in

THATCamp Canberra 2014: The rise of the bots.

or…


THATCamp Canberra 2014

31 October — 2 November Trovember

National Library of Australia

We’ll be kicking off Trovember by taking over the 4th floor of the National Library of Australia and turning it into a Digital Humanities discovery space.

There’ll be public workshops on 31 October, offering everyone an opportunity to pick up some new skills.

Then across the weekend 1–2 November we’ll be unscheduling, unpowerpointing, and unconferencing our way through all those questions about Digital Humanities that you always wanted to ask.

It’ll be fun, it’ll be exhausting, it’ll be THATCamp Canberra 2014.

Registration will open soon.