My Project Blog


Setting up Heritrix (ver. 1.14.1) src with Maven (ver. 1.0.2) on Ubuntu Hardy Heron
September 10, 2008, 5:40 am

I’m going to keep a log of how I set up Heritrix on Linux… in case I need to do it again.

Ingredients:
1) Heritrix (ver. 1.14.1)
2) Maven (ver. 1.0.2) from the Maven archives
3) JDK (version 1.5, a Maven 1.0.2 requirement) from Synaptic; search for sun-java5-jdk.
4) Subversion (ver. 1.4.6 was used, but I think any version would work; it is pretty stable stuff)
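Before going further, it can help to confirm that the JDK and Subversion actually ended up on the PATH. The snippet below is just a rough sketch of such a check; the helper name missing_tools is made up for this post and is not part of any of the tools above.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper: it prints the name of every
# command given as an argument that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the two ingredients that should be installed system-wide.
missing=$(missing_tools java svn)
if [ -n "$missing" ]; then
  echo "Still to install:" $missing
else
  echo "java and svn are both on the PATH"
fi
```

Maven is left out of the check on purpose, since it only lands on the PATH after the steps in the next section.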

Instructions:
Getting Maven Ready
1) Get the correct version of Maven. Don’t get the latest, because this version of Heritrix is a Maven 1.x project (again, latest is not the greatest here :) ).
2) Unpack the archive to install Maven:
tar zxvf .. ... ... tar.gz
3) Set the JAVA_HOME environment variable:
export JAVA_HOME=/usr/lib/jvm/java, blah blah (your directory may differ)
4) Set the MAVEN_HOME environment variable:
export MAVEN_HOME=/your/maven/home
5) Add the Maven binary to your PATH:
export PATH=$PATH:$MAVEN_HOME/bin
6) Check that Maven is set up correctly:
maven -v

This is what I get…
 __  __
|  \/  |__ _Apache__ ___
| |\/| / _` \ V / -_) ' \  ~ intelligent projects ~
|_|  |_\__,_|\_/\___|_||_|  v. 1.0.2
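The exports above only last for the current terminal session, so it may be handy to keep them together in one small script that can be sourced again later. This is a minimal sketch, not a definitive setup: /path/to/your/jdk is a placeholder I made up, and /your/maven/home is the same stand-in used in step 4. Replace both with your real directories.

```shell
#!/bin/sh
# Placeholder locations (assumptions): point these at your own installs.
export JAVA_HOME="${JAVA_HOME:-/path/to/your/jdk}"
export MAVEN_HOME="${MAVEN_HOME:-/your/maven/home}"

# Append the Maven bin directory to PATH only if it is not there yet,
# so sourcing this file twice does not add a duplicate entry.
case ":$PATH:" in
  *":$MAVEN_HOME/bin:"*) ;;
  *) export PATH="$PATH:$MAVEN_HOME/bin" ;;
esac

# Same check as step 6, guarded so a missing binary gives a clear message.
if command -v maven >/dev/null 2>&1; then
  maven -v
else
  echo "maven not found on PATH; check MAVEN_HOME ($MAVEN_HOME)"
fi
```

If you save this under a name of your choosing, sourcing it from your shell startup file should make the settings survive new terminals.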

Getting Heritrix and First Build!
1) Use Subversion to check out the code:
svn co https://archive-crawler.svn.sourceforge.net/svnroot/archive-crawler/branches/heritrix_1_4_0/heritrix/
2) If you try to build Heritrix now, you will get errors, because Maven 1 can no longer fetch the dependencies it needs from its default remote repository (see the “Maven 1 repository move” reference below). To fix this, add the following line to the project.properties file:
maven.repo.remote=http://repository.atlassian.com,http://mirrors.ibiblio.org/pub/mirrors/maven,http://www.ibiblio.org/maven
3) If you try to build Heritrix now, it will get a bit further, but some tests involving a Maven plugin, sdocbook, will fail. According to the manual, the plugin is optional because it is only needed for writing documentation. For some reason, however, some tests require it. For completeness’ sake, I got the plug-in.
Grab version 1.4.1 of the plugin and tar zxvf it, then move the jar file to the $MAVEN_HOME/plugins directory. Now, of course, there are some OTHER dependencies: the plugin needs a jimi.jar file to work.
Download Jimi and unzip it, then move the file called JimiProClasses.zip from it into Maven’s local repository in your home folder and rename it jimi-1.0.jar. The directory that contains this jar file should be ~/.maven/repository/jimi/jars.
4) Now build and test Heritrix by running
maven dist
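Steps 2 and 3 are easy to get slightly wrong by hand, so here is a rough sketch of how they could be scripted. Everything configurable is an assumption of mine: the function names, the HERITRIX_DIR, MAVEN_LOCAL_REPO and JIMI_ZIP variables, and their defaults (in particular, that the Maven 1 local repository lives under $HOME/.maven/repository). It copies the Jimi file rather than moving it, and it does not cover installing the sdocbook plugin jar.

```shell
#!/bin/sh
# Assumed locations; override any of them before calling the functions.
HERITRIX_DIR="${HERITRIX_DIR:-./heritrix}"                       # checkout from step 1
MAVEN_LOCAL_REPO="${MAVEN_LOCAL_REPO:-$HOME/.maven/repository}"  # assumed local repo
JIMI_ZIP="${JIMI_ZIP:-./JimiProClasses.zip}"                     # file from the Jimi download

# Remote repositories from step 2, exactly as listed in the post.
REPOS="http://repository.atlassian.com,http://mirrors.ibiblio.org/pub/mirrors/maven,http://www.ibiblio.org/maven"

# Step 2: add maven.repo.remote to project.properties unless it is already set.
add_remote_repos() {
  props="$HERITRIX_DIR/project.properties"
  if [ ! -f "$props" ]; then
    echo "no project.properties found in $HERITRIX_DIR" >&2
    return 1
  fi
  if grep -q '^maven\.repo\.remote=' "$props"; then
    echo "maven.repo.remote is already set in $props"
  else
    printf '\nmaven.repo.remote=%s\n' "$REPOS" >> "$props"
  fi
}

# Step 3 (Jimi part): copy JimiProClasses.zip into the local repository
# under the name the sdocbook plugin expects, jimi-1.0.jar.
install_jimi() {
  if [ ! -f "$JIMI_ZIP" ]; then
    echo "cannot find $JIMI_ZIP" >&2
    return 1
  fi
  mkdir -p "$MAVEN_LOCAL_REPO/jimi/jars" &&
    cp "$JIMI_ZIP" "$MAVEN_LOCAL_REPO/jimi/jars/jimi-1.0.jar"
}
```

After sourcing the file, running add_remote_repos && install_jimi from the directory that holds the checkout should leave things ready for maven dist. The grep guard means the property is not appended a second time if you run it again.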

Ta-da! It’s never a walk in the park with open-source software.

Useful References:
1) Heritrix’s (well-written) Developers’ Manual
2) Maven 1 repository move

