
27 May 2010

XPath

I spent quite a bit of time trying to find an easy way to do XPath on my Nexus One.  While Android 2.2 adds that as part of the API (why oh why can't we have the entire JDK?), it does not handle malformed HTML very well (Google's homepage, for example, includes an unclosed <link> tag).  Well, that and 2.2 hasn't been OTA'd to my device yet.
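To illustrate the malformed-HTML problem, here is a minimal sketch using the standard javax.xml.parsers and javax.xml.xpath classes from the JDK. The class name StrictXPathDemo, its helper methods, and the sample markup are made up for illustration, and I am assuming the strict-parser behavior shown here is representative of what you hit on the device. The same XPath works on well-formed markup, while a single unclosed <link> tag makes the parser reject the whole page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the XPathTest project.
class StrictXPathDemo {
    /** Parses markup with the strict XML parser and returns the XPath result as a string. */
    static String evaluate(String markup, String expression) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default error handler so fatal errors are not printed to stderr.
        builder.setErrorHandler(new DefaultHandler());
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    /** Returns true when the strict parser refuses to parse the markup. */
    static boolean isRejected(String markup) {
        try {
            evaluate(markup, "/");
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample markup invented for this sketch.
        String wellFormed = "<html><head><link rel=\"x\"/></head>"
                + "<body><span class=\"adr\">Beaverton, OR</span></body></html>";
        String unclosedLink = "<html><head><link rel=\"x\"></head>"
                + "<body><span class=\"adr\">Beaverton, OR</span></body></html>";

        System.out.println(evaluate(wellFormed, "//span[@class='adr']/text()")); // Beaverton, OR
        System.out.println(isRejected(unclosedLink)); // true
    }
}
```

Real-world pages rarely pass that kind of strict parse, which is why a cleaning step in front of the XPath query is attractive.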

One thing I really did *not* want to do is resort to walking the DOM tree or writing 50 times more code to handle it via SAX.  Yes, I understand there are a few classes available in Android 2.1, but using them reminds me of the pre-JAXB days (which I have no desire to repeat).

For this test, we'll use HtmlCleaner.  It handles a reasonable subset of XPath and cleans up the page before we query it.

Start by creating a basic project:

F:\work> mvn archetype:generate -DarchetypeCatalog=http://kallisti.eoti.org:8081/content/repositories/snapshots/archetype-catalog.xml


Choose the galatea-archetype plugin and enter:
groupId: org.eoti.android
artifactId: XPathTest
version: 1.0-SNAPSHOT


Assuming you have your emulator running, build and deploy:


F:\work> cd XPathTest
F:\work\XPathTest> mvn clean install

Add this repository to your pom.xml (or Nexus):
    <repositories>
        <repository>
            <id>xwiki</id>
            <name>xwiki</name>
            <url>http://maven.xwiki.org/externals</url>
        </repository>
    </repositories>
   
Add this dependency to your pom.xml:
        <dependency>
            <groupId>net.sourceforge.htmlcleaner</groupId>
            <artifactId>htmlcleaner</artifactId>
            <version>2.1</version>
        </dependency>
   

Add this permission to your AndroidManifest.xml (as a child of <manifest>):


            <uses-permission android:name="android.permission.INTERNET" />
           
Update your activity (src\main\java\org\eoti\android\XPathTestActivity.java in my case):
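// Package inferred from the file path above
package org.eoti.android;

import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

public class XPathTestActivity extends Activity {
    private static final String TAG = "XPathTest";
    private static final String TEST_URL = "http://www.google.com/profiles/malachid";
    // XPath for the profile owner's name and for the location
    private static final String XPATH_GUSER = "//div[@class='g-unit']/h1/span[@class='fn']/text()";
    private static final String XPATH_LOCATION = "//span[@class='adr']/text()";
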
    /**
     * Called when the activity is first created.
     */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        try{
            HtmlCleaner cleaner = new HtmlCleaner();
            CleanerProperties props = cleaner.getProperties();
            props.setAllowHtmlInsideAttributes(true);
            props.setAllowMultiWordAttributes(true);
            props.setRecognizeUnicodeChars(true);
            props.setOmitComments(true);
            URL url = new URL(TEST_URL);
            URLConnection conn = url.openConnection();
            TagNode node = cleaner.clean(new InputStreamReader(conn.getInputStream()));
            Log.v(TAG, node.evaluateXPath(XPATH_GUSER)[0] + " is from " + node.evaluateXPath(XPATH_LOCATION)[0]);
        }catch(Exception e){
            Log.e(TAG, "Failed", e);
        }
    }
}
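
One caveat with the activity above: indexing an XPath result with [0] throws an ArrayIndexOutOfBoundsException when nothing matches (which the catch block then logs as "Failed"). A small guard makes that case explicit. This is only a sketch: XPathResults and firstOrDefault are hypothetical names, the trim() is my own addition, and the helper just relies on toString() for whatever objects the XPath call hands back.

```java
// Hypothetical helper; not part of HtmlCleaner or the XPathTest project.
class XPathResults {
    /** Returns the first match as a trimmed string, or the fallback when nothing matched. */
    static String firstOrDefault(Object[] matches, String fallback) {
        if (matches == null || matches.length == 0 || matches[0] == null) {
            return fallback;
        }
        return matches[0].toString().trim();
    }

    public static void main(String[] args) {
        // Stand-ins for XPath results; a real call would pass node.evaluateXPath(...).
        Object[] found = { new StringBuilder(" Beaverton, OR ") };
        Object[] nothing = {};

        System.out.println(firstOrDefault(found, "unknown"));   // Beaverton, OR
        System.out.println(firstOrDefault(nothing, "unknown")); // unknown
    }
}
```

In the activity this would be used as firstOrDefault(node.evaluateXPath(XPATH_LOCATION), "unknown"), so a page layout change shows up as "unknown" in the log instead of an exception.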


Redeploy (mvn clean install). Start watching 'adb logcat' for messages, then launch the app.

You should see something like: V/XPathTest(  271): Malachi de Ælfweald is from Beaverton, OR
