Tagged: HttpUnit

  • Subinkrishna Gopi 11:03 am on January 21, 2009
    Tags: HttpUnit

    Disabling HttpUnit script loading 

    Anyone who works with HttpUnit (for testing or development) has probably run into problems with scripts and script loading. It is most noticeable when working against a third-party web site that loads numerous script files from different domains (or sub-domains), sometimes over a secure channel. One exception you may see is:

    java.lang.RuntimeException: Error loading included script: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

    Even if we set the following,

    HttpUnitOptions.setScriptingEnabled(false);
    HttpUnitOptions.setExceptionsThrownOnScriptError(false);
    

    the exception still occurs, because the problem is with the script file itself (either an invalid URL or an invalid certificate).

    One possible solution is to modify the HttpUnit source so that it never loads the script in the first place. The right place is com.meterware.httpunit.ParsedHTML.getScript(). Change the method body to:

    private String getScript( Node scriptNode )
    {
       String scriptLocation = NodeUtils.getNodeAttribute
          (scriptNode, "src", null);
       if (null != scriptLocation)
          System.out.println("Blocking script from: " +
             scriptLocation);
    
       return (null);
    }
    

    This is what the original code looks like:

    private String getScript(Node scriptNode)
    {
       String scriptLocation = NodeUtils.getNodeAttribute
          (scriptNode, "src", null);
    
       if (scriptLocation == null)
       {
          return NodeUtils.asText(scriptNode.getChildNodes());
       }
       else
       {
          try
          {
             return getIncludedScript(scriptLocation);
          }
          catch (IOException e)
          {
             throw new RuntimeException("Error loading included script: "
                 + e);
          }
       }
    }
    

    We were working on a crawler that used HttpUnit as a supporting API, and JavaScript was never important to our process. With this tweak, performance improved roughly fourfold.

    Note:
    Be very careful when modifying the source code, and do it at your own risk.

     
  • Subinkrishna Gopi 10:38 am on July 11, 2007
    Tags: HttpUnit   

    When HttpUnit failed… 

    I have been working with HttpUnit for almost two years. The funny part is that we have used it more as a development tool than as a testing tool. And for your information, it's an awesome tool. I really mean it.

    But even HttpUnit has failed a few times, and here is one of them. While parsing a website, I needed to submit a WebForm containing a text field whose name was “name” followed by several character references for control characters. Seems quite funny, right? I felt the same. But whenever HttpUnit parsed that form, it replaced those characters with blank spaces. I think it has to, and here is why.

    &#13; – special character for carriage return
    &#10; – special character for line feed
    &#9; – special character for horizontal tab

    Check http://www.asciitable.com

    And when I submitted the form, HttpUnit sent the form data as “name++++”, which the server would not accept. The server registered its protest by always returning a “The server didn’t find the page you had requested” page. WTF!
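    To see why the server might treat the two requests differently, it helps to compare how form encoding handles spaces and control characters. This is a minimal sketch using only the JDK's java.net.URLEncoder; the class name FieldNameEncoding and the sample strings are made up for illustration, and it does not call HttpUnit at all.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FieldNameEncoding {

    // Encodes a value the way application/x-www-form-urlencoded bodies do.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical field name with CR, LF and TAB kept intact.
        System.out.println(encode("name\r\n\t"));
        // The same name after the control characters were turned into spaces.
        System.out.println(encode("name    "));
    }
}
```

    Control characters are percent-encoded, while spaces become plus signs, so a server that looks the field up by its exact name will not recognize the second form.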

    I tried many things to fix it (using HttpUnit itself, of course), but nothing worked. I still don’t know whether there is a direct way to solve it.

    How I solved it 🙂

    I tried something I call “API bridging” to solve this problem.

    Steps:

    • Get the corresponding page response as a WebResponse object
    • Generate the form action URL (You will get it from WebResponse & WebForm objects)
    • Prepare the HTTP POST request string for the form

      (Do it by iterating through the WebForm object’s parameter list)

    • Get all the cookies from the WebConversation object & create a cookie header string, so that we can set it as a request header.
    • Open an HttpURLConnection to the URL
    • Set all the HTTP request headers such as “Content-Length”, “User-Agent”, etc.
    • Open the output stream of the connection and write the post string
    • Accept the response & get all the cookies set by the server.
    • Set all the cookies on the WebConversation object using wc.putCookie(<name>, <value>);

    – Continue with HttpUnit
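    The steps above can be sketched with plain JDK classes. This is an illustration under assumptions, not the code we actually ran: the class FormBridge and all of its method names are hypothetical, the parameter and cookie maps stand in for whatever you read out of the WebForm and WebConversation objects, and the cookie parsing ignores attributes such as Path or Expires.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormBridge {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Builds "a=1&b=2" from the form parameters, URL-encoding names and values.
    static String buildPostBody(Map<String, String[]> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            for (String value : entry.getValue()) {
                if (body.length() > 0) body.append('&');
                body.append(enc(entry.getKey())).append('=').append(enc(value));
            }
        }
        return body.toString();
    }

    // Builds the value of the Cookie request header: "n1=v1; n2=v2".
    static String buildCookieHeader(Map<String, String> cookies) {
        StringBuilder header = new StringBuilder();
        for (Map.Entry<String, String> c : cookies.entrySet()) {
            if (header.length() > 0) header.append("; ");
            header.append(c.getKey()).append('=').append(c.getValue());
        }
        return header.toString();
    }

    // Returns {name, value} from one Set-Cookie header, dropping its attributes.
    static String[] parseSetCookie(String setCookie) {
        String pair = setCookie.split(";", 2)[0].trim();
        int eq = pair.indexOf('=');
        return eq < 0 ? new String[] { pair, "" }
                      : new String[] { pair.substring(0, eq), pair.substring(eq + 1) };
    }

    // Posts the body to the form action URL and returns the cookies set by the server.
    static Map<String, String> post(URL action, String body, String cookieHeader) throws IOException {
        byte[] payload = body.getBytes("UTF-8");
        HttpURLConnection conn = (HttpURLConnection) action.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.setRequestProperty("User-Agent", "FormBridge-sketch"); // placeholder value
        if (!cookieHeader.isEmpty()) conn.setRequestProperty("Cookie", cookieHeader);
        // Content-Length is filled in by HttpURLConnection from the buffered body.
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);
        }
        conn.getResponseCode(); // sends the request and reads the response headers
        Map<String, String> cookies = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> h : conn.getHeaderFields().entrySet()) {
            if ("Set-Cookie".equalsIgnoreCase(h.getKey())) {
                for (String header : h.getValue()) {
                    String[] nv = parseSetCookie(header);
                    cookies.put(nv[0], nv[1]);
                }
            }
        }
        conn.disconnect();
        return cookies;
    }

    public static void main(String[] args) {
        // Offline demo of the request-building steps; no network call is made here.
        Map<String, String[]> params = new LinkedHashMap<>();
        params.put("name\r\n", new String[] { "some value" });
        System.out.println(buildPostBody(params));
    }
}
```

    Each name and value in the map returned by post() would then be handed back with wc.putCookie(<name>, <value>) before continuing with HttpUnit. Building the body by hand is the point of the exercise: the field name is encoded exactly as given instead of being normalized first.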

    All of this works because HTTP is a stateless protocol.

     