Wednesday, October 2, 2013

Detecting URLs/Links Clicked on a Webpage

This has more to do with the WebView control in the Windows Store APIs, but it could apply to other situations since it's just JavaScript. I wanted a way to detect which page the user navigates to, and unfortunately WebView does not have that functionality. So I had to resort to injecting JavaScript...

Now I'm not a JavaScript expert, so if there are any better ways to do this please let me know :)


Instead of changing every link and adding my own custom handlers, I decided to override the onclick event on the body. Whenever the user clicks anywhere on the page, my custom onclick handler receives the event and handles it accordingly.


It first checks whether the clicked element is an <a> tag; if it is, we are pretty much done since we found the link. If it isn't, it keeps checking the parent tag until it reaches the top-most <html> tag. This handles cases where you have an <img> tag inside an <a> tag: when the user clicks on the image, the click event's target is the image and not the <a> tag, so we have to traverse upwards.


Here is the code:


Link Detection Script



 document.body.onclick = function(e)
 {
    // Walk up from the clicked element: if it is an <a> tag, report its href;
    // otherwise check each parent until we reach the top-most tag (html).
    var currentElement = e.target;
    while(currentElement && currentElement.nodeName != 'HTML')
    {
       if(currentElement.tagName == 'A')
       {
          // GenerateID() is a helper that is not shown in this post
          if(currentElement.href.indexOf('javascript:') == 0)
          {
             window.external.notify('{\'id\':\'message_printout-'+GenerateID()+'\',\'action\':\'message_printout\',\'message\':\'Link was clicked with javascript void or some javascript function\'}');
             return true;
          }
          var rel = currentElement.rel;
          var target = currentElement.target;
          var newpage = false;
          if(rel == 'external' || target == '_blank')
             newpage = true;
          window.external.notify('{\'id\':\'leaving_page-'+GenerateID()+'\',\'action\':\'leaving_page\', \'url\':\'' + currentElement.href +'\', \'newpage\':\''+newpage+'\'}');
          // Returning false cancels the default navigation
          return false;
       }
       currentElement = currentElement.parentNode;
    }
    // No link was found, so let the click proceed normally
    return true;
 }
Note: The window.external.notify call is specific to WebView in Windows Store apps. What I'm doing here is notifying my application from JavaScript inside the WebView. So when a link is clicked I get a message with the URL, which I then handle myself. You could replace window.external.notify with console.log or your own function call.
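
If you control the receiving side, building the message with JSON.stringify may be safer than concatenating strings by hand, since a URL that contains a quote character would otherwise break the message. This is only a minimal sketch, assuming the application can parse standard double-quoted JSON; buildLinkMessage and its id parameter are hypothetical names that are not part of the script above:

```javascript
// Hypothetical helper: builds the notification payload for a clicked link.
// `id` stands in for whatever unique-id helper your app uses.
function buildLinkMessage(href, rel, target, id) {
    // javascript: links do not navigate anywhere, so report them separately
    if (href.indexOf('javascript:') === 0) {
        return JSON.stringify({
            id: 'message_printout-' + id,
            action: 'message_printout',
            message: 'Link was clicked with javascript void or some javascript function'
        });
    }
    var newpage = (rel === 'external' || target === '_blank');
    return JSON.stringify({
        id: 'leaving_page-' + id,
        action: 'leaving_page',
        url: href,
        newpage: String(newpage) // kept as a string, like the hand-built message
    });
}
```

Inside the click handler you could then pass buildLinkMessage(currentElement.href, currentElement.rel, currentElement.target, GenerateID()) to window.external.notify, or to console.log while debugging.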

This should detect links in about 80% of cases. The remaining 20% will be iframes and dynamic websites that use jQuery and Ajax. You would have to handle iframes separately: look at each iframe, find its document element and execute this JavaScript inside it.
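
The iframe step could be sketched roughly as follows. attachToFrames is a hypothetical helper name, and the sketch assumes the frames are same-origin; for cross-origin frames the browser does not expose the document, so they are skipped:

```javascript
// Hypothetical helper: calls attach(frameDocument) for every iframe whose
// document is reachable, including nested iframes, and returns the number
// of frame documents that were handled.
function attachToFrames(doc, attach) {
    var frames = doc.getElementsByTagName('iframe');
    var attached = 0;
    for (var i = 0; i < frames.length; i++) {
        var frameDoc = null;
        try {
            // contentDocument is null for cross-origin frames;
            // the try/catch is only a precaution
            frameDoc = frames[i].contentDocument;
        } catch (err) {
            frameDoc = null;
        }
        if (frameDoc) {
            attach(frameDoc);
            attached += 1 + attachToFrames(frameDoc, attach);
        }
    }
    return attached;
}
```

A call might look like attachToFrames(document, function (frameDoc) { frameDoc.body.onclick = myHandler; }), where myHandler is the link detection function. Frames that are added or finish loading later would need another pass.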


For other complex websites, the script above might not detect click events. An example is mail.yahoo.com: when you load an email, the link detection script does not detect any clicks inside the email body. I wasn't able to figure out why this happens, other than that the click is being handled by some other script. So for these few cases I just altered the URL (inside the href attribute) to call my function. It would look like this:



function CustomOnClick(url, newpage)
{
   //console.log('link-detect: ' + url + ' ' + newpage );
}


function linkReplacementScript()
{
    var aTagList = document.getElementsByTagName('a');
    // declare i with var so it does not leak into the global scope
    for(var i = 0; i < aTagList.length; i++)
    {
        var url = aTagList[i].href;
        var rel = aTagList[i].rel;
        var target = aTagList[i].target;
        // clear rel/target so the rewritten link does not open a new window
        aTagList[i].rel = '';
        aTagList[i].target = '';
        var newpage = false;
        if(rel == 'external' || target == '_blank')
            newpage = true;
        if(url.indexOf('javascript:') == 0)
        {
            //do nothing if it's javascript code
        }
        else
        {
            aTagList[i].href = 'javascript:CustomOnClick(\''+url+'\',\''+newpage+'\');';
        }
    }
}
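
Before rewriting links, it may be worth trying a capture-phase listener. One possible explanation for the missed clicks (an assumption, not verified against mail.yahoo.com) is that another script calls stopPropagation() before the event bubbles up to the body. A listener registered on the document for the capture phase usually runs before handlers further down the tree. In this sketch, findEnclosingLink and installCaptureListener are hypothetical names:

```javascript
// Hypothetical helper: walks up from a node to the nearest enclosing <a>,
// stopping at the top-most tag (html). Returns null if there is none.
function findEnclosingLink(node) {
    while (node && node.nodeName !== 'HTML') {
        if (node.tagName === 'A') {
            return node;
        }
        node = node.parentNode;
    }
    return null;
}

// Registers the click listener for the capture phase (third argument true),
// so it runs before handlers attached to elements further down the tree.
function installCaptureListener(doc, onLink) {
    var handler = function (e) {
        var link = findEnclosingLink(e.target);
        if (link) {
            onLink(link.href, link.rel === 'external' || link.target === '_blank');
        }
    };
    doc.addEventListener('click', handler, true);
    return handler;
}
```

You would call it as installCaptureListener(document, CustomOnClick). Note that this sketch only reports the click; to stop the navigation as well, the handler would also need to call e.preventDefault().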
There are also cases where the DOM is altered after the page has loaded, for example dynamic websites that insert content using Ajax/jQuery. For these we have to detect when the DOM has changed and then call the link replacement script above. There is a way to detect this using MutationObserver (see this post).

Basically, whenever you get a DOM-updated event, you call the link replacement script. Note: the link replacement script is only for cases where the link detection script has failed. It's a catch-all fallback, just in case.
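
A minimal sketch of that wiring might look like this. watchForNewLinks is a hypothetical name, and the ObserverCtor parameter is only there so the function can be exercised outside a browser; in a page it defaults to the built-in MutationObserver:

```javascript
// Hypothetical helper: re-runs `rewrite` whenever nodes are added under
// `root`. Returns the observer so the caller can disconnect() it later.
function watchForNewLinks(root, rewrite, ObserverCtor) {
    var Ctor = ObserverCtor || MutationObserver;
    var observer = new Ctor(function (mutations) {
        for (var i = 0; i < mutations.length; i++) {
            if (mutations[i].addedNodes.length > 0) {
                rewrite();
                return; // one pass per batch of mutations is enough
            }
        }
    });
    // childList + subtree: report added/removed nodes anywhere below root.
    // Attribute changes are not observed, so the href rewrites made by
    // rewrite() do not trigger the observer again.
    observer.observe(root, { childList: true, subtree: true });
    return observer;
}
```

In the page this would be called as watchForNewLinks(document.body, linkReplacementScript). Links that were already rewritten are skipped on later passes because their href now starts with javascript:.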