Posts Tagged ‘publishing’
Working with text in Java: Using BreakIterator API
java.text.BreakIterator is a handy API for finding boundaries (character, word, sentence and line breaks) within a text. It provides a factory method for each kind of boundary: getCharacterInstance(), getWordInstance(), getSentenceInstance() and getLineInstance().
// Instantiating a word iterator with the optional locale parameter
// ('locale' is a java.util.Locale, e.g. Locale.US)
BreakIterator anIterator = BreakIterator.getWordInstance(locale);
// Without the locale parameter
BreakIterator anotherIterator = BreakIterator.getWordInstance();
The locale (java.util.Locale) is an optional parameter that gives locale-specific breaks. If we instantiate BreakIterator without specifying a locale, the default locale is used. The locale matters when we work with languages like Arabic or Chinese, where the rules for breaks may differ from those for English.
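To see the four factory methods side by side, here is a small sketch that runs each kind of iterator over the same string with an explicit Locale.US and counts the segments it finds. The class name BreakIteratorFactories and the countSegments() helper are hypothetical names used only for illustration.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class BreakIteratorFactories {
    // Counts the segments (the text between consecutive boundaries) found by 'iterator'.
    static int countSegments(BreakIterator iterator, String text) {
        iterator.setText(text);
        int count = 0;
        iterator.first();                               // position at the first boundary (offset 0)
        while (iterator.next() != BreakIterator.DONE) { // each next() closes one segment
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java is fun. BreakIterator finds boundaries!";
        Locale locale = Locale.US;
        System.out.println("characters: " + countSegments(BreakIterator.getCharacterInstance(locale), text));
        System.out.println("words:      " + countSegments(BreakIterator.getWordInstance(locale), text));
        System.out.println("sentences:  " + countSegments(BreakIterator.getSentenceInstance(locale), text));
        System.out.println("lines:      " + countSegments(BreakIterator.getLineInstance(locale), text));
    }
}
```

Note that the word iterator also reports the spaces and punctuation between words as segments, which is why the loop under "Iterating through boundaries" filters segments with Character.isLetterOrDigit().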
Once we have an instance of BreakIterator and have given it the text with setText(), iterating through the boundaries works the same way for every kind of iterator. The API offers methods like first(), last(), previous(), next(), preceding(int) and following(int) to move between boundaries.
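As a sketch of the offset-based methods, the hypothetical helper wordAt() below returns the segment that contains a given character offset: isBoundary() checks whether the offset already starts a segment, preceding() finds the boundary before it, and following() the boundary after it. It assumes the offset lies inside the text (0 to length - 1).

```java
import java.text.BreakIterator;

public class BoundaryNavigation {
    // Returns the word-iterator segment containing the character at 'offset'.
    static String wordAt(String text, int offset) {
        if (offset < 0 || offset >= text.length()) {
            throw new IllegalArgumentException("offset outside text: " + offset);
        }
        BreakIterator words = BreakIterator.getWordInstance();
        words.setText(text);
        // isBoundary(offset) is true when 'offset' itself starts a segment;
        // otherwise preceding(offset) gives the last boundary before it
        int start = words.isBoundary(offset) ? offset : words.preceding(offset);
        // following(offset) gives the first boundary after 'offset'
        int end = words.following(offset);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "This is a sample text";
        System.out.println(wordAt(text, 12)); // offset 12 is the 'm' in "sample"
    }
}
```

Unlike next() and previous(), preceding() and following() do not depend on the iterator's current position, so they suit random access such as finding the word under a cursor.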
Iterating through boundaries
// At the top of the source file:
import java.text.BreakIterator;

BreakIterator aWordIterator = null;
String targetString = null;
int nextIndex = -1;
int anIndex = -1;
// Initialising the word iterator
aWordIterator = BreakIterator.getWordInstance();
targetString = "This is a sample text";
aWordIterator.setText(targetString);
// Iterating through the boundaries
nextIndex = aWordIterator.first();
while (BreakIterator.DONE != nextIndex)
{
    anIndex = nextIndex;
    nextIndex = aWordIterator.next();
    // Print only segments that start with a letter or digit,
    // which skips the spaces between the words
    if ((BreakIterator.DONE != nextIndex) &&
        Character.isLetterOrDigit(targetString.charAt(anIndex)))
    {
        System.out.format("%10s (%d, %d)%n",
            targetString.substring(anIndex, nextIndex),
            anIndex, nextIndex);
    }
}
next() returns the constant BreakIterator.DONE once there are no more boundaries, which ends the loop.
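DONE also marks the other end: previous() returns it when the iterator is already at the first boundary. A minimal sketch of walking the same text backwards with last() and previous(); the class ReverseWords and its method are hypothetical names for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

public class ReverseWords {
    // Collects the words of 'text' from last to first by walking boundaries backwards.
    static List<String> wordsInReverse(String text) {
        BreakIterator words = BreakIterator.getWordInstance();
        words.setText(text);
        List<String> result = new ArrayList<>();
        int end = words.last();              // boundary at text.length()
        int start = words.previous();
        while (start != BreakIterator.DONE) { // DONE: we moved past the first boundary
            // Keep only segments that start with a letter or digit, as in the forward loop
            if (Character.isLetterOrDigit(text.charAt(start))) {
                result.add(text.substring(start, end));
            }
            end = start;
            start = words.previous();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordsInReverse("This is a sample text")); // [text, sample, a, is, This]
    }
}
```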
Output
      This (0, 4)
        is (5, 7)
         a (8, 9)
    sample (10, 16)
      text (17, 21)
Useful links
http://java.sun.com/docs/books/tutorial/i18n/text/examples/BreakIteratorDemo.java – Sun’s BreakIterator demo
http://java.sun.com/docs/books/tutorial/i18n/text/index.html – Sun’s tutorial on “Working with text”
Today’s tool: Gnome Blog
Gnome Blog is a handy tool for publishing posts to our blogs. It currently supports Blogger.com / Blogspot.com, Advogato.org, Movable Type, WordPress, LiveJournal.com, Pyblosxom and any other blog that uses the Blogger API or the MetaWeblog API.
There is a post about this tool on Ankur’s blog. I have yet to try it.
What is GNOME? See Wikipedia/Gnome.
Links
Home page: http://www.gnome.org/~seth/gnome-blog/
Download page: http://www.gnome.org/~seth/gnome-blog/download.html
A basic JDOM parser for RSS
Almost two years ago I posted a SAX-based RSS parser (find it here) that was intended for J2ME. I think JDOM is a lot easier to use than SAX, so in this post you will find a very simple JDOM-based RSS parser.
Read more about RSS here: http://en.wikipedia.org/wiki/RSS_(file_format)
Step 1: Import the JDOM libraries
import java.io.InputStream;
import java.util.List;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
Step 2: Initialize
// I have not given the implementation of
// getUrlConnectionInputStream(url)
InputStream inputStream = getUrlConnectionInputStream(url);
if (null == inputStream)
    throw new Exception("No input stream for " + url);
RssFeed rssFeed = null;
try
{
    SAXBuilder saxBuilder = new SAXBuilder();
    Document document = saxBuilder.build(inputStream);
    rssFeed = _build(document);
}
finally
{
    // Release the connection even if parsing fails
    inputStream.close();
}
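The post leaves getUrlConnectionInputStream(url) unimplemented. One possible sketch with plain java.net classes is below; it assumes the URL is passed as a String, and the class name FeedConnections and the 10-second timeouts are illustrative choices, not the author's code. Unlike the null check in Step 2 expects, this version reports failures by throwing an IOException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class FeedConnections {
    // Hypothetical implementation of the helper used in Step 2.
    // Throws MalformedURLException (an IOException) for an unparsable URL.
    static InputStream getUrlConnectionInputStream(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        connection.setConnectTimeout(10_000); // milliseconds to wait for the connection
        connection.setReadTimeout(10_000);    // milliseconds to wait for data
        return connection.getInputStream();
    }
}
```

The caller is responsible for closing the returned stream, as the try/finally around SAXBuilder.build() does.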
Step 3: Implementation of _build(org.jdom.Document)
// Entry point. Builds an RssFeed object from the parsed
// RSS document
private RssFeed _build(Document document)
throws Exception
{
RssFeed rssFeed = null;
Element rootElement = null;
Element channelElement = null;
String rssFeedVersion = null;
if (null == document)
throw new Exception("Empty document");
rootElement = document.getRootElement();
if (!"rss".equalsIgnoreCase(rootElement.getName()))
throw new Exception("Invalid XML");
rssFeedVersion = rootElement.getAttributeValue("version");
channelElement = rootElement.getChild("channel");
if (null == channelElement)
throw new Exception("Empty feed");
// Getting the feed contents
rssFeed = _getHeader(channelElement);
rssFeed.version = rssFeedVersion;
_addFeedItems(channelElement, rssFeed);
return (rssFeed);
}
Step 4: _getHeader(org.jdom.Element)
This method reads the feed header and stores the values in a new RssFeed object.
// Sets the RSS feed header information to a new
// RssFeed object
private RssFeed _getHeader(Element channelElement)
{
    RssFeed rssFeed = new RssFeed();
    rssFeed.title = getValueOfChildElement(channelElement, "title");
    rssFeed.link = getValueOfChildElement(channelElement, "link");
    rssFeed.description = getValueOfChildElement(channelElement, "description");
    return (rssFeed);
}
Step 5: _addFeedItems(org.jdom.Element, subin.xml.RssFeed)
This method extracts each feed item from the XML and adds it to the given RssFeed object.
// Iterates through the feed item list, extracts the
// feed item details, creates a corresponding RssItem object
// and adds it to the RssFeed item list
@SuppressWarnings("unchecked") // JDOM 1.x getChildren() returns a raw List
private void _addFeedItems(Element channelElement, RssFeed rssFeed)
{
    java.util.List<Element> itemElements = channelElement.getChildren("item");
    if (null != itemElements)
    {
        for (Element anItemElement : itemElements)
        {
            RssItem anRssItem = new RssItem();
            anRssItem.title = getValueOfChildElement(anItemElement, "title");
            anRssItem.link = getValueOfChildElement(anItemElement, "link");
            anRssItem.description = getValueOfChildElement(anItemElement, "description");
            anRssItem.pubDate = getValueOfChildElement(anItemElement, "pubDate");
            rssFeed.addItem(anRssItem);
        }
    }
}
Step 6: getValueOfChildElement(org.jdom.Element, String)
This method extracts the value of the child element (specified by name) from the given JDOM Element.
// Get the child node value
private String getValueOfChildElement(Element parentElement,
String tagName)
{
Element childElement = null;
String tagValue = null;
childElement = parentElement.getChild(tagName);
tagValue = (null != childElement)
? childElement.getValue().trim() : null;
return (tagValue);
}
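If adding the JDOM jar is not an option, roughly the same lookup can be written against the JDK's built-in DOM API (org.w3c.dom). Note that this swaps in a different API from the one the post uses; the class DomChildValue and its parse() helper are hypothetical. Like JDOM's getChild(), it looks only at direct children.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomChildValue {
    // DOM counterpart of getValueOfChildElement(): trimmed text of the first
    // direct child element named 'tagName', or null if there is none.
    static String getValueOfChildElement(Element parent, String tagName) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && tagName.equals(child.getNodeName())) {
                return child.getTextContent().trim(); // text of all descendants, like JDOM getValue()
            }
        }
        return null;
    }

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<rss version=\"2.0\"><channel><title> Demo </title></channel></rss>");
        Element channel = (Element) doc.getDocumentElement().getElementsByTagName("channel").item(0);
        System.out.println(getValueOfChildElement(channel, "title")); // Demo
    }
}
```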
Step 7: RssFeed & RssItem
Two classes to hold the feed information.
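class RssFeed
{
    public String version;
    public String title;
    public String description;
    public String link;
    public java.util.List<RssItem> items = new java.util.ArrayList<RssItem>();

    // Used by _addFeedItems() to collect the feed items
    public void addItem(RssItem item)
    {
        items.add(item);
    }
}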
class RssItem
{
public String title;
public String description;
public String link;
public String pubDate;
}
The code is split across several snippets rather than a single file, so it may be a little hard to follow; I hope it is still useful.
RSS Parser (SAX)
RSS (Really Simple Syndication)
RSS is a way to publish frequently changing content like blog posts, news updates, stock quotes and the like. An RSS document, which is called a “feed,” “web feed,” or “channel,” contains either a summary of content from an associated web site or the full text. RSS formats are specified using XML, a generic specification for the creation of data formats.
Below is a simple SAX parser for RSS. Please let me know if you find any flaw in the code. It is provided for learning purposes, with less focus on coding standards and efficiency. You are free to use and modify it.
package subin.rnd.xml;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Properties;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
public class RssParser extends DefaultHandler
{
private String urlString;
private RssFeed rssFeed;
private StringBuilder text;
private Item item;
private boolean imgStatus;
public RssParser(String url)
{
this.urlString = url;
this.text = new StringBuilder();
}
public void parse()
{
InputStream urlInputStream = null;
SAXParserFactory spf = null;
SAXParser sp = null;
try
{
URL url = new URL(this.urlString);
_setProxy(); // Set the proxy if needed: fill in _setProxy() first, or remove this call
urlInputStream = url.openConnection().getInputStream();
spf = SAXParserFactory.newInstance();
if (spf != null)
{
sp = spf.newSAXParser();
sp.parse(urlInputStream, this);
}
}
/*
* Exceptions need to be handled
* MalformedURLException
* ParserConfigurationException
* IOException
* SAXException
*/
catch (Exception e)
{
System.out.println("Exception: " + e);
e.printStackTrace();
}
finally
{
try
{
if (urlInputStream != null) urlInputStream.close();
}
catch (Exception e) {}
}
}
public RssFeed getFeed()
{
return (this.rssFeed);
}
public void startElement(String uri, String localName, String qName,
Attributes attributes)
{
if (qName.equalsIgnoreCase("channel"))
this.rssFeed = new RssFeed();
else if (qName.equalsIgnoreCase("item") && (this.rssFeed != null))
{
this.item = new Item();
this.rssFeed.addItem(this.item);
}
else if (qName.equalsIgnoreCase("image") && (this.rssFeed != null))
this.imgStatus = true;
}
public void endElement(String uri, String localName, String qName)
{
if (this.rssFeed == null)
return;
if (qName.equalsIgnoreCase("item"))
this.item = null;
else if (qName.equalsIgnoreCase("image"))
this.imgStatus = false;
else if (qName.equalsIgnoreCase("title"))
{
if (this.item != null) this.item.title = this.text.toString().trim();
else if (this.imgStatus) this.rssFeed.imageTitle = this.text.toString().trim();
else this.rssFeed.title = this.text.toString().trim();
}
else if (qName.equalsIgnoreCase("link"))
{
if (this.item != null) this.item.link = this.text.toString().trim();
else if (this.imgStatus) this.rssFeed.imageLink = this.text.toString().trim();
else this.rssFeed.link = this.text.toString().trim();
}
else if (qName.equalsIgnoreCase("description"))
{
if (this.item != null) this.item.description = this.text.toString().trim();
else this.rssFeed.description = this.text.toString().trim();
}
else if (qName.equalsIgnoreCase("url") && this.imgStatus)
this.rssFeed.imageUrl = this.text.toString().trim();
else if (qName.equalsIgnoreCase("language"))
this.rssFeed.language = this.text.toString().trim();
else if (qName.equalsIgnoreCase("generator"))
this.rssFeed.generator = this.text.toString().trim();
else if (qName.equalsIgnoreCase("copyright"))
this.rssFeed.copyright = this.text.toString().trim();
else if (qName.equalsIgnoreCase("pubDate") && (this.item != null))
this.item.pubDate = this.text.toString().trim();
else if (qName.equalsIgnoreCase("category") && (this.item != null))
this.rssFeed.addItem(this.text.toString().trim(), this.item);
this.text.setLength(0);
}
public void characters(char[] ch, int start, int length)
{
this.text.append(ch, start, length);
}
// Sets the HTTP proxy used by java.net.URL connections.
// Replace the placeholders with your proxy's host and port.
public static void _setProxy()
{
    System.setProperty("http.proxyHost", "<Proxy IP Address>");
    System.setProperty("http.proxyPort", "<Proxy Port Number>");
}
public static class RssFeed
{
public String title;
public String description;
public String link;
public String language;
public String generator;
public String copyright;
public String imageUrl;
public String imageTitle;
public String imageLink;
public ArrayList<Item> items;
public HashMap<String, ArrayList<Item>> category;
public void addItem(Item item)
{
if (this.items == null)
this.items = new ArrayList<Item>();
this.items.add(item);
}
public void addItem(String category, Item item)
{
if (this.category == null)
this.category = new HashMap<String, ArrayList<Item>>();
if (!this.category.containsKey(category))
this.category.put(category, new ArrayList<Item>());
this.category.get(category).add(item);
}
}
public static class Item
{
public String title;
public String description;
public String link;
public String pubDate;
public String toString()
{
return (this.title + ": " + this.pubDate + "\n" + this.description);
}
}
}
Using RssParser.java:
RssParser rp = new RssParser("<RSS Feed URL>");
rp.parse();
RssParser.RssFeed feed = rp.getFeed();
// getFeed() returns null if parsing failed or no <channel> was found
if (feed != null)
{
    // Listing all categories & the number of items in each category
    if (feed.category != null)
    {
        System.out.println("Category List: ");
        for (String category : feed.category.keySet())
        {
            System.out.println(category + ": " + feed.category.get(category).size());
        }
    }
    // Listing all items in the feed (items stays null for a feed without items)
    if (feed.items != null)
    {
        for (RssParser.Item item : feed.items)
            System.out.println(item.title);
    }
}
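Because RssParser reads from a URL, trying it out needs network access (and possibly the proxy settings). As a self-contained sketch of the same startElement / characters / endElement pattern, the hypothetical class below parses an in-memory feed and collects only the item titles. It is a trimmed-down illustration, not the RssParser class itself; unlike RssParser, it also clears the text buffer in startElement.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineRssDemo {
    // Collects the <title> of every <item>, ignoring the channel's own title.
    static List<String> itemTitles(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inItem;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equalsIgnoreCase("item")) inItem = true;
                text.setLength(0); // start collecting text for this element afresh
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equalsIgnoreCase("item")) inItem = false;
                else if (inItem && qName.equalsIgnoreCase("title")) titles.add(text.toString().trim());
                text.setLength(0);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<rss version=\"2.0\"><channel><title>Demo feed</title>"
                + "<item><title>First post</title></item>"
                + "<item><title>Second post</title></item>"
                + "</channel></rss>";
        System.out.println(itemTitles(feed)); // [First post, Second post]
    }
}
```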



