Retrieve Structured, Textual Data from Various Web Sources



Documentation for package ‘tm.plugin.webmining’ version 1.3

Help Pages

tm.plugin.webmining-package Retrieve structured, textual data from various web sources
assignValues Extract Main HTML Content from DOM
calcDensity Extract Main HTML Content from DOM
corpus.update Update/Extend 'WebCorpus' with new feed items.
corpus.update.WebCorpus Update/Extend 'WebCorpus' with new feed items.
encloseHTML Enclose Text Content in HTML tags
encloseHTML.character Enclose Text Content in HTML tags
encloseHTML.PlainTextDocument Enclose Text Content in HTML tags
extract Extract main content from 'TextDocument's.
extract.PlainTextDocument Extract main content from 'TextDocument's.
extractContentDOM Extract Main HTML Content from DOM
extractHTMLStrip Simply strip HTML Tags from Document
feedquery Build up string for feedquery.
getEmpty Retrieve Empty Corpus Elements through '$postFUN'.
getEmpty.WebCorpus Retrieve Empty Corpus Elements through '$postFUN'.
getLinkContent Get main content for corpus items, specified by links.
getMainText Extract Main HTML Content from DOM
GoogleFinanceSource Get feed Meta Data from Google Finance.
GoogleNewsSource Get feed data from Google News Search <http://news.google.com/>
json_content Read content from WebXMLSource/WebHTMLSource/WebJSONSource.
NYTimesSource Get feed data from NYTimes Article Search (<http://developer.nytimes.com/docs/read/article_search_api_v2>).
nytimes_appid AppID for the NYTimes API.
parse Wrapper/convenience function to ensure the right encoding on different platforms
readGoogle Get feed Meta Data from Google Finance.
readNYTimes Get feed data from NYTimes Article Search (<http://developer.nytimes.com/docs/read/article_search_api_v2>).
readReutersNews Get feed data from Reuters News RSS feed channels. Reuters provides numerous feed
readWeb Read content from WebXMLSource/WebHTMLSource/WebJSONSource.
readWebHTML Read content from WebXMLSource/WebHTMLSource/WebJSONSource.
readWebJSON Read content from WebXMLSource/WebHTMLSource/WebJSONSource.
readWebXML Read content from WebXMLSource/WebHTMLSource/WebJSONSource.
readYahoo Get feed data from Yahoo! Finance.
readYahooHTML Get news data from Yahoo! News (<https://news.search.yahoo.com/search/>).
readYahooInplay Get News from Yahoo Inplay.
removeNonASCII Remove non-ASCII characters from Text.
removeNonASCII.PlainTextDocument Remove non-ASCII characters from Text.
removeTags Extract Main HTML Content from DOM
ReutersNewsSource Get feed data from Reuters News RSS feed channels. Reuters provides numerous feed
source.update Update WebXMLSource/WebHTMLSource/WebJSONSource
source.update.WebHTMLSource Update WebXMLSource/WebHTMLSource/WebJSONSource
source.update.WebJSONSource Update WebXMLSource/WebHTMLSource/WebJSONSource
source.update.WebXMLSource Update WebXMLSource/WebHTMLSource/WebJSONSource
tm.plugin.webmining Retrieve structured, textual data from various web sources
trimWhiteSpaces Trim White Spaces from Text Document.
WebCorpus WebCorpus constructor function.
webmining Retrieve structured, textual data from various web sources
WebSource Read Web Content and respective Link Content from feedurls.
YahooFinanceSource Get feed data from Yahoo! Finance.
YahooInplaySource Get News from Yahoo Inplay.
yahoonews WebCorpus retrieved from Yahoo! News for the search term "Microsoft" through the YahooNewsSource. Length of retrieved corpus is 20.
YahooNewsSource Get news data from Yahoo! News (<https://news.search.yahoo.com/search/>).