Netseer.com
UCLA LUG presents NetSeer.com Infosession!
Published by NNikzad on Thu, 2008-02-21 15:21
Tags: infosession | LUG | Netseer.com calendar
The UCLA Linux Users Group is proud to host an InfoSession with local web
startup NetSeer.com.
****This Thursday, February 21st, at 6pm in Boelter Hall 4760****
"Distributed crawling in Linux with Nutch/Hadoop and Amazon's Elastic Compute
Cloud (EC2)"
There are many technical issues inherent in crawling the billions of pages
that make up the Web. Building a large, fixed web crawling infrastructure
can be quite expensive and complicated, particularly if you do not really
need to re-crawl frequently. Such a system needs to be platform independent,
highly scalable, fault tolerant, and completely decentralized.
Additionally, in order to handle such a large set of data, we require an
underlying distributed filesystem that is equally scalable and fault
tolerant. In this talk, we will discuss our approach to web crawling using
Nutch/Hadoop, an open source distributed search engine system, and Amazon's
Elastic Compute Cloud (EC2).
At NetSeer, we analyze and process the results of our web crawling, and
extract semantic concepts and conceptual relationships from the Web. These
concepts can then be used for improving search results, providing more
relevant ads, and a host of other applications. After the technical portion
of our talk, we will also provide more information on NetSeer, and we will be
accepting resumes.
Pizza and soda will be provided!!!!
For details visit http://linux.ucla.edu and check out http://NetSeer.com to
find out more about the company.
