Week / Date: March 20

summarizing papers and resources, pulling ideas together.

Week 5

critique of each site
what it does
the technology it uses to do it

Categorize it:
like or not like
whether data can be exported from it
whether it aggregates data together

revamp as you go.

a page or two on each tool's capabilities, what its API is, and how it works
  • Tool Summary: added this page to categorize the tools found and record their functionality types
    • Need to build out the pages in more depth and reorganize the wiki based on the Tool Summary, etc.
    • go through all links and summarize them
  • downloaded Weka and am reading through the README documents
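Weka reads data in its ARFF text format. One way to feed it records gathered from the sites is to export them as ARFF. Below is a minimal Python sketch. The attributes (surname, birth_year, state) and the sample records are hypothetical. In ARFF, "?" marks a missing value, and values containing spaces are single-quoted.

```python
def arff_quote(value):
    """Single-quote a value if it contains characters special to ARFF."""
    text = str(value)
    if any(ch in text for ch in " ,'{}%\t"):
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return text


def to_arff(relation, attributes, rows):
    """Serialize rows as Weka ARFF text.

    attributes: list of (name, kind) pairs, where kind is "NUMERIC",
    "STRING", or a list of nominal values.
    """
    lines = [f"@RELATION {arff_quote(relation)}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            kind = "{" + ",".join(arff_quote(v) for v in kind) + "}"
        lines.append(f"@ATTRIBUTE {arff_quote(name)} {kind}")
    lines += ["", "@DATA"]
    for row in rows:
        # None becomes "?", ARFF's marker for a missing value.
        lines.append(",".join("?" if v is None else arff_quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample records gathered from different sites.
records = [
    ("Smith", 1852, "Ohio"),
    ("Smyth", None, "Ohio"),
    ("Van Dyke", 1860, "New York"),
]
arff = to_arff(
    "genealogy_records",
    [("surname", "STRING"), ("birth_year", "NUMERIC"), ("state", ["Ohio", "New York"])],
    records,
)
print(arff)
```

The resulting text can be saved with a .arff extension and opened in the Weka Explorer.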

What tools are out there right now
know what's been done
what you propose to do
building an initial prototype (proof of concept)
XML-to-DOM translation
PHP objects: find documentation on them and use PHP libraries to parse
one-page summary of each link and what it does
talk to Branton about the PHP parsing for LDS
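Here is a minimal sketch of the XML-to-DOM-to-object step. It is written in Python for illustration; in PHP, the DOMDocument class fills the same role. The <persons>/<person>/<birth> layout and the Person fields are hypothetical. A real source's XML schema would have to be mapped instead.

```python
from dataclasses import dataclass
from typing import Optional
from xml.dom import Node, minidom


@dataclass
class Person:
    person_id: str
    name: str
    birth_year: Optional[int]
    birth_place: Optional[str]


def text_of(element):
    """Concatenate the direct text children of a DOM element."""
    return "".join(c.data for c in element.childNodes if c.nodeType == Node.TEXT_NODE).strip()


def parse_persons(xml_text):
    """Translate the (hypothetical) XML into a DOM, then into Person objects."""
    dom = minidom.parseString(xml_text)
    people = []
    for el in dom.getElementsByTagName("person"):
        names = el.getElementsByTagName("name")
        births = el.getElementsByTagName("birth")
        birth = births[0] if births else None
        # getAttribute returns "" when the attribute is absent.
        year = birth.getAttribute("year") if birth else ""
        place = birth.getAttribute("place") if birth else ""
        people.append(Person(
            person_id=el.getAttribute("id"),
            name=text_of(names[0]) if names else "",
            birth_year=int(year) if year else None,
            birth_place=place or None,
        ))
    return people


SAMPLE = """<persons>
  <person id="I1"><name>John Smith</name><birth year="1852" place="Ohio"/></person>
  <person id="I2"><name>Mary Jones</name></person>
</persons>"""
people = parse_persons(SAMPLE)
print(people)
```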

Week 4
  • AltaVista / hasta la vista

go through and write a test case that connects to this in PHP
  • Fixed the API I downloaded so it runs on current PHP, and was able to get XML back from the login page =).
  • working on figuring out the XML parser
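The test-case idea can be sketched as follows: build the login request, then check the XML parsing against a canned response so the test runs offline. The endpoint URL, the form field names, and the <login>/<status>/<session> response shape are all hypothetical placeholders. They are not the downloaded API's actual interface.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the real service's login URL and field names will differ.
LOGIN_URL = "https://example.org/api/login"


def build_login_request(username, password):
    """Build (but do not send) a form-encoded POST login request."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode("ascii")
    return urllib.request.Request(LOGIN_URL, data=body, method="POST",
                                  headers={"Accept": "application/xml"})


def parse_login_response(xml_bytes):
    """Pull the (hypothetical) status and session fields out of the XML reply."""
    root = ET.fromstring(xml_bytes)
    return {"status": root.findtext("status"), "session": root.findtext("session")}


def login(username, password, timeout=10):
    # Performs the real network call; not exercised by the offline check below.
    with urllib.request.urlopen(build_login_request(username, password), timeout=timeout) as resp:
        return parse_login_response(resp.read())


# Offline test case using a canned response instead of the live service.
req = build_login_request("tester", "secret")
canned = b"<login><status>ok</status><session>abc123</session></login>"
result = parse_login_response(canned)
print(req.get_method(), req.data, result)
```

Separating request building and parsing from the network call means most of the test can run without the live site.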

hmm, Google API too?

research Facebook API possibilities and apps that are already available.

ask about a PHP API for RootsWeb and find out how and where to access it.

follow up on other APIs.

Week 3

Mark Cahill's ideas to research
  • Meaningful patterns, multiple regression, memory mapping
  • R-ETL tool, SPSS, machine learning, Clementine
  • Project WEKA, Peter Wiley

contact the database providers and find out whether they have an API I can use for this project.
contact the authors of the papers to find out the current state of automating genealogy with data mining, and go through the latest references to understand what has been done.

surname searching
easy-to-use site.
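Surname searching usually has to tolerate spelling variants such as Smith/Smyth. One common option, not something these notes settle on, is American Soundex. It codes a surname as its first letter plus three digits. Here is a minimal Python sketch; surname_matches is a hypothetical helper.

```python
# American Soundex digit groups; vowels, y, h, and w get no digit.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(name):
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return result.ljust(4, "0")


def surname_matches(query, surnames):
    """Return the surnames whose Soundex code equals the query's."""
    key = soundex(query)
    return [s for s in surnames if soundex(s) == key]


matches = surname_matches("Smith", ["Smyth", "Smithe", "Jones"])
print(matches)  # ['Smyth', 'Smithe']
```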

how to get at genealogical data sources,
then mining that data into family trees
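Once records carry parent links, turning them into a tree is mostly an indexing step. Below is a minimal sketch. It assumes each record is a dict with hypothetical "id", "name", "father", and "mother" keys. The max_depth limit guards against cycles caused by bad data.

```python
from collections import defaultdict


def build_index(records):
    """Index people by id and map each parent id to its children's ids."""
    people = {r["id"]: r for r in records}
    children = defaultdict(list)
    for r in records:
        for parent_key in ("father", "mother"):
            pid = r.get(parent_key)
            if pid:
                children[pid].append(r["id"])
    return people, children


def ancestors(people, person_id, depth=0, max_depth=10):
    """Yield (generation, name) pairs walking up the father/mother links."""
    person = people.get(person_id)
    if person is None or depth > max_depth:
        return
    yield depth, person["name"]
    for parent_key in ("father", "mother"):
        pid = person.get(parent_key)
        if pid:
            yield from ancestors(people, pid, depth + 1, max_depth)


def descendants(people, children, person_id, depth=0, max_depth=10):
    """Yield (generation, name) pairs walking down the child links."""
    if person_id not in people or depth > max_depth:
        return
    yield depth, people[person_id]["name"]
    for cid in children.get(person_id, []):
        yield from descendants(people, children, cid, depth + 1, max_depth)


# Hypothetical records, e.g. merged from several sources.
records = [
    {"id": "I1", "name": "John Smith"},
    {"id": "I2", "name": "Mary Jones"},
    {"id": "I3", "name": "William Smith", "father": "I1", "mother": "I2"},
    {"id": "I4", "name": "Anna Smith", "father": "I3"},
]
people, children = build_index(records)
pedigree = list(ancestors(people, "I4"))
family = list(descendants(people, children, "I1"))
print(pedigree)
print(family)
```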

how to programmatically access the family tree or LDS data
  • LDS: research membership to gain access to their databases
  • emailed LDS

-Municipal records raw data: what data sources and types of sources are available?

-Email all database links to Branton.
-Has someone done this, and what techniques did they use?
-Where did the data for the particular sites we have access to come from, and how did they turn it into family trees?

-What's been done
-What could be done as a problem to solve!

-title for project
  • Automating Genealogy with Web Data Mining

-abstract: one-paragraph summary of your project; email him a draft.
  • The Internet has spread information through the world's collaborative
    efforts, yet genealogists still sift through individual search engines
    and applications one at a time to find matching data. We propose to
    automate this process using data mining and data analysis techniques.
    Algorithms suited to the varying data sets will be used to narrow
    results further than basic search engines can. Additional
    classification will be performed to keep the information accurate in
    timeline and geography. The results will be integrated into a database
    for lineage retrieval reports, and the data will be maintained in a
    complete, exportable format.
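One simple reading of the abstract's timeline and geography checks is a rule-based filter. It drops candidate parents whose dates or places cannot fit. Below is a minimal sketch. The age bounds, the one-year allowance for a parent who died before the birth, and the exact-match region rule are all hypothetical choices. A real system might learn them from data instead.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical plausibility bounds; real limits would need tuning.
MIN_PARENT_AGE = 12
MAX_PARENT_AGE = 60


@dataclass
class Record:
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]
    region: Optional[str]


def plausible_parent(parent: Record, child: Record) -> bool:
    """Reject a candidate parent only when known facts contradict the link."""
    if parent.birth_year is not None and child.birth_year is not None:
        # Timeline: the parent's age at the child's birth must be within bounds.
        age = child.birth_year - parent.birth_year
        if not MIN_PARENT_AGE <= age <= MAX_PARENT_AGE:
            return False
    if parent.death_year is not None and child.birth_year is not None:
        # Allow one year so a father who died during the pregnancy still fits.
        if parent.death_year < child.birth_year - 1:
            return False
    # Geography: if both regions are known, they must match.
    if parent.region and child.region and parent.region != child.region:
        return False
    return True


def narrow(candidates, child):
    return [c for c in candidates if plausible_parent(c, child)]


child = Record("Anna Smith", 1880, None, "Ohio")
candidates = [
    Record("William Smith", 1850, 1910, "Ohio"),  # fits
    Record("William Smith", 1875, 1940, "Ohio"),  # only 5 at the birth
    Record("William Smith", 1845, 1870, "Ohio"),  # died well before the birth
    Record("William Smith", 1852, 1920, "Kent"),  # different region
    Record("William Smith", None, None, None),    # nothing known, so kept
]
kept = [(c.birth_year, c.region) for c in narrow(candidates, child)]
print(kept)
```

Unknown values pass the filter, so it only narrows results and never rejects a record for missing data.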

-Current Web tools and how they work

research GEDCOM
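For the GEDCOM research: each GEDCOM line is a level number, an optional @xref@ identifier, a tag, and an optional value. The level gives the nesting. Below is a minimal Python sketch that rebuilds that nesting. It ignores CONC/CONT continuation lines and character-set details. The Node class and the child_value helper are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# level, optional @xref@, tag, optional value
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


@dataclass
class Node:
    level: int
    tag: str
    xref: Optional[str] = None
    value: str = ""
    children: List["Node"] = field(default_factory=list)


def parse_gedcom(text):
    """Turn GEDCOM lines into a forest of Nodes nested by level number."""
    roots, stack = [], []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        m = LINE_RE.match(raw)
        if not m:
            raise ValueError(f"Malformed GEDCOM line: {raw!r}")
        level = int(m.group(1))
        node = Node(level, m.group(3), m.group(2), m.group(4) or "")
        while stack and stack[-1].level >= level:
            stack.pop()  # climb back up to this line's parent
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


def child_value(node, *tags):
    """Follow a tag path (e.g. "BIRT", "PLAC") and return the value found."""
    for tag in tags:
        node = next((c for c in node.children if c.tag == tag), None)
        if node is None:
            return None
    return node.value


SAMPLE = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1852
2 PLAC Ohio
0 TRLR"""
roots = parse_gedcom(SAMPLE)
individuals = {n.xref: n for n in roots if n.tag == "INDI"}
john = individuals["@I1@"]
print(child_value(john, "NAME"), child_value(john, "BIRT", "PLAC"))
```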

Week 1-2
Gather resource papers on data mining and genealogy
  • A summary paragraph for each one you may be using

Side Notes

HowToSites: the links I found that I need to organize
Idea's: ideas on organizing data and other potential resources