Predictive modelling is the process by which a model is created or chosen to try to best predict the probability of an outcome.[1] In many cases the model is chosen on the basis of detection theory to try to guess the probability of an outcome given a set amount of input data, for example, given an email, determining how likely it is to be spam.
Models can use one or more classifiers to determine the probability that a set of data belongs to a particular set, say spam or 'ham'.
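
The spam example can be made concrete with a small sketch. The code below uses a naive Bayes classifier, which is only one of many possible classifiers and is not named in the notes above. The training messages, the function names and the use of Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

# Made-up training data (an assumption for illustration only).
TRAINING = [
    ("win money now", "spam"),
    ("cheap pills win prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

def train(messages):
    """Count words per label and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def spam_probability(text, word_counts, label_counts):
    """Return the estimated P(spam | text) under the naive Bayes assumption."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        # Start from the log prior P(label).
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Normalise the two log scores into a probability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

word_counts, label_counts = train(TRAINING)
p_spam = spam_probability("win prize money", word_counts, label_counts)
p_ham_like = spam_probability("monday meeting", word_counts, label_counts)
```

With this toy data, "win prize money" scores well above 0.5 and "monday meeting" well below it. A real filter would need far more training data and better tokenisation.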


Clustering
Link: http://www.answers.com/topic/cluster-analysis-1
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis, information retrieval, and bioinformatics.
Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology and typological analysis.
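
As a rough illustration of assigning observations to clusters, here is a minimal k-means sketch (Lloyd's algorithm). K-means is just one clustering method and is chosen here as an assumption, since the notes above do not name a specific algorithm. The sample points and the starting centroids are made up.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster 2-D points with Lloyd's algorithm, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return clusters, centroids

# Made-up observations forming two visually obvious groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
clusters, centroids = kmeans(points, centroids=[points[0], points[3]])
```

No labels are supplied at any point, which is what makes this unsupervised: the grouping comes only from the distances between observations.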


Bayesian inference is a method of statistical inference in which some kind of evidence or observations are used to calculate the probability that a hypothesis may be true, or else to update its previously calculated probability.
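
The update described above follows Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E). A minimal sketch, assuming a hypothetical spam scenario with made-up probabilities:

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of seeing the evidence at all.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

# Assumed numbers: 20% of mail is spam; a given word appears in 60% of spam
# and in 5% of legitimate mail.
posterior = bayes_update(prior=0.2, likelihood_if_true=0.6, likelihood_if_false=0.05)

# A second, independent piece of evidence with the same likelihoods updates
# the previously calculated probability, which now acts as the prior.
posterior_again = bayes_update(posterior, 0.6, 0.05)
```

Here the first update raises the probability of spam from 0.2 to approximately 0.75, and the second raises it to approximately 0.97.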

Machine learning, a branch of artificial intelligence, is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as examples that illustrate relations between observed variables. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data). Hence the learner must generalize from the given examples, so as to be able to produce a useful output in new cases. Machine learning, like all subjects in artificial intelligence, requires cross-disciplinary proficiency in several areas, such as probability theory, statistics, pattern recognition, cognitive science, data mining, adaptive control, computational neuroscience and theoretical computer science.
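
Generalizing from training examples to a new case can be sketched with one of the simplest learners, a least-squares line fit. The choice of learner and the four training examples are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Training data: made-up observations relating two variables.
train_x = [1, 2, 3, 4]
train_y = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(train_x, train_y)

# The learner has never seen x = 10, but can still produce an output for it.
prediction = slope * 10 + intercept
```

The fitted line is a compact summary of the relation between the observed variables, and it is this summary, not the stored examples, that produces the output for the unseen input.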


SOAP, originally defined as Simple Object Access Protocol, is a protocol specification for exchanging structured information in the implementation of Web Services in computer networks. It relies on Extensible Markup Language (XML) for its message format, and usually relies on other Application Layer protocols, most notably Remote Procedure Call (RPC) and Hypertext Transfer Protocol (HTTP), for message negotiation and transmission. SOAP can form the foundation layer of a web services protocol stack, providing a basic messaging framework upon which web services can be built. This XML-based protocol consists of three parts: an envelope, which defines what is in the message and how to process it; a set of encoding rules for expressing instances of application-defined datatypes; and a convention for representing procedure calls and responses.
As an example of how SOAP procedures can be used, a SOAP message could be sent to a web-service-enabled web site such as a real-estate price database, with the parameters needed for a search. The site would then return an XML-formatted document with the resulting data, e.g., prices, location, features. With the data being returned in a standardized machine-parseable format, it can then be integrated directly into a third-party web site or application.
The SOAP architecture consists of several layers of specifications: for message format, Message Exchange Patterns (MEP), underlying transport protocol bindings, message processing models, and protocol extensibility. SOAP is the successor of XML-RPC, though it borrows its transport and interaction neutrality and the envelope/header/body from elsewhere (probably from WDDX).
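
To make the envelope/header/body structure concrete, the sketch below builds a SOAP 1.1 request for the real-estate example using Python's standard xml.etree.ElementTree module. The service namespace, the GetPrices operation and its city and maxPrice parameters are hypothetical; a real service would define these in its WSDL.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an imaginary real-estate service.
SERVICE_NS = "http://example.com/realestate"

def build_request(city, max_price):
    """Return a SOAP envelope (as a string) calling the hypothetical GetPrices operation."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    # The header is optional in SOAP; it is left empty here.
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The procedure call and its parameters go inside the body.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}GetPrices")
    ET.SubElement(call, f"{{{SERVICE_NS}}}city").text = city
    ET.SubElement(call, f"{{{SERVICE_NS}}}maxPrice").text = str(max_price)
    return ET.tostring(envelope, encoding="unicode")

request_xml = build_request("Springfield", 250000)
```

The resulting string would typically be sent as the body of an HTTP POST to the service endpoint. The transport step is omitted here because it depends on the service.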

A web service is a method of communication between two electronic devices. It is a "solution logic" that can be exposed over the World Wide Web.
The W3C defines a "web service" as "a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically Web Services Description Language WSDL). Other systems interact with the web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards."[1]
The W3C also states, "We can identify two major classes of Web services, REST-compliant Web services, in which the primary purpose of the service is to manipulate XML representations of Web resources using a uniform set of "stateless" operations; and arbitrary Web services, in which the service may expose an arbitrary set of operations."[2]


http://en.wikipedia.org/wiki/Extract,_transform,_load
Extract, transform and load (ETL) is a process in database usage and especially in data warehousing that involves:

  • Extracting data from outside sources
  • Transforming it to fit operational needs (which can include quality levels)
  • Loading it into the end target (database or data warehouse)
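
The three steps above can be sketched end to end with Python's standard csv and sqlite3 modules. The input data, the products table and the quality rule (drop rows with no price) are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Made-up "outside source": a small CSV export with messy values.
RAW_CSV = """name,price
 Widget ,9.99
gadget,
Gizmo,24.50
"""

def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, convert prices, and drop rows that fail the quality rule."""
    clean = []
    for row in rows:
        if not row["price"]:  # quality rule: a row without a price is rejected
            continue
        clean.append((row["name"].strip().title(), float(row["price"])))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target database."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, price FROM products ORDER BY name").fetchall()
```

Keeping the three stages as separate functions mirrors the process definition and lets each stage be tested or replaced on its own.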

R is a programming language and software environment for statistical computing and graphics. The R language has become a de facto standard among statisticians for developing statistical software,[2][3] and is widely used for data analysis.[3]
R is part of the GNU project.[6][7] Its source code is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface; however, several graphical user interfaces are available for use with R.

Soundex codes
http://www.highprogrammer.com/alan/numbers/soundex.html

Fuzzy Logic
http://en.wikipedia.org/wiki/Fuzzy_logic

Record Linkage
http://en.wikipedia.org/wiki/Record_linkage

Graph Matching
http://en.wikipedia.org/wiki/Matching_(graph_theory)
http://en.wikipedia.org/wiki/Graph_(mathematics)