\( \newcommand{\R}{\mathbb{R}} \newcommand{\ten}[1]{\mathbf{#1}} \newcommand{\tenp}{\otimes} \newcommand{\bigtenp}{\bigotimes} \newcommand{\Set}[1]{\mathcal{#1}} \newcommand{\re}[3]{\vec{a}^T_{#1}R^{\vphantom{T}}_{#3}\vec{a}^{\vphantom{T}}_{#2}} \newcommand{\ttm}{\times} \newcommand{\vect}{\mathbf{vec}} \newcommand{\pred}[1]{\widehat{#1}} \newcommand{\BigO}{\mathcal{O}} \newcommand{\hadamard}{*} \newcommand{\discrep}{\mathcal{D}} \newcommand{\unfold}[2]{#1_{(#2)}} \newcommand{\from}{\sim} \)

Tensor Factorization for
Multi-Relational Learning

ECML/PKDD Nectar Track
25 September 2013, Prague
  • Hello everyone! First of all, it is great to be here
  • My name is Max Nickel, and I am just starting my postdoc in Tomaso Poggio's group at MIT. However, what I want to talk about today is work I did for my PhD, which I just finished at LMU under the supervision of Volker Tresp, who is at Siemens of course.
  • Unfortunately, I've caught a really bad cold, so I hope the talk doesn't get too difficult to follow; if you have questions, please ask.
  • Anyway, the title of the talk is ... and it covers work over the last two years, summarizing papers from ICML, NIPS and WWW

Relational Learning

  • Focus on dyadic multi-relational data, i.e. (subject, predicate, object) triples
  • Important relational learning tasks include
    • Link Prediction: Prediction of relationships between entities
    • Collective Classification: Classification of entities based on their relationships
    • Link-Based Clustering: Clustering of entities based on their relationships
    • Entity Resolution: Deduplication of entities
Relationships are a rich source of information: they allow us to include information that is distant in the relational graph, which can significantly improve learning results.
Example:
Party membership of US presidents
Relations:
  • presidentOf
  • partyOf
  • Relational learning is a branch of machine learning concerned with learning from data where information is represented in the form of relationships between entities
  • Now, a very important property of relational modeling is that relations allow us to access information that is more distant in the graph
  • And this is a large reason why relations are such a rich source of information
  • For instance, consider that we want to predict the party membership of Al Gore: his attributes alone tell us little, but his relations (e.g. being vice president under Bill Clinton, a Democrat) are very informative
  • This ability of an algorithm to exploit information from distant parts of the relational graph is what we refer to as collective learning
  • So, this collective learning effect is something that we really want to have, for example for link prediction, but also for other learning tasks where we think the relations hold important information, like clustering, entity resolution and so on

Why Tensor Factorization?

  • Problem: Existing relational learning methods
    • do not scale (easily) to large data
    • often require extensive prior information about the problem
    • can be complex and difficult to implement
  • Tensor factorization in general:
    • Modeling simplicity: Multi-relational data has an efficient tensor representation
    • No structure learning: Tensor factorization does not require structure learning or extensive prior knowledge
    • Weak collective learning: Most tensor factorizations show no or only a weak collective learning effect, or impose unreasonable constraints
  • Our approach:
    • Strong collective learning via unique latent representation of entities
    • $\Rightarrow$ State-of-the-art results on benchmark data
    • High scalability as algorithm scales linearly with the data size

Tensor Representation
of Relational Data

  • RDF, or any other dyadic relational data, has a natural and efficient representation as a sparse third-order tensor
  • Two modes refer to the entities, one mode to the relation types
    \begin{align} \Rightarrow & |Mode\ 1| = |Mode\ 2| = |Entities| = n\\ & |Mode\ 3| = |Predicates| = m \end{align}
  • We also require that the ordering of the entities on both modes is identical
$\color{cornflowerblue}{x_{ijk}}=\cases{1, & if triple ($\color{yellowgreen}{i}$-th entity, $\color{gold}{k}$-th predicate, $\color{orangered}{j}$-th entity) exists\cr 0, & otherwise\cr}$
In the following, we will refer to this construction as the adjacency tensor
  • Another way to look at this construction is that each frontal slice of the tensor corresponds exactly to the adjacency matrix of the predicate at that index; we simply take all of these adjacency matrices and stack them behind each other (a toy construction in code is sketched below)
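As a minimal illustration (not the converter tooling mentioned later; the triples and variable names are hypothetical), the adjacency tensor can be stored as one sparse frontal slice per predicate:

      from scipy.sparse import lil_matrix

      # Hypothetical toy triples (subject, predicate, object)
      triples = [("Al Gore", "vicePresidentOf", "Bill Clinton"),
                 ("Bill Clinton", "partyOf", "Democratic Party")]

      # Identical entity ordering on modes 1 and 2; predicates index mode 3
      entities = sorted({t for s, _, o in triples for t in (s, o)})
      predicates = sorted({p for _, p, _ in triples})
      e_idx = {e: i for i, e in enumerate(entities)}
      p_idx = {p: k for k, p in enumerate(predicates)}

      # One sparse n x n adjacency matrix (frontal slice X_k) per predicate
      n = len(entities)
      X = [lil_matrix((n, n)) for _ in predicates]
      for s, p, o in triples:
          X[p_idx[p]][e_idx[s], e_idx[o]] = 1  # x_ijk = 1 iff the triple exists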

RESCAL

Main Idea: RESCAL is a factorization of multi-relational data, or multigraphs, into unique entity and predicate representations

\begin{align} X_{k\phantom{ij}} \approx & A_{\vphantom{k}} R_k A^T \\ \ten{X}_{\phantom{ijk}} \approx & \ten{R} \ttm_1 A \ttm_2 A \\ x_{ijk} \approx & \sum\nolimits_{q,r} a_{iq} r_{qrk} a_{jr} \end{align}
  • $\color{orangered}{A} \in \R^{n \times r}$ represents the entity-latent-component space
  • $\color{orangered}{R_k} \in \R^{r \times r}$ is an asymmetric matrix that specifies the interaction of the latent components for the $k$-th predicate
  • $\color{orangered}{r}$ denotes the number of latent components of the factorization
  • $\color{orangered}{\approx}$ means best approximation under some loss function
  • To compute the factorization, we solve the optimization problem \[\min_{A,R_k} \sum_k \text{loss}(X_k; A_{\vphantom{k}} R_k A^T) + \lambda_A \|A\|^2 + \lambda_{\ten{R}} \|\ten{R}\|^2 \]
  • RESCAL factorizes each frontal slice $X_k$ of $\ten{X}$ into the matrix product $A_{\vphantom{k}}R_kA^T$; the reconstructed entries can be used directly to score candidate triples (see the sketch below)
  • Graphical view
  • Interpretation
  • Optimization problem - algorithm deferred until later
  • Unique representation enables strong collective learning effect
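As a minimal sketch (assuming the factor matrices $A$ and $R_k$ have already been computed; `score` is a hypothetical helper, not the library API), a candidate triple is scored by its reconstructed entry:

      import numpy as np

      def score(A, R_k, i, j):
          """Reconstructed entry x_ijk = a_i^T R_k a_j for the triple
          (i-th entity, k-th predicate, j-th entity); higher = more likely."""
          return A[i] @ R_k @ A[j]

      # Toy usage with random factors (n=4 entities, rank r=2)
      A, R_k = np.random.rand(4, 2), np.random.rand(2, 2)
      print(score(A, R_k, 0, 1))
      # Ranking all objects j for subject i=0 is one matrix-vector product
      print(A @ (R_k.T @ A[0]))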

Probabilistic Interpretation

  • MAP estimate of $A$ and $\ten{R}$ for joint distribution
    \[ P(\ten{X}|A,\ten{R}) = \prod_{i=1}^n \prod_{j=1}^n \prod_{k=1}^m P(x_{ijk}|\re{i}{j}{k}) \]
  • The reconstructed values $\re{i}{j}{k}$ are the natural parameters of an exponential-family distribution
  • Nature of the distribution is determined by choice of loss function
    • Least Squares Loss $\Rightarrow$ Gaussian distribution
    • Logistic Loss $\Rightarrow$ Bernoulli distribution
  • Because of colliders in the graphical model, $x_{ijk}$ depends indirectly on every other variable via its associated latent variables
  • When looking at it probabilistically, we can interpret each entry of the adjacency tensor as a random variable that represents the state of the relationship
  • Under an appropriate loss function, matrix and tensor factorization can be interpreted as maximum a posteriori estimation of the latent factors
  • Because of the collider structure, the model can capture global dependencies in the data, since the latent variables can possibly depend on any other variable

  • Latent variable structure decouples prediction and learning
    • $x_{ijk}$ is conditionally independent of all variables given $\re{i}{j}{k}$
    • Simultaneously, latent variables are not $d$-separated from any other variable
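For example, with the logistic loss the reconstructed value acts as the natural parameter of a Bernoulli variable; a small sketch (notation as above; a hypothetical helper, not the library API):

      import numpy as np

      def triple_probability(A, R_k, i, j):
          """P(x_ijk = 1) under the Bernoulli/logistic model: the logistic
          sigmoid applied to the natural parameter a_i^T R_k a_j."""
          eta = A[i] @ R_k @ A[j]
          return 1.0 / (1.0 + np.exp(-eta))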

Similarity of Entities in RESCAL

$X_k \approx A_{\vphantom{k}} R_k A^T$
  • $A$ can be regarded as an embedding of the entities into a vector space representation
  • Similarity of entities in the latent-space $A$ corresponds to similarity of entities in the relational domain
  • $\Rightarrow$ Any feature-based machine-learning method (e.g. for clustering, classification etc.) can instantly be applied to $A$
  • Due to the collective learning effect of RESCAL, the latent-variable representation also contains information about distant relations
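For instance (a sketch using scikit-learn as a stand-in; the random matrix only stands in for a computed factor matrix $A$):

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.neighbors import NearestNeighbors

      A = np.random.rand(100, 20)  # placeholder for the RESCAL factor matrix (n x r)

      # Link-based clustering: cluster entities by their latent representations
      clusters = KMeans(n_clusters=5).fit_predict(A)

      # Entity similarity / resolution: nearest neighbours in the latent space
      nn = NearestNeighbors(n_neighbors=10).fit(A)
      distances, similar_entities = nn.kneighbors(A[:1])  # neighbours of entity 0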

Collective Learning - U.S. Presidents

  • Data on party memberships of U.S. presidents extracted from DBpedia
  • Task: Predict party-membership for persons in dataset (link prediction)
  • Relations: partyOf, presidentOf, vicePresidentOf
  • Experimental setting: 10-fold cross-validation over all persons
  • Evaluation measure: Area under precision-recall curve (AUC-PR)

Link Prediction Benchmarks

  • RESCAL shows state-of-the-art results, compared to MLN, IRM, BCTF
  • RESCAL shows very fast training times, i.e. under one minute on these datasets
[Chart: AUC-PR on the Kinships, UMLS, and Nations benchmark datasets]

Learning Taxonomies on IIMB Benchmark

  • Taxonomies are an integral part of ontologies, consist of
    • a list of concepts
    • a concept hierarchy
    • an assignment of entities to concepts
  • Task: Unsupervised taxonomy learning via hierarchical clustering of entities
  • Method: Hierarchical clustering on $A$ using the OPTICS method (sketched below, after the results)
  • Data: Instance matching benchmark from movie domain, consisting of $\approx$ 1400 entities, organized in 80 concepts
    • Top Level Concepts: Film, Language, Budget, Creature, Location
    • The concepts Creature, Film and Location are again subdivided into concepts like Person, Character, Country or Action Movie
Results (per-level and per-concept scores):
Level 1 (0.982): Budget 1.0, Creature 1.0, Film 1.0, Language 0.963, Location 0.953
Level 2 (0.806): Character 1.0, City 0.992, Country 0.632, Person 0.991, Parody 0.33, Zombie 1.0
Level 3 (0.946): Actor 0.987, Capital 0.992, Character Creator 0.533, Director 0.781
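A rough sketch of this clustering step, using scikit-learn's OPTICS as a stand-in for the original implementation (parameters and the random embedding are illustrative only):

      import numpy as np
      from sklearn.cluster import OPTICS

      A = np.random.rand(1400, 50)  # placeholder for the entity embeddings (n x r)

      # Density-based hierarchical clustering of the latent entity representations;
      # the reachability plot induces the levels of the learned concept hierarchy
      optics = OPTICS(min_samples=10).fit(A)
      reachability = optics.reachability_[optics.ordering_]
      concept_labels = optics.labels_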

RESCAL-ALS Algorithm

RESCAL-ALS is an alternating least-squares algorithm to compute the RESCAL factorization, i.e. it optimizes

\[ \min_{A,\ten{R}}\sum_k \|{X}_k - AR_kA^T\|^2 + \lambda_A \|A\|^2 + \lambda_{\ten{R}} \|\ten{R}\|^2 \]
Updates for factor matrices $A$ and $R_k$: \begin{align*} A_{\phantom{k}} & \leftarrow \left( \sum_{k=1}^m X_k^{\vphantom{T}}AR_k^T + X_k^TAR_k^{\vphantom{T}} \right) \left( \sum_{k=1}^m B_k + C_k + \lambda I \right)^{-1} \\ R_k & \leftarrow \left( A^TA \otimes A^TA + \lambda I \right)^{-1} \vect(A^T X_k A) \end{align*} where \begin{align*} B_k = R_k^{\vphantom{T}}A^TAR_k^T,\quad C_k = R_k^TA^TAR_k^{\vphantom{T}} \end{align*}
  • Exploit sparsity of $X_k$
  • Inversions can be computed in $\color{gold}{\mathcal{O}(r^3)}$
  • Largest dense matrix in memory: $\color{gold}{n \times r}$
  • ALS is block coordinate descent (BCD), with the blocks naturally given by the factors
  • To compute the factorization we basically optimize a regularized least-squares problem
  • A nice thing about this approach is that we can provide an efficient alternating least-squares algorithm
  • The update steps of this ALS algorithm for the factor matrices $A$ and $R_k$ are given at the bottom
  • At first they might look expensive, but they can be computed efficiently; a dense reference sketch follows, and the scalable updates are detailed in the backup slides
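For reference, a dense, unoptimized sketch of one ALS sweep implementing exactly the updates above (the actual algorithm exploits sparsity and the Kronecker structure; `rescal_als_sweep` is a hypothetical name, not the library API):

      import numpy as np

      def rescal_als_sweep(X, A, R, lmbda):
          """One alternating least-squares sweep over A and all R_k.
          X: list of dense n x n frontal slices, A: n x r, R: list of r x r."""
          n, r = A.shape

          # A <- (sum_k X_k A R_k^T + X_k^T A R_k)(sum_k B_k + C_k + lambda*I)^{-1}
          AtA = A.T @ A
          numer = np.zeros((n, r))
          denom = lmbda * np.eye(r)
          for X_k, R_k in zip(X, R):
              numer += X_k @ A @ R_k.T + X_k.T @ A @ R_k
              denom += R_k @ AtA @ R_k.T + R_k.T @ AtA @ R_k  # B_k + C_k
          A = numer @ np.linalg.inv(denom)

          # vec(R_k) <- (A^T A kron A^T A + lambda*I)^{-1} vec(A^T X_k A)
          AtA = A.T @ A
          G = np.linalg.inv(np.kron(AtA, AtA) + lmbda * np.eye(r * r))
          R = [(G @ (A.T @ X_k @ A).reshape(-1, order="F")).reshape(r, r, order="F")
               for X_k in X]
          return A, R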

Scalability Experiments

[Plots: runtime in seconds vs. parameter value]

Computational Complexity of RESCAL-ALS for sparse adjacency tensors

  • Entities: linear
  • Predicates: linear
  • Known Facts: linear
  • Latent Components: cubic
  • It turns out, and we can actually show this, that the algorithm scales linearly with the data size, i.e. linearly with the number of entities, the number of predicates, and the number of known facts

Factorizing YAGO2 (core)

  • Scalability allowed us to factorize the YAGO 2 core ontology with approximately
    • 3 million entities
    • 38 relations
    • 41 million known facts
  • Runtime for a rank-20 factorization on commodity hardware ~3 hours
  • Successfully predicted typeOf relationships for various higher-level classes in the ontology

Concluding Remarks & Future Work

  • RESCAL offers a flexible and scalable approach to machine learning on multi-relational data at the scale of complete knowledge bases
  • RESCAL shows state-of-the-art results in link prediction and entity resolution due to a strong collective learning effect
  • Scalability and state-of-the-art relational learning combined within a single model
  • Future work with regard to tensor factorizations for structured and relational data
    • Scalable Logistic Model
    • Handling scale-free nature of many multigraphs $\rightarrow$ Regularization
    • Reduce cubic complexity for $r$ by including prior knowledge in the factorization
  • Factor matrices enable standard machine learning algorithms on relational data
  • Implications of using ALS (i.e. a least-squares loss):
  • The normal distribution is the default model in many cases, for good reason, and often does reasonably well
  • It is the limiting distribution of sums of random variables
  • $X^TX$ can be ill-conditioned, which can be overcome by using regularization methods
  • Possible justification: the Big Data assumption that more data beats better algorithms, so it might be beneficial to trade exact modeling for scalability; but this really depends on the problem

Code & Tools

  • Python and Matlab code for RESCAL available from my website and Github
    
      from rescal import rescal
      
      # compute rank-20 factorization of tensor X
      A, R = rescal(X, 20, init='nvecs')
  • Easy to use tools to convert from N-Triples to Python and Matlab tensors available from my website and Github

Thank You!

Questions?

References

[1]
Nickel, M. et al. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. Proceedings of the 28th International Conference on Machine Learning (Bellevue, WA, USA, 2011), 809–816.
[2]
Nickel, M. et al. 2012. Factorizing YAGO: scalable machine learning for linked data. Proceedings of the 21st international conference on World Wide Web (New York, NY, USA, 2012), 271–280.
[3]
Nickel, M. and Tresp, V. 2013. An Analysis of Tensor Models for Learning on Structured Data. Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013 (2013), to appear.
[4]
Nickel, M. and Tresp, V. 2011. Learning Taxonomies from Multi-Relational Data via Hierarchical Link-Based Clustering. NIPS Workshop - Learning Semantics (Granada, Spain, 2011).
[5]
Nickel, M. and Tresp, V. 2013. Logistic Tensor-Factorization for Multi-Relational Data. ICML Workshop -  Structured Learning: Inferring Graphs from Structured and Unstructured Inputs (Atlanta, GA, USA, 2013).
[6]
Nickel, M. and Tresp, V. 2012. Machine Learning on Linked Data: Tensors and their Applications in Graph-Structured Domains.
[7]
Nickel, M. and Tresp, V. 2013. Tensor Factorization for Multi-Relational Learning. Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013 (2013), to appear.
[8]
Nickel, M. and Tresp, V. 2010. Three-Way DEDICOM for Relational Learning. NIPS 2010 Workshop - Tensors, Kernels and Machine Learning (Whistler, Canada, 2010).
[9]
Tresp, V. and Nickel, M. 2013. Relational Models. Encyclopedia of Social Network Analysis and Mining. J. Rokne and R. Alhajj, eds. Springer. to appear.

References

[1]
Airoldi, E.M. et al. 2008. Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research. 9, (Sep. 2008), 1981–2014.
[2]
Alon, N. 1995. Handbook of combinatorics (vol. 2). R.L. Graham et al., eds. MIT Press. 1749–1783.
[3]
Getoor, L. and Taskar, B. eds. 2007. Introduction to statistical relational learning. MIT Press.
[4]
Kemp, C. et al. 2006. Learning systems of concepts with an infinite relational model. Proceedings of the 21st national conference on Artificial intelligence - Volume 1 (Boston, Massachusetts, 2006), 381–388.
[5]
Kok, S. and Domingos, P. 2007. Statistical predicate invention. Proceedings of the 24th international conference on Machine learning (New York, NY, USA, 2007), 433–440.
[6]
Kolda, T.G. et al. 2005. Higher-order web link analysis using multilinear algebra. Proceedings of the Fifth International Conference on Data Mining (2005), 242–249.
[7]
Koller, D. and Pfeffer, A. 1998. Probabilistic frame-based systems. Proceedings of the National Conference on Artificial Intelligence (1998), 580–587.
[8]
Richardson, M. and Domingos, P. 2006. Markov logic networks. Machine Learning. 62, 1 (2006), 107–136.
[9]
Singh, A.P. and Gordon, G.J. 2008. A Unified View of Matrix Factorization Models. Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II (Berlin, Heidelberg, 2008), 358–373.
[10]
Singla, P. and Domingos, P. 2006. Entity Resolution with Markov Logic. Proceedings of the Sixth International Conference on Data Mining (Washington, DC, USA, 2006), 572–582.
[11]
Srebro, N. et al. 2005. Generalization Error Bounds for Collaborative Prediction with Low-Rank Matrices. Advances in Neural Information Processing Systems (Cambridge, MA, 2005), 1321–1328.
[12]
Srebro, N. 2004. Learning with matrix factorizations. Massachusetts Institute of Technology.

References

[13]
Sutskever, I. et al. 2009. Modelling Relational Data using Bayesian Clustered Tensor Factorization. Advances in Neural Information Processing Systems 22 (2009), 1821–1828.
[14]
Warren, H.E. 1968. Lower Bounds for Approximation by Nonlinear Manifolds. Transactions of the American Mathematical Society. 133, 1 (Aug. 1968), 167–178.
[15]
Xu, Z. et al. 2006. Infinite Hidden Relational Models. Proceedings of the Twenty-Second Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-06) (Arlington, Virginia, 2006), 544–551.
[16]
Yılmaz, Y.K. et al. 2011. Generalised coupled tensor factorisation. Advances in Neural Information Processing Systems (2011).

Backup Slides

Linked Data & Semantic Web

Relational Models

  • Approach of most relational learning methods: Template mechanism to create a grounded graphical model
    • Probabilistic Relational Models:
      Class diagram $\mapsto$ grounded Bayesian network
    • Markov Logic Networks:
      Logical formulas $\mapsto$ grounded Markov random field
  • Alternative approach: latent variable methods such as IHRM, IRM or mixed-membership stochastic block models
Problems of existing methods
  • Require extensive prior knowledge or structure learning
  • Require expensive inference methods such as MCMC or variational inference
  • Relational learning is hard!
  • Typical independence assumptions of ML do not hold
  • Some method is needed to reduce computational complexity
  • However, these existing methods have serious drawbacks that hinder their application
  • They are also often difficult and complicated to implement
  • Collective Learning

    • Probabilistically, each possible edge in the multigraph can be interpreted as a binary random variable
    • "Traditional" Machine Learning: Observations are assumed to be independent
      • Joint distribution factorizes nicely
      • $P(x_{11},\ldots,x_{nm}) = \prod_{i=1}^n P(x_{i1},\ldots,x_{im})$
      • Binary random variables: $n2^{m}$ possible states
    • Relational Learning: Each random variable is possibly dependent on any other random variable
      • Without additional information, the joint distribution cannot be factorized
      • Previous example: $2^{n^2}$ possible states
      • For $n$ entities over $m$ dyadic relations: $2^{n^2m}$
    • Collective learning implies that every random variable can possibly be dependent on every other random variable in the data set
    • The problem is now that the usual i.i.d. assumption of most machine learning methods no longer holds
    • This assumption is made to simplify computations: with independent observations, the joint distribution factorizes nicely
    • For instance, consider an $n \times m$ matrix; usually we would assume that we have $n$ observations involving $m$ features. The features may depend on each other, but the observations, i.e. the rows of the matrix, are independent

    Relational Learning & Bipartite Models

    • Tensor factorizations usually model a bipartite multigraph, i.e. they assume that the entities on the 1st mode are not identical to the entities on the 2nd mode
    • For instance, Tucker-2: $X_k \approx A_{\vphantom{k}}R_kB^T$, similar for CP
    • Bipartite Models do not account for the identity of subjects and objects
    • For general relational data, such as Linked Data, this assumption is violated
    • Bipartite modeling prevents information propagation over latent variables, or at least makes it very difficult (indirect effects can still exist)
    • Now, the problem with nearly all tensor factorizations in this context is that, by applying e.g. the CP decomposition to an adjacency tensor, we implicitly assume that the graph is bipartite
    • This is the case because CP uses a different latent vector representation for each of the modes, i.e. one time the vectors $a$, the other time the vectors $b$
    • This makes sense for User-Movie-Time data
    • It is violated on adjacency tensors $\Rightarrow$ it breaks the information flow over the latent variables that we really want to have for collective learning
    • Let's say the observed triples are (Al, vicePresidentOf, Bill) and (Bill, partyOf, Democratic Party)
    • During learning, the latent factors reflect that Bill is a member of the Democratic Party by some suitable representation
    • However, when we model the graph as bipartite, occurrences of an entity as a subject and as an object get different latent representations
    • Therefore, we have no guarantee that the latent representation of Bill as an object also reflects the fact that he is a member of the Democratic Party (there might be some distant effects, but they are quite unlikely, and this was only one hop!)
    • So, the information flow through the latent factors breaks here, or at least is limited
    • What we would want instead, of course, is that every entity has a unique representation whether it occurs as a subject or an object, because then the fact that Bill is a Democrat is more or less directly connected

    Similarity of Entities in RESCAL

    General approach: similarity of entities $\approx$ similarity of their relational patterns

    Outgoing links

    \begin{align*} \vect(X_{i,:,:}) = & \color{deeppink}{\vec{a}_i} R_{(1)} (I \otimes A^T) \\ \vect(X_{j,:,:}) = & \color{cornflowerblue}{\vec{a}_j} R_{(1)} (I \otimes A^T) \end{align*}

    Incoming links

    \begin{align*} \vect(X_{:,i,:}) = & \color{deeppink}{\vec{a}_i} R_{(2)} (I \otimes A^T) \\ \vect(X_{:,j,:}) = & \color{cornflowerblue}{\vec{a}_j} R_{(2)} (I \otimes A^T) \end{align*}
    ${X}_k \approx A_{\vphantom{k}} R_k A^T$
    • For both incoming and outgoing links, the similarity of entities $e_i$ and $e_j$ is uniquely determined by the similarity of $\vec{a}_i$ and $\vec{a}_j$
    • Due to the collective learning effect of RESCAL, the latent-variable representation also contains information about distant relations
    • $\Rightarrow$ Any feature-based machine-learning method (e.g. for clustering, classification etc.) can instantly be applied to $A$

    Entity Resolution - Cora

    • Cora is a hand-labeled dataset consisting of machine learning publications, authors, venues and paper titles
      • Authors: Noisy approx. 90 $\mapsto$ True approx. 50
      • Venues: Noisy approx. 210 $\mapsto$ True approx. 100
      • Citations: Noisy approx. 400 $\mapsto$ True approx. 130
    AUC-PR results:
    Method      Author   Citation   Venue
    NB          0.986    0.913      0.738
    MLN(B)      0.987    0.915      0.736
    MLN(BCTS)   0.992    0.988      0.807
    CP          0.984    0.991      0.746
    RESCAL      0.997    0.991      0.810
    • Due to noisy citations, collective learning can significantly improve entity resolution results for authors and venues

    Data

    • Instance matching: Ranking of entities by their similarity in the entity-latent-component space
    • In total 61,000 matching decisions

    Visualization

    • Node size corresponds to the degree of the node
    • Red nodes: authors, e.g. the large node in the center is Michael Kearns
    • Green nodes: venues, e.g. the large node at the top right is the "Machine Learning" journal
    • Blue nodes: paper titles, e.g. the blue node above the ML journal is "The Strength of Weak Learnability" by Robert E. Schapire, 1990

    Scalable Updates

    • Exploit sparsity of relational data in computations such as $X_k^{\vphantom{T}}AR_k^T$
    • Complexity of straightforward implementation of updates for $R_k$: \begin{align*} \text{Computational: } \color{orangered}{\BigO(r^5)} && \text{Memory: } \color{orangered}{\BigO(r^4)} \end{align*}
    • Fortunately, $(A^TA \otimes A^TA + \lambda I )^{-1}\vect(A^T{X}_kA)$ has a lot of structure that we can exploit
    • By using properties of the Kronecker product and the vectorization operator we can show that updates for $R_k$ can be computed via

      \[ R_k \gets V (P \hadamard U^T X_k U) V^T \]

      where $A = U \Sigma V^T$ and $P_{ij} = \frac{\Sigma_{ii} \Sigma_{jj}}{\Sigma_{ii}^2\Sigma_{jj}^2 + \lambda}$

    • Complexity of the improved updates for $R_k$ (sketched below): \begin{align*} \text{Computational: } \color{cornflowerblue}{\BigO(r^3)} && \text{Memory: } \color{cornflowerblue}{\BigO(r^2)} \end{align*}
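    In code, the efficient update could look roughly as follows (a sketch under the slide's notation; `update_R_k` is a hypothetical helper, not the library API):

        import numpy as np

        def update_R_k(X_k, A, lmbda):
            """R_k <- V (P * U^T X_k U) V^T with A = U diag(s) V^T and
            P_ij = s_i s_j / (s_i^2 s_j^2 + lambda)."""
            U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD of A
            S = np.outer(s, s)
            P = S / (S ** 2 + lmbda)
            return Vt.T @ (P * (U.T @ X_k @ U)) @ Vt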

    Efficient Handling of Attributes

    • Much of the information in Linked Data is in the form of attributes (literals)
    • New approach: Include attributes via coupled tensor-matrix factorization
      \begin{align*} \ten{X} \approx \ten{R} \ttm_1 A \ttm_2 A \quad \land \quad D \approx AV \end{align*}
    • Entity-latent-component-space $A$ is shared between factorization of $\ten{X}$ and $D$
    • Updated optimization objective for RESCAL: \[ \min_{A,\ten{R},V} \sum_k \|X_k - A R_k A^T \|^2 + \|D - AV\|^2 + \lambda_A \|A\|^2 + \ldots \]
    • Computational complexity of coupled updates: $\BigO(mn + \ell)$
    • For example, with 40 predicates, the coupled algorithm is 40 times faster
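    For concreteness, the coupled objective can be written down as follows (a sketch; the attribute factor is called V here, matching $D \approx AV$ above, and a single regularization weight is used for brevity):

        import numpy as np

        def coupled_objective(X, D, A, R, V, lmbda):
            """Relational reconstruction error + attribute reconstruction error
            + regularization, as in the updated RESCAL objective above."""
            f = sum(np.linalg.norm(X_k - A @ R_k @ A.T) ** 2 for X_k, R_k in zip(X, R))
            f += np.linalg.norm(D - A @ V) ** 2
            f += lmbda * (np.linalg.norm(A) ** 2
                          + sum(np.linalg.norm(R_k) ** 2 for R_k in R)
                          + np.linalg.norm(V) ** 2)
            return f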