Word similarity and word relatedness are fundamental to natural language processing and more generally, understanding how humans relate concepts in semantic memory. A growing number of datasets are being proposed as evaluation benchmarks, however, the heterogeneity and focus of each respective dataset makes it difficult to draw plausible conclusions as to how a unified semantic model would perform. Additionally, we want to identify the transferability of knowledge obtained from one task to another, within the same domain and across domains. Hence, this paper first presents an evaluation and comparison of eight chosen datasets tested using the best performing regression models. As a baseline, we present regression models that incorporate both lexical features and word embeddings to produce consistent and competitive results compared to the state of the art. We present our main contribution, the best performing model across seven of the eight datasets-a Gated Recurrent Siamese Network that learns relationships between lexical word definitions. A parameter transfer learning strategy is employed for the Siamese Network. Subsequently, we present a secondary contribution which is the best performing non-sequential model: an Inductive and Transductive Transfer Learning strategy for transferring decision trees within a Random Forest to a target task that is learned from only few instances. The method involves measuring semantic distance between hidden factored matrix representations of decision tree traversal matrices.