Channel: python Archives - Graph Database & Analytics

Holiday fun with Neo4j


Looking for something fun to do during the holidays? Here are a few suggestions for some new cool Neo4j things that you can play around with.

A very recent addition to the Neo4j space is the JRuby library Neo4jr-social by Matthew Deiters:

Neo4jr-Social is a self-contained HTTP REST + JSON interface to the graph database Neo4j. Neo4jr-Social supports simple dynamic node creation, building relationships between nodes and also includes a few common social networking queries out of the box (i.e. LinkedIn degrees of separation and Facebook friend suggestion) with more to come. Think of Neo4jr-Social as being to Neo4j what Solr is to Lucene.

Neo4jr-social is built on top of Neo4jr-simple:

A simple, ready to go JRuby wrapper for the Neo4j graph database engine.

There’s also the Neo4j.rb JRuby bindings by Andreas Ronge which have been developed for quite a while by multiple contributors.

Staying in Ruby land, there’s also some visualization and other social network analysis stuff going on.

Looking for something in Java? Then you definitely want to take a look at jo4neo by Taylor Cowan:

Simple object mapping for neo. No byte code interweaving, just plain old reflection and plain old objects.

There’s also a blog post where Taylor shows how to model a User/Roles pattern using jo4neo.

There’s apparently a lot of work going on right now in the Django camp to enable support for SQL and NOSQL databases alike. Tobias Ivarsson (who’s the author and maintainer of the Neo4j Python bindings) recently implemented initial support for Neo4j in Django. Read his post Seamless Neo4j integration in Django for a look at what’s new.

One more recent project is the Neo4j plugin for Grails. There are already some projects out there using it. We want to make sure Neo4j is a first-class Grails backend so expect more noise in this area in the future.

You can find (some of the) projects using Neo4j on the Neo4j In The Wild page. From the front page of the Neo4j wiki you’ll find even more language bindings, tutorials and other things that will support you when playing around with Neo4j!

Happy Holidays and Happy Hacking wishes from the Neo4j team!

Want to learn more about graph databases? Click below to get your free copy of O’Reilly’s Graph Databases ebook and discover how to use graph technologies for your application today. Download My Ebook

The post Holiday fun with Neo4j appeared first on Neo4j Graph Database.


Modeling Categories in a Graph Database

Storing hierarchical data can be a pain when using the wrong tools.

However, Neo4j is a good fit for this kind of problem, and this post will show you an example of how it can be used.

To top it off, today it’s time to have a look at the Neo4j Python language bindings as well.

Introduction


A little background info for newcomers: Neo4j stores data as nodes and relationships, with key-value style properties on both. Relationships connect two different nodes to each other, and are both typed and directed.

Relationships can be traversed in both directions (the direction can also be ignored when traversing if you like). You can create any relationship types; they are identified by their name.
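To make these terms concrete, here is a minimal in-memory sketch in plain Python 3. It is not the Neo4j API: `Node`, `Relationship` and `relationships()` are hypothetical names used only for illustration. Each relationship has a type and a direction, yet it can be followed from either end, or with the direction ignored.

```python
class Node:
    """A node holding key-value properties (a toy stand-in, not Neo4j's class)."""

    def __init__(self, **properties):
        self.properties = dict(properties)
        self.rels = []  # every relationship that touches this node

    def __getitem__(self, key):
        return self.properties[key]


class Relationship:
    """A typed, directed connection from `start` to `end`, with its own properties."""

    def __init__(self, start, rel_type, end, **properties):
        self.start, self.type, self.end = start, rel_type, end
        self.properties = dict(properties)
        start.rels.append(self)
        if end is not start:
            end.rels.append(self)

    def other(self, node):
        """The node at the opposite end, whichever side we came from."""
        return self.end if node is self.start else self.start


def relationships(node, rel_type, direction="both"):
    """Yield relationships of `rel_type` at `node`: 'out', 'in' or 'both' (direction ignored)."""
    for rel in node.rels:
        if rel.type != rel_type:
            continue
        if direction == "out" and rel.start is not node:
            continue
        if direction == "in" and rel.end is not node:
            continue
        yield rel


electronics = Node(Name="Electronics")
computers = Node(Name="Computers")
Relationship(electronics, "SUBCATEGORY", computers)

print([r.end["Name"] for r in relationships(electronics, "SUBCATEGORY", "out")])  # ['Computers']
print([r.other(computers)["Name"] for r in relationships(computers, "SUBCATEGORY")])  # ['Electronics']
```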

For a quick introduction to the Neo4j Python bindings, have a look at the Neo4j.py component site. There are also slides and a video from a PyCon 2010 presentation by Tobias Ivarsson of the Neo4j team, who also contributed the Python code for this blog post.

This blog post only contains simplified snippets of code. To get the full working source code, which exposes a domain layer on top of the underlying graph data, go to:

If you take a look at a site like stackoverflow.com, you will find many questions on how to store categories or, generally speaking, hierarchies in a database.

In this blog post, we’re going to look at how to implement something like what’s asked for here using Neo4j. Using a graph database will also allow us to take the concept a bit further.

Data Model


It may come as a surprise to some readers, but even though we’re using a graph database here, we’ll use a common Entity-Relationship Diagram.

The entities we want to handle in this case are categories and products. The products hold attribute values, and we want to be able to define types and constraints on these attributes. The attributes that products can hold are defined on categories and inherited by all descendants. Products, categories and attribute types are modeled as entities, while the attributes have been modeled as relationships in this case. Categories may contain subcategories and products.

So this is the data model we end up with:



What can’t be expressed nicely in the ER-Diagram are the attribute values, as the actual names of those attributes are defined as data elsewhere in the model.

This mix of metadata and data may be a problem when using other underlying data models, but for a graph database, this is actually how it’s supposed to be used. When using an RDBMS with its underlying tabular model, the Entity-Attribute-Value model is a commonly suggested way of dealing with the data/metadata split. However, this solution comes with some downsides and hurts performance a lot.

That was it for the theoretical part, let’s get on to the practical stuff!

Node Space


What we want to do is to transfer the data model to the node space; that’s Neo4j lingo for a graph database instance, as it consists of nodes and relationships between nodes.

What we’ll do now is to simply convert some of the terminology from the Entity-Relationship model to the Neo4j API:
ER model        Neo4j
Entity          Node
Relationship    Relationship
Attribute       Property
That wasn’t too hard, was it?! Let’s put some example data in the model and have a look at it:



The image above gives an overview; the rest of the post will get into implementation details and good practices that can be useful.

Getting to the details


When a new Neo4j database is created, it already contains one single node, known as the reference node. This node can be used as a main entry point to the graph. Next, we’ll show a useful pattern for this.

In most real applications you’ll want multiple entry points to the graph, and this can be done by creating subreference nodes. A subreference node is a node that is connected to the reference node with a special relationship type, indicating its role. In this case, we’re interested in having a relationship to the category root and one to the attribute types. So this is how the subreference structure looks in the node space:



Now someone may ask: Hey, shouldn’t the products have a subreference node as well?! But, for two reasons, I don’t think so:
    1. It’s redundant as we can find them by traversing from the category root.
    2. If we want to find a single product, it’s more useful to index them on a property, like their name. We’ll save that one for another blog post, though.

Note that when using a graph database, the graph structure lends itself well to indexing.

As the subreference node pattern is such a nice thing, we added it to the utilities. The node is lazily created the first time it’s requested. Here’s what’s needed to create an ATTRIBUTE_ROOT typed subreference node:

import neo4j
from neo4j.util import Subreference
attribute_subref_node = Subreference.Node.ATTRIBUTE_ROOT(graphdb)

… where graphdb is the current Neo4j instance. Note that the subreference node itself doesn’t have a “node type”, but is implicitly given a type by the ATTRIBUTE_ROOT typed relationship leading to the node.

The next thing we need to take care of is connecting all attribute type nodes properly with the subreference node.

This is simply done like this:

attribute_subref_node.ATTRIBUTE_TYPE(new_attribute_type_node)

Always doing this when adding a new attribute type makes the nodes easily discoverable from the ATTRIBUTE_ROOT subreference node:



Similarly, we want to have a subreference node for categories, and in this case we also want to add a property to the subreference node. Here’s how this looks in Python code:

category_subref_node = Subreference.Node.CATEGORY_ROOT(graphdb, Name="Products")
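The lazy get-or-create behaviour of subreference nodes can be illustrated without a database. The following is a toy sketch under stated assumptions, not the neo4j.py `Subreference` utility: `ToyGraph` and `subreference()` are hypothetical, nodes are plain dicts, and the "reference node" is simply a node the toy graph creates up front. The attribute type's `Name`/`Unit` values are made up.

```python
class ToyGraph:
    """Nodes are plain dicts; relationships are (start, type, end) triples."""

    def __init__(self):
        self.reference_node = {}  # the single node a new database starts with
        self.relationships = []

    def relate(self, start, rel_type, end):
        self.relationships.append((start, rel_type, end))

    def outgoing(self, start, rel_type):
        # Compare by identity: two distinct nodes may hold equal properties.
        return [e for s, t, e in self.relationships if s is start and t == rel_type]


def subreference(graph, rel_type, **properties):
    """Return the node linked from the reference node by `rel_type`,
    creating it (with `properties`) only the first time it is requested."""
    existing = graph.outgoing(graph.reference_node, rel_type)
    if existing:
        return existing[0]
    node = dict(properties)
    graph.relate(graph.reference_node, rel_type, node)
    return node


graph = ToyGraph()
attribute_root = subreference(graph, "ATTRIBUTE_ROOT")
assert subreference(graph, "ATTRIBUTE_ROOT") is attribute_root  # created once, then reused

# Hook each new attribute type under the subreference node so it stays discoverable.
length_type = {"Name": "Length", "Unit": "m"}
graph.relate(attribute_root, "ATTRIBUTE_TYPE", length_type)

category_root = subreference(graph, "CATEGORY_ROOT", Name="Products")
print(category_root["Name"])                                        # Products
print(graph.outgoing(attribute_root, "ATTRIBUTE_TYPE")[0]["Name"])  # Length
```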

This is how it will look after we add the first actual category, namely the “Electronics” one:



Now let’s see how to add subcategories.

Basically, this is what’s needed to create a subcategory in the node space, using the SUBCATEGORY relationship type:
computers_node = graphdb.node(Name="Computers")
electronics_node.SUBCATEGORY(computers_node)




To fetch all the direct subcategories under a category and print their names, all we have to do is to fetch the relationships of the corresponding type and use the node at the end of the relationship, just like this:

for rel in category_node.SUBCATEGORY.outgoing:
  print rel.end['Name']

There’s not much to say regarding products: the product nodes are simply connected to one category node using a PRODUCT relationship:



But how do we get all products in a category, including all its subcategories? Here it’s time to use a traverser, defined by the following code:

class SubCategoryProducts(neo4j.Traversal):
  types = [neo4j.Outgoing.SUBCATEGORY, neo4j.Outgoing.PRODUCT]
  def isReturnable(self, pos):
      if pos.is_start: return False
      return pos.last_relationship.type == 'PRODUCT'

This traverser will follow outgoing relationships for both SUBCATEGORY and PRODUCT type relationships. It will filter out the starting node and only return nodes reached over a PRODUCT relationship.

This is then how to use it:

for prod in SubCategoryProducts(category_node):
  print prod['Name']
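To show what this traverser does, here is a plain-Python sketch of the same walk over a hypothetical adjacency dict. The category and product names are made up, and `subcategory_products` is our own helper, not part of neo4j.py. It follows only outgoing SUBCATEGORY and PRODUCT relationships, never returns the start node, and keeps only nodes reached over a PRODUCT relationship.

```python
def subcategory_products(out, category):
    """Walk outgoing SUBCATEGORY and PRODUCT relationships depth-first from
    `category`; return only nodes reached over a PRODUCT relationship.
    The start node is never returned, mirroring isReturnable above."""
    found, seen, stack = [], {category}, [category]
    while stack:
        current = stack.pop()
        for rel_type in ("SUBCATEGORY", "PRODUCT"):
            for target in out.get((current, rel_type), []):
                if target in seen:  # visit each node at most once
                    continue
                seen.add(target)
                if rel_type == "PRODUCT":
                    found.append(target)
                else:
                    stack.append(target)  # keep descending into subcategories
    return found


# Hypothetical sample data: (node, relationship type) -> nodes at the far end.
out = {
    ("Electronics", "SUBCATEGORY"): ["Computers", "Cameras"],
    ("Computers", "SUBCATEGORY"): ["Laptops"],
    ("Laptops", "PRODUCT"): ["Laptop A"],
    ("Cameras", "PRODUCT"): ["Camera B"],
}
print(sorted(subcategory_products(out, "Electronics")))  # ['Camera B', 'Laptop A']
print(subcategory_products(out, "Computers"))            # ['Laptop A']
```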

At the core of our example is the way it adds attribute definitions to the categories. Attributes are modeled as relationships between a category and an attribute type node. The attribute type node holds information on the type – in our case only a name and a unit – while the relationship holds the name, a “required” flag and, in some cases, a default value as well.

From the viewpoint of a single category, this is how it is connected to attribute types, thus defining the attributes that can be used by products down that path in the category tree:



Our last code sample will show how to fetch all attribute definitions that apply to a product. Here we’ll define a traverser named categories, which will find all categories for a product. The traverser is used by the attributes function, which will yield all the ATTRIBUTE relationships.

A simple example of usage is also included in the code:

def attributes(product_node):
  """Usage:
  for attr in attributes(product):
      print attr['Name'], " of type ", attr.end['Name']
  """
  for category in categories(product_node):
      for attr in category.ATTRIBUTE:
          yield attr

class categories(neo4j.Traversal):
  types = [neo4j.Incoming.PRODUCT, neo4j.Incoming.SUBCATEGORY]
  def isReturnable(self, pos):
      return not pos.is_start
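The same inheritance rule can be sketched in plain Python over hypothetical dicts rather than the Neo4j API. The sketch assumes each category has at most one parent (a tree), whereas the traverser above would also handle a category reachable over several SUBCATEGORY paths. Walking upwards from the product’s category collects every attribute defined along the way; all names and sample values are made up.

```python
def categories(product, category_of, parent_of):
    """Yield the product's own category, then each ancestor up to the root
    (assuming every category has at most one parent)."""
    category = category_of[product]
    while category is not None:
        yield category
        category = parent_of.get(category)


def attributes(product, category_of, parent_of, attributes_of):
    """Yield every attribute definition that applies to `product`:
    those on its category plus those inherited from ancestors."""
    for category in categories(product, category_of, parent_of):
        for attr in attributes_of.get(category, []):
            yield attr


# Hypothetical sample data.
category_of = {"Laptop A": "Laptops"}
parent_of = {"Laptops": "Computers", "Computers": "Electronics"}
attributes_of = {
    "Computers": [{"Name": "RAM", "type": "Memory size", "required": True}],
    "Electronics": [{"Name": "Weight", "type": "Mass", "required": False}],
}
for attr in attributes("Laptop A", category_of, parent_of, attributes_of):
    print(attr["Name"], "of type", attr["type"])
# RAM of type Memory size
# Weight of type Mass
```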

Let’s have a final look at the attribute types. Seen from the viewpoint of an attribute type node things look this way:



As the image above shows, it’s really simple to find out which attributes (or categories) are using a specific attribute type. This is typical when working with a graph database: connect the nodes according to your data model, and you’ll be fine.

Wrap-up


Hopefully you had some fun diving into a bit of graph database thinking! These should probably be your next steps forward:




The post Modeling Categories in a Graph Database appeared first on Neo4j Graph Database.

From neo4j import GraphDatabase


First of all, we’re really sorry. We have been saying that Python support for the embedded database is coming in “a few weeks” or “next month” for over half a year now, and so far, you have waited patiently, and you have waited in vain.
We promise not to make promises we can’t keep again, and we hope y’all know that we love Python just as much as the next guy.
Now, finally, the absolutely latest and greatest version of the embedded Neo4j database works in Python, and we’ve put a bunch of effort into ensuring it stays that way. The new bindings are constantly tested against each new build of the database, and are set up to deploy to PyPI as often as we all like them to.
The API is very similar to the original neo4j.py API. We also borrowed some of the API methods introduced in neo4j-rest-client, to make switching between the two as easy as possible.
This is a first release, so there may still be bugs lurking – please make sure to report any that you encounter and ideas for improvements to the project issue tracker!
Quick look
Here is a quick look at how you use neo4j-embedded.
from neo4j import GraphDatabase

db = GraphDatabase('/my/db/location')

with db.transaction:
    oscar = db.node(name='Oscar Wilde')
    jacob = db.node(name='Jacob')

    # Create a relationship
    oscar.impressed_by_blogging_skills_of(jacob)
db.shutdown()
Requirements
The new bindings are tested on CPython 2.7.2 on Windows and Linux, but should work on Python 2.6 branches as well.
You’ll need JPype installed to bridge the gap to Java land, details about how to set that up can be found in the installation instructions.
Jython support is on the todo list, but because Neo4j uses Java’s ServiceLoader API (which does not currently work in Jython) it will have to wait until we find a good workaround.
Getting started
Full instructions for how to install and get started can be found in the Neo4j Manual. For feedback, hints and contributions, don’t hesitate to ask on the Neo4j Forums.
Happy Hacking!
Heart symbol from http://dryicons.com.

The post From neo4j import GraphDatabase appeared first on Neo4j Graph Database.

Py2neo 1.6


Hi all, it’s a weird thought that although Neo4j has been part of my life for well over two years, I’ve only met a few of the people I know from its community in person. Thanks to the wonderful generosity of Emil and co, though, that will soon change, as I’ll be jetting over to San Francisco for GraphConnect, giving me a chance to meet both the Neo guys and my fellow driver authors.

The timing is also pretty good, as I’ve just released Py2neo 1.6, which introduces one of the most requested features of recent months: node labels. As most Neophiles will know, labels allow nodes to be tagged with keywords that can be used for categorisation and indexing. Adding labels to a node in Py2neo is straightforward with the add_labels method:
>>> from py2neo import neo4j, node
>>> graph_db = neo4j.GraphDatabaseService()
>>> alice, = graph_db.create(node(name="Alice"))
>>> alice.add_labels("Female", "Human")
The set_labels and remove_labels methods similarly allow labels to be replaced or deleted, and get_labels returns the set of labels currently defined. The GraphDatabaseService.find method can then be used to gather up all the nodes with a particular label and iterate through them:
>>> for node in graph_db.find("Human"):
...     print node["name"]

Aside from labels, the biggest change in the 1.6 release is a complete rewrite of the underlying HTTP/REST mechanism. In order to achieve better support for streaming responses, it was necessary to rip out the simple mechanism that had been with Py2neo since the early days and build a more comprehensive layer from the ground up. Incremental JSON decoding is a key feature that allows server responses to be handled step by step instead of only after the response has been completely received. This new layer has grown into a separate project, HTTPStream, but is embedded into Py2neo to avoid dependencies.

But what advantages does HTTPStream give to Py2neo-based applications? Well, it’s now possible to incrementally handle the results of Cypher queries and batch requests, as well as those from a few other functions, such as match. These functions now provide result iterators instead of full result objects. Here’s an example of a Cypher query streamed against the data inserted above:
>>> query = neo4j.CypherQuery(graph_db, "MATCH (being:Human) RETURN being")
>>> for result in query.stream():
...     print result.being["name"]

Neotool has received some love too. The command line variant of Py2neo now fully supports Unicode and provides facilities for Cypher execution, Geoff insertion and XML conversion, as well as options for HTTP authentication. The diagram below shows the conversion paths now available:
 
For a quick demonstration of the XML conversion feature in action, check out this web service. Another good place for a neotool overview is my recent lightning talk from the London Graph Café.

So what isn’t included? Cypher transactions are the main omission from the Neo4j 2.0 feature set and have been deliberately left out until a few major technical challenges have been overcome. Other than that, Py2neo 1.6 is the perfect companion to Neo4j 2.0 and well worth a try! Py2neo 1.6 is available from PyPI, the source is hosted on GitHub and the documentation at ReadTheDocs. For a full list of changes, have a peek at the release notes.

/Nigel Small (@technige)

The post Py2neo 1.6 appeared first on Neo4j Graph Database.

Building a Python Web Application Using Flask and Neo4j

Flask, a popular Python web framework, has many tutorials available online which use an SQL database to store information about the website’s users and their activities.

While SQL is a great tool for storing information such as usernames and passwords, it is not so great at allowing you to find connections among your users for the purposes of enhancing your website’s social experience.

The quickstart Flask tutorial builds a microblog application using SQLite. 

In my tutorial, I walk through an expanded, Neo4j-powered version of this microblog application that uses py2neo, one of Neo4j’s Python drivers, to build social aspects into the application. This includes recommending similar users to the logged-in user, along with displaying similarities between two users when one user visits another user’s profile.

My microblog application consists of Users, Posts, and Tags modeled in Neo4j:

http://i.imgur.com/9Nuvbpz.png


With this graph model, it is easy to ask questions such as:

“What are the top tags of posts that I’ve liked?”

MATCH (me:User)-[:LIKED]->(post:Post)<-[:TAGGED]-(tag:Tag)
WHERE me.username = 'nicole'
RETURN tag.name, COUNT(*) AS count
ORDER BY count DESC

“Which user is most similar to me based on tags we’ve both posted about?”

MATCH (me:User)-[:PUBLISHED]->(:Post)<-[:TAGGED]-(tag:Tag),
      (other:User)-[:PUBLISHED]->(:Post)<-[:TAGGED]-(tag)
WHERE me.username = 'nicole' AND me <> other
WITH other,
     COLLECT(DISTINCT tag.name) AS tags,
     COUNT(DISTINCT tag) AS len
ORDER BY len DESC LIMIT 3
RETURN other.username AS similar_user, tags
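To see what the second query computes, here is a hypothetical plain-Python equivalent over in-memory data (the posts and usernames are made up, and `similar_users` is our own name). Collecting tags into a set per user plays the role of COLLECT(DISTINCT tag.name), and users who share no tag are dropped, just as the MATCH would not find them. One difference: ties in tag count keep input order here, whereas the Cypher query leaves their order unspecified.

```python
from collections import defaultdict


def similar_users(published, me, limit=3):
    """Rank other users by how many distinct tags they share with `me`.

    `published` holds one (username, tags_of_that_post) pair per post.
    """
    tags_by_user = defaultdict(set)
    for username, tags in published:
        tags_by_user[username].update(tags)

    mine = tags_by_user.get(me, set())
    scored = []
    for other, theirs in tags_by_user.items():
        shared = mine & theirs
        if other != me and shared:  # WHERE me <> other; no shared tag, no match
            scored.append((other, sorted(shared)))
    scored.sort(key=lambda pair: len(pair[1]), reverse=True)  # ORDER BY len DESC
    return scored[:limit]                                     # LIMIT 3


posts = [
    ("nicole", ["neo4j", "python"]),
    ("nicole", ["flask"]),
    ("alice", ["python", "flask", "neo4j"]),
    ("bob", ["python"]),
    ("carol", ["java"]),
]
print(similar_users(posts, "nicole"))
# [('alice', ['flask', 'neo4j', 'python']), ('bob', ['python'])]
```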
Links to the full walkthrough of the application and the complete code are below.

Watch the Webinar:






The post Building a Python Web Application Using Flask and Neo4j appeared first on Neo4j Graph Database.

Polyglot Persistence Case Study: Wanderu + Neo4j + MongoDB

Every language and data storage solution has its strengths. After all, no single solution is most performant and cost-effective for every possible task in your application. In order to tap into the varying strengths of different data storage solutions, your application needs to take advantage of polyglot persistence. That’s exactly what Wanderu did when building their meta-search travel website, and here’s how they did it:

The Technical Challenge

Wanderu provides the ability for consumers to search for bus and train tickets, with journeys combining legs from multiple different transportation companies. The route data is stored in JSON, making a document storage engine like MongoDB a great solution for their route leg data storage. However, they also needed to be able to find the optimal path from origin to destination. This is perfect for a graph database like Neo4j, because Neo4j can understand the data relationships between different transit route legs.

Polyglot Persistence: Using MongoDB and Neo4j

Wanderu didn’t want to force MongoDB (a document-based data store) to handle graph-style relationships because the implementation would have been costly and inefficient. Instead, they used a polyglot persistence approach to capitalize on the strengths of each, deciding to use both MongoDB and Neo4j together.

Solution Architectural Diagram

Wanderu's polyglot persistence architecture between mongoDB and Neo4j

The Wanderu ticket search engine uses both MongoDB (for easy JSON document storage) and Neo4j (for efficient route calculations).

The Challenge of Sync

With the bus route legs stored in MongoDB, Wanderu had to decide whether to write application code to synchronize this information into Neo4j as a graph model or use a syncing technology to handle this automatically. Eddy Wong, CTO and Co-Founder of Wanderu, discovered the GitHub project called “mongo-connector,” which enabled Mongo’s built-in replication service to replicate data to another database. Eddy only had to write a Doc Manager for Neo4j, which handled callbacks on each MongoDB insert or update operation.

As new entries are added to the MongoDB OpLog, the Mongo Connector calls the Neo4j DocMgr. The Neo4j DocMgr code written by Wanderu then uses the open source py2neo Python library to create the corresponding nodes, properties and relationships in Neo4j. The API server then uses Node-Neo4j to send queries to the graph database.

The resulting solution takes advantage of Neo4j, MongoDB, JSON, Node.js, Express.js, Mongo Connector, Python and py2neo. Polyglot persistence ensures that each of these technologies is used according to its greatest strengths. And for Wanderu, it means a better search and routing experience for their users. Read more about Wanderu’s use of Neo4j in their online case study.
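The DocMgr’s job described above, turning each inserted or updated MongoDB document into graph writes, can be sketched as follows. This is not Wanderu’s code and does not implement mongo-connector’s actual DocManager interface: the class name, the route-leg field names (`origin`, `destination`, `carrier`, `departs`, `arrives`) and the Station/LEG graph shape are all assumptions. The Cypher uses the `{param}` placeholder syntax of the Neo4j 2.x era (newer versions use `$param`), and the function that sends statements to Neo4j is injected so the mapping can be tried without a database.

```python
# Cypher in the {param} placeholder style used elsewhere in these posts (Neo4j 2.x).
UPSERT_LEG = (
    "MERGE (o:Station {name: {origin}}) "
    "MERGE (d:Station {name: {destination}}) "
    "MERGE (o)-[leg:LEG {id: {leg_id}}]->(d) "
    "SET leg.carrier = {carrier}, leg.departs = {departs}, leg.arrives = {arrives}"
)
DELETE_LEG = "MATCH ()-[leg:LEG {id: {leg_id}}]->() DELETE leg"


class Neo4jDocManagerSketch:
    """Hypothetical stand-in for a mongo-connector doc manager.

    `run(statement, params)` is any callable that sends Cypher to Neo4j,
    e.g. a thin wrapper around a driver such as py2neo.
    """

    def __init__(self, run):
        self.run = run

    def upsert(self, doc):
        """Called for each insert or update of a route-leg document."""
        self.run(UPSERT_LEG, {
            "leg_id": str(doc["_id"]),
            "origin": doc["origin"],
            "destination": doc["destination"],
            "carrier": doc.get("carrier"),
            "departs": doc.get("departs"),
            "arrives": doc.get("arrives"),
        })

    def remove(self, doc_id):
        """Called when a route-leg document is deleted."""
        self.run(DELETE_LEG, {"leg_id": str(doc_id)})


calls = []
manager = Neo4jDocManagerSketch(lambda statement, params: calls.append((statement, params)))
manager.upsert({"_id": 42, "origin": "Boston", "destination": "New York", "carrier": "Example Bus Co"})
manager.remove(42)
print([params["leg_id"] for _, params in calls])  # ['42', '42']
```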

The post Polyglot Persistence Case Study: Wanderu + Neo4j + MongoDB appeared first on Neo4j Graph Database.

Cypher: LOAD JSON from URL AS Data

Neo4j’s query language Cypher supports loading data from CSV directly, but not from JSON files or URLs. Almost every site offers some kind of API or endpoint that returns JSON, and we can also query many NoSQL databases via HTTP and get JSON responses back. It’s quite useful to be able to ingest document-structured information from all those different sources into a more usable graph model. I want to show here that retrieving that data and ingesting it into Neo4j using Cypher is really straightforward and takes little effort. As Cypher is already pretty good at deconstructing nested documents, it’s actually not that hard to achieve from a tiny program. Today I’ll show you how to do this from Python, JavaScript, Ruby, Java and Bash.

The Domain: Stack Overflow

Being a developer I love Stack Overflow; just crossed 20k reputation by only answering 1100 Neo4j-related questions :). You can do that too. That’s why I want to use Stack Overflow users with their questions, answers, comments and tags as our domain today.
Pulling Stack Overflow information into a graph model allows me to find interesting insights, like:
    • What are the people asking or answering about Neo4j also interested in
    • How is their activity distributed across tags and between questions, answers and comments
    • Which kinds of questions attract answers and which don’t
    • Looking at my own data, which answers to what kinds of questions got the highest approval rates
We need some data and a model suited to answer those questions.

Stack Overflow API

Stack Overflow offers an API to retrieve that information. It’s credential-protected as usual, but there is the handy option to pre-generate an API URL that encodes your secrets and allows you to retrieve data without sharing them. You can still control some parameters like tags, page size and page number, though. With the API URL below, we load the 100 most recent questions with the Neo4j tag:
https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf
The response should look something like this (the full response is in the appendix at the very bottom).
Overall Response Structure
{ "items": [{
	"question_id": 24620768,
	"link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements",
	"title": "Neo4j cypher query: get last N elements",
	"answer_count": 1,
	"score": 1,
	.....
	"creation_date": 1404771217,
	"body_markdown": "I have a graph....How can I do that?",
	"tags": ["neo4j", "cypher"],
	"owner": {
		"reputation": 815,
		"user_id": 1212067,
		....
		"link": "http://stackoverflow.com/users/1212067/"
	},
	"answers": [{
		"owner": {
			"reputation": 488,
			"user_id": 737080,
			"display_name": "Chris Leishman",
			....
		},
		"answer_id": 24620959,
		"share_link": "http://stackoverflow.com/a/24620959",
		....
		"body_markdown": "The simplest would be to use an ... some discussion on this here:...",
		"title": "Neo4j cypher query: get last N elements"
	}]
}]}

Graph Model

So what does the graph-model look like? We can develop it by looking at the questions we want to answer and the entities and relationships they refer to. We need this model upfront to know where to put our data when we insert it into the graph. After all we don’t want to have loose ends.

Cypher Import Statement

The Cypher query to create that domain is also straightforward. You can deconstruct maps with dot notation map.key and arrays with slices array[0..4]. You’d use UNWIND to convert collections into rows and FOREACH to iterate over a collection with update statements. To create nodes and relationships we use MERGE and CREATE commands. My friend Mark just published a blog post explaining in detail how you apply these operations to your data. The JSON response that we retrieved from the API call is passed in as a parameter {json} to the Cypher statement, which we alias with the more handy data identifier. Then we use the aforementioned means to extract the relevant information out of the data collection of questions, treating each as q. For each question we access the direct attributes but also related information like the owner or contained collections like tags or answers which we deconstruct in turn.
WITH {json} as data
UNWIND data.items as q
MERGE (question:Question {id:q.question_id}) ON CREATE
  SET question.title = q.title, question.share_link = q.share_link, question.favorite_count = q.favorite_count

MERGE (owner:User {id:q.owner.user_id}) ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)

FOREACH (tagName IN q.tags | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag))
FOREACH (a IN q.answers |
   MERGE (question)<-[:ANSWERS]-(answer:Answer {id:a.answer_id})
   MERGE (answerer:User {id:a.owner.user_id}) ON CREATE SET answerer.display_name = a.owner.display_name
   MERGE (answer)<-[:PROVIDED]-(answerer)
)
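One way to read the statement is as a series of get-or-create steps. As an illustration only (this is not how Neo4j executes Cypher), the plain-Python sketch below replays the same MERGE keys on a parsed response. `import_items` is a hypothetical helper, and the comments tie each loop to the clause it mirrors.

```python
def import_items(data):
    """Replay the Cypher import on a parsed API response, in memory.

    Each MERGE becomes a get-or-create keyed on the same property, and
    setdefault() plays the role of ON CREATE SET (values are only written
    the first time), so repeated items never create duplicates.
    """
    questions, users, tags, answers = {}, {}, {}, {}
    rels = set()  # ((start label, key), relationship type, (end label, key))

    def merge_user(owner):
        users.setdefault(owner["user_id"], {"display_name": owner.get("display_name")})
        return owner["user_id"]

    for q in data["items"]:                             # UNWIND data.items AS q
        qid = q["question_id"]
        questions.setdefault(qid, {"title": q.get("title")})
        rels.add((("User", merge_user(q["owner"])), "ASKED", ("Question", qid)))
        for tag_name in q.get("tags", []):              # FOREACH (tagName IN q.tags | ...)
            tags.setdefault(tag_name, {})
            rels.add((("Question", qid), "TAGGED", ("Tag", tag_name)))
        for a in q.get("answers", []):                  # FOREACH (a IN q.answers | ...)
            aid = a["answer_id"]
            answers.setdefault(aid, {})
            rels.add((("Answer", aid), "ANSWERS", ("Question", qid)))
            rels.add((("User", merge_user(a["owner"])), "PROVIDED", ("Answer", aid)))
    return questions, users, tags, answers, rels


# Trimmed-down version of the sample response shown earlier, included twice.
item = {
    "question_id": 24620768,
    "title": "Neo4j cypher query: get last N elements",
    "tags": ["neo4j", "cypher"],
    "owner": {"user_id": 1212067},
    "answers": [{"answer_id": 24620959,
                 "owner": {"user_id": 737080, "display_name": "Chris Leishman"}}],
}
questions, users, tags, answers, rels = import_items({"items": [item, item]})
print(len(questions), len(users), sorted(tags), len(rels))  # 1 2 ['cypher', 'neo4j'] 5
```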

Calling Cypher with the JSON parameters

To pass in the JSON to Cypher we have to programmatically call the Cypher endpoint of the Neo4j server, which can be done via one of the many drivers for Neo4j or manually by POSTing the necessary payload to Neo4j. We can also call the Java API. So without further ado here are our examples for a selection of different languages, drivers and APIs:

Python

We use the py2neo driver by Nigel Small to execute the statement:
import os
import requests
from py2neo import neo4j

# Connect to the graph.
neo4jUrl = os.environ.get('NEO4J_URL',"http://localhost:7474/db/data/")
graph = neo4j.GraphDatabaseService(neo4jUrl)

# Add uniqueness constraints.
neo4j.CypherQuery(graph, "CREATE CONSTRAINT ON (q:Question) ASSERT q.id IS UNIQUE;").run()

# Build URL (the template is elided here; tag, page and page_size must be defined first).
apiUrl = "https://api.stackexchange.com/2.2/questions...." % (tag,page,page_size)
# Send GET request.
json = requests.get(apiUrl, headers = {"accept":"application/json"}).json()

# Build query.
query = """
UNWIND {json} AS data ....
"""

# Send Cypher query.
neo4j.CypherQuery(graph, query).run(json=json)
We also did something similar with getting tweets from the Twitter search API into Neo4j for the OSCON conference.
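The Python snippet above leaves the URL template elided and expects `tag`, `page` and `page_size` to be defined. As a hypothetical helper (not from the original post), the URL can be assembled from its parameters with the standard library. `page` and `pagesize` are the Stack Exchange API’s paging parameters; the other values mirror the URL shown earlier. Note that `urlencode` percent-encodes characters such as `!` in a filter id, which the server decodes again.

```python
from urllib.parse import urlencode

API_BASE = "https://api.stackexchange.com/2.2/questions"


def questions_url(tag, page=1, page_size=100, api_filter=None):
    """Build a Stack Exchange /questions URL for one page of a tag, newest first."""
    params = {
        "page": page,           # 1-based page number
        "pagesize": page_size,  # the API caps this at 100
        "order": "desc",
        "sort": "creation",
        "tagged": tag,
        "site": "stackoverflow",
    }
    if api_filter:
        params["filter"] = api_filter  # a pre-generated filter, as in the URL above
    return API_BASE + "?" + urlencode(params)


print(questions_url("neo4j", page=2))
# https://api.stackexchange.com/2.2/questions?page=2&pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow
```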

Javascript

For JavaScript I want to show how to call the transactional Cypher endpoint directly, by just using the request node module.
var r=require("request");
var neo4jUrl = (process.env["NEO4J_URL"] || "http://localhost:7474") + "/db/data/transaction/commit";

// POST one Cypher statement with its parameters to the transactional endpoint
function cypher(query,params,cb) {
  r.post({uri:neo4jUrl,
          json:{statements:[{statement:query,parameters:params}]}},
         function(err,res) { cb(err, res && res.body); });
}

var query="UNWIND {json} AS data ....";
var apiUrl = "https://api.stackexchange.com/2.2/questions....";

r.get({url:apiUrl,json:true,gzip:true}, function(err,res,json) {
  cypher(query,{json:json},function(err, result) { console.log(err, JSON.stringify(result))});
});

Java

With Java I want to show how to use the Neo4j embedded API to execute Cypher.
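import static java.util.Collections.singletonMap;

import java.util.Map;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClients;
import org.codehaus.jackson.map.ObjectMapper;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// somewhere in your application-scoped setup code
ObjectMapper mapper = new ObjectMapper();
HttpClient http = HttpClients.createMinimal();
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(PATH);

// execute API request and parse response as JSON
HttpResponse response = http.execute(new HttpGet( apiUrl ));
Map json = mapper.readValue(response.getEntity().getContent(), Map.class);

// execute Cypher, passing the parsed JSON as the {json} parameter
String query = "UNWIND {json} AS data ....";
db.execute(query, singletonMap("json",json));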

// application scoped shutdown, or JVM-shutdown-hook
db.shutdown();

Ruby

Using the neo4j-core Gem, we can talk to Neo4j Server or embedded (using JRuby) by just changing a single line of configuration.
require 'rubygems'
require 'neo4j-core'
require 'rest-client'
require 'json'

QUERY="UNWIND {json} AS data ...."
API = "https://api.stackexchange.com/2.2/questions...."

res = RestClient.get(API)
json = JSON.parse(res.to_str)

session = Neo4j::Session.open
session.query(QUERY, json: json)

Bash

Bash is of course the most fun, as we have to do some fancy text substitutions to make this work.
load_json.sh
#!/bin/bash
echo "Usage: load_json.sh 'http://json.api.com?params=values' import_json.cypher"
echo "Use {data} as parameter in your query for the JSON data"
JSON_API="$1"
# cypher file: turn line breaks and tabs into spaces, then escape backslashes
# and double quotes so the query can be embedded in a JSON string
QUERY=$(tr '\r\n\t' '   ' < "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
JSON_DATA=$(curl --compressed -s -H accept:application/json "$JSON_API")
# embed the JSON document itself (unquoted) so {data} arrives as a map, not a string
POST_DATA="{\"statements\":[{\"statement\": \"$QUERY\", \"parameters\": {\"data\":$JSON_DATA}}]}"
DB_URL=${NEO4J_URL-http://localhost:7474}
curl -i -H accept:application/json -H content-type:application/json -d "$POST_DATA" -XPOST \
  "$DB_URL/db/data/transaction/commit"
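Building the request body with a JSON library sidesteps the quoting and escaping pitfalls of the string substitution in the Bash script. A minimal Python sketch, assuming the same Neo4j 2.x transactional endpoint `/db/data/transaction/commit` and a `{data}` parameter; `build_commit_payload` and `post_statement` are hypothetical helper names, not part of any Neo4j client library, and the server is assumed to run without authentication.

```python
import json
import urllib.request


def build_commit_payload(query, data):
    """Return the UTF-8 JSON body for Neo4j's transactional commit endpoint.

    The parsed API response goes in as the {data} parameter, so json.dumps
    takes care of quoting and escaping both the query text and the data.
    """
    body = {"statements": [{"statement": query, "parameters": {"data": data}}]}
    return json.dumps(body).encode("utf-8")


def post_statement(query, data, db_url="http://localhost:7474"):
    """POST one statement to /db/data/transaction/commit and return the decoded reply.

    Assumes a server without authentication; add an Authorization header otherwise.
    """
    request = urllib.request.Request(
        db_url + "/db/data/transaction/commit",
        data=build_commit_payload(query, data),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Double quotes and newlines in the query need no manual escaping here.
    query = 'WITH {data} AS data\nUNWIND data.items AS q\nRETURN q.title, "graph" AS source'
    payload = build_commit_payload(query, {"items": [{"title": "Neo4j cypher query"}]})
    print(payload.decode("utf-8"))
```

Only `build_commit_payload` runs in the example; `post_statement` needs a reachable Neo4j server.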

Example Use-Cases

Here are some simple example queries that I can now run on top of the imported dataset. To avoid overloading this blog post with information, we’ll answer our original questions in Part 2.

Find the most active Users

MATCH (u:User)
OPTIONAL MATCH (u)-[:PROVIDED|ASKED|COMMENTED]->(x)
RETURN u,count(x) AS activities
ORDER BY activities DESC
LIMIT 5
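To see what this aggregation computes, here is a small Python sketch over an in-memory edge list. It assumes activity is stored as (user, relationship type, target) triples using the PROVIDED, ASKED and COMMENTED types from the query; the sample data and the `most_active` helper are invented for illustration. Counting matched targets rather than rows gives users without any activity a count of 0, which is what `count(x)` (as opposed to `count(*)`) yields after an OPTIONAL MATCH.

```python
from collections import Counter

# Relationship types taken from the query above
ACTIVITY_TYPES = {"PROVIDED", "ASKED", "COMMENTED"}


def most_active(users, edges, limit=5):
    """Rank users by their number of outgoing activity relationships.

    users: every user name, including users without any activity
    edges: (user, rel_type, target) triples
    """
    counts = Counter({user: 0 for user in users})  # inactive users start (and stay) at 0
    for user, rel_type, _target in edges:
        if rel_type in ACTIVITY_TYPES:
            counts[user] += 1
    return counts.most_common(limit)


users = ["alice", "bob", "carol"]
edges = [("alice", "ASKED", "q1"), ("alice", "COMMENTED", "c1"),
         ("bob", "PROVIDED", "a1"), ("bob", "FOLLOWS", "alice")]
print(most_active(users, edges))  # [('alice', 2), ('bob', 1), ('carol', 0)]
```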

Find co-used Tags

MATCH (t:Tag)
OPTIONAL MATCH (t)<-[:TAGGED]-(question)-[:TAGGED]->(t2)
RETURN t.name,t2.name,count(distinct question) as questions
ORDER BY questions DESC

MATCH (t:Tag)<-[r:TAGGED]-(question)
RETURN t,r,question
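The co-used tags query counts, for each pair of tags, the distinct questions carrying both. A Python sketch of the same counting, assuming each question is represented by the list of its tag names (the sample data is invented). Unlike the Cypher pattern, which returns each pair in both directions, this version counts every unordered pair once.

```python
from collections import Counter
from itertools import combinations


def co_used_tags(question_tags):
    """Count on how many questions each unordered pair of tags appears together.

    question_tags: mapping of question id -> iterable of tag names
    """
    pairs = Counter()
    for tags in question_tags.values():
        # set() drops duplicate tags, sorted() gives every pair a canonical order
        for pair in combinations(sorted(set(tags)), 2):
            pairs[pair] += 1
    return pairs.most_common()


questions = {"q1": ["neo4j", "cypher"],
             "q2": ["neo4j", "cypher", "java"],
             "q3": ["neo4j", "java"]}
print(co_used_tags(questions))
# [(('cypher', 'neo4j'), 2), (('java', 'neo4j'), 2), (('cypher', 'java'), 1)]
```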

Conclusion

So as you can see, even though LOAD JSON is not part of the language, it’s easy enough to retrieve JSON data from an API endpoint, deconstruct it and insert it into Neo4j using plain Cypher. Accessing web APIs is a simple task in all stacks and languages, and JSON as a transport format is ubiquitous. Fortunately, Cypher’s lesser-known ability to deconstruct complex JSON documents lets us quickly turn them into a really nice graph structure with rich relationships and without duplicated information. I encourage you to try it with your favorite web APIs and send your example, with the graph model, the Cypher import query and 2-3 use-case queries that reveal interesting insights into the data you ingested, to content@neotechnology.com.

Appendix: Stack Overflow Response

{
	"items": [{
		"answers": [{
			"owner": {
				"reputation": 488,
				"user_id": 737080,
				"user_type": "registered",
				"accept_rate": 45,
				"profile_image": "https://www.gravatar.com/avatar/ffa6eed1e8a9c1b2adb37ca88c07dede?s=128&d=identicon&r=PG",
				"display_name": "Chris Leishman",
				"link": "http://stackoverflow.com/users/737080/chris-leishman"
			},
			"tags": [],
			"comment_count": 0,
			"down_vote_count": 0,
			"up_vote_count": 2,
			"is_accepted": false,
			"score": 2,
			"last_activity_date": 1404772223,
			"creation_date": 1404772223,
			"answer_id": 24620959,
			"question_id": 24620768,
			"share_link": "http://stackoverflow.com/a/24620959",
			"body_markdown": "The simplest would be to use an ... some discussion on this here: http://docs.neo4j.org/chunked/stable/cypherdoc-linked-lists.html)",
			"link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements/24620959#24620959",
			"title": "Neo4j cypher query: get last N elements"
		}],
		"tags": ["neo4j", "cypher"],
		"owner": {
			"reputation": 815,
			"user_id": 1212067,
			"user_type": "registered",
			"accept_rate": 73,
			"profile_image": "http://i.stack.imgur.com/nnyS1.png?s=128&g=1",
			"display_name": "C&#233;sar Garc&#237;a Tapia",
			"link": "http://stackoverflow.com/users/1212067/c%c3%a9sar-garc%c3%ada-tapia"
		},
		"comment_count": 0,
		"delete_vote_count": 0,
		"close_vote_count": 0,
		"is_answered": true,
		"view_count": 14,
		"favorite_count": 0,
		"down_vote_count": 0,
		"up_vote_count": 1,
		"answer_count": 1,
		"score": 1,
		"last_activity_date": 1404772230,
		"creation_date": 1404771217,
		"question_id": 24620768,
		"share_link": "http://stackoverflow.com/q/24620768",
		"body_markdown": "I have a graph that...How can I do that?",
		"link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements",
		"title": "Neo4j cypher query: get last N elements"
	}, {
		"tags": ["neo4j", "cypher"],
		"owner": {
			"reputation": 63,
			"user_id": 845435,
			"user_type": "registered",
			"accept_rate": 67,
			"profile_image": "https://www.gravatar.com/avatar/610458a30958c9d336ee691fa1a87369?s=128&d=identicon&r=PG",
			"display_name": "user845435",
			"link": "http://stackoverflow.com/users/845435/user845435"
		},
		"comment_count": 0,
		"delete_vote_count": 0,
		"close_vote_count": 0,
		"is_answered": false,
		"view_count": 16,
		"favorite_count": 0,
		"down_vote_count": 0,
		"up_vote_count": 0,
		"answer_count": 0,
		"score": 0,
		"last_activity_date": 1404768987,
		"creation_date": 1404768987,
		"question_id": 24620297,
		"share_link": "http://stackoverflow.com/q/24620297",
		"body_markdown": "I&#39;m trying to implement a simple graph db for NYC subway................Thanks!\r\n",
		"link": "http://stackoverflow.com/questions/24620297/cypher-query-with-infinite-relationship-takes-forever",
		"title": "Cypher query with infinite relationship takes forever"
	}],
	"has_more": true,
	"quota_max": 300,
	"quota_remaining": 205
}
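To make the mapping from this response to a graph concrete, here is a Python sketch that walks the `items` array the way an UNWIND-based import would: one node per question, answer, user and tag, plus the relationships between them. The ASKED, PROVIDED and TAGGED names follow the queries in this post; ANSWERS and the node keys are assumptions for illustration, since the full import query is elided above. Keying nodes by id in a dictionary plays the role of MERGE.

```python
def to_graph(response):
    """Deconstruct a Stack Exchange /questions response into nodes and relationships.

    Returns (nodes, rels): nodes maps (label, id) -> properties, and rels is a
    set of (start_key, rel_type, end_key) tuples. Keying by id mimics MERGE,
    so a user or tag that appears several times is stored only once.
    """
    nodes, rels = {}, set()

    def merge_user(owner):
        key = ("User", owner["user_id"])
        nodes.setdefault(key, {"name": owner.get("display_name")})
        return key

    for q in response.get("items", []):
        question = ("Question", q["question_id"])
        nodes.setdefault(question, {"title": q.get("title")})
        rels.add((merge_user(q["owner"]), "ASKED", question))
        for tag_name in q.get("tags", []):
            tag = ("Tag", tag_name)
            nodes.setdefault(tag, {})
            rels.add((question, "TAGGED", tag))
        # unanswered questions have no "answers" key, like the second item above
        for a in q.get("answers", []):
            answer = ("Answer", a["answer_id"])
            nodes.setdefault(answer, {"score": a.get("score")})
            rels.add((answer, "ANSWERS", question))
            rels.add((merge_user(a["owner"]), "PROVIDED", answer))
    return nodes, rels


# A trimmed-down version of the response above
sample = {"items": [
    {"question_id": 24620768, "title": "Neo4j cypher query: get last N elements",
     "tags": ["neo4j", "cypher"], "owner": {"user_id": 1212067},
     "answers": [{"answer_id": 24620959, "score": 2,
                  "owner": {"user_id": 737080, "display_name": "Chris Leishman"}}]},
    {"question_id": 24620297, "title": "Cypher query with infinite relationship takes forever",
     "tags": ["neo4j", "cypher"], "owner": {"user_id": 845435}},
]}
nodes, rels = to_graph(sample)
print(len(nodes), len(rels))  # 8 8
```

The two questions share their tags, so only two Tag nodes are created, just as MERGE would do.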

The post Cypher: LOAD JSON from URL AS Data appeared first on Neo4j Graph Database.

Making a Difference: The Public Neo4j-Users Slack Group

Learn More about the Public (and Free) Neo4j-Users Slack Group and Sign Up Today

We’ve been asked several times in the past to open a neo4j-users Slack group for the many enthusiastic people in the Neo4j user community. Now, that Slack group is a reality. This group is meant to be a hip alternative to IRC for quick questions and feedback. Stack Overflow is still the place to go for canonical, persistent answers, and for providing them. We’ll also be posting interesting Stack Overflow questions and answers to the Slack group.

So, yesterday we gave it a go and were totally overwhelmed by the users pouring in. During the first hours we had about 100 sign-ups per hour, making it 600 members in the first 6 hours. Wow, that was impressive. And everyone was very thankful and enthusiastic. New channels were created and people instantly began helping each other, discussing questions and giving feedback.

If you want to join now, please sign up here, and then come back to finish reading this article. Driver authors and community contributors are all there to help you with specific questions. In addition, many Neo Technology employees participate to help you out with questions or ideas.
Screenshot of the new Neo4j-users Slack group

Slack Channels

For the best experience, just join the channels that you are interested in and whose topics you can help with. There’s no need to be everywhere. Here is a quick overview to give you an idea:
Purpose                              | Channels
-------------------------------------+-------------------------------------------------------------
General Discussions                  | #general
General Help & Questions             | #help
Help in different areas of interest  | #help-newbies, #help-install, #help-cypher, #help-import, #help-modeling
Language / Driver specific questions | #neo4j-(java, python, ruby, dotnet, php, golang, sdn, rstats), #neo4j-spatial, #neo4j-unix
Share your project or story          | #share-your-story
Organizing Neo4j Meetups             | #organize-meetups
Neo4j Trainers (private)             | #trainers
Feedback and Ideas for Neo4j         | #product-feedback
News from @Neo4j Twitter             | #twitter
Latest Neo4j Events                  | #events
Banter and Fun                       | #rants

Of course we added our Neo4j-Slack integration so that you can explore channels and users and get recommendations on channels that might be interesting for you with the /graph cypher [query] slash command.
/graph cypher MATCH (c:Channel)<-[:MEMBER_OF]-()
              RETURN c.name, count(*) AS users
              ORDER BY users DESC LIMIT 10;

    | c.name             | users
----+--------------------+-------
  1 | general            |   645
  2 | help-cypher        |    64
  3 | neo4j-java         |    57
  4 | help-modeling      |    53
  5 | share-your-project |    42
  6 | neo4j-python       |    36
  7 | help               |    34
  8 | product-feedback   |    33
  9 | events             |    30
 10 | organize-meetups   |    30
You can also use it to try out tiny Cypher snippets, like this:
/graph cypher unwind range(1,10) as n return n % 2, count(*), collect(n)

  | n % 2 | count(*) | collect(n)
--+-------+----------+------------------
1 |   1.0 |        5 | [1, 3, 5, 7, 9]
2 |   0.0 |        5 | [2, 4, 6, 8, 10]
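That snippet relies on Cypher’s implicit grouping: the non-aggregated expression `n % 2` becomes the grouping key, and `count(*)` and `collect(n)` are computed per key. A rough Python equivalent, for illustration only (`group_by_parity` is a made-up name, and Python keeps the keys as integers rather than the floats shown above):

```python
def group_by_parity(numbers):
    """Mimic `RETURN n % 2, count(*), collect(n)`: group by the non-aggregated key."""
    groups = {}
    for n in numbers:
        groups.setdefault(n % 2, []).append(n)  # collect(n) per key
    # one (key, count(*), collect(n)) row per group, like the result table
    return [(key, len(values), values) for key, values in groups.items()]


print(group_by_parity(range(1, 11)))
# [(1, 5, [1, 3, 5, 7, 9]), (0, 5, [2, 4, 6, 8, 10])]
```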

Slack Experience

The Slack experience has been pretty good so far. Processing 5,000 total invites (which were a pain to export from Google Groups) was no issue, and so far interaction has had no hiccups.

It is unfortunate that people can’t sign up for public Slack groups on their own, and that signing into another team on the desktop client is a bit of a hassle. It would be cool if there were an ordering mechanism for channel names other than alphabetical, and it was a bit tricky to come up with a good channel structure without overlaps. Also, the missing history for non-paid Slack teams is sad. I hope that Slack will provide a good solution for open source projects that just want to engage with their users in a better way.

Despite these few issues, however, Slack is a great tool for working with our community and we hope you’ll join the discussion. Sign up here to join the Neo4j-users Slack group today.

The post Making a Difference: The Public Neo4j-Users Slack Group appeared first on Neo4j Graph Database.


From the Neo4j Community: July 2015

Learn What Videos, Slides and Articles the Neo4j Community Has Been Publishing in the Month of July

It’s apparent that Neo4j graphistas have been hard at work this summer, with lots of awesome videos, slides and articles being published. Here are some of our favorites that show off how amazing the community really is.

If you would like to see more posts from the graphista community, follow us on Twitter and use the #Neo4j hashtag to be featured in September’s “From the Community” blog post.

Want to see more of the best from the graph databases community? Register for GraphConnect San Francisco on October 21, 2015 at Pier 27 and meet more hackers, developers, architects and data professionals from the Neo4j ecosystem.

The post From the Neo4j Community: July 2015 appeared first on Neo4j Graph Database.

Import 10M Stack Overflow Questions into Neo4j In Just 3 Minutes

Learn How We Imported 10 Million Stack Overflow Questions into Neo4j in Just 3 Minutes

I want to demonstrate how you can take the Stack Overflow dump and quickly import it into Neo4j. After that, you’re ready to start querying the graph for more insights and possibly build an application on top of that dataset. If you want to follow along, we have a running (read-only) Neo4j server with the data available here.

But first things first: Congratulations to Stack Overflow for being so awesome and helpful. They’ve just announced that over ten million programming questions (and counting) have been answered on their site. (They’re also doing a giveaway around the #SOreadytohelp hashtag. More on that below.) Without the Stack Overflow platform, many questions around Neo4j could never have been asked or answered. We’re still happy that we started moving our public user support away from Google Groups. The Neo4j community on Stack Overflow has grown a lot, as has the volume of questions there.
Stack Overflow Has Answered 10 million questions

(and it is a graph)

Importing the Stack Overflow Data into Neo4j

Importing the millions of Stack Overflow questions, users, answers and comments into Neo4j has been a long-time goal of mine. One of the distractions that kept me from doing it was answering many of the 8,200 Neo4j questions out there. Two weeks ago, Damien at Linkurious pinged me in our public Slack channel. He asked about Neo4j’s import performance for ingesting the full Stack Exchange data dump into Neo4j. After a quick discussion, I pointed him to Neo4j’s CSV import tool, which is perfect for the task as the dump consists of only relational tables wrapped in XML.
So Damien wrote a small Python script to extract CSV from the XML, and with the necessary headers in place the neo4j-import tool did the grunt work of turning the huge tables into a graph. You can find the script and instructions on GitHub here. Importing the smaller Stack Exchange community data takes only a few seconds. Amazingly, converting the full Stack Overflow dump with users, questions and answers back to CSV takes 80 minutes, and importing it into Neo4j then takes only 3 minutes on a regular laptop with an SSD. Here is how we did it:

Download Stack Exchange Dump Files

First, we downloaded the dump files for the Stack Overflow community from the Internet Archive (11 GB in total) into a directory:
    • 7.3G stackoverflow.com-Posts.7z
    • 576K stackoverflow.com-Tags.7z
    • 154M stackoverflow.com-Users.7z
The other data could be imported separately if we wanted to:
    • 91M stackoverflow.com-Badges.7z
    • 2.0G stackoverflow.com-Comments.7z
    • 36M stackoverflow.com-PostLinks.7z
    • 501M stackoverflow.com-Votes.7z

Unzip the .7z Files

for i in *.7z; do 7za -y -oextracted x "$i"; done
This extracts the files into the extracted directory; it takes 20 minutes and uses 66 GB of disk space.

Clone Damien’s GitHub repository

The next step was to clone Damien’s GitHub repo:
git clone https://github.com/mdamien/stackoverflow-neo4j
Note: The conversion script requires Python 3 and the xmltodict module, which you can install with:
sudo apt-get install python3-setuptools
easy_install3 xmltodict

Run the XML-to-CSV Conversion

After that, we ran the conversion of XML to CSV.
python3 to_csv.py extracted
The conversion ran for 80 minutes on my system and produced 9.5 GB of CSV files, which compress to 3.4 GB. Below is the data structure imported into Neo4j; the header lines of the CSV files provide the mapping. Nodes:
posts.csv
postId:ID(Post),title,postType:INT,createdAt,score:INT,views:INT,answers:INT,comments:INT,favorites:INT,updatedAt,body
users.csv
userId:ID(User),name,reputation:INT,createdAt,accessedAt,url,location,views:INT,upvotes:INT,downvotes:INT,age:INT,accountId:INT
tags.csv
tagId:ID(Tag),count:INT,wikiPostId:INT
Relationships:
posts_answers.csv:ANSWER   -> :START_ID(Post),:END_ID(Post)
posts_rel.csv:PARENT_OF    -> :START_ID(Post),:END_ID(Post)
tags_posts_rel.csv:HAS_TAG -> :START_ID(Post),:END_ID(Tag)
users_posts_rel.csv:POSTED -> :START_ID(User),:END_ID(Post)
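Damien’s to_csv.py uses xmltodict; as an illustration of the same conversion idea (not his script), here is a Python sketch that streams `<row>` elements with the standard library’s `xml.etree.ElementTree.iterparse` and writes posts.csv with the neo4j-import header listed above. The Posts.xml attribute names (Id, Title, PostTypeId, and so on) and the choice of LastActivityDate for updatedAt are assumptions about the dump format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# neo4j-import header for posts.csv, copied from the listing above
HEADER = ["postId:ID(Post)", "title", "postType:INT", "createdAt", "score:INT",
          "views:INT", "answers:INT", "comments:INT", "favorites:INT",
          "updatedAt", "body"]
# Assumed Posts.xml attribute for each column, in HEADER order
ATTRIBUTES = ["Id", "Title", "PostTypeId", "CreationDate", "Score", "ViewCount",
              "AnswerCount", "CommentCount", "FavoriteCount", "LastActivityDate", "Body"]


def clean(value):
    """Flatten line breaks so every post stays on a single CSV line."""
    return value.replace("\r", " ").replace("\n", " ")


def posts_xml_to_csv(xml_source, csv_out):
    """Stream <row .../> elements from Posts.xml and write posts.csv rows.

    Returns the number of posts written.
    """
    writer = csv.writer(csv_out)
    writer.writerow(HEADER)
    written = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        # Attributes missing on a row (e.g. Title on answers) become empty fields
        writer.writerow([clean(elem.get(attr, "")) for attr in ATTRIBUTES])
        elem.clear()  # release the element's data; the real Posts.xml is very large
        written += 1
    return written


sample = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="Hello &quot;graph&quot;" Body="&lt;p&gt;a&#xA;b&lt;/p&gt;" />
  <row Id="2" PostTypeId="2" Body="An answer" />
</posts>"""
out = io.StringIO()
print(posts_xml_to_csv(io.BytesIO(sample), out))  # 2
print(out.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""`, as the csv module documentation recommends.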

Import into Neo4j

We then used the Neo4j import tool neo/bin/neo4j-import to ingest Posts, Users, Tags and the relationships between them.
../neo/bin/neo4j-import \
--into ../neo/data/graph.db \
--id-type string \
--nodes:Post csvs/posts.csv \
--nodes:User csvs/users.csv \
--nodes:Tag csvs/tags.csv \
--relationships:PARENT_OF csvs/posts_rel.csv \
--relationships:ANSWER csvs/posts_answers.csv \
--relationships:HAS_TAG csvs/tags_posts_rel.csv \
--relationships:POSTED csvs/users_posts_rel.csv
The actual import only takes 3 minutes, creating a graph store of 18 GB.
IMPORT DONE in 3m 48s 579ms. Imported:
  31138559 nodes
  77930024 relationships
  260665346 properties

Neo4j Configuration

Next, we adapted Neo4j’s config in conf/neo4j.properties to increase the dbms.pagecache.memory option to 10G. We also edited conf/neo4j-wrapper.conf to give the JVM more heap, such as 4G or 8G. Then we started the Neo4j server with ../neo/bin/neo4j start.

Adding Indexes

You can run the next queries either directly in Neo4j’s server UI or on the command line with ../neo/bin/neo4j-shell, which connects to the running server. Here’s how much data we had in there:
neo4j-sh (?)$ match (n) return head(labels(n)) as label, count(*);
+-------------------+
| label  | count(*) |
+-------------------+
| "Tag"  | 41719    |
| "User" | 4551115  |
| "Post" | 26545725 |
+-------------------+
3 rows
Next, we created some indexes and constraints for later use:
create index on :Post(title);
create index on :Post(createdAt);
create index on :Post(score);
create index on :Post(views);
create index on :Post(favorites);
create index on :Post(answers);

create index on :User(name);
create index on :User(createdAt);
create index on :User(reputation);
create index on :User(age);

create index on :Tag(count);

create constraint on (t:Tag) assert t.tagId is unique;
create constraint on (u:User) assert u.userId is unique;
create constraint on (p:Post) assert p.postId is unique;
We then waited for the indexes to finish populating:
schema await
Please note: Neo4j as a graph database wasn’t originally built for these global-aggregating queries. That’s why the responses are not instant.

Getting Insights with Cypher Queries

Below are just some of the insights we gleaned from the Stack Overflow data using Cypher queries:

The Top 10 Stack Overflow Users

match (u:User) 
with u,size( (u)-[:POSTED]->()) as posts order by posts desc limit 10 
return u.name, posts;
+---------------------------+
| u.name            | posts |
+---------------------------+
| "Jon Skeet"       | 32174 |
| "Gordon Linoff"   | 20989 |
| "Darin Dimitrov"  | 20871 |
| "BalusC"          | 16579 |
| "CommonsWare"     | 15493 |
| "anubhava"        | 15207 |
| "Hans Passant"    | 15156 |
| "Martijn Pieters" | 14167 |
| "SLaks"           | 14118 |
| "Marc Gravell"    | 13400 |
+---------------------------+
10 rows
7342 ms
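The query ranks users by degree, i.e. the number of outgoing POSTED relationships (`size((u)-[:POSTED]->())`), and keeps the ten largest. The same top-N-by-degree idea in Python, using `heapq.nlargest` so the full ranking never has to be sorted; `top_posters` and the edge data are invented for illustration.

```python
import heapq
from collections import Counter


def top_posters(posted_edges, n=10):
    """Return the n users with the most outgoing POSTED relationships.

    posted_edges: iterable of (user, post) pairs
    """
    degree = Counter(user for user, _post in posted_edges)
    # nlargest keeps a heap of at most n candidates instead of sorting every user
    return heapq.nlargest(n, degree.items(), key=lambda item: item[1])


edges = [("ann", "p1"), ("ann", "p2"), ("bob", "p3"),
         ("cat", "p4"), ("cat", "p5"), ("cat", "p6")]
print(top_posters(edges, n=2))  # [('cat', 3), ('ann', 2)]
```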

The Top 5 Tags That Jon Skeet Used in Asking Questions

It seems he never really asked questions, but only answered. :)
match (u:User)-[:POSTED]->()-[:HAS_TAG]->(t:Tag) 
where u.name = "Jon Skeet" 
return t,count(*) as posts order by posts desc limit 5;
+------------------------------------------------+
| t                                      | posts |
+------------------------------------------------+
| Node[31096861]{tagId:"c#"}             | 14    |
| Node[31096855]{tagId:".net"}           | 7     |
| Node[31101268]{tagId:".net-4.0"}       | 4     |
| Node[31118174]{tagId:"c#-4.0"}         | 4     |
| Node[31096911]{tagId:"asp.net"}        | 3     |
+------------------------------------------------+
5 rows
36 ms

The Top 5 Tags BalusC Used in Asking Questions

match (u:User)-[:POSTED]->()-[:HAS_TAG]->(t:Tag) 
where u.name = "BalusC" 
return t.tagId,count(*) as posts order by posts desc limit 5;

+------------------------+
| t.tagId        | posts |
+------------------------+
| "java"         | 5     |
| "jsf"          | 3     |
| "managed-bean" | 2     |
| "eclipse"      | 2     |
| "cdi"          | 2     |
+------------------------+
5 rows
23 ms

How am I Connected to Darin Dimitrov

MATCH path = allShortestPaths(
     (u:User {name:"Darin Dimitrov"})-[*]-(me:User {name:"Michael Hunger"}))
RETURN path;
Result Visualization in the Neo4j Browser
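allShortestPaths returns every path of minimal length between the two users, not just one. A compact Python sketch of the idea on an adjacency list: a breadth-first search records each node’s distance and its predecessors on shortest routes, then all paths are rebuilt by walking the predecessors back from the target. This is a textbook BFS formulation for illustration, not how Neo4j implements it internally, and the toy graph is invented.

```python
from collections import deque


def all_shortest_paths(graph, source, target):
    """Return every shortest path from source to target in an unweighted graph.

    graph: dict mapping node -> iterable of neighbours; list both directions
    to model an undirected pattern such as -[*]- in the query above
    """
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            continue  # no need to expand beyond the target
        for nxt in graph.get(node, ()):
            if nxt not in dist:  # first time reached: record its distance
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:  # another route of the same length
                preds[nxt].append(node)
    if target not in dist:
        return []

    def build(node):
        # all shortest paths from source to node, via the recorded predecessors
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in build(p)]

    return build(target)


graph = {"Darin": ["post1", "post2"], "post1": ["Darin", "tag"],
         "post2": ["Darin", "tag"], "tag": ["post1", "post2", "Michael"],
         "Michael": ["tag"]}
print(all_shortest_paths(graph, "Darin", "Michael"))
# [['Darin', 'post1', 'tag', 'Michael'], ['Darin', 'post2', 'tag', 'Michael']]
```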

Which Mark Answered the Most Questions about neo4j?

MATCH (u:User)-[:POSTED]->(answer)<-[:PARENT_OF]-()-[:HAS_TAG]->(:Tag {tagId:"neo4j"}) 
WHERE u.name =~ "Mark .*" 
RETURN u.name, u.reputation,u.location,count(distinct answer) AS answers
ORDER BY answers DESC;

+--------------------------------------------------------------------------+
| u.name                 | u.reputation | u.location             | answers |
+--------------------------------------------------------------------------+
| "Mark Needham"         | 1352         | "United Kingdom"       | 36      |
| "Mark Leighton Fisher" | 4065         | "Indianapolis, IN"     | 3       |
| "Mark Byers"           | 377313       | "Denmark"              | 2       |
| "Mark Whitfield"       | 899          | <null>                 | 1       |
| "Mark Wojciechowicz"   | 1473         | <null>                 | 1       |
| "Mark Hughes"          | 586          | "London, UK"           | 1       |
| "Mark Mandel"          | 859          | "Melbourne, Australia" | 1       |
| "Mark Jackson"         | 56           | "Atlanta, GA"          | 1       |
+--------------------------------------------------------------------------+
8 rows
38 ms
Top 20 paths rendered as graph

The Top 5 Tags of All Time

match (t:Tag) 
with t order by t.count desc limit 5 
return t.tagId, t.count;
+------------------------+
| t.tagId      | t.count |
+------------------------+
| "javascript" | 917772  |
| "java"       | 907289  |
| "c#"         | 833458  |
| "php"        | 791534  |
| "android"    | 710585  |
+------------------------+
5 rows
30 ms

Co-occurrence of the javascript Tag

match (t:Tag {tagId:"javascript"})<-[:HAS_TAG]-()-[:HAS_TAG]->(other:Tag) 
WITH other, count(*) as freq order by freq desc limit 5
RETURN other.tagId,freq;
+----------------------+
| other.tagId | freq   |
+----------------------+
| "jquery"    | 318868 |
| "html"      | 165725 |
| "css"       | 76259  |
| "php"       | 65615  |
| "ajax"      | 52080  |
+----------------------+
5 rows

The Most Active Answerers for the neo4j Tag

Quick aside: Thank you to everyone who answered Neo4j questions!
match (t:Tag {tagId:"neo4j"})<-[:HAS_TAG]-()
       -[:PARENT_OF]->()<-[:POSTED]-(u:User) 
WITH u, count(*) as freq order by freq desc limit 10
RETURN u.name,freq;

+-------------------------------+
| u.name                 | freq |
+-------------------------------+
| "Michael Hunger"       | 1352 |
| "Stefan Armbruster"    | 760  |
| "Peter Neubauer"       | 308  |
| "Wes Freeman"          | 277  |
| "FrobberOfBits"        | 277  |
| "cybersam"             | 277  |
| "Luanne"               | 235  |
| "Christophe Willemsen" | 190  |
| "Brian Underwood"      | 169  |
| "jjaderberg"           | 161  |
+-------------------------------+
10 rows
45 ms

Where Else Were the Top Answerers Also Active?

MATCH (neo:Tag {tagId:"neo4j"})<-[:HAS_TAG]-()
      -[:PARENT_OF]->()<-[:POSTED]-(u:User) 
WITH neo,u, count(*) as freq order by freq desc limit 10
MATCH (u)-[:POSTED]->()<-[:PARENT_OF]-(p)-[:HAS_TAG]->(other:Tag)
WHERE NOT (p)-[:HAS_TAG]->(neo)
WITH u,other,count(*) as freq2 order by freq2 desc 
RETURN u.name,collect(distinct other.tagId)[1..5] as tags;


+----------------------------------------------------------------------------------------+
| u.name                 | tags                                                          |
+----------------------------------------------------------------------------------------+
| "cybersam"             | ["java","javascript","node.js","arrays"]                      |
| "Luanne"               | ["spring-data-neo4j","java","cypher","spring"]                |
| "Wes Freeman"          | ["go","node.js","java","php"]                                 |
| "Peter Neubauer"       | ["graph","nosql","data-structures","java"]                    |
| "Brian Underwood"      | ["ruby-on-rails","neo4j.rb","ruby-on-rails-3","activerecord"] |
| "Michael Hunger"       | ["spring-data-neo4j","nosql","cypher","graph-databases"]      |
| "Christophe Willemsen" | ["php","forms","doctrine2","sonata"]                          |
| "Stefan Armbruster"    | ["groovy","intellij-idea","tomcat","grails-plugin"]           |
| "FrobberOfBits"        | ["python","xsd","xml","django"]                               |
| "jjaderberg"           | ["vim","logging","python","maven"]                            |
+----------------------------------------------------------------------------------------+
10 rows
84 ms
Note that the Cypher query above contains the equivalent of 14 SQL joins.
Stack Overflow Data Rendered in Linkurious Visualizer

People Who Posted the Most Questions about Neo4j

MATCH (t:Tag {tagId:'neo4j'})<-[:HAS_TAG]-(:Post)<-[:POSTED]-(u:User)
RETURN u.name,count(*) as count
ORDER BY count DESC LIMIT 10;

+------------------------+
| u.name         | count |
+------------------------+
| "LDB"          | 39    |
| "deemeetree"   | 39    |
| "alexanoid"    | 38    |
| "MonkeyBonkey" | 35    |
| "Badmiral"     | 35    |
| "Mik378"       | 27    |
| "Kiran"        | 25    |
| "red-devil"    | 24    |
| "raHul"        | 23    |
| "Sovos"        | 23    |
+------------------------+
10 rows
42 ms

The Top Answerers for the py2neo Tag

MATCH (:Tag {tagId:'py2neo'})<-[:HAS_TAG]-()-[:PARENT_OF]->()
      <-[:POSTED]-(u:User)
RETURN u.name,count(*) as count
ORDER BY count DESC LIMIT 10;

+--------------------------------+
| u.name                 | count |
+--------------------------------+
| "Nigel Small"          | 88    |
| "Martin Preusse"       | 24    |
| "Michael Hunger"       | 22    |
| "Nicole White"         | 9     |
| "Stefan Armbruster"    | 8     |
| "FrobberOfBits"        | 6     |
| "Peter Neubauer"       | 5     |
| "Christophe Willemsen" | 5     |
| "cybersam"             | 4     |
| "Wes Freeman"          | 4     |
+--------------------------------+
10 rows
2 ms

Which Users Answered Their Own Question

This global graph query takes a while: run over all users, it touches 200 million paths in the database and returns after about 60 seconds.
To execute it only on a subset of the 4.5M users, you can add a filter condition, e.g. on reputation:
MATCH (u:User) WHERE u.reputation > 20000
MATCH (u)-[:POSTED]->(question)-[:ANSWER]->(answer)<-[:POSTED]-(u)
WITH u,count(distinct question) AS questions
ORDER BY questions DESC LIMIT 5
RETURN u.name, u.reputation, questions;

+---------------------------------------------+
| u.name           | u.reputation | questions |
+---------------------------------------------+
| "Stefan Kendall" | 31622        | 133       |
| "prosseek"       | 31411        | 114       |
| "Cheeso"         | 100779       | 107       |
| "Chase Florell"  | 21207        | 99        |
| "Shimmy"         | 29175        | 96        |
+---------------------------------------------+
5 rows
10 seconds
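The pattern requires the same user `u` at both ends: the poster of the question and the poster of its answer. In Python terms it is a join of post ownership with the question-to-answer links, keeping only rows where both owners match. A sketch with invented records; the `(question_id, answer_id)` pairs stand in for the ANSWER relationships, and `self_answerers` is a hypothetical helper.

```python
from collections import Counter


def self_answerers(post_owner, answer_links, limit=5):
    """Count, per user, the distinct questions they answered themselves.

    post_owner: mapping post id -> name of the user who POSTED it
    answer_links: (question_id, answer_id) pairs standing in for ANSWER relationships
    """
    answered = {}  # user -> set of question ids, like count(distinct question)
    for question_id, answer_id in answer_links:
        asker = post_owner.get(question_id)
        # the same user must sit at both ends of the pattern
        if asker is not None and asker == post_owner.get(answer_id):
            answered.setdefault(asker, set()).add(question_id)
    counts = Counter({user: len(qs) for user, qs in answered.items()})
    return counts.most_common(limit)


post_owner = {1: "ann", 2: "ann", 3: "bob", 4: "ann", 5: "cat",
              6: "cat", 7: "cat", 8: "ann", 9: "ann"}
links = [(1, 2), (3, 4), (5, 6), (5, 7), (8, 9)]
print(self_answerers(post_owner, links))  # [('ann', 2), ('cat', 1)]
```

Question 5 has two self-answers but is counted once, mirroring `count(distinct question)`.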

More Information

We’re happy to provide you with the graph database of the Stack Overflow dump here. If you want to learn about other ways to import or visualize Stack Overflow questions in Neo4j, please have a look at these blog posts.

Thanks again to everyone who posts and answers Neo4j questions. You’re the ones who make the Neo4j community really tick, and without you this level of analysis would only be half as much fun. Circling back to Stack Overflow’s 10 million question milestone, thank YOU for being #SOreadytohelp with any Stack Overflow questions related to Neo4j and Cypher.

Please let us know if you find other interesting questions and answers in this dataset. Just drop us an email at content@neo4j.com.

The post Import 10M Stack Overflow Questions into Neo4j In Just 3 Minutes appeared first on Neo4j Graph Database.

Free Neo4j Books (+ Discounts), from Beginner to Advanced

Discover the Ever-Expanding Library of Neo4j Books Available from Packt Publishing

Whether you’re a brand-new user of Neo4j or a seasoned vet, you can always stand to polish and refine your skills with graph databases. No doubt you’ve read the classic Graph Databases book by Ian Robinson, Jim Webber and Emil Eifrem, but now it’s time to move beyond the basics, especially when it comes to Neo4j- and Cypher-specific skills.

Good news: Packt Publishing has an ever-expanding host of Neo4j books, and they’re offering an exclusive discount to the Neo4j community. Use the discount code NEO4J25 to receive 25% off your order of any of the ebooks listed below. Better yet: when you purchase all of their Neo4j books, an automatic discount gets you all seven books for just $100.

But you know what’s even better than a discounted ebook? A free one. If you’d like a free copy of one of the Neo4j-specific titles below, tweet this article using the hashtag #PacktNeo4j. After a week, we’ll select seven winners to receive a 100% discount code for the book of their choice! Now, what can you hope to win? We’ve had each of the authors write a quick summary of their Neo4j book to give you an idea:

Beginner Level Books:

Learning Neo4j

By Rik Van Bruggen, @rvanbruggen

Learning Neo4j will give you a step-by-step way of adopting Neo4j, the world’s leading graph database. The book includes a lot of background information, helps you grasp the fundamental concepts behind this radical new way of dealing with connected data and will give you lots of examples of use cases and environments where a graph database would be a great fit. “Contrary to many other books on Neo4j, this book is not only targeted at the hardcore developer: I have tried to make the book as accessible as possible for less technical audiences. Technically interested project/program managers should be able to get a great feel for the power of Neo4j by going through this book.”

Learning Cypher

By Onofrio Panzarino, @onof80

“Learning Cypher is a practical, hands-on guide to learning how to use Neo4j quickly with Cypher, from scratch. The first chapters show you how to manage a Neo4j database in all phases of its lifecycle: creation, querying, updating and maintenance, with a particular focus on Cypher, the powerful Neo4j query language. An entire chapter is dedicated to profiling and improving the performance of queries. The last chapter shows a simple approach to handling migrations from SQL. It would be helpful to have a bit of familiarity with Java and/or SQL, but no prior experience is required.”

Neo4j Essentials

By Sumit Gupta

“Neo4j Essentials is a comprehensive and fast-paced guide for developers and expert programmers, especially those experienced with a graph-based or NoSQL database who want to quickly develop and produce real-world, complex use cases on Neo4j. It begins with the basic installation steps and explores various notable features of Neo4j, such as data structuring, querying (Cypher), pattern matching, integrations with BI tools, Spring Data, utilities and performance tuning. This book also talks about strategies for efficiently and effectively handling production nuances in enterprise-grade deployments and uncovers methodologies for extending and securing Neo4j deployments.”

Intermediate Level Books:

Neo4j Cookbook

By Ankur Goel

“Neo4j Cookbook provides easy-to-follow yet powerful ready-made recipes that cover both the tasks you will need most of the time while working with Neo4j and new real-world use cases in the travel, healthcare and e-commerce domains. Starting with a practical and vital introduction to Neo4j and various aspects of Neo4j installation, you will learn how to connect to and access Neo4j servers from programming languages such as Java, Python, Ruby and Scala. You will also learn about Neo4j administration and maintenance before expanding and advancing your knowledge by dealing with large Neo4j installations and optimizing them for both storage and querying.”

Neo4j Graph Data Modeling

By Mahesh Lal, @Mahesh_Lal

“Neo4j Graph Data Modeling will introduce design concepts used in modeling data as a graph in Neo4j. Written for data architects and developers with some familiarity with Neo4j, the book takes a step-by-step approach to explaining how we can design various data models in Neo4j. The examples cover a wide range, from graph problems (e.g., routing) to problems that are not an intuitive fit for graph databases. We have tried to craft the examples so that the reader is taken on a journey of discovery of the rationale behind design decisions.”

Advanced Level Books:

Neo4j High Performance

By Sonal Raj, @_sonalraj

“Neo4j High Performance presents insight into how Neo4j can be applied to practical industry scenarios and also includes tweaks and optimizations for developers and administrators to make their systems more efficient and high-performing. By the end of this book you will have learned about the following three aspects of Neo4j:
    • Understand concepts of graphs and Neo4j as a graph database, transactions, indexing and its querying with Cypher.
    • Create, build, deploy and test applications running Neo4j at the backend. Also get an introduction to an embedded application of Neo4j.
    • Use and setup of the Neo4j APIs including core API, REST API and an overview of its High Availability version of the framework.”

Building Web Applications with Python and Neo4j

By Sumit Gupta

“Building Web Applications with Python and Neo4j is a step-by-step guide aimed at competent developers who have exposure to and programming experience in Python and who now want to explore the world of relationships with Neo4j. This book discusses data modeling, programming and data analysis for application development with Python and Neo4j. This book also provides all the necessary practical skills and exposure to Python developers, which not only helps them leverage the power of Neo4j but also provides insight into various Python-based frameworks like py2neo, OGM, Django, Flask, etc. for rapidly developing enterprise-grade applications with Python and Neo4j.”

Don’t forget: Get 25% off your purchase of any Packt Neo4j title with the discount code NEO4J25, or share this article on Twitter with the hashtag #PacktNeo4j to win a free copy of any of the ebooks listed above! Having trouble with any of the discount codes? Email customercare@packtpub.com for help.

The post Free Neo4j Books (+ Discounts), from Beginner to Advanced appeared first on Neo4j Graph Database.

From the Neo4j Community: September 2015

Autumn is in full swing, and as we continue to gather all the slides and videos from GraphConnect San Francisco (coming soon!), it’s time to look back at all of the amazing contributions from the Neo4j community this past September.

Below are some of our top picks from around the world.

If you would like to see more posts from the graphista community, follow us on Twitter and use the #Neo4j hashtag to be featured in November’s “From the Community” blog post.

Articles

Videos

Websites

    • neo4Art by the Larus Business Automation Team


Join the largest graph database ecosystem and catch up with the rest of the Neo4j community – click below to download your free copy of Learning Neo4j and sharpen your skills with the world’s leading graph database.

The post From the Neo4j Community: September 2015 appeared first on Neo4j Graph Database.

Building the Graph Your Network App with the Neo4j Docker Image

Neo4j has long been distributed with a dataset of movies – showing how movies, actors, actresses, directors and more relate to each other.

We’ve also recently added the Northwind organizational dataset for developers who are more business-minded. Still, these datasets don’t capture the interest of every developer, since developers work with many different kinds of data.

We wanted a dataset that everyone could feel a personal attachment to, so we decided to enable you to analyze your personal Twitter data in Neo4j!

In order to let the masses explore their Twitter data in an isolated environment, we decided to take advantage of the new Neo4j Docker image.

We set up a Neo4j Docker container for each user, running on Amazon’s Elastic Container Service (ECS).

Architecture


The Graph Your Network App Architecture using the Neo4j Docker Image


When a new user visits network.graphdemos.com, one of the neo4j-twitter-head instances directs them to log in with their Twitter account.

This completes an OAuth 1.0a dance, enabling the Graph Your Network application to access the user’s Twitter data on their behalf. While most of the data being accessed is already public, acting on behalf of the user gives additional Twitter API quota and an ability to authenticate the user.

We then spin up a new instance, which runs the neo4j-twitter docker image, using the official Neo4j Docker image as the base.

This instance starts up Neo4j and then runs a Python script to import the user’s Twitter data into Neo4j. The credentials needed to do the import are passed into the Docker container using environment variables.
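A minimal sketch of how an import script might read those credentials inside the container. The variable names here are hypothetical (the post does not list the real ones); the point is to fail fast with a clear message when something is missing:

```python
import os

# Hypothetical variable names; the real neo4j-twitter image may use different ones.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_USER_TOKEN",
    "TWITTER_USER_SECRET",
    "NEO4J_PASSWORD",
)


def load_import_config(env=None):
    """Collect the import credentials from the environment, failing fast if any are unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The container would then be started with one `-e NAME=value` flag per variable on `docker run`, which keeps the secrets out of the image itself.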

After Neo4j is started, the credentials are reset and the URL, username and password are provided to the user on a webpage. We also run some canned queries that are executed by the neo4j-twitter-head instances using py2neo calling your personal Neo4j instance.

Resource Allotment


Each instance is allocated a quarter of a CPU core and 768MB of memory. While this is not much, it is adequate for the Twitter graphs of most users.

Since the number of EC2 servers needed to host these containers depends on the load, a cron job runs regularly on the head instances and either grows the auto-scaling group or terminates instances as needed.
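The arithmetic behind such a cron job can be sketched as follows. This assumes ECS-style CPU units (1,024 units per core, so a quarter core is 256 units) and a hypothetical host size; the real job's logic is not described in the post, and in practice the ECS agent also reserves some host memory:

```python
def containers_per_host(host_cpu_units, host_memory_mb,
                        container_cpu_units=256, container_memory_mb=768):
    """How many quarter-core / 768MB containers fit on one host."""
    return min(host_cpu_units // container_cpu_units,
               host_memory_mb // container_memory_mb)


def desired_instance_count(active_containers, host_cpu_units, host_memory_mb,
                           min_instances=1):
    """Smallest auto-scaling group size that can hold every active container."""
    per_host = containers_per_host(host_cpu_units, host_memory_mb)
    if per_host == 0:
        raise ValueError("host is too small for a single container")
    needed = -(-active_containers // per_host)  # ceiling division
    return max(min_instances, needed)
```

For example, on a hypothetical one-core host (1,024 units) with 3,840MB of memory, CPU is the binding limit at four containers per host, so ten active users need `desired_instance_count(10, 1024, 3840)`, which is 3 instances.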

Imported Data & Data Model


Learn How We Built the Graph Your Network App Using the Official Neo4j Docker Image


We import the following data:
    • Your followers
    • People you follow
    • Your tweets
    • Your mentions
    • Recent tweets using your top 5 hashtags
    • Recent tweets with #GraphConnect
    • Recent tweets mentioning Neo4j
There are three separate threads running to call the Twitter API. When the threads hit Twitter API quotas, they sleep for 15 minutes. Your new tweets are imported every 30 minutes, and other data is updated every 90 minutes.
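A minimal sketch of that polling pattern using Python's `threading` module. `RateLimitExceeded` and the fetch callables are hypothetical stand-ins for whatever Twitter client the app actually uses; only the 15-minute back-off and the 30/90-minute intervals come from the post:

```python
import threading


class RateLimitExceeded(Exception):
    """Hypothetical error a Twitter client raises when the API quota is used up."""


def poll_forever(fetch, interval_s, backoff_s, stop_event):
    """Call fetch() every interval_s seconds until stop_event is set."""
    while not stop_event.is_set():
        try:
            fetch()
        except RateLimitExceeded:
            # Quota exhausted: wait out the rate-limit window, then retry.
            stop_event.wait(backoff_s)
            continue
        stop_event.wait(interval_s)


def start_importers(jobs, backoff_s=15 * 60):
    """Start one daemon thread per (fetch, interval_s) job; return (stop_event, threads)."""
    stop_event = threading.Event()
    threads = []
    for fetch, interval_s in jobs:
        thread = threading.Thread(
            target=poll_forever,
            args=(fetch, interval_s, backoff_s, stop_event),
            daemon=True,
        )
        thread.start()
        threads.append(thread)
    return stop_event, threads
```

Usage might look like `start_importers([(import_new_tweets, 30 * 60), (refresh_followers, 90 * 60), (refresh_searches, 90 * 60)])`, where the three functions are hypothetical. Waiting on an `Event` rather than calling `time.sleep` lets `stop_event.set()` wake sleeping threads immediately.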

Example Queries


We provide a set of example queries on the web app and in the tutorial built in the Neo4j browser, including:
    1. Who’s mentioning you on Twitter?
    2. Who are your most influential followers?
    3. What tags do you use frequently?
    4. How many people you follow also follow you back?
    5. Who are the people tweeting about you, but who you don’t follow?
    6. What are the links from interesting retweets?
    7. Who are other people tweeting with some of your top hashtags?
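The Cypher behind these queries depends on the app's data model, which the post doesn't show. As a hedged illustration of question 4 alone, here is the same set logic in plain Python over hypothetical lists of handles; in the graph, it becomes a single pattern match over follow relationships instead of two downloaded lists:

```python
def follow_back_stats(following, followers):
    """Return (how many accounts you follow also follow you, sorted accounts that don't)."""
    following = set(following)
    followers = set(followers)
    mutual = following & followers
    return len(mutual), sorted(following - followers)
```

For example, `follow_back_stats(["alice", "bob", "carol"], ["bob", "carol", "dave"])` returns `(2, ["alice"])`.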

Browser Guide


Some folks have wondered how we built the browser guide (shown below) that is invoked by :play twitter. We added this custom guide to a build of the Neo4j browser and then swapped that build into the neo4j-twitter Docker image.

# Edit the guide in a Neo4j source checkout, then rebuild the browser jar
cd $NEO4J/community/browser
vim app/content/guides/twitter.jade
mvn package
cp target/neo4j-browser-2.x.x-SNAPSHOT.jar $DOCKER_REPO/neo4j-browser-2.x.x.jar

# Copy the custom jar into the image by adding this line to the Dockerfile:
vim $DOCKER_REPO/Dockerfile
  ADD neo4j-browser-2.x.x.jar /var/lib/neo4j/system/lib/neo4j-browser-2.x.x.jar

The Graph Your Network App Browser Guide using the Neo4j Docker Image


What Are Your Favorite Queries?


Let us know if you discover some great queries! Share them with me on Twitter, on Slack or on the issue tracker.

Start Exploring Now!


Visit http://network.graphdemos.com/


Want to build projects like the Graph Your Network app? Click below to get your free copy of the Learning Neo4j ebook and learn to master the world’s leading graph database.

The post Building the Graph Your Network App with the Neo4j Docker Image appeared first on Neo4j Graph Database.

From the Neo4j Community: October 2015

The Neo4j community has had a very busy October! Besides the three major announcements at GraphConnect San Francisco, community members have been abuzz about everything from real-time databases to competitive benchmarks.

Below are some of our top picks from our stellar (and growing!) community members.

If you would like to see your post featured in December’s “From the Community” blog post, follow us on Twitter and use the #Neo4j hashtag for your chance to get picked.

Articles


Blogs


Videos





Join the fastest growing graph database community – click below to download your free copy of Learning Neo4j and master the world’s leading graph database.

The post From the Neo4j Community: October 2015 appeared first on Neo4j Graph Database.

How Backstory.io Uses Neo4j to Graph the News [Community Post]


[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

Backstory is a news exploration website I co-created with my friend Devin.

The site automatically organizes news from hundreds of sources into rich, interconnected timelines. Our goal is to empower people to consume news in a more informative and open-ended way.

The News Graph


Our ability to present and analyze news in interesting ways is based on an extensive and ever-growing “news graph” powered by Neo4j.

The core graph model is shown in simplified form below:

[Figure: simplified core model of the Backstory news graph]


Consider three articles published by different news sources on November 16th, 2015.

First, Backstory collects these articles and stores them as ARTICLE nodes in the graph.

Second, article text is analyzed for named entities, stored as ACTOR nodes. Articles have a REFERENCED relationship with their actors.

Third, these articles are clustered because they’re about the same thing: U.S. Secretary of State John Kerry visiting France after the terrorist attacks in Paris. The article cluster is represented by an EVENT node. All articles and actors in a cluster point to their news event with an IN_EVENT relationship.

Finally, all actors in the cluster point to one another using a dated WITH relationship, to record their co-occurrence.
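The four steps above can be sketched in plain Python as a list of relationships. This illustrates the model only and is not Backstory's code: the article IDs, the epoch-millisecond date and the tuple format are assumptions:

```python
from itertools import combinations


def build_event_subgraph(event_id, event_date_ms, articles):
    """Model one clustered news event.

    articles maps an article ID to the actor names found in its text.
    Returns (start, REL_TYPE, end, properties) tuples, where nodes are (label, key).
    """
    rels = []
    actors = set()
    for article_id, article_actors in articles.items():
        article = ("ARTICLE", article_id)
        for name in sorted(set(article_actors)):
            rels.append((article, "REFERENCED", ("ACTOR", name), {}))
            actors.add(name)
        rels.append((article, "IN_EVENT", ("EVENT", event_id), {}))
    for name in sorted(actors):
        rels.append((("ACTOR", name), "IN_EVENT", ("EVENT", event_id), {}))
    # Every pair of co-occurring actors gets one dated WITH relationship.
    for a, b in combinations(sorted(actors), 2):
        rels.append((("ACTOR", a), "WITH", ("ACTOR", b), {"date": event_date_ms}))
    return rels
```

With three articles mentioning John Kerry, France and Paris in pairs, this yields six REFERENCED, six IN_EVENT and three WITH relationships.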

Given enough data, this model allows us to answer interesting questions about the news with simple Cypher queries. For example:

What are the most recent news events involving John Kerry?

MATCH (:ACTOR {name: "John Kerry"})-[:IN_EVENT]-(e:EVENT) RETURN e ORDER BY e.date DESC LIMIT 10

When was the last time Islamism interacted with Paris?

MATCH (:ACTOR {name: "Islamism"})-[w:WITH]-(:ACTOR {name: "Paris"}) RETURN w.date ORDER BY w.date DESC LIMIT 1

How many news events involving France occurred this week?

MATCH (:ACTOR {name: "France"})-[:IN_EVENT]-(e:EVENT) WHERE e.date > 1447215879786 RETURN count(e) AS event_count
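The hard-coded 1447215879786 in the query above is a timestamp in epoch milliseconds (2015-11-11T04:24:39.786Z). Here is a hedged sketch of computing that cutoff at request time and passing it as a parameter instead. The `{param}` placeholder syntax matches the Neo4j 2.x/3.0 era of this post (newer versions use `$param`), and the parameter names are illustrative:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The query above, with the actor name and the cutoff passed as parameters.
EVENTS_SINCE = (
    "MATCH (:ACTOR {name: {name}})-[:IN_EVENT]-(e:EVENT) "
    "WHERE e.date > {since} RETURN count(e) AS event_count"
)


def week_cutoff_ms(now):
    """Epoch milliseconds for exactly seven days before the timezone-aware `now`."""
    # timedelta // timedelta yields an int, avoiding float rounding.
    return (now - timedelta(days=7) - EPOCH) // timedelta(milliseconds=1)


def events_this_week_params(name, now=None):
    """Parameters for EVENTS_SINCE, defaulting `now` to the current UTC time."""
    now = now or datetime.now(timezone.utc)
    return {"name": name, "since": week_cutoff_ms(now)}
```

Seven days before 2015-11-18T04:24:39.786Z gives exactly 1447215879786, the value used in the original query.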

In addition to the information present in the news graph itself, we tap into a large amount of enriched data by virtue of correlating all actor nodes to Wikipedia entries.

For example, once a field records what type of thing an actor is, a query can differentiate a person from a place. Cypher has risen to the challenge and continues to allow concise queries over an increasingly complex graph.

Neo4j For The Win


We are big Neo4j fans at Backstory. The graph technology and the community have propelled us forward in many ways.

Here are just a few examples:

There Are Ample Neo4j Clients across Languages

In the Backstory system architecture – described in more detail here – there are a variety of components that read from and write to the graph database.

A combination of requirements and personal taste have led us to write these components in different languages, and we are pleased with the variety of options available for talking to Neo4j.

On the write side, we use the Neo4j Java REST Bindings. This component also uses a custom testing framework that lets us run suites of integration tests against isolated, transient embedded Neo4j instances.

On the read side, we’ve created an HTTP API that codifies the queries the Backstory.io website makes. This is written in Python and uses py2neo.
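A minimal sketch of what codifying queries can look like: the API exposes named, parameterized Cypher statements instead of accepting arbitrary Cypher from clients. The registry layout and names are hypothetical; the two statements are parameterized versions of the first two example queries above:

```python
# Hypothetical registry: endpoint name -> (Cypher statement, required parameter names).
QUERIES = {
    "recent_events": (
        "MATCH (:ACTOR {name: {name}})-[:IN_EVENT]-(e:EVENT) "
        "RETURN e ORDER BY e.date DESC LIMIT 10",
        frozenset({"name"}),
    ),
    "last_interaction": (
        "MATCH (:ACTOR {name: {a}})-[w:WITH]-(:ACTOR {name: {b}}) "
        "RETURN w.date ORDER BY w.date DESC LIMIT 1",
        frozenset({"a", "b"}),
    ),
}


def build_request(query_name, **params):
    """Return (statement, params) for a known query, rejecting bad parameter sets."""
    if query_name not in QUERIES:
        raise KeyError("unknown query: " + query_name)
    statement, required = QUERIES[query_name]
    if set(params) != required:
        raise ValueError("expected parameters %s, got %s" % (sorted(required), sorted(params)))
    return statement, params
```

Assuming the py2neo 2.0 API, the resulting pair could then be run with `graph.cypher.execute(statement, params)`.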

There’s also an ExpressJS API for administrative purposes, which constructs custom Cypher queries and manages its own transactions with Neo4j.

The Neo4j Browser Is a Crucial Experimentation Tool

The Neo4j Browser is an excellent tool for anything from experimenting with new Cypher queries to running some sanity checks on your production data.

Every Cypher-based feature I’ve developed for Backstory was conceived and hardened in the Browser. I even used it to develop the example queries above!

Graph Flexibility Is Underrated

Early on in our design process for Backstory we were a bit skeptical of using a graph database. Was it really worth leaving the comfort zone of relational databases or key-value stores?

Even after we had committed to a Neo4j prototype, we expected to end up requiring secondary relational storage for any number of requirements outside of the core news graph.

It turns out Neo4j has sufficed for all of our persistent data requirements, and has even led us to novel solutions in several cases. Four quick examples:

    1. Ability to add indexes after the fact: The Backstory model has evolved substantially over time. New node and relationship types come and go, and properties are added that need to be queried. Neo4j’s support for adding indexes to an existing graph has allowed us to keep queries performant as things change.
    2. Using Neo4j as an article queue: When Backstory collects news articles from the Internet, it has to queue them for textual analysis and event clustering. Instead of using a traditional persistent queue, we realized that Neo4j could support this requirement with minimal additional effort on our part. We already had Article nodes, so it was a matter of adding an “Unprocessed” label to new ones and processing them in insertion order.
    3. Using the graph to cluster articles: Our solution for grouping similar articles together into news events is based in part on the similarity of Article/Actor subgraphs. There is a strong signal in the fact that two articles within a small time span refer to the same actors. Some state-of-the-art clustering algorithms are graph-based, and Neo4j allowed us to quickly approach an excellent clustering solution.
    4. Using Neo4j for named entity recognition: A central challenge for Backstory is recognizing actors in news article text. Until now, we have used a blend of open-source natural language processing tools and human intervention. But we’ve begun to experiment with using graphs to identify actors, and the results are a marked improvement and extremely promising.
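The article queue and article clustering ideas above, sketched in plain Python under stated assumptions. The queue follows the post's description (an “Unprocessed” label cleared in insertion order); `created_at` is a hypothetical property, and the comment shows the equivalent Cypher. The similarity function is not Backstory's algorithm: it swaps in a simple Jaccard overlap of referenced actors within a time window, only to show the kind of signal the post describes:

```python
from dataclasses import dataclass, field

# Equivalent Cypher for taking the next article (Neo4j 2.x/3.x syntax):
#   MATCH (a:Article:Unprocessed)
#   WITH a ORDER BY a.created_at LIMIT 1
#   REMOVE a:Unprocessed
#   RETURN a


@dataclass
class Article:
    id: str
    created_at: int        # hypothetical insertion timestamp
    actors: frozenset      # actor names referenced by the article
    labels: set = field(default_factory=lambda: {"Article", "Unprocessed"})


def next_unprocessed(articles):
    """Take the oldest article still labelled Unprocessed and clear the label."""
    pending = [a for a in articles if "Unprocessed" in a.labels]
    if not pending:
        return None
    oldest = min(pending, key=lambda a: a.created_at)
    oldest.labels.discard("Unprocessed")
    return oldest


def actor_similarity(a, b, window):
    """Jaccard overlap of referenced actors; 0.0 if published too far apart."""
    if abs(a.created_at - b.created_at) > window:
        return 0.0
    union = a.actors | b.actors
    return len(a.actors & b.actors) / len(union) if union else 0.0
```

Two articles sharing one of three distinct actors score 1/3; a real clustering step would combine signals like this with others before creating an EVENT node.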

Conclusion


As mentioned above, our goal with Backstory is to create better ways for people to consume news and understand the world. Part of this is having a world-class technology platform for collecting and analyzing news.

Neo4j’s vibrant community and the flexibility of the graph database are enabling us to achieve these goals.

Instead of thinking about our database simply as a place where bits are stored, we think of our data as alive and brimming with insights. The graph lets our data breathe, striking the right balance between structure and versatility. Meanwhile, Cypher queries continue to perform well as the model grows more complex.

The Neo4j-powered news graph is absolutely the centerpiece of our system, and we’re excited for what the future holds.

If you’d like to follow our progress, join the mailing list on http://backstory.io or give us a follow on Twitter at @backstoryio.


Ready to use Neo4j for your next app or project? Get everything you need to know about harnessing graphs in O’Reilly’s Graph Databases – click below to get your free copy.

The post How Backstory.io Uses Neo4j to Graph the News [Community Post] appeared first on Neo4j Graph Database.


Singapore Developers: Welcome to the Neo4j Community!

Good news for developers in the Lion City: The Neo4j ecosystem now includes a budding new community in Singapore!

While Neo Technology will always have a Swedish soul, we’ve already welcomed a number of nations to the worldwide graph of Neo4j developers (with the most recent being Japan). Now, Singaporeans are the latest nodes to join the growing Neo4j community.

There’s already been a lot of interest in Neo4j and graph databases among academic circles in Singapore, and now that’s spilling over into the wider developer ecosystem, especially around the Internet of Things (IoT) and smart cities.

Here are just a few ways Singaporean developers can now get involved with other Neo4j developers in their area:

The Singapore Neo4j Meetup Group


The Neo4j Meetup Group in Singapore has just gotten started, which means it needs your participation in order to grow and succeed.

Our first meetup is scheduled for 17 December 2015 from 3 to 5 p.m. We’ll be covering:
Sign up here to join the meetup.

In addition to the existing LinkedIn groups for Neo4j and openCypher, Singapore developers also have a dedicated Facebook group just for local discussions around graph database issues, challenges and projects.

Meet 1degreenorth: The Only Neo4j Distributor in Singapore


Singaporean developers who are new to Neo4j also have another resource in their local Neo4j partner. 1degreenorth is an official Neo4j partner and the official distributor of Neo4j in Singapore.

Feel free to reach out to the 1degreenorth team if you have any questions about Neo4j for your business or development team. We’d be glad to help or point you toward the right resource.

The 1degreenorth team has a successful track record of bringing game-changing technologies to Singapore, usually four to five years ahead of major adoption. For example, the team was the first to bring Red Hat to Singapore, as well as the first partner to introduce VMware.

We have been a major proponent of open source technologies since the very beginning, and we think graph databases will be the next major wave of technology adoption.

Neo4j + The Singapore National Supercomputer Center (NSCC)


1degreenorth has also entered into a one-year agreement with the Singapore National Supercomputer Center (NSCC) to provide on-demand High Performance Computing Big Data Analytics (HPC-BDA) infrastructure for experimentation and proof-of-concept projects by the big data and data science community in Singapore.

This HPC-BDA infrastructure will sit on NSCC’s new 1PetaFlop High Performance Computing (HPC) Cluster with more than 30,000 cores and 10PB of high performance storage.

The system will offer a self-service web portal allowing users to easily self-provision Hadoop and Spark clusters and graph databases such as Neo4j in a multi-tenant private cloud environment.

This initiative will launch in the first quarter of 2016 and will be open to Singapore-based companies and government agencies that are looking to start a proof-of-concept (POC) using big data analytics.

1degreenorth is proud that the first graph database provided on the HPC-BDA platform will be Neo4j – so let us know if you have questions or want to get involved.


Singapore developers are no longer alone when it comes to learning and experimenting with Neo4j. With the new meetup group and other online resources, Singaporeans now have all the resources they need in order to be successful with the world’s leading graph database.


Ready to learn Neo4j for the first time? Click below to get your free copy of Learning Neo4j ebook and sharpen your skills with the world’s leading graph database.

The post Singapore Developers: Welcome to the Neo4j Community! appeared first on Neo4j Graph Database.

Bolting Forward: The Neo4j 3.0 Milestone 1 Release Is Here

Happy Friday, Neo4j alpha-developers! Good news: The first milestone release of Neo4j 3.0 is now ready for testing. Download Neo4j 3.0.0-M01 here.

Disclaimer: Milestone releases like this one are for development and experimentation only, as not all features are in their finalized form. Click here if you’re looking for the latest stable version of Neo4j (2.3.1).

The Neo4j 3.0 Milestone 1 release introduces Bolt, a new network protocol designed for high-performance access to graph databases. Bolt will consolidate and refine all the ways of doing work with Neo4j, providing more consistent access with a pleasant separation of responsibility.

This is the start of a new major branch of development for Neo4j. The work will bring forward all the best features of Neo4j, while leaving behind some things which are no longer useful.

These early days are an incredibly useful time to provide feedback, so please let us know what you think about where we’re headed.

Neo4j 3.0 Milestone 1 is available for download today as part of our Early Access Program.

Uniform Language Drivers with Bolt


For everyone building Neo4j applications, Bolt will improve both the performance as well as the developer experience. Bolt is a connection-oriented protocol, using a compact binary encoding over TCP or web sockets for higher throughput and lower latency.
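To make “connection-oriented” concrete: a Bolt session begins with a small binary handshake. The sketch below builds and parses it as documented for Bolt version 1 (a 4-byte magic preamble 0x6060B017 followed by four big-endian 32-bit version proposals, answered by the server with the single chosen version, or 0 if none is supported). Details may differ in this early milestone, and the sketch performs no network I/O:

```python
import struct

BOLT_MAGIC = 0x6060B017  # preamble that opens every Bolt connection


def bolt_handshake(versions=(1,)):
    """Client handshake: magic preamble plus four version proposals, zero-padded."""
    proposals = (list(versions) + [0, 0, 0, 0])[:4]
    return struct.pack(">5I", BOLT_MAGIC, *proposals)


def agreed_version(response):
    """Decode the server's 4-byte reply; 0 means no proposed version was accepted."""
    (version,) = struct.unpack(">I", response)
    return version
```

A client would send these 20 bytes right after opening a TCP connection (port 7687 by default) and then read back four bytes; all further messages use the compact binary encoding.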

To complement Bolt, we’re releasing official drivers which encapsulate the protocol, freeing contributors to focus on unique language and framework integration.

Java, JavaScript and Python drivers are available today, with other languages to follow soon. Each language will have a uniform approach for common, basic operations.

As an example, take a look at this code snippet using the Java driver (org.neo4j.driver:neo4j-java-driver:1.0.0-M01):

Driver driver = GraphDatabase.driver("bolt://localhost:7687");
try (Session session = driver.session()) {
  Result rs = session.run(
     "MATCH (:Person {name:{name}})-[:ACTED_IN]->(m:Movie) RETURN m",
                          Values.parameters("name","Keanu Reeves"));
  while (rs.next()) {
     System.out.println(rs.get("m").get("title").javaString());
  }
}

This client-side API is under active development, but it already gives you an idea of what it will look like.

The drivers are available through language-specific distribution systems, with source code on GitHub. A PHP driver with support for Bolt is being developed by our partner GraphAware.

Cypher Advances


Bolt’s always-on connection is a huge advantage over per-request connections when there is a high volume of small Cypher queries.

While the HTTP interface will remain available, we want the faster Bolt connection to be the only one you need. Cypher will expand its reach beyond graph queries to encapsulate all Neo4j operations with a uniform request and response pattern.

As of this milestone, the Cypher cost-based planner can handle queries that mix reads and writes.

Previously, executing any write operations would fall back to the rule-based planner. Now, several common writes are supported:
    • CREATE nodes and relationships
    • DELETE and DETACH DELETE
    • SET and REMOVE labels
    • SET properties
    • MERGE node
Ongoing work will add support for MERGE of relationships, LOAD CSV and FOREACH.

Additional Notes about Milestone 1


Neo4j Enterprise Edition now exposes more metrics values and provides more documentation.

The ability to log to a single CSV file was removed. Also, the dashboard in webadmin has been removed; please refer to :sysinfo in the Neo4j browser or the appropriate metrics for that information.

The has() function in Cypher, which was deprecated in Neo4j 2.3, has been removed in favor of the more descriptive exists().

One important thing to note is that Neo4j now requires Java 8. Our desktop installers now bundle a Java 8 JRE. For other installations and embedded use, make sure to have Java 8 available.

We Need Your Feedback!


As always, early access releases give you a way to evaluate fancy new Neo4j features that will make our next versions even better. What we ask in return is for your feedback on what things worked well (or didn’t).

Please be aware that milestone releases are beta software not meant for production use, nor do we provide an upgrade path for them.

Send your feedback for this release to feedback@neotechnology.com or raise an issue on GitHub. If you have any questions, feel free to ask on our public Slack channel or post in our Google Group.

Please download the Neo4j 3.0 Milestone 1 release, and give it a try. Happy testing!

Cheers,
Andreas


The O’Reilly Graph Databases ebook shows you how to use graph technology to solve real-world problems. Click below to get your free copy of the definitive book on graph databases and your introduction to Neo4j.

The post Bolting Forward: The Neo4j 3.0 Milestone 1 Release Is Here appeared first on Neo4j Graph Database.

Non-Text Discovery with ConceptNet as a Neo4j Database [Community Post]


[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

The Problem of Discovery


Discovery, especially non-text discovery, is hard.

When looking for a cool T-shirt, for example, I might not know exactly what I want, only that I’m looking for a gift T-shirt that’s a little mathy and emphasizes my friend’s love of nature.

As a retailer, I might notice that geometric nature products are quite popular and want to capitalize on that, both by marketing the more general “math/nature” theme to potential buyers who have shown an affinity for mathy animal shirts and by improving the browsing experience for new visitors to my site.

Many retail sites with user-generated content rely on user-generated tags to classify image-driven products. However, the quality and number of tags on each item vary widely, and it falls to the item’s creator and the site’s administrators to curate those tags and sort items into browsable categories.

On Threadless, for example, this awesome item has a rich amount of tags:
lim heng swee, ilovedoodle, cats, lol, funny, humor, food, foodies, food with faces, pets, meow, ice cream, desserts,awww, puns, punny, wordplay, v-necks, vnecks, tanks, tank tops, crew sweatshirts, Cute
In contrast, this beautiful item has only a handful:
jimena salas, jimenasalas, funded, birds, animals, geometric shapes, abstract, Patterns
Furthermore, although a human might easily be able to classify an image with the tags [ants, anthill, abstract, goofy] as probably belonging to the “funny animals” category, an automated system would have to know that ants are animals and that goofy is a synonym for funny.
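To make that concrete, here is a toy sketch in plain Python. The handful of edges is illustrative data written in ConceptNet’s style (IsA, Synonym), not an excerpt of the real dataset: each tag is expanded by following those relations, and a category matches when the expanded tags cover all of its concepts:

```python
from collections import deque

# Illustrative (start, relation, end) edges in ConceptNet's style; not real data.
EDGES = [
    ("ants", "IsA", "insect"),
    ("insect", "IsA", "animal"),
    ("goofy", "Synonym", "funny"),
]


def expand_tag(tag, edges=EDGES, relations=("IsA", "Synonym"), max_hops=3):
    """Breadth-first walk from `tag` along the chosen relations, up to max_hops."""
    adjacency = {}
    for start, rel, end in edges:
        if rel in relations:
            adjacency.setdefault(start, []).append(end)
    seen = {tag}
    queue = deque([(tag, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


def categorize(tags, categories):
    """Return the names of categories whose required concepts are all reachable."""
    reachable = set()
    for tag in tags:
        reachable |= expand_tag(tag)
    return sorted(name for name, required in categories.items() if required <= reachable)
```

For example, `categorize(["ants", "anthill", "abstract", "goofy"], {"funny animals": {"funny", "animal"}, "music": {"music"}})` returns `["funny animals"]`. Backed by Neo4j, the expansion step becomes a variable-length path match over the imported relationships.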

Knowing this, how would a retail site quickly and cheaply implement intelligent categorization and tag curation? With ConceptNet5 and, of course, Neo4j.


ConceptNet5


This article introduces the ConceptNet dataset and describes how to import the data into a Neo4j database.

To paraphrase the ConceptNet5 website, ConceptNet5 is a semantic network built from nodes representing words or short phrases of natural language (“terms” or “concepts”), and the relationships (“associations”) between them.

Armed with this information, a system can take human words as input and use them to better search for information, answer questions and understand user goals.

For example, take a look at toast in the ConceptNet5 web demo:

[Figure: the concept “toast” in the ConceptNet5 web demo]


This looks remarkably similar to a graph model. The dataset is incredibly rich, including (in the JSON) the “sense” of toast as a bread and also as a drink one has in tribute.

Let’s take a look at the JSON response for one ConceptNet edge (the association between two concepts) and import some data into a Neo4j database for exploration:

{
  "edges": [
    {
      "context": "/ctx/all",
      "dataset": "/d/globalmind",
      "end": "/c/en/bread",
      "features": [
        "/c/en/toast /r/IsA -",
        "/c/en/toast - /c/en/bread",
        "- /r/IsA /c/en/bread"
      ],
      "id": "/e/ff9b268e050d62255f236f35ba104300551b8a3b",
      "license": "/l/CC/By-SA",
      "rel": "/r/IsA",
      "source_uri": "/or/[/and/[/s/activity/globalmind/assert/,/s/contributor/omcs/bugmenot/]/,/s/umbel/2013/]",
      "sources": [
        "/s/activity/globalmind/assert",
        "/s/contributor/omcs/bugmenot",
        "/s/umbel/2013"
      ],
      "start": "/c/en/toast",
      "surfaceText": "Kinds of [[bread]] : [[toast]]",
      "uri": "/a/[/r/IsA/,/c/en/toast/,/c/en/bread/]",
      "weight": 3
    }
  ]
}

Modeling the Database


For the purposes of this example, let’s model the database with the following properties:
Term Nodes:
    • concept
    • language
    • partOfSpeech
    • sense
Association Relationships:
    • type
    • weight
    • surfaceText
An alternative model could use “type” as the relationship type itself instead of a property, but for the sake of this blog post let’s keep types as properties. This allows us to explore the ConceptNet data without making assumptions about the kinds of relationships in the dataset.
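The loading query below derives these Term properties by splitting ConceptNet URIs on “/”. The same mapping in plain Python, as a sketch for checking what a given URI will produce (the helper name is ours, not part of any ConceptNet library):

```python
def parse_concept_uri(uri):
    """Split a ConceptNet 5 concept URI like /c/en/toast/n/bread into Term properties.

    Mirrors the SPLIT(...)[i] indexes in the Cypher import: the leading "/" makes
    index 0 an empty string, so language is index 2 and concept is index 3.
    Missing trailing segments become "" rather than None, because Cypher's
    MERGE cannot use null property values.
    """
    parts = uri.split("/")

    def segment(i):
        return parts[i] if i < len(parts) else ""

    return {
        "language": segment(2),
        "concept": segment(3),
        "partOfSpeech": segment(4),
        "sense": segment(5),
    }
```

For example, `/c/en/toast` yields an empty part of speech and sense, while `/c/en/orange/n/colour` yields `n` and `colour`.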

Loading the Data into the Database


Let’s use the following Python script to upload some sample data:

import requests
from py2neo import authenticate, Graph

# py2neo 2.0-style connection to a local Neo4j server
USERNAME = "neo4j"  # use your actual username
PASSWORD = "12345678"  # use your actual password
authenticate("localhost:7474", USERNAME, PASSWORD)
graph = Graph()

# ConceptNet concepts to look up (multi-word concepts use underscores)
sample_tags = ['fruit', 'orange', 'bikes', 'cream', 'nature', 'toast', 'electronic', 'techno', 'house', 'dubstep', 'drum_and_bass', 'space_rock', 'psychedelic_rock', 'psytrance', 'garage', 'progressive', 'Cologne', 'North_Rhine-Westphalia', 'gothic_rock', 'darkwave', 'goth', 'geometric', 'nature', 'skylines', 'landscapes', 'mountains', 'trees', 'silhouettes', 'back_in_stock', 'Patterns', 'raglans', 'giraffes', 'animals', 'nature', 'tangled', 'funny', 'cute', 'krautrock']

# Build query: {json} is the ConceptNet response, passed in as a parameter.
query = """
WITH {json} AS document
UNWIND document.edges AS edges
// Missing URI segments come back as null; each CASE turns them into ""
// because MERGE cannot use null property values.
WITH
SPLIT(edges.start,"/")[3] AS startConcept,
SPLIT(edges.start,"/")[2] AS startLanguage,
CASE WHEN SPLIT(edges.start,"/")[4] <> "" THEN SPLIT(edges.start,"/")[4] ELSE "" END AS startPartOfSpeech,
CASE WHEN SPLIT(edges.start,"/")[5] <> "" THEN SPLIT(edges.start,"/")[5] ELSE "" END AS startSense,
SPLIT(edges.rel,"/")[2] AS relType,
CASE WHEN edges.surfaceText <> "" THEN edges.surfaceText ELSE "" END AS surfaceText,
edges.weight AS weight,
SPLIT(edges.end,"/")[3] AS endConcept,
SPLIT(edges.end,"/")[2] AS endLanguage,
CASE WHEN SPLIT(edges.end,"/")[4] <> "" THEN SPLIT(edges.end,"/")[4] ELSE "" END AS endPartOfSpeech,
CASE WHEN SPLIT(edges.end,"/")[5] <> "" THEN SPLIT(edges.end,"/")[5] ELSE "" END AS endSense
MERGE (startTerm:Term {concept:startConcept, language:startLanguage, partOfSpeech:startPartOfSpeech, sense:startSense})
MERGE (endTerm:Term {concept:endConcept, language:endLanguage, partOfSpeech:endPartOfSpeech, sense:endSense})
// Directed so that the exploration query's (n)-[r]->(m) pattern follows start -> end
MERGE (startTerm)-[r:ASSERTION {type:relType, weight:weight, surfaceText:surfaceText}]->(endTerm)
"""

# Use the ConceptNet 5.4 lookup endpoint to load data into the graph
for tag in sample_tags:
    search_url = "http://conceptnet5.media.mit.edu/data/5.4/c/en/" + tag + "?limit=500"
    search_json = requests.get(search_url, headers={"accept": "application/json"}).json()
    graph.cypher.execute(query, json=search_json)

Exploring the Data


Use the following Cypher query to explore the data:

MATCH (n:Term {language:'en'})-[r:ASSERTION]->(m:Term {language:'en'})
WHERE 
NOT r.type = 'dbpedia' AND
NOT r.surfaceText = '' AND
NOT n.partOfSpeech = '' AND
NOT n.sense = ''
RETURN n.concept AS `Start Concept`, n.sense AS `in the sense of`, r.type, m.concept AS `End Concept`, m.sense AS `End Sense`
ORDER BY r.weight DESC, n.sense ASC
LIMIT 10

The ConceptNet dataset is incredibly rich: it provides the various “senses” in which someone might mean “orange”, along with a wide variety of relationship types to choose from.

    | Start Concept | in the sense of                                         | r.type     | End Concept     | End Sense
----+---------------+---------------------------------------------------------+------------+-----------------+-----------
  1 | orange        | colour                                                  | IsA        | color           |
  2 | orange        | film                                                    | InstanceOf | film            |
  3 | dynamic       | a_characteristic_or_manner_of_an_interaction_a_behavior | Synonym    | nature          |
  4 | garage        | a_petrol_filling_station                                | Synonym    | petrol_station  |
  5 | garage        | a_petrol_filling_station                                | Synonym    | fill_station    |
  6 | garage        | a_petrol_filling_station                                | Synonym    | gas_station     |
  7 | progressive   | advancing_in_severity                                   | Antonym    | non_progressive |
  8 | shop          | automobile_mechanic's_workplace                         | Synonym    | garage          |
  9 | electronic    | band                                                    | IsA        | band            |
 10 | cream         | band                                                    | IsA        | band            |

Use Cases and Future Directions


When translated into a graph database, the ConceptNet5 API takes the agony out of tag-based recommendations and categorizations.

Small retail and social startups can integrate a Neo4j microservice into their existing stack. They can use it to power recommendations, gain insight into the most effective way to categorize products (should “funny cats” have their own first-level category, or should they go under “animals”?), and free up time and budget for richer innovations.


Learn how to build a real-time recommendation engine for non-text discovery on your website: Download this white paper – Powering Recommendations with a Graph Database – and start offering more timely, relevant suggestions to your users.

The post Non-Text Discovery with ConceptNet as a Neo4j Database [Community Post] appeared first on Neo4j Graph Database.

For Testing & Feedback: Neo4j 3.0 Release Candidate 1

Learn about the First Release Candidate of Neo4j 3.0 and Help Us Make It Better with Your Feedback

Neo4j 3.0 is coming soon, and we would like to ask you today for your feedback on our first release candidate.

For your review


Neo4j 3.0-RC1 previews some significant new features which we’d love for you to review.

  • “Bolt” binary protocol with official drivers: .NET, Python, Java and JavaScript are the first round of languages with new drivers.
  • Java stored procedures: CALL them on their own or mix them within a more complex Cypher statement using CALL ... YIELD.
  • Neo4j Sync: A companion cloud service for the Neo4j Browser that stores scripts, graph style sheets and history.
  • Operability changes: Important changes to the directory layout and the configuration of Neo4j.
  • Documentation library: Introducing a collection of focused publications, starting with a Developer Manual and Operations Manual.

Of course, this release contains welcome bug fixes and impressive performance improvements across the board. You can find more information in our release notes or the detailed changelog.

Known Issues


We’ve already discovered some issues that you should know about. Notably, you may encounter these:
  • Neo4j Browser only works with authenticated Bolt connections. You must either enable authentication on the server or disable Bolt in Neo4j Browser settings.

Feedback, please!


Use this release candidate to help us test Neo4j thoroughly so that we can provide an amazing and stable release. We’d love to hear from you on our public Slack or via e-mail: feedback@neo4j.com.

If you really love what we built, tell the world on Twitter; if not, tell us directly. Each day for the next 12 days, we will raffle one rare blue Neo4j water bottle among the tweets tagged #neo4j #feedback that include a lovely quote and a picture.

All the Links You Need


Download Neo4j 3.0.0-RC1, grab a driver for your language, then peruse the new documentation.

Neo4j 3.0.0 RC1
  • Community Edition
  • Enterprise Edition

Official Drivers
  • .NET
  • Java
  • JavaScript
  • Python

Documentation Library


Cheers,
The Neo4j Team

The post For Testing & Feedback: Neo4j 3.0 Release Candidate 1 appeared first on Neo4j Graph Database.

Official Release: 3 Essentials of Neo4j 3.0, from Scale to Productivity & Deployment

Learn All about the Official Release of Neo4j 3.0, including Scale, Productivity and Deployment

Today, the Neo4j team is proud to announce the release of Neo4j 3.0.0.

As the first release in the 3.x series, Neo4j 3.0 is based on a completely redesigned architecture and a commitment to offer the world’s most scalable graph database, greater developer productivity and a wide range of deployment choices.

What are the top three things to look for in Neo4j 3.0?

    1. Completely redesigned internals that remove all previous limits on the number of nodes, relationships and properties that can be stored and indexed
    2. Officially supported language drivers backed by the new Bolt binary protocol with new support for Java Stored Procedures, together enabling full-stack developers to build powerful applications
    3. A streamlined configuration and deployment structure to deploy Neo4j on premise or in the cloud
Let’s take a closer look at what’s new in Neo4j 3.0:

1. Massive Scale and Performance


Neo4j 3.0 for Giant Graphs


Graph Database Massive Scaling with Neo4j


The centerpiece of Neo4j’s architecture overhaul is a redesigned data store.

Dynamic pointer compression expands Neo4j’s available address space as needed, making it possible to store graphs of any size. That’s right: no more 34-billion-node limit!
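The figure of 34 billion matches a 35-bit node ID. This is our understanding of the pre-3.0 store format, stated here as an assumption. An n-bit ID can address 2**n distinct records, which a quick check in Python confirms:

```python
def max_records(id_bits):
    """Number of distinct record IDs an id_bits-wide identifier can address."""
    return 2 ** id_bits


# Assumption: node IDs were 35 bits wide before Neo4j 3.0.
print(f"{max_records(35):,}")
```

Each additional ID bit doubles that ceiling. A fixed ID width is therefore a hard cap on graph size, which is the limit that expanding the address space on demand removes.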

Performance and scale are often at odds, and scaling without performance isn’t very useful. The approach we’ve invented maintains index-free adjacency, allowing each node to locate adjacent nodes and relationships via a pointer hop, and preserving the ultra-fast performance that Neo4j is known for. This feature is available in Neo4j Enterprise Edition, nicely complementing its scale-out clustering capabilities.
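Index-free adjacency means that each node holds direct references to its own relationships. Expanding from a node is therefore a pointer hop rather than a lookup in a global index. The toy Python model below illustrates the idea with hypothetical classes; it does not reproduce Neo4j's actual record layout. neighbours() walks only the node's own relationship list, so its cost depends on that node's degree and not on the size of the whole graph.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.relationships = []  # direct references; no global index involved


class Relationship:
    def __init__(self, rel_type, start, end):
        self.type = rel_type
        self.start = start
        self.end = end
        # Each endpoint keeps a direct pointer to this relationship.
        start.relationships.append(self)
        end.relationships.append(self)

    def other(self, node):
        """The node at the opposite end of this relationship."""
        return self.end if node is self.start else self.start


def neighbours(node, rel_type=None):
    """Follow the node's own relationship pointers: O(degree) work."""
    return [
        rel.other(node).name
        for rel in node.relationships
        if rel_type is None or rel.type == rel_type
    ]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
Relationship("KNOWS", alice, bob)
Relationship("KNOWS", alice, carol)
Relationship("LIKES", bob, carol)
print(neighbours(alice))
print(neighbours(carol, "LIKES"))
```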

Improved Cost-based Optimizer


The cost-based query optimizer, introduced for read-only queries in Neo4j 2.2, has evolved in Neo4j 3.0 and now handles write queries as well as reads. In addition, the new parallel indexes capability enables faster population of indexes.
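A cost-based optimizer enumerates candidate execution plans, estimates the cost of each from statistics about the data, and picks the cheapest. The Python sketch below is a deliberately simplified illustration of that idea. It compares scanning every node with a label against seeking through an index on a property. The operator names are borrowed from Cypher query plans, but the statistics and the cost formula are assumptions and are not Cypher's actual cost model.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    label_count: int    # nodes carrying the label
    selectivity: float  # fraction of those nodes matching the predicate
    has_index: bool     # is there an index on (label, property)?


def choose_plan(stats, index_seek_overhead=10.0):
    """Return (plan_name, estimated_cost) for the cheapest candidate plan."""
    # A label scan touches every labelled node, then filters.
    candidates = {"NodeByLabelScan + Filter": float(stats.label_count)}
    if stats.has_index:
        # An index seek touches roughly the matching rows plus a fixed overhead.
        estimated_rows = stats.label_count * stats.selectivity
        candidates["NodeIndexSeek"] = estimated_rows + index_seek_overhead
    return min(candidates.items(), key=lambda item: item[1])


print(choose_plan(Stats(label_count=1_000_000, selectivity=0.0001, has_index=True)))
print(choose_plan(Stats(label_count=5, selectivity=1.0, has_index=True)))
```

The same query shape can get different plans depending on the data. That dependence is the point of costing plans instead of applying fixed rules.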

The Cypher Engine in Neo4j 3.0


In the same realm, here are a few other Cypher improvements in Neo4j 3.0:
    • The keywords ENDS WITH and CONTAINS are now index-backed
    • Global aggregation improvements for activities like counting nodes by label
    • Value joins speed up queries that match nodes on property values where no relationship links them
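A value join matches rows from two parts of a query on equal property values instead of following a relationship. A hash join is the textbook way to do this efficiently. The Python sketch below shows the technique on hypothetical in-memory rows. It illustrates the general idea and does not describe Cypher's internal operator.

```python
from collections import defaultdict


def value_hash_join(left, right, left_key, right_key):
    """Join two lists of dicts where left[left_key] == right[right_key].

    Build phase: index the right-hand rows by join value (one pass).
    Probe phase: look each left row up in that table (one pass).
    Work is O(len(left) + len(right) + matches) instead of comparing
    every pair of rows.
    """
    table = defaultdict(list)
    for row in right:
        table[row[right_key]].append(row)
    return [
        (l_row, r_row)
        for l_row in left
        for r_row in table.get(l_row[left_key], [])
    ]


people = [
    {"name": "Ann", "employer": "Acme"},
    {"name": "Ben", "employer": "Initech"},
    {"name": "Cy", "employer": "Acme"},
]
companies = [{"name": "Acme", "city": "Stockholm"}, {"name": "Globex", "city": "Oslo"}]

for person, company in value_hash_join(people, companies, "employer", "name"):
    print(person["name"], "works at", company["name"], "in", company["city"])
```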

2. Developer Productivity Brought to New Heights


Official Language Drivers & Bolt: Neo4j’s New Binary Protocol


One of the things we’re excited to deliver as part of Neo4j 3.0 is official drivers for the most popular programming languages. These will make development easier and more productive, and enable an entirely new range of language-specific learning resources. (More below.)

In order to bring you the best and most usable drivers, we carefully examined the way that applications connect to Neo4j. The result is not just drivers, but an end-to-end infrastructure stack that starts with new drivers, a new binary protocol (which under the hood includes a new multi-versioned client API and new serialization format) and a uniform type system.

New Language Drivers and Bolt Binary Protocol in Neo4j 3.0


Let’s start with Bolt: a new connection-oriented protocol for accessing the graph.

Bolt uses a compact binary encoding over TCP or WebSockets for higher throughput and lower latency. Lightweight and efficient, Bolt uses a new serialization format called PackStream and also introduces a new uniform graph type system that bridges storage, transport, Cypher, drivers and apps. Bolt has TLS security built in and enabled by default, as we care about ensuring that your data is protected.
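To show what a compact binary encoding means in practice, here is a minimal Python sketch of a PackStream-style encoder for a subset of value types. The marker bytes follow our reading of the PackStream v1 specification used by Bolt v1. Structures, which carry nodes, relationships and paths, are omitted, as is Bolt's message framing. The function names are our own, so check the official Bolt documentation before relying on this.

```python
import struct


def pack(value):
    """Encode a Python value using a subset of PackStream v1 markers."""
    if value is None:
        return b"\xC0"
    if value is True:  # bools are checked before ints, since bool subclasses int
        return b"\xC3"
    if value is False:
        return b"\xC2"
    if isinstance(value, int):
        if -16 <= value <= 127:
            return struct.pack(">b", value)            # TINY_INT: the byte is the value
        if -128 <= value <= 127:
            return b"\xC8" + struct.pack(">b", value)  # INT_8
        if -2**15 <= value < 2**15:
            return b"\xC9" + struct.pack(">h", value)  # INT_16
        if -2**31 <= value < 2**31:
            return b"\xCA" + struct.pack(">i", value)  # INT_32
        return b"\xCB" + struct.pack(">q", value)      # INT_64 (struct.error if too big)
    if isinstance(value, float):
        return b"\xC1" + struct.pack(">d", value)      # 64-bit IEEE 754, big-endian
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _header(len(data), 0x80, 0xD0) + data
    if isinstance(value, list):
        return _header(len(value), 0x90, 0xD4) + b"".join(pack(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(pack(k) + pack(v) for k, v in value.items())
        return _header(len(value), 0xA0, 0xD8) + body
    raise TypeError(f"cannot pack {type(value).__name__}")


def _header(size, tiny_marker, marker_8):
    """Size header: tiny form below 16, else an 8-, 16- or 32-bit length."""
    if size < 16:
        return bytes([tiny_marker + size])
    if size < 2**8:
        return bytes([marker_8]) + struct.pack(">B", size)
    if size < 2**16:
        return bytes([marker_8 + 1]) + struct.pack(">H", size)
    return bytes([marker_8 + 2]) + struct.pack(">I", size)


print(pack({"name": "Alice", "age": 33}).hex())
```

Small values cost very little in this format. The map {"name": "Alice", "age": 33} packs into 17 bytes, while the same map as compact JSON text takes 25.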

For everyone building Neo4j applications, Bolt will improve both the performance of the graph database and the developer experience.

To complement Bolt, we’re releasing official language drivers which encapsulate the protocol, freeing contributors to focus on unique language and framework integration. The new official language drivers include:
    • JavaScript
    • Python
    • Java
    • .NET
The new official language drivers are also pluggable into richer programming frameworks. For example, Spring Data Neo4j v4.1 uses the Bolt-based object-to-graph mapper.

Let’s look at some example code for each of the official language drivers using Bolt:

With JavaScript:

var neo4j = require("neo4j-driver").v1;
var driver = neo4j.driver("bolt://localhost");
var session = driver.session();
var result = session.run("MATCH (u:User) RETURN u.name");

With Python:

from neo4j.v1 import GraphDatabase
driver = GraphDatabase.driver("bolt://localhost")
session = driver.session()
result = session.run("MATCH (u:User) RETURN u.name")
session.close()

With Java:

import org.neo4j.driver.v1.*;
Driver driver = GraphDatabase.driver( "bolt://localhost" );
try ( Session session = driver.session() ) {
   StatementResult result = session.run("MATCH (u:User) RETURN u.name");
}
driver.close();

With .NET:

using Neo4j.Driver.V1;
using (var driver = GraphDatabase.Driver("bolt://localhost"))
using (var session = driver.Session())
{
   var result = session.Run("MATCH (u:User) RETURN u.name");
}

For all the detail about coding with the official drivers, see the Neo4j Developer Manual.

Java Stored Procedures


Java Stored Procedures in Neo4j 3.0


Another powerful new facility of Neo4j 3.0 is Java Stored Procedures. These provide direct, low-level access to the graph, giving you a way to run imperative code when you want to do more complex work inside of the database.

Java Stored Procedures can be called on their own. They can also take parameters, return results, and even be mixed with Cypher to blend the best of declarative and imperative coding.

You can create Java Stored Procedures that are:
    • Written in any JVM language
    • Stored in Neo4j
    • Accessed by applications over Bolt
    • Executed on the server
The possibilities are endless: loading data from external sources, running graph algorithms, reverse engineering the meta graph or even simply creating UUIDs.

Neo4j comes with a set of built-in procedures, and the community was quick to step up and provide a collection of useful procedures as part of the APOC project, where 99 procedures are available as of today’s launch.

Check out this example, which blends a Java Stored Procedure to load data using JDBC, then applies the rows of the result to a Cypher MERGE clause to create data.

A JDBC and Cypher Example of a Java Stored Procedure in Neo4j 3.0
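The Python sketch below is a rough, runnable analogy of that pattern. It is not the procedure from the example and not Neo4j's API. The standard-library sqlite3 module stands in for JDBC and streams rows from a relational table. Each row is then applied with MERGE semantics: find the node with the given label and key, or create it if it does not exist. The table, label and function names are hypothetical.

```python
import sqlite3


def load_rows(conn, sql):
    """Stream rows from a relational source, like a procedure YIELDing rows."""
    cursor = conn.execute(sql)
    columns = [c[0] for c in cursor.description]
    for values in cursor:
        yield dict(zip(columns, values))


def merge_node(graph, label, key, props):
    """MERGE-like upsert, roughly MERGE (n:Label {key: ...}) ON CREATE SET n += props.

    `graph` is a plain dict standing in for the database.
    Returns True if a node was created, False if an existing one matched.
    """
    identity = (label, props[key])
    if identity in graph:
        return False
    graph[identity] = dict(props)
    return True


# A throwaway in-memory table standing in for the external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("p1", "Kettle"), ("p2", "Toaster"), ("p1", "Kettle")],  # note the duplicate
)

graph = {}
created = [
    merge_node(graph, "Product", "sku", row)
    for row in load_rows(conn, "SELECT sku, name FROM products ORDER BY rowid")
]
print(created)
print(len(graph))
```

Because MERGE matches before it creates, the duplicate source row does not produce a second node. Re-running the whole load is likewise harmless.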




Neo4j Browser Sync


Neo4j Browser Sync Available in Neo4j 3.0


Neo4j Browser Sync (or just “Browser Sync” for short) is a companion cloud service for Neo4j Browser. Browser Sync allows you to synchronize saved scripts and graph style sheets as well as preserve client-side work across connections. This means that your scripts and settings will be available to you as you move from database to database, machine to machine and web browser to web browser.

Browser Sync gives you quick and easy access to your favorite and most commonly used Cypher queries. In fact, you can store an unlimited number of queries with Browser Sync.

Currently registered users can opt in to Neo4j Browser Sync, and new users can connect with existing online credentials (e.g., GitHub, Google, or Twitter ID).

3. Deploy Neo4j 3.0 Applications Anywhere


Deployable in the Cloud, Containers and On Premise


Cloud, Container and On-Premise Deployment Options in Neo4j 3.0


We took advantage of a major release to make some changes in the way Neo4j is operated and configured, specifically in the areas of the Neo4j directory structure, configuration and logging. These changes incorporate lessons from years of operating Neo4j across many different operating environments, while also accounting for the new deployment infrastructures of today’s development world.

The new file, config and log structures in Neo4j 3.0 are designed to streamline operations and to bring Neo4j better into line with operational IT expectations. One notable change is the move from multiple config files to a single namespaced file. A config migration utility has been supplied to make your transition as seamless as possible.

The result will allow you to more easily run Neo4j on premise and across a wide variety of modern deployment scenarios, allowing you to deploy Neo4j virtually anywhere. Major changes include:
    • Config settings were cleaned up and renamed to conform to a more consistent, hierarchical namespace structure
    • Log files are now aggregated into a single location on disk (which can be mounted on a separate, write-optimized partition)
    • Added support to hold multiple databases in the /database folder and mount them by name
The improved database operations make it easier than ever to deploy Neo4j 3.0 on premise, in virtualized and containerized environments or in the cloud on the IaaS or PaaS of your choice. Look to the new Neo4j Operations Manual to learn all the details.
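A single, hierarchically namespaced configuration file is easy to process with simple tooling. The Python sketch below reads key=value lines in the style of neo4j.conf and folds the dotted names into nested sections. The parser is our own, and the sample keys are illustrative only, so check the Operations Manual for the exact setting names in your version.

```python
def parse_namespaced_config(text):
    """Parse key=value lines and nest them by their dotted namespace."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        *sections, leaf = key.strip().split(".")
        node = config
        for section in sections:
            node = node.setdefault(section, {})
        node[leaf] = value.strip()
    return config


SAMPLE = """
# Illustrative settings in the style of neo4j.conf
dbms.active_database=graph.db
dbms.directories.data=data
dbms.security.auth_enabled=true
dbms.connector.bolt.enabled=true
"""

config = parse_namespaced_config(SAMPLE)
print(config["dbms"]["security"])
```

Because every setting sits under a predictable prefix, related options (all security settings, all connector settings) can be found, grouped and validated together.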

Conclusion


We’re happy to bring you Neo4j 3.0 today as the next major evolution in graph database technology. It’s a great beginning to a new 3.x series, which bears the hallmark of even more and better things to come.

As we get to work on Neo4j 3.1, we hope this landmark release will help you to push the boundaries of what you believed to be possible with graph databases.

Speaking for the entire Neo4j team, we hope you will enjoy this release, and wish you very happy graphing with the new Neo4j 3.0.

Philip Rathle



The post Official Release: 3 Essentials of Neo4j 3.0, from Scale to Productivity & Deployment appeared first on Neo4j Graph Database.
