Published on January 31, 2023

Three common types of databases are relational, NoSQL and graph. Examples of relational databases are Oracle, DB2, MySQL and SQL Server. These are typically used for transactional systems. They are highly normalized, so inserting or updating a single record is quick. They are also good at joining records from different tables to project the desired dataset.

Then come NoSQL databases such as MongoDB, Cassandra, ElastiCache and Redis. These are highly denormalized: all the information about an object is kept in one document or record, and they are not designed for joins. There is also a wide variety of NoSQL databases, each built for particular scenarios, so the choice needs careful consideration.
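As a rough illustration of the difference, here is the same customer stored both ways in plain Python. Lists of dictionaries stand in for relational tables and a single dictionary stands in for a document; the names and numbers are made up. The relational form needs a join at read time, while the document form already holds everything.

```python
# Hypothetical sample data, not taken from a real system.

# Normalized (relational style): customers and accounts live in separate tables.
customers = [{"customer_id": 1000001, "name": "Alice"}]
accounts = [
    {"account_num": 100001, "customer_id": 1000001},
    {"account_num": 100002, "customer_id": 1000001},
]


def join_customer_accounts(customer_id):
    """Relational style: combine the two tables at read time (a join)."""
    customer = next(c for c in customers if c["customer_id"] == customer_id)
    owned = [a["account_num"] for a in accounts if a["customer_id"] == customer_id]
    return {**customer, "accounts": owned}


# Denormalized (document style): everything about the customer in one record.
customer_document = {
    "customer_id": 1000001,
    "name": "Alice",
    "accounts": [100001, 100002],
}

print(join_customer_accounts(1000001) == customer_document)  # True
```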

A graph database is made of nodes and edges. A node typically represents a record, such as a customer or an account, and an edge represents the relationship between two nodes. Because relationships are stored as edges ahead of time rather than computed at query time with joins, traversing the graph is fast. This makes a graph database a good choice for recommendation engines, social networking and fraud detection.
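A plain-Python analogy of why stored edges make traversal cheap (this is not how Neo4j lays out data internally, and the account ids are hypothetical): once each node keeps its own list of neighbours, finding them touches only that node's edges, instead of scanning every edge the way an unindexed join would.

```python
from collections import defaultdict

# Hypothetical transfers between accounts, as rows of (source, target)
edges = [("acc1", "acc2"), ("acc1", "acc3"), ("acc2", "acc3"), ("acc3", "acc4")]


def neighbours_by_scan(edge_list, node):
    """Join-style lookup: every call scans the whole edge list."""
    return [target for source, target in edge_list if source == node]


def build_adjacency(edge_list):
    """Store each node's outgoing neighbours once, ahead of time."""
    adjacency = defaultdict(list)
    for source, target in edge_list:
        adjacency[source].append(target)
    return adjacency


adjacency = build_adjacency(edges)

# Both return the same neighbours; the adjacency lookup only touches acc1's own edges.
print(neighbours_by_scan(edges, "acc1"))  # ['acc2', 'acc3']
print(adjacency["acc1"])  # ['acc2', 'acc3']
```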

USE CASE:

Anti-money laundering (AML):

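Suppose I have a set of accounts and need to find out how they operate, for example, whether they are being used for layering (moving money through a series of transactions between accounts to hide where it came from).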

Consider each node to be an account.

Every account is associated with a customer.

An account is also connected to another account whenever there is a transaction between them, so each edge represents a transaction and can carry the transaction amount and its date and time as properties.
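A minimal sketch of storing one transaction as an edge with the official Python driver. The TRANSFERRED relationship type and the amount and at property names are assumptions for illustration; the mock-data script later in this post only creates customer and account nodes. The function takes an open session (for example from driver.session()) so it can be called inside a loop.

```python
from datetime import datetime

# Hypothetical relationship type and property names (TRANSFERRED, amount, at).
RECORD_TRANSFER = """
MATCH (src:account {account_num: $src}), (dst:account {account_num: $dst})
CREATE (src)-[:TRANSFERRED {amount: $amount, at: datetime($ts)}]->(dst)
"""


def record_transfer(session, src_account, dst_account, amount, timestamp):
    """Store one transaction as an edge between two existing account nodes.

    CREATE (not MERGE) is used because two transfers of the same amount
    between the same accounts are still separate events.
    """
    return session.run(
        RECORD_TRANSFER,
        src=src_account,
        dst=dst_account,
        amount=amount,
        ts=timestamp.isoformat(),  # Cypher's datetime() parses ISO 8601 strings
    )
```

Usage: record_transfer(session, 100001, 100002, 950.0, datetime(2023, 1, 31, 10, 15)).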

An account that sends more than x transactions to the same account, all with roughly the same amount, is a sign of possible fraudulent activity.
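The rule can be sketched in plain Python before moving it into the database. Here "roughly the same amount" is assumed to mean within 10% of the median amount for that pair of accounts; both that tolerance and the function name flag_repeated_transfers are hypothetical choices, not a standard AML threshold.

```python
from collections import defaultdict
from statistics import median


def flag_repeated_transfers(transactions, x, tolerance=0.1):
    """Return (source, target) pairs with more than x similar-sized transfers.

    transactions: iterable of (source_account, target_account, amount) tuples.
    tolerance: assumed threshold; an amount within this fraction of the
    pair's median amount counts as "roughly the same".
    """
    amounts_by_pair = defaultdict(list)
    for source, target, amount in transactions:
        amounts_by_pair[(source, target)].append(amount)

    flagged = []
    for pair, amounts in amounts_by_pair.items():
        mid = median(amounts)
        similar = [a for a in amounts if abs(a - mid) <= tolerance * mid]
        if len(similar) > x:
            flagged.append(pair)
    return sorted(flagged)


txns = [
    (100001, 100002, 950.0),
    (100001, 100002, 980.0),
    (100001, 100002, 990.0),
    (100001, 100002, 960.0),
    (100001, 100003, 5000.0),
    (100003, 100004, 20.0),
]
print(flag_repeated_transfers(txns, x=3))  # [(100001, 100002)]
```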

Since the edges are stored ahead of time, a graph database can traverse them quickly, i.e. follow the accounts that are related through transactions.
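To show what following these edges looks like when tracing layering, here is a small depth-first search that lists chains of transfers leaving one account. The function name, the hop limits and the sample accounts are hypothetical. It skips accounts already on the current chain, so cycles do not loop forever.

```python
from collections import defaultdict


def find_layering_paths(transfers, start, max_hops, min_hops=2):
    """List simple chains of transfers leaving `start`.

    transfers: iterable of (source_account, target_account) pairs.
    Returns every path with between min_hops and max_hops edges in which
    no account appears twice.
    """
    graph = defaultdict(set)
    for source, target in transfers:
        graph[source].add(target)  # repeated transfers collapse into one edge

    paths = []

    def walk(path):
        hops = len(path) - 1
        if hops >= min_hops:
            paths.append(list(path))
        if hops == max_hops:
            return
        for nxt in sorted(graph[path[-1]]):  # sorted for a stable order
            if nxt not in path:  # skip accounts already on this chain
                path.append(nxt)
                walk(path)
                path.pop()

    walk([start])
    return paths


transfers = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("D", "A")]
print(find_layering_paths(transfers, "A", max_hops=3))
# [['A', 'B', 'C'], ['A', 'B', 'C', 'D']]
```

In Neo4j, a variable-length pattern such as (a:account)-[:TRANSFERRED*2..4]->(b) expresses a similar traversal inside the database (TRANSFERRED again being an assumed relationship type). Cypher's default is to avoid repeating relationships rather than nodes, so its results can differ slightly from this sketch.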

1) Installation

At first I tried to install Neo4j (Community Edition) on Windows, following the instructions on the home page. When I ran it, the result looked nothing like what the guides on the internet showed. Most bloggers suggested that setting JAVA_HOME in the environment variables to a Java 17 installation would make it work, but this did not work for me.

So, I decided to use the Docker image. This worked fine.

1) Install the exe on Windows:

Follow the instructions on –

This has 2 main steps.

1) Install Java 17 or above and set the JAVA_HOME environment variable to point to it.

2) Download Neo4j and run the commands given on the page above.

** This does not work well

2) Docker

** This works

1) In a Windows folder (referred to below as <windows_path>), create the following directories:

mkdir data
mkdir logs
mkdir conf
mkdir plugins

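2) In the conf directory, create or edit neo4j.conf so that the algo.* and apoc.* procedures are allowed:

dbms.security.procedures.allowlist=algo.*,apoc.*
dbms.security.procedures.unrestricted=algo.*,apoc.*

(Neo4j 4.2 and later name the first setting allowlist; older guides call it whitelist.)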

3) Run the Docker image (the command below pins the 5.3.0 Community edition and uses Windows Command Prompt syntax, where ^ continues the line)

docker run -d --publish=7474:7474 ^
--publish=7687:7687 ^
--volume=<windows_path>\data:/data ^
--volume=<windows_path>\logs:/logs ^
--volume=<windows_path>\conf:/conf ^
--volume=<windows_path>\plugins:/plugins ^
--env=NEO4J_AUTH=none ^
--name my_neo4j neo4j:5.3.0-community

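2) Neo4j UI

Open the following URL in a web browser:

http://localhost:7474/browser/

username : neo4j
password : neo4j

These are the default credentials when authentication is enabled, and Neo4j asks you to change the password on first login. Because the container above was started with NEO4J_AUTH=none, authentication is disabled, so no credentials are needed.

This opens the Neo4j Browser UI.

From here you can run Cypher queries and view nodes, their related nodes and the edges between them graphically.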

3) Python Client

There are two drivers available for Python:

1) py2neo

2) the official Neo4j Python driver (installed with pip install neo4j; older releases were published as neo4j-driver)

Below is the advice from the py2neo handbook on when to use which of the two libraries.

When considering whether to use py2neo or the official Python Driver for Neo4j, there is a trade-off to be made.
Py2neo offers a larger surface, with both a higher level API and an OGM,
but the official driver provides mechanisms to work with clusters, such as automatic retries.
If you are new to Neo4j, need an OGM, do not want to learn Cypher immediately, or require data science integrations, py2neo may be the better choice.
If you are building a high-availability Enterprise application, or are using a cluster, you likely need the official driver.

4) Create mock data

The code below creates between 1,000 and 2,000 mock customers, each with one to four accounts linked by a HAS_ACCOUNT edge. More properties can be added to each node in the same way.

Also, when creating mock data in a batch, it is a good idea to create each node, its related nodes and the edges (relationships) in a single statement, and to repeat this for each node.

import random

from neo4j import GraphDatabase

# One statement per customer: the customer node, its account nodes and the
# HAS_ACCOUNT edges are created together. UNWIND gives each account its own
# number; reusing a single $account_num parameter would give every account
# the same number.
CREATE_CUSTOMER = """
MERGE (cus:customer {customer_id: $cus})
WITH cus
UNWIND $account_nums AS num
MERGE (acc:account {account_num: num})
MERGE (cus)-[:HAS_ACCOUNT]->(acc)
"""


def generate_data():
    uri = "bolt://localhost:7687"
    user_name = "neo4j"
    password = "test"  # use the credentials that match your NEO4J_AUTH setting

    n = random.randint(1000, 2000)  # number of mock customers
    account_num = 100000
    # Connect to the neo4j database server
    with GraphDatabase.driver(uri, auth=(user_name, password)) as graphDB_driver:
        with graphDB_driver.session() as graphDB_session:
            for cus in range(1000000, 1000000 + n):
                # Each customer gets between 1 and 4 new, consecutive account numbers
                num_acc_link = random.randint(1, 4)
                account_nums = list(range(account_num + 1, account_num + 1 + num_acc_link))
                account_num += num_acc_link
                graphDB_session.run(CREATE_CUSTOMER, cus=cus, account_nums=account_nums)


if __name__ == "__main__":
    generate_data()
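The generator above makes one round trip to the server per customer. A variation that sends many customers per statement, using UNWIND over a list of parameter rows, is sketched below. build_rows, chunks and the batch size are hypothetical names and values, not part of the original script.

```python
import random

# Each row carries one customer and its accounts; UNWIND expands the rows
# on the server, so one statement writes a whole batch of customers.
BATCH_QUERY = """
UNWIND $rows AS row
MERGE (cus:customer {customer_id: row.cus})
WITH cus, row
UNWIND row.account_nums AS num
MERGE (acc:account {account_num: num})
MERGE (cus)-[:HAS_ACCOUNT]->(acc)
"""


def build_rows(first_customer, n_customers, first_account, rng):
    """Build one parameter row per customer with 1-4 consecutive account numbers."""
    rows = []
    next_account = first_account
    for cus in range(first_customer, first_customer + n_customers):
        count = rng.randint(1, 4)
        rows.append({"cus": cus, "account_nums": list(range(next_account, next_account + count))})
        next_account += count
    return rows


def chunks(rows, size):
    """Split rows into batches of at most `size`, one transaction per batch."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


rows = build_rows(1000000, 1500, 100001, random.Random(42))
```

With a session from the driver, each batch is then written with session.run(BATCH_QUERY, rows=batch) inside a loop over chunks(rows, 500). Keeping batches to a few hundred rows (an arbitrary choice here) avoids one very large transaction.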

5) Query the GraphDB

Below is how to query a graph database from Python.

The query specifies the starting node and the type of edge to follow from it.

from neo4j import GraphDatabase


def find_related_accounts(cus_id):
    uri = "bolt://localhost:7687"
    user_name = "neo4j"
    password = "test"  # use the credentials that match your NEO4J_AUTH setting

    # Follow HAS_ACCOUNT edges from the given customer to its accounts
    query = (
        "MATCH (a:customer)-[:HAS_ACCOUNT]->(account) "
        "WHERE a.customer_id = $cus_id "
        "RETURN account.account_num ORDER BY account.account_num"
    )

    # Connect to the neo4j database server
    with GraphDatabase.driver(uri, auth=(user_name, password)) as graphDB_driver:
        with graphDB_driver.session() as graphDB_session:
            for record in graphDB_session.run(query, cus_id=cus_id):
                print(record["account.account_num"])


if __name__ == "__main__":
    cus_id = 1000002
    find_related_accounts(cus_id)
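The anti-money laundering rule from the use case can also be expressed as a Cypher query and run from Python. This sketch assumes transactions are stored as TRANSFERRED relationships with an amount property (a hypothetical model; the mock data above does not create them). It treats amounts as "roughly the same" when their standard deviation is at most a given fraction of their average, which is only one possible interpretation.

```python
# Hypothetical model: (:account)-[:TRANSFERRED {amount}]->(:account)
REPEATED_TRANSFERS_QUERY = """
MATCH (src:account)-[t:TRANSFERRED]->(dst:account)
WITH src, dst, count(t) AS n, avg(t.amount) AS avg_amount, stDev(t.amount) AS sd
WHERE n > $x AND sd <= $tolerance * avg_amount
RETURN src.account_num AS source, dst.account_num AS target, n
ORDER BY n DESC
"""


def find_repeated_transfers(session, x, tolerance=0.1):
    """Return (source, target, count) for pairs with more than x similar transfers."""
    result = session.run(REPEATED_TRANSFERS_QUERY, x=x, tolerance=tolerance)
    return [(r["source"], r["target"], r["n"]) for r in result]
```

Pass it a session from driver.session(), as in the query example above. Doing the grouping in Cypher means only the flagged account pairs travel back to Python.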