QRdvark.com

Replacing Mongodb With Pgsql JSON Data Type And PHP Tutorial

Replacing Mongodb With Pgsql JSON Data Type And PHP Tutorial

Contents

Abstract.

MongoDb has been the goto solution for many enterprises for a good reason. The

Many projects switched to MongoDB because it offered the promise of being able to store 'big data' in JSON like tables which it calls BSON.

Now, PgSQL has a better solution with the advent of the JSON type.

The JSON datatype provides all the simplicity of a NoSQL database such as MongoDB, and more.

Creating a JSON type field

Like any data type in PgSQL, the JSON type is simply assigned when a table is created. For the purposes of this tutorial, a movie storage table will be created with a JSON field to hold information about a movie, such as title, director etc.

CREATE TABLE movies ( 
        id SERIAL PRIMARY KEY NOT NULL,
        data JSON NOT NULL,
        rating INT NOT NULL
);

In the above code, the data field has been assigned the data type of JSON. This will be the field which holds the JSON data.

INSERT JSON

JSON data is a string of name:value pairs, and because JSON is a string representation of data, the same process is used as one would insert any string into a string type.


INSERT INTO movies (data, rating) VALUES (  '{ "name": "Galaxy Quest", "director": { "first_name": "Dean", "last_name": "Parisot" } }', 8);
INSERT INTO movies (data, rating) VALUES (  '{ "name": "Terminator", "director": { "first_name": "James", "last_name": "Cameron" } }', 8);
INSERT INTO movies (data, rating) VALUES (  '{ "name": "Kung Pow", "director": { "first_name": "Steve", "last_name": "Oedekerk" } }', 8);
-

Fetching Data

One of the problems with 'big data' and storing it in NO SQL databases is that the prevailing philosophy seems to be to shove it all into single, non-relational, field, and worry about getting it out later.

This philosophy has played out well for some, as it has given rise to a new breed of data mining people who have labelled this type of stupidity as 'Business Intelligence'.

With the PostGreSQL JSON type, fetching data is as simple as fetching data from any row. At the highest level, a SELECT query simply fetches all the data from the movies table;

 SELeCT * FROM movies;
 id |                                           data                                           | rating 
----+------------------------------------------------------------------------------------------+--------
  1 | { "name": "Galaxy Quest", "director": { "first_name": "Dean", "last_name": "Parisot" } } |      8
  2 | { "name": "Terminator", "director": { "first_name": "James", "last_name": "Cameron" } }  |      8
  3 | { "name": "Kung Pow", "director": { "first_name": "Steve", "last_name": "Oedekerk" } }   |      8

In the above output, the JSON data can easily be seen, but a finer grain of query is required for drilling down into the data.

Lets look at how to just the id and name fields from the movies table

SELECT id, data->>'name' AS name FROM movies;
 id |     name     
----+--------------
  1 | Galaxy Quest
  2 | Terminator
  3 | Kung Pow

By using the ->> operator, the name field is able to SELECTed as would any table field. But the first_name and last_name fields are nested within the JSON string, and could not be SELECTed in the same way. No problem. PostGreSQL comes with magic.

The -> operator returns the original JSON type (which might be an object), whereas ->> returns text. You can use the -> to return a nested object and thus chain the operators, so fetching the nested data for the director is as below.

SELECT id, data->'director'->>'first_name' as director_first_name FROM movies;
 id | director_first_name 
----+-------------------
  1 | Dean
  2 | James
  3 | Steve

In the above query, the first -> has returned an object, which is then chained to the ->> operator and the first_name of each director is exposed.

To concat the first_name and last_name fields is much the same as any concat.

SELECT id, (data->'director'->>'first_name') || ' ' || (data->'director'->>'last_name') as director_name FROM movies;
 id | director_name  
----+----------------
  1 | Dean Parisot
  2 | James Cameron
  3 | Steve Oedekerk

In the above query, the trick is to be wary of operator precedence, otherwise errors will occur.

Filtering Data

But, what if you wanted to see only about the Terminator movie? Just as simply as any WHERE clause with any SQL query.

SELECT * FROM movies WHERE data->>'name' = 'Terminator';
 id |                                          data                                           | rating 
----+-----------------------------------------------------------------------------------------+--------
  2 | { "name": "Terminator", "director": { "first_name": "James", "last_name": "Cameron" } } |      8

Possibly, the requirement is to see data about movies directed by 'Cameron', which lays nested within the JSON. Again, the PostGreSQL JSON type has the solution.

SELECT * FROM movies WHERE data->'director'->>'last_name' = 'Cameron';
 id |                                          data                                           | rating 
----+-----------------------------------------------------------------------------------------+--------
  2 | { "name": "Terminator", "director": { "first_name": "James", "last_name": "Cameron" } } |      8

                    

                  

The level of simplicity provided by PostGreSQL JSON data type exposes the 'Business Intelligence'.

Indexing Data

Indexes may be created on an JSON labels using the PostgreSQL expression indexes. The expression index will allow the addition of unique constraints even within nested JSON data.

To highlight this, lets place a unique constraint on the first_name field within the nested JSON.

CREATE UNIQUE INDEX movies_director_first_name_idx ON movies ((data->'director'->>'first_name'));
CREATE INDEX

No, lets try to INSERT a new movie, with a duplicate first_name.

INSERT INTO movies (data, rating) VALUES (  '{ "name": "Hunger", "director": { "first_name": "Steve", "last_name": "McQueen" } }', 8);
ERROR:  duplicate key value violates unique constraint "movies_director_first_name_idx"
DETAIL:  Key (((data -> 'director'::text) ->> 'first_name'::text))=(Steve) already exists.

The error explains why the query failed, when the unique constraint is violated.

Is PostgreSQL Better Than MongoDB

This is not really a one-to-one comparison. As this tutorial focuses on docstore type of storage, then PostgreSQL is a good solution. Your own data needs may vary, and you are encouraged to read more on MongoDb and PostgreSQL to find the right solution for your application.