SQL Select Analytics
Retrieves documents from the database and enables the selection of one or many documents or attributes from one or many collections
Select executes 100% in database and utilizes database indexes when available. Select also utilizes sparse bit map indexes and can outperform commercial analytical SQL Engines in some cases
The best way to learn how select handles documents and joins is to run queries in the mongo shell and analyze the output
This feature is only available from your Nodechef Cloud Search instance. None of the examples provided below will work on a mongodb database. To deploy a cloud search instance, login and navigate to the dashboard. From deployments > deploy cloud search, you can then spin up a cloud search server
Syntax
SELECT
[ ALL | DISTINCT ]
{ select_item [ [AS] output_name ] } [, ...]
FROM [ from_collection | ( SUB SELECT.. ) ]
WITH [ with_options ]
[ INNER | LEFT JOIN condition ]
[ WHERE condition ]
[ GROUP BY expression [, ...] [ HAVING condition [, ...] ] ]
[ ORDER BY { order_list [ ASC | DESC] } [, ...] ]
[ LIMIT { count [ OFFSET start ] | start [ , count ] }
Select command
db.runCommand({
select : "SELECT 'foo'"
})
Passing parameters
You can stringify the entire query without using parameters. Parameters are however more efficient in some cases such as querying against documents and binary types and so on
db.runCommand({
select : "SELECT * from collection where _id = ?param1",
params : {
"?param1" : ObjectId("507f1f77bcf86cd799439011")
}
})
With - Useful flags
db.runCommand({select : "select id, type, '$' + price as 'price', name, batters.batter.type FROM desserts
WITH (as_document) WHERE id = '0001'"})
db.desserts.find(
{ id : "0001" },
{ type : 1, price : 1, name : 1, "batters.batter.type" : 1 }
)
db.runCommand({select : "select id, name, topping FROM desserts WITH (unwind_arrays)"})
Array positional operators
The concept of array positional operators come from mongodb. Positional operators allow you to project a specific element in an array based on the query criteria specified.
Nodechef supports two positioanl operators which are the single position projector ($) and the multi position projector ($*)
Single position projector ($)
This projector allows you to project a single element from an array using the query predicate specified. The query returns the first element in the given array that matches the predicate.
Unlike mongodb, the Nodechef single position projector can project elements nested multiple levels in an array.
{ "_id" : 1
"name" : "Dave Gahan"
"medications" : [
{ "id" : 23,
"name" : "Dilaudid",
"type" : "Rx",
"prescriptions" : [
{ "id" : 77,
"quantity" : 45,
"started" : 2009-02-01
}
]
},
{ "id" : 41,
"name" : "Oxycodone",
"type" : "Rx",
"prescriptions" : [
{ "id" : 77,
"quantity" : 45,
"started" : 2009-02-01
}
]
}
]
}
SELECT _id, name, medications.prescriptions.$ as p FROM patients
WHERE medications.id = 23 AND medications.prescriptions.id = 77
{
"_id" : 1,
"name" : "Dave Gahan",
"p" : [
{ "id" : 77,
"quantity" : 45,
"started" : 2009-02-01
}
]
}
Multi position projector ($*)
This operator can be used to project multiple elements that match the query critera
{
"_id" : ObjectId("4d2d8deff4e6c1d71fc29a07"),
"user_id" : "714638ba-2e08-2168-2b99-00002f3d43c0",
"events" : [
{
"handled" : 1,
"profile" : 10,
"data" : { }
}
{
"handled" : 1,
"profile" : 10,
"data" : { }
}
{
"handled" : 0,
"profile" : 20,
"data" : { }
}
...
]
}
SELECT _id, events.$* as handled_events FROM devices WHERE events.handled = 1
{
"_id" : ObjectId("4d2d8deff4e6c1d71fc29a07"),
"handled_events" : [
{
"handled" : 1,
"profile" : 10,
"data" : { }
}
{
"handled" : 1,
"profile" : 10,
"data" : { }
}
]
}
SELECT _id, events.$*.profile as events_profile FROM devices WHERE events.handled = 1
{
"_id" : ObjectId("4d2d8deff4e6c1d71fc29a07"),
"events_profle" : [ 10, 10 ]
}
Array index projection
To project an element at a specific index in an array. You can specify the index of the element using dot notation. It is important to avoid using numbers as attribute names for arrays in your documents as this can possibly cause unexepcted behaviours when querying the array by index.
{
"_id" : ObjectId("4d2d8deff4e6c1d71fc29a07"),
"user_id" : "714638ba-2e08-2168-2b99-00002f3d43c0",
"events" : [
{
"handled" : 1,
"profile" : 10,
"data" : { }
}
{
"handled" : 1,
"profile" : 10,
"data" : { }
}
...
]
}
{
"_id" : ObjectId("4d2d8deff4e6c1d71fc29a07"),
"first_event" : [
{
"handled" : 1,
"profile" : 10,
"data" : { }
}
]
}
SELECT _id, events.0 as first_event FROM devices WHERE user_id = '714638ba-2e08-2168-2b99-00002f3d43c0'
Hierarchical joins
The arrayfromSelect and documentfromSelect functions shorthand "[]" and "{}" respectively can be used to perform hierarchical joins. These two functions accept a select statement as an input parameter.
These functions can then be nested multiple levels to output a rich JSON document with input from multiple tables.
Example
The below query joins two collections to the NewsFeed collection to retrieve
Comments associated with an entry in the newsFeed as well as the user details
associated with the comment. Note only 20 entries in the NewsFeed collection
are returned as well as only the first 10 comments per post are returned.
SELECT summary, post_date, _id,
[(
SELECT _id, text, {(SELECT _id, display_name FROM Users where _id = c.userid)} as user
FROM comments c WHERE postId = n.id LIMIT 10
)] as comments
FROM news_feed n
WHERE tags = 'tech' and postdate < '2015-10-02T01:11:18.965Z'
LIMIT 20
# The above query will output documents with the below structure.
{
"summary" : "...",
"post_date" : "...",
"_id" : "..",
"comments" : [
{
"_id" : "...",
"text" : "...",
"user" : {
"_id" : "...",
"display_name" : "..."
}
}
]
}
Select cursors
By default select returns a single batch of results, in some cases a cursor can be useful.
To create a cursor, use the enablecursor flag. Note this cursor returned is the same returned from any
mongodb aggregate query. The PHP driver supports this as a CommandCursor. However the node.js driver does
not directly support this. Since you might not be able to hack the driver directly, For this cursor to be
useful, MongoDB 3.2 introduces a new command getMore to make it easy to iterate over a command cursor like
aggregate and Nodechef select.
You can repeatedly call this function to exhaust all the results in this cursor.
By default SELECT returns 1024 documents when the limit clause is ommitted.
Example
{
getmore : <cursor_id>,
batchsize : <int>,
collection <string>,
maxTimeMS : <int>
}
db.runCommand({select : "select id, name, topping FROM desserts WITH (enablecursor)"})
{
"cursor" : {
"id" : NumberLong(0),
"ns" : "test.desserts",
"firstBatch" : [ ]
},
"ok" : 1
}
db.runCommand( { getmore : 2, batchsize : 2048, collection : "desserts" })
Multi Select command
Executes one or more select statements either sequentially or in parallel.
In some cases, you might want to query several collections in a way which is not possible with a join. You can avoid the cost of multiple round trips to the database by sending multiple select statements in a single commands. Each select statement can be executed in parallel on the database if required.
db.runCommand({
multiSelect : 1,
ordered : <1 | 0 | true | false>
parallelExec : <1 | 0 | true | false>
workers : <int>
statements: ["sql_statement", "sql_statement"]
})
- Ordered: Optional. If true, perform an ordered execution of the select statements in the array, and if an error occurs with one of the statements, the database will return without processing the remaining statements in the array. Only applies when parallelExec is false. Default value is false
- ParallelExec: Optional. Instructs the database to execute the select statements in parallel. Default value is false
- Workers: Optional. Total number of threads to use for the parallel execution. The database resets this parameter to the number of CPUs * 2 in cases where the input exceeds CPUs * 2.
- Statements: Required. One or more sql statements to be executed
db.runCommand({
multiSelect : 1,
parallelExec : 1,
statements : [
"select make, count(*) from listings_collection
group by make",
"select model, count(*) from listings_collection
group by model",
"select fuel_type, count(*) from listings_collection
group by fuel_type"
]
})
Case
# Simple CASE expression:
CASE input_expression
WHEN when_expression THEN result_expression [ ...n ]
[ ELSE else_result_expression ]
END
# Searched CASE expression:
CASE
WHEN Boolean_expression THEN result_expression [ ...n ]
[ ELSE else_result_expression ]
END
Operators
Use arithmetic + (Add) for string concatenation
Arithmetic
- + (Add)
- - (Subtract)
- * (Multiply)
- / (Divide)
- % (Modulo)
|
Bitwise
|
Comparison
- = (Equals)
- > (Greater than)
- < (Less than)
- >= (Greater than or Equals)
- <= (Less than or Equals)
- <> (Not eqauls)
- != (Not eqauls)
|
Logical
- ALL
- AND
- ANY
- BETWEEN
- EXISTS
- IN
- LIKE
- NOT
- OR
|
Aggregate functions
- int64Count([DISTINCT] expression)
- inputAvg([DISTINCT] expression)
- inputMin([DISTINCT] expression)
- inputMax([DISTINCT] expression)
- inputSum([DISTINCT] expression)
- inputStdev([DISTINCT] expression)
- inputStdevp([DISTINCT] expression)
- inputVar([DISTINCT] expression)
- inputVarp([DISTINCT] expression)
- inputFirst(expression)
- inputLast(expression)
- arrayPush(expression) Returns an unordered array
- arrayAddToSet(expression) Returns an array of ordered elements.
String functions
Math functions
Parameter number refers to any integer or floating point data type which includes int32, int64, float, double, decimal
input means the function returns a value of data type as that of the input parameter
- numberAbs(number)
- doubleAcos(number)
- doubleAsin(number)
- doubleAtan(number)
- doubleAtn2(double, double)
- inputCos(decimal / double)
- doubleCos(double)
- doubleCot(double)
- doubleDegrees(double)
- doubleExp(double)
- inputFloor(decimal / double)
- doubleLog(double, optional double)
- doubleLog10(double)
- doublePi()
- doublePower(double, double)
- doubleRadians(double)
- inputRound(decimal/double, optional int32)
- int32Sign(number)
- doubleSin(double)
- doubleSqrt(double)
- inputSquare(number)
- doubleTan(double)
Date functions
- datetimeNow()
- datetimeDateAdd(string date_part, int32 add_value, datetime)
date_part: day, hour, minute, second, millisecond, month, year
- doubleDateDiff(string date_part, datetime, datetime)
- int32DatePart(string date_part, datetime)
date_part: week, weekday, dayofyear, day, hour, minute, second, millisecond, month, year
- stringDateConvert(string format_str, datetime)
eg: DateConvert('{yyyy}-{mm}-{dd} {hh}:{mm}:{ss}.{ms}', now());
- hh - hour
- mm - minute
- ss - second
- ms - millisecond
- dd - day of month
- yyyy - year (1999)
- yy - year (99)
- MM - month (01 - 12)
- MS - month (Jan - Dec)
- doy - day of year
- ddd - day of week
- dddd - day of week string
Array functions
- arrayArrayFromSelect(sql_select)
SELECT post_tile, post_description, aggregate_likes_count,
ArrayFromSelect((
SELECT comment_ts, comment, u.user, u.profile_pic FROM comments c
INNER join users u on u._id = c._id
WHERE c.post_id = p._id)) as post_comments
FROM posts p
WHERE post_id = 20
If the select statement in the ArrayFromSelect returns multiple columns, this function
will return an array of sub documents. However, if the select statement returned only a single column,
the select statement will return an array with the value of the column from each row returned. A sub document is not created in this case.
This function can be used to perform hierarchical joins.
The select statement inside the ArrayFromSelect must be surrounded with brackets else a syntax error will be thrown.
- inputGetElement(element_index, array)
- int32Arraylength(array)
- arrayArrayslice(array, int32 offset, optional int32 length)
- arrayArrayMultiIntersect(array, array, ...)
- arrayArrayMultiUnion(array, array, ...)
- boolArrayIsSubset(array total_set, array sub_set)
- arrayArrayRelativeComplement(array seta, array setb)
Cast functions
- inputIsNull(check_expression, return_expression)
- int32ToInt32(expression)
- int64ToInt64(expression)
- floatToFloat(expression)
- doubleToDouble(expression)
- decimalToDecimal(expression)
Decimal values must be converted to string first before it can be sent across the wire. BSON documents lack support for decimals.
- DatetimeToDatetime(expression)
- StringToString(expression)
- floatGeoPointX(geopoint_expression)
If the parameter is an array, the first element is treated as Y (Longitude) and the second element is treated as X (Latitude).
If the parameter is a BSON document, the "lng" attribute represents longitude (Y) and "lat" represents latitude (X).
- floatGeoPointY(geopoint_expression)
Geo search functions
Geo search functions can be used in combination with any other group of functions such as text, math etc. Nodechef uses spherical geometry for all operations except when the geojson object specified is a bounding box
- boolnearSphere([field], { $geometry : {}, $minDistance, $maxDistance })
SELECT * FROM restaurants WHERE nearSphere([location], {
$geometry : { type : "point", coordinates : [ -73.9667, 40.78 ] },
$minDistance: 1000,
$maxDistance: 5000
}) AND menu_tags IN ('poutine', 'rotisserie chicken')
Reference the
mongodb docs for a complete guide on how the nearSphere function works
The $geometry specified must be a valid GeoJson object. if the $geometry specified is a point, the function behaves exactly like the mongodb $nearSphere operator.
Unlike in mongodb, the NodeChef $nearSphere accepts other valid GeoJson objects.
- boolgeoWithin([field], { $geometry : { <geojson_geometry> } })
SELECT * FROM restaurants WHERE geoWithin([location], { $geometry : {
type : "Polygon",
coordinates: [ [ [ 0, 0 ], [ 3, 6 ], [ 6, 1 ], [ 0, 0 ] ] ]
}})
- boolgeoWithin([field], { $polygon : [] })
- boolgeoWithin([field], { $box : [] })
- boolgeoWithin([field], { $centerSphere: [ [ <x>, <y> ], <radius> ] })
- boolgeoIntersects([field], { $geometry : { <geojson_geometry> } })
SELECT * FROM restaurants WHERE geoIntersects([location], { $geometry : {
type: "Polygon" ,
coordinates: [ [ [ 0, 0 ], [ 3, 6 ], [ 6, 1 ], [ 0, 0 ] ] ]
}})
Text functions
- booltextSearch({ <valid $textx document> }, [field])
SELECT s.address, s.storeName, p.price, p.title, p.details FROM products p
INNER join seller s on s._id = p.seller_id
WHERE textsearch({ phrase : "civilization or barba", gap : 2, fuzzy : "T2_2" }, [title])
AND in_stock = 1 AND price <= 200
For a complete reference on a $textx document, reference the search guide found
here.
You can use the textsearch function in combination with other groups of functions such as geo search, math etc.
Unfortunately you cannot perform snipettiing using SQL Select. One must use the JSON style query from the search guide.