Logflare is Awesome.
If you use Cloudflare for your site and don't pay for Enterprise, server logs can become a problem if you rely on edge caching, because requests served from the edge cache never reach your origin server and so never appear in its logs.
Enter Logflare. This awesome service comes with an easily installable Cloudflare app and streams access data to a BigQuery backend.
The guides on the site will do a far better job than I can of explaining how to set it up, so I recommend reading those.
Server Logs are Awesome too Though
You might already have an in-house solution, or use something like the brilliant Screaming Frog Log File Analyser. All that lovely data in BigQuery is great, but how do you use it without retooling?
That was the problem I faced on a project, so I quickly hacked up this Node.js script, which writes the data out in NCSA Extended format, as used by nginx. By default Logflare doesn't log content length, and it's not something I've needed, so the script simply writes 0 for that field on every line.
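For reference, each line the script writes looks something like this (the values below are made up for illustration, and the content length field is always 0 as noted above):

203.0.113.7 - - [14/Mar/2021:09:26:53 +0000] "GET /blog/some-post/ HTTP/1.1" 200 0 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"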
The Script
This assumes that you have Logflare enabled, working, and recording data to your BigQuery account.
You will need to add the BigQuery Node package by running npm install @google-cloud/bigquery at the command line.
You will also need to follow the instructions here: Authenticating as a service account, to create a service account and download the JSON key file.
Then edit the config details as you like; these are all at the top of the script.
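As a rough illustration, a filled-in config might look like the following (the file path, project ID and table name here are placeholders, not real values):

const options = {
  keyFilename: './my-service-account-key.json', // the JSON key file you downloaded
  projectId: 'my-gcp-project',                  // your Google Cloud project ID
};
const logName = 'access.log';
const table = '`my-gcp-project.my_dataset.cloudflare_logs`'; // your Logflare BigQuery table, backticked so the fully qualified name is valid in the query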
// require libs
// run npm install @google-cloud/bigquery
const { BigQuery } = require('@google-cloud/bigquery');
const fs = require('fs');
// BigQuery Config - see https://cloud.google.com/docs/authentication/production#create_service_account
const options = {
  keyFilename: '{path_to_key_file}',
  projectId: '{project_id}',
};
// the name of the logfile to create
const logName = 'access.log';
// your logflare BigQuery table
const table = "{table_name}";
// time period, uncomment / comment, or add your own
/*
---------
ALL TODAY
---------
*/
// const period = `TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY) AND CURRENT_TIMESTAMP()`;
/*
-------------
ALL YESTERDAY
-------------
*/
const period = `TIMESTAMP_TRUNC(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY), DAY) AND TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY)`;
/*
-------------
LAST 24 HOURS
-------------
*/
// const period = `TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()`;
/* end config */
const bigquery = new BigQuery(options);
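// query BigQuery for the configured time period and write each row out as an NCSA/combined format log line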
async function createLogfile() {
  const query = `SELECT
      FORMAT_TIMESTAMP('%d/%b/%Y:%H:%M:%S', t0.timestamp) AS formattedTimestamp,
      t0.id,
      f4.httpProtocol,
      f3.referer,
      t0.event_message
    FROM
      ${table} AS t0
      INNER JOIN UNNEST(t0.metadata) AS f1 ON TRUE
      INNER JOIN UNNEST(f1.request) AS f2 ON TRUE
      INNER JOIN UNNEST(f2.headers) AS f3 ON TRUE
      INNER JOIN UNNEST(f2.cf) AS f4 ON TRUE
    WHERE
      (t0.timestamp BETWEEN ${period})
    ORDER BY
      t0.timestamp ASC
  `;
  const options = {
    query: query,
    location: 'US',
  };
  const [job] = await bigquery.createQueryJob(options);
  console.log(`Job ${job.id} started.`);
  const [rows] = await job.getQueryResults();
  console.log('Rows:');
  let logContent = '';
  rows.forEach(row => {
    const parsed = row.event_message.split(' | ');
    let ref = `"-"`;
    if (row.referer) {
      ref = `"${row.referer}"`;
    }
    const formattedRow = `${parsed[2]} - - [${row.formattedTimestamp} +0000] "${parsed[0]} ${parsed[4]} ${row.httpProtocol}" ${parsed[1]} 0 ${ref} "${parsed[5]}"`;
    console.log(formattedRow);
    logContent += `${formattedRow}\n`;
  });
  fs.writeFileSync(logName, logContent, 'utf-8');
}
createLogfile();
View in GitHub Gist
Running this file using node {filename_you_save_as}.js will create a log file with the name specified.