Proposal - Web3 Analytics - Dune Analytics for off-chain data

Grant/Project Name:

Web3 Analytics

Proposer’s contact info:

Andy Jagoe (Contact Me)

Grant Category:

Tooling/Analytics

ELI5 Project Summary:

A decentralized and composable analytics platform for web3. Like Dune Analytics for off-chain data. Or a web3 version of Google Analytics.

Project Description:

Today, web3 projects have no good way to transparently collect and share usage and traction analytics with stakeholders when the activity happens off-chain. On-chain analytics are great, but they cover only the very bottom of the funnel, a tiny fraction of the user journey. They miss everything that leads up to a transaction and everything that happens afterwards. Alone, they are just not enough.

80% of the top 10,000 websites on the internet use Google Analytics, and the majority of web3 projects do too. But, as a web3 project, giving all your data to Google breaks your brand promise to users, goes against the ethos of decentralization, and misses the benefits of transparency and composability. Big Tech sells this data to the highest bidder for ad targeting, and users have no control over their data and no visibility into what is collected.

But a web3 project without analytics is flying blind. Without analytics, you have no data-driven product insights, only guesses. Trying to build a consumer product without analytics is like trying to fly a 747 without instruments: it’s very difficult and often ends badly. A competitor who has analytics will fly circles around you.

The goal of this project is to do for off-chain data what Ethereum + Dune Analytics has done for on-chain data.

How is this different?

All the analytics solutions today are centralized and closed by default. This means that only the app owner can see the dashboards and what data is being collected. Users do not own their data and cannot delete it unless the app owner allows them to. Also, because data is centralized, these solutions can be censored or shut down. Most importantly, data in today’s closed systems is not composable or verifiable, which inhibits innovation and creativity.

By contrast, the solution I’m proposing is a decentralized and composable public good, where all data is readable by anyone and censoring it or shutting it down is difficult. Projects can get critical product insights to improve user experience without breaking a user’s trust or compromising web3 values. Composability encourages innovation and enables permissionless analysis that is not possible today. And because it is open by default, it increases transparency around project usage and traction. It does for off-chain data what Dune Analytics has done for on-chain data.

Current Status:

I have received a small grant from Filecoin Foundation toward this project and have already built a proof-of-concept (alpha) using Ceramic DIDDataStore, TileLoader, and a custom Secp256k1 key DID resolver. The system consists of 5 components:

  • Front-End Instrumentation Tooling
  • Decentralized Data Network (Ceramic)
  • Smart Contract Registry
  • Indexer
  • Dashboard Builder / UI

Here is how the system works, with links to source code and the prototype.

Sample dashboard (live URL):

Sample query (live URL):

Purpose of this grant:

The purpose of this grant is to help me finish migrating the system to ComposeDB and to deliver the features below, which are needed for usability and parity with web2 alternatives:

  • Indexer improvements for production scalability
  • User forking of queries and dashboards (like Dune Analytics)
  • Adding geo tracking, geo queries, and map dashboard components
  • Adding engagement/time spent tracking and retention / cohort analysis queries and dashboard components
  • Automatic updating of queries and dashboards
  • Low balance smart contract monitoring and email notification

Relevant links:

Ceramic Ecosystem Value Proposition:

What is the problem statement this proposal hopes to solve for the Ceramic ecosystem?

  • From the Ceramic ecosystem perspective: how do you collect data and share product insights with all project stakeholders in a way that preserves user privacy and encourages permissionless innovation?
  • From the 3Box perspective: how do you accelerate developer adoption of Ceramic?

How does your proposal offer a value proposition solving the above problem?

  • Every application needs analytics, and open and composable analytics is better than closed and proprietary analytics. Web3 is a good niche to start with, but ultimately this project could make every developer a Ceramic customer.

Why will this solution be a source of growth for the Ceramic ecosystem?

  • Each application that instruments with Web3 Analytics generates a continuous river of data into Ceramic. More data means more potential fee income for the Ceramic ecosystem. More fee income enables growth and reinvestment by all participants into the Ceramic ecosystem.
  • The risk of not having an application like this on Ceramic is that someone creates a similar application for a competing ecosystem and that ecosystem is the beneficiary of the increased usage and fee income instead.

Funding requested (DAI/USDC):

$39,200

Milestones:

  • Milestone #1 (4 weeks):
    • Migrate to Composites
      • Create new composite for analytics data
      • Include location and time data in composite (not in current model)
      • Test and publish new composite
    • New data indexer features
      • Batch load event objects instead of loading individually with TileDocument.load()
      • Complete general migration to ComposeDB
    • New instrumentation features
      • Build location database service using MaxMind
      • Integrate location data option in tracking package
    • New sample SQL queries
      • Geo/location analysis
      • Retention / cohort analysis
    • New dashboard components
    • Integration testing and dev ops
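
The batch-loading item in this milestone can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the project's indexer code: `load_batch` is a hypothetical callable standing in for whatever multi-document query the indexer issues, and `batch_size` is an assumed parameter.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in ids:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def load_events(
    ids: Iterable[str],
    load_batch: Callable[[List[str]], Dict[str, dict]],
    batch_size: int = 100,
) -> Dict[str, dict]:
    """Load events with one request per batch instead of one per ID.

    `load_batch` is a hypothetical stand-in for a multi-document query;
    it takes a list of IDs and returns a mapping of ID to event.
    """
    events: Dict[str, dict] = {}
    for batch in chunked(ids, batch_size):
        events.update(load_batch(batch))
    return events
```

With 250 event IDs and a batch size of 100, this makes three requests instead of 250, which is the kind of saving the milestone is after.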

Future Roadmap:

  • Authenticity verification for tracking (i.e. how to prevent bad data / spam)
    • Modify the tracking library to sign the page and inputs and send the signature with each tracking payload.
    • The app owner signs and stores a canonical “true” copy of the page/site in Ceramic, and a new version is added with each deploy using a GitHub workflow.
    • Indexers can look up the signed and verified page at the time the payload was written and verify it is real user data.
  • Once ComposeDB is more mature and scalable, use GraphQL to go directly to ComposeDB for queries/dashboard instead of the intermediate step of using an indexer to load Ceramic data into an S3 Data Lake and using SQL via AWS Athena.
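
The authenticity check in the first roadmap item could look something like the sketch below. It is only an illustration under loud assumptions: it uses a plain SHA-256 digest in place of the signatures described above, and the payload fields (`timestamp`, `page_digest`) and the in-memory `versions` list are hypothetical stand-ins for data that would live in Ceramic.

```python
import hashlib
from bisect import bisect_right
from typing import List, Optional, Tuple


def page_digest(page_source: bytes) -> str:
    """Digest of a page; stands in for the signed copy of the page."""
    return hashlib.sha256(page_source).hexdigest()


def digest_at(versions: List[Tuple[int, str]], timestamp: int) -> Optional[str]:
    """Return the digest of the version that was live at `timestamp`.

    `versions` is a list of (deployed_at, digest) pairs sorted by
    deploy time. Returns None if nothing was deployed yet.
    """
    deploy_times = [deployed_at for deployed_at, _ in versions]
    idx = bisect_right(deploy_times, timestamp) - 1
    return versions[idx][1] if idx >= 0 else None


def is_authentic(payload: dict, versions: List[Tuple[int, str]]) -> bool:
    """Check a tracking payload against the canonical page version.

    The payload shape is hypothetical: it carries the time it was
    written and the digest of the page that produced it.
    """
    expected = digest_at(versions, payload["timestamp"])
    return expected is not None and payload["page_digest"] == expected
```

An indexer would call `is_authentic` for each payload before loading it, and drop or flag anything produced by a page that does not match the version deployed at that time.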

I accept the 3Box Labs Grants terms and conditions: Yes

I understand that I will be required to provide additional KYC information to the 3Box Labs to receive this grant: Yes

Hi @andyjagoe, thank you for your proposal! We will be in touch with an update once we have completed our initial review (1-2 weeks).

Congratulations @andyjagoe, I’m delighted to inform you that your grant proposal has been accepted! :tada:

We would like to award you a Ceramic Builders Grant.

We will follow up shortly with more details via email.

Thanks @0x_Sam ! Very excited to continue the project and work more with the 3Box Labs team!

Web3 Analytics is now fully migrated to ComposeDB

Recent progress includes:

  • Set up a Ceramic node running ComposeDB and configured to allow traffic via SSL. The node is available at https://ceramic.web3analytics.network
  • Migrated Ceramic implementation to ComposeDB:
    • Updated analytics plugin to use ComposeDB
    • Updated demo apps (#1, #2) to use new ComposeDB based analytics plugin
    • Video of demo apps running in debug mode and successful GraphQL console.log update responses from ComposeDB
  • Migrated our data indexer to ComposeDB. The indexer runs every hour, loading the latest data from Ceramic into an AWS data lake that the web app executes SQL queries against using AWS Athena.
  • Put all the pieces back together so all data flowing to https://web3analytics.network/ app uses ComposeDB

If anyone would like to instrument a web app or blog to try out Web3 Analytics, I’d welcome it! I’ve recently implemented a few features that make doing this much easier. Web3 Analytics currently runs on Goerli, so you need some GoerliETH to use it. Here are the new features:

ComposeDB Issue Encountered
I’m currently running Ceramic/ComposeDB in bundled mode (according to these instructions) and I’ve noticed that ComposeDB sometimes stops responding and returns 50X errors until it’s restarted. High on my list is moving to a proper production deployment so that I hopefully stop encountering these types of errors.

Next up
Set up a location database service and modify my composite to enable location data support. Then, update analytics instrumentation to populate location data correctly for web visits and enable it to flow into ComposeDB and the rest of the Web3Analytics data pipeline.

Update for this week

Next Up: Final Milestones

Update for this week

This week Web3 Analytics got a new home page and also working map charts! Here are some screenshots of a dashboard with the new working map components. You can check them out live here.

US Map:

World Map:

Milestone Update

The following milestones have now been completed:

Final Milestone

The last item to complete in this project is:

Project Update

Completed the final task: a cohort analysis widget built with chartjs-chart-matrix.

The SQL query to support this chart is available here. You can interact with the chart on a sample dashboard here.
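
For readers unfamiliar with cohort retention, the calculation behind a chart like this can be sketched in a few lines. This is a Python illustration of the general technique, not the project's SQL query; the input shape (pairs of user ID and visit date) is an assumption.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_retention(
    events: Iterable[Tuple[str, date]],
) -> Dict[date, Dict[int, float]]:
    """Map each cohort week to {weeks since first visit: share active}.

    A user's cohort is the week of their first recorded event.
    """
    events = list(events)

    # Find the first active week for each user.
    first_week: Dict[str, date] = {}
    for user, day in events:
        wk = week_start(day)
        if user not in first_week or wk < first_week[user]:
            first_week[user] = wk

    # Collect the distinct users active in each (cohort, week offset).
    active: Dict[Tuple[date, int], Set[str]] = defaultdict(set)
    for user, day in events:
        cohort = first_week[user]
        offset = (week_start(day) - cohort).days // 7
        active[(cohort, offset)].add(user)

    # Divide by cohort size to get the retained share per offset.
    cohort_size = Counter(first_week.values())
    result: Dict[date, Dict[int, float]] = {}
    for (cohort, offset), users in active.items():
        result.setdefault(cohort, {})[offset] = len(users) / cohort_size[cohort]
    return result
```

Each cohort row and week-offset column of the result corresponds to one cell of a matrix chart, with the retained share as the cell's value.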

All milestones are now completed and this project is finished.

Screenshot of Cohort Weekly Retention Chart:

Hi Andy, this looks great!

I’m wondering about the encoded composite for the Event model, have you made it available somewhere please? It could be useful to have it available so others can deploy it directly to their nodes.
Also, do you have a tutorial or other instructions that could help others set up their app to use your system?
Thanks!

Thanks Paul!

Regarding the model, the definition.js file used by the system is here: https://github.com/andyjagoe/analytics-plugin-web3analytics/blob/main/src/definition.js

Also, there is a full GitBook docs site that explains how to set up and instrument your app and how to use the system: https://web3-analytics.gitbook.io/product-docs/. The docs are also linked from the home page (https://web3analytics.network/) via the “Learn More” link.

Thanks!

For the model, the definition.js file in andyjagoe/analytics-plugin-web3analytics is the runtime definition, so it can be used by clients but not deployed to nodes. To deploy to nodes, you need the encoded definition generated by running composedb composite:create.

Basically, with the step you have in andyjagoe/web3-analytics-composedb, people will create and deploy a new model rather than re-use the one you already created, so they’ll also need to compile the composite and use it at runtime.
It would be much easier if everyone simply used the model you already created. That way there’s no need to configure the runtime definition: the one you have in definition.js in andyjagoe/analytics-plugin-web3analytics could be used by everyone directly.

Thanks for the clarification, Paul! I’ve made the encoded definition available here and updated the project README to recommend using it rather than creating a new one.

Thanks @andyjagoe. Congratulations on completing the last of this grant proposal’s milestones. We’ve fully funded this grant and look forward to seeing the continued growth of Web3 Analytics.

Thanks @0x_Sam. Appreciate your support.

Congrats Andy! I flipped through the different queries you set up, really cool stuff you’ve built.

Thanks @avi - I’m glad you like it!