mubin986/skype-teams-export-importer

Skype / Microsoft Teams Export Importer

Imports a Microsoft Skype or Microsoft Teams (free / personal) data export (messages.json, endpoints.json, invites.json) into a MongoDB database so the content can be browsed, queried, and searched with any Mongo client (MongoDB Compass, mongosh, Studio 3T, etc.).

Both products share the same export format — consumer Skype was merged into Teams, and the "Download your data" flow produces the same JSON schema for either. Conversations from both apps coexist in a single export:

  • @thread.skype, @cast.skype — classic Skype group chats
  • @thread.v2, uni01_...@thread.v2, meeting_...@thread.v2 — Microsoft Teams chats, meetings, and communities
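
Because both apps land in the same collections, it can help to tag each conversation by origin. The following is a minimal sketch based only on the id suffixes listed above; `classifyConversation` is a hypothetical helper, not part of import.js, and ids with any other shape are simply reported as "unknown".

```javascript
// Hypothetical helper: classify a conversation id by the suffixes listed in this README.
function classifyConversation(id) {
  if (id.endsWith("@thread.skype") || id.endsWith("@cast.skype")) {
    return "skype"; // classic Skype group chats
  }
  if (id.endsWith("@thread.v2")) {
    return "teams"; // Teams chats, meetings, and communities
  }
  return "unknown"; // any id shape not covered by the list above
}
```

The same suffix test can be written as a query in mongosh, for example `db.conversations.find({ _id: /@thread\.v2$/ })`.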

The messages.json file can be very large (tens to hundreds of MB) and is painful to open in a text editor. This script streams it incrementally — it never loads the full file into memory.
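
The importer's actual streaming code is not reproduced in this README. As a rough sketch of one part of the idea, assuming items arrive one at a time from a streaming JSON parser, a generator can group them into bounded batches so that only one batch is held in memory at a time. `inBatches` and the batch size are hypothetical and not taken from import.js.

```javascript
// Hypothetical helper: group items from any iterable or async iterable
// into arrays of at most `size` elements, yielding each batch as soon as it is full.
async function* inBatches(source, size) {
  let batch = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length >= size) {
      yield batch;
      batch = []; // drop the reference so the previous batch can be garbage-collected
    }
  }
  if (batch.length > 0) {
    yield batch; // flush the final, partially filled batch
  }
}
```

A consumer would then write each batch before pulling the next one, for example `for await (const docs of inBatches(messageStream, 1000)) { ... }`, where `messageStream` stands for whatever the streaming parser produces.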

Scope — "Chat history" export only. Microsoft's Export my data page offers two export types: Chat history and Media. This tool only processes the Chat history bundle (messages.json, endpoints.json, invites.json). Images, videos, and other attachments from the Media export are out of scope — message documents will still reference them (via amsreferences or <URIObject> blobs in content), but the binary files themselves are not imported.

What gets imported

The script populates five collections in the target database (skype_export by default):

| Collection | Source | _id | Description |
| --- | --- | --- | --- |
| conversations | messages.json | conversation id | Conversation metadata (display name, thread properties, member count, etc.) |
| messages | messages.json | ${conversationid}:${id} | One document per message, linked to its conversation via conversationid |
| endpoints | endpoints.json | endpointId | Device/transport records (TROUTER, FCM, APNs, etc.) |
| invites | invites.json → conversations | conversation id | Per-conversation invite link + history |
| user | invites.json → user | "invites" | User-level invite link, history, community notification settings |

Indexes created on messages:

  • { conversationid: 1, originalarrivaltime: -1 } — list a chat's messages newest-first
  • { messagetype: 1 } — filter by type (RichText, RichText/Html, ThreadActivity/AddMember, …)
  • { displayName: 1 } — find messages from a particular sender
  • { content: "text", displayName: "text" } (named content_text) — full-text search over message bodies and sender names, usable immediately with db.messages.find({ $text: { $search: "..." } })
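
For reference, the four indexes above can be expressed as data and applied with the driver's `createIndex(keys, options)` call. This is a sketch of how that might look, not the code from import.js; `MESSAGE_INDEXES` and `ensureMessageIndexes` are hypothetical names, and only the text index name (content_text) comes from the list above.

```javascript
// Index definitions taken from the list above; the other indexes keep driver-generated names.
const MESSAGE_INDEXES = [
  { keys: { conversationid: 1, originalarrivaltime: -1 }, options: {} },
  { keys: { messagetype: 1 }, options: {} },
  { keys: { displayName: 1 }, options: {} },
  { keys: { content: "text", displayName: "text" }, options: { name: "content_text" } },
];

// `collection` is assumed to be a MongoDB driver Collection (anything with createIndex works).
async function ensureMessageIndexes(collection) {
  for (const { keys, options } of MESSAGE_INDEXES) {
    await collection.createIndex(keys, options);
  }
}
```

MongoDB allows only one text index per collection, which is why message bodies and sender names share a single compound text index here.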

Nested stringified JSON in threadProperties (members, membersBlocked, membersNicknames) is parsed into real arrays on the way in.
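
A minimal sketch of that parsing step, assuming the three fields arrive as JSON strings and that a malformed value should be left untouched rather than abort the import (the fallback behaviour is an assumption; `parseThreadProperties` is a hypothetical name):

```javascript
// Fields the README says are stored as stringified JSON inside threadProperties.
const STRINGIFIED_FIELDS = ["members", "membersBlocked", "membersNicknames"];

// Returns a copy of threadProperties with the stringified fields parsed.
function parseThreadProperties(threadProperties) {
  if (!threadProperties) return threadProperties;
  const parsed = { ...threadProperties };
  for (const field of STRINGIFIED_FIELDS) {
    if (typeof parsed[field] !== "string") continue;
    try {
      parsed[field] = JSON.parse(parsed[field]);
    } catch {
      // Assumption: keep the original string if it is not valid JSON.
    }
  }
  return parsed;
}
```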

Data model

Each export file is broken apart and flattened into its own collection. The top-level envelope of messages.json (userId, exportDate, conversations: [...]) and of invites.json (user, conversations) is discarded — the useful content lives one level down.

conversations

One document per chat. The original MessageList array is stripped out (those become rows in messages) and replaced with a messageCount integer.

{
  "_id": "19:<thread-hash>@thread.skype",
  "id":  "19:<thread-hash>@thread.skype",
  "displayName": "<group name>",
  "version": 1700000000000,
  "properties": {
    "conversationblocked": false,
    "lastimreceivedtime": "2026-01-01T00:00:00.000Z",
    "consumptionhorizon": "<numeric;numeric;numeric>",
    "onetoonev2threadid": null
  },
  "threadProperties": {
    "membercount": 3,
    "members": ["<member 1>", "<member 2>", "<member 3>"], // parsed from a JSON string
    "topic": "<topic>",
    "picture": null,
    "description": null
  },
  "messageCount": 1234
}

messages

One document per message. Linked to conversations via conversationid. _id is ${conversationid}:${id} so re-imports never create duplicates and the same doc is addressable across runs.

{
  "_id":            "19:<thread-hash>@thread.skype:<message-id>",
  "id":             "<message-id>",
  "conversationid": "19:<thread-hash>@thread.skype",
  "displayName":    "<sender display name>",
  "originalarrivaltime": "2024-01-01T12:00:00.000Z",
  "messagetype":    "RichText",
  "version":        1700000000000,
  "content":        "<message body — plain text or HTML depending on messagetype>",
  "from":           null,
  "properties": {
    "s2spartnername": "chat-service-v1",
    "importedBy": { "Prefix": "8", "Network": "live", "RawValue": "8:live:<user>" },
    "importedTime": "2025-01-01T00:00:00.000Z"
  },
  "amsreferences": null
}
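
The split between conversations and messages described above can be sketched as a pure function. This is an illustration under assumptions, not the importer's code: `splitConversation` is a hypothetical name, and the sketch falls back to the parent conversation's id when a message carries no conversationid of its own.

```javascript
// Turn one entry of messages.json -> conversations into
// one conversations document plus one messages document per message.
function splitConversation(conversation) {
  const { MessageList = [], ...metadata } = conversation;

  const conversationDoc = {
    ...metadata,
    _id: conversation.id,
    messageCount: MessageList.length, // replaces the stripped MessageList array
  };

  const messageDocs = MessageList.map((message) => ({
    ...message,
    _id: `${conversation.id}:${message.id}`, // deterministic, so re-imports hit the same document
    conversationid: message.conversationid ?? conversation.id, // assumption: fall back to the parent id
  }));

  return { conversationDoc, messageDocs };
}
```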

Common messagetype values you'll see:

| Type | Meaning |
| --- | --- |
| RichText | Plain text message |
| RichText/Html | Formatted (HTML) message |
| RichText/Media_GenericFile | File attachment; content is a <URIObject> XML blob |
| RichText/UriObject | Image/photo attachment |
| ThreadActivity/AddMember | System event: someone joined the chat |
| ThreadActivity/DeleteMember | System event: someone left/was removed |
| ThreadActivity/HistoryDisclosedUpdate | System event: history visibility toggled |
| ThreadActivity/TopicUpdate | System event: topic changed |

endpoints

One document per registered endpoint/device. Almost a 1:1 copy of each endpoints[i] from endpoints.json, with _id set to endpointId.

{
  "_id":         "<endpoint-uuid>",
  "endpointId":  "<endpoint-uuid>",
  "aadDeviceId": null,
  "nodeId":      null,
  "timestamp":   "2026-01-01T00:00:00.0000000Z",
  "transports": {
    "transports": [
      { "transportType": "TROUTER", "path": "https://<trouter-host>/...", "contexts": ["MESSAGING"], "isDeleted": false },
      { "transportType": "FCM",     "path": "<fcm-token>",                "contexts": ["TFL"],       "isDeleted": false }
    ]
  }
}

invites

One document per conversation that has an invite link (from invites.json → conversations). _id is the conversation id, so you can join against conversations on _id.

{
  "_id":            "19:<thread-hash>@thread.skype",
  "conversationId": "19:<thread-hash>@thread.skype",
  "inviteLink": "https://teams.live.com/l/invite/<token>",
  "inviteLinkHistory": [
    { "inviteLink": "https://teams.live.com/l/inv<old-token-1>", "createdOn": "2026-01-01T00:00:00.0000000Z" },
    { "inviteLink": "https://teams.live.com/l/inv<old-token-2>", "createdOn": "2025-12-01T00:00:00.0000000Z" }
  ]
}

user

A single-document collection holding the user-level settings from invites.json → user. Stored as _id: "invites" so it's easy to find.

{
  "_id": "invites",
  "communityNotifications": {
    "inviteOnNetworkEmailOptIn": true,
    "announcementEmailOptIn": true
  },
  "inviteLink": "https://teams.live.com/l/invite/<token>",
  "inviteLinkHistory": []
}
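
The endpoints, invites, and user documents described above are close to straight copies with an `_id` added. A sketch of that flattening follows; the function names are hypothetical, and it assumes invites.json → conversations is an array whose entries each carry a conversationId field, as in the example document.

```javascript
// endpoints.json -> endpoints[i]: copy the record and key it by endpointId.
function toEndpointDoc(endpoint) {
  return { ...endpoint, _id: endpoint.endpointId };
}

// invites.json: one document per conversation, plus a single user-level document.
function flattenInvites(invitesJson) {
  const inviteDocs = (invitesJson.conversations ?? []).map((entry) => ({
    ...entry,
    _id: entry.conversationId, // same value as conversations._id, which enables joins
  }));
  const userDoc = { ...(invitesJson.user ?? {}), _id: "invites" }; // fixed _id for the single document
  return { inviteDocs, userDoc };
}
```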

Relationships at a glance

conversations._id  ◀──────────  messages.conversationid
conversations._id  ◀──────────  invites._id

There's no server-side foreign-key enforcement — these are plain documents — but the _id scheme makes $lookup joins trivial if you want them.
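
As an example of such a join, the pipeline below attaches each conversation's invite document where one exists. It is a sketch using the collection and field names from this README; the variable name `conversationsWithInvites` and the projected fields are arbitrary choices.

```javascript
// Aggregation pipeline: conversations left-joined with invites on _id.
const conversationsWithInvites = [
  {
    $lookup: {
      from: "invites",
      localField: "_id",
      foreignField: "_id",
      as: "invite",
    },
  },
  // $lookup always produces an array; unwind it but keep conversations that have no invite.
  { $unwind: { path: "$invite", preserveNullAndEmptyArrays: true } },
  { $project: { displayName: 1, messageCount: 1, "invite.inviteLink": 1 } },
];
```

In mongosh this would run as `db.conversations.aggregate(conversationsWithInvites)`.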

Requirements

  • Node.js ≥ 20.6. The built-in --env-file flag was added in 20.6, and the script also relies on top-level await. Tested on Node 22.
  • A reachable MongoDB instance — local, Docker, or Atlas (mongodb+srv://...).

Check your Node version:

node --version

If you need to upgrade, use nvm:

nvm install 22
nvm use 22

Setup

npm install
cp .env.example .env   # then edit .env if your Mongo lives elsewhere

.env supports:

| Variable | Default | Description |
| --- | --- | --- |
| MONGO_URI | mongodb://localhost:27017 | MongoDB connection string |
| MONGO_DB | skype_export | Database name |
| EXPORT_DIR | script directory | Directory containing the three export JSON files |
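
The defaults in the table could be resolved along these lines. This is a sketch, not the importer's code: `resolveConfig` and the returned property names are hypothetical, and treating an empty value as unset is an assumption.

```javascript
// Resolve settings from an environment object, falling back to the documented defaults.
// `scriptDir` is the directory that contains import.js.
function resolveConfig(env, scriptDir) {
  return {
    mongoUri: env.MONGO_URI || "mongodb://localhost:27017",
    dbName: env.MONGO_DB || "skype_export",
    exportDir: env.EXPORT_DIR || scriptDir, // assumption: empty string counts as unset
  };
}
```

With Node's `--env-file=.env` flag the values from .env are already present in `process.env`, so a call such as `resolveConfig(process.env, scriptDir)` needs no extra dotenv dependency.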

Usage

Place messages.json, endpoints.json, and invites.json next to import.js (or point EXPORT_DIR at them), then run:

npm run import

Before touching MongoDB, the script peeks at the first 4 KB of messages.json to make sure it looks like a Skype/Teams chat-history export (userId, exportDate, and conversations at the top level). If the file is missing or the shape is wrong, the run aborts immediately. This guards against accidentally pointing EXPORT_DIR at the wrong directory and wiping the database.
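
A check of this kind can be approximated with a simple heuristic on the file's first few kilobytes. The sketch below is an assumption about how such a test might work, not the importer's implementation: a truncated prefix cannot be parsed as JSON, so it only verifies that the text starts with an object and mentions the three expected keys. `looksLikeChatExport` is a hypothetical name.

```javascript
// Keys the README says appear at the top level of messages.json.
const REQUIRED_KEYS = ["userId", "exportDate", "conversations"];

// `head` is the first chunk of the file (for example 4 KB) decoded as a string.
function looksLikeChatExport(head) {
  const text = head.trimStart(); // tolerate leading whitespace or a byte-order mark
  if (!text.startsWith("{")) return false;
  return REQUIRED_KEYS.every((key) => text.includes(`"${key}"`));
}
```

Because it only looks at a prefix, this test is cheap even for a messages.json of several hundred MB.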

By default the script runs in wipe-and-insert mode, so it shows a destructive-action warning:

⚠  DESTRUCTIVE  These collections will be WIPED and re-imported:
   db:          skype_export
   uri:         mongodb://localhost:27017
   collections: conversations, messages, endpoints, invites, user
   Any data manually added to these collections will be lost.
   (Tip: pass --upsert for a non-destructive re-import.)

Type 'yes' to continue:

Only yes / y proceeds. Flags:

| Flag | Effect |
| --- | --- |
| --upsert | Non-destructive re-import. Uses bulkWrite with upsert: true, so existing docs are replaced by _id and anything you added manually (tags, notes, extra fields on your own docs) survives. |
| --yes / -y | Skip the interactive prompt; useful for scripted runs. |
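
The flag handling and the confirmation rule can be sketched as two small functions. Both names are hypothetical, and whether the real prompt ignores case and surrounding whitespace is an assumption.

```javascript
// Parse the two documented flags from the arguments after the script name.
function parseFlags(args) {
  return {
    upsert: args.includes("--upsert"),
    yes: args.includes("--yes") || args.includes("-y"),
  };
}

// Only "yes" or "y" counts as confirmation.
// Assumption: the answer is trimmed and compared case-insensitively.
function isConfirmed(answer) {
  const normalized = answer.trim().toLowerCase();
  return normalized === "yes" || normalized === "y";
}
```

In a script these would typically be fed from `process.argv.slice(2)` and from the line read at the prompt.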

Examples:

npm run import                  # wipe + re-import, with confirmation
npm run import -- --upsert      # non-destructive re-import, with confirmation
npm run import -- --upsert -y   # non-destructive, no prompt

Re-running the script never produces duplicate documents in either mode: _ids are deterministic, so a wipe run rebuilds the same documents and upsert runs stay idempotent.
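
To illustrate why upsert runs are idempotent, the sketch below builds the kind of operation list that bulkWrite accepts, using replaceOne with upsert: true keyed on `_id`. The helper name `toUpsertOps` is hypothetical, and the choice of replaceOne is inferred from the description "replaced by _id" rather than confirmed from import.js.

```javascript
// Build bulkWrite operations that replace (or insert) each document by its _id.
function toUpsertOps(docs) {
  return docs.map((doc) => ({
    replaceOne: {
      filter: { _id: doc._id },
      replacement: doc,
      upsert: true, // insert when no document with this _id exists yet
    },
  }));
}
```

A batch would then be written with something like `await collection.bulkWrite(toUpsertOps(batch), { ordered: false })`. Running it twice with the same input leaves the collection unchanged, and documents whose `_id` never appears in the export are not touched.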

Browsing the data

Once imported, connect with any Mongo client. Some useful queries:

// 1-to-1 and group chat list sorted by most recent activity
db.conversations.find().sort({ version: -1 }).limit(50);

// All real chat messages in a given conversation, newest first
db.messages
  .find({
    conversationid: "<paste a conversation _id here>",
    messagetype: { $in: ["RichText", "RichText/Html"] },
  })
  .sort({ originalarrivaltime: -1 });

// Everything sent by a specific person
db.messages.find({ displayName: "<sender display name>" }).sort({ originalarrivaltime: -1 });

// Full-text search (the importer already created the text index)
db.messages.find({ $text: { $search: "<keyword>" } });
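
Text search results come back in no particular order unless you sort by relevance. The helper below is a hypothetical sketch that assembles the filter, projection, and sort for a ranked search using MongoDB's textScore metadata; the projected fields and the default limit are arbitrary.

```javascript
// Build the pieces of a relevance-ranked full-text query against messages.
function rankedTextSearch(keyword, limit = 20) {
  return {
    filter: { $text: { $search: keyword } },
    projection: {
      score: { $meta: "textScore" }, // relevance score computed by the text index
      displayName: 1,
      content: 1,
      conversationid: 1,
    },
    sort: { score: { $meta: "textScore" } }, // best matches first
    limit,
  };
}
```

In mongosh: `const q = rankedTextSearch("<keyword>"); db.messages.find(q.filter, q.projection).sort(q.sort).limit(q.limit);`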

Files

  • import.js — the streaming importer
  • package.json — dependencies (mongodb, stream-json) and the import script
  • .env.example — template for connection settings
  • .gitignore — keeps node_modules/, .env, and any *.json data dumps out of git

Notes on privacy

The exports contain personal chat history. The default .gitignore excludes *.json (other than package.json / package-lock.json) so you don't accidentally commit them. Double-check before pushing to any public remote.
