Imports a Microsoft Skype or Microsoft Teams (free / personal) data export (messages.json, endpoints.json, invites.json) into a MongoDB database so the content can be browsed, queried, and searched with any Mongo client (MongoDB Compass, mongosh, Studio 3T, etc.).
Both products share the same export format — consumer Skype was merged into Teams, and the "Download your data" flow produces the same JSON schema for either. Conversations from both apps coexist in a single export:
- `@thread.skype`, `@cast.skype`: classic Skype group chats
- `@thread.v2`, `uni01_...@thread.v2`, `meeting_...@thread.v2`: Microsoft Teams chats, meetings, and communities
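As an illustration of the suffix convention above, a conversation id can be bucketed by origin with a small helper. This is a sketch, not part of the importer: the function name `conversationOrigin` and the `"unknown"` fallback for ids with any other suffix are assumptions.

```javascript
// Hypothetical helper: classify a conversation id by the suffixes listed above.
function conversationOrigin(conversationId) {
  // Classic Skype group chats end in @thread.skype or @cast.skype.
  if (/@(thread|cast)\.skype$/.test(conversationId)) return "skype";
  // Teams chats, meetings, and communities all end in @thread.v2.
  if (conversationId.endsWith("@thread.v2")) return "teams";
  // Anything else is left unclassified (assumption).
  return "unknown";
}
```

For example, `conversationOrigin("19:<thread-hash>@thread.skype")` returns `"skype"`.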
The messages.json file can be very large (tens to hundreds of MB) and is painful to open in a text editor. This script streams it incrementally — it never loads the full file into memory.
Scope: "Chat history" export only. Microsoft's Export my data page offers two export types: Chat history and Media. This tool only processes the Chat history bundle (`messages.json`, `endpoints.json`, `invites.json`). Images, videos, and other attachments from the Media export are out of scope: message documents will still reference them (via `amsreferences` or `<URIObject>` blobs in `content`), but the binary files themselves are not imported.
The script populates five collections in the target database (skype_export by default):
| Collection | Source | _id | Description |
|---|---|---|---|
| `conversations` | `messages.json` | conversation id | Conversation metadata (display name, thread properties, member count, etc.) |
| `messages` | `messages.json` | `${conversationid}:${id}` | One document per message, linked to its conversation via `conversationid` |
| `endpoints` | `endpoints.json` | `endpointId` | Device/transport records (TROUTER, FCM, APNs, etc.) |
| `invites` | `invites.json` → `conversations` | conversation id | Per-conversation invite link + history |
| `user` | `invites.json` → `user` | `"invites"` | User-level invite link, history, community notification settings |
Indexes created on messages:
- `{ conversationid: 1, originalarrivaltime: -1 }`: list a chat's messages newest-first
- `{ messagetype: 1 }`: filter by type (`RichText`, `RichText/Html`, `ThreadActivity/AddMember`, …)
- `{ displayName: 1 }`: find messages from a particular sender
- `{ content: "text", displayName: "text" }` (named `content_text`): full-text search over message bodies and sender names, usable immediately with `db.messages.find({ $text: { $search: "..." } })`
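The index list above could be created with the Node.js driver roughly as follows. The key specs and the `content_text` name come from the list; the constant `MESSAGE_INDEXES` and the function `ensureMessageIndexes` are hypothetical names, and the real `import.js` may structure this differently.

```javascript
// Index specs copied from the list above.
const MESSAGE_INDEXES = [
  { keys: { conversationid: 1, originalarrivaltime: -1 } },
  { keys: { messagetype: 1 } },
  { keys: { displayName: 1 } },
  { keys: { content: "text", displayName: "text" }, options: { name: "content_text" } },
];

// `messages` is expected to be a MongoDB driver Collection
// (anything exposing createIndex(keys, options) works).
async function ensureMessageIndexes(messages) {
  for (const { keys, options = {} } of MESSAGE_INDEXES) {
    await messages.createIndex(keys, options);
  }
}
```

Typical usage would be `await ensureMessageIndexes(db.collection("messages"))` after the inserts finish.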
Nested stringified JSON in threadProperties (members, membersBlocked, membersNicknames) is parsed into real arrays on the way in.
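A minimal sketch of that parsing step, assuming the three fields arrive as JSON-encoded strings. The helper name `parseThreadProperties` is hypothetical, and keeping the raw string when it is not valid JSON is an assumption rather than documented importer behavior.

```javascript
// Fields named in the paragraph above that hold stringified JSON.
const STRINGIFIED_FIELDS = ["members", "membersBlocked", "membersNicknames"];

// Hypothetical helper: returns a copy with the stringified fields parsed.
function parseThreadProperties(threadProperties) {
  if (!threadProperties) return threadProperties;
  const parsed = { ...threadProperties };
  for (const field of STRINGIFIED_FIELDS) {
    if (typeof parsed[field] !== "string") continue;
    try {
      parsed[field] = JSON.parse(parsed[field]);
    } catch {
      // Assumption: leave the raw string in place if it is not valid JSON.
    }
  }
  return parsed;
}
```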
Each export file is broken apart and flattened into its own collection. The top-level envelope of messages.json (userId, exportDate, conversations: [...]) and of invites.json (user, conversations) is discarded — the useful content lives one level down.
The `conversations` collection holds one document per chat. The original `MessageList` array is stripped out (its entries become documents in `messages`) and replaced with a `messageCount` integer.
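One way to sketch this split, together with the composite message `_id` from the collections table. The function name `splitConversation` is hypothetical, and falling back to the parent conversation's `id` when a message lacks `conversationid` is an assumption.

```javascript
// Hypothetical helper: turn one exported conversation into
// a conversations document plus its messages documents.
function splitConversation(conversation) {
  const { MessageList, ...metadata } = conversation;
  const messages = MessageList ?? [];
  const conversationDoc = {
    _id: conversation.id,
    ...metadata, // everything except MessageList
    messageCount: messages.length,
  };
  const messageDocs = messages.map((message) => ({
    // Deterministic _id: `${conversationid}:${id}`.
    _id: `${message.conversationid ?? conversation.id}:${message.id}`,
    ...message,
  }));
  return { conversationDoc, messageDocs };
}
```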
The `messages` collection holds one document per message, linked to `conversations` via `conversationid`. `_id` is `${conversationid}:${id}`, so re-imports never create duplicates and the same document is addressable across runs.
{
"_id": "19:<thread-hash>@thread.skype:<message-id>",
"id": "<message-id>",
"conversationid": "19:<thread-hash>@thread.skype",
"displayName": "<sender display name>",
"originalarrivaltime": "2024-01-01T12:00:00.000Z",
"messagetype": "RichText",
"version": 1700000000000,
"content": "<message body — plain text or HTML depending on messagetype>",
"from": null,
"properties": {
"s2spartnername": "chat-service-v1",
"importedBy": { "Prefix": "8", "Network": "live", "RawValue": "8:live:<user>" },
"importedTime": "2025-01-01T00:00:00.000Z"
},
"amsreferences": null
}

Common `messagetype` values you'll see:
| Type | Meaning |
|---|---|
| `RichText` | Plain text message |
| `RichText/Html` | Formatted (HTML) message |
| `RichText/Media_GenericFile` | File attachment; `content` is a `<URIObject>` XML blob |
| `RichText/UriObject` | Image/photo attachment |
| `ThreadActivity/AddMember` | System event: someone joined the chat |
| `ThreadActivity/DeleteMember` | System event: someone left/was removed |
| `ThreadActivity/HistoryDisclosedUpdate` | System event: history visibility toggled |
| `ThreadActivity/TopicUpdate` | System event: topic changed |
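When post-processing imported messages, the table above can be folded into a coarse category. This is only a sketch: the function name `messageCategory` and the category labels are hypothetical, and any type not listed in the table falls through to `"other"`.

```javascript
// Types taken from the table above.
const TEXT_TYPES = new Set(["RichText", "RichText/Html"]);
const ATTACHMENT_TYPES = new Set(["RichText/Media_GenericFile", "RichText/UriObject"]);

// Hypothetical helper: map a messagetype to a coarse category.
function messageCategory(messagetype) {
  if (TEXT_TYPES.has(messagetype)) return "text";
  if (ATTACHMENT_TYPES.has(messagetype)) return "attachment";
  if (messagetype.startsWith("ThreadActivity/")) return "system";
  return "other"; // types not covered by the table
}
```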
The `endpoints` collection holds one document per registered endpoint/device. Each document is almost a 1:1 copy of the corresponding `endpoints[i]` entry from `endpoints.json`, with `_id` set to `endpointId`.
{
"_id": "<endpoint-uuid>",
"endpointId": "<endpoint-uuid>",
"aadDeviceId": null,
"nodeId": null,
"timestamp": "2026-01-01T00:00:00.0000000Z",
"transports": {
"transports": [
{ "transportType": "TROUTER", "path": "https://<trouter-host>/...", "contexts": ["MESSAGING"], "isDeleted": false },
{ "transportType": "FCM", "path": "<fcm-token>", "contexts": ["TFL"], "isDeleted": false }
]
}
}

The `invites` collection holds one document per conversation that has an invite link (from `invites.json` → `conversations`). `_id` is the conversation id, so you can join against `conversations` on `_id`.
{
"_id": "19:<thread-hash>@thread.skype",
"conversationId": "19:<thread-hash>@thread.skype",
"inviteLink": "https://teams.live.com/l/invite/<token>",
"inviteLinkHistory": [
{ "inviteLink": "https://teams.live.com/l/inv<old-token-1>", "createdOn": "2026-01-01T00:00:00.0000000Z" },
{ "inviteLink": "https://teams.live.com/l/inv<old-token-2>", "createdOn": "2025-12-01T00:00:00.0000000Z" }
]
}

The `user` collection is a single-document collection holding the user-level settings from `invites.json` → `user`. The document is stored with `_id: "invites"` so it's easy to find.
{
"_id": "invites",
"communityNotifications": {
"inviteOnNetworkEmailOptIn": true,
"announcementEmailOptIn": true
},
"inviteLink": "https://teams.live.com/l/invite/<token>",
"inviteLinkHistory": []
}

Relationships between the collections:

conversations._id ◀────────── messages.conversationid
conversations._id ◀────────── invites._id
There's no server-side foreign-key enforcement — these are plain documents — but the _id scheme makes $lookup joins trivial if you want them.
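As a sketch of such a join, the pipeline below attaches conversation metadata to each invite document by matching on `_id`. The function name `invitesWithConversationPipeline` and the output field name `conversation` are hypothetical choices for illustration.

```javascript
// Hypothetical helper: build an aggregation pipeline that joins
// invites to conversations on their shared _id.
function invitesWithConversationPipeline() {
  return [
    {
      $lookup: {
        from: "conversations",
        localField: "_id",
        foreignField: "_id",
        as: "conversation", // array with zero or one matching conversation
      },
    },
    { $project: { inviteLink: 1, "conversation.displayName": 1 } },
  ];
}
```

In mongosh this could be run as `db.invites.aggregate(invitesWithConversationPipeline())`.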
- Node.js ≥ 20.6 (required for the built-in `--env-file` flag and top-level `await`). Tested on Node 22.
- A reachable MongoDB instance: local, Docker, or Atlas (`mongodb+srv://...`).
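If you would rather check the Node.js requirement programmatically, a comparison along these lines would do. The helper name `meetsMinimumNode` is hypothetical and is not something the importer is known to ship.

```javascript
// Hypothetical helper: true when the given Node.js version is at least 20.6.
function meetsMinimumNode(version = process.versions.node) {
  // Accept both "22.1.0" and the "v22.1.0" form printed by `node --version`.
  const [major, minor] = version.replace(/^v/, "").split(".").map(Number);
  return major > 20 || (major === 20 && minor >= 6);
}
```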
Check your Node version:
node --version

If you need to upgrade, use nvm:

nvm install 22
nvm use 22

Install dependencies and create your `.env` file:

npm install
cp .env.example .env   # then edit .env if your Mongo lives elsewhere

`.env` supports:
| Variable | Default | Description |
|---|---|---|
| `MONGO_URI` | `mongodb://localhost:27017` | MongoDB connection string |
| `MONGO_DB` | `skype_export` | Database name |
| `EXPORT_DIR` | script directory | Directory containing the three export JSON files |
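The defaults in this table amount to a simple fallback per variable. A sketch, assuming the environment is passed in explicitly; the function name `resolveConfig` and the returned property names are hypothetical.

```javascript
// Hypothetical helper: apply the documented defaults to an env object.
// `scriptDir` is the directory containing the import script.
function resolveConfig(env, scriptDir) {
  return {
    mongoUri: env.MONGO_URI || "mongodb://localhost:27017",
    mongoDb: env.MONGO_DB || "skype_export",
    exportDir: env.EXPORT_DIR || scriptDir,
  };
}
```

With Node's `--env-file=.env` flag the values from `.env` are already present on `process.env`, so a call such as `resolveConfig(process.env, scriptDir)` would pick them up.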
Place messages.json, endpoints.json, and invites.json next to import.js (or point EXPORT_DIR at them), then run:
npm run import

Before touching MongoDB, the script peeks at the first 4 KB of `messages.json` to make sure it looks like a Skype/Teams chat-history export (contains `userId`, `exportDate`, `conversations` at the top level). If the file is missing or the shape is wrong, the run aborts immediately. This protects against accidentally pointing `EXPORT_DIR` at the wrong file and wiping the DB.
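A sanity check of this kind might look roughly like the sketch below. The helper names `looksLikeChatExport` and `peekHead` are hypothetical, and the plain substring test for the three keys is an assumption about how the shape is verified, not the importer's actual logic.

```javascript
// Top-level keys the export envelope is expected to contain.
const REQUIRED_KEYS = ["userId", "exportDate", "conversations"];

// Hypothetical check on the decoded first bytes of messages.json.
function looksLikeChatExport(head) {
  return (
    head.trimStart().startsWith("{") &&
    REQUIRED_KEYS.every((key) => head.includes(`"${key}"`))
  );
}

// Read at most `bytes` bytes from the start of the file, never the whole file.
async function peekHead(path, bytes = 4096) {
  const { open } = await import("node:fs/promises");
  const handle = await open(path, "r"); // rejects if the file is missing
  try {
    const buffer = Buffer.alloc(bytes);
    const { bytesRead } = await handle.read(buffer, 0, bytes, 0);
    return buffer.toString("utf8", 0, bytesRead);
  } finally {
    await handle.close();
  }
}
```

A caller would abort before connecting to MongoDB when `looksLikeChatExport(await peekHead(path))` is false or `peekHead` rejects.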
By default the script runs in wipe-and-insert mode, so it shows a destructive-action warning:
⚠ DESTRUCTIVE These collections will be WIPED and re-imported:
db: skype_export
uri: mongodb://localhost:27017
collections: conversations, messages, endpoints, invites, user
Any data manually added to these collections will be lost.
(Tip: pass --upsert for a non-destructive re-import.)
Type 'yes' to continue:
Only yes / y proceeds. Flags:
| Flag | Effect |
|---|---|
| `--upsert` | Non-destructive re-import. Uses `bulkWrite` with `upsert: true`, so existing docs are replaced by `_id` and anything you added manually (tags, notes, extra fields on your own docs) survives. |
| `--yes` / `-y` | Skip the interactive prompt; useful for scripted runs. |
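To illustrate what the `--upsert` row describes, the operations passed to `bulkWrite` could be built like this. Using `replaceOne` is an assumption based on "existing docs are replaced by `_id`", and the helper name `toUpsertOps` is hypothetical.

```javascript
// Hypothetical helper: one replaceOne-with-upsert operation per document,
// keyed on the deterministic _id.
function toUpsertOps(docs) {
  return docs.map((doc) => ({
    replaceOne: {
      filter: { _id: doc._id },
      replacement: doc,
      upsert: true, // insert when no document with this _id exists yet
    },
  }));
}
```

A batch would then be written with something like `await collection.bulkWrite(toUpsertOps(batch))`.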
Examples:
npm run import # wipe + re-import, with confirmation
npm run import -- --upsert # non-destructive re-import, with confirmation
npm run import -- --upsert -y   # non-destructive, no prompt

Running the script multiple times is safe in either mode: `_id`s are deterministic, so upsert runs stay idempotent.
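The flag handling and the confirmation rule described above could be sketched as follows. The names `parseFlags` and `isConfirmed` are hypothetical, and trimming and lower-casing the typed answer is an assumption, since the source only states that `yes` / `y` proceeds.

```javascript
// Hypothetical helper: read the two documented flags from an argv array.
function parseFlags(argv) {
  return {
    upsert: argv.includes("--upsert"),
    yes: argv.includes("--yes") || argv.includes("-y"),
  };
}

// Hypothetical helper: only "yes" or "y" counts as confirmation.
function isConfirmed(answer) {
  // Assumption: surrounding whitespace and letter case are ignored.
  return ["yes", "y"].includes(answer.trim().toLowerCase());
}
```

For instance, `parseFlags(process.argv.slice(2))` after `npm run import -- --upsert -y` would yield `{ upsert: true, yes: true }`.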
Once imported, connect with any Mongo client. Some useful queries:
// 1-to-1 and group chat list sorted by most recent activity
db.conversations.find().sort({ version: -1 }).limit(50);
// All real chat messages in a given conversation, newest first
db.messages
.find({
conversationid: "<paste a conversation _id here>",
messagetype: { $in: ["RichText", "RichText/Html"] },
})
.sort({ originalarrivaltime: -1 });
// Everything sent by a specific person
db.messages.find({ displayName: "<sender display name>" }).sort({ originalarrivaltime: -1 });
// Full-text search (the importer already created the text index)
db.messages.find({ $text: { $search: "<keyword>" } });

Project files:

- `import.js`: the streaming importer
- `package.json`: dependencies (`mongodb`, `stream-json`) and the `import` script
- `.env.example`: template for connection settings
- `.gitignore`: keeps `node_modules/`, `.env`, and any `*.json` data dumps out of git
The exports contain personal chat history. The default .gitignore excludes *.json (other than package.json / package-lock.json) so you don't accidentally commit them. Double-check before pushing to any public remote.
Example `conversations` document:

{
  "_id": "19:<thread-hash>@thread.skype",
  "id": "19:<thread-hash>@thread.skype",
  "displayName": "<group name>",
  "version": 1700000000000,
  "properties": {
    "conversationblocked": false,
    "lastimreceivedtime": "2026-01-01T00:00:00.000Z",
    "consumptionhorizon": "<numeric;numeric;numeric>",
    "onetoonev2threadid": null
  },
  "threadProperties": {
    "membercount": 3,
    "members": ["<member 1>", "<member 2>", "<member 3>"],   // parsed from a JSON string
    "topic": "<topic>",
    "picture": null,
    "description": null
  },
  "messageCount": 1234
}