Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance is restarted it will have to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had images links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break.
Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.
Implementation notes:
- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receives a negative cache entry for 15 minutes
- URLs that fail with an expected error will receive a negative cache with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering
Overall performance of timelines and creating new posts which contain URLs is greatly improved.
This field replaces the now deprecated conversation_id field, and now
exposes the ActivityPub object `context` directly via the MastoAPI
instead of relying on StatusNet-era data concepts.
This field seems to be a left-over from the StatusNet era.
If your application uses `pleroma.conversation_id`: this field is
deprecated.
It is currently stubbed instead by doing a CRC32 of the context, and
clearing the MSB to avoid overflow exceptions with signed integers on
the different clients using this field (Java/Kotlin code, mostly; see
Husky and probably other mobile clients.)
This should be removed in a future version of Pleroma. Pleroma-FE
currently depends on this field, as well.
30 to 70% of the objects in the object table are simple JSON objects
containing a single field, 'id', being the context's ID. The reason for
the creation of an object per context seems to be an old relic from the
StatusNet era, and has only been used nowadays as an helper for threads
in Pleroma-FE via the `pleroma.conversation_id` field in status views.
An object per context was created, and its numerical ID (table column)
was used and stored as 'context_id' in the object and activity along
with the full 'context' URI/string.
This commit removes this field and stops creation of objects for each
context, which will also allow incoming activities to use activity IDs
as contexts, something which was not possible before, or would have been
very broken under most circumstances.
The `pleroma.conversation_id` field has been reimplemented in a way to
maintain backwards-compatibility by calculating a CRC32 of the full
context URI/string in the object, instead of relying on the row ID for
the created context object.
- save object ids on pin, instead of activity ids
- pins federation
- removed pinned_activities field from the users table
- activityPub endpoint for user pins
- pulling remote users pins