spc-pleroma

Commit Graph

Author	SHA1	Message	Date
Mark Felder	a50c657427	Add a dedicated connection pool for Rich Media. Sharing this pool with regular Media is problematic as Rich Media will connect to many different domains and thrash the pool, but regular Media will have predictable connections to the webservers hosting media for the fediverse servers you peer with.	2024-05-27 11:17:02 -04:00
Mark Felder	9a83301ff8	Credo	2024-05-07 22:11:19 -04:00
Mark Felder	df0734fcbf	Increase the :max_body for Rich Media to 5MB. Websites are increasingly getting more bloated with tricks like inlining content (e.g., CNN.com) which puts pages at or above 5MB. This value may still be too low.	2024-05-07 19:54:56 -04:00
Mark Felder	ede414094f	RichMedia refactor. Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance was restarted it had to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but this often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had image links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break. Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.
Implementation notes:
- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receive a negative cache entry for 15 minutes
- URLs that fail with an expected error will receive a negative cache entry with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering
Overall performance of timelines and of creating new posts which contain URLs is greatly improved.	2024-05-07 19:54:56 -04:00
Mark Felder	9f2319e50d	RichMedia.Helpers: move the validate_page_url/1 function to the Parser module. This will ensure that the page validation happens in Parser.parse/1 so it can be called from anywhere and still filter invalid URLs.	2024-02-06 18:34:02 -05:00
Mark Felder	0cc038b67c	Ensure URLs with IP addresses for the host do not generate previews	2024-02-05 00:09:37 -05:00
Mark Felder	579561e97b	URI.authority is deprecated	2024-02-04 23:49:07 -05:00
Mark Felder	04fc4eddaa	Fix Rich Media Previews for updated activities. The Rich Media Previews were not regenerated when a post was updated due to a cache invalidation issue. They are now cached by the activity id so they can be evicted with the other activity cache objects in the :scrubber_cache.	2024-02-04 23:47:04 -05:00
Lain Soykaf	00def0875b	RichMediaTest: Use mocked config	2023-12-12 13:28:11 +04:00
lain	e853cfe7c3	Revert "Merge branch 'copyright-bump' into 'develop'". This reverts merge request !3825	2023-01-02 20:38:50 +00:00
marcin mikołajczak	10886eeaa2	Bump copyright year. Signed-off-by: marcin mikołajczak <git@mkljczk.pl>	2023-01-01 12:13:06 +01:00
Sean King	17aa3644be	Copyright bump for 2022	2022-02-25 23:11:42 -07:00
lain	e1e7e4d379	Object: Rework how Object.normalize works. Now it defaults to not fetching, and the option is named.	2021-01-04 13:38:31 +01:00
Alexander Strizhakov	8d218ebaf5	Moving some background jobs into simple tasks: fetching activity data; attachment prefetching; using limiter to prevent overload	2020-11-11 13:39:49 +03:00
Mark Felder	ba7f9459b4	Revert Rich Media censorship for sensitive statuses. The #NSFW hashtag test was broken anyway.	2020-09-28 18:22:59 -05:00
rinpatch	f70335002d	RichMedia: Do a HEAD request to check content type/length. This shouldn't be too expensive, since the connections are pooled, but it should save us some bandwidth since we won't fetch non-html files and files that are too large for us to process (especially since you can't cancel a request without closing the connection with HTTP1).	2020-09-14 14:45:58 +03:00
Alexander Strizhakov	696bf09433	passing adapter options directly without adapter key	2020-09-07 19:59:17 +03:00
Alexander Strizhakov	a83916fdac	adapter options unification; not needed options deletion	2020-09-07 19:59:17 +03:00
rinpatch	e198ba492e	Rich Media: Do not cache URLs for preview statuses. Closes #1987	2020-09-05 20:53:46 +03:00
Alexander Strizhakov	79f65b4374	correct pool and uniform headers format	2020-09-02 09:16:51 +03:00
Mark Felder	016d8d6c56	Consolidate construction of Rich Media Parser HTTP requests	2020-08-03 12:37:31 -05:00
lain	781b270863	ChatMessageReferenceView: Display preview cards.	2020-07-30 19:57:26 +02:00
lain	5b1eeb06d8	Revert "Merge branch 'revert-2b5d9eb1' into 'develop'". This reverts merge request !2784	2020-07-21 22:18:17 +00:00
lain	696c13ce54	Revert "Merge branch 'linkify' into 'develop'". This reverts merge request !2677	2020-07-21 22:17:34 +00:00
Alex Gleason	8daacc9114	AutoLinker --> Linkify, update to latest version. https://git.pleroma.social/pleroma/elixir-libraries/linkify	2020-06-30 16:39:15 -05:00
Egor Kislitsyn	520367d6fd	Fix atom leak in Rich Media Parser	2020-06-13 12:08:46 +03:00
Mark Felder	3bf78f2be7	Fix Oban not receiving :ok from RichMediaHelper job	2020-04-14 11:43:53 -05:00
Mark Felder	05da5f5cca	Update Copyrights	2020-03-03 16:44:49 -06:00
Maksim Pechnikov	5c0f646cef	fix validate_page_url	2019-06-26 06:27:17 +03:00
Maksim Pechnikov	4ad15ad2a9	add ignore hosts and TLDs for rich_media	2019-06-25 22:25:37 +03:00
Maksim Pechnikov	0276cf5a02	fix validate_url for private ip	2019-06-25 17:44:24 +03:00
rinpatch	f30a3241d2	Deps: Update auto_linker	2019-06-18 16:08:18 +03:00
Egor Kislitsyn	bf22ed5fbd	Update `auto_linker` dependency	2019-06-12 15:53:33 +07:00
William Pitcock	0da1233e8e	rich media: suppress link previews if post is marked as sensitive	2019-05-17 18:49:43 +00:00
William Pitcock	57d11ac9db	activitypub: move post rich media fetching to job queue	2019-05-13 19:36:00 +00:00
William Pitcock	c62220c500	rich media: helpers: only crawl Create activities	2019-03-23 02:28:59 +00:00
William Pitcock	b3bf523c09	rich media: use optimized Object.normalize()	2019-03-23 00:22:57 +00:00
Haelwenn (lanodan) Monnier	a3a9cec483	[Credo] fix Credo.Check.Readability.AliasOrder	2019-03-13 04:26:54 +01:00
William Pitcock	b7aa1ea9e6	rich media: helpers: rework validate_page_url()	2019-03-04 18:39:13 +00:00
William Pitcock	9f3cb38012	helpers: use AutoLinker to validate URIs as well as the other tests	2019-03-04 18:31:49 +00:00
William Pitcock	d38d537bee	rich media: don't crawl bogus URIs	2019-03-04 18:31:49 +00:00
Haelwenn (lanodan) Monnier	6a6a5b3251	de-group alias/es	2019-02-09 16:31:17 +01:00
lain	b19b4f8537	Remove default value for rich media. Setting it to true will actually override a 'false' set before.	2019-01-31 20:02:08 +01:00
rinpatch	7057891db6	Make rich media support toggleable	2019-01-31 18:18:20 +03:00
William Pitcock	ddb5545202	rich media: kill some testsuite noise	2019-01-28 20:55:33 +00:00
William Pitcock	ebeabdcc72	rich media: helpers: clean up unused aliases	2019-01-28 06:10:25 +00:00
William Pitcock	8e42251e06	rich media: add helpers module, use instead of MastodonAPI module	2019-01-28 06:04:54 +00:00

47 Commits
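
Note on commit ede414094f: its implementation notes describe a fetch that is attempted 3 times with increasing sleep between attempts, followed by a negative cache entry whose TTL depends on the kind of failure (15 minutes for unexpected errors, no TTL for expected ones). The following is a minimal, language-neutral sketch of that behaviour in Python, not Pleroma's actual Elixir code. Every name in it (`fetch_with_retries`, `ExpectedError`, `NegativeCache`) and the one-second base sleep are hypothetical. It also assumes that expected errors are not retried, which the commit message does not state.

```python
import time
from typing import Any, Callable, Dict, Optional

NEGATIVE_TTL_SECONDS = 15 * 60  # 15 minutes, as stated in the commit message


class ExpectedError(Exception):
    """Hypothetical marker for failures the parser anticipates (e.g. not HTML)."""


class NegativeCache:
    """Hypothetical in-memory stand-in for a negative cache keyed by URL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # url -> expiry timestamp, or None meaning "no TTL"
        self._entries: Dict[str, Optional[float]] = {}

    def put(self, url: str, ttl: Optional[float]) -> None:
        self._entries[url] = None if ttl is None else self._clock() + ttl

    def contains(self, url: str) -> bool:
        if url not in self._entries:
            return False
        expiry = self._entries[url]
        if expiry is not None and self._clock() >= expiry:
            del self._entries[url]  # entry has expired, forget it
            return False
        return True


def fetch_with_retries(
    url: str,
    fetch: Callable[[str], Any],
    cache: NegativeCache,
    attempts: int = 3,
    base_sleep: float = 1.0,  # assumed value; the commit gives no durations
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Return fetch(url), or None if the URL is (or becomes) negatively cached."""
    if cache.contains(url):
        return None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except ExpectedError:
            # Assumption: expected failures are cached immediately, with no TTL.
            cache.put(url, ttl=None)
            return None
        except Exception:
            if attempt == attempts:
                # Out of attempts: negative cache entry for 15 minutes.
                cache.put(url, ttl=NEGATIVE_TTL_SECONDS)
                return None
            sleep(base_sleep * attempt)  # sleep time grows with each attempt
    return None
```

Passing `sleep` and `clock` as parameters keeps the sketch testable without real waiting. The globally unique process name that the commit uses to avoid duplicate work on the same URL is not modelled here.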